Detailed Notes
Focus for this episode: We continue looking into issues related to bringing in new data to our destination.
This tutorial walks through the creation of a database synchronization tool in Python. It continues our Python and SQL Tutorials with a useful application that provides a deeper look at SQL structures and commands.
Repository For Code: git clone [email protected]:rbroadhead/dbsync.git
Transcript Text
[Music] well hello and welcome back we're continuing our series where we're looking at our sql sync application which is python and uh with this we're gonna learn some python which we've been working through as well as some sql now last time around we start digging into rows we can take tables that don't exist in our source and create them in our destination we can do the same with columns so if we've got tables that have extra columns in the source and they are not in the target then we can handle that as well now we're moving into data so we're going to actively work on synchronizing rows at this point now the way we're going to work it with rows if we're going to start with a truncate and what that does is that's going to empty out our data so if we go look at let's see let's make sure so here in my local ones i've got some records in here because i've pulled stuff across what i want to do is i want to be able to and let me yeah let me get rid of i'm gonna get rid of a couple tables uh so if we go to local my sequel and we look at tutorial i wanna take the uh breeders horses and owners i'm going to take those out because it's such a large amount of uh data involved in those uh let's see so for it's missing columns missing columns synchros okay so i want to go up to sync rows and for table and tables list and then i'm going to say if table not in uh let's see what is this gonna be owners trainers oops look at that horses so i really wouldn't know i'm not normally going to want to do this but i'm going to do it here because let's see and i do all the cleanups okay there we go so i'm going to do is i'm going to say if the table is not one of these then it's going to go through and i'm going to truncate it so i'm going to empty the table in the destination i'm just going to say you know what i'm going to take everything that exists in the source and i'm going to dump it into the uh the destination now i'm going to start with my insert so sort of a typical insert line for sql so we're going to insert into like for example app user and we're going to take each of the columns much as we've seen before so we're going to go through we're going to do a describe so we get each of our columns we've seen that before now what we're going to do here is we're going to have this little thing to say it's the first row if it's first row then all we do is we're going to do the name surrounded by these little ticks in case it's some sort of reserved word now notice our base is the insert into table or whatever the table name and then our parentheses so we get basically this far so insert into like in this case table name space open paren and then we're going to go through each of the columns and we're going to surround them by ticks and if it's not the first column you know if first otherwise we're going to then precede it with a comma so the first one we come in we're going to do here and that's where we finish and then when we come down to the next column it's going to say it's not the first column so we're going to precede it with this comma space and we're going to do that all the way out and then we're going to get out to here and we're going to close that parentheses values keyword and then open the parentheses for the values we're going to put in there and now what we're going to do is we're going to go through uh let's see oh i'll show you this max rose in a second um and it's because this is because i had some issues recording so i came through this first time now what we're gonna do and uh i've added some stuff so the first thing we want to do is we just want to get the data so what we would want to do is we would say like let's do address is we could just say select star from address and that's going to get us all the records now in this case i don't think i have any uh oh because i didn't do that right maybe let's see do i have any records in here i do and so i get my three the problem is if i have a large number of rows like i don't know whatever large number is if i won't have more than 10 or more than 20 or more than 100 or a thousand or a million whatever it happens to be like in this case for owners trainers and horses um i've got like hundreds of thousands of rows so actually let's do this let's just go ahead and keep those in there and do that and so what i'm doing is i'm going to pass in the source the destination the list of tables and this max rose and so if max rose is greater than zero if it's zero i'm just going to do select star i'm going to get all the rows from the table if it's not then i'm only going to get the rows i'm gonna get a certain number of rows now this is where it gets a little tricky uh let's see if i look at uh well let's look i think i have a lot from horseselect oops count star from horses if i do that yeah so there's a thousand records in there so i can do and this is in my local this is my destination so if i do that so here's all my records but what i can do is i can come in here and i can do a limit and let's say i do a limit to 50 then i'm only going to get 50 rows we've seen this before if you went through the sequel stuff now the next thing i want to do is this is going to just give me the first 50. so instead what i want to do is i want to do an order by and i'm going to assume its id and so what i'm going to do is i'm going to say order by id first column is what i'm always going to do i'm hoping the first column is the primary key and i'm going to do descending so what that says is now i'm going to get the in this case the top 50 ids and now i'm going to get those top 50 ids and so now these are the most recent effectively these are the most recent ones that i've pulled in now i could do something more complicated i could have something that maybe i've got some consistency so i can find when the creation date was or updated date but even here you can see date created is not it's null in all of these cases um let's see so now i can with doing this let's go back over here to the source i can actually limit how many records i bring across what i did most recently was a thousand so if i come down here max rose is a thousand yep and so i'm going to send that in and so that means oops sinks rose so that means if uh since it's greater than zero it's going to come through and it's just going to give the first thousand rows let's do something a little let's do 10 000. that'll probably still be fairly quick oops it's not the sink so here's the thing so i select star from and then all i'm going to do is i'm going to walk through each row and because i'm doing a select star i'm just going to go through essentially the same way i did with the columns where i do you know a value and then if it's a second or beyond it's comma space i'm going to do the same thing with the values except for here instead of tick marks i'm going to do singles uh single ticks unless i get a nun back in which case i'm going to do a null which we've seen this before when we were fixing um [Music] somewhere else we saw this somewhere else i don't remember where but we have seen this so apologies if i forget specifically but since what you do is going to say hey if it's none then i can't really bring this so i need to give it a null and then it'll allow me to insert that record so if i do here uh oh it's a duplicate key so let me get a different key just because i don't think there's a five yet uh oh there's a five let's get let's do like a 100 okay so that worked but if i do 101 this is what we ran into before there we go okay so we're moving our way through here we build up our values we close it off with our uh close prints and our semicolon uh i'm going to insert i'm going to print the insert statement just cuz for like debugging purposes and let's see now what i can do is i can come through it's going to connect to my data it's going to go in and now it's going to go blow out a bunch of inserts uh because i did what ten thousand and so i'm going to get the most 10 recent based on order by that first row whatever it happens to be so i'm going to get the most recent up to 10 000 of like in this case breeders and it's gonna do the same things for horses and owners so it's gonna take a second like see this is why we need to do this limit or probably want to do this limit because it's going to take a while because this is a a very brute force approach to bringing our data over and again it's stepping on stuff because we're truncating a table and then we're adding this on now what we could do is we could for example we could store the last the maximum last id from that table and then say hey let's just grab everything that's greater than the id from the last time we ran it so we're only gonna get new records and there's i mean there's other things we could do we could go in and we could do like a record by record compare but that would be painfully slow for anything that's got more than maybe dozens or hundreds of records once you start getting to thousands particularly because we're reaching out to different databases and doing this one at a time it could be very time consuming now a faster way to do this would be to instead of like we're doing here we're making all these individual calls we could have one gargantuan statement and do this every um like put together a hundred uh inserts and then execute all 100 in one shot and there's some things like that we could do to speed this up uh but right now we're gonna keep it just sort of uh sort of a brute force approach uh we're getting there we're getting through our horses at this point i can't remember what else we have in there see if it's going this way they'll probably look up type then owners let's see it may just be going in we'll see what order it's going through the tables but we're gonna go through we're gonna do this and we're gonna have when we're done so we can look here like i think it's probably already done all dates oop let's do that oh it's deciding it wants to hang for a second oh probably because it's doing all this okay all dates data yeah because it's hammering this thing right now yeah so i've brought my dates across uh that's not super impressive but if you looked at like breeders uh opening tab yeah now you're gonna see so here's all of them and we can see down here if you look i've got my 10 000 records because that's my limit and so now it's pulling all this data in if i want to do a small database then i can do this and keep this down to you know 100 or a thousand or something like that so i can pull like effectively a test across now there are going to be some issues that we're going to run into and we will talk about these as we get further on oh here we go now it's speeding up um one of them is actually the primary one that we have to worry about are foreign key relationships when we try to truncate data it's not going to allow us to do that when there are foreign key relationships around we've got to actually empty tables out in a certain certain order or go through deletes or disable foreign keys and then build them back after we've populated the data we also have to worry about when we do partials like this where we're only doing you know 10 000 rows we have to worry about the situation where we pull partial data in one table and partial data in another but they don't have like one table it's referring to ids that don't exist because they're not part of the partial that was pulled in another so while this may seem pretty simple and straightforward to this point as far as building tables columns and data we're really not there yet also with that we may have to worry about what if we have a change in columns from source to destination and so we're going to tackle some of those next time around uh but for right now at least we've got something that just gives us the more in the basic sense it does do our synchronization of data columns and uh tables so that being said is a good stopping point we'll come back around next time but as always go out there and have yourself a great day a great week and we will talk to you next time you
Transcript Segments
[Music]
well hello and welcome back we're
continuing our series where we're
looking at
our sql sync application which is python
and uh with this we're gonna learn some
python which we've been working through
as well as some sql
now last time around
we start digging into rows we can
take tables that don't exist in our
source and create them in our
destination we can do the same with
columns so if we've got tables
that
have extra columns in the source
and they are not in the target then we
can handle that as well
now we're moving into
data so we're going to actively work on
synchronizing rows at this point
now the way we're going to work it
with rows if we're going to start with a
truncate
and what that does is that's going to
empty out our data so if we go look
at
let's
see let's make sure
so here in my local ones i've got some
records in here
because i've pulled stuff across
what i want to do is i want to be able
to and let me
yeah let me get rid of i'm gonna get rid
of a couple tables uh so if we go to
local my sequel
and we look at tutorial i wanna take the
uh breeders horses and owners i'm going
to take those out because it's such a
large amount of uh data involved in
those
uh let's see so for it's missing columns
missing columns synchros okay so i want
to go up to sync rows
and for table and tables list
and then i'm going to say
if
table
not in
uh let's see what is this gonna be
owners
trainers
oops look at that
horses
so i really
wouldn't know i'm not normally going to
want to do this but i'm going to do it
here because
let's see and i do all the cleanups okay
there we go
so i'm going to do is i'm going to say
if the table is not one of these
then it's going to go through
and i'm going to truncate it so i'm
going to empty the table in the
destination i'm just going to say you
know what i'm going to take everything
that exists in the source
and i'm going to dump it into
the uh the destination now i'm going to
start with my insert so sort of a
typical insert
line for sql so we're going to insert
into
like for example app user and we're
going to take each of the columns much
as we've seen before
so we're going to go through we're going
to do a describe so we get each of our
columns we've seen that before
now what we're going to do here is we're
going to have this little thing to say
it's the first row if it's first row
then all we do is we're going to do the
name
surrounded by these little ticks in case
it's some sort of reserved word now
notice our base is the insert into table
or whatever the table name
and then our parentheses so we get
basically
this far
so insert into like in this case table
name space
open paren and then we're going to go
through each of the columns and we're
going to surround them by ticks and if
it's not the first column
you know if first otherwise
we're going to then
precede it with a comma so the first one
we come in we're going to do here
and that's where we finish and then when
we come down to the next column it's
going to say it's not the first column
so we're going to precede it with this
comma space
and we're going to do that all the way
out and then we're going to get out to
here
and we're going to close that
parentheses
values keyword and then open the
parentheses for the values we're going
to put in there
and now what we're going to do
is we're going to go through
uh let's see oh
i'll show you this max rose in a second
um and it's because this is because i
had some issues recording so
i came through this first time now what
we're gonna do and uh i've added some
stuff so the first thing we want to do
is we just want to get the data
so what we would want to do is we would
say like
let's do address
is we could just say select star
from address
and that's going to get us all the
records now in this case i don't think i
have any uh
oh
because i didn't do that right maybe
let's see do i have any records in here
i do and so i get my three the problem
is if i have a large number of rows like
i don't know whatever large number is if
i won't have more than 10 or more than
20 or more than 100 or a thousand or a
million whatever it happens to be
like in this case
for owners trainers and horses
um i've got like hundreds of thousands
of rows so actually let's do this let's
just go ahead and keep those in there
and
do that
and so what i'm doing is i'm going to
pass in the source
the destination the list of tables and
this max rose
and so if max rose is greater than zero
if it's zero i'm just going to do select
star i'm going to get all the rows from
the table
if it's not then i'm only going to get
the rows i'm gonna get a certain number
of rows now this is where it gets a
little tricky uh let's see if i look at
uh well let's look i think i have a lot
from horseselect
oops
count star from horses
if i do that
yeah so there's a thousand records in
there
so i can do
and this is in my local this is my
destination so if i do that
so here's all my records but what i can
do is i can come in here
and i can do a limit
and let's say i do a limit to 50
then i'm only going to get 50 rows
we've seen this before if you went
through the sequel stuff now the next
thing i want to do is this is going to
just give me the first 50.
so instead what i want to do is i want
to do an order by and i'm going to
assume its id
and so what i'm going to do is i'm going
to say order by id
first column is what i'm always going to
do i'm hoping the first column is the
primary key and i'm going to do
descending so what that says is now i'm
going to get the
in this case the top 50 ids
and
now i'm going to get those top 50 ids
and so now these are the most recent
effectively these are the most recent
ones that i've pulled in
now i could do something more
complicated i could have something that
maybe i've got some consistency so i can
find when the creation date was
or updated date but even here you can
see date created is not
it's null in all of these cases
um let's see so
now i can with doing this let's go back
over here to the source i can actually
limit how many records i bring across
what i did
most recently was a thousand so if i
come down here
max rose is a thousand yep and so i'm
going to send that in
and so that means oops sinks rose so
that means if
uh since it's greater than zero it's
going to come through and it's just
going to give the first thousand rows
let's do something a little let's do 10
000. that'll probably still be fairly
quick
oops it's not the sink
so here's the thing so i select star
from and then all i'm going to do is i'm
going to walk through each row
and because i'm doing a select star i'm
just going to go through essentially the
same way i did with the columns
where i do you know a value and then if
it's a second or beyond it's comma space
i'm going to do the same thing with the
values except for here instead of tick
marks i'm going to do singles
uh single ticks unless
i get a nun back in which case i'm going
to do a null which we've seen this
before when we were fixing um
[Music]
somewhere else
we saw this somewhere else i don't
remember where but we have seen this so
apologies if i forget specifically but
since what you do is going to say hey if
it's none
then i can't really bring
this
so i need to give it a null and then
it'll allow me to insert that record so
if i do here
uh oh it's a duplicate key so let me get
a different key just because
i don't think there's a five yet
uh oh there's a five let's get let's do
like a 100
okay so that worked but if i do 101
this is what we ran into before
there we go okay so we're moving our way
through here
we build up our values we close it off
with our uh close prints and our
semicolon uh i'm going to insert i'm
going to print the insert statement just
cuz
for like debugging purposes and let's
see
now what i can do
is i can come through
it's going to connect to my data
it's going to go in and now it's going
to go blow out a bunch of inserts
uh because i did what ten thousand
and so i'm going to get the most 10
recent
based on
order by that first row whatever it
happens to be
so i'm going to get the most recent up
to 10 000 of like in this case breeders
and it's gonna do the same things for
horses and owners
so it's gonna take a second like see
this is why we need to do this limit or
probably want to do this limit because
it's going to take a while because this
is a
a
very brute force approach
to bringing our data over and again it's
stepping on stuff because we're
truncating a table and then we're adding
this on
now what we could do is we could for
example
we could store
the last the maximum
last id from that table and then say hey
let's just grab everything that's
greater than
the id from the last time we ran it so
we're only gonna get new records
and there's i mean there's other things
we could do we could go in and we could
do like a record by record compare
but that would be
painfully slow for anything that's got
more than maybe dozens or hundreds of
records once you start getting to
thousands
particularly because we're reaching out
to different databases and doing this
one at a time
it could be
very time consuming
now a faster way to do this would be
to instead of like we're doing here
we're making all these individual calls
we could have one gargantuan statement
and do this every
um
like put together a hundred
uh inserts and then execute all 100 in
one shot and there's some things like
that we could do to speed this up
uh but right now we're gonna keep it
just sort of uh
sort of a brute force approach
uh we're getting there we're getting
through our horses at this point
i can't remember what else we have in
there
see if it's going this way they'll
probably look up type then owners let's
see it may just be going in
we'll see what order it's going through
the tables
but we're gonna go through we're gonna
do this
and we're gonna have when we're done so
we can look here like i think it's
probably already done all dates
oop
let's do that oh
it's deciding it wants to hang for a
second
oh probably because it's doing all this
okay
all dates data
yeah because it's hammering this thing
right now
yeah so i've brought my dates across uh
that's not super impressive but if you
looked at like breeders
uh
opening tab
yeah now you're gonna see so here's all
of them and we can see down here if you
look i've got my 10 000 records because
that's my limit
and so now it's pulling all this data in
if i want to do a small database then i
can do this and keep this down to you
know 100 or
a thousand or something like that so i
can pull
like effectively a test across now
there are going to be some issues that
we're going to run into
and we will talk about these as we get
further on oh here we go now it's
speeding up
um
one of them is actually the primary one
that we have to worry about are foreign
key relationships
when we try to truncate data
it's not going to allow us to do that
when there are foreign key relationships
around we've got to actually empty
tables out in a certain certain order or
go through deletes or
disable foreign keys and then build them
back after we've
populated the data
we also have to worry about when we do
partials like this where we're only
doing you know 10 000 rows
we have to worry about the situation
where we pull partial data in one table
and partial data in another but they
don't have
like one table it's referring to ids
that don't exist
because they're not part of the partial
that was pulled in another
so
while this may seem pretty simple and
straightforward to this point as far as
building tables columns and data
we're really not there yet also with
that
we may have to worry about what if we
have a change in columns
from source to destination and so we're
going to tackle some of those next time
around uh but for right now at least
we've got something that just gives us
the more in the basic sense it does
do our synchronization of
data columns and uh tables
so that being said is a good stopping
point we'll come back around next time
but as always go out there and have
yourself a great day a great week and we
will talk to you
next time
you