Detailed Notes
Focus for this episode: We continue our thought process and options for pulling over limited, but valid data.
This tutorial walks through the creation of a database synchronization tool in Python. It continues our Python and SQL Tutorials with a useful application that provides a deeper look at SQL structures and commands.
Repository For Code: git clone [email protected]:rbroadhead/dbsync.git
Transcript Text
thank you [Music] well hello and welcome back we are continuing what has become a little bit of a slog of a tutorial of my Sequel and python last episode was a little bit painful as we were sort of doing some research on the fly so I jumped ahead a little bit and continued into my research for those of you that uh were yelling at me a little bit as you were looking at what we were going through um I'll take a look at this and we'll show you what we came up with uh I do want to flip back real quick so one of the things we did is we're we're into this synchronization of rose and we're in a situation where we have tables that can be synchronized whatever you know if you want to do 100 rows you take the 100 rows those are good we can set that aside and we have that when we do the SMART empty sink is maybe what we call Root tables those awesome we can just pull our data in and we're off and running but now the non-root tables is where it gets a little more tricky because what we have to do is we're gonna have to go through those and figure out what the references are and last time we got a little bit stuck on that but it exists in the information schema key column usage and so I'm going to go ahead and I'm going to do a star first for this just so you can see it all for posterity's sake I guess and I'm just going to go all the way through uh yep let's reconnect here and it's going to take a second and so what you're going to see with these is you'll see what your your schema is it's got what the constraint name is which is pretty darn important um the catalog uh we don't really need that one uh table schema because this is constraint schema versus table schema and then you have uh so table schema table name column name and then you have a reference table schema reference table name now what we're going to do with these is if you remember last time around we were going through uh let's see here we go uh we're going to go through each of these and we're going through the referential constraints and let me do that um [Music] um where is referential constraints there we go so we're going through these and we were finding our constraint names and we were saying okay we know that they're referencing this other table but how do we build that out we need to know the column the table and the column in both the source and the destination so column so if table a column one is actually a foreign key reference to table B column D we'll call it then or column four then what you need to do is build the foreign key relationship based on those four pieces of data and we talked about that last time where we were looking we had sort of like this we're trying to build out this query here which is basically uh from the we're trying to get from our table our parent table all of the values where that column exists in the other one and so let me do this so a little bit here so if we've got like table a and we have table B then and if we have this column let's say and let's just say it's ID and here it's going to be FK the foreign K ID so basically when we're getting dable uh table B data what we need to do is instead of or I'm sorry when we're getting the table a data is that right nope we're getting a table B data we only need to fill it in with data where the ID is equal to the ID and table a and so we do this in this big nasty query kind of thing because what we're going to do is we're going to say from our table which is the reference table name and this is from here so from our key column usage table we've got a reference table reference column and then a column and a table and so in this case we have contract performer ID refers to performer performer ID so in this case we're going to take performer which we got from the referenced table name where performer ID which is the column name is in and then we're going to do select distinct column name from the parent so if I take that and get rid of those just to show you in this case let me get rid of this and then what we're going to see is this is the data we're actually going to pull because it's the only data that really matters because the other columns that are in uh the other data that is in the in this case the performer table we don't need because contract is where we're looking at so we only have these three fields that care we care about from contract and if we do um it's like star from contract there's probably only three rows in there uh in contract oh okay so there's a couple there but we only need the distinct ones but if we look at performers we're going to see that there's probably a butt load of sorry that's a technical term a lot of rows that are in performer and we only need three because we only need to match them up to contract at this point now it's not necessarily because it could be a there could be information that we need from performer elsewhere but from contract we know that we're not limiting it on um number of rows we're limiting it based on what are the rows that exist in performer so in this case we're going to build this thing out and so we're going to come into here and then what we're going to do with each of these for the non-root tables we're going to see if it exists so if we've got that table then what we need to do is for each of the rows and this is that referential constraints let me go back to that one so I can sort of show you what we're going to do with this so this one what we're going to do for each one of these rows we're going to come through to this constraint name which is a constraint name is one uh did I actually build that out right oh here we go and oh I may need to oh because that's the constraint name that's the table name that's the parent I may need to okay maybe I didn't get that filled all the way out let's take a look at that so I'm going to come through and I'm going to say hey here's a in this case here's a table here's something that I need to deal with reference table name blah blah blah what I really need here is the constraint name so what I can probably do here is for each of these is come through let's see load parent data let me see if I've got it up there I may have to uh parent data oh or is that below okay let's see parent table data parent table data okay all right so that's right so if I look up at parent table data then what I'm going to do with each of those is I'm going to give it a table and a parent which I probably don't really need oh and so this is going to say for that table I need to see if it's loaded because now what I need to do is I need to walk in and say okay for each of these tables reference table reference table where's that at oh it's over here so for the reference table for each of those I'm going to have to go check to see if it is a root that has already been loaded or not and so what I can do here is I've got this imported tables so what I'm going to do is I'm going to say uh and so for here for example as I load them I go into imported tables I actually add it in so I'm going to do is I'm going to be passing that in and I probably should pass it back out uh [Music] yeah let's do that so imported whoop tables equals so I'm going to have to pass that back out which means for load parent table data which is here one of the things I want to do is I want to do return whoops imported is that imported rows that should be imported tables up imported tables and then imported tables so keep that the same name so he's going to pass it back in so I need to do here is I'm going to say here I'm going to change this up a little bit because I'm going to say um I want to go through okay so I'm gonna have to go through its parents and so here I'm going to say go through each of those tables um I'm probably going to be easier to for each of these uh let's see if it's less than zero oh I don't need that so I want to get its parents so this is I want to do this one I'm going to do curse oh I need cursor source um so what I want to do for this is I'm going to go through and I'm going to say for each of the tables um FK rows what is FK rows that is fetchall so here uh let's see oh maybe we'll get rid of this I really just need here for each of these I'm going to do I'll probably do it this way so I'm going to do four uh parent in FK Rose and then I'm gonna do if parent zero so if the reference table name is imported rows then it's gonna uh if it's not in imported rows then I need to import it so I'm going to have to do load parent table data I'm going to do imported oh it's 94 rows supported tables and then imported tables equals that and I'm going to give it a table which is oops which is now parent zero oh that's going to be parents because I already have a parent up there and that's going to be that load parent table parent zero blah blah blah oh and then it's going to be the table oh all right now we have to adjust this a little bit let me look at this wait a minute so I'm going to come through here and I'm going to say for each of these I need to check it for the parent which I can probably oh okay let's let's clean this up a little bit because what I can do here is I can say let's see so if it's greater than zero okay then rows and rows oh here we go I'm doing that so I'm going to load the parent data so the parent data basically needs to go through this same thing so let me see if I can get this right this time so he's going to go through the same thing here and why don't I send it parent um apparent is a row oh so I'm going to back it back out I don't know why I need the table there though that's a good question because if I load the parent table data I can probably do it with just the parent and the imported tables because now I'm basically going to do actually I can do it like this um actually let's simplify it even a little bit more so okay I'm going to say if uh parent not in imported rows then I'm going to load the parent else load data for this table and that is going to be I'm going to build out this little thing and this query so this is going to be the uh smart load SQL equals this big thing but first what we need is this guy and actually we could probably oh we don't need that guy oh we do because we need it with the so that's what the constraint name there we go okay so here first we need to uh well first we need to get the columns and tables and that's going to be this is the uh we'll call it the call SQL let's just call it Source SQL and this is going to be where where that equals and let's just put it like here and this is going to be and we need the constraint name which is here out of FK rows that's going to be Row one and then so for each of these so for that we're going to come in and we're going to say let's go execute it and we want to execute it on the source which is fine so let's do here Source SQL and uh this will be we're going to call this Source row equals fetchall and then all we need here uh this is going to be did we bring the did we grab the no we didn't so let's go ahead and grab with this the constraint schema uh constraint schema because then there we go because what we need here is we're going to be uh where am I building that here so uh let's do this schema let's do just temp temp schema uh yeah we'll leave it the way it is so we're gonna come in here and we're going to say select from and then this is going to be um and let's go ahead and do these so the first thing we get is uh row zero oops which is because it's the constraint schema it's going to get really complicated so we're going to actually end up pausing here in a second but uh so here so you got the constraint schema Dot um actually I think I can do the whole thing like this let's see what happens dot the reference table name is Row one and I want the reference to column name again will be uh I'll do this to protect that because we can and that's going to be a row reference column name is zero one two and then it sort of says so let me do this speed this up a little bit so performer ID is the column name which is three oh column three and this is column four and we've got to come back over here to where we did this thing and we do it like this let's see if this will work uh how far do we get uh from that that yeah there we go so that should give us our smart load and then what we're going to do is we're going to execute that uh smart load SQL we're going to have to go ahead and execute that and then use that to insert into the other we're going to build that out and insert into the target row so that's going to be non-trivial in itself and loading the parent is basically going to be the same thing um so we may find a way to to clean this up a little bit but it's basically going to be go load the parent and then possibly load the parent possibly load the parent so let me find a way to clean this up but that is a little Cliffhanger for the next time around because we've gone a little long so I want to keep it short I know this is a little bit of trudging through stuff but this is really some of the but this is really probably the most complicated piece I think as you're seeing of the work we're going to do in the synchronization and once it's done it's going to be pretty stinking cool that being said I hope your day is Pretty stinking cool go out there and have yourself a great day a great week and we will talk to you next time foreign
Transcript Segments
thank you
[Music]
well hello and welcome back we are
continuing what has become a little bit
of a slog of a tutorial of my Sequel and
python
last episode
was a little bit painful as we were sort
of doing some research on the fly so I
jumped ahead a little bit and continued
into my research for those of you that
uh were
yelling at me a little bit as you were
looking at what we were going through
um I'll take a look at this and we'll
show you what we came up with uh I do
want to flip back real quick so one of
the things we did is we're we're into
this synchronization of rose
and we're in a situation where we have
tables that can be synchronized whatever
you know if you want to do 100 rows you
take the 100 rows
those are good we can set that aside and
we have that when we do the SMART empty
sink is maybe what we call Root tables
those awesome we can just pull our data
in and we're off and running
but now the non-root tables is where it
gets a little more tricky because what
we have to do is we're gonna have to go
through those and figure out what the
references are
and last time we got a little bit stuck
on that
but it exists in the information schema
key column usage
and so I'm going to go ahead and I'm
going to do a star first for this just
so you can see it all
for posterity's sake I guess and I'm
just going to go all the way through uh
yep
let's reconnect here
and it's going to take a second and so
what you're going to see with these is
you'll see what your your schema is it's
got what the constraint name is which is
pretty darn important
um the catalog uh we don't really need
that one uh table schema because this is
constraint schema versus table schema
and then you have uh so table schema
table name column name and then you have
a reference table schema reference table
name now what we're going to do with
these is if you remember last time
around we were going through
uh let's see here we go
uh we're going to go through each of
these
and we're going through the referential
constraints and let me do that
um
[Music]
um
where is referential constraints
there we go
so we're going through these and we were
finding our constraint names and we were
saying okay we know that they're
referencing this other table
but how do we build that out we need to
know the column the table and the column
in both the source and the destination
so column so if table a column one is
actually a foreign key reference to
table B column
D we'll call it then or column four
then what you need to do is build the
foreign key relationship based on those
four pieces of data and we talked about
that last time
where we were looking we had sort of
like this we're trying to build out this
query here which is basically uh from
the we're trying to get from our table
our parent table
all of the values where that column
exists in the other one and so let me do
this so a little bit here so if we've
got like table
a and we have table B
then and if we have this column let's
say and let's just say it's ID and here
it's going to be FK the foreign K ID so
basically when we're getting dable uh
table B data what we need to do is
instead of or I'm sorry when we're
getting the table a data
is that right nope we're getting a table
B data we only need to fill it in with
data
where the ID is equal to the ID and
table a
and so we do this in this big
nasty query kind of thing because what
we're going to do is we're going to say
from our table which is the reference
table name and this is from here
so from our
key column usage table
we've got a reference table reference
column and then a column and a table
and so in this case we have contract
performer ID refers to performer
performer ID so in this case we're going
to take performer which we got from the
referenced table name
where performer ID which is the column
name is in and then we're going to do
select distinct column name
from
the parent so if I take that
and get rid of those just to show you in
this case
let me get rid of this
and then what we're going to see is this
is the data we're actually going to pull
because it's the only data that really
matters because the other columns that
are in
uh the other data that is in the in this
case the performer table we don't need
because contract is where we're looking
at so we only have these three fields
that care we care about from contract
and if we do
um
it's like star from
contract
there's probably only three rows in
there uh in contract
oh okay so there's a couple there but we
only need the distinct ones but if we
look at performers
we're going to see that there's probably
a butt load of sorry that's a technical
term a lot of
rows that are in performer and we only
need three because we only need to match
them up to contract at this point now
it's not necessarily because it could be
a
there could be information that we need
from performer elsewhere but from
contract we know that we're not limiting
it on
um
number of rows we're limiting it based
on what are the rows that exist in
performer
so in this case we're going to build
this thing out
and so we're going to come into here and
then what we're going to do with each of
these for the non-root tables
we're going to see if it exists so if
we've got that table then what we need
to do
is for each of the rows and this is that
referential constraints let me go back
to that one so I can sort of show you
what we're going to do with this
so this one what we're going to do for
each one of these rows we're going to
come through to this constraint name
which is a constraint name is one
uh did I actually build that out right
oh here we go
and oh I may need to
oh because that's the constraint name
that's the table name that's the parent
I may need to okay maybe I didn't get
that filled all the way out let's take a
look at that so I'm going to come
through and I'm going to say hey
here's a
in this case here's a table here's
something that I need to deal with
reference table name blah blah blah what
I really need here is the constraint
name
so what I can probably do here is for
each of these
is come through let's see load parent
data let me see if I've got it up there
I may have to
uh parent data
oh or is that below
okay let's see
parent table data
parent table data
okay all right so that's right so if I
look up at
parent
table data
then what I'm going to do with each of
those is I'm going to give it a table
and a parent which I probably don't
really need
oh and so this is going to say for that
table I need to see if it's loaded
because now what I need to do is I need
to walk in and say okay
for each of these tables
reference table reference table where's
that at oh it's over here
so for the reference table
for each of those I'm going to have to
go check to see if it is a root that has
already been loaded or not
and so what I can do here
is I've got this imported tables
so what I'm going to do is I'm going to
say
uh and so for here for example as I load
them I go into imported tables I
actually add it in so I'm going to do is
I'm going to be passing that in
and
I probably should pass it back out
uh
[Music]
yeah let's do that
so imported
whoop
tables equals so I'm going to have to
pass that back out which means for load
parent table data
which is here
one of the things I want to do is I want
to do return whoops
imported is that imported rows that
should be imported tables up
imported tables
and then imported tables
so keep that the same name so he's going
to pass it back in
so I need to do here is I'm going to say
here
I'm going to change this up a little bit
because I'm going to say
um I want to go through
okay
so I'm gonna have to go through its
parents
and so here I'm going to say go through
each of those tables
um I'm probably going to be easier to
for each of these
uh let's see if it's less than zero oh I
don't need that
so I want to get its parents so this is
I want to do this one I'm going to do
curse oh I need cursor source
um
so what I want to do for this is I'm
going to go through and I'm going to say
for each of the tables
um
FK rows what is FK rows that is fetchall
so here
uh let's see oh maybe we'll get rid of
this
I really just need here for each of
these I'm going to do
I'll probably do it this way
so I'm going to do four uh parent
in
FK Rose
and then I'm gonna do if
parent
zero so if the reference table name is
imported rows then
it's gonna uh if it's not
in imported rows then I need to import
it so I'm going to have to do load
parent table
data
I'm going to do imported
oh it's 94 rows supported tables
and then imported tables
equals that
and I'm going to give it a table which
is oops which is now parent zero
oh that's going to be parents
because I already have a parent up there
and that's going to be
that
load parent table parent zero blah blah
blah oh and then it's going to be the
table
oh all right now we have to adjust this
a little bit let me look at this wait a
minute so I'm going to come through here
and I'm going to say for each of these I
need to check it for the parent which I
can probably
oh okay let's let's clean this up a
little bit because what I can do here is
I can say let's see so if it's greater
than zero okay then rows and rows oh
here we go
I'm doing that so I'm going to load the
parent data
so the parent data basically needs to go
through this same thing
so let me see if I can get this right
this time
so he's going to go through the same
thing here
and why don't I send it parent
um
apparent is a row oh so I'm going to
back it back out I don't know why I need
the table there though
that's a good question
because if I load the parent table data
I can probably do it with just the
parent
and the imported tables
because now I'm basically going to do
actually I can do it like this
um
actually let's simplify it even a little
bit more so okay I'm going to say if
uh parent
not in
imported rows
then I'm going to load the parent
else
load data for this table
and that is going to be
I'm going to build out this little thing
and this query
so this is going to be the uh
smart load SQL equals
this big thing
but first what we need is
this guy
and actually we could probably
oh we don't need that guy
oh we do because we need it with the so
that's what the constraint name
there we go okay
so here
first we need to uh
well
first we need to get the columns
and tables
and that's going to be this is the uh
we'll call it the call SQL let's just
call it Source SQL
and this is going to be
where
where that
equals
and let's just put it like here and
this is going to be
and we need the constraint name which is
here
out of FK rows that's going to be Row
one
and then so for each of these so for
that we're going to come in and we're
going to say let's go execute it
and we want to execute it on the source
which is fine
so let's do here
Source SQL
and uh this will be we're going to call
this Source row
equals fetchall
and then all we need here uh this is
going to be
did we bring the did we grab the no we
didn't so let's go ahead and grab
with this
the constraint schema
uh constraint schema
because then
there we go
because what we need here is we're going
to be
uh where am I building that
here so
uh
let's do this schema let's do just temp
temp schema
uh
yeah we'll leave it the way it is so
we're gonna come in here and we're going
to say select from
and then this is going to be
um and let's go ahead and do these
so the first thing we get is uh row
zero oops which is because it's the
constraint schema
it's going to get really complicated so
we're going to actually end up pausing
here in a second but uh so here so you
got the constraint schema Dot
um actually I think I can do the whole
thing like this let's see what happens
dot the reference table name
is Row one
and I want the reference to column name
again will be uh
I'll do this
to protect that because we can and
that's going to be a row reference
column name is zero one two
and then it sort of says so let me do
this
speed this up a little bit so performer
ID is the column name
which is three
oh
column three
and this is
column four
and we've got to come back over here to
where we did this thing
and we do it like this let's see if this
will work uh how far do we get uh from
that that yeah there we go
so that should give us our smart load
and then what we're going to do is we're
going to execute that
uh smart
load SQL we're going to have to go ahead
and
execute that and then use that to insert
into the other we're going to build that
out and insert into the target row so
that's going to be non-trivial in itself
and loading the parent is basically
going to be the same thing
um
so we may find a way to to clean this up
a little bit but it's basically going to
be go load the parent and then possibly
load the parent possibly load the parent
so let me find a way to clean this up
but that is a little Cliffhanger for the
next time around because we've gone a
little long so I want to keep it short
I know this is a little bit of trudging
through stuff but this is really some of
the but this is really probably the most
complicated piece I think as you're
seeing of the work we're going to do in
the synchronization and once it's done
it's going to be pretty stinking cool
that being said I hope your day is
Pretty stinking cool go out there and
have yourself a great day a great week
and we will talk to you next time
foreign