Detailed Notes
Focus for this episode: This starts a multi-part look into creating a hierarchy tree of related tables and how to pull related data into as part of a limited data sync.
This tutorial walks through the creation of a database synchronization tool in Python. It continues our Python and SQL Tutorials with a useful application that provides a deeper look at SQL structures and commands.
Repository For Code: git clone [email protected]:rbroadhead/dbsync.git
Transcript Text
thank you [Music] well hello and welcome back we're continuing our series of my Sequel and python tutorials in this episode we are going to start digging into one of our problem child kinds of things that we're running into which is basically what we want to do is we're going to be able to we need to walk the foreign key hierarchy because what we're going to need to do is find places where there is related data that we're going to have to deal with and this particularly deals with part two here how to import related data for a table because Max Rose is not always going to work if you set it to bring everything you may get it to work but even then what you can run into is if you have constraints that exist so unless you like blow away all your constraints and then rebuild them afterwards which is a possibility but a little bit of a pain and can be very time consuming instead what we're going to do is as we are pulling data and this is going to be both as we delete data but then also as we pull data we're going to need to find stuff that is related for example if you have a lookup and uh you know it's let's just say it's got three values yes no one maybe and each of those values are you know relates to one for yes no for two two for no and three for maybe and you've got another table that utilizes that data you need to be able to bring in the related data so let's say that you only bring in one row and there's another table that uses that lookup and it brings in its first row if it uses no as a value two is the value when you bring in the first row of your lookup table and it only brings in one then you're going to end up in an issue because you're going to have this data that does not have its proper lookup or its related value and so we're going to need to do is figure out how to determine you know in that case is to go through that uh you know call it like a parent or maybe a core type table figure out all the values that it uses and then make sure that those values get pulled in from the lookup table now there are some ways we could get around this kind of stuff we can do things like identify tables that are that are lookups for example you may have something where you've got a customer or contact and you turn around and you say oh this contact hits the address table so I have to make sure that for all my contacts I pull in the right address information well one thing you could do is you could just pull in all of the addresses and then only utilize the ones that you know that exist in the contact you could actually even if you wanted sort of brute force it figure out you know bring everything in from the address fill it all out in the contact and then delete everything in address that's not being used in contact but then it gets sort of it it really isn't that much simpler because at the end of the day what we really have to do is be able to sort of walk that foreign key hierarchy we need to figure out what our core tables are so sort of look at that this way is we're going to be like so a determine base or core tables and then we're going to have to do grab related data whoops for the we'll call them look up tables so in order to do this let's let's talk in more work so what we want to do is we're going to go back and look at if you remember we used the schema information schema referential constraints to figure out all of our indexes and foreign key relationships so if we come back here and we're going to use this query and take a look at that in this case I'm going to go with this project over here because if I do the tutorial there's only like there's only like two that exist let's see if I use tutorial yeah we only have two uh if we use this other one then now I've got a bunch and this gives us just a little more data to work with now what we want to look in order by here so what we've got is our constraints we've got these names which are really in this case not as important because what we want to do is we're going to go through each of these and actually I wonder if this would show how did I set this up in tutorial oh okay I did um so we'll keep it we'll really simplify it now so what you're going to do is you're going to see here in your constraints that table app user references address and then address references lookup type so there is this in this case it's very simply there's this like app user needs address needs lookup type and so what we need to do is figure out how do we build that out essentially now what we need to do is we need to find the the leaf data first if we if we want to build out our hierarchy um I guess well and I guess we can go either way you can either start from find this guy which would be a leaf your lookup type or start here and find your your base tables because what's going to end up happening is you're basically going to say for each table you're going to say hey if this table which would be here if I have a table name that references something so if I have an entry where my that table exists then that means it references something and so I can't pull that data in without first pulling in the leaf data and so what we could do and this is sort of to get us an approach is what we're going to do is we're going to do we're going to go down to the leaf tables we're going to pull in all the records up to the row number that we give it for those Leafs and then as we go up to the next level we're going to pull it in and this is where I'm thinking through a little bit we can pull in data but only where the data exists on the leaf and that may be what we want to do I got to really think about that now actually you know what let's flip that so here's what we're going to do uh I'll go back here just so we have these little bit of notes so when we do the max rows thing what we're going to do is we're going to apply Max rows only to core base tables so if nothing refers to anything else everything's a core table we only grab the you know if it's 100 records we only have to grab 100 records per row or per table and we're done however so we're gonna we're gonna get those but instead we're going to need to do is we're going to need to say okay if we're going to pull 100 first we're going to need to take everywhere that that refers to a child and make sure that the child records are there for that table which makes it a little interesting because that means what we have to do is we're going to go here and let's say we do 100 rows so we're gonna get 100 rows for uh app user but within those we're gonna that is sort of like our start and then we're going to use say so grab the address rows needed for the app user 100 but for that we need to do grab the look up type rows needed for address and so we need to do is the first thing we're going to do is we're going to go through select let's look at this uh in this specific case so what we would do is we're going to come through here and we're going to say so for the first thing what we need to do is just to figure out this hierarchy is we need to say we're going to find this and this would be app user where that and table name equals app user oh lowercase so we get in this case is we get our one row and what we're going to do is we're going to have this starting thing so this is going to be the uh let's do it this way we'll go in here and we're going to have where we do sync rows let me do this a little differently because what we're going to do is we're going to say okay this is going to be we're going to call this we're going to call this smart sink Rose and this is going to get really nasty really fast because first thing we're going to do okay so we have our Target tables so what we're going to need is here we're going to be let's say um imported tables because what we're going to need to do let's see if we got that is that right uh yeah we can do it that way because what we're going to have to do is we're going to go through each of these tables let's see oh we do a drop first so we're going to do since we're doing it this way so this is empty table smart we're going to call it smart empty sink the rows because what we're going to have to do is first with each of these uh which we can probably get away with this here because we can truncate oh because we're dropping our keys so we'll just we don't have to worry about our Empty Tables so we're going to go through and we're going to empty everything that's pretty easy to do that's a simple thing but now we're going to go and import data and here we're not going to drop the foreign Keys anymore instead we're going to do this check to see if this is referenced in a foreign key so uh let's see so this thing says it is you know if I come in through I'm going to say for each row that I get here it's going to have to do something now here I was dropping foreign keys I don't want to do that what I want to do instead is I'm going to say this so if a table so if uh length of FK rows we'll just say it's less than one then we can go right ahead and load data else whoops and let's just say uh we'll just do print FK rows just so we can do it um well let's sit this way print load data for and this will be uh the table name so table and then if not then we need to uh we'll do pause and load for child tables data for children and then in this case what I need to do is let me go back and grab that uh when I do the SMART stink here we go so we're doing each of these for each row uh smart empty sink then we're gonna do it for the children but here what we're going to do whoops wrong key uh let's see so for each of those I'm gonna so I can do the but what I really want to do is um I want the which table do I want I want the referenced table name so this is going to be referenced table name because now what I need to do is this is just going to be FK row 0. and we're going to do it like that for now so what we're going to do is we're going to come through here and we're going to basically say hey either we load data for our table and let's go back to mate so let's do where was that smart sync okay come right in here smart oh smart empty sink Rose and quit because that'll give us a little bit to work with and if we look whoop before we're just doing print feature Flags I don't think I need that so if we come in here it doesn't matter what I give it and it's going to come in here and it's going to say none type so when I come into smart empty sync 423 it's going to be because it's empty where'd we go here oh uh wait in Maine oh dbsync 423 oh because I don't have the target table it's my mistake that's right I need to do uh whatever I do I need to do I need to do compare tables no matter what so that we need to do something I missed earlier Okay so so I'm going to go into each of these and now I'm seeing this uh oh because I'm doing this for a couple of databases so I need to go back to my settings and do oh because I did kickinator H goals so I had my three okay so what I'm doing here is we're going to see where data clean it's going to go through each of those it's going to do our constraints um yes I already cleaned all of that stuff out so I'm not gonna have to re-synchronize my data but that's okay and so now we can see here where we're getting some of these we're going to get a load data but then we're going to come back here and we'll get some of these we're going to have to do our data sync first and what we're going to end up doing is turn this into a little bit of a self-referential function so that we'll be able to walk down low the basically walk the tree and get to the bottom load it and then work our way back up and I think that's going to stop this time I just want to get started we're definitely getting into some uh it's going to get pretty complicated fairly quickly but I want to step through it still in some bite-sized chunks and start with the idea that hey we're going to come in here and we're going to look at so if we look for this one we are going to look at what's referenced we're going to find first tables that do not have any references those are free we're ready to go with those but if they aren't then we're going to have to walk that walk that path walk down that tree and see where that goes and that's we're going to pick up next time so go out there and have yourself a great day a great week and we will talk to you next time foreign
Transcript Segments
thank you
[Music]
well hello and welcome back we're
continuing our series of my Sequel and
python tutorials in this episode we are
going to start digging into one of our
problem child kinds of things that we're
running into which is basically what we
want to do is we're going to be able to
we need to walk the foreign key
hierarchy because what we're going to
need to do is
find places where there is related data
that we're going to have to deal with
and this particularly deals with part
two here how to import related data for
a table because Max Rose is not always
going to work
if you set it to bring everything you
may get it to work but even then what
you can run into is
if you have constraints that exist
so unless you like blow away all your
constraints and then rebuild them
afterwards which is a possibility
but a little bit of a pain and can be
very time consuming instead what we're
going to do is as we are pulling data
and this is going to be both as we
delete data but then also as we pull
data we're going to need to find stuff
that is related for example if you have
a lookup
and uh you know it's let's just say it's
got three values yes no one maybe and
each of those values are you know
relates to one for yes no for two two
for no and three for maybe
and you've got another table that
utilizes that data
you need to be able to bring in the
related data so let's say that you only
bring in one row
and there's another table that uses that
lookup and it brings in its first row if
it uses no as a value two is the value
when you bring in the first row of your
lookup table and it only brings in one
then you're going to end up in an issue
because you're going to have this
data that does not have its proper
lookup or its related value
and so we're going to need to do is
figure out how to determine
you know in that case
is to go through that uh you know call
it like a parent or maybe a core type
table
figure out all the values that it uses
and then make sure that those values get
pulled in from the lookup table
now there are some ways we could get
around this kind of stuff we can do
things like identify tables that are
that are lookups for example
you may have something where you've got
a customer or contact and you turn
around and you say oh this contact hits
the address table so I have to make sure
that for all my contacts I pull in the
right address information well one thing
you could do is you could just pull in
all of the addresses and then only
utilize the ones that you know that
exist in the contact you could actually
even if you wanted sort of brute force
it
figure out you know bring everything in
from the address
fill it all out in the contact and then
delete everything in address that's not
being used in contact but then it gets
sort of it it really isn't that much
simpler because at the end of the day
what we really have to do is be able to
sort of walk that foreign key hierarchy
we need to figure out what our core
tables are
so sort of look at that this way is
we're going to be like so a
determine base or core tables
and then we're going to have to do grab
related data whoops
for the we'll call them look up tables
so in order to do this let's let's talk
in more work
so what we want to do is we're going to
go back and look at if you remember we
used the schema information schema
referential constraints to figure out
all of our
indexes and foreign key relationships so
if we come back here
and we're going to use this query and
take a look at that in this case I'm
going to go with this project over here
because if I do the tutorial there's
only like there's only like two that
exist let's see if I use tutorial
yeah we only have two uh if we use this
other one then now I've got a bunch and
this gives us just a little more data to
work with
now what we want to look in order by
here
so what we've got is our constraints
we've got these names which are really
in this case not as important
because what we want to do
is we're going to go through each of
these
and actually I wonder if this would show
how did I set this up in tutorial oh
okay I did
um so we'll keep it we'll really
simplify it now so what you're going to
do is you're going to see here in your
constraints that table
app user
references address
and then address references lookup type
so there is this in this case it's very
simply there's this like app user needs
address
needs
lookup type
and so what we need to do is figure out
how do we
build that out essentially
now what we need to do is we need to
find the
the leaf data first
if we if we want to build out our
hierarchy
um I guess well and I guess we can go
either way you can either start from
find this guy which would be a leaf your
lookup type
or start here and find your your base
tables because what's going to end up
happening is you're basically going to
say for each table you're going to say
hey if
this table
which would be here
if I have a table name that references
something so if I have an entry where my
that table
exists then that means it references
something
and so I can't pull that data in without
first pulling in the leaf data and so
what we could do and this is sort of to
get us an approach is what we're going
to do is we're going to do we're going
to go down to the leaf tables we're
going to pull in all the records
up to the row number that we give it for
those Leafs
and then as we go up to the next level
we're going to pull it in
and this is where I'm thinking through a
little bit we can pull in data but only
where the data exists on the leaf
and that may be what we want to do I got
to really think about that now actually
you know what let's flip that so here's
what we're going to do
uh I'll go back here just so we have
these little bit of notes so when we do
the max rows thing
what we're going to do is we're going to
apply Max
rows only to
core base tables
so if nothing refers to anything else
everything's a core table we only grab
the you know if it's 100 records we only
have to grab 100 records per row or per
table and we're done
however so we're gonna we're gonna get
those but instead we're going to need to
do
is we're going to need to say okay if
we're going to pull 100 first
we're going to need to take everywhere
that that refers to a child and make
sure that the child records are there
for that table which makes it a little
interesting because that means what we
have to do
is we're going to go here and let's say
we do 100 rows so we're gonna get 100
rows
for uh app user
but within those we're gonna that is
sort of like our start
and then we're going to use say so
grab the
address rows
needed
for the
app user 100
but for that we need to do grab
the look up type
rows
needed for address
and so we need to do is the first thing
we're going to do is we're going to go
through select let's look at this
uh in this specific case so what we
would do is we're going to come through
here
and we're going to say
so for the first thing what we need to
do is just to figure out this hierarchy
is we need to say we're going to find
this and this would be app user
where that
and
table name equals
app user
oh
lowercase
so we get in this case is we get our one
row and what we're going to do is we're
going to have this starting thing so
this is going to be the uh let's do it
this way we'll go in here and we're
going to have
where we do sync rows
let me do this a little differently
because what we're going to do is we're
going to say
okay this is going to be we're going to
call this we're going to call this smart
sink Rose
and this is going to get really nasty
really fast
because first thing we're going to do
okay so we have our Target
tables
so what we're going to need is here
we're going to be let's say
um imported tables because what we're
going to need to do
let's see if we got that is that right
uh
yeah we can do it that way
because what we're going to have to do
is we're going to go through each of
these
tables let's see oh we do a drop first
so we're going to do
since we're doing it this way so this is
empty table smart
we're going to call it smart empty sink
the rows
because what we're going to have to do
is first with each of these
uh which we can probably get away with
this
here
because we can truncate oh because we're
dropping our keys so we'll just we don't
have to worry about our Empty Tables so
we're going to go through and we're
going to empty everything
that's pretty easy to do that's a simple
thing but now
we're going to go and import data
and here we're not going to drop the
foreign Keys anymore
instead we're going to do this check to
see if this is referenced in a foreign
key so
uh let's see
so this thing says it is you know if I
come in through I'm going to say for
each row
that I get here
it's going to have to do something now
here I was dropping foreign keys I don't
want to do that what I want to do
instead is I'm going to say this so if a
table so if
uh length
of FK rows
we'll just say it's less than one
then
we can
go right ahead
and
load data
else
whoops
and let's just say uh
we'll just do print FK rows just so we
can do it
um well let's sit this way
load data
for
and this will be uh
the table name so table
and then if not then we need to
uh we'll do pause and
load for child tables
data for children
and then in this case
what I need to do is let me go back and
grab that
uh when I do the SMART stink here we go
so we're doing each of these
for each row
uh smart empty sink then we're gonna do
it for the children but here what we're
going to do
whoops
wrong key uh let's see so for each of
those I'm gonna so I can do the but what
I really want to do is
um
I want the which table do I want I want
the referenced table name
so this is going to be
referenced table name
because now what I need to do is this is
just going to be FK row 0.
and we're going to do it like that for
now
so what we're going to do is we're going
to come through here
and we're going to basically say hey
either
we load data for our table and let's go
back to mate so let's do
where was that smart sync
okay come right in here
smart oh
smart empty sink Rose
and quit
because that'll give us a little bit to
work with
and if we look whoop
before we're just doing print feature
Flags
I don't think I need that
so if we come in here
it doesn't matter what I give it
and it's going to come in here and it's
going to say none type so when I come
into smart empty sync 423
it's going to be because it's empty
where'd we go here
oh uh wait
in Maine
oh dbsync 423
oh because I don't have the target table
it's my mistake that's right I need to
do
uh whatever I do I need to do
I need to do compare tables no matter
what
so that we need to do
something I missed earlier Okay so
so I'm going to go into each of these
and now I'm seeing this uh
oh because I'm doing this for a couple
of databases
so I need to go back to my settings
and do oh because I did kickinator H
goals so I had my three okay
so what I'm doing here is we're going to
see where data clean
it's going to go through each of those
it's going to do our constraints
um
yes I already cleaned all of that stuff
out so I'm not gonna have to
re-synchronize my data but that's okay
and so now we can see here where we're
getting some of these we're going to get
a load data
but then we're going to come back here
and we'll get some of these we're going
to have to do our data sync first and
what we're going to end up doing is turn
this into a little bit of a
self-referential function so that we'll
be able to walk down low the
basically walk the tree and get to the
bottom load it and then work our way
back up and I think that's going to stop
this time I just want to get started
we're definitely getting into some uh
it's going to get pretty complicated
fairly quickly
but I want to step through it still in
some bite-sized chunks and start with
the idea that hey we're going to come in
here
and we're going to look at
so if we look for this one
we are going to look at what's
referenced we're going to find first
tables that do not have any references
those are free we're ready to go with
those but if they aren't then we're
going to have to walk that walk that
path walk down that tree and see where
that goes and that's we're going to pick
up next time so go out there and have
yourself a great day a great week and we
will talk to you next time
foreign