Detailed Notes
Focus for this episode: We wrap up our data synchronization issues and the final changes needed to make this whole application work. Now there is an application we can use to synchronize and quickly spin up development databases.
This tutorial series walks through the creation of a database synchronization tool in Python. It continues our Python and SQL Tutorials with a useful application that provides a deeper look at SQL structures and commands.
Repository For Code: git clone [email protected]:rbroadhead/dbsync.git
Transcript Text
foreign [Music] well hello and welcome back we are finally getting to the end of our database sync and basically I was walking through a lot of troubleshooting the last couple episodes really came down to one or two changes that I want to talk through to sort of get us over the hump uh as it were so the first thing that we were running into if I can find the right reference show uh here we go referential load so if you remember or you can look back to the prior a couple of episodes one of the things we're doing is we're going through and we we first pull all the data for the tables that don't have any references so we just know and those are based on our number of rows either if it's set to zero it's just going to pull everything otherwise it'll be you know if we set it to 100 it's going to it's only going to pull 100 rows out of that table now when we go to tables that have references to other tables it's not that clean and so what we're doing is we're saying let's get all the data for the tables we know the ones where we have like you know X number of rows let's get that stuff pulled over and then we're going to go through and we're going to where we don't have a table loaded if something references it then we're going to build the data based on the primary Keys within that table now the trick about that is we can have a table reference actually three or four other tables so we were building this thing that was called imported tables and what we weren't doing was checking it quite enough so we're running in a couple of situations where two tables would reference the same table and when the first one came through it was awesome when the second one came through it wasn't first checking whether that table had been imported and it was causing some issues because technically we would have gotten and we did we got all of the correct rows the first time through because what we do is we would build if you remember we would build that query based on all of the tables that were referenced and then that's how we pull the data in so it was easy for us to you know catch some have some extra slip through so the simplest thing to do that is I just say hey since we keep we use referential load to load it first thing I did I said hey if the table that we're going to load is already in imported tables then we just kick imported tables back out we're done we don't do anything because we're technically we're done so we that allows us to not double enter a table because we were getting it loaded properly we're getting it populated for example somewhere down here we're going to see where yeah we're appending and we're good so we are saying hey this thing got you know got built one way or another or here we can just do a straight load of the table data and we're still we're making sure that we get that table in there if we get it into imported tables twice we really don't care if it's in that array twice what we don't want to do is if it's there once we don't want to checking you know we don't want to try to load again so putting this check up here should and did in all the cases I was running through did fix that now another thing that I was running into which is really a change in how you uh it's to address something that should not happen one of the things I ran into that we talked about well that got me wondering about how we were doing enumerated types back several episodes one of the things we're running into is that enumerated types weren't coming across properly we're getting things where it wasn't it was telling us there's basically a a size constraint for a value that we're putting into an enumerated row so an enumerated values could be things so like we could do like enum and let's say it's uh and this is it's not going to look exactly like that but let's say we had an enome that was like grape apple Orange and so that would mean that in that column we should see grape apple or orange but I've run into some situations where there would be for example an empty string or we had some sort of an a new an enum where like let's say we had uh some large let's say some long name and then later we came back and said no we don't want that what we really want is just these three well the problem is it not necessarily it's not necessarily going to go back and fix that so you may still have even though you changed the enum when you change the enumerator type it is not necessarily going to like it's not going to like cause a trigger or something that's going to break the data and say Hey you have to remove the fields that have that value so there are ways to get invalid enum values into data and that was causing some issues and so the easiest way to fix that is to cheat and so that's where we go back and use our where is it uh I'll find one here we use our try and accept so basically what we're going to do there is say hey I'm going to try to execute this but I'm not going to bother if it doesn't I also ran into this in a couple of there's some odd combinations of a table SQL that was getting pulled across I couldn't figure out how they built it in the first place because it was technically illegal like if I just take this the pure SQL it does not like it so in those cases what I do is I just say hey if I can't generate it if I get an exception trying to execute it instead of bailing out on the script entirely what I'm going to do is I'm just going to print whatever it was I was trying to run and then I'm just going to continue on so what I would end up getting somewhere back here maybe is stuff like uh let's see if I can find one down here but I think I got them all clean so it's probably not going to show up anymore but I would get yeah unfortunately I didn't steal that oh here like here's one so I think it didn't like this for some reason and it just kicked this out I'm not sure why I did but uh so for example it didn't run this one so whatever the enume was uh oh like here they had this and I believe this is source and this is destination so destination had this thing that didn't value that was some weird value that we had in there and it said hey I'm not going to allow that so when we tried to alter it for some reason it didn't do it maybe the trigger actually kicked in in this case or it could have been the I don't know the null wouldn't have been a problem but for some reason it didn't like it so if I wanted I could I can see that on my output and I can go in and try to manually fix that if I need to um you know it depends sometimes that's going to be an issue sometimes it's not but it does allow us to for the most part then push everything across and then we can always go back and do the fixes and that allows us now to do it based on uh max number of rows as opposed to just having to do everything and sort of Slam at home by you know shutting off triggers and constraints and in so doing we now have a script that will run through and we can do whatever we need to with it as far as tables or triggers or store procedures or functions and let's see so we don't really need these notes anymore and so we have is we've got our little feature flag thing so if we come in we can pick whichever values we want to do which is tables columns indexes rows row data synchronized procedures we've got a couple we missed I think we missed so we need that Dash f for our functions so we'll do that uh let's see so t c d I p r ows F functions and TR would be whatever TR was what was TR is oh triggers let's do this now we have our are sort of our documentation of sorts and it's going to go in and do these and it does them in uh a bit of an order you know we do have a logical order to it and it's going to come through for each of the or actually it goes down here we're going to come through here and depending on what we have we're going to do these for each of our list of databases within our settings.py file we went ahead and pulled that in so we're going to we pulled our source our Target our databases so we've got a database list here and it's just going to be a list of names and so you do need that database to exist so you can't just you know say I've got a Dev database you can't just push that when it's not there you're going to have to actually do you know create database depth so it does assume that much it assumes at least that you've gone in and created your database but once you've got it it's going to fly through and you can you know play around with it I would definitely not create a database you know somewhere that you are worried about it right now and it's put it on a local maybe initially and use it to see if the database the data is coming across properly connect to it maybe if you've got unit tests or something to run against it that will help and once you've done that then you've got you know this thing you can plug it into however you want to all of these functions are pretty straight moderately straightforward like I said we've got the uh the tries that we've added now so it's a little more likely to go through it's not going to just bail out like it did before I have adjusted a little bit some of the sum of the output that is here so you're not going to see quite as much um basically just because it was a lot to see you see we've got a lot of like debug related stuff that is still in there but after all of that uh debugging and walking through and coding and things like that it turned out that it was actually a little bit of a an invalid database I was pulling across and that was what was really you know biting me is that I had something in a state that it should not have gotten to and it's in case basically the referential integrity was not exactly where it was supposed to be but if you do this you should be able to get it as long as you make sure you know that you should be able to keep your referential integrity but that's again make sure that you're doing things like grabbing your uh your eye your indexes your triggers and pull those things across and you should be in good shape if you want a like a development version a very quick and clean database is you can probably get away with just if you just do tables columns and indexes depends on whether or not you know if you've got functions or procedures so you could come in and do like if you wanted to do tables so do like a dash T Dash C uh columns indexes Dash I then you could build out a sort of like a development a clean or empty development database pretty quick if you needed to do your procedures and your functions add your dash P enter Dash f and then you're you've got a pretty in pretty good shape at that point so what we've got is something that allows you to get in there quick make a database now we actually allow you to push data into it so it's not just a shell and you've got a pretty good example of a lot of python to work with here as well not to mention we got pretty deep into the world of um let me go back over here uh we did so we got pretty deep into especially the information schema in MySQL so there's a lot of stuff here if you're ever curious about how to find tables or indexes or columns or values or all that kind of stuff that's the the metadata this will give you a really good start you can jump into any one of the the functions and your pro you know like say function itself missing functions I'm going to be able to see right here that this is roughly how I can get a list of functions obviously you know you may there's some things in here like we've got our schemas already defined you may get more than you wanted but you do a select star and you can start working your way through it and get down to the specific query that you need for your information this has been updated in Version Control so if you check the links you'll be able to get the latest you can play around with it you will have to set up your settings file you can see where we did that what it looked like initially and how we just moved stuff over you'll also be able to see it if you look up here uh when it does a connect somewhere in here uh oh maybe not let's see oh so I go right in because I give it the config and if I go to here where is I think I've already just yeah so I'm just pulling it straight in because I've already set up most of the config so you can see that in examples you can also see that you need to see it on the code or take a look back at a couple tutorials early on we set that up so you can see what needs to be done to configure your source and your target database make sure you do not confuse the two because that can cause you all sorts of not fun things to happen that being said we can finally wrap this one up so we will continue throwing out uh various tutorials and stuff like that I think this gives us a good hopefully you've had a really good uh sir crash course into SQL particularly my Sequel and python using those building some scripts and finding kind of you know useful little application maybe that you can build and now you have something that you can you can build on top of this and see where your see where your skills take you that being said we'll let you get back to it so go out there and have yourself a great day a great week and we will talk to you next time [Music]
Transcript Segments
foreign
[Music]
well hello and welcome back we are
finally getting to the end of our
database sync and basically I was
walking through a lot of troubleshooting
the last couple episodes really came
down to
one or two changes that I want to talk
through to sort of get us over the hump
uh as it were so the first thing that we
were running into if I can
find the right reference show
uh here we go referential load
so if you remember or you can look back
to the prior a couple of episodes one of
the things we're doing is we're going
through and we
we first pull all the data for the
tables that don't have any references so
we just know and those are based on our
number of rows either if it's set to
zero it's just going to pull everything
otherwise it'll be you know if we set it
to 100 it's going to it's only going to
pull 100 rows out of that table
now when we go to tables that have
references to other tables it's not that
clean
and so what we're doing is we're saying
let's get all the data for the tables we
know the ones where we have like you
know X number of rows let's get that
stuff pulled over and then we're going
to go through and we're going to where
we don't have a table loaded
if something references it then we're
going to build the data based on the
primary Keys within that table
now the trick about that is we can have
a table reference actually three or four
other tables
so we were building this thing
that was called imported tables
and what we weren't doing was checking
it quite enough so we're running in a
couple of situations where
two tables would reference the same
table
and when the first one came through it
was awesome when the second one came
through
it wasn't first checking whether that
table had been imported and it was
causing some issues because technically
we would have gotten and we did we got
all of the correct rows the first time
through
because what we do is we would build if
you remember we would build that query
based on all of the tables that were
referenced
and then that's how we pull the data in
so it was easy for us to you know catch
some have some extra slip through so the
simplest thing to do that is I just say
hey
since we keep we use referential load to
load it first thing I did I said hey if
the table that we're going to load is
already in imported tables then we just
kick imported tables back out we're done
we don't do anything because we're
technically we're done
so we that allows us to not double enter
a table because we were getting it
loaded properly we're getting it
populated
for example somewhere down here we're
going to see where yeah
we're appending
and we're good so we are saying hey this
thing got you know got built one way or
another or here we can just do a
straight load of the table data and
we're still we're making sure that we
get that table in there if we get it
into imported tables twice we really
don't care if it's in that array twice
what we don't want to do is if it's
there once we don't want to checking you
know we don't want to try to load again
so putting this
check up here
should and did in all the cases I was
running through did fix that now another
thing that I was running into which is
really a change in how you uh it's to
address something that should not happen
one of the things I ran into that we
talked about well that got me wondering
about how we were doing enumerated types
back several episodes
one of the things we're running into is
that enumerated types weren't coming
across properly we're getting things
where it wasn't it was telling us
there's basically a a size constraint
for a value that we're putting into an
enumerated row so an enumerated values
could be things so like we could do like
enum
and let's say it's uh
and this is it's not going to look
exactly like that but let's say we had
an enome that was like grape apple
Orange
and so that would mean that in that
column we should see grape apple or
orange
but I've run into some situations where
there would be for example an empty
string
or
we had some sort of an a new an enum
where
like let's say we had uh
some large let's say some long name
and then later we came back and said no
we don't want that what we really want
is just these three well the problem is
it not necessarily it's not necessarily
going to go back and fix that so you may
still have even though you changed the
enum
when you change the enumerator type it
is not necessarily going to like it's
not going to like cause a trigger or
something that's going to break the data
and say Hey you have to remove the
fields that have that value so there are
ways to get invalid
enum values into data and that was
causing some issues and so the easiest
way to fix that is to cheat and so
that's where we go back and use our
where is it uh I'll find one here
we use our try and accept
so basically what we're going to do
there is say hey I'm going to try to
execute this
but I'm not going to bother if it
doesn't I also ran into this in a couple
of there's some odd
combinations of a table SQL that was
getting pulled across I couldn't figure
out how they built it in the first place
because it was technically
illegal like if I just take this the
pure SQL it does not like it
so in those cases what I do is I just
say hey if I can't generate it if I get
an exception trying to execute it
instead of bailing out on the script
entirely what I'm going to do is I'm
just going to print whatever it was I
was trying to run and then I'm just
going to continue on
so what I would end up getting somewhere
back here maybe is stuff like uh let's
see
if I can find one down here but I think
I got them all clean so it's probably
not going to show up anymore
but I would get
yeah
unfortunately I didn't steal that
oh here like here's one
so I think it didn't like this for some
reason
and
it just kicked this out
I'm not sure why I did but uh so for
example it didn't run this one so
whatever the enume was
uh oh like here they had this
and I believe this is source and this is
destination so destination had this
thing that didn't value that was some
weird value that we had in there and it
said hey I'm not going to allow that so
when we tried to alter it for some
reason it didn't do it maybe the trigger
actually kicked in in this case or it
could have been the I don't know the
null wouldn't have been a problem but
for some reason it didn't like it so if
I wanted I could I can see that on my
output and I can go in and try to
manually fix that if I need to
um you know it depends sometimes that's
going to be an issue sometimes it's not
but it does allow us to for the most
part then push everything across and
then we can always go back and do the
fixes
and that allows us now to do it based on
uh max number of rows as opposed to just
having to do everything and sort of Slam
at home by you know shutting off
triggers and constraints
and in so doing we now have a script
that will run through
and we can do whatever we need to with
it as far as tables or triggers or store
procedures or functions and
let's see so we don't really need these
notes anymore
and so we have is we've got our little
feature flag thing so if we come in
we can pick whichever values we want to
do which is
tables columns indexes rows row data
synchronized procedures we've got a
couple we missed I think we missed so we
need that Dash f for our functions
so we'll do that
uh let's see so t c
d
I
p
r
ows F functions and TR would be whatever
TR was what was TR is oh triggers
let's do this
now we have our
are sort of our documentation of sorts
and it's going to go in and do these and
it does them in uh a bit of an order
you know we do have a logical order to
it and it's going to come through for
each of the or actually it goes down
here we're going to come through here
and depending on what we have
we're going to do these for each of our
list of databases within our
settings.py file we went ahead and
pulled that in so we're going to we
pulled our source our Target our
databases so we've got a database list
here
and it's just going to be a list of
names and so you do need that database
to exist so you can't just
you know say I've got a Dev database you
can't just push that when it's not there
you're going to have to actually do you
know create database depth so it does
assume that much it assumes at least
that you've gone in and created your
database
but once you've got it
it's going to fly through and you can
you know play around with it I would
definitely not create a database you
know somewhere that you are worried
about it right now and it's put it on a
local maybe initially and use it to see
if the database the data is coming
across properly connect to it maybe if
you've got unit tests or something to
run against it that will help
and once you've done that then you've
got you know this thing you can plug it
into however you want to
all of these functions are pretty
straight moderately straightforward like
I said we've got the uh the tries that
we've added now so it's a little more
likely to go through it's not going to
just bail out like it did before I have
adjusted a little bit some of the
sum of the output that is here so you're
not going to see quite as much
um basically just because it was a lot
to see you see we've got a lot of like
debug related stuff that is still in
there
but after all of that uh debugging and
walking through and coding and things
like that it turned out that it was
actually a little bit of a an invalid
database I was pulling across
and that was what was really you know
biting me is that I had something in a
state that it should not have gotten to
and it's in case basically the
referential integrity was not exactly
where it was supposed to be but if you
do this you should be able to get it as
long as you make sure you know that you
should be able to keep your referential
integrity but that's again make sure
that you're doing things like grabbing
your uh your eye your indexes your
triggers
and pull those things across and you
should be in good shape if you want a
like a development version a very quick
and clean database is you can probably
get away with just if you just do tables
columns
and indexes depends on whether or not
you know if you've got functions or
procedures so you could come in and do
like if you wanted to do tables so do
like a dash T Dash C
uh columns indexes Dash I then you could
build out a
sort of like a development a clean or
empty development database pretty quick
if you needed to do your procedures and
your functions add your dash P enter
Dash f
and then you're you've got a pretty in
pretty good shape at that point
so what we've got is something that
allows you to get in there quick make a
database now we actually allow you to
push data into it so it's not just a
shell
and you've got a pretty good example of
a lot of python to work with here as
well not to mention we got pretty deep
into
the world of
um let me go back over here uh we did so
we got pretty deep into especially the
information schema in MySQL so there's a
lot of stuff here if you're ever curious
about how to find
tables or indexes or columns or values
or all that kind of stuff that's the the
metadata this will give you a really
good start you can jump into any one of
the the functions and your pro you know
like say function itself missing
functions I'm going to be able to see
right here that this is roughly how I
can get a list of functions
obviously you know you may there's some
things in here like we've got our
schemas already defined you may get more
than you wanted but you do a select star
and you can start working your way
through it and get down to the specific
query that you need for your information
this has been updated in Version Control
so if you check the links you'll be able
to get the latest you can play around
with it you will have to set up your
settings file you can see where we did
that what it looked like initially and
how we just moved stuff over you'll also
be able to see it if you look up here
uh when it does a connect somewhere in
here
uh oh maybe not let's see
oh so I go right in because I give it
the config and
if I go to here
where is I think I've already just yeah
so I'm just pulling it straight in
because I've already set up most of the
config so you can see that in examples
you can also see that
you need to see it on the code or take a
look back at a couple tutorials early on
we set that up so you can see what needs
to be done to configure your source and
your target database make sure you do
not confuse the two because that can
cause you all sorts of not fun things to
happen
that being said
we can finally wrap this one up so we
will continue throwing out uh various
tutorials and stuff like that I think
this gives us a good hopefully you've
had a really good uh sir crash course
into SQL particularly my Sequel and
python using those building some scripts
and finding kind of you know useful
little application maybe that you can
build and now you have something that
you can you can build on top of this and
see where your see where your skills
take you
that being said we'll let you get back
to it so go out there and have yourself
a great day a great week and we will
talk to you next time
[Music]