Detailed Notes
Focus for this episode: We wrap up our first pass at index creation and synchronization.
This tutorial walks through the creation of a database synchronization tool in Python. It continues our Python and SQL Tutorials with a useful application that provides a deeper look at SQL structures and commands.
Repository For Code: git clone [email protected]:rbroadhead/dbsync.git
Transcript Text
[Music] well hello and welcome back we're continuing our series looking at sql and python so combined into this database synchronization thing and last episode we wrapped up we're getting into multi-column indexes and looking at how to sync those so for example in this case we've got one it's called horse underscore b underscore o which is on the horses table and it ties breeder and owner together and so what we need to be able to do is what we're going to want to be able to create is effectively this line is we want to be able to do create index give it the name on the table and then we're going to give it the two what do i call it owner breeder i'm sorry owner yeah and we're going to give it the two columns now in order to do so what we're going to do and this is where we paused last time because we went a little bit long so we're going to do is we're coming through each of the source indexes we're going to grab the name which would be horse underscore b underscore o and then we've got this destinations dest names array and we're going to say if it's not in there if the name is if we haven't already found this column or this index then we're keeping track of index names so we're going to create an array called index names which already exists up there so let's do it this way i should be able to do it this way let's see let's try it like that and it doesn't like in a pin somewhere there um list indices must be indexed oh well let's do that and if i do that then i should still be okay nope oh because it's a list indices so if i do that oh which is fine because what we do is i'm going to create this and i'm going to add it okay i'm good uh sorry it's a little bit sidetracked there so i've got this dictionary which is going to be keyed off of the index name and then for each of those i'm going to have an array that is going to be each of the columns so i'm grabbing this is what that idx row 4 is or at index number 4 that is the column name so what i'm going to do is then so i can look here and i can see when i'm done print out idx names and so i can see here that i have one and it's got two different columns so basically for each one of those i'm going to do four name in idx names and in this case let's leave that one there for now what i want to do is i don't want it to be here what i want to do is for each of the names in index names i am going to do for column name in name so now this is that array i am going to do uh let's do let's see nate let's do column string let's do column list oh i can probably do uh i may be able to do this a little easier because i can probably say take uh so for each of the names i'm going to do i want to do idx names name and let's see what that does and let's convert that to a string i don't know if it's gonna even like that but let's see what happens because i think what i want to do is actually do that join because i bet i can do that so if i do this uh here we go okay so he's going to come in and he said it's not quite what i want so i want i think i can just do join can i do that let's see what that looks like uh see join takes exactly one argument let's try this oops nope let's go look up our join real quick because i think that's what we want to do so we're going to do a python join i just forget how to do it right which sometimes happens okay so join is going to be oh uh i do it something and then join it okay so i should be able to take um comma dot join this let's see what that looks like oh no it doesn't like that um interesting so that's not quite going to get us what we need so i'm going to have to do this a little more interesting okay and we have to go with my original approach which is basically going to be okay so then uh for each of the names then i'm going to do 4 call name in whoops idx names well let's do this call name let's just see what it does because i think it's not going to like that or it's not going to care because think name is yeah so it takes that whole so it can't be a name it can be in idx names name so the dictionary is not callable so it is a dictionary object so let's do so what i'm missing is something so let's use python iterate dictionary i need to write through dictionary here we go oh so here's what we're going to do for here's what we want to do we want to do for that dot keys and it is a function yep okay so we're gonna take for each of the keys then we're gonna do uh for column name in idx names and that's going to be a name so now we should see this work uh it is not because we're saying that this is not callable so this is on 102. so let's do a string oops wait it's not that way come on mistake darn it that's not it so what happens when you put the wrong stuff in there so now let's try it okay there we go let's see so it came in and it gave us two columns so that's gonna be column name so we're gonna do um let's do column let's do it this way we'll build it out like we normally have so column name equals nothing and then we're going to say if column name equals nothing then column name equals um i call it column names equals column name so if he's empty and we're good there we go else then column name equals column names plus comma plus column name oh and for each of these um we want to go ahead and do our little brackety thing so we're going to do this and do a little that just to protect our name and so then uh in that case he's gonna do that plus that and then when we're done this is gonna be this can be column names and let's see how that looks did i save it hopefully i did there we go so create index on there like that let's see if that works and so if we come through here and run it now we're going to do through each one we're going to create it and we're going to quit so let's see if we can do that we created the index so now if we go to our local and tutorial and we show indexes boom so if we look here this is our two row well let's just take the whole thing so if we look at the whole thing now if we look at that and then we go back and look at that from our source uh here we should see effectively the same thing uh oops that's great i don't want to create index i want show indexes and we're gonna see here let's do it this way so what we're going to see is the exceptions or the cardinality but we can see here oh but we can see here there's a difference between a primary versus an id so we're probably going to want to look into that but they're all b trees uh there we've got yeses here for the nulls see the collation is a for everybody column names are the same index sequence index is the same so we're good and then non-unique is the same so the one thing we may want to look at is the uh primary uh is that right yeah so key name being primary versus id so we would just want to be able to select our what our primary keys are that we wanted to set for indexes but now what we have is we do have the ability to go in and take any index that exists in our source we're now going to run through that and that will take care of it so we can actually run our whole database i'm not going to worry about syncing rows because well let's do that i don't think it matters that much so if i do like up to ten thousand rows and i sync it and then sync the columns which i probably want to send columns first to be fair and i probably want to do indexes last because there may be situations where there's something a little funky and we've got to have the data correct so let's see if that works if i run that through then it looks like we are getting some inserts here and hopefully it's not going to blow up and we will see that we went in oh i've got this extra little print stuff that i don't need um so let's see those men index is missing uh where did i just start start miss so if i go into missing indexes let's clean that up i've got an extra print somewhere in here i don't need prior column because i'm not using it here we go index names there we go so now if i use it it should run through blow out all my data and boom i'm ready to go so i'm going to kick this off i'm going to set it back to zero which will now get all of my data should all be synced including everything um all of my uh you know indexes and stuff like that we do want to look at um i think i've got a couple that i want to double check like i've got some unique and a couple other indexes that are worth double checking to make sure that they get created properly and we will run into those next time around because i think this is a good time to wrap this one up as our database is getting loaded so i'll get to as always you know you can check this out there's links in the show notes so you can hop on to github and take a look at some of this code um i don't think all the data's there but it's you can poke it at whatever pointed at whatever database you want to use and synchronize it to ideally like a local copy of it and see how that goes for you there are other issues that we still have to deal with we are not done with the complexities of doing such a thing particularly and i think we'll probably tackle this next time uh well it depends on how fast we can get through the indexes we'll probably do one more take a look at indexes unique and such and then um we're going to try to dig into some of this some of the issues that we can run into when we limit our max rows and what can cause problems there and some of our cleanup that being said we'll wrap this one up so go out there and have yourself a great day a great week and we will talk to you next time you
Transcript Segments
[Music]
well hello and welcome back we're
continuing our series looking at sql and
python so combined into this database
synchronization thing
and last episode we wrapped up
we're getting into
multi-column indexes and looking at how
to sync those so for example
in this case we've got one it's called
horse underscore b underscore o
which is on the horses table and it ties
breeder and owner together
and so what we need to be able to do is
what we're going to want to be able to
create
is effectively
this line is we want to be able to do
create index give it the name on the
table and then we're going to give it
the two
what do i call it owner breeder i'm
sorry owner yeah
and we're going to give it the two
columns
now
in order to do so
what we're going to do and this is where
we
paused last time because we went a
little bit long
so we're going to do is we're coming
through each of the source indexes
we're going to grab the name which would
be horse underscore b underscore o
and then we've got this destinations
dest names
array
and we're going to say if it's not in
there
if the name is if we haven't already
found this
column
or this index
then we're keeping track of index names
so we're going to create an array called
index names which already exists up
there
so let's do it this way i should be able
to do it this way let's see
let's try it like that
and it doesn't like in a pin somewhere
there
um
list indices must be indexed oh
well let's do that
and if i do that then i should still be
okay
nope oh because it's a list indices so
if i do that
oh
which is fine because what we do is i'm
going to create this
and i'm going to add it okay i'm good
uh sorry it's a little bit sidetracked
there so
i've got this dictionary which is going
to be keyed off of the index name and
then for each of those i'm going to have
an array that is going to be
each of the
columns so i'm grabbing this is what
that idx row 4 is
or at index number 4 that is the column
name so what i'm going to do is then so
i can look here and i can see
when i'm done print out idx names and so
i can see here
that i have one
and it's got two different columns so
basically for each one of those i'm
going to do four name in idx names
and in this case
let's leave that one there for now what
i want to do is i don't want it to be
here what i want to do is for each of
the names in index names i am going to
do for
column name
in
name
so now this is that array
i am going to do uh let's do let's see
nate let's do
column
string let's do column list
oh i can probably do
uh i may be able to do this a little
easier because i can probably say take
uh so for each of the names i'm going to
do
i want to do
idx names
name
and let's see what that does
and let's convert that to a string i
don't know if it's gonna even like that
but let's see what happens because i
think what i want to do is actually do
that join
because i bet i can do that so if i do
this
uh here we go okay
so he's going to come in and he said
it's not quite what i want
so i want
i think i can just do
join
can i do that
let's see what that looks like
uh see join takes exactly one argument
let's try this
oops nope
let's go look up our join real quick
because i think that's what we want to
do so we're going to do a python join i
just forget how to do it right which
sometimes happens
okay so join is going to be oh
uh i do it something and then join it
okay
so i should be able to take um
comma
dot join this
let's see what that looks like
oh no it doesn't like that
um
interesting
so that's not quite going to get us what
we need so i'm going to have to do this
a little more interesting okay and we
have to go with my original approach
which is basically going to be okay so
then
uh for each of the names then i'm going
to do 4
call name in
whoops
idx names
well let's do this
call name let's just see what it does
because i think
it's not going to like that or it's not
going to care because think name is
yeah so it takes that whole
so it can't be a name it can be in
idx names
name
so the dictionary is not callable so it
is a dictionary object
so let's do
so
what i'm missing is something so let's
use python
iterate dictionary
i need to write through dictionary here
we go
oh so here's what we're going to do for
here's what we want to do
we want to do
for that dot keys
and it is a function yep okay so we're
gonna take for each of the keys
then we're gonna do uh for column name
in
idx names
and that's going to be
a name
so now
we should see this work
uh it is not because we're saying that
this is not callable so this is on
102.
so let's do a string
oops wait it's not that way come on
mistake
darn it that's not it
so what happens when you put the wrong
stuff in there so now let's try it
okay there we go
let's see so it came in and it gave us
two columns so
that's gonna be column name so we're
gonna do
um
let's do column let's do it this way
we'll build it out like we normally have
so column name
equals nothing
and then we're going to say if
column name equals nothing
then
column name equals
um
i call it column names
equals
column name
so if he's empty and we're good
there we go
else
then column name equals column names
plus
comma
plus column name oh and for each of
these
um
we want to go ahead and do our little
brackety thing so we're going to do this
and do a little
that just to protect our name and so
then
uh in that case he's gonna do that
plus
that
and then when we're done this is gonna
be this can be column names
and let's see how that looks
did i save it hopefully i did
there we go
so create index on there
like that let's see if that works
and so if we come through here and run
it
now we're going to do through each one
we're going to create it
and we're going to quit so let's see if
we can do that
we created the index so now if we go to
our local
and
tutorial
and we show indexes boom
so if we look here
this is our two row well let's just take
the whole thing
so if we look at the whole thing now
if we look at that
and then we go back and look at that
from our source
uh
here
we should see effectively the same thing
uh oops that's great i don't want to
create index
i want show indexes
and we're gonna see here
let's do it this way
so what we're going to see is
the exceptions or the cardinality
but we can see here oh but we can see
here there's a difference between a
primary versus an id
so we're probably going to want to look
into that but they're all b trees
uh there we've got yeses here for the
nulls
see the collation is a for everybody
column names are the same
index sequence index is the same
so we're good and then non-unique is the
same
so the one thing we may want to look at
is the
uh
primary uh is that right
yeah so key name being primary versus id
so we would just want to be able to
select our
what our primary keys are that we wanted
to set for indexes but now what we have
is we do have the ability to go in
and take any index that exists in our
source
we're now going to run through that and
that will take care of it so we can
actually run our whole database
i'm not going to worry about syncing
rows because
well let's do that
i don't think it matters that much so if
i do like up to ten thousand rows
and i sync it and then sync the columns
which
i probably want to send columns first
to be fair
and i probably want to do
indexes
last
because there may be situations where
there's something a little funky and
we've got to have the data correct
so let's see if that works
if i run that through
then it looks like
we are getting some inserts here
and hopefully it's not going to blow up
and we will see
that we went in
oh i've got this extra little print
stuff that i don't need
um so let's see those men index is
missing
uh where did i just start start miss so
if i go into missing indexes
let's clean that up i've got an extra
print somewhere in here
i don't need prior column because i'm
not using it
here we go index names
there we go
so now if i use it
it should run through
blow out all my data
and boom i'm ready to go so
i'm going to kick this off
i'm going to set it back to zero which
will now get all of my data
should all be synced including
everything um
all of my uh
you know indexes and stuff like that we
do want to look at
um i think i've got a couple that i want
to double check like i've got some
unique and a couple other indexes that
are worth double checking to make sure
that they get created properly
and we will run into those next time
around because i think this is a good
time to wrap this one up as our database
is getting loaded so i'll get to as
always you know you can check this out
there's links in the show notes so you
can hop on to github and take a look at
some of this code
um i don't think all the data's there
but it's you can poke it at whatever
pointed at whatever database you want to
use
and synchronize it to
ideally like a local
copy of it and see how that goes for you
there are other issues that we still
have to deal with we are not done with
the
complexities of doing such a thing
particularly
and i think we'll probably tackle this
next time
uh well it depends on how fast we can
get through
the indexes we'll probably do one more
take a look at indexes unique and such
and then um we're going to try to dig
into some of this some of the issues
that we can run into
when we limit our max rows and what can
cause problems there and some of our
cleanup
that being said we'll wrap this one up
so go out there and have yourself a
great day a great week
and we will talk to you
next time
you