Detailed Notes
After our series of building a python application to synchronize databases at the schema and data level we have lessons learned. There are areas where we need to be intentional as we decide how to maintain and enhance our databases. This is part 1.
Learn More at https://develpreneur.com
Transcript Text
thank you [Music] so this presentation is a little bit of a actually as we talked earlier about it it's almost a little bit of a retrospective I guess in a sense but it's also more like a probably more like a Lessons Learned in recent months now the uh the Youtube tutorials there was a there's a series that was python and then there was a series that was SQL and in the last two months if you haven't been following there's a those sort of got combined so it was turned into actually a tool that I had for years thought I wanted to build mostly for myself which would be one that is allows you to point essentially to two databases and do some sort of synchronization and we'll talk about some of the issues and why that's useful and things like that as we get into this and so that's what I'm going to talk about today and this is so this came out of uh the Project's still going the the tutorials are still getting done uh I'm getting close to the sort of wrapping it up or getting some of the key things done and it'll just be you know clean up and stuff like that but it's it's uh the short of it is it's a python application that uses uh that works with mySQL databases and is built to synchronize between two different servers and sort of talk a little bit about that and some of the lessons learned and some of these are things that are I think useful to consider particularly in the world where you've got a like a development database versus a train of staging or training or testing versus production and moving things from one to another and making sure that you get all of the potential changes and differences that you need to you know properly promoted up the system to production eventually so we'll start out with a little bit about the the problem definition go a little more to that and then we're going to talk about some of the key things that are key areas of databases talking about tables store procedures and functions indexes and keys and then data itself within that you know there's some other things that are out there like triggers and things and constraints and things like that but they more or less fall under you know some of the same considerations of either store procedure and functions or indexes and keys so the application goal the whole reason that this came about is that it is not uncommon I think all of us have probably run into a situation where we need sort of like a a shadow or uh mirrored database for testing or development purposes and I say Shadow versus like a replication because sometimes it's you know it's nice and we can completely replicate our production development data or production database to development and so it's an Apples to Apples 100 the same kind of comparison in some places even do like you know nightly um basically synchronizations where they just basically dump the production database and then load it into a development place somewhere and that's not always possible that it sometimes is but particularly when you get large databases or production data that is uh has some sort of you know requirements for people you know for it to be secured and things like that you don't want that data all over the place it's just more stuff that you have to secure so if you can't use the production database there are things you can do like you know you can create some test records or random data but there is sort of a need or a use for uh production we'll call production data because sometimes it's not useful to have uh like if your users in the database are a user one user two user three user four you really need to have um a variation in like lengths and types so it may be um you know so instead you've got like Jim Smith and then you have uh babalusky Smith Brown Jones or you know with hyphenated names and things like that that are more representative of production data that feels like looks like what it looks like in production and so when you combine the needs of data in particular with ongoing development then there's a lot of other considerations because the if it's ongoing development then there may be changes to the table definitions you may have new columns or different columns or different types or you may be if you've got stuff that is uh using store procedures or functions there may be you know steady changes to that code there may be relationships and indexes and things like that that are built as part of your development efforts and those things need to be changed or or like promoted up or sometimes you want to roll back because you want to be able to say I'm testing something or validating something and it works fine in my development environment but it does it when it's in production and so I want a way to sort of you know pull pieces of production and use that to create a a valid test environment and it's not trivial like I said these databases could be huge they can be very complex and sometimes size alone is just too much you know if you've got terabytes of data doing a nightly restore down to a the development machine can just be way too time consuming and require a lot of resources that you you know you don't have it's like you may not have an enough of a development resources to have a multi-terabyte database as part of what you you have there so dumps and refreshes are um useful and sometimes are exactly what you need and in a lot of cases that's sort of what we we work with but they're not always feasible we also have situations where we don't have a way to spin up a development environment you basically have to point to the production database because they don't have scripts they don't have history they don't have anything for you to create a a small data a uh a clean or a fresh database all they have is you just pull the whole thing down that stuff could be really time consuming and annoying and then you know it's just general like if you didn't create and maintain and use proper processes building your database then move creating a fresh or copy of that version can be very difficult and that's part of what you know all of that is what sort of falls under the goals of this application some of the things problems that we want to solve so let's start at the table level some of the things we have to think about if we want to match up tables from one environment to another and it's really if you think through it it's it for the most part is fairly logical and common sensical um you have to worry about columns you want to worry about whether the you know you have new columns so I started with uh three feet like I had a uh let's say an address in production and realized that I needed to add somewhere along the way we were like oh let's add State all right let's add country and I didn't get that you know percolated down to all my development areas because it was something that was a a hot fix or like you know Michael's working on it and he pushed all that up but my local Dev database I didn't get that picked up and maybe we don't have quite the sequel uh you know Version Control for me to be able to see that so I I need a way to know hey there's a new column um and this could include not only like a new column but then changes to existing ones so the address column goes from 40 characters to 80 characters I need to know that or maybe we went from uh you know more complicated maybe we went from address type was a four letter character field and now it's been turned into an integer that's actually an ID look up into another table which can that sort of gets into indexes and foreign Keys you know maybe there's some Performance Tuning stuff that was done so we had to create a couple indexes to help you know utilize the help with Performance Tuning uh and there may be foreign Keys you know for example maybe we uh we're normalizing stuff so we had some data and we looked at it and there was really only three values that were there but they were big and clunky to record so instead we said hey let's just pull that out to a lookup table you know across a little foreign foreign key reference that says you know each of those now is just a number in there instead of you know some big long string and then the the interesting one that becomes a challenge and one of those things that you almost have to punt on would be like column name changes because if I change an a column from address to address one it's I mean you can because basically what's going to happen is you're probably going to look at it and say oh column address got deleted and address one got inserted from a logic point of view you may need to know that address is address or address one is what address used to be from a synchronization point of view then uh there is really no such thing as a change every change in that case would be a you know from a column name because that's the only way that you can identify a column is by name because otherwise it could be you know the same type and size and all that could just be two could be two completely different columns so in that case you would need to understand that A change is really it's a the delete in an ad as opposed to a direct change so you're normally basically that that leads to with tables it is really three things there is deletes so this column doesn't exist or should not exist in my local table there's ads this column doesn't exist in my local table and it should and the other thing is changes which would be the column with this name exists in my table but it has mismatch it has a different type or size or constraints or default values so I need to deal with that procedures and functions are a little more complicated a little a little a little in some ways more complicated some less procedures and functions typically and particularly in this case we're talking about my sequel the name and the owner are unique for the schema so and that would be things like so I've got a function that is part of a schema and by owner I mean really it's the schema so if I've got my developer nor schema I could have a add one function that is the development or add one but then I could also have uh milash Consulting database and it could have an add one and those are two different store procedures or functions so that's sort of how our uh how we match from one to another within a within a database server within schemas and then when you're looking at differences from your source to your target one of the things is you know sort of obvious I think is the signature you may have um if it's a different function name or a different store procedure name then it's the same thing we ran into with columns and a table it's not really uh you didn't change the name you dropped one and you created one with this different name so there's no you don't have to in the synchronization link the the prior and the current names but the signatures can change so that add one procedure I mentioned maybe initially it had one variable or and then later I have two variables uh maybe now instead of a return there's an out value out variable um maybe the type change so instead of bringing in a store uh varchar I'm bringing in an integer and sizes you know instead of uh four characters now it's been opened up to 10 characters so signature can change significantly um likewise is your returns uh like for example particularly for a function is that the value and the type may be different so you may have a different um you know maybe before it was an integer and then you realize oh wait it needs to be afloat you know or something like that and then of course there's the source code itself that can change the you know actually what that function or procedures does and then there's permissions that really applies to all the things that could be tables and everything else but say who can who owns it who can run it who can delete it insert it update it things of that nature so these are all considerations you need to think about when you're syncing these things up now for Simplicity the nice thing is that most of what we need to do is not really so much um a compare other than when you get into like signatures and data types and values because those may be the same name and there's a lit and it's just basically I guess and even then you don't have to necessarily compare it because what you could do is just say all right I'm going to if there's any difference which I guess would be a compare if there's any difference then I'm just going to take the source um definition of it and make that the target definition [Music]
Transcript Segments
thank you
[Music]
so this presentation is a little bit of
a actually as we talked earlier about it
it's almost a little bit of a
retrospective I guess in a sense but
it's also more like a probably more like
a Lessons Learned
in recent months now
the uh the Youtube tutorials there was a
there's a series that was python
and then there was a series that was SQL
and in the last two months if you
haven't been following there's a those
sort of got combined so it was turned
into actually a tool that I had for
years thought I wanted to build mostly
for myself
which would be one that is allows you to
point essentially to two databases and
do some sort of synchronization
and we'll talk about some of the issues
and why that's useful and things like
that as we get into this
and so that's what I'm going to talk
about today and this is so this came out
of uh the Project's still going the the
tutorials are still getting done uh I'm
getting close to the sort of wrapping it
up or getting some of the key things
done and it'll just be you know clean up
and stuff like that
but it's it's uh the short of it is it's
a python application that uses uh that
works with mySQL databases and is built
to synchronize between two different
servers and sort of talk a little bit
about that and some of the lessons
learned and some of these are things
that are
I think useful to consider particularly
in the world where you've got a like a
development database versus a train of
staging or training or testing versus
production and moving things from one to
another and making sure that you get all
of the potential changes and differences
that you need to you know properly
promoted up the system to production
eventually
so we'll start out with a little bit
about the the problem definition go a
little more to that and then we're going
to talk about some of the key things
that are key areas of databases talking
about tables store procedures and
functions indexes and keys and then data
itself within that you know there's some
other things that are out there like
triggers and things and constraints and
things like that but they more or less
fall under you know some of the same
considerations of either store procedure
and functions or indexes and keys
so the application goal
the whole reason that this came about is
that
it is not uncommon I think all of us
have probably run into a situation where
we need sort of like a a shadow or uh
mirrored database for testing or
development purposes and I say Shadow
versus like a replication because
sometimes it's you know it's nice and we
can completely replicate our production
development data or production database
to development and so it's an Apples to
Apples 100 the same kind of comparison
in some places even do like you know
nightly
um basically synchronizations where they
just basically dump the production
database and then load it into a
development place somewhere
and that's not always possible that it
sometimes is but particularly when you
get large databases or production data
that is uh has some sort of you know
requirements for people you know for it
to be secured and things like that you
don't want that data all over the place
it's just more stuff that you have to
secure
so if you can't use the production
database there are things you can do
like you know you can create some test
records or random data but
there is sort of a need or a use for
uh production we'll call production data
because sometimes it's not useful to
have uh like if your users in the
database are a user one user two user
three user four you really need to have
um a variation in like lengths and types
so it may be
um you know so instead you've got like
Jim Smith and then you have uh babalusky
Smith Brown Jones or you know with
hyphenated names and things like that
that are
more representative of production data
that feels like looks like what it looks
like in production
and so when you combine the needs of
data in particular with ongoing
development then there's a lot of other
considerations because the if it's
ongoing development then there may be
changes to the table definitions you may
have new columns or different columns or
different types or you may be if you've
got stuff that is uh using store
procedures or functions there may be you
know steady changes to that code there
may be relationships and indexes and
things like that that are built as part
of your development efforts and those
things need to be
changed or or like promoted up or
sometimes you want to roll back because
you want to be able to say
I'm testing something or validating
something and it works fine in my
development environment but it does it
when it's in production and so I want a
way to sort of you know pull pieces of
production and use that to create a
a valid test environment and it's not
trivial like I said these databases
could be huge they can be very complex
and sometimes size alone is just too
much you know if you've got terabytes of
data doing a nightly restore down to a
the development machine can just be way
too time consuming and require a lot of
resources that you you know you don't
have it's like you may not have an
enough of a
development resources to have a
multi-terabyte database as part of what
you you have there
so dumps and refreshes are
um useful and sometimes are exactly what
you need and in a lot of cases that's
sort of what we we work with but they're
not always feasible we also have
situations where we don't have a way to
spin up a development environment you
basically have to point to the
production database because they don't
have scripts they don't have history
they don't have anything for you to
create a a small data a uh a clean or a
fresh database all they have is you just
pull the whole thing down that stuff
could be really time consuming and
annoying
and then you know it's just general like
if you didn't create and maintain and
use proper processes building your
database then move creating a fresh or
copy of that version can be very
difficult and that's part of what you
know all of that is what sort of falls
under the goals of this application some
of the things problems that we want to
solve
so let's start at the
table level some of the things we have
to think about if we want to match up
tables from one environment to another
and
it's really if you think through it it's
it for the most part is fairly logical
and common sensical
um you have to worry about columns you
want to worry about whether the you know
you have new columns so I started with
uh three feet like I had a uh let's say
an address in production and realized
that I needed to add somewhere along the
way we were like oh let's add State all
right let's add country and I didn't get
that you know percolated down to all my
development areas because it was
something that was a a hot fix or like
you know Michael's working on it and he
pushed all that up but my local Dev
database I didn't get that picked up and
maybe we don't have quite the sequel uh
you know Version Control for me to be
able to see that
so I I need a way to know hey there's a
new column
um and this could include not only like
a new column but then changes to
existing ones so the address column goes
from 40 characters to 80 characters I
need to know that or maybe we went from
uh you know more complicated maybe we
went from address type was a four letter
character field and now it's been turned
into an integer that's actually an ID
look up into another table
which can that sort of gets into indexes
and foreign Keys you know maybe there's
some Performance Tuning stuff that was
done so we had to create a couple
indexes to help you know utilize the
help with Performance Tuning uh and
there may be foreign Keys you know for
example maybe we uh we're normalizing
stuff so we had
some data and we looked at it and there
was really only three values that were
there but they were big and clunky to
record so instead we said hey let's just
pull that out to a lookup table you know
across a little foreign foreign key
reference that says you know each of
those now is just a number in there
instead of you know some big long string
and then the the interesting one that
becomes a challenge and one of those
things that you almost have to punt on
would be like column name changes
because if I change an a column from
address to address one
it's I mean you can because basically
what's going to happen is you're
probably going to look at it and say oh
column address got deleted and address
one got inserted
from a logic point of view you may need
to know that address is address or
address one is what address used to be
from a synchronization point of view
then
uh there is really no such thing as a
change every change in that case would
be a you know from a column name because
that's the only way that you can
identify a column is by name because
otherwise it could be you know the same
type and size and all that could just be
two could be two completely different
columns
so in that case you would need to
understand that A change is really it's
a the delete in an ad as opposed to a
direct change
so you're normally basically that that
leads to with tables it is
really three things there is deletes so
this column doesn't exist or should not
exist in my local table
there's ads this column doesn't exist in
my local table and it should and the
other thing is changes which would be
the column with this name exists in my
table but it has mismatch it has a
different type or size or constraints or
default values so I need to deal with
that
procedures and functions
are a little more complicated a little a
little a little in some ways more
complicated some less
procedures and functions typically and
particularly in this case we're talking
about my sequel
the name and the owner are unique for
the schema so
and that would be things like so I've
got a function that is part of a schema
and by owner I mean really it's the
schema so if I've got my developer nor
schema I could have a add one function
that is the development or add one but
then I could also have uh milash
Consulting database and it could have an
add one and those are two different
store procedures or functions
so that's sort of how our uh how we
match from one to another within a
within a database server within schemas
and then when you're looking at
differences from your source to your
target one of the things is you know
sort of obvious I think is the signature
you may have
um if it's a different function name or
a different store procedure name then
it's the same thing we ran into with
columns and a table it's not really uh
you didn't change the name you dropped
one and you created one with this
different name so there's no you don't
have to in the synchronization link the
the prior and the current names
but the signatures can change so that
add one procedure I mentioned maybe
initially it had one variable or and
then later I have two variables uh maybe
now instead of a return there's an out
value out variable
um maybe the type change so instead of
bringing in a store uh varchar I'm
bringing in an integer
and sizes you know instead of uh four
characters now it's been opened up to 10
characters so signature can change
significantly
um likewise is your returns uh like for
example particularly for a function is
that the value and the type may be
different so you may have a different
um
you know maybe before it was an integer
and then you realize oh wait it needs to
be afloat you know or something like
that
and then of course there's the source
code itself that can change the you know
actually what that function or
procedures does
and then there's permissions that really
applies to all the things that could be
tables and everything else but say who
can who owns it who can run it who can
delete it insert it update it things of
that nature
so these are all considerations you need
to think about when you're syncing these
things up now for Simplicity the nice
thing is that most of what we need to do
is not really so much
um a compare other than when you get
into like signatures and data types and
values because those may be the same
name and there's a lit and it's just
basically I guess and even then you
don't have to necessarily compare it
because what you could do is just say
all right I'm going to if there's any
difference
which I guess would be a compare if
there's any difference then I'm just
going to take the source
um definition of it and make that the
target definition
[Music]