📺 Develpreneur YouTube Episode

Video + transcript

SQL Sync Lessons Learned Part 1

2022-12-20 •Youtube

Detailed Notes

After our series of building a python application to synchronize databases at the schema and data level we have lessons learned. There are areas where we need to be intentional as we decide how to maintain and enhance our databases. This is part 1.

Learn More at https://develpreneur.com

Transcript Text
thank you
[Music]
so this presentation is a little bit of
a actually as we talked earlier about it
it's almost a little bit of a
retrospective I guess in a sense but
it's also more like a probably more like
a Lessons Learned
in recent months now
the uh the Youtube tutorials there was a
there's a series that was python
and then there was a series that was SQL
and in the last two months if you
haven't been following there's a those
sort of got combined so it was turned
into actually a tool that I had for
years thought I wanted to build mostly
for myself
which would be one that is allows you to
point essentially to two databases and
do some sort of synchronization
and we'll talk about some of the issues
and why that's useful and things like
that as we get into this
and so that's what I'm going to talk
about today and this is so this came out
of uh the Project's still going the the
tutorials are still getting done uh I'm
getting close to the sort of wrapping it
up or getting some of the key things
done and it'll just be you know clean up
and stuff like that
but it's it's uh the short of it is it's
a python application that uses uh that
works with mySQL databases and is built
to synchronize between two different
servers and sort of talk a little bit
about that and some of the lessons
learned and some of these are things
that are
I think useful to consider particularly
in the world where you've got a like a
development database versus a train of
staging or training or testing versus
production and moving things from one to
another and making sure that you get all
of the potential changes and differences
that you need to you know properly
promoted up the system to production
eventually
so we'll start out with a little bit
about the the problem definition go a
little more to that and then we're going
to talk about some of the key things
that are key areas of databases talking
about tables store procedures and
functions indexes and keys and then data
itself within that you know there's some
other things that are out there like
triggers and things and constraints and
things like that but they more or less
fall under you know some of the same
considerations of either store procedure
and functions or indexes and keys
so the application goal
the whole reason that this came about is
that
it is not uncommon I think all of us
have probably run into a situation where
we need sort of like a a shadow or uh
mirrored database for testing or
development purposes and I say Shadow
versus like a replication because
sometimes it's you know it's nice and we
can completely replicate our production
development data or production database
to development and so it's an Apples to
Apples 100 the same kind of comparison
in some places even do like you know
nightly
um basically synchronizations where they
just basically dump the production
database and then load it into a
development place somewhere
and that's not always possible that it
sometimes is but particularly when you
get large databases or production data
that is uh has some sort of you know
requirements for people you know for it
to be secured and things like that you
don't want that data all over the place
it's just more stuff that you have to
secure
so if you can't use the production
database there are things you can do
like you know you can create some test
records or random data but
there is sort of a need or a use for
uh production we'll call production data
because sometimes it's not useful to
have uh like if your users in the
database are a user one user two user
three user four you really need to have
um a variation in like lengths and types
so it may be
um you know so instead you've got like
Jim Smith and then you have uh babalusky
Smith Brown Jones or you know with
hyphenated names and things like that
that are
more representative of production data
that feels like looks like what it looks
like in production
and so when you combine the needs of
data in particular with ongoing
development then there's a lot of other
considerations because the if it's
ongoing development then there may be
changes to the table definitions you may
have new columns or different columns or
different types or you may be if you've
got stuff that is uh using store
procedures or functions there may be you
know steady changes to that code there
may be relationships and indexes and
things like that that are built as part
of your development efforts and those
things need to be
changed or or like promoted up or
sometimes you want to roll back because
you want to be able to say
I'm testing something or validating
something and it works fine in my
development environment but it does it
when it's in production and so I want a
way to sort of you know pull pieces of
production and use that to create a
a valid test environment and it's not
trivial like I said these databases
could be huge they can be very complex
and sometimes size alone is just too
much you know if you've got terabytes of
data doing a nightly restore down to a
the development machine can just be way
too time consuming and require a lot of
resources that you you know you don't
have it's like you may not have an
enough of a
development resources to have a
multi-terabyte database as part of what
you you have there
so dumps and refreshes are
um useful and sometimes are exactly what
you need and in a lot of cases that's
sort of what we we work with but they're
not always feasible we also have
situations where we don't have a way to
spin up a development environment you
basically have to point to the
production database because they don't
have scripts they don't have history
they don't have anything for you to
create a a small data a uh a clean or a
fresh database all they have is you just
pull the whole thing down that stuff
could be really time consuming and
annoying
and then you know it's just general like
if you didn't create and maintain and
use proper processes building your
database then move creating a fresh or
copy of that version can be very
difficult and that's part of what you
know all of that is what sort of falls
under the goals of this application some
of the things problems that we want to
solve
so let's start at the
table level some of the things we have
to think about if we want to match up
tables from one environment to another
and
it's really if you think through it it's
it for the most part is fairly logical
and common sensical
um you have to worry about columns you
want to worry about whether the you know
you have new columns so I started with
uh three feet like I had a uh let's say
an address in production and realized
that I needed to add somewhere along the
way we were like oh let's add State all
right let's add country and I didn't get
that you know percolated down to all my
development areas because it was
something that was a a hot fix or like
you know Michael's working on it and he
pushed all that up but my local Dev
database I didn't get that picked up and
maybe we don't have quite the sequel uh
you know Version Control for me to be
able to see that
so I I need a way to know hey there's a
new column
um and this could include not only like
a new column but then changes to
existing ones so the address column goes
from 40 characters to 80 characters I
need to know that or maybe we went from
uh you know more complicated maybe we
went from address type was a four letter
character field and now it's been turned
into an integer that's actually an ID
look up into another table
which can that sort of gets into indexes
and foreign Keys you know maybe there's
some Performance Tuning stuff that was
done so we had to create a couple
indexes to help you know utilize the
help with Performance Tuning uh and
there may be foreign Keys you know for
example maybe we uh we're normalizing
stuff so we had
some data and we looked at it and there
was really only three values that were
there but they were big and clunky to
record so instead we said hey let's just
pull that out to a lookup table you know
across a little foreign foreign key
reference that says you know each of
those now is just a number in there
instead of you know some big long string
and then the the interesting one that
becomes a challenge and one of those
things that you almost have to punt on
would be like column name changes
because if I change an a column from
address to address one
it's I mean you can because basically
what's going to happen is you're
probably going to look at it and say oh
column address got deleted and address
one got inserted
from a logic point of view you may need
to know that address is address or
address one is what address used to be
from a synchronization point of view
then
uh there is really no such thing as a
change every change in that case would
be a you know from a column name because
that's the only way that you can
identify a column is by name because
otherwise it could be you know the same
type and size and all that could just be
two could be two completely different
columns
so in that case you would need to
understand that A change is really it's
a the delete in an ad as opposed to a
direct change
so you're normally basically that that
leads to with tables it is
really three things there is deletes so
this column doesn't exist or should not
exist in my local table
there's ads this column doesn't exist in
my local table and it should and the
other thing is changes which would be
the column with this name exists in my
table but it has mismatch it has a
different type or size or constraints or
default values so I need to deal with
that
procedures and functions
are a little more complicated a little a
little a little in some ways more
complicated some less
procedures and functions typically and
particularly in this case we're talking
about my sequel
the name and the owner are unique for
the schema so
and that would be things like so I've
got a function that is part of a schema
and by owner I mean really it's the
schema so if I've got my developer nor
schema I could have a add one function
that is the development or add one but
then I could also have uh milash
Consulting database and it could have an
add one and those are two different
store procedures or functions
so that's sort of how our uh how we
match from one to another within a
within a database server within schemas
and then when you're looking at
differences from your source to your
target one of the things is you know
sort of obvious I think is the signature
you may have
um if it's a different function name or
a different store procedure name then
it's the same thing we ran into with
columns and a table it's not really uh
you didn't change the name you dropped
one and you created one with this
different name so there's no you don't
have to in the synchronization link the
the prior and the current names
but the signatures can change so that
add one procedure I mentioned maybe
initially it had one variable or and
then later I have two variables uh maybe
now instead of a return there's an out
value out variable
um maybe the type change so instead of
bringing in a store uh varchar I'm
bringing in an integer
and sizes you know instead of uh four
characters now it's been opened up to 10
characters so signature can change
significantly
um likewise is your returns uh like for
example particularly for a function is
that the value and the type may be
different so you may have a different
um
you know maybe before it was an integer
and then you realize oh wait it needs to
be afloat you know or something like
that
and then of course there's the source
code itself that can change the you know
actually what that function or
procedures does
and then there's permissions that really
applies to all the things that could be
tables and everything else but say who
can who owns it who can run it who can
delete it insert it update it things of
that nature
so these are all considerations you need
to think about when you're syncing these
things up now for Simplicity the nice
thing is that most of what we need to do
is not really so much
um a compare other than when you get
into like signatures and data types and
values because those may be the same
name and there's a lit and it's just
basically I guess and even then you
don't have to necessarily compare it
because what you could do is just say
all right I'm going to if there's any
difference
which I guess would be a compare if
there's any difference then I'm just
going to take the source
um definition of it and make that the
target definition
[Music]
Transcript Segments
10.7

thank you

18.89

[Music]

27.66

so this presentation is a little bit of

30.9

a actually as we talked earlier about it

33.239

it's almost a little bit of a

35.12

retrospective I guess in a sense but

37.5

it's also more like a probably more like

39.6

a Lessons Learned

41.04

in recent months now

44.239

the uh the Youtube tutorials there was a

49.26

there's a series that was python

52.26

and then there was a series that was SQL

54.239

and in the last two months if you

55.92

haven't been following there's a those

57.539

sort of got combined so it was turned

60.42

into actually a tool that I had for

62.539

years thought I wanted to build mostly

66.24

for myself

68.159

which would be one that is allows you to

70.5

point essentially to two databases and

73.02

do some sort of synchronization

75.119

and we'll talk about some of the issues

78.299

and why that's useful and things like

80.04

that as we get into this

81.96

and so that's what I'm going to talk

82.979

about today and this is so this came out

85.14

of uh the Project's still going the the

87.54

tutorials are still getting done uh I'm

91.08

getting close to the sort of wrapping it

94.619

up or getting some of the key things

96.24

done and it'll just be you know clean up

98.22

and stuff like that

99.9

but it's it's uh the short of it is it's

103.259

a python application that uses uh that

106.799

works with mySQL databases and is built

109.14

to synchronize between two different

111.24

servers and sort of talk a little bit

113.28

about that and some of the lessons

114.42

learned and some of these are things

116.04

that are

117.72

I think useful to consider particularly

120.659

in the world where you've got a like a

123.06

development database versus a train of

125.159

staging or training or testing versus

127.86

production and moving things from one to

132.239

another and making sure that you get all

135

of the potential changes and differences

137.459

that you need to you know properly

139.5

promoted up the system to production

141.72

eventually

143.64

so we'll start out with a little bit

145.86

about the the problem definition go a

147.72

little more to that and then we're going

148.8

to talk about some of the key things

150.36

that are key areas of databases talking

153.18

about tables store procedures and

155.28

functions indexes and keys and then data

158.459

itself within that you know there's some

161.58

other things that are out there like

163.08

triggers and things and constraints and

164.94

things like that but they more or less

167.28

fall under you know some of the same

169.379

considerations of either store procedure

171.599

and functions or indexes and keys

175.019

so the application goal

177.48

the whole reason that this came about is

181.26

that

182.16

it is not uncommon I think all of us

184.44

have probably run into a situation where

185.879

we need sort of like a a shadow or uh

189.9

mirrored database for testing or

192.18

development purposes and I say Shadow

194.28

versus like a replication because

196.2

sometimes it's you know it's nice and we

198.9

can completely replicate our production

202.099

development data or production database

204.84

to development and so it's an Apples to

207.659

Apples 100 the same kind of comparison

210.3

in some places even do like you know

213.42

nightly

215.34

um basically synchronizations where they

217.019

just basically dump the production

218.159

database and then load it into a

220.2

development place somewhere

222.239

and that's not always possible that it

225.84

sometimes is but particularly when you

227.34

get large databases or production data

230.7

that is uh has some sort of you know

233.159

requirements for people you know for it

234.599

to be secured and things like that you

236.58

don't want that data all over the place

237.959

it's just more stuff that you have to

240

secure

241.86

so if you can't use the production

243.42

database there are things you can do

244.799

like you know you can create some test

247.379

records or random data but

251.159

there is sort of a need or a use for

255.36

uh production we'll call production data

258.18

because sometimes it's not useful to

260.1

have uh like if your users in the

262.44

database are a user one user two user

264.3

three user four you really need to have

268.02

um a variation in like lengths and types

271.08

so it may be

272.94

um you know so instead you've got like

274.32

Jim Smith and then you have uh babalusky

277.979

Smith Brown Jones or you know with

280.44

hyphenated names and things like that

282.12

that are

283.44

more representative of production data

287.46

that feels like looks like what it looks

290.759

like in production

292.5

and so when you combine the needs of

295.259

data in particular with ongoing

298.08

development then there's a lot of other

299.759

considerations because the if it's

302.1

ongoing development then there may be

303.84

changes to the table definitions you may

306.54

have new columns or different columns or

308.16

different types or you may be if you've

310.5

got stuff that is uh using store

312.72

procedures or functions there may be you

315.3

know steady changes to that code there

318.18

may be relationships and indexes and

320.34

things like that that are built as part

322.38

of your development efforts and those

326.28

things need to be

327.68

changed or or like promoted up or

331.62

sometimes you want to roll back because

333.479

you want to be able to say

334.979

I'm testing something or validating

336.78

something and it works fine in my

338.94

development environment but it does it

340.919

when it's in production and so I want a

343.32

way to sort of you know pull pieces of

345.96

production and use that to create a

349.74

a valid test environment and it's not

352.86

trivial like I said these databases

355.02

could be huge they can be very complex

357.02

and sometimes size alone is just too

360.36

much you know if you've got terabytes of

362.28

data doing a nightly restore down to a

366.18

the development machine can just be way

368.4

too time consuming and require a lot of

371.479

resources that you you know you don't

373.62

have it's like you may not have an

375.96

enough of a

377.9

development resources to have a

380.22

multi-terabyte database as part of what

382.56

you you have there

384.3

so dumps and refreshes are

387.66

um useful and sometimes are exactly what

391.5

you need and in a lot of cases that's

393.24

sort of what we we work with but they're

394.979

not always feasible we also have

397.139

situations where we don't have a way to

401.28

spin up a development environment you

404.1

basically have to point to the

406.74

production database because they don't

408.36

have scripts they don't have history

409.919

they don't have anything for you to

412.259

create a a small data a uh a clean or a

417.12

fresh database all they have is you just

419.52

pull the whole thing down that stuff

421.86

could be really time consuming and

423.539

annoying

425.22

and then you know it's just general like

427.62

if you didn't create and maintain and

430.08

use proper processes building your

431.58

database then move creating a fresh or

436.38

copy of that version can be very

439.02

difficult and that's part of what you

441.539

know all of that is what sort of falls

444.12

under the goals of this application some

446.039

of the things problems that we want to

447.66

solve

449.88

so let's start at the

451.62

table level some of the things we have

454.199

to think about if we want to match up

457.44

tables from one environment to another

460.74

and

462.06

it's really if you think through it it's

464.039

it for the most part is fairly logical

466.56

and common sensical

469.44

um you have to worry about columns you

471.12

want to worry about whether the you know

472.62

you have new columns so I started with

474.78

uh three feet like I had a uh let's say

477.66

an address in production and realized

480.9

that I needed to add somewhere along the

482.58

way we were like oh let's add State all

484.919

right let's add country and I didn't get

487.74

that you know percolated down to all my

490.44

development areas because it was

492.479

something that was a a hot fix or like

494.94

you know Michael's working on it and he

496.979

pushed all that up but my local Dev

498.72

database I didn't get that picked up and

501

maybe we don't have quite the sequel uh

503.58

you know Version Control for me to be

506.039

able to see that

507.539

so I I need a way to know hey there's a

511.44

new column

513.06

um and this could include not only like

515.039

a new column but then changes to

516.539

existing ones so the address column goes

519.659

from 40 characters to 80 characters I

522

need to know that or maybe we went from

524.7

uh you know more complicated maybe we

526.86

went from address type was a four letter

530.399

character field and now it's been turned

532.68

into an integer that's actually an ID

534.66

look up into another table

537.12

which can that sort of gets into indexes

539.7

and foreign Keys you know maybe there's

541.32

some Performance Tuning stuff that was

543.72

done so we had to create a couple

545.54

indexes to help you know utilize the

549.66

help with Performance Tuning uh and

552

there may be foreign Keys you know for

553.44

example maybe we uh we're normalizing

556.38

stuff so we had

558.06

some data and we looked at it and there

559.8

was really only three values that were

561.72

there but they were big and clunky to

563.279

record so instead we said hey let's just

566.459

pull that out to a lookup table you know

568.74

across a little foreign foreign key

570.66

reference that says you know each of

572.7

those now is just a number in there

574.14

instead of you know some big long string

577.82

and then the the interesting one that

580.74

becomes a challenge and one of those

582.3

things that you almost have to punt on

583.98

would be like column name changes

586.26

because if I change an a column from

589.26

address to address one

592.26

it's I mean you can because basically

595.14

what's going to happen is you're

596.22

probably going to look at it and say oh

597.42

column address got deleted and address

600.24

one got inserted

603.18

from a logic point of view you may need

606.18

to know that address is address or

608.459

address one is what address used to be

610.98

from a synchronization point of view

613.68

then

615.72

uh there is really no such thing as a

618.06

change every change in that case would

620.76

be a you know from a column name because

622.74

that's the only way that you can

624.2

identify a column is by name because

627.72

otherwise it could be you know the same

629.82

type and size and all that could just be

631.68

two could be two completely different

633.12

columns

634.26

so in that case you would need to

635.88

understand that A change is really it's

637.62

a the delete in an ad as opposed to a

641.399

direct change

645.779

so you're normally basically that that

648.24

leads to with tables it is

652.44

really three things there is deletes so

655.74

this column doesn't exist or should not

658.38

exist in my local table

661.98

there's ads this column doesn't exist in

664.68

my local table and it should and the

666.779

other thing is changes which would be

669.079

the column with this name exists in my

672

table but it has mismatch it has a

673.92

different type or size or constraints or

676.2

default values so I need to deal with

678.899

that

680.66

procedures and functions

683.36

are a little more complicated a little a

686.339

little a little in some ways more

688.14

complicated some less

690.36

procedures and functions typically and

693.06

particularly in this case we're talking

694.26

about my sequel

695.88

the name and the owner are unique for

698.94

the schema so

701.88

and that would be things like so I've

703.92

got a function that is part of a schema

706.62

and by owner I mean really it's the

708.06

schema so if I've got my developer nor

710.94

schema I could have a add one function

715.1

that is the development or add one but

717.959

then I could also have uh milash

720.72

Consulting database and it could have an

723

add one and those are two different

725.339

store procedures or functions

728.1

so that's sort of how our uh how we

731.64

match from one to another within a

733.98

within a database server within schemas

737.76

and then when you're looking at

739.56

differences from your source to your

741.959

target one of the things is you know

744.06

sort of obvious I think is the signature

745.5

you may have

747.42

um if it's a different function name or

750.18

a different store procedure name then

752.1

it's the same thing we ran into with

754.079

columns and a table it's not really uh

757.62

you didn't change the name you dropped

759.839

one and you created one with this

762.42

different name so there's no you don't

764.1

have to in the synchronization link the

767.22

the prior and the current names

770.339

but the signatures can change so that

773.16

add one procedure I mentioned maybe

776.22

initially it had one variable or and

779.459

then later I have two variables uh maybe

781.8

now instead of a return there's an out

783.66

value out variable

786.48

um maybe the type change so instead of

788.16

bringing in a store uh varchar I'm

791.339

bringing in an integer

793.2

and sizes you know instead of uh four

795.3

characters now it's been opened up to 10

797.7

characters so signature can change

799.8

significantly

802.56

um likewise is your returns uh like for

805.68

example particularly for a function is

807.899

that the value and the type may be

809.399

different so you may have a different

812.579

um

813.54

you know maybe before it was an integer

815.7

and then you realize oh wait it needs to

817.2

be afloat you know or something like

818.82

that

819.66

and then of course there's the source

821.22

code itself that can change the you know

823.5

actually what that function or

824.82

procedures does

827.399

and then there's permissions that really

830.16

applies to all the things that could be

831.839

tables and everything else but say who

833.82

can who owns it who can run it who can

836.94

delete it insert it update it things of

839.519

that nature

840.959

so these are all considerations you need

843.24

to think about when you're syncing these

845.399

things up now for Simplicity the nice

848.279

thing is that most of what we need to do

850.44

is not really so much

853.079

um a compare other than when you get

856.019

into like signatures and data types and

858.06

values because those may be the same

859.68

name and there's a lit and it's just

863.04

basically I guess and even then you

864.839

don't have to necessarily compare it

866.04

because what you could do is just say

867.6

all right I'm going to if there's any

870.24

difference

871.38

which I guess would be a compare if

873.06

there's any difference then I'm just

874.8

going to take the source

877.5

um definition of it and make that the

880.5

target definition

882.88

[Music]