📺 Develpreneur YouTube Episode

Video + transcript

SQL Sync Part 8

2022-09-08 •Youtube

Detailed Notes

Focus for this episode: We continue looking into issues related to bringing in new data to our destination.

This tutorial walks through the creation of a database synchronization tool in Python. It continues our Python and SQL Tutorials with a useful application that provides a deeper look at SQL structures and commands.

Repository For Code: git clone [email protected]:rbroadhead/dbsync.git

Transcript Text
[Music]
well hello and welcome back we're
continuing our series where we're
looking at
our sql sync application which is python
and uh with this we're gonna learn some
python which we've been working through
as well as some sql
now last time around
we start digging into rows we can
take tables that don't exist in our
source and create them in our
destination we can do the same with
columns so if we've got tables
that
have extra columns in the source
and they are not in the target then we
can handle that as well
now we're moving into
data so we're going to actively work on
synchronizing rows at this point
now the way we're going to work it
with rows if we're going to start with a
truncate
and what that does is that's going to
empty out our data so if we go look
at
let's
see let's make sure
so here in my local ones i've got some
records in here
because i've pulled stuff across
what i want to do is i want to be able
to and let me
yeah let me get rid of i'm gonna get rid
of a couple tables uh so if we go to
local my sequel
and we look at tutorial i wanna take the
uh breeders horses and owners i'm going
to take those out because it's such a
large amount of uh data involved in
those
uh let's see so for it's missing columns
missing columns synchros okay so i want
to go up to sync rows
and for table and tables list
and then i'm going to say
if
table
not in
uh let's see what is this gonna be
owners
trainers
oops look at that
horses
so i really
wouldn't know i'm not normally going to
want to do this but i'm going to do it
here because
let's see and i do all the cleanups okay
there we go
so i'm going to do is i'm going to say
if the table is not one of these
then it's going to go through
and i'm going to truncate it so i'm
going to empty the table in the
destination i'm just going to say you
know what i'm going to take everything
that exists in the source
and i'm going to dump it into
the uh the destination now i'm going to
start with my insert so sort of a
typical insert
line for sql so we're going to insert
into
like for example app user and we're
going to take each of the columns much
as we've seen before
so we're going to go through we're going
to do a describe so we get each of our
columns we've seen that before
now what we're going to do here is we're
going to have this little thing to say
it's the first row if it's first row
then all we do is we're going to do the
name
surrounded by these little ticks in case
it's some sort of reserved word now
notice our base is the insert into table
or whatever the table name
and then our parentheses so we get
basically
this far
so insert into like in this case table
name space
open paren and then we're going to go
through each of the columns and we're
going to surround them by ticks and if
it's not the first column
you know if first otherwise
we're going to then
precede it with a comma so the first one
we come in we're going to do here
and that's where we finish and then when
we come down to the next column it's
going to say it's not the first column
so we're going to precede it with this
comma space
and we're going to do that all the way
out and then we're going to get out to
here
and we're going to close that
parentheses
values keyword and then open the
parentheses for the values we're going
to put in there
and now what we're going to do
is we're going to go through
uh let's see oh
i'll show you this max rose in a second
um and it's because this is because i
had some issues recording so
i came through this first time now what
we're gonna do and uh i've added some
stuff so the first thing we want to do
is we just want to get the data
so what we would want to do is we would
say like
let's do address
is we could just say select star
from address
and that's going to get us all the
records now in this case i don't think i
have any uh
oh
because i didn't do that right maybe
let's see do i have any records in here
i do and so i get my three the problem
is if i have a large number of rows like
i don't know whatever large number is if
i won't have more than 10 or more than
20 or more than 100 or a thousand or a
million whatever it happens to be
like in this case
for owners trainers and horses
um i've got like hundreds of thousands
of rows so actually let's do this let's
just go ahead and keep those in there
and
do that
and so what i'm doing is i'm going to
pass in the source
the destination the list of tables and
this max rose
and so if max rose is greater than zero
if it's zero i'm just going to do select
star i'm going to get all the rows from
the table
if it's not then i'm only going to get
the rows i'm gonna get a certain number
of rows now this is where it gets a
little tricky uh let's see if i look at
uh well let's look i think i have a lot
from horseselect
oops
count star from horses
if i do that
yeah so there's a thousand records in
there
so i can do
and this is in my local this is my
destination so if i do that
so here's all my records but what i can
do is i can come in here
and i can do a limit
and let's say i do a limit to 50
then i'm only going to get 50 rows
we've seen this before if you went
through the sequel stuff now the next
thing i want to do is this is going to
just give me the first 50.
so instead what i want to do is i want
to do an order by and i'm going to
assume its id
and so what i'm going to do is i'm going
to say order by id
first column is what i'm always going to
do i'm hoping the first column is the
primary key and i'm going to do
descending so what that says is now i'm
going to get the
in this case the top 50 ids
and
now i'm going to get those top 50 ids
and so now these are the most recent
effectively these are the most recent
ones that i've pulled in
now i could do something more
complicated i could have something that
maybe i've got some consistency so i can
find when the creation date was
or updated date but even here you can
see date created is not
it's null in all of these cases
um let's see so
now i can with doing this let's go back
over here to the source i can actually
limit how many records i bring across
what i did
most recently was a thousand so if i
come down here
max rose is a thousand yep and so i'm
going to send that in
and so that means oops sinks rose so
that means if
uh since it's greater than zero it's
going to come through and it's just
going to give the first thousand rows
let's do something a little let's do 10
000. that'll probably still be fairly
quick
oops it's not the sink
so here's the thing so i select star
from and then all i'm going to do is i'm
going to walk through each row
and because i'm doing a select star i'm
just going to go through essentially the
same way i did with the columns
where i do you know a value and then if
it's a second or beyond it's comma space
i'm going to do the same thing with the
values except for here instead of tick
marks i'm going to do singles
uh single ticks unless
i get a nun back in which case i'm going
to do a null which we've seen this
before when we were fixing um
[Music]
somewhere else
we saw this somewhere else i don't
remember where but we have seen this so
apologies if i forget specifically but
since what you do is going to say hey if
it's none
then i can't really bring
this
so i need to give it a null and then
it'll allow me to insert that record so
if i do here
uh oh it's a duplicate key so let me get
a different key just because
i don't think there's a five yet
uh oh there's a five let's get let's do
like a 100
okay so that worked but if i do 101
this is what we ran into before
there we go okay so we're moving our way
through here
we build up our values we close it off
with our uh close prints and our
semicolon uh i'm going to insert i'm
going to print the insert statement just
cuz
for like debugging purposes and let's
see
now what i can do
is i can come through
it's going to connect to my data
it's going to go in and now it's going
to go blow out a bunch of inserts
uh because i did what ten thousand
and so i'm going to get the most 10
recent
based on
order by that first row whatever it
happens to be
so i'm going to get the most recent up
to 10 000 of like in this case breeders
and it's gonna do the same things for
horses and owners
so it's gonna take a second like see
this is why we need to do this limit or
probably want to do this limit because
it's going to take a while because this
is a
a
very brute force approach
to bringing our data over and again it's
stepping on stuff because we're
truncating a table and then we're adding
this on
now what we could do is we could for
example
we could store
the last the maximum
last id from that table and then say hey
let's just grab everything that's
greater than
the id from the last time we ran it so
we're only gonna get new records
and there's i mean there's other things
we could do we could go in and we could
do like a record by record compare
but that would be
painfully slow for anything that's got
more than maybe dozens or hundreds of
records once you start getting to
thousands
particularly because we're reaching out
to different databases and doing this
one at a time
it could be
very time consuming
now a faster way to do this would be
to instead of like we're doing here
we're making all these individual calls
we could have one gargantuan statement
and do this every
um
like put together a hundred
uh inserts and then execute all 100 in
one shot and there's some things like
that we could do to speed this up
uh but right now we're gonna keep it
just sort of uh
sort of a brute force approach
uh we're getting there we're getting
through our horses at this point
i can't remember what else we have in
there
see if it's going this way they'll
probably look up type then owners let's
see it may just be going in
we'll see what order it's going through
the tables
but we're gonna go through we're gonna
do this
and we're gonna have when we're done so
we can look here like i think it's
probably already done all dates
oop
let's do that oh
it's deciding it wants to hang for a
second
oh probably because it's doing all this
okay
all dates data
yeah because it's hammering this thing
right now
yeah so i've brought my dates across uh
that's not super impressive but if you
looked at like breeders
uh
opening tab
yeah now you're gonna see so here's all
of them and we can see down here if you
look i've got my 10 000 records because
that's my limit
and so now it's pulling all this data in
if i want to do a small database then i
can do this and keep this down to you
know 100 or
a thousand or something like that so i
can pull
like effectively a test across now
there are going to be some issues that
we're going to run into
and we will talk about these as we get
further on oh here we go now it's
speeding up
um
one of them is actually the primary one
that we have to worry about are foreign
key relationships
when we try to truncate data
it's not going to allow us to do that
when there are foreign key relationships
around we've got to actually empty
tables out in a certain certain order or
go through deletes or
disable foreign keys and then build them
back after we've
populated the data
we also have to worry about when we do
partials like this where we're only
doing you know 10 000 rows
we have to worry about the situation
where we pull partial data in one table
and partial data in another but they
don't have
like one table it's referring to ids
that don't exist
because they're not part of the partial
that was pulled in another
so
while this may seem pretty simple and
straightforward to this point as far as
building tables columns and data
we're really not there yet also with
that
we may have to worry about what if we
have a change in columns
from source to destination and so we're
going to tackle some of those next time
around uh but for right now at least
we've got something that just gives us
the more in the basic sense it does
do our synchronization of
data columns and uh tables
so that being said is a good stopping
point we'll come back around next time
but as always go out there and have
yourself a great day a great week and we
will talk to you
next time
you
Transcript Segments
0.43

[Music]

27.359

well hello and welcome back we're

29.199

continuing our series where we're

31.039

looking at

32.32

our sql sync application which is python

35.92

and uh with this we're gonna learn some

37.44

python which we've been working through

39.52

as well as some sql

41.68

now last time around

43.6

we start digging into rows we can

46.64

take tables that don't exist in our

48.64

source and create them in our

50.96

destination we can do the same with

52.96

columns so if we've got tables

54.96

that

56

have extra columns in the source

58.48

and they are not in the target then we

61.44

can handle that as well

63.199

now we're moving into

65.6

data so we're going to actively work on

67.439

synchronizing rows at this point

70.64

now the way we're going to work it

72.56

with rows if we're going to start with a

74.64

truncate

76.159

and what that does is that's going to

77.36

empty out our data so if we go look

80.799

at

83.92

let's

84.84

see let's make sure

87.68

so here in my local ones i've got some

91.28

records in here

93.36

because i've pulled stuff across

95.6

what i want to do is i want to be able

97.6

to and let me

101.28

yeah let me get rid of i'm gonna get rid

102.72

of a couple tables uh so if we go to

105.84

local my sequel

108.159

and we look at tutorial i wanna take the

112.24

uh breeders horses and owners i'm going

113.92

to take those out because it's such a

116.24

large amount of uh data involved in

119.52

those

120.399

uh let's see so for it's missing columns

124.159

missing columns synchros okay so i want

126.56

to go up to sync rows

131.599

and for table and tables list

135.84

and then i'm going to say

140.8

if

141.599

table

143.84

not in

147.68

uh let's see what is this gonna be

151.519

owners

153.92

trainers

156.8

oops look at that

161.599

horses

165.84

so i really

167.599

wouldn't know i'm not normally going to

168.959

want to do this but i'm going to do it

170

here because

171.599

let's see and i do all the cleanups okay

174

there we go

175.84

so i'm going to do is i'm going to say

176.72

if the table is not one of these

179.2

then it's going to go through

181.36

and i'm going to truncate it so i'm

182.64

going to empty the table in the

183.92

destination i'm just going to say you

185.12

know what i'm going to take everything

186.159

that exists in the source

188.159

and i'm going to dump it into

190.56

the uh the destination now i'm going to

193.44

start with my insert so sort of a

195.12

typical insert

198

line for sql so we're going to insert

200.4

into

201.84

like for example app user and we're

204.159

going to take each of the columns much

205.68

as we've seen before

208.48

so we're going to go through we're going

209.68

to do a describe so we get each of our

211.2

columns we've seen that before

213.68

now what we're going to do here is we're

214.879

going to have this little thing to say

216.239

it's the first row if it's first row

218.319

then all we do is we're going to do the

221.36

name

223.12

surrounded by these little ticks in case

225.2

it's some sort of reserved word now

227.12

notice our base is the insert into table

229.68

or whatever the table name

231.519

and then our parentheses so we get

233.519

basically

234.799

this far

236

so insert into like in this case table

237.76

name space

239.04

open paren and then we're going to go

240.799

through each of the columns and we're

242.08

going to surround them by ticks and if

244

it's not the first column

245.76

you know if first otherwise

248.319

we're going to then

250.319

precede it with a comma so the first one

252.159

we come in we're going to do here

255.519

and that's where we finish and then when

257.12

we come down to the next column it's

258.32

going to say it's not the first column

259.68

so we're going to precede it with this

261.519

comma space

262.96

and we're going to do that all the way

264.16

out and then we're going to get out to

265.759

here

268.4

and we're going to close that

269.68

parentheses

271.68

values keyword and then open the

273.759

parentheses for the values we're going

275.04

to put in there

276.4

and now what we're going to do

278.16

is we're going to go through

281.44

uh let's see oh

283.36

i'll show you this max rose in a second

285.919

um and it's because this is because i

288.4

had some issues recording so

290.4

i came through this first time now what

291.919

we're gonna do and uh i've added some

294

stuff so the first thing we want to do

295.68

is we just want to get the data

297.36

so what we would want to do is we would

299.199

say like

300.479

let's do address

302.88

is we could just say select star

304.96

from address

306.72

and that's going to get us all the

307.759

records now in this case i don't think i

309.28

have any uh

312.08

oh

313.36

because i didn't do that right maybe

314.56

let's see do i have any records in here

315.919

i do and so i get my three the problem

318.4

is if i have a large number of rows like

321.36

i don't know whatever large number is if

323.199

i won't have more than 10 or more than

324.96

20 or more than 100 or a thousand or a

327.12

million whatever it happens to be

329.36

like in this case

331.199

for owners trainers and horses

334.08

um i've got like hundreds of thousands

336.4

of rows so actually let's do this let's

338.96

just go ahead and keep those in there

343.12

and

345.759

do that

347.199

and so what i'm doing is i'm going to

348.639

pass in the source

350.56

the destination the list of tables and

353.28

this max rose

355.12

and so if max rose is greater than zero

358.319

if it's zero i'm just going to do select

359.919

star i'm going to get all the rows from

361.68

the table

363.199

if it's not then i'm only going to get

366.96

the rows i'm gonna get a certain number

368.72

of rows now this is where it gets a

370.24

little tricky uh let's see if i look at

374.4

uh well let's look i think i have a lot

376.4

from horseselect

380

oops

382

count star from horses

384.88

if i do that

386.319

yeah so there's a thousand records in

387.759

there

389.52

so i can do

391.36

and this is in my local this is my

392.72

destination so if i do that

397.199

so here's all my records but what i can

399.039

do is i can come in here

402

and i can do a limit

405.84

and let's say i do a limit to 50

409.52

then i'm only going to get 50 rows

412.16

we've seen this before if you went

413.599

through the sequel stuff now the next

415.44

thing i want to do is this is going to

416.639

just give me the first 50.

418.8

so instead what i want to do is i want

421.12

to do an order by and i'm going to

423.52

assume its id

426.16

and so what i'm going to do is i'm going

427.599

to say order by id

431.199

first column is what i'm always going to

432.56

do i'm hoping the first column is the

434.16

primary key and i'm going to do

437.44

descending so what that says is now i'm

439.68

going to get the

440.88

in this case the top 50 ids

445.52

and

446.8

now i'm going to get those top 50 ids

448.56

and so now these are the most recent

450.479

effectively these are the most recent

451.84

ones that i've pulled in

453.44

now i could do something more

454.4

complicated i could have something that

456

maybe i've got some consistency so i can

458.319

find when the creation date was

460.4

or updated date but even here you can

462.8

see date created is not

465.199

it's null in all of these cases

469.68

um let's see so

472.56

now i can with doing this let's go back

474.4

over here to the source i can actually

476.4

limit how many records i bring across

478.8

what i did

479.919

most recently was a thousand so if i

481.84

come down here

484.56

max rose is a thousand yep and so i'm

486.72

going to send that in

488.72

and so that means oops sinks rose so

491.36

that means if

494

uh since it's greater than zero it's

495.52

going to come through and it's just

496.4

going to give the first thousand rows

498

let's do something a little let's do 10

499.68

000. that'll probably still be fairly

501.759

quick

505.039

oops it's not the sink

507.12

so here's the thing so i select star

508.56

from and then all i'm going to do is i'm

509.919

going to walk through each row

512.479

and because i'm doing a select star i'm

514.719

just going to go through essentially the

516

same way i did with the columns

518.8

where i do you know a value and then if

521.36

it's a second or beyond it's comma space

523.599

i'm going to do the same thing with the

525.04

values except for here instead of tick

526.8

marks i'm going to do singles

529.44

uh single ticks unless

533.12

i get a nun back in which case i'm going

535.44

to do a null which we've seen this

536.88

before when we were fixing um

539.34

[Music]

540.72

somewhere else

542.8

we saw this somewhere else i don't

544.48

remember where but we have seen this so

546.48

apologies if i forget specifically but

549.44

since what you do is going to say hey if

551.12

it's none

552.16

then i can't really bring

555.6

this

557.839

so i need to give it a null and then

559.76

it'll allow me to insert that record so

561.6

if i do here

563.519

uh oh it's a duplicate key so let me get

565.6

a different key just because

568.64

i don't think there's a five yet

571.44

uh oh there's a five let's get let's do

574

like a 100

577.04

okay so that worked but if i do 101

583.04

this is what we ran into before

585.2

there we go okay so we're moving our way

587.04

through here

588.56

we build up our values we close it off

591.2

with our uh close prints and our

595.36

semicolon uh i'm going to insert i'm

597.76

going to print the insert statement just

598.959

cuz

599.92

for like debugging purposes and let's

602.56

see

604.56

now what i can do

607.68

is i can come through

609.36

it's going to connect to my data

611.6

it's going to go in and now it's going

613.12

to go blow out a bunch of inserts

617.12

uh because i did what ten thousand

620.48

and so i'm going to get the most 10

622.24

recent

623.6

based on

625.44

order by that first row whatever it

627.519

happens to be

628.72

so i'm going to get the most recent up

630.24

to 10 000 of like in this case breeders

637.519

and it's gonna do the same things for

639.68

horses and owners

642.16

so it's gonna take a second like see

643.519

this is why we need to do this limit or

647.76

probably want to do this limit because

649.2

it's going to take a while because this

650.16

is a

651.36

a

652.16

very brute force approach

656.079

to bringing our data over and again it's

658.8

stepping on stuff because we're

659.839

truncating a table and then we're adding

662

this on

663.12

now what we could do is we could for

665.68

example

667.2

we could store

670.079

the last the maximum

672.8

last id from that table and then say hey

675.92

let's just grab everything that's

677.279

greater than

678.64

the id from the last time we ran it so

680.48

we're only gonna get new records

682.32

and there's i mean there's other things

683.519

we could do we could go in and we could

684.88

do like a record by record compare

687.44

but that would be

689.12

painfully slow for anything that's got

690.959

more than maybe dozens or hundreds of

693.519

records once you start getting to

694.72

thousands

696.32

particularly because we're reaching out

697.519

to different databases and doing this

699.04

one at a time

700.64

it could be

701.68

very time consuming

703.36

now a faster way to do this would be

706.16

to instead of like we're doing here

709.04

we're making all these individual calls

711.519

we could have one gargantuan statement

714.639

and do this every

716.48

um

717.36

like put together a hundred

719.6

uh inserts and then execute all 100 in

722

one shot and there's some things like

723.6

that we could do to speed this up

725.839

uh but right now we're gonna keep it

727.04

just sort of uh

728.399

sort of a brute force approach

730.639

uh we're getting there we're getting

731.92

through our horses at this point

735.279

i can't remember what else we have in

737.92

there

739.839

see if it's going this way they'll

741.12

probably look up type then owners let's

743.12

see it may just be going in

745.04

we'll see what order it's going through

746.48

the tables

748.639

but we're gonna go through we're gonna

749.76

do this

750.72

and we're gonna have when we're done so

752.8

we can look here like i think it's

754.16

probably already done all dates

758.16

oop

759.92

let's do that oh

761.44

it's deciding it wants to hang for a

763.12

second

770.16

oh probably because it's doing all this

771.68

okay

773.76

all dates data

776.959

yeah because it's hammering this thing

778.48

right now

779.76

yeah so i've brought my dates across uh

782.079

that's not super impressive but if you

783.92

looked at like breeders

787.519

uh

788.48

opening tab

790.32

yeah now you're gonna see so here's all

791.92

of them and we can see down here if you

794.16

look i've got my 10 000 records because

796.639

that's my limit

798.399

and so now it's pulling all this data in

801.44

if i want to do a small database then i

803.2

can do this and keep this down to you

804.959

know 100 or

806.48

a thousand or something like that so i

807.92

can pull

809.36

like effectively a test across now

812.48

there are going to be some issues that

813.839

we're going to run into

815.36

and we will talk about these as we get

817.12

further on oh here we go now it's

818.56

speeding up

819.68

um

820.639

one of them is actually the primary one

822.639

that we have to worry about are foreign

823.76

key relationships

826

when we try to truncate data

828.399

it's not going to allow us to do that

829.839

when there are foreign key relationships

831.279

around we've got to actually empty

833.12

tables out in a certain certain order or

835.76

go through deletes or

838

disable foreign keys and then build them

840

back after we've

842.079

populated the data

843.839

we also have to worry about when we do

845.6

partials like this where we're only

847.68

doing you know 10 000 rows

850.24

we have to worry about the situation

851.76

where we pull partial data in one table

854.959

and partial data in another but they

856.72

don't have

858.32

like one table it's referring to ids

860.48

that don't exist

861.92

because they're not part of the partial

863.839

that was pulled in another

865.92

so

866.8

while this may seem pretty simple and

868.399

straightforward to this point as far as

870.88

building tables columns and data

874.32

we're really not there yet also with

876.32

that

877.68

we may have to worry about what if we

879.36

have a change in columns

882

from source to destination and so we're

884.48

going to tackle some of those next time

885.92

around uh but for right now at least

887.92

we've got something that just gives us

889.44

the more in the basic sense it does

892

do our synchronization of

894.079

data columns and uh tables

896.72

so that being said is a good stopping

898.32

point we'll come back around next time

900.24

but as always go out there and have

901.6

yourself a great day a great week and we

904.079

will talk to you

905.92

next time

923.199

you