📺 Develpreneur YouTube Episode

Video + transcript

Data Scraping - Part 1

2022-11-29 • YouTube

Detailed Notes

This is part one of a short series on data scraping. We cover the topic at a high level and describe the different forms of the solutions. That includes calling an API, downloading reports, and even scraping data a field at a time while crawling a web page. There are tips and tricks included to help you create a high-quality scraping solution and to properly set expectations.

This comes from the Develpreneur.com mentor series of presentations.

Transcript Text
This presentation really came from a lot of recent requests and job postings, or more accurately consulting project postings, that I've seen where it seems like data is something people are now aware of. They realize there are ways to get it free out on the Internet, or sometimes you have to have some sort of membership, or you pay to log into the site. But rather than rely on whatever a site or provider offers, sometimes it's easier to just scrape rather than use their interface. That's really where this presentation comes from: looking at the different options you have if you're in a situation where you want to, or need to, scrape data. You may have an idea of scraping in your mind that is not as simple as the solution you actually have available, and that's the angle I figured we would take.

We'll start with an overview, then talk about some data sources: APIs, actually scraping off of web pages, and RSS feeds, which are very similar to APIs. Then we'll wrap up with some tips, tricks, and trip-ups, the things you need to think about if you're going to jump into a scraping project.
In general, a lot of times people think of scraping as the old-school practice of finding a way to get data off of mainframes and into some sort of modern database. That was called screen scraping: you would have something that would log in, go into the display, move X characters over and Y characters down, pull that value, and then suck it into a database. Scraping is essentially the equivalent of working from a screenshot: you're pulling data off the screen, parsing it, and pushing it somewhere. A lot of times it also includes interaction, so you're entering data and clicking buttons. You'll often see this with web crawling; spidering a website tends to go hand in hand with the idea of scraping.

The goal of a scraping project is really just making use of data. Usually it's publicly available data, sometimes it's something behind a paywall, but either way it's somebody else's data, and you're sourcing it in a way that you can use. Think of yourself as a clearinghouse of data: you can go in, format it, do manipulations, run calculations, group it, you name it. You're taking all this data and essentially repurposing it.
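As a toy illustration of that old-school positional style, here is what "X characters over, Y rows down" looks like in code. The captured screen text and field positions are made up for the example:

```python
# A toy illustration of positional screen scraping: given a captured
# terminal screen, pull a field by "X characters over, Y rows down".

SCREEN = [
    "ACCT INQUIRY                      PAGE 1",
    "NAME: JONES, ALICE      BAL:  1,204.50 ",
    "ADDR: 12 ELM ST         DUE:    115.00 ",
]

def field(screen, row, col, width):
    """Return the width-character field at (row, col) on the screen."""
    return screen[row][col:col + width].strip()

name = field(SCREEN, 1, 6, 18)      # row 1, 6 characters over, 18 wide
balance = field(SCREEN, 1, 29, 10)
```

The fragility is obvious: if the host screen layout shifts by even one column, every field breaks, which is exactly why this approach earned its reputation.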
Sometimes you'll find that you have multiple sources. You may be validating data, or maybe the data is regional, so you get slightly different data from these various sources and you need some way to merge them. A lot of times you're going to have different formats. Your target is your database or data warehouse, but the format, and even the consistency, of the data you pull in can vary widely. In some cases you may get an entire address with city, state, and zip; in others you get maybe just a name, or a street, or just a zip code.

A good example of multiple formats is job sites, particularly if you've ever surfed three or four of them. If you look at monster.com versus prg.jobs.com or indeed.com or ladders.com, all of these sites effectively have job data, but it's going to be different. Some will have different levels of description, different categories, all kinds of things like that. Some will tell you who the company is; some won't. All of those are multiple formats.
Like I said, your goal is to repurpose this data for whatever your solution happens to be. Take the job example: maybe you want a site that lets people see all of the, let's say, medical and health-care related jobs that are out there. You could go to all of these different sites and, depending on how they're set up, scrape their data. Instead of people having to go to all those sites and run searches across each of them, you are basically doing the search for them, taking the results, and then repurposing and reformatting them in a way that's consistent, so your users can more easily do apples-to-apples comparisons.

A lot of it is the ability to share data that otherwise would be very difficult to share, or to just provide easier access. Particularly when you're talking about a lot of different data sources, it can be a pain to go out to ten different places, and there's definite value in being able to come to one place and see all that data. Think of RSS feed readers: that's exactly what those apps do. Instead of you having to visit each site and read its articles, the reader hits all of the RSS feeds, pulls the articles in, and lets you read them in one easy place. So that's a summary of what scraping looks like.
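The feed-reader idea can be sketched in a few lines with Python's standard library. The sample XML below stands in for a feed fetched over HTTP, so nothing here reflects a real site's feed:

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document standing in for a feed fetched over HTTP
# (e.g. with urllib.request.urlopen(feed_url).read()).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Data Scraping - Part 1</title>
      <link>https://example.com/scraping-part-1</link>
      <pubDate>Tue, 29 Nov 2022 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Data Scraping - Part 2</title>
      <link>https://example.com/scraping-part-2</link>
      <pubDate>Tue, 06 Dec 2022 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def read_feed(xml_text):
    """Return (title, link, pubDate) tuples for every item in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]

articles = read_feed(SAMPLE_FEED)
```

A reader app is essentially this loop run across many feed URLs, with the results stored and displayed in one place.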
When you talk about ways to pull data, and this goes back to having many data sources, probably the easiest is via APIs. If you have data sources that provide an application programming interface, an API, then you can get a lot of data that's going to be structured and clean, and you can usually do a lot of filtering by sending parameters into the query to limit things to what matters to you.

Now, an API may be public, so it may just be a matter of hitting an address and having it kick some information back. Or you may need some sort of access key or authentication. That is definitely not uncommon these days; a lot of cloud-based software does have APIs, but it also requires you to have at least some sort of access key or developer key. That's really so people don't just crush their systems with requests.

With an API, once you've got that key, it really comes down to the parameters: what is the data you want to use? Most of these are going to be documented, sometimes with very solid documentation. [Pulls up an example of API documentation.] You may see stuff like this, pretty solid documentation about the API: what your call is and what the parameters are. Maybe I don't want to bring everything in; maybe I only want data for the last month, or only data since the last time I made a call. And then you're going to see nicely formatted results, with column names, data types, and things like that, which make an API fairly clean and easy to use. At that point it's essentially just simple data mapping: you know what you're getting from the API, and you decide where you want to store it.
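A minimal sketch of that pattern, building a filtered call and mapping the response onto your own columns, might look like the following. The endpoint, key, parameter names, and response fields are all hypothetical stand-ins, not any real vendor's API:

```python
import json
import urllib.parse

API_BASE = "https://api.example.com/v1/jobs"   # hypothetical endpoint
API_KEY = "my-developer-key"                   # hypothetical access key

def build_request_url(since=None, limit=100):
    """Build a filtered call: only rows changed since `since`."""
    params = {"api_key": API_KEY, "limit": limit}
    if since:
        params["updated_since"] = since        # e.g. an ISO-8601 timestamp
    return API_BASE + "?" + urllib.parse.urlencode(params)

def map_response(body):
    """Map the documented response fields onto our own column names."""
    return [
        {"title": row["title"], "company": row.get("company"),
         "posted": row["date_posted"]}
        for row in json.loads(body)["results"]
    ]

# In real use the body would come from urllib.request.urlopen(url).read();
# a canned response stands in for it here.
sample_body = json.dumps({"results": [
    {"title": "Nurse", "company": "Acme Health", "date_posted": "2022-11-28"},
]})
url = build_request_url(since="2022-11-01T00:00:00Z")
rows = map_response(sample_body)
```

The `updated_since` parameter is the "only data since my last call" idea from above: persist the timestamp of each successful pull and pass it on the next one.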
But with an API, because of how they're set up, you're going to want to think about your frequency. How often do you need to make a call? Do you need to update essentially in real time, like every five minutes? Once an hour? Once a day? Once a week? That kind of thing is going to vary quite a bit. Think again of the job-site example: you may want to pull every hour so there are fresh jobs showing up to bring people back. But maybe your customers are typically only looking for jobs on the weekend, so maybe you only pull once or twice a week. Maybe every Friday, or Thursday night, you go out and get all the latest postings, and that covers you for the next week.
With most APIs, connecting is usually pretty easy, so the work really comes down to mapping the results: deciding which of the data the API provides you actually need, and then mapping that into places to store it in your database. Now, sometimes there's what we'll call child data, or related data. You may have to make a call, pull all your data in, and then find some sort of ID that you have to go back and use, making additional calls with that list of IDs to get more information.

For example, let's say you've got companies that you're pulling information on, and when you call the companies endpoint each one has a primary contact, but that's basically all it has: maybe a name and an ID. Elsewhere within the API there's a way to get full contact information, like an address, email, and phone. In that case you may have to do a multi-pass: first get all of the companies and store that information wherever you need to, then go back with all of those contact IDs, make a call or calls, and match the contact IDs up, depending on how the API works, to make sure you store your data and the relationships within it.
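The two-pass fetch described above can be sketched like this. The `fetch_companies` and `fetch_contacts` functions stand in for real API calls, and the record shapes are assumptions for the example, not a real schema:

```python
# A sketch of the two-pass "companies then contacts" fetch.

def fetch_companies():
    # Stand-in for the first API call: each company only carries a
    # bare primary-contact ID.
    return [
        {"id": 1, "name": "Acme", "primary_contact_id": 101},
        {"id": 2, "name": "Globex", "primary_contact_id": 102},
    ]

def fetch_contacts(contact_ids):
    # Stand-in for the second API call: full contact details by ID.
    directory = {
        101: {"id": 101, "email": "pat@acme.example", "phone": "555-0101"},
        102: {"id": 102, "email": "lee@globex.example", "phone": "555-0102"},
    }
    return [directory[cid] for cid in contact_ids]

def load_companies_with_contacts():
    # Pass 1: pull the companies and remember the contact IDs we still need.
    companies = fetch_companies()
    wanted = [c["primary_contact_id"] for c in companies]
    # Pass 2: one batched call for the contacts, then join by ID.
    contacts_by_id = {c["id"]: c for c in fetch_contacts(wanted)}
    for company in companies:
        company["contact"] = contacts_by_id[company["primary_contact_id"]]
    return companies

companies = load_companies_with_contacts()
```

Batching the second pass into one call, where the API allows it, keeps you from making a request per company and burning through a rate limit.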
That means sometimes you're going to have to do lookups, or some sort of ID caching. For example, say you're bringing companies in and you've got a site that displays company information. When that data updates at the source, then on your subsequent call, instead of creating yet another customer record, you want to be able to know what its ID is so you can go update the correct data in your system. And particularly if you're pulling from multiple systems, multiple data sources, you're going to want some way to know what the data source was and what the ID was in that system, and then separately you'll have your own ID within your system. You have to be able to map between those in some way. You may do that by pulling a bunch of data and caching a list of IDs, then making calls based on that or using it for on-the-fly mappings, or you may do it by storing the data source and external ID on each record, things of that nature. Either way, the key is that APIs typically give us the cleanest way to get our data.
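One way to sketch the "store the data source and external ID" approach is an upsert keyed on that pair, so repeat pulls update records instead of duplicating them. The in-memory dict below stands in for a database table with a unique index on (source, external_id):

```python
# Upsert by (source, external_id) so repeat pulls update, not duplicate.

records = {}          # (source, external_id) -> our stored record
next_internal_id = 1  # our own ID sequence, separate from any source's IDs

def upsert(source, external_id, data):
    """Update the existing record for this source ID, or create one."""
    global next_internal_id
    key = (source, external_id)
    if key in records:
        records[key].update(data)          # existing: update in place
    else:
        data = dict(data, internal_id=next_internal_id,
                    source=source, external_id=external_id)
        next_internal_id += 1
        records[key] = data                # new: assign our own ID
    return records[key]

first = upsert("crm_a", "C-42", {"name": "Acme"})
again = upsert("crm_a", "C-42", {"name": "Acme Corp"})  # same company, updated
other = upsert("crm_b", "C-42", {"name": "Bcme"})       # other source, new row
```

Note that the same external ID from two different sources correctly lands in two separate records; only the (source, ID) pair is unique.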
Now, scraping is going to be a little more challenging, because this is where we're going out to a website and pulling data. For example, maybe I want to go to the Develpreneur site and look at the blog, and I want a scraper that goes through this site and pulls all the content: basically all the titles, maybe the date each was posted, maybe the category. There's a lot you can get. In this case you could actually get it from an RSS feed, but there may be sites where no RSS feed exists, so you end up having to scrape. That means you're going to be looking at the page source and finding a way to navigate down into it to say, okay, I want this. [Demonstrates in the browser.] To get this title, which is probably hard to see, I'm going to have to go in, find this little link, and get the text out of this anchor.

So it can be challenging, but you can get the data exactly how you need it, usually exactly how a database has it. Scraping is usually going to involve some sort of a home page, or maybe even a login that you're going to have to automate. A lot of times that's done through something like Selenium, which gives you an easy way to record and then generate code to do these things, but you can also use whatever your language of choice is, whether it's Java or Python or Ruby. As long as it can do a POST, you can post out to the site, bring the information back, and then start walking through the resulting data.

With that, you're probably going to need some sort of keys or IDs or paths, as mentioned before. [Demonstrates in the browser's developer tools.] From here I can copy a selector path or an XPath, and if I take that XPath, it looks something like this. It's not the easiest thing to read, but it's a way to get to where I need to go based on something like an ID. Now, this specific example is not going to be very useful, because IDs are going to change. You're going to have to have a way to walk through everything in the structure, whatever that is. Here it's a post loop, so everything that's a list post, you're going to have to get all of that.

So it can be time consuming, but it also doesn't require the customer to provide you much. You're going to go in, figure out what your IDs are, what your path is, and what data you're going to pull. A lot of times you'll have to repeat that, because there are going to be lists of data on the site you're scraping, and you're probably going to have to do some link navigating. In this case, if I wanted the full data for a post, I would have to click on its link and pull the details from that page.
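The "get the text out of this anchor" step can be sketched with Python's standard-library HTML parser. The markup below is a made-up stand-in for a real blog index page (fetched, say, with urllib), and the `entry-title` class is an assumption about how such a page might mark its post headlines:

```python
from html.parser import HTMLParser

# Made-up stand-in for a fetched blog-index page source.
SAMPLE_PAGE = """
<div class="post-loop">
  <h2 class="entry-title"><a href="/blog/scraping-part-1">Data Scraping - Part 1</a></h2>
  <h2 class="entry-title"><a href="/blog/side-hustle">Building a Side Hustle</a></h2>
</div>
"""

class PostTitleScraper(HTMLParser):
    """Collect (title, href) from anchors inside h2.entry-title headings."""
    def __init__(self):
        super().__init__()
        self.posts = []
        self._in_title = False
        self._href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and "entry-title" in attrs.get("class", ""):
            self._in_title = True          # entering a post headline
        elif tag == "a" and self._in_title:
            self._href = attrs.get("href") # the post's link

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self._href:
            self.posts.append((data.strip(), self._href))
            self._href = None

scraper = PostTitleScraper()
scraper.feed(SAMPLE_PAGE)
```

Following each collected href and parsing the detail page is the "link navigating" step; walking by a structural class name rather than a generated ID is what keeps the scraper from breaking on every page load.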
Thank you.