📺 Develpreneur YouTube Episode

Video + transcript

Data Scraping - Part 3

2022-12-06 • YouTube

Detailed Notes

This is the wrap-up and Q&A of a short series on data scraping. We cover the topic at a high level and describe the different forms solutions can take, including calling an API, downloading reports, and even scraping data a field at a time while crawling a web page. Tips and tricks are included to help you create a high-quality scraping solution and properly set expectations.

This comes from the Develpreneur.com mentor series of presentations.

Transcript Text
thank you
[Music]
One of the things to consider, particularly now because there is just so much data that we deal with, is options for caching some of the data, even locally on the machine that is doing the scraping. If you're doing lookups on a regular basis, you may want to cache those so you minimize the calls. Maybe you need to regularly refresh that lookup, but you're still storing it locally, whether in memory, in a local data store, or out to a file. That eliminates your need to go hit that website, that API, or that RSS feed yet again.
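That local-cache-with-refresh idea can be sketched in a few lines. This is an illustrative sketch, not a library recommendation: you supply the `fetch` function that actually hits the site or API, and the TTL handles the "regularly refresh that lookup" part.

```python
import time

class TTLCache:
    """Minimal local lookup cache with periodic refresh (illustrative sketch)."""

    def __init__(self, fetch, ttl_seconds=3600):
        self.fetch = fetch          # function that actually hits the API/feed
        self.ttl = ttl_seconds
        self._store = {}            # key -> (value, fetched_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, fetched_at = entry
            if time.time() - fetched_at < self.ttl:
                return value        # fresh enough; skip the remote call
        value = self.fetch(key)     # stale or missing: refresh from the source
        self._store[key] = (value, time.time())
        return value
```

Repeated lookups for the same key within the TTL never leave the machine, which is exactly the call-minimizing behavior described above.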
And you may run into situations, with API calls in particular, where it is common enough that there is a limit to the number of calls you can make: sometimes per month, sometimes per day, sometimes per hour. You may have to watch for this particularly in your initial data-loading processes; you may have to run for days to get all the data you want for your initial setup. So you may have to look at ways to do counts or time delays or something like that. For example, say they tell you that you can only make a thousand calls per hour. Then you're going to need something in there with a little timer and a count of calls. Maybe you can make those calls pretty quickly, and when you hit your limit, you give it an hour and then kick it off again. Or you know you hit that limit over a 30-minute period, so 35 minutes from now you can start pulling again. Or you build in a time delay, little sleep timers or something like that, that guarantees you never make more than, say, 999 calls in an hour. You may have to do some delays or other little tricks to avoid hitting limits. Sometimes it's just a matter of doing the call and then having the system go to sleep for five minutes before the next call, so you don't crush their system, and also so you don't get flagged as a super-high-traffic source.
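The "little timer plus a count of calls" idea can be sketched as a small throttle. This is an illustrative sketch, not tied to any particular API; the 999-per-hour figure is carried over from the example above.

```python
import time
from collections import deque

class RollingRateLimiter:
    """Block before a call whenever max_calls have been made in the rolling window."""

    def __init__(self, max_calls=999, window_seconds=3600):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()        # timestamps of recent calls

    def wait_for_slot(self):
        while True:
            now = time.monotonic()
            # Drop timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Limit reached: sleep until the oldest call ages out of the window.
            time.sleep(self.calls[0] + self.window - now)
```

Calling `wait_for_slot()` immediately before each API request gives the "I've hit my limit, so I'll wait and kick it off again" behavior automatically, rather than failing and restarting by hand.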
Another thing you can do is look at timestamps, update dates, and things like that, so you can store when you made your last call. Then the next time you make the call, provide a parameter, assuming they allow for that functionality, that says, "I only want data that has changed since the last time I called." Maybe the first time you ever do it is an expensive call, but from then on it only takes deltas, and that's something a little more reasonable to work with.
So, what have we hopefully learned in this? One is that scraping can come in many forms. When you see or receive a request to scrape data, realize that it may not be actual web scraping; you may be able to do it through an API or RSS feeds. Another thing I didn't really get into specifically, because it's sort of a combination, really a web-scrape-ish kind of thing: sometimes you can just do automation that logs into a site, goes out to some report link, generates that report, and saves the file. Then you have that file, whether it's XML, JSON, or CSV, and instead you're parsing, importing, and ingesting that file, as opposed to actually going out to the website and scraping. Sometimes that's a far more effective, easier, and less fragile approach to take.
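That report-download path usually ends in a plain file parse. A minimal sketch using Python's standard `csv` module, where the column names (`title`, `location`) are made up for illustration:

```python
import csv
import io

def ingest_report(csv_text):
    """Parse a downloaded CSV report instead of scraping the site page by page."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only the fields our system cares about; real mappings will differ.
    return [{"title": row["title"], "location": row["location"]} for row in reader]
```

Because the file format is a published contract rather than page markup, this kind of ingestion tends to break far less often than element-by-element scraping.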
Also, something I hope you've seen, because I've brought it up so many times: the bulk of the work, outside of the parse, is really just going to be mapping data. Getting the parsing done the first time, particularly getting it all wired up, can be a real pain, but after that it tends to be, "okay, I can get the data; it's just a question of where I need to put it." So it becomes mapping, and making sure that the relationships are properly maintained.
Formatting your data is going to be a critical piece; otherwise it's going to be really hard to generalize it. For example, if you want to compare things that happen within a certain date range, you've got to have date formats that match so you can do an apples-to-apples comparison of one date to another.
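A small formatter for that date problem might try each known source format and emit one canonical form. The format list here is an assumed example; you would extend it with whatever your actual sources produce.

```python
from datetime import datetime

# Formats we might see across different scraped sources (assumed examples).
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

def normalize_date(text):
    """Convert a source-specific date string to ISO 8601 so all dates
    compare apples to apples."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")
```

Once everything is in one canonical format, range comparisons become simple string or `date` comparisons rather than per-source special cases.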
Because of that, think of the two key ends here: the parsers that pull data in, and then the formatters that convert that data to your format. Those are going to be invaluable tools to have, and they are critical to building something that gets the right data and spits it out in a way that's usable.
And as I mentioned, whenever you're doing anything like this, start simple. Go with a single record or something like that, some of the simplest cases; make sure you can do that and validate your process. Then repeat it for more records, or for some of the more complicated record formats, to try to make sure you cover everything. That way you at least have a baseline that works.
Questions and comments?
So it's kind of funny, Rob, one of the things you're talking about, because I have to deal with screen scrapers a lot. What exactly are you using Selenium for? Let's start with that one.
So, for Selenium, I have used it a lot for scraping, where I'll go in, fire up Selenium, record activity, and then turn around and work from that. Selenium has a bunch of different... let me see, do I even have it fired up here real quick? I've got it installed on this machine... shoot, I don't think I have it.

So are you using it for the recording capabilities, to get the IDs that you need to get where you're going, and then you just translate that into Python?

A lot of times, yeah. I use Python a lot, but Selenium will create it for you; Selenium will generate the script in Python or Java or PHP, whatever it is; there are a bunch of different languages. Basically, you record your activity, then tell it to generate the code to do that, and it will generate that PHP script or Python script that does exactly what you just did.
And that at least gives me a starting point, because sometimes it's not going to generalize. It'll use, say, XPath, and usually that's going to be pretty good, but if it's using a CSS selector, then that may not work for all of the records that I want to handle. This, again, is typically because it generates the script for a single case. So, for example, with that job site thing, I would record myself going to monster.com, running a search, and clicking on a job, and then I want to be able to parse that job data and suck it into my system. Then I would take that and want to generalize it. Looking at what I've been given in the script, it sort of kick-starts the process; now I've got some of the basics built in, and then I can go in and tweak the IDs and some of the navigational stuff. It definitely speeds up the whole scraping process.
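The "generalize it" step often comes down to replacing the single element a recorded script captured with something that matches every record on the page. A minimal stdlib sketch of that idea, outside Selenium itself, where the `job-title` class name and `<h2>` tag are made up for illustration:

```python
from html.parser import HTMLParser

class JobTitleParser(HTMLParser):
    """Pull every job title out of a results page, generalizing the one
    element a recorder would capture into 'all elements with this class'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Match every title element, not just the one that was clicked.
        if tag == "h2" and ("class", "job-title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())
            self._in_title = False
```

In a Selenium-generated script the equivalent move is swapping a recorded single-element lookup for a find-all on a shared selector, then iterating over the results.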
So you're using it for the record-and-playback feature and the generation of the code. All right, yeah, that makes sense; I was just curious. I was trying to figure out why you were using that versus just using Python, but that makes sense. So you basically build the script and then you modify the script, which is what we do for testing.

Yeah, I basically get Selenium to write the Python code for me, for the most part, and then I go back and make the adjustments as needed.
Anyway... right, so here's kind of an odd observation, a nugget to add to what you're talking about. Have you considered, or could you do, some of this with streams and batching processes, where you write something that would essentially trigger a script? Like, for your RSS feed, you could pull in the RSS feed through your stream, do that kind of scraping and formatting, and then just inject it into your system without having to go the other path.

What's the other path?

Oh, where you have to actually write something yourself to go hit the RSS feed, whereas if you use the stream process you just point it at the feed and say, okay, go consume this every day, or every hour.
Yeah, you can. It's really a different problem that you're solving in that case, because you can scrape and do it for a single source, but typically you're not just displaying that on the fly. Usually you're storing that information and doing something with the data. It's not just that you're redisplaying it or reformatting it; what you're actually doing is taking it, usually from multiple sources, and converting it into some general format that you're then going to kick back out to others. So for the RSS case, it would not be a single feed; you're actually going out to multiple feeds, and then maybe you've got some additional data around that, like, "hey, I've shown this to that person before," or some other additional data. That's a different problem set, or context. The idea of "I've got a stream of data that I want to display in a more user-friendly fashion, or do some basic processing on, maybe some lookups to give it more context, but then basically display that stream in a different context" is different from scraping and pulling the data in so that now it's my data. When I stream it, it's really still not my data; I'm just tweaking the data as it gets displayed. With scraping, it's "this is my data," or "I'm getting all these sources and I'm going to make it my data," so I'm going to do some sort of value-add before I kick it back out to my users.
Okay, all right, that kind of makes sense. But you could do a consumer that essentially does that with the stream as well; that's what we're doing where I'm at. We're taking in a lot of feeds, converting them, and putting them into our system, into our data. That's why I was kind of curious.

Yeah, so in your case, in a general sense, it would be a scraping thing, but that's also because you're storing it. You're not just redirecting it out; you're taking all of those disparate sources and converting them into your data warehouse or whatever it is, and then you're able to combine those data pieces and send them back out to your users or your systems in a consistent format.
Okay, that makes sense. It was just kind of an interesting correlation I'd never thought of that way before, until you presented this right after I talked about data streams.

Yeah, and I didn't go into streams specifically, because those are slightly different; there's a little more complexity in dealing with a stream than just an RSS feed. But at the end of the day, there are definitely a lot of similarities between them.
Let's see, other questions or comments? Yeah, a question: what is the use of scripting? Does it make the website faster, or are you just identifying abnormalities on the website?

For scripting on the scraping side of it? It's really to automate things, so that you're not having to go out and click and do stuff yourself. At the end of the day, it's allowing you to programmatically go out and surf a website, then grab pieces of data and do stuff with them.
Okay, so what do you use that data for as a developer? Scraping is not scripting, is it?

I mean, you can use the data for just about anything. For example, in healthcare, there are companies that will go out and scrape data in some way, form, or fashion; a lot of times they've got feeds or streams that they're looking at. They're going to take that data, analyze it, and kick it back out and say, "hey, here are some current risks showing up in the healthcare work that's going on," or "here's how we're seeing insurance claims being processed," things like that. There are a lot of different ways you may utilize that data. It's really you saying, "hey, I want to provide this service or this solution for my users, but it's more valuable if I have data outside of what I currently have, so I'm going to go out and scrape all of these other sources to pull that data in" and provide a bigger bang for the buck to your customers.
Okay, all right, got it. So it's kind of like collecting user data, isn't it?

Yeah, it's collecting data of whatever kind it may be. So, like the job site example: maybe you want to create a job site where people come visit and see what jobs are available. Well, instead of you going out and data-entering thousands of jobs and finding out where those jobs are, you can build a bunch of scrapers that go out to various sites where you know they publish available jobs, and then pull that information into your site.

Got it, thank you.
Other questions or comments?

All right, that brings us to the all-important thank-you. We appreciate you taking the time to listen through this and think about some of these things. It's a little different from what some people may have thought about scraping, so hopefully it's been useful to you. As always, if you have any questions or comments that come up after the presentation, as sometimes they do, email info@develpreneur.com; we also have a contact-us form out at develpreneur.com. You can follow us on Twitter at @develpreneur and DM us there, and we've also got a Facebook page, Develpreneur, out on Facebook. We appreciate your time as we work together to try to make every developer better. Have a good day.