Detailed Notes
Database testing is fraught with challenges. We have to find new ways to create a database replica, match the data in size and meaning as well as validate changes. These obstacles add up to this sort of testing requires automation and careful consideration. In this session, we look at the pitfalls and ways to properly test and validate your database.
Database Testing And Automation We cover this topic through looking at the issues, looking at the basic tools, and then what automation is needed. The discussion stays surface level and does not go deep into any specific tools. That way you can find this presentation useful no matter which database or environment you find your self facing.
The Mentor-Mastermind Group This series comes from our mentoring/mastermind classes. These classes are virtual meetings that focus on how to improve our technical skills and build our businesses. The goals of each member vary. However, this diversity makes for great discussions and a ton of educational value every time we meet. We hope you enjoy viewing this series as much as we enjoy creating it. As always, this may not be all new to you, but we hope it helps you be a better developer. Drop us a line to find out when the next one is so you can join our group.
Transcript Text
[Music] so this time i want to sit down and talk we've we had some requests and we've actually sort of talked about this on the fringes over the years i think it made sense makes sense to have a presentation specifically on talking about the the challenges and the uniqueness of dealing with testing a database and maybe some potential automation around that the best thing we've talked about prior to this for doing a lot of these this kind of work and you can see a lot of those discussions was when we talked about django within django python projects you have a really nice way to maintain your database through code we've talked a little bit about some of the scripts and some ways to essentially rebuild a database on the fly similar tools if you use if you're in the php world if you use laravel it has some of that there are there's definitely a couple different java libraries that you can do some things like that you see sharp related stuff so if you want to do a little more maintenance of a database within code for specific language you can always google it and look for the related tools that's not something specific we're going to cover this time we're going to look more at uh general purpose considerations and things like that for for database testing specifically we're going to talk about first some of the data based testing challenges the things that are a little bit unique to dealing with the database and then get into some of those details one of the main things in a database that you want to test is making sure that the structure and data integrity are maintained uh key is being able to save and load your data correctly and consistently and reliably some of the more i will call advanced topics i guess in the world database of triggers constraints and keys and then sort of wrap it up talking about how do you put all this stuff together so that you can have some sort of a regression test or some way to repeat your tests with the database so first let's look at some of the challenges that are really unique to database testing the first one is that it's not just a functionality thing so typically when you're testing code you have a you have some inputs and you have an output and that's all you have to worry about databases are almost by design by nature in a lot of cases it's a side effect that you're actually looking for because you're going to have some inputs they're going to those inputs are going to either change or store or remove data you know somehow monkey with the data in a database and then you'll get some sort of output well this thing the output may be 100 correct but then you need to still verify that the changes that were supposed to be done in the database were done so that's another you're almost testing side effects which makes it you know a little a little less direct in how you do things often in a database probably one of the biggest challenges is that our testing leaves a trail whether that's through audit logs or whether that's through data that we create and then update we have to somehow deal with that data we don't want to have our test data sitting in a production database for example and so part of that is cleaning up we have to have some sort of a a mechanism to go in to test and to essentially remove our work so our test doesn't leave a mark the on the data a clean test environment can be very difficult you need in some cases you need data that is not that does not have like patient information or personal identifying information or any sort of like financial ideas and stuff like that and they have to be in some cases it's the challenges it has to be a certain format but it can't be valid for example a credit card number now you there are uh test credit card numbers that are out there that you can use for such things but still there's sort of a limit to you know how often you can use some of those and without repeating your data and then again you know we get into you don't you don't want to mess up the production data you don't want to um have stuff out there that people may think is is production day they may think it's real even though it's test and then you have to be able to in some cases it's how do i build this from scratch which is also sort of the reproducing steps it's like hey if i want to put this database on a new machine is that even possible in some cases it's not in some case the database is this living thing that nobody can actually build it from scratch anymore it's just a bunch of changes that have gone on and other than you know dump the database and import it somewhere else and sometimes even that's really challenging you don't have a way to have a test environment versus a production environment data integrity so how do we how does our data uh interrelate with data within the database so let's talk about data integrity what is it now you'll hear about it when you're dealing with the database but what does that mean well first it means that the data is a type that makes sense for example if you're dealing with financial transactions those numbers need to be you know probably two decimal float type or you know two or uh two decimal precision like a decimal a float something along those lines if you're if it's an integer you're going to lose information if it's a string then it may be very complicated to do the kind of work you need to if you're building a general ledger for an accounting system and all your numbers are actually stored as strings or maybe even something more complicated than that then you're going to have all kinds of issues and making equating values searching for values things of that nature we also need to have with data integrity so it's the type but it's also that when we have relationship data between tables that those are consistent and this often is going to be in foreign key relationships so for example let's say you have a person record in the database you have a table for people and then people can have multiple addresses well the way that you link that is the person record will have somewhere a list of address ids that it can go get specific address information based on those address ids you know it could be one for your home address you know 83 is your work address id and then some other number knows you know something else is your vacation home id and so when you're searching for data for that person you're going to go over to the address table and you're going to go grab for those ids you're going to grab the address well part of data integrity is that that id is pointing to a valid and the correct address so you can't just go in if you go in and just change the address id for that user from one to two that may be a completely wrong and different address it could even be invalid so one you have to make sure it actually exists you don't want it to just go off you know be a a non-existent id so if i have a id of one for my address i'd better have a record in the address table that has one as the id and then two it needs to be uh correct it needs to be the right one and you'll see this in uh in the results of testing of data sometimes when there's like a hard-coded value or something like that then you'll see that the ids are not what they should be it's not generating new ids it's not pointing to the right id uh it may not even be storing the right values so maybe it's storing the id to the phone table in the address column or vice versa things like that along with this we need to maintain the integrity of relationships which is uh we'll talk a little bit but it's the idea of cascading and things like that so if i delete an address record then i need to make sure that i delete the pointer to that address record in you know the person table in my prior example if i delete that person i probably need to delete those three address records that were tied to that person otherwise i have orphan records i have these things that nothing refers to and it becomes just junk trash data that's sitting out in the database so we need to make sure that our data puzzle pieces to build uh whatever you know whether it's a picture of a person or it's a some sort of a contract or business transaction or whatever it is we need to make sure that all of the pieces of data that we need to use to create that construct fit together properly that they have the right ids that they have the right types and that you can recreate it so if you this is the saving and loading idea if i save my data for a person and all of the related stuff then i need to have all of those relationships and the data integrity put together so that i can also come back and query the database and get all of that data back it's one thing to save it it's another thing to be able to reliably pull it back out to load it from the database so it's a it's a key part of our our testing hey rob before you jump ahead i got a quick question on that sure so if you go back to the slide so with the data integrity would you also include uh data encryption with this for like hipaa and pci compliance um yeah actually that's an excellent question it's not something i specifically have called out in this presentation so that's a that's an excellent question good point is you need to be able to it's it it is similar to the what you save you need to be able to load back but there are a couple of different ways that things can be encrypted when you store it in the database uh sometimes it's encrypted at the database level sometimes you encrypt it going in and you decrypt it coming back out and you need to make sure you need to ensure that if you encrypt you know something and put it into the database then you need to be able to either decrypt it when you pull it back out or reliably encrypt the same string and be able to check it against that value and for that example i'm thinking about passwords because typically what happens is we will encrypt a password and it's a essentially it's a one-way encryption or some sort of hash where we can't find out what that password is there's no way we can go back and figure out what that password was originally but what we can do is we can take a string and encrypt it and say does that encrypted result match the one that's been stored if it does then that means it's you know it was the same password if not then it's not the same password so encryption is some things you have to deal with in general but definitely it's going to show up more often in a database and some of the testing there to make sure that you have the uh and this again goes back to some of the data integrity pieces to make sure that you are consistent in the encryption library and method that you use uh pretty much it usually needs to be across the database so you're not having to figure out well this gets encrypted this way that gets encrypted that way if you do that's just more testing headache because you just have to make sure that in each specific case that the encryption and the decryption are properly paired any other question is that sort of where you were thinking mike yeah i just want to make sure we touched on that because i think that is something that really needs to be considered with data integrity when testing yes i have agreed so great we're going to catch a miss there so continuing sort of in this saving and loading data theme so the first thing was we talked at the start about the idea of this side effect that we have so we have a function we send some data there's some sort of output but along the way it's maybe saving or loading data so the saving side is you have to actually go to the database when you make this call if it saves data look at it what was saved when was it safe you need to worry about was it created if it needed to be created was it updated if it needed to be updated was it deleted if it needed to be deleted and then within that particularly when you think about i guess there's two things is if you edit you want to make sure that you can edit every single one of the fields and save the change and see that in the database if you are creating then you need to be able to test and ideally you're going to go through and test each of the fields within that record as well to make sure that you're handling null cases properly or invalid value cases properly and of course obviously the happy path where you have proper values on the whatever you save you should be able to load that back in somewhere and verify that all of those values are properly loaded and this now in some cases you're going to see things like audit trail data there's going to be some additional data that you never really it needs to it needs to appear in the database and that's actually the only place it's going to appear is when you're looking in the database there's no programming interface or anything like that that uses that but that's even essentially side effects to doing an update or a creation or even a deletion maybe of data is to make sure that the other things that need to be in place whether it's an updated time stamp whether it's some sort of a deleted records audit file somewhere or audit log somewhere uh making sure that that's that those things are in place that those things are properly triggered which we're gonna talk a little bit more here we talk about triggers also want to make sure with the saving and loading that a partial um works so that if we do a record where we're just putting like maybe some blank values in that it handles that properly based on the requirements and if we load that back in that that's also handled according to the requirements and by according the requirements sometimes empty values are not valid or should not be allowed in some cases they can and that's going to vary by the requirements you also need to a really key potential for a problem which may or may not exist in the the production server i guess is doing initial loads and essentially an empty database which may not be completely empty but it's sort of like hey i've got this new customer maintenance database and it starts with zero customers so you need to make sure that you can work with that that you can add that first customer that when you add that data when you save that first record then it also saves all of the other pieces that it needs to in order to be able to pull that record back because in some cases you'll see going back to the earlier example where you can save a customer record but maybe you don't at that point save addresses and so when you later try to pull that customer you have an issue because it can't pull an address and maybe there's a you know it's a requirement that you have to have an address so again um some things to look at that are just sort of like those common kind of issues you run into with dealing with databases and testing the data sorry rob i have one more question sure if you go back so for the save and load data would you also consider adding a bullet point here for testing permissions to make sure that different account types can do the correct permissions like they can update or not update particular tables yes you definitely want to take security into account um this is a that's usually more of an application level thing although sometimes that this is where database test testing can be a challenge is because sometimes there's a lot of logic in the database and sometimes there's basically none and depending on how that goes you may you may need to dig into data being updated only when it can and that really goes to really it's a nice little dovetail into the next slide you
Transcript Segments
[Music]
so
this time i want to sit down and talk
we've we had some requests and we've
actually
sort of talked about this on the fringes
over the years
i think it made sense makes sense to
have a presentation specifically on
talking about the the challenges and the
uniqueness of dealing with testing a
database and maybe some potential
automation around that
the best thing we've talked about prior
to this for doing a lot of these this
kind of work and you can see a lot of
those discussions was when we talked
about django
within django python projects you have a
really nice
way to maintain
your database through code
we've talked a little bit about some of
the scripts and some ways to essentially
rebuild a database on the fly
similar tools if you use if you're in
the php world if you use laravel it has
some of that
there are
there's definitely a couple different
java libraries that you can do some
things like that you see sharp related
stuff
so if you
want to
do a little more maintenance of a
database within code for specific
language you can always google it and
look for the related tools that's not
something specific we're going to cover
this time we're going to look more at
uh general purpose
considerations and things like that for
for database testing
specifically we're going to talk about
first some of the data
based testing challenges the things that
are a little bit unique to dealing with
the database
and then get into some of those details
one of the main things in a database
that you want to test is making sure
that the structure and data integrity
are maintained
uh key is being able to save and load
your data correctly and consistently and
reliably
some of the more i will call advanced
topics i guess in the world database of
triggers constraints and keys
and then sort of wrap it up talking
about how do you put all this stuff
together so that you can
have some sort of a regression test or
some way to repeat your tests
with the database
so first let's look at some of the
challenges that are really unique to
database testing
the first one is that it's not just a
functionality thing so typically when
you're testing code you have a
you have some inputs and you have an
output
and that's all you have to worry about
databases are almost by
design by nature in a lot of cases it's
a side effect that you're actually
looking for because you're going to
have some inputs they're going to those
inputs are going to either change or
store or remove data you know somehow
monkey with the data in a database and
then you'll get some sort of output
well
this thing the output may be 100 correct
but then you need to still verify that
the changes that were supposed to be
done in the database were done
so that's another
you're almost testing side effects which
makes it you know a little
a little less direct in how you do
things
often in a database probably one of the
biggest challenges is that our testing
leaves a trail whether that's through
audit logs or whether that's through
data that we create and then update
we
have to somehow deal with that data we
don't want to have our test data sitting
in a production database for example
and so part of that is cleaning up we
have to have some sort of a
a mechanism to go in to test and to
essentially remove our work so our test
doesn't leave a
mark the on the data
a clean test environment can be very
difficult
you need
in some cases you need data that is not
that does not have like
patient information or personal
identifying information or
any sort of like financial ideas and
stuff like that
and they have to be
in some cases it's the challenges it has
to be a certain format but it can't be
valid for example a credit card number
now you there are
uh test credit card numbers that are out
there that you can use for such things
but
still there's sort of a limit to you
know how often you can use some of those
and without repeating your data
and then again you know we get into you
don't you don't want to mess up the
production
data you don't want to
um
have stuff out there that people may
think is is production day they may
think it's real even though it's test
and
then you have to be able to in some
cases it's how do i build this from
scratch
which is also sort of the reproducing
steps it's like hey
if i want to
put this database on a new machine
is that even possible in some cases it's
not in some case the database is this
living thing
that nobody can actually build it from
scratch anymore it's just a bunch of
changes that have gone on
and other than you know dump the
database and
import it somewhere else
and sometimes even that's really
challenging
you don't have a way to have a test
environment versus
a production environment
data integrity so how do we how does our
data uh interrelate with data within the
database
so let's talk about data integrity
what is it now you'll hear about it when
you're dealing with the database
but what does that mean well first it
means that the data is a type that makes
sense for example
if you're dealing with financial
transactions those numbers need to be
you know probably two decimal float type
or you know two or
uh two decimal precision
like a decimal a float something along
those lines if you're if it's an integer
you're going to lose information
if it's a string
then it may be very complicated to do
the kind of work you need to if you're
building a
general ledger for an accounting system
and all your
numbers are actually stored as strings
or maybe even something more complicated
than that
then you're going to have all kinds of
issues and
making equating
values searching for values things of
that nature
we also need to have
with data integrity so it's the type but
it's also
that
when we have relationship data
between tables that those
are consistent
and this often is going to be in foreign
key relationships so for example
let's say you have a person
record in the database you have a table
for people
and then people can have multiple
addresses
well the way that you link that is the
person record will have somewhere a list
of address ids
that it can go get specific address
information based on those address ids
you know it could be
one for your home address you know 83 is
your
work address id and then some other
number knows you know something else is
your
vacation home id
and so when you're searching for data
for that person
you're going to go over to the address
table and you're going to go grab for
those ids you're going to grab the
address
well part of data integrity is that that
id is pointing to a valid and the
correct address
so you can't just go in if you go in and
just change the address id for that user
from one to two
that may be a completely wrong and
different address it could even be
invalid
so one you have to make sure it actually
exists you don't want it to just go off
you know be a
a non-existent id so if i have a id of
one for my address i'd better have a
record in the address table that has
one as the id
and then two it needs to be
uh correct it needs to be the right one
and you'll see this in
uh in the results of testing of data
sometimes when there's
like a hard-coded value or something
like that then you'll see that the ids
are not what they should be it's not
generating new ids it's not pointing to
the right id
uh it may not even be storing the right
values so maybe it's storing the id to
the phone table in the address column or
vice versa things like that
along with this we need to maintain
the integrity of relationships which is
uh we'll talk a little bit but it's the
idea of cascading and things like that
so if i
delete an address record
then i need to make sure that i delete
the pointer to that address record in
you know the person table in my prior
example if i delete that person
i probably
need to delete those three address
records that were tied to that person
otherwise i have orphan records i have
these things that nothing refers to and
it becomes
just junk trash data that's sitting out
in the database
so we need to make sure that our
data puzzle pieces to build uh
whatever you know whether it's a picture
of a person or it's a
some sort of a contract
or
business transaction or whatever it is
we need to make sure that all of the
pieces of data that we need to use to
create that construct
fit together properly that they have the
right ids that they have the right types
and
that
you can recreate it so if you this is
the saving and loading idea if i save my
data
for
a person and all of the related stuff
then i need to have all of those
relationships and the data integrity put
together so that i can also come back
and
query the database and get all of that
data back
it's one thing to save it
it's another thing to be able to
reliably pull it back out to load it
from the database
so it's a it's a key part of our our
testing
hey rob before you jump ahead i got a
quick question on that sure
so if you go back to the slide
so with the data integrity would you
also include uh data encryption with
this for like hipaa and pci compliance
um yeah actually that's an excellent
question it's not something i
specifically
have called out in this presentation so
that's a that's an excellent question
good point
is
you need to be able to it's it it is
similar to the what you save you need to
be able to load back
but there are
a couple of different ways that things
can be encrypted when you store it in
the database uh sometimes it's encrypted
at the database level sometimes you
encrypt it going in and you decrypt it
coming back out and you need to make
sure you need to ensure that if you
encrypt you know something and put it
into the database
then you need to be able to either
decrypt it when you pull it back out or
reliably encrypt the same string
and be able to check it against that
value and for that example i'm thinking
about passwords because typically what
happens is we will encrypt a password
and it's a essentially it's a one-way
encryption
or some sort of hash where
we can't
find out what that password is there's
no way we can go back and figure out
what that password was originally but
what we can do is we can take a string
and encrypt it and say does that
encrypted
result match the one that's been stored
if it does then that means it's you know
it was the same password
if not then it's not the same password
so encryption is
some things you have to deal with in
general but definitely it's going to
show up more often in a database and
some of the testing there to make sure
that you
have the uh and this again goes back to
some of the data integrity pieces to
make sure that you are consistent in the
encryption library
and method that you use
uh pretty much it
usually needs to be across the database
so you're not having to figure out well
this gets encrypted this way that gets
encrypted that way
if you do
that's just more testing headache
because you just have to make sure that
in each specific case
that the encryption and the decryption
are
properly paired
any other question is that
sort of where you were thinking mike
yeah i just want to make sure we touched
on that because i think that is
something that really needs to be
considered with data integrity when
testing
yes i have agreed
so great we're going to catch a miss
there
so continuing sort of in this
saving and loading data theme
so the first thing
was we talked at the start about the
idea of this
side effect that we have so we have a
function we send some data there's some
sort of output but along the way it's
maybe saving or loading data
so
the saving side
is
you have to actually go to the database
when you make this call if it saves data
look at
it what was saved
when was it safe you need to worry about
was it
created if it needed to be created
was it updated if it needed to be
updated was it deleted if it needed to
be deleted
and then within that particularly when
you think about
i guess there's two things is if you
edit
you want to make sure that you can edit
every single one of the fields and save
the change and see that in the database
if you are creating then you need to be
able to test and ideally you're going to
go through and test
each of the fields within that record as
well to make sure that you're handling
null cases properly or invalid value
cases properly and of course obviously
the happy path where you have proper
values
on the
whatever you save you should be able to
load that back in somewhere and verify
that all of those values
are properly loaded
and this
now in some cases you're going to see
things like audit trail data there's
going to be some additional data that
you never really
it needs to it needs to appear in the
database and that's actually the only
place it's going to appear is when
you're looking in the database there's
no
programming interface or anything like
that that uses that
but that's even essentially side effects
to
doing an update or a creation or even a
deletion maybe of data is to make sure
that the other things that need to be in
place whether it's an updated time stamp
whether it's some sort of a
deleted records audit file somewhere or
audit log somewhere
uh making sure that that's that those
things are in place that those things
are properly triggered which we're gonna
talk a little bit more here we talk
about triggers
also want to make sure with the saving
and loading that a partial
um works so that if we do
a record where we're just putting like
maybe some blank values in that
it handles that properly based on the
requirements and if we load that back in
that that's also
handled according to the requirements
and by according the requirements
sometimes
empty values are not
valid or should not be allowed in some
cases they can and that's going to vary
by the requirements
you also need to
a really key
potential for a problem
which may or may not
exist in the the production server i
guess
is doing initial loads and
essentially an empty database which may
not be completely empty but it's sort of
like hey
i've got this new customer
maintenance database and it starts with
zero customers
so you need to make sure that you can
work with that that you can add that
first customer
that
when you add that data when you save
that first record then it also saves all
of the other pieces that it needs to
in order to be able to pull that record
back
because in some cases you'll see
going back to the earlier example
where you can save a customer record but
maybe you don't at that point save
addresses and so when you later try to
pull that customer you have an issue
because it can't pull an address and
maybe there's a you know it's a
requirement that you have to have an
address
so again
um some things to look at that are just
sort of like those common kind of issues
you run into with dealing with databases
and testing the data
sorry rob i have one more question
sure
if you go back
so for the save and load data would you
also consider adding a bullet point here
for testing permissions to make sure
that different account types can
do the correct permissions like they can
update or not update particular tables
yes you definitely want to take security
into account
um
this is a
that's usually more of an application
level thing
although sometimes that this is where
database test testing can be a challenge
is because sometimes there's a lot of
logic in the database and sometimes
there's
basically none
and
depending on how that goes
you may
you may need to dig into
data being updated
only when it can and that really goes to
really it's a nice little dovetail into
the next slide
you