Detailed Notes
Database testing is fraught with challenges. We have to find new ways to create a database replica, match the data in size and meaning as well as validate changes. These obstacles add up to this sort of testing requires automation and careful consideration. In this session, we look at the pitfalls and ways to properly test and validate your database.
Database Testing And Automation We cover this topic through looking at the issues, looking at the basic tools, and then what automation is needed. The discussion stays surface level and does not go deep into any specific tools. That way you can find this presentation useful no matter which database or environment you find your self facing.
The Mentor-Mastermind Group This series comes from our mentoring/mastermind classes. These classes are virtual meetings that focus on how to improve our technical skills and build our businesses. The goals of each member vary. However, this diversity makes for great discussions and a ton of educational value every time we meet. We hope you enjoy viewing this series as much as we enjoy creating it. As always, this may not be all new to you, but we hope it helps you be a better developer. Drop us a line to find out when the next one is so you can join our group.
Transcript Text
[Music] which is when you talk about triggers and constraints and keys is that what you have triggers are basically going to be if i update delete or insert a record then here's some extra things that need to be done which is usually um it may be cascading kind of functionality so if i delete this record i need to delete its related records it may be a essentially a time stamp kind of thing that says okay if i insert this record i need to somewhere create and you know either set a value or insert a record that says that this record was created at this time or this record was updated at this time or even sometimes it's this record at this time was updated with these values there's also going to be constraints there's going to be things where you can't there's certain records that you're not going to be able to delete unless you first delete a parent record or records that are pointing to it and sometimes this fall sometimes this is by design for the data integrity sometimes they're going to be these constraints and even some of the triggers are going to be essentially their security and permission related kinds of things so there may be fields in a record that are not supposed to be updated they're only updated by maybe some triggers and constraints as opposed to directly updating and this becomes again where you're gonna have to look at the requirements and see how it's laid out because it may be that there are restrictions to how data is inserted or updated it may be that there is there should be no direct way which is not uncommon where there will be cases where there's no direct way other than within the database to [Music] to change certain data and sometimes even within the database unless you have a specific level of access then you know where basically you have to you know just like turn off triggers or constraints you're not going to be able to change the data so security can be application level that you you would see and deal with as you normally do but sometimes you're going to run into such things in done at the database level as well so it's it's not only in the save and load going back to that idea a little bit of making sure that things that uh you want to change change there's also there's things that shouldn't change and making sure that they don't sometimes it's as simple as a create versus an update timestamp you and there are you'll see bugs where every time you update it updates the created timestamp which is incorrect you actually need to you want to have a timestamp of when that record was first created and then separately you have something that shows when it was last changed when it was last updated and so you on an update you don't want to see that creation date change and this stuff again can be very uh complicated and you may have to have very specific tests that you're creating to validate those kinds of those restrictions and when thinking about triggers constraints and keys um i've sort of cut touched these but i went from a bullet point of view it may help a little bit is it one uh up here in the top right you really need to deal with updates inserts and deletes those are really the the three main actions that will prob that may have related triggers and constraints and so you have to consider each of those three actions almost with every record or table and how that how you want to test that and verify it there may be side effect requirements which are your triggers your constraints related keys and things of that nature so it an update may not be as simple as i just checked those couple of values i updated there may be other things i have to check to make sure that those side effects that happen when data is changed are properly being done you always have to do null testing essentially empty you know invalid values invalid relationships missing data partial data and so and that can be you know sometimes it can be very uh a lot very detail-oriented very just tedious because there's just a lot of data if you've got a field that's got uh you know records got 20 different columns on it then you may have to do effectively you know at least 20 different tests uh i should be more way more than that you know for each field testing what if i send it what if i set something invalid what if i set something empty what if i do the value you know the happy path and with all of these that we've already talked about there needs to be the proper uh messages or notifications that are involved with this because sometimes these things are written and it's just if it breaks it breaks and then your application's dead but typically what should happen is there should be some sort of a a response that comes back so that you can see that hey this is you know this is invalid data that that was attempted to work with you need to with the the testing one of the challenges sometimes there's going to be unique values i think a lot of in current world a lot of times it may be a phone number but it may also be like an email address that based on requirements these things need to be unique within your data within the sets and you just it's a challenge to generate all those unique things now the nice thing is that there are uh key generators and unique id generators and things like that there's some random tools out there uh actually even we have an app that we talked about way back that is our own little random data generator built for databases and to put random stuff in and we need to be able to do that sometimes to get a valid amount of test data but then when we do so we also need to make sure that is that we're tracking the ids that were generated because we're going to talk about cleaning up later and we also need to make sure that we did not have a we sort of like where we miss coverage in our testing by using too much of the same thing so um you know sometimes they're like for example testing a a user we use the name like john smith or john doe a whole lot but you also want to test other things is maybe throw some special characters like that that you would see like a hyphen um a comma things like that within a name maybe you also want to do like longer shorter names uh what happens if it's a first name last name versus first middle last and things like that that you do see these in other testing but it's much more prevalent in the database world because we're dealing with the data itself and the formats of it and all of this i've related a couple times i've mentioned this couple times is the whole idea of cascading updates and changes so there are potentially other records they're going to change based on the update that you made uh it may be that there's uh like an address record somewhere that needs to be changed there may be something that says that this record changed so you may have to go to some other table in the database and make sure that the change that you made was properly recorded again this goes back to the requirements and the challenge on all of this is that it is not uncommon for the requirements to be sparse related to how some of this stuff works that it's it's the kind of stuff that is not uncommon for there to be things that are missed and it's not until you start testing it and you say hey i had this happen what is that supposed to work is that not supposed to work how is that supposed to work and so sometimes you're going to have to go back and you're basically validating requirements as part of going through the database testing and i've mentioned these but now specifically audit logs and tracking when we're doing updates like i said often there's going to be some sort of trigger constraints and things that make sure that records exist that they don't exist when they shouldn't exist and along with that we're going to have some time stamps and maybe other logging that says this data was created at this day maybe by this person so not just time stamps but some sort of a user id needs to be part of that um and in part of your testing you need to make sure that the the user ids that are tracked works for the valid users being able to do that this may even include permissions inside the data itself so you may have a permission table somewhere that you need to make sure that this person can update this if they have this you know the correct permission and if they don't have that permission that they can't update or create that record and again it's going to go back to requirements and making sure that those are laid out properly and that you can test and validate those properly there's usually going to be differences between inserting or creating records and updating records modifying records and make sure that the side effects related to that the audit logs uh whether it's a a text log file somewhere or it may be a database table somewhere it may be fields within that record or it may be a whole other table that does the that provides that audit log there's things like shadow tables and that where it's essentially as a record is modified in one table there's a record entered somewhere else in this shadow table that says this record was modified at this time here's how it's modified so you need to make sure that those are properly being logged those those entries are being created properly it may be a database change it may be a file it may be both so there may be some sort of database level logging that you want to verify and this is sort of you one i want to verify that it is displaying the things it needs to but also in a security in a world of hipaa and all these other requirements that are out there you want to make sure that the values are not in a log file that shouldn't be in a long time so for example you don't want to be able to look at a log file and see valid credit card numbers that that should not exist you don't really and if it does if there for some reason that's needed for some sort of logging then that needs to be properly secured all that kind of good stuff so you want to make sure that and score goes back a little bit back to that correct encryption concept values that need to be that should be encrypted you need to make sure that they were properly encrypted in the database so if i save a password and it's supposed to be i'm the only one that ever knows what that password is i should not be able to go look in a database table somewhere and see what that database or what that password was you need to verify this goes back to really your relationships within the data is make sure that when the audit logs that you're verifying those that it's also the the user ids or names are properly being tracked again it's not uncommon that you'll have sort of a essentially a hard-coded value somewhere where everything is uh and it may not even be hard-coded it may be how the application works that the application logs in as you know like the super user for the database and all the transactions are then based on that super user as opposed to the actual user you know that like if it's me and i if i make a change i need to make sure that my name my id is tied to that change as to posted just being the super user and that's again not an uncommon occurrence where the application connects as a certain user and then you see all the audit trails just showed that user making changes that application user as opposed to an actual logged in user that's that's not just the application but it's a person that you can tie that back to and again it goes to the idea of what i put into the database i need to be able to get that back out properly so if i if i know that rob is updating this data and i need to have audit trails based on that i should be able to get back out of the database that rob made this change at this time and again this goes back to your your requirements as well making sure you understand them and that they properly detail what needs to be done within that cleaning up your mess this is often one of the biggest challenges one is having test data that works or that matches needs sometimes test data is just a handful of records when which you really need to test is at scale so you may need to have maybe even millions of records available for your testing to make sure that you can test the performance make sure that things work even in the what's going to be the at least close enough to the production environment and it's not uncommon in the database world that it works awesome when you've got 10 records in there it does not work at all when you put a hundred thousand records or a million records in there so you need to have properly sized test data as far as amount but then we have the cleanup aspect of if i go in and especially whether it's i guess it really doesn't matter whether i manually do it or script it if i run tests and they change data or create data or delete data then there may be there's going to be audit log entries related to that there may be test data that's now sitting around my database so part of our cleanup is we need to have ideally and sometimes it's not possible for the most part we want to be able to roll back those changes so i can go in i can create i have sort of a test script but then i have a cleanup script that goes back and for everything that i just did in my testing it's going to clean that stuff back up either mark it inactive or delete it if that's possible ideally i want to be able to remove the audit trail of that as well because if somebody's doing audit trail research they don't want to have all these test entries that they're having to dig through so sometimes i have that that side effect of data changes that i need to take into account as part of my cleanup there may be in some cases there may be auto-generated values keys and things like that that are generated that i now want to reset those counters for example if i am starting from essentially a clean database i may have this integer related id that starts at one and if i go and do a bunch of testing i don't want it now to start at you know 4058 because that's how many test records i created so i may have to actually go back and reset some of those counters and id generators that's not as common it shouldn't be an issue but sometimes it is uh particularly with regard to smaller data sets that are essentially lookup values like if you want to have if you have like account types let's say it's a financial system then you have these different account types and there's only 10 account types well you may find it much better to have those account type numbers at basically one through ten and if you're doing a bunch of testing related to that you don't want it simply to blow it out so that the numbers are non-sequential and are in the thousands or higher you may not be able to fully clean up so instead you may need some way to tag your test data sometimes this is easy to do sometimes it's a matter of maybe there's a note or a description field or some sort of free text enough that you can put you know the word test in it or something like that or test record do not touch test account do not use sometimes it's certain id values so you can go back and maybe make sure that stuff is tagged in a certain way or configured in a certain way it may be that how you do it you just need to communicate that when i test these are things that are tags or markers so you can tell that this was test data and not something that is i don't say real data the idea of a test image or a staging image or even data reloads is becoming i was going to say it's got gotten a little better but sometimes now with cloud databases and things like that not as much because when you're dealing with a cloud database sometimes there's there's a cost to those transactions and it's expensive to run all those tests and then run a bunch more transactions to clean it up one way to get around some of that is basically drop a database and rebuild it sometimes that's really the approach you want to take is that you want to before your test start do a database backup you go in you run all your testing and then when you're done and your cleanup is simply that you re restore the database from that backup point that you took sometimes that's the easiest way to get around it is just let's skip all the trouble and i'm just gonna back it up and when i'm gonna i'm done i'm gonna restore the database in production that may not be feasible but particularly with the idea of if you have a test database that may be something that you look at maybe it exists if not maybe you recommend it that there is some sort of a test image in some cases specifically with continuous integration and development deployment there may be a database image that is just reloaded on a on a daily or weekly basis so when i do a build maybe one of the things i do is i just go restore a pristine version of the database make some database you know if there's changes that have to be done to that database i make the changes and then i boom take a snapshot and the next time i want to get a i'll call it clean database and quotes then i just reload that image that may or may not be feasible but a lot of times that's going to be the the best and most efficient way to do it because then you don't have to deal with what did i change you just go back to that essentially that save point you
Transcript Segments
[Music]
which is when you talk about triggers
and constraints and keys is that
what you have triggers are basically
going to be
if i update delete or insert a record
then here's some extra things that need
to be done which is usually um
it may be cascading kind of
functionality so if i delete this record
i need to delete its related records
it may be a essentially a time stamp
kind of thing that says okay if i
insert this record i need to somewhere
create and you know either set a value
or
insert a record that says that this
record was created at this time or this
record was updated at this time
or even sometimes it's
this record at this time was updated
with these values
there's also going to be constraints
there's going to be things where you
can't
there's certain records that you're not
going to be able to delete unless you
first delete
a parent record or
records that are pointing to it
and sometimes this fall sometimes this
is by
design for the data integrity sometimes
they're going to be
these constraints and even some of the
triggers are going to be
essentially their security and
permission related kinds of things
so
there may be fields in a record
that are not supposed to be updated
they're only updated
by
maybe some triggers and constraints as
opposed to directly updating
and this becomes again where you're
gonna have to look at the requirements
and see
how
it's laid out
because it may be that there are
restrictions to how data is inserted or
updated it may be that there is there
should be no direct way
which is not uncommon where there will
be cases where there's no direct way
other than within the database to
[Music]
to change certain data and sometimes
even within the database unless you have
a specific level of access
then you know where basically you have
to you know just like turn off triggers
or constraints
you're not going to be able to change
the data
so security can be
application level that you you would see
and deal with as you normally do
but sometimes you're going to run into
such things in done at the database
level as well
so it's it's not only in the save and
load going back to that idea a little
bit
of
making sure that
things that uh you want to change change
there's also there's things that
shouldn't change and making sure that
they don't sometimes it's as simple as
a create versus an update timestamp
you
and there are you'll see bugs where
every time you update it updates the
created timestamp which is incorrect you
actually need to
you want to have a timestamp of when
that record was first created and then
separately you have something that shows
when it was last changed when it was
last updated
and so you on an update you don't want
to see that creation date
change and this stuff again can be very
uh complicated
and you may have to have very specific
tests that you're creating to validate
those kinds of
those restrictions
and when thinking about triggers
constraints and keys
um
i've sort of cut touched these but i
went from a bullet point of view it may
help a little bit
is it one uh up here in the top right
you really need to deal with updates
inserts and deletes those are really the
the three main
actions that will prob that may have
related triggers and constraints
and so you have to consider each of
those three actions
almost with every
record or table
and how that how you want to test that
and verify it
there may be side effect requirements
which are your triggers your constraints
related keys and things of that nature
so it
an update may not be as simple as i just
checked those couple of values i updated
there may be other things i have to
check to make sure that those side
effects that happen when data is changed
are properly being done
you always have to do
null testing essentially empty you know
invalid values
invalid relationships missing data
partial data and
so and that can be you know sometimes it
can be very
uh a lot very detail-oriented very just
tedious because there's just a lot of
data if you've got a field that's got
uh you know records got 20 different
columns on it then you may have to do
effectively you know at least 20
different tests
uh i should be more way more than that
you know for each field testing what if
i send it what if i set something
invalid what if i set something empty
what if i do the value you know the
happy path
and with all of these that we've already
talked about
there needs to be the proper
uh messages or notifications that are
involved with this because sometimes
these things are written and it's just
if it breaks it breaks and then your
application's dead
but typically what should happen is
there should be some sort of a
a response that comes back so that you
can
see that hey this is you know this is
invalid data that that was attempted to
work with
you need to with the
the testing one of the challenges
sometimes there's going to be unique
values
i think a lot of in current world a lot
of times it may be a phone number but it
may also be like an email address that
based on requirements these things need
to be unique within
your data within the sets
and you just it's a challenge to
generate all those unique things
now the nice thing is that there are
uh key generators and unique id
generators and things like that there's
some random tools out there
uh actually even we have an app that we
talked about way back that is our own
little
random data generator built for
databases and to put random stuff in
and
we need to be able to do that sometimes
to get a valid amount of test data but
then when we do so we also need to make
sure that is that we're tracking the ids
that were generated because we're going
to talk about cleaning up later
and we also need to make sure
that we did not have
a
we sort of like where we miss coverage
in our testing by using too much of the
same thing
so
um you know sometimes they're like for
example testing a a user
we use the name like john smith or john
doe a whole lot
but you also want to test
other things is maybe throw some special
characters like that that you would see
like a hyphen
um
a comma things like that within a name
maybe
you also want to do like longer shorter
names uh what happens if it's a first
name last name versus first middle last
and things like that
that you do see these in other testing
but
it's much more prevalent in the database
world because we're dealing with the
data itself and the formats of it
and all of this i've related a couple
times i've
mentioned this couple times is the whole
idea of cascading updates and changes so
there are potentially other records
they're going to change based on the
update that you made
uh it may be that there's uh like an
address record somewhere that needs to
be changed there may be something that
says that this record changed so you may
have to go to some other table in the
database and make sure that the change
that you made was properly recorded
again this goes back to the requirements
and
the challenge on all of this
is that it is not uncommon for the
requirements to be
sparse related to how some of this stuff
works
that it's it's the kind of stuff that is
not uncommon for there to be things that
are missed and it's not until you start
testing it and you say hey
i had this happen what is that supposed
to work is that not supposed to work
how is that supposed to work
and so sometimes you're going to have to
go back and you're basically validating
requirements as part of going through
the database testing
and i've mentioned these but now
specifically
audit logs and tracking
when we're doing updates like i said
often there's going to be some sort of
trigger constraints and things that make
sure that
records exist that they don't exist when
they shouldn't exist and along with that
we're going to have some time stamps and
maybe other logging that says
this data was created at this day maybe
by this person so not just time stamps
but some sort of a user id needs to be
part of that
um and in part of your testing you need
to make sure that the
the user ids that are tracked works for
the valid users being able to do that
this may even include
permissions inside the data itself so
you may have a permission table
somewhere that you need to make sure
that this person
can
update this if they have this you know
the correct permission and if they don't
have that permission that they can't
update or create that record
and again it's going to go back to
requirements and making sure that those
are laid out properly and that you can
test and validate those properly
there's usually going to be differences
between
inserting or creating records and
updating records modifying records
and make sure that
the side effects related to that the
audit logs uh whether it's a a text log
file somewhere or
it may be a database table somewhere it
may be fields within that record or it
may be a whole other table
that does the that provides that audit
log
there's things like shadow tables and
that where it's essentially as a record
is
modified in one table there's a record
entered somewhere else in this shadow
table that says this record was modified
at this time here's how it's modified
so you need to make sure that those are
properly being logged those
those entries are being created properly
it may be a database change it may be a
file it may be both so there may be some
sort of database level logging that you
want to verify
and this is sort of you one i want to
verify that it
is
displaying the things it needs to but
also in a security in a
world of hipaa and all these other
requirements that are out there you want
to make sure that the values are not in
a log file that shouldn't be in a long
time
so for example you don't want to be able
to look at a log file and see valid
credit card numbers that that should not
exist
you don't really and if it does
if there for some reason that's needed
for some sort of logging then that needs
to be
properly secured
all that kind of good stuff so you want
to make sure that and score goes back a
little bit back to that correct
encryption concept
values that need to be that should be
encrypted you need to make sure that
they were
properly encrypted in the database so if
i save a password and it's supposed to
be i'm the only one that ever knows what
that password is i should not be able to
go look in a database table somewhere
and see what that database or what that
password was
you need to verify
this goes back to really your
relationships
within the data is make sure that when
the audit logs
that you're verifying those that it's
also the
the user ids or names are properly being
tracked again it's not uncommon that
you'll have sort of a essentially a
hard-coded value somewhere
where everything is
uh and it may not even be hard-coded it
may be how the application works
that the application
logs in as
you know like the super user for the
database
and all the transactions are then
based on that super user as opposed to
the actual user you know that like if
it's me and i if i make a change
i need to make sure that my
name my id is tied to that change as to
posted just being the super user
and that's again not an uncommon
occurrence where the application
connects as a certain user and then you
see all the audit trails just showed
that user making changes that
application user as opposed to
an actual
logged in user that's that's not just
the application but it's a person that
you can tie that back to
and again it goes to the idea of
what i put into the database
i need to be able to get that back out
properly so if i
if i know that rob is
updating this data
and i need to have audit trails based on
that i should be able to get back out of
the database that rob made this change
at this time
and again
this goes back to your your requirements
as well
making sure you understand them and that
they properly
detail what needs to be done
within that
cleaning up your mess
this is
often one of the biggest challenges
one is having test data that works or
that
matches needs sometimes test data is
just a handful of records when which you
really need to test is
at scale so you may need to have maybe
even millions of records
available for your testing to make sure
that you can test the performance make
sure that
things work
even in the what's going to be the at
least close enough to the production
environment
and it's not uncommon in the database
world that it works awesome when you've
got 10 records in there it does not work
at all when you put a hundred thousand
records or a million records in there
so you need to have
properly sized test data as far as
amount
but then we have the cleanup
aspect of
if i go in and especially whether it's i
guess it really doesn't matter whether i
manually do it or script it
if i run tests and they change data or
create data or delete data
then
there may be there's going to be audit
log entries related to that
there may be
test data that's now sitting around my
database
so part of our cleanup is we need to
have
ideally and sometimes it's not possible
for the most part we want to be able to
roll back those changes so i can go in i
can create i have sort of a
test script but then i have a cleanup
script that goes back and for everything
that i just did in my testing
it's going to clean that stuff back up
either mark it inactive or delete it if
that's possible
ideally i want to be able to remove the
audit trail of that as well
because
if somebody's doing audit trail research
they don't want to have all these test
entries that they're having to dig
through
so sometimes i have that that side
effect of data changes that i need to
take into account as part of my cleanup
there may be
in some cases there may be
auto-generated values keys and things
like that that are generated
that i now want to reset those counters
for example if i am starting from
essentially a clean database
i may have this integer related id that
starts at one
and if i go and do a bunch of testing i
don't want it now to start at you know
4058 because that's how many test
records i created so i may have to
actually go back and reset
some of those
counters and
id generators
that's not as common
it shouldn't be an issue but sometimes
it is
uh particularly
with regard to
smaller data sets that are essentially
lookup values like if you want to have
if you have like account types let's say
it's a financial system then you have
these different account types
and there's only 10 account types well
you may
find it much better to have those
account type numbers at basically one
through ten
and if you're doing a bunch of testing
related to that you don't want it simply
to blow it out so that the numbers are
non-sequential and are
in the thousands or higher
you may not be able to fully clean up so
instead you may need some way to tag
your test data
sometimes this is
easy to do sometimes it's a matter of
maybe there's a note or a description
field or some sort of free text enough
that you can put you know the word test
in it or something like that or test
record do not touch test account do not
use
sometimes it's certain id values
so you can go back and
maybe make sure that stuff is tagged in
a certain way or configured in a certain
way
it may be that how you do it you just
need to communicate that when i test
these are things that are
tags or markers so you can tell that
this was test data and not something
that is
i don't say real data
the idea of
a test image or a staging image
or even data reloads
is
becoming
i was going to say it's got gotten a
little better but
sometimes now with cloud databases and
things like that not as much
because when you're dealing with a cloud
database sometimes there's there's a
cost to those transactions
and it's expensive to run all those
tests and then
run a bunch more transactions to clean
it up
one way to get around some of that
is basically drop a database and rebuild
it
sometimes
that's really the approach you want to
take is that you want to
before your test start do a database
backup
you go in you run all your testing and
then when you're done and your cleanup
is simply that you re restore
the database from that backup point that
you took
sometimes that's the easiest way to get
around it is just let's skip all the
trouble and i'm just gonna back it up
and when i'm gonna i'm done i'm gonna
restore the database
in production that may not be feasible
but particularly with the idea of if you
have a test database
that may be something that you
look at maybe it exists if not maybe you
recommend it that there is some sort of
a test image
in some cases specifically with
continuous integration and development
deployment
there may be a
database
image that is just reloaded on a on a
daily or weekly basis so when i do a
build maybe one of the things i do is i
just go restore
a pristine version of the database
make some database you know if there's
changes that have to be done to that
database i make the changes and then i
boom take a snapshot
and the next time i want to get a i'll
call it clean database and quotes
then i just reload that image
that may or may not be feasible but
a lot of times that's going to be the
the best
and most efficient way to do it because
then you don't have to deal with what
did i change you just go back to that
essentially that save point
you