Detailed Notes
Welcome. Today we are going to talk about one of the most talked-about but overlooked topics in software: technical support. What do we do when software has a problem? In today's discussion, we're going to look at production support. What does it mean to actually support our product?
In past discussions, we've talked about the software development life cycle, the software test life cycle, differences in how to write code, how to debug our code, and how to test it. However, we really haven't talked much about deployment and production support. This is where we're going to focus our discussion today. How do we support our product and our customers?
We're going to look at the different types of problems that can arise, how we can troubleshoot these problems, and different ways that we can fix them. Finally, we're going to look at the different ways we can prevent them from happening in the future.
Technical Support - Fire Fighting Overview:
Production Support
Types of Problems
Troubleshooting a Problem
Ways to Fix the Problem
Prevention
Root Cause Analysis (RCA)
Technical Support - Fire Fighting Issues
Finally, this series comes from our mentoring/mastermind classes. These classes are virtual meetings that focus on how to improve our technical skills and build our businesses. After all, the goals of each member vary. However, this diversity makes for great discussions and a ton of educational value every time we meet. We hope you enjoy viewing this series as much as we enjoy creating it. As always, this may not be all new to you, but we hope it helps you be a better developer.
Transcript Text
[Music]
Welcome. Today we are going to be talking about one of the most talked-about but overlooked topics in software, and that is technical support. What do we do when software has a problem? In today's discussion, we're going to look at different things like production support. What does it mean to actually support our product?
In past discussions, we've talked about the software development life cycle, the software test life cycle, differences in how to write code, how to debug our code, and how to test it. We really haven't talked too much about deployments and production, but that's what we're going to focus on today. How do we support our product and our customers? We're going to look at the different types of problems that can arise, how we can troubleshoot these problems, and different ways that we can fix these problems. Finally, we're going to look at the different ways we can prevent this from happening in the future.
Let's start by talking about production support. Production support is where we have a person or a group of people who answer the phones or our customer requests when customers are having problems, when things aren't going right. These could be simple things: users can't log in, the application crashes, different things that happen with the customer. The whole role of production support, or customer support, is to be that initial point of contact for the company, between the customer and our software. Their main goal is to gather and collect information from the customer: what is going wrong, what is your problem? Then they go through and prioritize the customer's complaints and support problems.
One of the things you have to think about from a developer's perspective is that we're behind the scenes. We're writing code, we're going through our release cycles, and we're putting software out to the customers. We're taking in requests and producing output. But within this release cycle, within this software development cycle, we also have to support customers. We have to fix potential bugs or issues that arise that were not expected during the initial software process.
One thing to think about from a support perspective is that they have to look at the application from the customer's perspective. They have to put themselves in the customer's shoes. As they're troubleshooting with the customer, they're dealing with potentially angry people who may have had to wait, depending on how big your company is, to actually talk to a representative. So when we as developers are dealing with customer support or production issues, we need to be mindful of what these people are dealing with. They're trying to support our application for us. They're the front-line supporters. Think of them as the 911 responders: they're the ones answering the phones, they're dealing with people in crisis, so it's a very stressful job.
From a business perspective, production support is great: they're helping our customers. From a development perspective, it can sometimes be a little rocky. We can butt heads a little bit, because what a user sees as a bug might not really be a bug. We'll talk about that in a minute, but the main point is that production support is our first line of defense to find and fix any problems that come from the customer. A customer is having a problem, our software's not working correctly, and the customer reports it. Then customer support, production support, takes that information and relays it to the company so it can be fixed.
What are some types of problems? We can have things like the wonderful Windows Blue Screen of Death, which basically means our application has had a fault or some critical failure where the application does not respond anymore. It is down. Problems like this are typically a critical priority. These have to be fixed immediately: the software doesn't work, the customer is down, and we have lots of issues. This is just one example of a problem that could arise.
Other types of problems, from a customer support perspective, come down to a set of questions. Is this a customer-facing problem, a bug or an issue that is impacting our customers right now? Is this an internal bug, where our application is something used within the organization itself, so our customers are not impacted but the business is? Is it a configuration change, a new implementation for a customer, or a new customer that just came on using our software? Is the software not configured? Is it potentially a code problem? Did we just put out a release, and now the next day, all of a sudden, our customers can't log in, or the login page is missing, or critical functionality is broken? Could it be a network issue? Are the users unable to connect to the internet? Are they unable to connect to our software? Is there a communication issue across the network? Is there some type of back-end problem? Finally, it could be a hardware issue. Is a hard drive failing? Is a machine failing? Just before I started this presentation, I had a microphone failure, so people couldn't hear audio. So you could potentially have hardware issues. And the crux of it is that sometimes you have a combination of all of these.
So customer support, production support, has to be mindful not only of the customer. They also have to be knowledgeable and understand the different types of problems that can arise, and how to triage and prioritize them, so that a customer is not down and we get the right players involved to fix the problem. Once they identify that we have a problem, and if it is a critical problem, we immediately get development or the managers involved to help support this.
We go into a troubleshooting mode. We have to determine the severity of this particular issue. Is the customer able to work? Is the customer down? But truly, the first thing you want to do is be mindful, take a breath, talk to the customer, and don't panic. When the customer calls you, a lot of times they're going to be unfriendly, they're going to be upset, or they're going to be panicky, because they can't log in, they can't do their work, something is wrong. So I took my favorite icon here from The Hitchhiker's Guide to the Galaxy, our little "Don't Panic" guy. Don't panic. Stop. Take a breath.
Then you're going to try to reproduce the problem. We're going to engage with the end user, collect the information, and try to walk through and figure out where the problem is, what's going on with the end user, and what's broken. Then we could potentially look at system logs, or see if there were system alerts. If this involves something like a back-end database or servers going down, there could be many different factors that you could look at to help troubleshoot the problem.
A lot of what comes out of troubleshooting is this. First, customer support looks at the priority. Is the customer unable to log in? That could be a low priority, or it could be a medium priority. Another case could be, "Oh, the system is down, we're getting a 404 error, page not found," in which case it's a very high or critical priority. When we start troubleshooting, we could determine that the login issue is actually low severity: the customer can't log in because they forgot their password, so they just need to reset their password. Problem fixed. In the case of the system being down, that's a high severity. We probably have to go restart the servers. If the servers don't restart, we then have to really go into triage mode and start fixing the problem.
So the customer has reported the problem, and we've now troubleshot it. We've dug into the problem a little bit to make sure that we do have an actual problem, that it is something we need to pass along and get more people involved in. We then pull everyone together. So we've identified a bug. Now what? Once you have all the players involved, you sit down and identify the risk of the problem. This helps you identify that priority and severity. You're going to look at things like: What is the threat? Is this a new incident with the potential to harm the system? Is this a bug? Is this a vulnerability, a known issue or weakness that can be exploited? Could this be a patching issue? You take those things and then evaluate the risk: the potential for damage when a threat is exploited or a vulnerability is found. These are just some of the things you have to look for as you go through the process of troubleshooting these applications and these problems.
Now the customer has reported the issue, we've identified the problem, and we have set the risk. Next we have to look at our options: what path do we want to go down to fix this problem? Are we going to need to do a hotfix, maintenance, or a bug fix? Let's look at the different options we have to address this.
First and foremost, if this is a major production issue and it's something that you could potentially fix in real time, then you do something like a hotfix, which is a fix on live, active software or an application. You actually go into the software, quickly make the change, and redeploy. Very minimal testing, very risky, but if the system's already down, what choice do you have? You go in, you fix the problem, and you quickly get it out there so that you have zero to minimal downtime for the end user. It also means that the fix is immediately live on all systems. Now, this is great: it addresses the problem, potentially gets the system back up, and fixes the customer's issue. Great, we move on. However, hotfixes are dangerous because you're hyper-focused on one particular area of the software. You're not looking at the big picture. You're only looking at the problem as reported by the end user. If you get into the habit of doing hotfixes without stepping back and looking at the big picture, you could be fixing that particular customer's issue while also introducing additional risk or vulnerabilities into the application that could impact other customers as a whole. So a hotfix is great for quickly getting the system back online, but it's not typically an approach you want to take unless the system is down or there is a major catastrophic failure that needs to be fixed immediately to keep all your customers happy.
Another approach is patching. We get a list of reported bugs or issues from the end users. These are not necessarily critical issues, meaning the customer can still do their job. There are just some problems with the software: it's a little buggy, it's quirky, it's not working entirely as expected. They can still do their job, but they have to find workarounds for what they're doing, so it's not an ideal process. In that case, we go through a software maintenance cycle, our software release cycle, and we create a patch. It's a temporary fix on a product. What we do here is create a small release cycle, fix the issues that we've identified, wrap the software up, and push it out as a small patch release. This is typically pushed out to the end users by some type of automatic update system, or, if it is a server fix, you're going to be updating the applications directly. If you're dealing with standalone applications or packaged software, you may potentially have to send a disk out to a customer. Now, in the age of the internet and software as a service, 99% of these patches are done through automatic updates. They're basically pushed out to the customer. In some cases, the customers do need to go to the site and actually download the patch that they need. Patches are not always pushed out to everyone. Typically, a patch is generated and pushed up, and then a customer can come download it as they need it. If a patch is ultimately deemed to fix the issue long term, then it will go into the full release package or release cycle, and then everyone gets the update. Typically these are planned between releases of the software.
So a patch is not a full release cycle. It's a small software fix to address the problem that goes through a small release cycle, whereas with the hotfix you just immediately fix it and deploy it. Next to no testing, next to no development cycle: you are fixing this in real time.
I'm going to skip over maintenance for a minute, and we'll talk about bug fixes. We've talked about hotfixes and patching. A hotfix, again, is real time: fix it right now, make the customer happy. With patching, we've got a couple of bugs or issues reported by the customer. We can bundle those together, put them through a small release cycle, test them, make sure everything's working, and then actually push it out after a couple of days or a week, depending on your release cycle. Now, when you hear "bug fixes," this is typically where we identify the bugs during the actual software development cycle. Bug fixes are issues that we find during the development cycle. This is where testing finds a problem, identifies it, documents it, and pushes it to development to be fixed. When it comes to bug fixing, there is zero impact to the users, because we are addressing the bugs in real time during the development cycle. Any bugs that are identified should be fixed within the release cycle. So when we go to release, after the testing phase, once the application goes out, there should be no potential bugs or risk to the end user.
Now, we say this, but there are always things that can potentially arise during the software development process, and that's why we have customer support, production support. At the end of the day, we build this software and we're writing code. We're trying to do things in a way that meets the requirements specified by the end users. The business gives us the requirements and we try to write to them. The problem is that sometimes what we built from those specifications or business requirements is used by the customer in an unexpected way, which can potentially cause problems and require additional code fixes. That's why we get into a maintenance cycle, that routine of constantly going back through and doing software updates, which leads us to the maintenance option.
Maintenance here is a fix on live, active software or an application. It has a visible impact via downtime or a system restart, and maintenance is typically when we do a planned outage. We communicate to our customers that we are going to be down for system maintenance or software maintenance. We have to bring the system down for a little bit, so our software will not be available. Banks do this regularly so they can restart their systems, apply patches to their back-end servers, or even put in major software updates, but they do it in a planned maintenance approach. Yes, they have release cycles, but bigger organizations, especially financial ones, can't necessarily do releases on the fly. They can do them somewhat for internal platforms, but if it is external-facing, especially with banks, you have to do it in some type of maintenance window so that you can have a planned outage and people know that they cannot access their financial information. It's critical that they know the reason is maintenance, that there isn't a critical failure, and that their money is not at risk.
So these are the four major options you have for addressing bugs, and how we can go through the software process and fix the problem. There are a couple of caveats here. The idea is to consistently follow the software development cycle. We want to make sure that we document the bug. We want to make sure we create some type of ticket outlining what happened. We want to make sure we go through some type of review process: look at the requirements documentation, look at this particular bug, and make sure things work. Then we want to get all the players together to discuss how to address the particular problem. Maybe include the customer, get their feedback, and figure out what's going on at the end user's side. Then we want to go through and fix the code, test, deploy, and retest. So really, you want to go through that full software cycle. You want to make sure that not only are you fixing the problem, but you're not introducing any new problems or threats. However, we can't necessarily do that all the time. Like I said, we can do that for a patch and we can do that for maintenance, but typically, if our system is down, if we have that Blue Screen of Death and a restart does not fix our application, we need to go back in and fix it now. So we go in and do a hotfix.
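To make the four options from this discussion a little more concrete, here is a minimal Python sketch of how a team might encode the choice between a bug fix, a patch, maintenance, and a hotfix. Everything in it is an assumption for illustration: the names `Severity`, `FixPath`, `Issue`, and `choose_fix_path` are hypothetical and do not come from the talk, and the order of the rules is just one reasonable reading of what was described.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = 1        # e.g. a forgotten password; a reset fixes it
    MEDIUM = 2     # software is quirky, but the customer has a workaround
    CRITICAL = 3   # system is down and a restart did not help


class FixPath(Enum):
    BUG_FIX = "bug fix"          # found during development, before release
    PATCH = "patch"              # small, tested release for non-critical issues
    MAINTENANCE = "maintenance"  # planned outage, announced to customers
    HOTFIX = "hotfix"            # immediate change on the live system


@dataclass
class Issue:
    summary: str
    severity: Severity
    in_production: bool           # False if testing caught it before release
    needs_downtime: bool = False  # True if the fix requires taking the system down


def choose_fix_path(issue: Issue) -> FixPath:
    """Pick a fix path for an issue (illustrative rules only)."""
    # Issues caught in the development cycle never reach the customer.
    if not issue.in_production:
        return FixPath.BUG_FIX
    # The system is down for customers: fix it now, accepting minimal testing.
    if issue.severity is Severity.CRITICAL:
        return FixPath.HOTFIX
    # The fix needs a restart or outage: schedule and announce a window.
    if issue.needs_downtime:
        return FixPath.MAINTENANCE
    # Otherwise bundle it into a small, tested patch release.
    return FixPath.PATCH


if __name__ == "__main__":
    outage = Issue("system down, restart failed", Severity.CRITICAL, in_production=True)
    print(choose_fix_path(outage).value)  # prints: hotfix
```

In this sketch the hotfix rule is checked before the maintenance rule, which reflects the point made above that a full cycle is not always possible when the system is already down. A real team would likely add more inputs, such as the risk assessment or how many customers are affected.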