Building Better Developers

Detailed Notes

What should you do when software fails?

In this episode of Building Better Developers with AI, Rob Broadhead and Michael Meloche explore what really happens when systems crash, production goes down, and disaster strikes. Originally inspired by the episode When Coffee Hits the Fan, we walk through real developer mistakes, recovery strategies, and the tools that can help you respond fast when failure hits.

✅ Key Topics Covered:
• Real-world developer outage stories
• Common mistakes that cause production failures
• How to build your own recovery strategy
• Tools like Docker, Terraform, GitHub Actions, and Chaos Monkey
• Blameless postmortems and communication during outages
• Why mindset matters when things go wrong

🛠️ Whether you’re solo or on a DevOps team, this episode will help you prepare, respond, and recover the next time your software fails.

⸻

🎧 Listen to the Podcast: https://develpreneur.com/what-happens-when-software-fails/

⸻

🔗 Links & Mentions:
• Docker: https://www.docker.com
• Gremlin (Chaos Engineering): https://www.gremlin.com
• GitHub Actions: https://github.com/features/actions
• Terraform: https://www.terraform.io
• Sentry: https://sentry.io

⸻

📌 Connect with Us:
• Website: https://develpreneur.com/
• LinkedIn: https://www.linkedin.com/company/develpreneur/
• Twitter / X: https://X.com/develpreneur
• Facebook: https://facebook.com/Develpreneur

⸻

#SoftwareFails #DevOps #DeveloperRecovery #DisasterRecovery #BuildingBetterDevelopers

Transcript Text

[Music]
That was a recording in progress. All
right, this was going to be fun. Uh,
diving right in. We were just whining
before this, so we're trying to adjust
here and get back to being normal people
and not the whining little pansies that
we are. And yes, we were whining about
work. So, welcome to our world is the
same as your world probably.
So, uh this time we're going to
continue. This is the when coffee hits
the fan real talk on developer disaster
recovery.
Um, this is actually interesting because
I already threw it in there and the
first thing came back is that's a
brilliant and catchy title which I think
was generated by itself. So, it's like
patting itself on the back while it's
like, "Wow, that's awesome. I'm glad you
sent me one of those or you used our
title." All right. Well, let's dive into
it and see how it goes with a little
three, two,
one. Well, hello and welcome back. We
are continuing our season where it is
with AI. Yes, we are Building Better
Developers, the Develpreneur podcast. This
season we are going back through a prior
season, going through some of the
episodes and basically shoving those
into an AI engine, uh, specifically
ChatGPT, and seeing what it gives us back as
its recommendations, just to
sort of see how AI does stuff. Before
that I need to talk about how I do
stuff. Who am I? I am Rob Broadhead. I'm
one of the founders of
Develpreneur, also a founder of, actually the
founder of, RB Consulting, where we are,
you know, there's a lot of
different ways to refer to it. Sometimes
we're called a fractional CTO/CIO,
sometimes we're referred to as boutique
consulting. The bottom line is we sit
down with our customers we work through
their business talk about what are they
doing what are their processes what are
their goals and their vision and then we
look at what kind of technologies do
they have, what kind of technologies are out
there for them. And we will use a whole
bunch of different tools, whether it's
simplification, integration, automation,
innovation, any of those things and more
we will use to help them craft a
essentially a recipe for success that is
unique to that company because
everybody's got a little bit of a, you
know, different take on stuff. You got
different staff, different needs,
different resources. We help mirror
those or actually marry those things
together, provide a technical road map
and then can help you implement it or
can let you go on and just on your merry
way following that road map. Good thing
bad thing uh very much near and dear to
my heart right now. Good thing is we are
going through and updating the town home
that we just recently got. We got a
whole bunch of little things too. So
that's great. That's awesome. And I've
got a wife that does that stuff. So
she's off doing all those cool things.
Bad thing is that sometimes she can't
get some of those things done because it
needs as she calls it somebody with
muscles. And yes, I happen to have a
couple. So, I had to actually come to
the townhouse, which we don't have,
which is really the bad thing. We don't
really have Wi-Fi here. So, I'm working
off my phone. So, I may not have the
best connectivity this time around,
which is okay because at some point that
means I'll just block
have to see him. But you will if you're
on the YouTube. But more importantly,
let him introduce himself and hopefully dig
himself out of the hole that I just
created for him. Apologies.
>> Hey everyone, my name is Michael Meloche.
I'm one of the co-founders of
Develpreneur. I'm also the founder and
owner of Envision QA where we help
startups and growing companies launch
better products faster. That means fewer
bugs, smoother customer experience, and
less wasted time and money. We take care
of the behind-the-scenes quality work so
teams can focus on building and scaling.
Learn more at envisionqa.com.
We also offer other things like uh
software assessments, technical
assessments, and really we can help you
understand what your software stack is
or what technologies you have if you
have no idea. So we can also help you
help yourself and improve your products.
Good thing, bad thing. good thing. Uh
had a nice restful weekend. Fourth of
July was a lot of fun. Got to catch up
with some old friends and uh see some
people we haven't seen in a while.
Weather was absolutely great. Uh it
rained a little bit, but I have to say
that was probably the first week we've
had less rain and more sun, so that was
a lot of fun. uh bad side of that. Uh
we were at the riverhouse and of course
as we're going through all the checklist
uh we had to get some repairs done as
well. So we had to replace a couple
toilets and uh garbage disposal went out
and then of course some plumbing issues.
So uh good with the bad.
>> Well, we have nothing but good ahead of
us today. We're going to take the
episode that was, I guess,
originally called "When Coffee Hits
the Fan: Real Talk on Developer Disaster
Recovery," and so we have thrown this into
ChatGPT and it comes back with "that's a
brilliant and catchy title" because it
likes us. It was going like neutral for
a little bit. It's come back to I think
it heard me talking about it. It's
probably out there like sourcing data
and it's like, ooh, we're being, you
know, we're getting shade thrown at us.
So, let's come back to this like little
bubbly thing that it likes to do. So,
let's dive right into this one. This
one's interesting. So, they add a cold
open episode structure. Cold open, one
to two minutes. Describe a humorous or
dramatic coffee spill scenario. For
example, you just pushed to production. CI
passed. You sip your coffee and then you
see the site is down completely. Cue theme
music. That would be a really dramatic
entrance into our podcast that I have
never actually done. I don't think so.
Interesting idea. Maybe in a future
season we will adjust how we come into
these things. The best we get is the
like the Christmas and Thanksgiving deal
music that we get in New Year's I guess
when we do those. Not a whole lot there.
So, all right. We missed that. All
right, we're going to go into act one,
real developer disasters, 8 to 10
minutes. Let's see if we can keep it to
that. Hook with real world stories or
invite a guest to share their worst oops
moment. Now, this is interesting because
this is actually giving me a very
different uh answer than we've gotten in
the past couple of time, well actually
now you know dozen or so times that
we've done this. So, hook with real
world stories or invite a guest,
including accidentally wiped out a
production database, hot fix that broke
everything, misconfigured cloud
resources, uh, which equals $10,000
overnight, copy-pasted secret keys
into a public repository, actually
spilling coffee on a keyboard mid-deploy. It is
amazing
how many of those things I think I think
there's one maybe that I haven't had.
Don't think I've actually spilled coffee
on a keyboard mid deploy. I don't know
that I've ever actually because I don't
drink coffee that I've actually spilled
it. Uh, I think I've spilled stuff on
keyboards and lost keyboards, but not mid-deploy.
So I have like one of the five that they
give us that has not happened.
Misconfigured cloud resources. I'm just
going to jump on that one real quick
because that's like it's not necessarily
a disaster per se, but it can be a real
pretty darn close to it. And it's
actually I think it's more common. I've
seen it a lot of times. It's probably
more common than some of these other
we'll call them, you know, in quotes
disasters.
And it really comes down to, and I I
hate to throw them under the bus, but
you know, Amazon, Google, Microsoft,
when you build with their tools, with
their cloud tools, when you build
environments, they build them, which is
I'm going to give them, you know, a
little bit of grace and and forgiveness
here because they build them to be
essentially an enterprise level
solution. The problem that we have a lot
of times particularly as side hustlers
and developers and things like that is
we don't need something of that level.
So for example
I used and this goes back to when we're
doing our uh our learn to you know learn
your uh learn to develop an internet
launch your internet business. That's
it. Uh we're doing that. One of the
things we do is we set up we use Amazon
we do an EC2 instance and we do
WordPress. Well, at one point I decided
to use the Amazon tools to do it to make
it easier. And it set up a web pre a
WordPress environment, but it included
um it went out and did route 53 for SSL.
It had a secure C, which was I didn't
even need, but it was like, you know,
that's 100 bucks right there. Um it it
spun up like three different servers. I
think there was like actually there was
two front-end web servers. There was one
back-end database and they were all
connected, an RDS database, and then there
was a load balancer and then there was
like some extra a couple of extra file
servers out there just for media and
stuff like that which if you've got a
nice big WordPress installation, awesome.
The, like, daily cost of that thing that
it generated for me, I think, wasn't
huge, but it was like 50 or 60 bucks,
maybe a hundred bucks a day. If you're
blogging and you're just starting out,
you're not going to generate a hundred
bucks a day to support your, you know,
your and we were sitting there launch
your internet business on, you know,
basically less than 50 bucks. So, that
would have completely blown it all up.
So, we immediately said, "No, that is
not an option that we're going to
suggest to people." And I don't want to
throw Amazon under the bus because I
have done the same thing with Azure,
with um or with Google. It's like they
they what they build makes sense, but it
also doesn't often make sense for you if
you're just like, you know, playing
around with it, doing some development,
some testing and things like that. So,
minor disasters, but they can cost you.
And I have more than a couple times
spent a little more money than I wanted
to on uh places like especially like
cloud providers and things like that.
I will now toss it to you because I'm
probably almost burned through all of
our 8 to 10 minutes already.
>> Yeah. So on this one, I'm going to take
the database one because
so many times that has bitten me in the
butt because it's like okay, you're
working and you may think you're in
production or you may think you're not
in production, you're on a different
machine depending upon how you interface
with the database. If you're like
logging in through command line
interface,
they all look the same unless you go do
some custom coloring to make sure that
the system you're connecting to is the
color for what you're on. You know,
green is good, yellow is warning, you know,
like QA and red is prod. Make sure you
don't break anything.
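The color-coding habit Michael describes can be sketched in a few lines. This is a minimal illustration, not any particular tool's feature: the environment names, the ANSI color choices, the keyword list, and both function names are assumptions made up for the example.

```python
# Hypothetical environment names and colors; adjust these to your own setup.
ANSI_BACKGROUNDS = {
    "dev": "\033[42m",   # green background
    "qa": "\033[43m",    # yellow background
    "prod": "\033[41m",  # red background
}
ANSI_RESET = "\033[0m"

# Statement keywords treated as risky in this sketch (an assumption, not a complete list).
DESTRUCTIVE_KEYWORDS = {"delete", "update", "drop", "truncate", "alter"}


def banner(env: str) -> str:
    """Return a colored environment label to print before each prompt."""
    color = ANSI_BACKGROUNDS.get(env, "")
    return f"{color} {env.upper()} {ANSI_RESET}"


def needs_confirmation(env: str, statement: str) -> bool:
    """Return True when a destructive statement is aimed at production."""
    words = statement.strip().lower().split()
    first_word = words[0] if words else ""
    return env == "prod" and first_word in DESTRUCTIVE_KEYWORDS


if __name__ == "__main__":
    statement = "DELETE FROM orders WHERE id = 42"
    print(banner("prod"), statement)
    if needs_confirmation("prod", statement):
        print("Production target: ask for an explicit confirmation before running.")
```

The point of the guard is only to force a pause on the red environment; it does not replace backups or proper permissions.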
I'm just going to go a little kind of
quickly through this one. So with this
I will start with kind of
what you should be doing first. Time-wise,
you don't always have this, but
always back up the system that you're
working on. Make sure you have data
backups or system backups. Uh make sure
that the tools that you're using have
those color features that I was just
mentioning. Also make sure that if you
are using tools, make sure you turn off
autocommit for any environment other
than dev.
If you make a mistake, you can easily
recover if autocommit is off. It may
take longer, but it can save your butt
so many times, uh, it won't even. The other
thing is
uh
again make sure that when you are
working or you're making the changes to
production make sure that you've tested
these scripts out on lower environments
first test them on a dev environment
test them on a QA environment test them
on more than one environment to make
sure that they work the way you expect
them to
Maybe write up some testing depending
upon what language you have to make sure
that hey you've got the data there. Uh
maybe write some, um, stored procedures to
do some, uh, testing for you. Sometimes
even having stored procedures doing SQL, uh,
updates for you is a better idea because
you can establish rules that if failed
you can run everything as a single
transaction and roll it all back. uh or
you could once it's finished you can do
some checks in there and if those checks
pass cool. If those checks fail it could
also roll back. So you can kind of build
in some disaster recovery with this. But
I I just have to say more times than not
you're going to be in a hurry. And all I
have to say is before you do delete,
update, you make a change, make sure you
know what system you're on because you
can make a mistake and be hurting
tomorrow.
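The "run everything as a single transaction and roll it all back if a check fails" idea above can be sketched as follows. This is a rough illustration using Python's built-in sqlite3 module as a stand-in for whatever database you actually run; the function name, the accounts table in the usage note, and the row-count check are all assumptions chosen for the example, not something named in the episode.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run one statement in a transaction; keep it only if the row count matches.

    `expected_rows` is the check in this sketch. A real script could add
    more checks before committing.
    """
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")          # explicit transaction, nothing is autocommitted
        cur.execute(sql, params)
        if cur.rowcount != expected_rows:
            raise ValueError(
                f"expected {expected_rows} rows, touched {cur.rowcount}"
            )
        conn.commit()                 # checks passed: make the change permanent
        return True
    except Exception:
        conn.rollback()               # any failure: undo everything since BEGIN
        return False
```

With a hypothetical accounts table, an UPDATE that forgets its WHERE clause touches every row, fails the row-count check, and is rolled back, while the same UPDATE with the intended WHERE clause commits.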
Other thing is back up the even if you
don't have a system backup, back up the
table that you're working on. So
hopefully you at least have a snapshot
and you can fix the data quickly. Be
warned, triggers and things of that
nature will make that harder. it's
better to do a full database backup
before you make changes.
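A quick table-level snapshot like the one just described might look like this. It is a sketch against SQLite through Python's sqlite3 module; the function name and the "_backup" suffix are made up for the example. Note that in SQLite, CREATE TABLE ... AS SELECT copies only the rows, not indexes, constraints, or triggers, which fits the warning above that a full database backup is the safer choice.

```python
import sqlite3


def backup_table(conn, table, suffix="_backup"):
    """Copy every row of `table` into a new table and return the copy's name."""
    copy_name = f"{table}{suffix}"
    # Table names cannot be bound as parameters, so validate them before formatting.
    if not table.isidentifier() or not copy_name.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # Copies data only: indexes, constraints, and triggers are not carried over.
    conn.execute(f'CREATE TABLE "{copy_name}" AS SELECT * FROM "{table}"')
    conn.commit()
    return copy_name
```

If the change goes wrong, the rows can be read back from the copy; anything that fired from triggers in the meantime still has to be untangled by hand.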
>> A couple things I want to throw on that
is that um yes, it's you you definitely
want to do it in a, call it, lower-level
database as opposed to production. Um
one of the things I was taught years ago
as a DBA is that whatever you're going
to do, if you're going to do an update,
a delete, or anything like that that's
going to change stuff, even inserts, uh
depending on how you're doing the
insert, is do it as a select first. I
know this is a little geeky database
stuff, but like if you're going to do
delete blank from blank, do the select
first and take a look at that and make
sure that your row count looks like it
should. Um, if you are typing stuff,
even though, like, always turn your
autocommit off. That is a very important
thing. Um, make sure if you're writing
something that the first thing you do is
your WHERE clause. I have found too many
times that I've tried to do something,
I've been doing something and then you
accidentally like flub a key and then
you have to, you know, hopefully you
just have to roll it back. Uh definitely
use color coding wherever possible. Even
if I'm using command line shells, I will
use stuff that I have. Uh granted, I'm a
Mac person, but I know you can do this
with other systems as well. You can
change the background of the command
line environment that you're in. So if
you telnet into something, you can actually
change that to make sure that you're in
the proper shell. And as Michael said,
typically, you know, green is, you know,
development, yellow for test and red for
production. And please don't, you know,
mess with that kind of stuff. Um, also I
will just say do not use root anywhere
ever if you can possibly avoid it. I
know there's some places you can't. uh
in those cases don't be afraid to use
things like Docker and other you know
and things like that and ways to and
even AI to generate essentially your own
development environment even when you're
dealing with production environments
that are just too big to deal with, find a way to
mimic that. I had a customer for
years that was uh there was no way I was
going to pull all of their data down
besides the fact that it was like you
know private and I didn't want to deal
with security it was also just too much
stinking data
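The "do it as a select first" habit mentioned earlier in this turn can be sketched like this. It is an illustration only, using Python's sqlite3 module; the function name, the max_rows threshold, and the sessions table in the usage note are assumptions, and the WHERE text is expected to come from the person running the script, not from untrusted input.

```python
import sqlite3


def preview_then_delete(conn, table, where, params, max_rows):
    """Count matching rows with a SELECT first; delete only if the count looks right.

    Returns (matching_row_count, deleted_flag).
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # Same WHERE clause for both statements, so the preview matches the delete.
    count = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE {where}', params
    ).fetchone()[0]
    if count > max_rows:
        return count, False           # more rows than expected: stop and look
    conn.execute(f'DELETE FROM "{table}" WHERE {where}', params)
    conn.commit()
    return count, True
```

For example, calling it on a hypothetical sessions table with where="user_id = ?" and max_rows=5 deletes only when five or fewer rows match; otherwise it reports the count and leaves the data alone.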
So I did pull down the entire structure
and then used a couple of tools to
generate you know mock data. That kind
of stuff will help you immensely and you
can always pull down some specific
examples as well to just test your stuff
beforehand. Moving on because this is
actually I'm going to fly through this
one because this is actually some stuff
that we really just touched on, sort of
like AI was, Michael's not looking, but it
was, like, you know, thinking the same
thing. So what should happen? Developer
disaster recovery 101: break it down
like a postmortem, focusing on how to
build better recovery habits.
Prevention: backups, feature flags, read-only
access in prod. Detection: monitoring and
alerting, for example Sentry, Datadog,
Prometheus, blah blah. There's so many
tools out there depending on where
you're at what your environment is there
are monitoring and logging tools that
can help you out. Uh there's even some
things that will give you like warnings
of hey you are doing something that is
going to affect more than five rows. Are
you really really sure you want to do
that? Uh monitoring alerting is great.
Uh, rollbacks we've talked about. Canary
deployments, hot fix versus proper
patching. Uh, this again is the kind
of thing where you're going to go
test it out before you actually do it.
Uh, communication: have an on-call policy,
transparent incident updates. This is
important. I think even if you're in a
small team, you should be regularly
letting people know when you screwed up
because I like I have a team that has
got some people that are newer
developers. Uh a lot of times we're new
into an environment, things like that.
It is very helpful to have in our
standups mentions of by the way I did
this and it turned out into something
that I did not expect because that helps
other people know that what the
relationships are in your data what some
of the you know the potholes are and
things like that. Uh, runbooks and
documentation: runbooks, disaster recovery
plans, and retrospectives. Uh, I highly
recommend automation of any sort anywhere
that you can use it, including
runbooks, including even the things we've
talked about before like Ant, shell
scripts, you know, depending on
what your environment is, things like
Maven, uh, even like, you know,
CI/CD tools like your Jenkins and things
like that, pipelines, all those, even
platform as code, those kinds of
things. The more you can automate it, the
better. That is one of the reasons I've
become more and more a fan of using
things like Docker uh Docker Desktop for
current environments because then you
can just like code the whole thing out
and then if you need to replicate an
environment, bam, just run it and you're
off and going and you got a lot of other
containers like that. Now dive into the
next section which is act three tools
and techniques for DR readiness. This is
where I want us I think we're going to
have some good conversation here. Share
tools and frameworks. Infrastructure:
Terraform, Ansible, Kubernetes,
Helm charts. Uh, CI/CD: GitHub Actions,
GitLab, Jenkins. Backups: automated
snapshots, DR regions. Testing: chaos
engineering like Gremlin, Netflix's
Chaos Monkey; load testing: k6 and
Locust. There are a lot of tools that
we just mentioned that we I just
mentioned in that little list that I
don't think we use enough unless we are
we'll use them at work. We'll use them
if we're working for a company or an
employer and they're the kinds of things
it's like because sometimes they require
some time and some money and things like
that set up and we don't necessarily
have it but I think if you can make that
part of your technical roadmap as you
grow out your side hustle your these
things that you you definitely need
obviously, but I'm going to
start with backups, like automating snapshots
and and making sure this is if you're
using cloud
I beg you to make sure that once you put
that server, even if it's, you know,
Bob's pet store street, once you set up
that server for him, take a snapshot and
then just hold on to that because what
you can do is from that snapshot, you
can generate another instance almost
instantaneously.
I would actually say take that and share
it out to another zone. So if, for
example, if they're on an East Coast
zone and it goes down, have it out there
in the West Coast zone. So you can spin
something up very quickly. This is
called disaster recovery. It's at a very
low and it's you don't have to spend
much time or money to do so. Once you do
it, I think you will realize how simple
and addictive it is. That's where I'm
going to stop because I know Michael has
dealt with some of these things as well.
And I'm curious where you want to go on
this long list I just provided.
>> So I'm actually going to just kind of
keep it simple. So from a developer
perspective,
if you're new to if a lot of these terms
are new to you and or you've touched on
one thing or another, but not all of it,
start playing with containers. If you
are not using containers today, start
now.
I've basically beaten the horse dead
with the kitchen sink idea. But you can
build a kitchen sink application or
model using containers and these
frameworks like load balancers, uh like
disaster recovery, spinning up
production environments.
Essentially, anything that you do,
throw it in a container, replicate it,
make sure it can be replicated. You can
share those containers between
developers. You can then take that
container, stick it, literally take that
image, push it up to another environment
and see if it scales. If it doesn't,
well, make it beefier. See if you can
play with it. Containers let you do so
much so quickly and they're really very
quick to build and easy to throw away.
So, if you mess it up, nope, drop it,
rebuild it, start again. These are also
very good ways to kind of centralize
your development environments. So you
can create one development environment,
get all your tools in there for your
developers for what they need, share it
out to everyone. They can use that. They
break it, drop it, rebuild it. All your
code should be in a code repository. So
you should be able to pull it up and
down. Backups are easy. You just take a
snapshot. Done.
Literally with in the world of testing,
this is what should be happening and a
lot of your big corporate companies are
using things like uh you know um is it
Sonar Bayer, uh, TestNG, uh, Cucumber. They
use these things but they also use
cloud-based uh grid kind of ideas to
differentiate and test multiple systems
at once to test multiple you can run
multi-threaded tests against your
current systems.
start playing around with that if you're
not doing that now. These are things
that you should and really need to learn
and know to be able to succeed in
enterprise and also you want to make
sure that your products that you build
for your customers can scale and this is
the best way to do it
>> And it really is. It's one of those where
I don't know how often things have
gotten lost in configuration. Uh,
as a developer, over my entire time,
I've lost a year doing
configuration issues and fixing those
kinds of things. I know my team has
regularly had like every project there's
been at least at some point where each
developer probably loses a day or two on
configuration issues and things like
that. Containers can really help you
move that kind of stuff forward and
homogenize the environment so you don't
have to worry about, you know, Bob's got
this setup and Al's got this other one
and Sam's got a third setup and then
none of it, you know, they and then once
they commit stuff, it doesn't always
work. Those kinds of things can get
actually repaired very quickly. And then
especially when you get into complicated
things where you have to configure an
environment, one person can do it and
then they share it out and you don't
have to worry about everybody else
having to go through that process. Now
dive into the last section, because we're
at time, and just sort of cover this
at a, you know, high level. Uh, culture and
calm: promote the human side of handling
disasters. Blameless postmortems, stay
calm under pressure. Nobody codes well
in a panic. It's not if, it's
when and how you respond. Now, I I just
want to say
stuff happens. There is going to be
things where there's going to be uh bad
queries. There's going to be something
happens and a ser down. There's going to
be things that a drive will fail. Uh if
you go back to our season, we talked
about lessons learned from mistakes. I
think there were three or four episodes
at least that were mistakes where the
disaster recovery plan was not in place
beforehand. Um, test them, and really this
stuff can be so simple at times. I know,
especially for a side hustle, if you're sitting
there and you're building an Apache
web application and you're just throwing
in this one folder how hard is it to
back up that folder and put it somewhere
else and just put it on another machine
you can like if you're if somebody's
paying you to develop you should be able
to afford having your own local
development environment that is
different from production that is close
enough that you can make it work or
something along those lines especially
if you're using containers, you
should be able to replicate that stuff
good enough to be able to do some
testing and do even DR testing
beforehand.
More importantly, shoot us an email at
[email protected]
because we would love to hear about what
is it what is it you're going through?
What are some of your uh DR disasters?
What are some situations where something
happened and you guys and also like what
are some of the things that helped you
where like hey we did this thing that is
not the norm but it helped us through a
disaster and then some of the tools that
you use because there are a lot of them
out there and again a lot of them are
environment we'll say dependent however
there's a lot out there and it's always
good to hear about new ones. We'd love
to throw that out to the group and just
say, "Hey, by the way, here's something
else you guys can check out."
That being said, we're going to wrap
this one up. As always, you can leave us
obviously the email, but shoot us
something at develpreneur.com. We've got
anywhere you want to. We've got
articles, all kinds of stuff there.
Leave us a comment, leave us just any
kind of review there. Anywhere that
you're listening to podcasts, if you're
finding a place for podcasts and we're
not there, we'll get there. YouTube at
Develpreneur, out at that channel,
Develpreneur on X, uh, Facebook, we apparently
have a page out there and a lot of other
places like that. So let us know
anywhere you want to find us if you
don't. We will find a way to get there
um sooner or later. It may take us a
couple of minutes. That being said, go
out there and have yourself a great day,
a great week, and we will talk to you
next time. Now, bonus material. It
actually gives us this time. Bonus
segments, and it's disaster
snackables, a 60-second real fail from
listeners, which we're not going to
reach out to you guys, so you can, you
know, like just chill out. Coffee cup
advice, one actionable DR tip for your
next sprint.
And so, I'm going to put you on the hot
seat. And for the next sprint,
regardless of what where somebody's at,
what would be a good DR tip that they
can verify they do or add to their their
application next time around?
Well, we kind of touched on it in this
one. The the one I would say is make
sure that your databases are being
backed up. If you have a database, make
sure that you are using your tools
correctly for the different
environments. Uh, one thing that we
didn't really touch on through this
particular episode, uh, because in most
modern day applications, we're dealing
with cloud-based or software as a
service application. So, stuff is in the
cloud. However, you will still run into
situations where you have customers with
machines in the office running their
software and services.
First and foremost, make sure you have a
power backup.
I will throw that one out there because
that one runs in more often than not.
And two, make sure your power cables are
nowhere near where your janitor might
unplug something to plug in a vacuum
cleaner.
Not bad. I have seen power issues. I had
a customer a while ago where that was regularly the
thing. The problem was I guess part of
the issue was that we were dealing with
was actually literally in a closet. Not
a data closet, but just like a broom
closet and it regularly got turned off.
and I'd be like and I would have to
remote in. So, um I think from the data
I want to jump to the database one
because we really didn't talk about
this. Not only do you need to back up
your database but restore it, do an
actual test of it. Restore it and make
sure you can connect to your database.
There are a lot of times uh I actually
recently had an issue where I had lost a
bunch of databases because I got a
corrupted server, database server, a lot
of them. And when I brought them back,
one of the things that I'd forgotten was
I had a lot of users that I had to
create in order to actually deal with
those applications, so the applications
could deal with data in this case. Uh,
and in particular, you're going
to run into things if you're in, uh, I
know SQL Server does this a lot. I don't
know. I don't think Oracle does, but to
pick on SQL Server, where their IDs are
GUIDs. And so you need to make sure when
you bring stuff in that it's not
regenerating GUIDs for records, because
if your primary keys get broken and
you've got related foreign keys, guess
what? Those are going to be broken too.
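A restore test along the lines Rob describes could be sketched like this. It uses SQLite through Python's sqlite3 module as a stand-in, not SQL Server, and the function name, the SQL-dump input, and the expected_counts argument are all assumptions for illustration. The idea is simply: restore into a fresh database, confirm you can query it, and check that no foreign key points at a parent key that changed or went missing.

```python
import sqlite3


def restore_and_verify(backup_sql, expected_counts):
    """Load a SQL dump into a fresh in-memory database and list any problems found.

    `backup_sql` is the text of a dump (for example from Connection.iterdump()).
    `expected_counts` maps table name to the row count recorded at backup time.
    """
    restored = sqlite3.connect(":memory:")
    restored.executescript(backup_sql)   # the actual restore step
    problems = []
    for table, expected in expected_counts.items():
        actual = restored.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if actual != expected:
            problems.append(f"{table}: expected {expected} rows, found {actual}")
    # PRAGMA foreign_key_check returns one row per child row whose parent is missing.
    for row in restored.execute("PRAGMA foreign_key_check"):
        problems.append(f"broken foreign key in {row[0]} (rowid {row[1]})")
    restored.close()
    return problems
```

An empty list means the restored copy is at least connectable, complete by row count, and consistent across related tables. Recreating application users and permissions, which bit Rob in his story, still needs its own check on a real server.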
So
test your disaster recovery. Uh the
simplest thing I would say
uh for your next sprint if you are not
backing up your source code and your
database
separately and putting them somewhere
that's on a different machine do that.
It's a very easy script to write. You
could have an AI engine write one
for you. Whether you want to do it in
pick the language you want it written
in, it will do it and it'll get you
something close enough, take you
probably 15, 30 minutes tops to have a
nice little backup script for your
stuff.
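As a rough idea of what such a backup script might look like, here is a minimal sketch in Python. It assumes a SQLite database file and a source folder on local disk; the function names, file naming scheme, and paths are hypothetical, and a real setup would point the destination at a different machine or mounted drive and use the dump tool for its own database.

```python
import shutil
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_source(source_dir, dest_dir, stamp):
    """Zip the source folder into dest_dir and return the archive path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = shutil.make_archive(
        str(dest_dir / f"source-{stamp}"), "zip", root_dir=str(source_dir)
    )
    return Path(archive)


def backup_database(db_path, dest_dir, stamp):
    """Copy a SQLite database with the online backup API and return the copy's path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"database-{stamp}.sqlite3"
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(target))
    try:
        src.backup(dst)   # consistent copy even if the source is in use
    finally:
        dst.close()
        src.close()
    return target


def run_backup(source_dir, db_path, dest_dir):
    """Back up code and database separately, sharing one timestamp."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return (
        backup_source(Path(source_dir), Path(dest_dir), stamp),
        backup_database(Path(db_path), Path(dest_dir), stamp),
    )
```

Calling run_backup with your project folder, your database file, and a destination directory produces two separate files, one archive of the code and one copy of the database, which matches the "separately, and somewhere else" advice above. Keep the destination outside the source folder so the archive does not try to include itself.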
We have not backed this up, so we're
going to go do this right away because
we never know what happens to our our
episodes. I just a little bit because I
don't think I've ever actually lost an
episode. I've always hit record in time
and so knocking on some not actual wood
but fake wood. Um hopefully that does
not happen this time. We will be back.
We're going to continue this. We've got
plenty of episodes left. We're like I
don't know only halfway through the
season or something like that. So we've
got a lot more artificial intelligence
ahead and all of the shenanigans that it
causes. So go out there and have
yourself a good one. Thanks for
watching. We will talk to you next time.
[Music]

Transcript Segments

1.35

[Music]

That was a recording in progress. All

31.519

right, this was going to be fun. Uh,

34.8

diving right in. We were just whining

36.32

before this, so we're trying to adjust

37.68

here and get back to being normal people

39.44

and not the whining little pansies that

42.879

we are. And yes, we were whining about

44.64

work. So, welcome to our world is the

47.6

same as your world probably.

50.079

So, uh this time we're going to

51.6

continue. This is the when coffee hits

the fan real talk on developer disaster

56.559

recovery.

58.16

Um, this is actually interesting because

60.32

I already threw it in there and the

61.359

first thing came back is that's a

62.719

brilliant and catchy title which I think

65.119

was generated by itself. So, it's like

67.52

hiding itself on the back while it's

69.52

like, "Wow, that's awesome. I'm glad you

72.4

sent me one of those or you used our

74.799

title." All right. Well, let's dive into

78.56

it and see how it goes with a little

80.799

three, two,

83.2

one. Well, hello and welcome back. We

86.4

are continuing our season where it is

88.479

with AI. Yes, we are building better

90.64

developers the developer podcast. This

92.799

season we are going back through prior

94.88

season going through some of the

96.4

episodes and basically shoving those

98.72

into an AI engine uh specifically

101.6

ChatGPT and seeing what it gives us back as

104.32

its recommend recommendations just to

106.64

sort of see how AI does stuff. Before

110

that I need to talk about how I do

111.68

stuff. Who am I? I am Rob Broadhead. I'm

113.6

one of the founders of

115.2

Develpreneur, also a founder of, actually the

117.759

founder of RB consulting where we are

121.6

a we are you know it's a lot of

123.759

different ways to refer to it sometimes

125.6

we're called a fractional CTO CIO

127.84

sometimes we'd be referred to as boutique

129.599

consulting the bottom line is we sit

132.08

down with our customers we work through

134.319

their business talk about what are they

136.16

doing what are their processes what are

137.76

their goals and their vision and then we

140.56

look at what kind of technologies do

142.4

they have what kind technologies are out

144

there for them. And we will use a whole

146.239

bunch of different tools, whether it's

147.44

simplification, integration, automation,

149.52

innovation, any of those things and more

152.64

we will use to help them craft a

155.36

essentially a recipe for success that is

158.239

unique to that company because

160.56

everybody's got a little bit of a, you

162.239

know, different take on stuff. You got

164.08

different staff, different needs,

166

different resources. We help mirror

168.4

those or actually marry those things

170.16

together, provide a technical road map

173.04

and then can help you implement it or

175.28

can let you go on and just on your merry

177.28

way following that road map. Good thing

179.92

bad thing uh very much near and dear to

182.48

my heart right now. Good thing is we are

186.239

going through and updating the town home

188.159

that we just recently got. We got a

189.519

whole bunch of little things too. So

191.519

that's great. That's awesome. And I've

192.959

got a wife that does that stuff. So

194.4

she's off doing all that cool things.

196.64

Bad thing is is that sometimes she can't

198.8

get some of those things done because it

200.239

needs as she calls it somebody with

202.64

muscles. And yes, I happen to have a

205.599

couple. So, I had to actually come to

207.519

the townhouse, which we don't have,

209.2

which is really the bad thing. We don't

210.799

really have Wi-Fi here. So, I'm working

212.64

off my phone. So, I may not have the

215.12

best connectivity this time around,

216.959

which is okay because at some point that

218.319

means I'll just block

220.319

have to see him. But you will if you're

223.12

on the YouTube. but more importantly

225.76

let him introduce him hopefully dig

228.159

himself out of the hole that I just

229.84

created for him. Apologies.

232.72

>> Hey everyone, my name is Michael Meloche.

234.4

I'm one of the co-founders of

236.08

Develpreneur. I'm also the founder and

238.239

owner of Envision QA where we help

240.4

startups and growing companies launch

242

better products faster. That means fewer

244.319

bugs, smoother customer experience, and

247.12

less wasted time and money. We take care

249.439

of the behind-the-scenes quality work so

251.36

teams can focus on building and scaling.

253.92

Learn more at envisionqa.com.

257.44

We also offer other things like uh

260.079

software assessments, technical

261.759

assessments, and really we can help you

264.88

understand what your software stack is

266.88

or what technologies you have if you

268.88

have no idea. So we can also help you

272.16

help yourself and improve your products.

274.56

Good thing, bad thing. good thing. Uh

278.4

had a nice restful weekend. Fourth of

280.32

July was a lot of fun. Got to catch up

282.16

with some old friends and uh see some

284.8

people we haven't seen in a while.

286.639

Weather was absolutely great. Uh it

289.04

rained a little bit, but I have to say

291.12

that was probably the first week we've

292.479

had less rain and more sun, so that was

295.199

a lot of fun. uh bad side of that. Uh

300.479

we were at the riverhouse and of course

303.28

as we're going through all the checklist

306.08

uh we had to get some repairs done as

308.639

well. So we had to replace a couple

310.32

toilets and uh garbage disposal went out

313.36

and then of course some plumbing issues.

314.96

So uh good with the bad.

319.44

>> Well, we have nothing but good ahead of

321.44

us today. We're going to take the

323.199

episode that was called was I guess is

326

originally was called when coffee hits

327.84

the fan real talk on developer disaster

330.4

recovery and so we have thrown this into

333.52

ChatGPT and it comes back with that's a

336

brilliant and catchy title because it

339.12

likes us. It was going like neutral for

341.68

a little bit. It's come back to I think

343.12

it heard me talking about it. It's

344.56

probably out there like sourcing data

346.4

and it's like, ooh, we're being, you

348.16

know, we're getting shade thrown at us.

349.759

So, let's come back to this like little

351.759

bubbly thing that it likes to do. So,

355.12

let's dive right into this one. This

357.6

one's interesting. So, they add a cold

359.52

open episode structure. Cold open, one

361.6

to two minutes. Describe a humorous or

364.72

dramatic coffee spill scenario. For

367.039

example, you just push to production. CI

370.88

passed. You sip your coffee and then you

373.12

see the site is down completely. Cue theme

377.039

music. That would be a really dramatic

379.52

entrance into our podcast that I have

382

never actually done. I don't think so.

385.68

Interesting idea. Maybe in a future

387.36

season we will adjust how we come into

389.52

these things. The best we get is the

391.6

like the Christmas and Thanksgiving deal

394.24

music that we get in New Year's I guess

396.24

when we do those. Not a whole lot there.

399.12

So, all right. We missed that. All

401.039

right, we're going to go into act one,

402.56

real developer disasters, 8 to 10

404.639

minutes. Let's see if we can keep it to

406.479

that. Hook with real world stories or

409.36

invite a guest to share their worst oops

411.84

moment. Now, this is interesting because

413.44

this is actually giving me a very

415.12

different uh answer than we've gotten in

417.759

the past couple of time, well actually

419.12

now you know dozen or so times that

420.88

we've done this. So, hook with real

423.599

world stories or invite a guest,

425.68

including accidentally wiped out a

427.52

production database, hot fix that broke

429.919

everything, misconfigured cloud

431.599

resources, uh, which equals $10,000

434.88

overnight, copied pasted secret keys

437.28

into a public repository, actually

439.44

spilling on a keyboard mid deploy. It is

443.12

amazing

444.639

how many of those things I think I think

447.84

there's one maybe that I haven't had.

450.479

Don't think I've actually spilled coffee

452.479

on a keyboard mid deploy. I don't know

455.599

that I've ever actually because I don't

456.96

drink coffee that I've actually spilled

458.24

it. Uh I think I've spilled stuff on

460.8

keyboards and lost keyboards but deploy.

464.24

So I have like one of the five that they

466.72

give us that has not happened.

468.72

Misconfigured cloud resources. I'm just

470.96

going to jump on that one real quick

472.08

because that's like it's not necessarily

476.08

a disaster per se, but it can be a real

480.479

pretty darn close to it. And it's

482.479

actually I think it's more common. I've

484.08

seen it a lot of times. It's probably

485.44

more common than some of these other

486.96

we'll call them, you know, in quotes

488.4

disasters.

489.919

And it really comes down to, and I I

493.919

hate to throw them under the bus, but

495.759

you know, Amazon, Google, Microsoft,

498.8

when you build with their tools, with

501.84

their cloud tools, when you build

503.44

environments, they build them, which is

508.24

I'm going to give them, you know, a

509.919

little bit of grace and and forgiveness

511.599

here because they build them to be

513.919

essentially an enterprise level

515.76

solution. The problem that we have a lot

518.8

of times particularly as side hustlers

521.039

and developers and things like that is

522.64

we don't need something of that level.

525.44

So for example

527.839

I used and this goes back to when we're

530.24

doing our uh our learn to you know learn

533.92

your uh learn to develop an internet

536.72

launch your internet business. That's

538

it. Uh we're doing that. One of the

539.839

things we do is we set up we use Amazon

542.08

we do an EC2 instance and we do

543.6

WordPress. Well, at one point I decided

545.68

to use the Amazon tools to do it to make

548.48

it easier. And it set up a web pre a

552.56

WordPress environment, but it included

556.32

um it went out and did Route 53 for SSL.

560.64

It had a secure C, which was I didn't

562.88

even need, but it was like, you know,

564.16

that's 100 bucks right there. Um it it

567.04

spun up like three different servers. I

569.36

think there was like actually there was

571.2

two front-end web servers. There was one

573.2

back-end database and they were all

574.8

connect an RDS database and then there

576.8

was a load balancer and then there was

579.36

like some extra a couple of extra file

582.399

servers out there just for media and

585.12

stuff like that which if you've got a

587.04

nice big WordPress installation awesome

591.04

the like daily cost of that thing that

593.68

it generated for me was I think wasn't

596.08

huge but it was like 50 or 60 bucks

598.16

maybe a hundred bucks a day if you're

601.44

blogging and you're just starting out,

604

you're not going to generate a hundred

605.68

bucks a day to support your, you know,

608.48

your and we were sitting there launch

611.44

your internet business on, you know,

613.279

basically less than 50 bucks. So, that

616.16

would have completely blown it all up.

617.68

So, we immediately said, "No, that is

619.839

not an option that we're going to

621.36

suggest to people." And I don't want to

622.959

throw Amazon under the bus because I

624.32

have done the same thing with Azure,

626.16

with um or with Google. It's like they

630.48

they what they build makes sense, but it

633.68

also doesn't often make sense for you if

635.92

you're just like, you know, playing

637.68

around with it, doing some development,

639.279

some testing and things like that. So,

641.519

minor disasters, but they can cost you.

643.92

And I have more than a couple times

645.6

spent a little more money than I wanted

647.279

to on uh places like especially like

650.88

cloud providers and things like that.

653.44

I will now toss it to you because I'm

655.519

probably almost burned through all of

656.88

our 8 to 10 minutes already.

660

>> Yeah. So on this one, I'm going to take

661.92

the database one because

664.88

so many times that has bitten me in the

667.44

butt because it's like okay, you're

669.04

working and you may think you're in

671.12

production or you may think you're not

672.56

in production, you're on a different

673.92

machine depending upon how you interface

676.72

with the database. If you're like

678.32

logging in through command line

679.6

interface,

681.519

they all look the same unless you go do

683.68

some custom coloring to make sure that

685.76

the system you're connecting to is the

687.92

color for what you're on. You know,

690.24

green is good, yellow is warn, you know,

692.399

like QA and red is prod. Make sure you

695.2

don't break anything.

697.2

I'm just going to go a little kind of

698.8

quickly through this one. So with this

702.959

I will start with kind of the what to

706.399

what you should be doing first. We time

708.8

wise you don't always have this but

710.8

always back up the system that you're

712.56

working on. Make sure you have data

714.079

backups or system backups. Uh make sure

717.36

that the tools that you're using has

719.6

those color features that I was just

721.6

mentioning. Also make sure that if you

724.079

are using tools, make sure you turn off

726.959

autocommit for any environment other

729.44

than dev.

731.76

If you make a mistake, you can easily

733.76

recover if autocommit is off. It may

736.48

take longer but it can save your butt

738.8

so many times uh it won't even The other

742.8

thing is

744.8

746.72

again make sure that when you are

750

working or you're making the changes to

752

production make sure that you've tested

753.76

these scripts out on lower environments

755.92

first test them on a dev environment

758.32

test them on a QA environment test them

760.48

on more than one environment to make

762.639

sure that they work the way you expect

764.72

them to

765.92

Maybe write up some testing depending

768.32

upon what language you have to make sure

770.16

that hey you've got the data there. Uh

772.56

maybe write some um stored procedures to

775.36

do some uh testing for you. Sometimes

778.32

even having stored procedures doing SQL uh

781.76

updates for you is a better idea because

784.72

you can establish rules that if failed

787.92

you can run everything as a single

789.36

transaction and roll it all back. uh or

792.24

you could once it's finished you can do

794.16

some checks in there and if those checks

796.079

pass cool. If those checks fail it could

798.639

also roll back. So you can kind of build

801.04

in some disaster recovery with this. But

803.44

I I just have to say more times than not

807.04

you're going to be in a hurry. And all I

810.639

have to say is before you do delete,

812.88

update, you make a change, make sure you

816.16

know what system you're on because you

818.88

can make a mistake and be hurting

821.44

tomorrow.

823.12

Other thing is back up the even if you

825.44

don't have a system backup, back up the

826.959

table that you're working on. So

828.56

hopefully you at least have a snapshot

830

and you can fix the data quickly. Be

832.959

warned, triggers and things of that

834.8

nature will make that harder. it's

836.72

better to do a full database backup

838.959

before you make changes.
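
[Editor's sketch] The advice above (autocommit off, run the change as one transaction, check the result, roll back if the check fails) can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the helper `guarded_update` and the `orders` table are made up for illustration, and a production database would use its own driver and tooling.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Run one UPDATE/DELETE in an explicit transaction.

    Commits only if the number of affected rows equals `expected_rows`
    (the count you got from running the same WHERE as a SELECT first);
    otherwise rolls back and returns False.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        cur.execute(sql, params)
        if cur.rowcount != expected_rows:
            cur.execute("ROLLBACK")
            return False
        cur.execute("COMMIT")
        return True
    except sqlite3.Error:
        cur.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    # isolation_level=None: the driver opens no implicit transactions,
    # so BEGIN/COMMIT/ROLLBACK above are fully under our control.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (status) VALUES (?)",
        [("open",), ("open",), ("closed",)],
    )
    # Step 1: preview with a SELECT using the same WHERE clause.
    expected = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = ?", ("open",)
    ).fetchone()[0]
    # Step 2: a flubbed statement with no WHERE clause is rolled back.
    print(guarded_update(conn, "UPDATE orders SET status = 'archived'", (), expected))
    # Step 3: the intended statement matches the preview and commits.
    print(guarded_update(
        conn,
        "UPDATE orders SET status = 'archived' WHERE status = ?",
        ("open",),
        expected,
    ))
```

The row-count check is only one possible rule; the same shape works for any post-change check that decides between commit and rollback.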

842.32

>> A couple things I want to throw on that

843.76

is that um yes, it's you you definitely

847.36

want to do it in a call lower level

849.92

database as opposed to production. Um

852.959

one of the things I was taught years ago

854.16

as a DBA is that whatever you're going

855.76

to do, if you're going to do an update,

856.959

a delete, or anything like that that's

858.32

going to change stuff, even inserts, uh

860.8

depending on how you're doing the

861.68

insert, is do it as a select first. I

864

know this is a little geeky database

865.44

stuff, but like if you're going to do

867.12

delete blank from blank, do the select

869.6

first and take a look at that and make

871.12

sure that your row count looks like it

873.68

should. Um, if you are typing stuff,

876.8

even though like always turn your auto

879.199

commit off. That is a very important

881.36

thing. U make sure if you're writing

883.839

something that the first thing you do is

885.279

your WHERE clause. I have found too many

887.279

times that I've tried to do something,

888.8

I've been doing something and then you

890.16

accidentally like flub a key and then

892.88

you have to, you know, hopefully you

894.399

just have to roll it back. Uh definitely

896.48

use color coding wherever possible. Even

898.24

if I'm using command line shells, I will

900.16

use stuff that I have. Uh granted, I'm a

902.56

Mac person, but I know you can do this

904.48

with other systems as well. You can

907.04

change the background of the command

908.639

line environment that you're in. So if

910.32

you tet into something, you can actually

912.24

change that to make sure that you're in

913.519

the proper shell. And as Michael said,

916.16

typically, you know, green is, you know,

918.56

development, yellow for test and red for

921.279

production. And please don't, you know,

923.44

mess with that kind of stuff. Um, also I

926.959

will just say do not use root anywhere

930.72

ever if you can possibly avoid it. I

932.72

know there's some places you can't. uh

934.56

in those cases don't be afraid to use

936.56

things like Docker and other you know

939.199

and things like that and ways to and

941.44

even AI to generate essentially your own

944.24

development environment even when you're

946

dealing with production environments

947.279

that are just too deal find a way to

951.68

mimic that I have I had a customer for

954.079

years that was uh there was no way I was

956.72

going to pull all of their data down

958.079

besides the fact that it was like you

959.519

know private and I didn't want to deal

960.639

with security it was also just too much

962.88

stinking data

964.079

So I did pull down the entire structure

966.639

and then used a couple of tools to

968.32

generate you know mock data. That kind

971.199

of stuff will help you immensely and you

973.6

can always pull down some specific

975.04

examples as well to just test your stuff

977.04

beforehand. Moving on because this is

979.44

actually I'm going to fly through this

980.399

one because this actually some stuff

981.519

that we really just touched on sort of

984

like AI was Michael's not looking but it

986.56

was like you know thinking the same

988.32

thing. So what should happen? Developer

990.639

disaster recovery 101: break it down

994

like a postmortem for focusing on how to

996.32

build better recovery habits

999.199

prevention: backups, feature flags, read-only

1003.12

access in prod. Detection: monitoring and

1007.12

alerting, for example Sentry, Datadog,

1008.959

Prometheus blah blah there's so many

1011.199

tools out there depending on where

1012.399

you're at what your environment is there

1014.32

are monitoring and logging tools that

1016.24

can help you out. Uh there's even some

1018.32

things that will give you like warnings

1019.839

of hey you are doing something that is

1021.68

going to affect more than five rows. Are

1023.36

you really really sure you want to do

1024.72

that? Uh monitoring alerting is great.

1028

Uh rollbacks we've talked about canary

1031.28

deployments hot fix versus proper

1033.439

patching. Uh this is again is the kinds

1035.919

of things is where you're going to go

1037.679

test it out before you actually do it.

1041.12

Uh communication have an on call policy

1043.52

transparent incident update. This is

1045.76

important. I think even if you're in a

1048.24

small team, you should be regularly

1051.12

letting people know when you screwed up

1053.6

because I like I have a team that has

1055.6

got some people that are newer

1056.88

developers. Uh a lot of times we're new

1059.039

into an environment, things like that.

1060.48

It is very helpful to have in our

1062.72

standups mentions of by the way I did

1065.12

this and it turned out into something

1066.64

that I did not expect because that helps

1069.039

other people know that what the

1070.799

relationships are in your data what some

1072.96

of the you know the potholes are and

1075.44

things like that. Uh runbooks

1078.08

documentation runbooks disaster recovery

1080.64

plans and retrospectives. Uh I highly

1083.84

recommend anywhere that you can use

1085.76

automation of any sort including

1087.44

runbooks including even the things we've

1090.32

talked about before like Ant, shell

1092.64

scripts you know if you're depending

1094.16

what your environment is things like

1095.28

Maven uh even like continuous you know

1098.24

CI/CD tools like your Jenkins and things

1100.16

like that pipelines all those even a

1103.12

platform as a as code those kinds of

1105.6

things the more you can automate it the

1108.16

better that is one of the reasons I've

1109.84

become more and more a fan of using

1111.52

things like Docker uh Docker Desktop for

1114.96

current environments because then you

1116.32

can just like code the whole thing out

1118.16

and then if you need to replicate an

1120.16

environment, bam, just run it and you're

1122.16

off and going and you got a lot of other

1124

containers like that. Now dive into the

1125.919

next section which is act three tools

1128.08

and techniques for DR readiness. This is

1130.08

where I want us I think we're going to

1131.2

have some good conversation here. Share

1133.12

tools and frameworks, infrastructure,

1134.88

Terraform, Ansible, Kubernetes,

1136.64

Helm charts, uh, CI/CD: GitHub Actions,

1139.44

GitLab, Jenkins, backups, automate

1141.52

snapshots, DR regions, testing, chaos

1144.4

engineering like Gremlin, Netflix's

1146.4

Chaos Monkey, load testing: k6 and

1148.88

Locust. There are a lot of tools that

1153.44

we just mentioned that we I just

1155.679

mentioned in that little list that I

1158.16

don't think we use enough unless we are

1161.039

we'll use them at work. We'll use them

1162.64

if we're working for a company or an

1164.48

employer and they're the kinds of things

1166.4

it's like because sometimes they require

1168.799

some time and some money and things like

1170.559

that set up and we don't necessarily

1172.32

have it but I think if you can make that

1174.799

part of your technical roadmap as you

1177.039

grow out your side hustle your these

1180.64

things that you you definitely need

1182.48

obviously obviously but I'm going to

1184.72

start backups like automating snapshots

1186.88

and and making sure this is if you're

1191.039

using cloud

1192.4

I beg you to make sure that once you put

1195.36

that server, even if it's, you know,

1198

Bob's pet store street, once you set up

1200.32

that server for him, take a snapshot and

1203.52

then just hold on to that because what

1206

you can do is from that snapshot, you

1208.64

can generate another instance almost

1211.28

instantaneously.

1212.799

I would actually say take that and share

1214.72

it out to another zone. So if, for

1217.6

example, if they're on an East Coast

1218.88

zone and it goes down, have it out there

1220.4

in the West Coast zone. So you can spin

1221.919

something up very quickly. This is

1223.52

called disaster recovery. It's at a very

1225.76

low and it's you don't have to spend

1227.919

much time or money to do so. Once you do

1230.32

it, I think you will realize how simple

1232.559

and addictive it is. That's where I'm

1234.64

going to stop because I know Michael has

1236.32

dealt with some of these things as well.

1237.919

And I'm curious where you want to go on

1239.76

this long list I just provided.

1242.08

>> So I'm actually going to just kind of

1244.799

keep it simple. So from a developer

1247.36

perspective,

1248.96

if you're new to if a lot of these terms

1251.2

are new to you and or you've touched on

1253.28

one thing or another, but not all of it,

1256.799

start playing with containers. If you

1259.039

are not using containers today, start

1261.52

now.

1263.039

I I I basically beaten the horse dead

1265.76

with the kitchen sink idea. But you can

1268.159

build a kitchen sink application or

1270.88

model using containers and these

1273.36

frameworks like load balancers, uh like

1276.88

disaster recovery, spinning of

1278.88

production environments.

1280.88

Essentially, anything that you do,

1284.559

throw it in a container, replicate it,

1286.88

make sure it can be replicated. You can

1288.72

share those containers between

1290.32

developers. You can then take that

1292.48

container, stick it, literally take that

1294.799

image, push it up to another environment

1297.12

and see if it scales. If it doesn't,

1299.2

well, make it beefier. See if you can

1301.52

play with it. Containers let you do so

1303.919

much so quickly and they're really very

1308

quick to build and easy to throw away.

1310.08

So, if you mess it up, nope, drop it,

1312

rebuild it, start again. These are also

1314.64

very good ways to kind of centralize

1318.96

your development environments. So you

1321.2

can create one development environment,

1322.88

get all your tools in there for your

1324.24

developers for what they need, share it

1326.159

out to everyone. They can use that. They

1329.6

break it, drop it, rebuild it. All your

1332.08

code should be in a code repository. So

1334.559

you should be able to pull it up and

1336

down. Backups are easy. You just take a

1338.24

snapshot. Done.

1341.2

Literally with in the world of testing,

1344.08

this is what should be happening and a

1347.2

lot of your big corporate companies are

1349.76

using things like uh you know um is it

1352.4

Sonar Bayer uh TestNG uh Cucumber they

1357.2

use these things but they also use

1359.44

cloud-based uh grid kind of ideas to

1363.28

differentiate and test multiple systems

1365.44

at once to test multiple you can run

1368.32

multi-threaded tests against your

1369.76

current systems.

1372.159

start playing around with that if you're

1373.84

not doing that now. These are things

1376.24

that you should and really need to learn

1378.96

and know to be able to succeed in

1381.919

enterprise and also you want to make

1383.76

sure that your products that you build

1386

for your customers can scale and this is

1388.08

the best way to do it

1390.559

>> and it really is it's one of those that

1392.64

I don't know how often things have

1393.919

gotten lost in configuration uh there

1396.159

that as developer spent over the I lost

1401.12

a year my entire time doing

1403.6

configuration issues and fixing those

1405.6

kinds of things. I know my team has

1407.679

regularly had like every project there's

1409.679

been at least at some point where each

1411.84

developer probably loses a day or two on

1414.08

configuration issues and things like

1415.44

that. Containers can really help you

1417.44

move that kind of stuff forward and

1419.28

homogenize the environment so you don't

1421.28

have to worry about, you know, Bob's got

1423.6

this setup and Al's got this other one

1426.08

and Sam's got a third setup and then

1427.919

none of it, you know, they and then once

1429.679

they commit stuff, it doesn't always

1432

work. Those kinds of things can get

1434.08

actually repaired very quickly. And then

1435.52

especially when you get into complicated

1437.28

things where you have to configure an

1439.679

environment, one person can do it and

1441.6

then they share it out and you don't

1442.799

have to worry about everybody else

1444

having to go through that process. Now

1446.64

dive into the last section because we're

1449.52

we're time and just sort of cover this

1451.84

in a you know a high level. Uh culture

1455.44

calm promote the human side of handling

1457.84

disasters. Blameless postmortems stay

1460.88

calm under pressure. Nobody codes well

1462.72

in panic. It's not if it's not if, it's

1466.72

when and how you respond. Now, I I just

1470.159

want to say

1472.48

stuff happens. There is going to be

1474.64

things where there's going to be uh bad

1476.799

queries. There's going to be something

1477.919

happens and a server's down. There's going to

1480.159

be things that a drive will fail. Uh if

1482.96

you go back to our season, we talked

1484.4

about lessons learned from mistakes. I

1487.679

think there were three or four episodes

1488.88

at least that were mistakes where the

1490.32

disaster recovery plan was not in place

1492.799

beforehand. U test them and really this

1497.039

stuff can be so simple at times. I know

1501.039

especially from hustle if you're sitting

1503.2

there and you're building an Apache

1506.24

web application and you're just throwing

1508.559

in this one folder how hard is it to

1511.6

back up that folder and put it somewhere

1513.52

else and just put it on another machine

1516.4

you can like if you're if somebody's

1518.64

paying you to develop you should be able

1521.039

to afford having your own local

1522.64

development environment that is

1523.919

different from production that is close

1525.44

enough that you can make it work or

1527.76

something along those lines especially

1529.12

if you're using containers ers, you

1530.72

should be able to replicate that stuff

1533.039

good enough to be able to do some

1535.36

testing and do even DR testing

1538.24

beforehand.

1540

More importantly, shoot us an email at

1542

[email protected]

1543.6

because we would love to hear about what

1545.6

is it what is it you're going through?

1547.6

What are some of your uh DR disasters?

1551.2

What are some situations where something

1553.44

happened and you guys and also like what

1556.4

are some of the things that helped you

1558.64

where like hey we did this thing that is

1560.72

not the norm but it helped us through a

1563.2

disaster and then some of the tools that

1564.96

you use because there are a lot of them

1567.12

out there and again a lot of them are

1570.159

environment we'll say dependent however

1573.2

there's a lot out there and it's always

1574.64

good to hear about new ones. We'd love

1575.919

to throw that out to the group and just

1577.2

say, "Hey, by the way, here's something

1578.72

else you guys can check out."

1581.52

That being said, we're going to wrap

1583.12

this one up. As always, you can leave us

1585.36

obviously the email, but shoot us

1586.88

something at develpreneur.com. We've got

1589.279

anywhere you want to. We've got

1590.799

articles, all kinds of stuff there.

1592.159

Leave us a a comment, leave us just any

1595.039

kind of review there. Anywhere that

1596.559

you're listening to podcast, if you're

1598.08

finding a place for podcast and we're

1600.32

not there, we'll get there. YouTube at

1603.2

Develpreneur out at that channel,

1605.279

Develpreneur on X, uh, Facebook we apparently

1608.24

have a page out there and a lot of other

1611.36

places like that. So let us know

1614.24

anywhere you want to find us if you

1615.679

don't. We will find a way to get there

1618.559

um sooner or later. It may take us a

1620.48

couple of minutes. That being said, go

1622.24

out there and have yourself a great day,

1623.84

a great week, and we will talk to you

1626.799

next time. Now, bonus material. It

1630.159

actually gives us this time. Bonus

1632.08

segments, and it's disaster

1634.64

snackables, a 60-second real fail from

1637.2

listeners, which we're not going to

1638.72

reach out to you guys, so you can, you

1640.32

know, like just chill out. Coffee cup

1643.2

advice, one actionable DR tip for your

1646

next sprint.

1648.64

And so, I'm going to put you on the hot

1650.08

seat. And for the next sprint,

1652.32

regardless of where somebody's at,

1654.32

what would be a good DR tip that they

1657.12

can verify they do or add to their

1661.36

application next time around?

1665.919

Well, we kind of touched on it in this

1668.08

one. The one I would say is make

1670.64

sure that your databases are being

1672.08

backed up. If you have a database, make

1674

sure that you are using your tools

1675.84

correctly for the different

1677.76

environments. Uh, one thing that we

1680

didn't really touch on through this

1682.08

particular episode, uh, because in most

1686.32

modern day applications, we're dealing

1688

with cloud-based or software as a

1690

service applications. So, stuff is in the

1691.84

cloud. However, you will still run into

1694

situations where you have customers with

1696.559

machines in the office running their

1698.88

software and services.

1701.52

First and foremost, make sure you have a

1704.24

power backup.

1706.64

I will throw that one out there because

1708

that one I run into more often than not.

1709.76

And two, make sure your power cables are

1711.52

nowhere near where your janitor might

1713.84

unplug something to plug in a vacuum

1715.44

cleaner.

1717.6

Not bad. I have seen power issues. I had

1720.64

a customer where that was regularly the

1723.279

thing. The problem was I guess part of

1725.36

the issue was that we were dealing with

1728.32

was actually literally in a closet. Not

1730.64

a data closet, but just like a broom

1732.399

closet and it regularly got turned off.

1734.64

and I'd be like and I would have to

1735.919

remote in. So, um I think from the data

1739.679

I want to jump to the database one

1740.799

because we really didn't talk about

1741.76

this. Not only do you need to back up

1744

your database but restore it, do an

1746.88

actual test of it. Restore it and make

1749.36

sure you can connect to your database.

1751.12

There are a lot of times uh I actually

1752.96

recently had an issue where I had lost a

1756.08

bunch of databases because I got a

1758

corrupted server, database server, a lot

1760.96

of them. And when I brought them back,

1763.919

one of the things that I'd forgotten was

1765.36

I had a lot of users that I had to

1767.52

create in order to actually deal with

1770

those applications. So the applications

1772.159

could deal with data in this case. Uh

1774.32

and there's also particular you're going

1776.08

to run into things if you're in uh I

1777.919

know SQL server does this a lot. I don't

1779.52

know. I don't think Oracle does, but

1781.679

pick on SQL Server where their IDs are

1784.32

GUIDs. And so you need to make sure when

1786.559

you bring stuff in that it's not

1788.64

regenerating UUIDs, like, for records because

1792.799

if your primary keys get broken and

1794.64

you've got related foreign keys, guess

1796.64

what? Those are going to be broken too.
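
The restore test described here can be sketched in a few lines. This is only an illustration, not anything shown on the episode: it uses Python's built-in sqlite3 module as a stand-in for whatever database server you actually run, and the function names (verify_restore, make_sample_db), the tables, and the assumption that every table has an "id" primary key column are all hypothetical.

```python
import sqlite3
import uuid


def verify_restore(original, restored, tables):
    """Return a list of problems found in the restored copy (empty list = passed)."""
    problems = []
    # Can we connect and read at all, and is the file structurally sound?
    if restored.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
        problems.append("integrity_check failed")
    for table in tables:
        # Table names come from our own list, never from user input.
        query = f"SELECT id FROM {table} ORDER BY id"
        if original.execute(query).fetchall() != restored.execute(query).fetchall():
            problems.append(f"{table}: primary keys differ after restore")
    # Rows whose foreign key no longer points at an existing parent row.
    orphans = restored.execute("PRAGMA foreign_key_check").fetchall()
    if orphans:
        problems.append(f"{len(orphans)} rows with broken foreign keys")
    return problems


def make_sample_db():
    """Build a tiny in-memory database with GUID-style keys (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, "
        "customer_id TEXT REFERENCES customers(id))"
    )
    customer_id = str(uuid.uuid4())
    conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, "Acme"))
    conn.execute("INSERT INTO orders VALUES (?, ?)", (str(uuid.uuid4()), customer_id))
    conn.commit()
    return conn


if __name__ == "__main__":
    source = make_sample_db()
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # stand-in for "restore the backup somewhere else"
    print(verify_restore(source, restored, ["customers", "orders"]))
```

On a real server the same idea applies with that server's own tools. It is also worth adding a check that the application's database users exist after the restore, since that is the piece that got forgotten in the story above.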

1798.32

1799.84

Test your disaster recovery. Uh, the

1802.64

simplest thing I would say

1805.279

uh for your next sprint if you are not

1807.76

backing up your source code and your

1810.399

database

1812.159

separately and putting them somewhere

1814.799

that's on a different machine do that.

1816.88

It's a very easy script to write. You

1818.399

could have an AI engine write one

1820.32

for you. Whether you want to do it in

1822

pick the language you want it written

1823.36

in, it will do it and it'll get you

1825.6

something close enough, take you

1827.52

probably 15, 30 minutes tops to have a

1830

nice little backup script for your

1832.159

stuff.
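
As a rough idea of what such a backup script might look like, here is a minimal sketch. It assumes a SQLite database purely so the example is self-contained (for another database you would swap in that system's own dump tool), and it assumes dest_root is a path that lives on a different machine, such as a mounted network share. The function names and the directory layout are made up for illustration.

```python
import shutil
import sqlite3
import time
from pathlib import Path


def backup_source(source_dir: Path, dest_dir: Path, stamp: str) -> Path:
    """Archive the source tree as a timestamped .tar.gz inside dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    base_name = dest_dir / f"source-{stamp}"
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=str(source_dir))
    return Path(archive)


def backup_database(db_path: Path, dest_dir: Path, stamp: str) -> Path:
    """Copy a SQLite database using the online backup API.

    Assumption: SQLite stands in here for your real database's dump command.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"database-{stamp}.sqlite3"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target


def run_backup(source_dir, db_path, dest_root):
    """Back up source code and database separately under dest_root."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest_root = Path(dest_root)
    return (
        backup_source(Path(source_dir), dest_root / "source", stamp),
        backup_database(Path(db_path), dest_root / "database", stamp),
    )
```

Calling run_backup("my_app", "my_app.sqlite3", "/mnt/backup-host") from a scheduled job would be one way to use it; those paths are placeholders. Keeping the source archive and the database copy in separate folders matches the advice above to back them up separately.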

1833.76

We have not backed this up, so we're

1835.84

going to go do this right away because

1837.2

we never know what happens to our our

1840.159

episodes. I jest a little bit because I

1843.6

don't think I've ever actually lost an

1845.36

episode. I've always hit record in time

1848

and so knocking on some not actual wood

1851.6

but fake wood. Um hopefully that does

1854.64

not happen this time. We will be back.

1856.799

We're going to continue this. We've got

1858.48

plenty of episodes left. We're like I

1860.159

don't know only halfway through the

1861.52

season or something like that. So we've

1863.12

got a lot more artificial intelligence

1865.36

ahead and all of the shenanigans that it

1867.84

causes. So go out there and have

1869.2

yourself a good one. Thanks for

1870.88

watching. We will talk to you next time.

1875.59

[Music]