Summary
In this episode, we discuss the challenges of managing hardware resource issues in software development. We explore the importance of considering system resources when dealing with large datasets and the impact of moving data between systems. We also discuss why, in a distributed environment, you need to understand how a code change affects the rest of the ecosystem.
Detailed Notes
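The hosts, Rob Wright and Michael Melasch, share their experiences with hardware resource problems that surface as software bugs. Rob describes what happens when an application outgrows physical memory and starts swapping to disk, how impatient users re-running a slow report can compound the problem, and how he cut an ACH file-processing job from hours to under 30 seconds by loading the file into a relational database with temporary tables and indexes. Michael describes a healthcare integration built on Rhapsody that crashed daily because the machine was short on CPU, memory, network throughput, and disk space, and a container that kept failing at 16 gigs of memory because it actually needed more CPU. They close with quick warnings about runaway log files, debug flags left on in production, and relying on CDN-hosted assets.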
Highlights
- Debugging is not just about code; it also means understanding hardware and system resources and how they interact.
- When dealing with large datasets, it's essential to consider the hardware and system resources that will be required.
- Moving data between systems can be challenging, and it's crucial to consider the impact on system resources.
- In a distributed environment, it's essential to understand how your code change will affect the ecosystem.
- Don't assume your app will always run in a connected state; if it depends on a CDN or another external site and that site goes down, your app goes down with it.
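One way to handle a large fixed-length file without holding it all in application memory is the approach described in the episode: stage the records in a database table, index them, and let set-based queries do the work. The sketch below is only an illustration under stated assumptions. It uses Python's built-in `sqlite3` as a stand-in for the MySQL or Postgres import mentioned in the transcript, and the record layout (10-character account, 8-character amount in cents) and the function names `parse_fixed_width` and `total_by_account` are hypothetical, not details from the show.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple


def parse_fixed_width(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (account, amount_cents) one record at a time.

    Hypothetical layout: columns 0-9 are the account, columns 10-17 the amount.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield line[0:10].strip(), int(line[10:18])


def total_by_account(lines: Iterable[str]) -> Dict[str, int]:
    """Stage records in a table, index them, and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database server
    try:
        conn.execute("CREATE TABLE staging (account TEXT, amount_cents INTEGER)")
        # executemany consumes the generator lazily, so the file is never
        # held in a Python list all at once.
        conn.executemany("INSERT INTO staging VALUES (?, ?)", parse_fixed_width(lines))
        # Index after the bulk load so lookups by account avoid table scans.
        conn.execute("CREATE INDEX idx_staging_account ON staging (account)")
        rows = conn.execute(
            "SELECT account, SUM(amount_cents) FROM staging "
            "GROUP BY account ORDER BY account"
        ).fetchall()
        return dict(rows)
    finally:
        conn.close()
```

In practice `lines` would be an open file object, for example `with open(path) as f: total_by_account(f)`, so records stream through one at a time.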
Key Takeaways
- Consider system resources when dealing with large datasets.
- Understand how your code change will affect the ecosystem in a distributed environment.
- Don't assume your app will always run in a connected state.
- Consider the impact of moving data between systems on system resources.
- Use profilers to monitor code and identify memory leaks.
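As a rough illustration of the profiler takeaway, the sketch below uses Python's standard `tracemalloc` module to compare memory before and after a batch of calls. The helper `memory_growth` and the deliberately leaky `leaky_handler` are hypothetical names made up for this example. A real investigation would more likely attach a dedicated profiler to the running service.

```python
import tracemalloc
from typing import Callable, List, Tuple


def memory_growth(work: Callable[[int], None], calls: int) -> Tuple[int, List[str]]:
    """Run `work` repeatedly and report how much traced memory was retained.

    Returns the net growth in bytes and the three source lines that grew most.
    """
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        for i in range(calls):
            work(i)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to sorts differences with the largest changes first.
    diffs = after.compare_to(before, "lineno")
    total = sum(d.size_diff for d in diffs)
    return total, [str(d) for d in diffs[:3]]


_retained: List[bytearray] = []  # hypothetical cache that is never evicted


def leaky_handler(request_id: int) -> None:
    """Simulates a transaction that keeps about 10 KB alive on every call."""
    _retained.append(bytearray(10_000))


# Example use (left as a comment so importing this module has no side effects):
#   total, top = memory_growth(leaky_handler, 100)
#   print(total, top)
```

If the growth keeps climbing as the call count rises, the top entries point at the lines that are holding on to memory.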
Practical Lessons
- Rewrite code to improve efficiency and reduce resource usage.
- Use caching and other techniques to reduce the load on system resources.
- Consider moving heavy data processing into a relational database (bulk import, temporary tables, indexes) instead of holding everything in application memory.
- Regularly monitor system resources and adjust code as needed.
- Test code in different environments to ensure it will work in production.
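Two of the resource problems raised in the episode, running out of disk space and log files that grow without limit, can be guarded against with a few lines of standard-library Python. This is a minimal sketch, not a recommendation from the hosts. The function names, the `APP_DEBUG` environment variable, the logger name `app`, and the 5 MB rotation size are all assumptions chosen for illustration.

```python
import logging
import os
import shutil
from logging.handlers import RotatingFileHandler
from typing import Optional


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


def build_logger(log_path: str, debug: Optional[bool] = None) -> logging.Logger:
    """Create a size-capped logger; debug output is opt-in.

    If `debug` is None, the hypothetical APP_DEBUG environment variable
    decides, so production stays at INFO unless someone turns debug on.
    """
    if debug is None:
        debug = os.environ.get("APP_DEBUG", "0") == "1"
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        # Rotate at 5 MB and keep 3 old files, so logs cannot fill the disk.
        handler = RotatingFileHandler(log_path, maxBytes=5 * 1024 * 1024, backupCount=3)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

A typical use would be to call `has_free_space` before accepting an upload or writing a temporary file, and to call `build_logger` once at startup.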
Strong Lines
- Debugging is not just about code; it also means understanding hardware and system resources and how they interact.
- When dealing with large datasets, it's essential to consider the hardware and system resources that will be required.
- Moving data between systems can be challenging, and it's crucial to consider the impact on system resources.
Blog Post Angles
- The importance of considering system resources in software development.
- Real-world examples of managing hardware resource issues in software development.
- Practical advice for developers on how to improve efficiency and reduce resource usage.
- The benefits of using caching and other techniques to reduce the load on system resources.
- The challenges of debugging hardware issues in software development.
Keywords
- software development
- hardware resource issues
- system resources
- distributed environment
- ecosystem
Transcript Text
Welcome to Building Better Developers, the Develpreneur podcast, where we work on getting better step by step, professionally and personally. Let's get started.

Well, hello and welcome back. We are... sorry, I didn't mean to smack too close to the mic, but that happens sometimes. We are back with Building Better Developers, yet another episode where we're talking about stuff that we've run into. This episode, some of it's very current for Michael, and some of it's not so much for me, but these are things that I've hit in the past. Really, it gets into debugging when it's not just code: it's hardware issues, and it literally is memory and resource issues and things like that. You may not run into these as much, depending on what your applications are, because a lot of stuff plays well within the norms. But if you start doing big data, if you start doing data processing, if you've got an app that's got a lot of users or a lot of processing or a lot of data, anything where there's a lot going on, things can happen. So we're going to talk about that.

First, I'm going to introduce myself. My name is Rob Wright and I'm one of the founders of Develpreneur. I'm also a founder of RB Consulting, which is all about simplifying, integrating, and automating your software and your solutions to make your business run smoother. On the other side, this time I'm not even going to mention his name. I'm going to let him do the whole introduction. So go for it, sir.

Hey everyone. My name is Michael Melasch. I'm another co-founder of Develpreneur. I'm also the founder of Envision QA, where we build custom software for small to mid-sized businesses and healthcare clinicians. We also offer consulting services and testing services for your needs.

Awesome.
Now we're going to get right into one of those cases where you may need some of those testing services, and particularly it is with the big behemoth of stuff that we can sometimes run into in the software world: the hardware stuff, the resources related to processing and the ones related to storage and memory. Now, the fun ones are when storage and memory cross lines, and this is where you see some really neat stuff. I don't know if they're necessarily red flags, but we're going to give you some notes for when you are in a testing situation, or in a "hey, it works on my machine" situation, or it usually works and now it doesn't, and the code should be the same, and you get a sense that it doesn't make sense because it's not a different system. Here are some things to go look at. Maybe there's something going on that's a little deeper, where you're going to want some more detailed logging, particularly from the operating system side, the platform side, to see what is actually going on.

Now, the mother of all break-your-application things happens when you get past physical memory and the system starts to swap stuff out to disk, because physical memory is really fast: it's like boom, boom, boom, boom, boom, boom.
Now, it's gotten better because SSDs are very fast, and there's often a lot of intelligence built around those things so that they keep the most recent data up front. So if you're swapping your system memory out to an SSD or some really high-speed drive, you may not notice the difference as much. But definitely, if you're old school and you go to one of the old disk-driven hard drives, the processing speed goes from microseconds to multiple seconds. I don't know how many times I've seen it: let's say, just picking a number, we have two gigs of memory on our machine, and everything was awesome until we grew beyond two gigs. As soon as it did that, it had to swap stuff out, and now you have the swap cost. Now it's pushing stuff out to that hard drive and then having to pull stuff back, and that's adding processing time, and then it just sort of feeds on top of itself.
Now, in the real world, what you will often see is that it really spirals out of control. You have people pulling reports, and they expect, let's say, just picking a number, that every 10 seconds they should be able to get the report back and see their data. What happens is people get impatient. So when it triggers that, and it goes from 10 seconds to, say, 15 seconds, people hit "report" again. There are times I've seen, if you don't have intelligence around that, that I've just created eight instances of trying to run this report that is doing table scans. If you don't know what a table scan is, go look it up, and realize that's something that you should never do. The next thing you know, your time has gone from 10 seconds to 10 hours, and everybody freaks out, everybody panics.

That's somewhat reasonable, but it's also like, hey, just chill. All we need to do is make sure that we don't hit that limit, or sometimes it's just a matter of cleaning up our queries a little bit, or what we cache, or what we store. Now, sometimes that's harder than others. Sometimes the easiest way is to just slap a bunch of hardware on it and say, boom, we're going to get more memory, we're going to get more processors, and we'll take care of it that way. But honestly, it is safer, better, and more reliable to fix it in the software, in the code itself.

Now, sometimes that means rewriting. So I'm going to give one more thing that I've run into in the past before I toss it back over to that guy I didn't name earlier. Sometimes you want to change, and this goes into being a better developer, your approach or your format or what you're doing completely. It could be the framework; it could even be the foundation of what you're doing. A good example of this for me is I was working for a company many years ago
and they were doing these very large ACH files, which are basically just a lot of fixed-length data coming across. We're talking hundreds of thousands of records. I originally had a nice little job, an application that would just pull this thing in, spin through it really fast, and kick it back out, and it did what it needed. I was throwing it out to a database; it just stored it out to the database. Cool, and it was doing well. Well, once it got to a certain size, what happened is it was sucking too much of the file in, and then it ran out of memory, and then we started running into problems.

The first thing I did was say, okay, let's try to do this differently, so it was pulling a line at a time. We're not going to have the whole file in; we're just going to have a pointer and move through. We still ended up with problems, because what we had to do was actually read data and then refer to other data, so there were a lot of relational things that we had to deal with within the data. So that, to me, led to: what if we try a relational database? What I ended up finding out, and this was just using MySQL, I think it was MySQL, it may have been Postgres, it was one of those, was that I could use their import. It was a stored procedure that allowed you to read from a file.

So what I did is I would read from the file, create a temporary table, shovel that stuff into the table, do a bunch of manipulations with it, and then peel it back out. I think I was even creating indexes on the fly, so I would pull it in, index certain things, basically do some mass updates, and then pull it back out. I went from hours to less than 30 seconds to run through this stuff. And it was because the tool that I had originally thought I was going to use worked fine to a certain extent, and then it was limited
after that. It wasn't going to work, and I ended up rewriting it, and it went a lot better. The moral of the story is: make sure that you don't get stuck in a single technology or a single solution, or just insist that whatever you're writing is the way to do it, until you verify that. So that's my little nugget for the day. Let's toss this back over to you.

So I'm going to take you in a different direction. You talked more single-system based: writing your code, doing the database side of things, moving the file import into the database. So it's kind of a paradigm shift, or moving the load, essentially, to a different location. Now, in today's world we hear a lot about distributed systems. We hear a lot about micro applications, micro systems. As we break things down into their smaller components and break these out into different applications, APIs, deployments, S3 with Amazon, message queues, there are a whole lot of different systems out there that we have to integrate with, and that can actually mask a lot of the problems that we potentially have, either with our code or with the configurations we have for these systems that are deployed. It's rather interesting.

So I'm going to tell you a story of a situation I was in a couple of years ago. With the application I inherited when I became manager of this team, I found out that our core application was essentially on fire. It literally went down once a day, or two to three times a day. This is a healthcare application; it's actually pulling healthcare information in from external systems into our system, so this could actually impact patients' lives. So one, we had to figure out why it was going down so much, and the very first thing we had to look at was: is it our code? Well, the system that was crashing was called Rhapsody, and we found out that it's essentially just a message queue that's pulling
information in and passing it over to us. Yes, it did some things on its side, but it was essentially being used as a pass-through. So as we were looking at that, we literally took out all of the logic and just moved message in, message out, so we would just pass it over. It still kept crashing. So we literally moved it down to one process, so it did one thing. We were still going down because of volume.

So the other thing we looked at was: okay, what is our network threshold? Are we just having too much information coming in, a big pipe over here and a small pipe over there? That was part of it, so we increased the output flow through the network so we had more data going out than we did initially. Then we found out that we were running on a machine that had a two-core processor and maybe four gigs of RAM, and it was nowhere near enough to handle the amount of data that we were consuming. The other thing that no one happened to think about was how much disk space we had. Because of the large volume that was coming in, we were actually burning through our disk space on top of the memory issues, on top of the CPU issues. So it's not always your application. It could be that the system this was built for has aged out to the point that you need to upgrade the hardware that's the backbone of the system.

So that was one story. The second story is in a more distributed environment. With AWS, we have lots of applications all over the place, and with the micro-system approach you typically want to break your application up to be very small and consume a very small amount of resources, so it'll run on a very small build, like an AWS EC2 instance or a Windows instance or something like that. The smaller the instance you can deploy to, the less money it's going to cost to keep that application up and running. The problem with that, potentially, is you could scale very small and your application works initially, but then again you could
run into that situation like we did with the Rhapsody system, where the load of the application has essentially outgrown its hardware; it's outgrown the container that it's being deployed to. Recently I ran into a situation where, to address this, they were essentially just kicking up the memory of our container, thinking that we needed more memory to keep things happy. Well, at 16 gigs we were still going down for one particular transaction that goes through.

So, doing some research, like we've talked about before, this is one of those situations where the customer does one thing and it breaks, and we as developers do literally the same thing and it works. So now it's like, okay, what is different between the two environments? Well, as developers we typically have these supercomputers, or very beefy machines, that can do just about anything we want them to do. But again, we're deploying to containers that are meant to be very small, with a lot fewer resources, so they cost less money, ideally. What I found out was we had underpowered our container to the point that we were doing so many transactions, so much work behind the scenes, that we essentially just needed more CPUs, not more memory. So this is one of those situations where you need to understand what the CPU is used for versus the amount of memory. Yes, we had a lot of data in memory, but we were doing a lot of transactions with it, so you actually need more CPU power than memory, because if you try to keep all that in memory, you're going to get an imbalance and things are going to crash.

However, you could potentially still have a memory leak. So what you're going to want to do is get things like profilers to monitor your code as it's running, and then look for things like heaps or objects in memory, or memory spikes when you're doing transactions within your code. So you can not just check the physical hardware, but you still can
check your code to make sure that you don't have runaway code, that you don't have something like an infinite loop that's just eating up memory, and you never know what's happening.

So I'll give you two little quick ones too that you reminded me of as you were going through this. One is hard drive space. Back in the day, back in the old days, you used to always find code somewhere along the way that was asking, hey, is there enough hard drive space? I remember back in the old 386 DOS days, one of the first things every little app would do was check: do you have enough hard drive space? That still exists, but not to the same level. In particular with server applications, there are a lot of times they don't think about it because they don't care. They're like, hey, I'm just going to run. But things like log files in particular can do it. I've found several times that those got huge and sucked up all of the drive. I actually had a little miniature version of the same thing: I had a little VM app, a small EC2 app, and it blew up because the log got too big and it crashed the whole system.

I've also seen it when you've got upload files. If you have the ability for users to upload stuff, if they can upload huge images or videos, then you need to make sure there is a constant check, in some way, shape, or form, that you have enough space if you store it locally, but also, if you temporarily store it locally, that those files are getting cleaned out. I've seen some situations where it's using S3 or something like that, so it's moving stuff off, but sometimes it doesn't move it off fast enough or cleanly enough, and the next thing you know, you've blown that up. The other thing that you can run into... I just forgot. So it's log
files, and I've now forgotten the other one that I have run into. Oh, I'm sorry, the other one is the debugger itself. I have run into situations where if you put something into debug mode, it causes issues in itself, because it's eating up extra processing and extra time. I have actually seen situations where the application goes into production and it breaks because somebody left the debug flag on, or it's not usable because they've got it in debug mode. So when you push to production, clean it up. Shut that stuff off. Make sure that you've got your settings such that it's not just dumping a whole bunch of debug information to a drive somewhere if you don't need it. Don't be afraid to have some flags or settings so that it only does that when you want it to, which is basically when you as a developer want to see all that stuff. Nobody else does. Final thoughts on that?

Yeah. The other things, we kind of talked about them, but we didn't really address the resource side. In this distributed world we live in, we have things like databases, we have in-memory databases, we have message queues, we have S3 for external data sources. As developers, as we write our applications, unfortunately we're kind of siloed. We see our picture, the problem that we're solving, and we know we have these resources and we use them, but we don't consciously take into account: what is my code change going to do to the rest of the ecosystem? So typically you kind of throw it over the wall and let someone else deal with it, and you don't think about it. But sometimes you really need to understand how that ecosystem works, so that as you're writing your code, if something does happen, you have a better idea of where to go within the structure to debug the problem. Also, you don't want to fill up someone's database by leaving those log files on and make someone else's day miserable because
they're not necessarily in your department. You don't really want to make your sysops guys mad.

So one other bonus that I just thought of while you were going through that: don't assume, unless you can, that your app is always going to run in a connected state. In particular, early on I would do some development stuff and have some things break periodically, and I'd be like, wow, that website I'm bringing up looks horrible. It was because, instead of having some of those files, say CSS files and things like that, local, they were on some CDN somewhere else and the page was pulling them down. So if I wasn't connected, neither were those, and you would suddenly have libraries and stuff like that that are not available because you're normally pulling them off the internet. This applies even if you're a web application and you assume that the only way people are going to get to it is if the internet exists and it's working. I won't say don't be afraid to; I'll just say the wise man pulls those things locally, because if that CDN goes down, or that site that you're borrowing from, connecting to, or relying on goes down, you are down. So the more you can bring that stuff in, so that if your site's up it's up and if it's not it's not, and the fewer other places you're relying on, the better off you're going to be.

However, if you rely on us, double down on it. Rely on us, check us out, like, subscribe six different ways. Subscribe on your phone, on your laptop, on your watch, whatever it is, because we're going to give you good stuff. And if we don't, that's your fault, because you didn't tell us at info@develpreneur.com, or go to develpreneur.com, fill out a contact form
and let us know what the good stuff is that you want us to talk about. How can we help you out? Because we're happy to help you out this way too. As you can see, we are an endless well of mistakes and issues and all kinds of problems you can run into, but we might have a couple that, you know... we'd be more than happy for you to suggest something so that we can make sure it's much more applicable to you. Yes, you, right there. You're the one I'm talking to, not the one next to you. Don't look behind you. You're the one that can help us out, and then we get to talk to you specifically and say, here's your problem. We may even use your name if you want us to, or we'll use a really cool fake name if you want us to do that. Those are just some of the services we have to offer.

That being said, it is time to wrap this one up. We've got to go, we've got to run, and so do you. So go out there and have yourself a great day, a great week, and we will talk to you next time.

Thank you for listening to Building Better Developers, the Develpreneur podcast. You can subscribe on Apple Podcasts, Stitcher, Amazon, anywhere that you can find podcasts. We are there. And remember, just a little bit of effort every day ends up adding up to great momentum and great success.