Whether you work with Windows, Linux, or Mac, at some point in your career you will have to deal with a system outage.  This is especially true if you are using an Amazon EC2 instance, because many EC2 instances, particularly the smaller instance types, run on limited resources.  That makes it very easy to bring down your server by misconfiguring an application or running too many processes at once.  While your first gut reaction will be to panic, don’t.  In most cases, the problem can be solved easily once you know what to look for.  So, let’s start by looking at the most common EC2 problems and how to address them.

Some common EC2 problems

 

System running out of memory

Typically, memory issues show up as a system that is sluggish or unresponsive.  Start by checking your available memory and then see which processes are running.  For instance, you can use the Linux command top to see real-time information on the tasks, memory, CPU, and swap currently in use.

  • Some key switches for the top command include:
    • -h  Show usage help (and, on older versions, the version), then quit
    • -c  Toggle the command column between the full command line and the program name
    • -d  Specify the delay time between refreshing the screen
    • -o  Sorts by the designated field
    • -p  Only show processes with specified process IDs
    • -u  Show only processes by the specified user
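When the system is too sluggish for the interactive display, a one-off snapshot can be easier to work with. The following is a minimal sketch, assuming the procps-ng version of top found on most Linux distributions; the function name top_by_memory is only an illustration.

```shell
# Print a one-off snapshot of the processes using the most memory.
# Assumes procps-ng top; other implementations may not support -o.
top_by_memory() {
    count="${1:-10}"
    # -b: batch (non-interactive) mode, -n 1: exit after one refresh,
    # -o %MEM: sort by the memory column.
    # By default, the first 7 lines of batch output are the summary
    # area and the column titles, so they are kept as well.
    top -b -n 1 -o %MEM | head -n "$((count + 7))"
}
```

For example, top_by_memory 5 would print the summary area followed by the five processes using the most memory.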

If you are using swap files, make sure they are still present and in use.  Sometimes installing updates or rebooting a system can leave a swap file deactivated, so it’s always a good idea to check your swap status and usage.  If everything looks good with the swap file but you are still experiencing memory issues, you may need to increase its size.

Here are some commands to check on the status and usage of your swap devices:

  • cat /proc/meminfo to see total swap, and free swap (all Linux)
  • cat /proc/swaps to see what swap devices are in use (all Linux)
  • swapon -s to see swap devices and sizes (where swapon is installed)
  • vmstat for current virtual memory statistics
  • Click here to see how to increase your swap memory.
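If the checks above show that swap is too small, one common way to add more is to create and enable an additional swap file. This is only a sketch under stated assumptions: the function name add_swap_file, the RUN variable, the default path /swapfile2, and the 1024 MB size are all hypothetical example values, and the commands need root privileges.

```shell
# Create and enable an additional swap file.
# The default path and size are example values, not recommendations.
add_swap_file() {
    swap_path="${1:-/swapfile2}"
    size_mb="${2:-1024}"
    # Set RUN=echo to preview the commands instead of running them with sudo.
    run="${RUN:-sudo}"
    $run dd if=/dev/zero of="$swap_path" bs=1M count="$size_mb" &&
    $run chmod 600 "$swap_path" &&   # swap files should not be world-readable
    $run mkswap "$swap_path" &&      # write the swap signature
    $run swapon "$swap_path"         # start using it immediately
}
```

Running RUN=echo add_swap_file /swapfile2 2048 first prints the four commands without touching the system. A swap file enabled this way lasts only until the next reboot unless it is also listed in /etc/fstab.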

Cannot start service because it is locked

Suppose you log on to your EC2 instance and notice that all of your services have stopped.  Yet when you try to restart them, you receive a message like “[service name] is dead but subsys locked.”  In most cases, this occurs because an update or a process failed to execute properly, so the system was unable to release a service or resource correctly.

In most cases, this is easy to rectify in just a few steps.  For example, to fix a locked mysqld process, you could try the following:

  • Start by copying the lock file to a temporary backup folder.
> cp /var/lock/subsys/mysqld /tmp/mysqld
  • Once you create a backup, go ahead and delete the lock file from the subsys folder.
> rm /var/lock/subsys/mysqld
  • Make sure that you stop all services that depend on the service you are trying to start.  For instance, if you are running an Apache server with WordPress installed, then you will need to stop the httpd service.
> service httpd stop
  • Finally, restart your services.  Start with mysqld, then start up the services that depend on mysqld.
> service mysqld restart
> service httpd restart
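The steps above can be collected into a small helper so they run in the same order every time. This is a sketch rather than a definitive fix: the function name clear_stale_lock and the RUN and LOCK_DIR variables are hypothetical, and it assumes the lock file lives in /var/lock/subsys as in the example.

```shell
# Back up and remove a stale subsys lock, then restart the service
# and anything that depends on it.
# Usage: clear_stale_lock SERVICE [DEPENDENT_SERVICE...]
clear_stale_lock() {
    [ "$#" -ge 1 ] || { echo "usage: clear_stale_lock SERVICE [DEPENDENT...]" >&2; return 2; }
    service_name="$1"
    shift
    lock_file="${LOCK_DIR:-/var/lock/subsys}/$service_name"
    # Set RUN=echo to preview the commands instead of running them with sudo.
    run="${RUN:-sudo}"
    if [ ! -e "$lock_file" ]; then
        echo "no lock file at $lock_file" >&2
        return 1
    fi
    $run cp "$lock_file" "/tmp/$service_name.lock.bak" || return 1
    $run rm "$lock_file" || return 1
    for dependent in "$@"; do
        $run service "$dependent" stop
    done
    $run service "$service_name" restart || return 1
    for dependent in "$@"; do
        $run service "$dependent" start
    done
}
```

For the WordPress example, RUN=echo clear_stale_lock mysqld httpd previews the sequence, and dropping RUN=echo executes it.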

If this does not fix your problem, start looking at the logs to see if you can determine what caused the lock.  Then search online for the error message or the lock message to see what other solutions may apply.


How to troubleshoot common EC2 problems

Check the logs

Most operating systems and applications have some form of logging.  These log files typically contain error messages or debugging information that developers use to work out what went wrong.  On Linux, most system and application log files are stored in /var/log, although this is not always the case.  Some third-party applications store their logs elsewhere, so if the log you are looking for is not there, you will need to search for it.

For example, piping the output of the Linux find command into grep is often the easiest way to find files on a system.  This combination also gives you the full power of regular expressions for matching.  Note that grep expects a regular expression rather than a shell wildcard, so the pattern below uses log$ to match any path ending in “log”.

> sudo find . -print | grep -i 'log$'
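As an alternative to piping into grep, find can do the name matching itself with a shell-style wildcard. The sketch below assumes a find that supports -iname (GNU, BSD, and BusyBox versions do); the function name find_logs is hypothetical.

```shell
# List regular files whose names end in .log (case-insensitive)
# under the given directories, defaulting to the current directory.
find_logs() {
    [ "$#" -gt 0 ] || set -- .
    # Permission errors are hidden; rerun as root to search protected folders.
    find "$@" -type f -iname '*.log' 2>/dev/null
}
```

For example, find_logs /var/log /opt would search both directory trees in one pass.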

Some log files are stored in protected folders or have strict permissions.  In these cases, you will need to run your commands as root with sudo.  Once you have found your log files, it may also be a good idea to add a symbolic link (shortcut) to them from your home folder for easy access.

> ln -s /var/log ./log

This command creates a symbolic link called log in your current folder that points to /var/log.
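Once you have located a log, scanning its most recent lines for obvious failures is usually the quickest first check. The following is a minimal sketch; the function name recent_errors and the keyword list are assumptions, and the words an application actually logs may differ.

```shell
# Show lines mentioning errors or failures among the last N lines of a log.
# Usage: recent_errors LOG_FILE [LINES]
recent_errors() {
    log_file="$1"
    lines="${2:-200}"
    # -i ignores case, -E enables the alternation in the pattern.
    tail -n "$lines" "$log_file" | grep -iE 'error|fail|fatal'
}
```

The general system log is commonly /var/log/messages on Red Hat style systems and /var/log/syslog on Debian style systems, so recent_errors /var/log/messages 500 is a typical call, run from a root shell if the file is protected.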


Go to a backup

One of the largest causes of system issues is user error or a failed system update.  In either case, your system may be in a state that just cannot be fixed.  Don’t panic.  If you have solid backup policies, you have a good chance of recovering from the mistake.  If you don’t, then you may have to panic.  Without a backup, you may have to rebuild your instance or reinstall some software.  So it is always a good idea to back up your system or applications before making changes to them.

For example, to back up a MySQL database, you can run the following command:

> mysqldump -u user database_name > dbbackup.sql

Depending on your database permissions, you may need to add the -p option so that mysqldump prompts for a password.

Backup files can become large, so make sure you compress them with tar and gzip:

> tar czvf dbbackup.tgz dbbackup.sql
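It can help to put the date in the archive name and to confirm that the archive is readable before relying on it. This is a sketch under assumptions: the function name archive_backup and the naming scheme are hypothetical.

```shell
# Compress a dump into a date-stamped .tgz next to the original file
# and print the archive path once it has been verified.
archive_backup() {
    source_file="$1"
    archive="${source_file%.sql}-$(date +%Y%m%d).tgz"
    # -C stores only the file name in the archive, not its full path.
    tar czf "$archive" -C "$(dirname "$source_file")" "$(basename "$source_file")" &&
    tar tzf "$archive" >/dev/null &&   # listing the contents checks readability
    echo "$archive"
}
```

Calling archive_backup dbbackup.sql would produce a file such as dbbackup-20240131.tgz in the same folder, with the date taken from the system clock.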

In most cases, you are not going to want to do this manually.  So consider putting all your backup commands into a single script file, which you can then schedule to run at designated times.
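A script along these lines might look like the sketch below. It assumes the MySQL credentials are stored in an option file such as ~/.my.cnf so that no password prompt appears when the job runs unattended; the function name backup_database, the MYSQLDUMP variable, and the default backup folder are hypothetical.

```shell
#!/bin/sh
# Dump one database to a timestamped, gzip-compressed file
# and print the path of the finished backup.
# Usage: backup_database DB_NAME [BACKUP_DIR]
backup_database() {
    db_name="$1"
    backup_dir="${2:-$HOME/backups}"
    # MYSQLDUMP can point at a different dump program if needed.
    dump_cmd="${MYSQLDUMP:-mysqldump}"
    out_file="$backup_dir/$db_name-$(date +%Y%m%d-%H%M%S).sql"
    mkdir -p "$backup_dir" || return 1
    if "$dump_cmd" "$db_name" > "$out_file"; then
        gzip "$out_file" && echo "$out_file.gz"
    else
        # Do not leave a partial dump behind if the export failed.
        rm -f "$out_file"
        return 1
    fi
}
```

Saved as a script that ends with a call such as backup_database database_name, it can be scheduled with cron. A crontab entry like 0 2 * * * /home/ec2-user/backup_db.sh (a hypothetical path) would run it every day at 02:00.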


Further Reading

These are just a few of the common EC2 problems you will encounter and some of the most common ways to address them.  However, if you find yourself in a situation that doesn’t fit these, consider checking out the following resources.  They are an excellent source of additional help on troubleshooting your server.

 
