One architectural challenge that is often overlooked is large file storage. The files may be documents, images, or other binary data. The challenge is also becoming much more common, because applications routinely store user profile pictures, PDF documents, and similar uploads. This area is worth spending time on when you create the software architecture.
Storage and Retrieval
Any time you store data, you have to include the mechanism to retrieve it. In the world of data stores, that often means a way to index metadata or maybe the content. When we look at large file storage, we have several options. There is the old solution of file system storage with a path, or we can look at blobs, cloud storage, and other solutions.
Each of these options has pros and cons to consider as we architect our solution. We also need to think about how to retrieve and restore that data to its original form. This step may be a simple task, or we might have encryption, compression, and file format steps to deal with. For example, we can store a resume as a file. However, we may accept Acrobat, Word, and other formats. We usually cannot keep the original name when we store the file on a file system, because names can collide or contain characters the file system rejects. Does that apply to the extension as well? Whatever we decide, the original name and format have to be recorded somewhere if we want to restore the file as it was uploaded.
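One way to handle the naming question is to store the file under a generated name and keep the original name and extension as metadata. The sketch below is only an illustration under assumptions of my own: the table layout, the function names (open_index, store_file, retrieve_file), and the choice of SQLite as the metadata index are all hypothetical, not a prescribed design.

```python
import sqlite3
import uuid
from pathlib import Path


def open_index(db_path=":memory:"):
    """Open (or create) a small metadata index. The schema is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "stored_name TEXT PRIMARY KEY, "
        "original_name TEXT NOT NULL, "
        "extension TEXT NOT NULL)"
    )
    return conn


def store_file(conn, storage_dir, original_name, data):
    """Write data under a generated name and record the original name."""
    extension = Path(original_name).suffix.lower()
    # A random name avoids collisions and unsafe characters on disk.
    stored_name = uuid.uuid4().hex
    directory = Path(storage_dir)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / stored_name).write_bytes(data)
    conn.execute(
        "INSERT INTO files (stored_name, original_name, extension) "
        "VALUES (?, ?, ?)",
        (stored_name, original_name, extension),
    )
    conn.commit()
    return stored_name


def retrieve_file(conn, storage_dir, stored_name):
    """Return (original_name, extension, data) for a stored file."""
    row = conn.execute(
        "SELECT original_name, extension FROM files WHERE stored_name = ?",
        (stored_name,),
    ).fetchone()
    if row is None:
        raise KeyError(stored_name)
    original_name, extension = row
    data = (Path(storage_dir) / stored_name).read_bytes()
    return original_name, extension, data
```

With this layout, the file on disk never carries the user's file name, yet a download can still be offered under the original name and the extension can drive any format-specific handling.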
Moving Data
One of the essential factors in how you architect a solution to this problem is how often you will need to move data around. If you will only have one server and it can support the needed space, then you might find a file system easier. When you move to multiple servers, cloud storage or a SAN will make more sense. When you need to keep the files within the database so that they move along with it in a single step, then you will need to use blobs.
In my experience, the “store it in the database” approach is often the first thought. However, this is rarely the best approach. The exception usually involves security requirements, although even those can often be addressed while storing data via a file system.
Cleaning Up Your Mess
When you decide to use a file system as storage, either for temporary files or for the long term, you have to consider clean-up. The problem with using a file system for storage is that it does not clean up or re-use space automatically. You will need to create processes that determine what can be removed or re-used and how that clean-up is done. This obstacle is not very different from situations where log files can become large or numerous. However, it is a problem you should keep in mind while architecting your solution.
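A clean-up process can be as simple as a scheduled job that removes files nobody references anymore. The following is a minimal sketch, assuming a flat storage directory and a set of names that are still in use. The function names, the referenced_names parameter, and the age-based rule are hypothetical choices for illustration, and a real policy may need more than a single age threshold.

```python
import time
from pathlib import Path


def find_removable(storage_dir, referenced_names, max_age_seconds, now=None):
    """List files that are unreferenced and older than max_age_seconds."""
    now = time.time() if now is None else now
    removable = []
    for path in Path(storage_dir).iterdir():
        if not path.is_file():
            continue  # skip subdirectories in this flat-layout sketch
        if path.name in referenced_names:
            continue  # still in use, never remove
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            removable.append(path)
    return removable


def clean_up(storage_dir, referenced_names, max_age_seconds, now=None):
    """Delete removable files and return their names, sorted."""
    removed = []
    for path in find_removable(storage_dir, referenced_names,
                               max_age_seconds, now):
        path.unlink()
        removed.append(path.name)
    return sorted(removed)
```

Keeping the selection step (find_removable) separate from the deletion step makes it possible to log or review what would be removed before anything is deleted. The age threshold also gives recently uploaded files time to be registered before they are treated as orphans.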