Amazon S3 is commonly used to provide persistent storage for user-uploaded content in web and mobile applications, given its outstanding reliability, availability, and scalability. However, unrestricted file uploads present significant risks, in particular malware distribution, uncontrolled storage consumption, and attacks on the systems that process the uploaded files.
To mitigate these risks, consider the following approach.
Traditional server-based applications receive user-provided content and handle its transfer to a persistent data store, potentially processing it along the way. Instead, allow the application frontend, usually a web or mobile application, to upload directly to Amazon S3. This enables fast, reliable transfers without consuming your backend application's network bandwidth and CPU.
We can allow this direct upload by generating a presigned URL, which encodes the parameters for a successful request, including authorization for a specific action, into the URL itself. This authorization is temporary, with a configurable expiration time, but it can be used from anywhere while still valid. Presigned URLs support two upload methods, HTTP PUT and POST, which appear similar but have a key difference: only POST can enforce restrictions on the uploaded content, such as a maximum file size. For that reason, we're going to use POST URLs.
You can generate a POST URL by calculating the signature yourself, but I don't recommend it; leverage an AWS SDK instead. The response from generate_presigned_post will contain everything needed, notably an AccessKeyId, signature, and base64-encoded policy. To perform the upload, submit a standard HTTP POST request (you don't need an AWS SDK for this step) and include the fields returned with the authorization response as form fields. Generating the URL and performing the upload might look like the sketch below.
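This is a minimal sketch, not a drop-in implementation: the bucket name, object key, and file name are assumptions, and the exact field names in the response depend on the signature version your client is configured to use.

```python
# Backend: generate the presigned POST (bucket and key are assumptions).
import boto3

s3 = boto3.client("s3")

presigned = s3.generate_presigned_post(
    Bucket="example-upload-bucket",
    Key="uploads/user-123/photo.jpg",
    ExpiresIn=300,  # authorization expires after 5 minutes
)

# The response has roughly this shape (values shortened):
# {
#     "url": "https://example-upload-bucket.s3.amazonaws.com/",
#     "fields": {
#         "key": "uploads/user-123/photo.jpg",
#         "AWSAccessKeyId": "AKIA...",
#         "policy": "<base64-encoded POST policy>",
#         "signature": "...",
#     },
# }

# Client: a standard HTTP POST with the returned fields as form fields.
# No AWS SDK is needed for this step.
import requests

with open("photo.jpg", "rb") as f:
    upload = requests.post(
        presigned["url"],
        data=presigned["fields"],  # every returned field must be included
        files={"file": f},         # the file itself goes last
    )
upload.raise_for_status()          # S3 returns 204 No Content on success
```

In practice, the backend returns the presigned response to the web or mobile frontend, which then performs the POST directly from the user's device.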
The frontend should still validate selected files against your restrictions, but you can't trust the frontend to enforce them: the presigned URL can be used from anywhere, even multiple times, while it remains valid. A POST policy ensures that S3 rejects any request that violates the restrictions.
Consider including the following elements in the POST policy.
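As an illustrative sketch, the policy below enforces an exact object key, a specific content type, a file size range, and a short expiration; the specific values and bucket name are assumptions for this example.

```python
import boto3

s3 = boto3.client("s3")

presigned = s3.generate_presigned_post(
    Bucket="example-upload-bucket",                    # assumed bucket
    Key="uploads/user-123/photo.jpg",                  # exact key chosen by the backend
    Fields={"Content-Type": "image/jpeg"},             # sent as a form field...
    Conditions=[
        {"Content-Type": "image/jpeg"},                # ...and enforced by the policy
        ["content-length-range", 1, 5 * 1024 * 1024],  # 1 byte to 5 MB
    ],
    ExpiresIn=300,                                     # short-lived authorization
)
```

boto3 adds the bucket and key to the policy automatically; note that any extra field you pass in Fields generally must also be covered by a condition, or S3 will reject the request.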
You should also consider limiting how much any user can upload. By maintaining a database table of upload requests, you can check against those limits before authorizing new uploads. For example:
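As one possible sketch, assume a DynamoDB table named upload_requests, keyed by user_id, that tracks how many bytes each user has already been authorized to upload; the table, attribute names, and limit are all assumptions.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("upload_requests")  # assumed table, partition key: user_id

UPLOAD_BYTE_LIMIT = 100 * 1024 * 1024      # assumed per-user limit: 100 MB


def authorize_upload(user_id: str, declared_size: int) -> bool:
    """Return True only if this upload keeps the user under their limit."""
    item = table.get_item(Key={"user_id": user_id}).get("Item", {})
    already_authorized = int(item.get("bytes_authorized", 0))

    if already_authorized + declared_size > UPLOAD_BYTE_LIMIT:
        return False

    # Record the authorization before handing out the presigned POST.
    # (A production version would use a conditional update to avoid races.)
    table.update_item(
        Key={"user_id": user_id},
        UpdateExpression="ADD bytes_authorized :size",
        ExpressionAttributeValues={":size": declared_size},
    )
    return True
```

Pair the declared size with a matching content-length-range condition in the POST policy so a user can't upload more than they were authorized for.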
S3 cannot execute the files it stores, so an uploaded piece of malware poses no immediate risk within the bucket itself. However, anything that handles that file downstream could be at risk. Additionally, if your application makes the file retrievable once uploaded, your S3 bucket could be used to distribute malware to other targets.
There are various open-source and vendor-provided solutions for scanning S3 objects for malware; the major differences are in their reporting and response capabilities. Whichever you choose, ensure that detected malware is immediately quarantined and cannot be processed by downstream systems.
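How quarantine is implemented depends on the scanning tool, but one simple approach is to move flagged objects into a locked-down bucket that nothing downstream can read; the bucket names here are assumptions.

```python
import boto3

s3 = boto3.client("s3")

QUARANTINE_BUCKET = "example-quarantine-bucket"  # assumed bucket that denies downstream access


def quarantine_object(bucket: str, key: str) -> None:
    """Move a flagged object out of the upload bucket so it can't be processed or served."""
    s3.copy_object(
        Bucket=QUARANTINE_BUCKET,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3.delete_object(Bucket=bucket, Key=key)
```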
Data processing tasks should auto-scale and run independently of other backend functions so they can meet variable demand without degrading the performance of other components. Amazon S3 Event Notifications provide a straightforward way of triggering this processing when an upload completes. In general, I recommend sending the event notification to an Amazon SQS queue first, then configuring the processing task (e.g., a Fargate task or Lambda function) to pull from that queue. This lets you handle failures and retries in the queue.
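A minimal worker following this pattern might poll the queue, process each referenced object, and delete the message only after success; the queue URL and the process_upload function are assumptions for this sketch.

```python
import json
import urllib.parse

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/upload-events"  # assumed queue


def process_upload(bucket: str, key: str) -> None:
    """Placeholder for your actual processing logic."""
    obj = s3.get_object(Bucket=bucket, Key=key)
    print(f"Processing {key} ({obj['ContentLength']} bytes) from {bucket}")


def poll_once() -> None:
    """Receive a batch of S3 event notifications from SQS and process them."""
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling reduces empty responses
    )
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        # S3 test events have no Records; skip them gracefully.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys in event notifications are URL-encoded.
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            process_upload(bucket, key)
        # Delete only after successful processing; otherwise the message
        # reappears after the visibility timeout and is retried.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```

A dead-letter queue on the SQS queue is a natural complement, keeping repeatedly failing messages from blocking the rest of the pipeline.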
Another major benefit of this pattern is that each uploaded file is processed in an environment isolated from everything else. In the event that a malformed file causes unexpected behavior, it won’t affect other data or processes.
Allowing users to upload files introduces several risks, but we can mitigate them effectively with careful cloud architecture design. Although these concepts aren't new, I still commonly see applications missing one or more of them, leaving potential vulnerabilities. Since each application is unique, I build a threat model for it and consider how these design patterns can mitigate its specific risks.