Requirements
Imagine a somewhat generic file storage mechanism built on top of S3. The storage should allow for the following:
- Storing an immutable set of files
- Listing all files contained in a stored set
- Retrieving individual file contents from a stored set after listing them
The file sets are expected to consist of a large number of files that are predominantly small or moderately sized.
S3 Pricing
For object storage we can generally break down the cost drivers into the following items. Reality is more complex, but it’s a reasonable enough model. The numbers given are for S3 Standard in us-east-1, but they should be similar for other regions and cloud providers. Some items get cheaper with higher usage - for those, the highest tier is assumed.
- PUT, COPY, POST, LIST requests: $0.005 per 1k requests
- GET, SELECT, and other requests: $0.0004 per 1k requests
- Storage: $0.021 per GB stored per month
- Ingress: Free from the same region, free from the public internet
- Egress: Free within the same region, $0.05 per GB to the public internet
Note that the cost difference between the first two items is approximately one order of magnitude, i.e. it costs ~10x as much to write an entry as it does to retrieve one.
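To make the relative magnitudes easy to play with, here is the cost model written out as a minimal Python sketch (prices as listed above; the constant names are just for illustration):

```python
# Rough S3 Standard cost model (us-east-1, highest usage tier), per the list above.
PUT_COST_PER_1K = 0.005    # PUT, COPY, POST, LIST requests (USD per 1,000)
GET_COST_PER_1K = 0.0004   # GET, SELECT and other requests (USD per 1,000)
STORAGE_COST_PER_GB_MONTH = 0.021
EGRESS_COST_PER_GB = 0.05  # to the public internet

# A write-class request costs ~12.5x a read-class request.
print(PUT_COST_PER_1K / GET_COST_PER_1K)  # 12.5
```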
Schemas
Entry per file
The simplest possible schema (e.g. done by s3 sync) would be:
- Utilize a unique prefix to scope the files
- Create a separate entry for each file with the file name as the key
- List files by listing all entries under the prefix
- Retrieve individual file contents by reading the key
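A minimal sketch of this schema using boto3 (the bucket and prefix names are placeholders, error handling is omitted):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"           # placeholder
PREFIX = "file-sets/set-123/"  # unique prefix scoping this set

def store_file(name: str, data: bytes) -> None:
    # One PUT request per file.
    s3.put_object(Bucket=BUCKET, Key=PREFIX + name, Body=data)

def list_files() -> list[str]:
    # Paginate over all entries under the prefix.
    paginator = s3.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        names.extend(obj["Key"][len(PREFIX):] for obj in page.get("Contents", []))
    return names

def read_file(name: str) -> bytes:
    # One GET request per file.
    return s3.get_object(Bucket=BUCKET, Key=PREFIX + name)["Body"].read()
```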
There is nothing fundamentally wrong with this and, for many situations, this will be a decent or even optimal solution. However, it does come with some drawbacks:
- Number of write requests: Uploading a large set of files one by one requires us to issue a single request to S3 for each file. This poses some challenges in terms of client-side request overhead. Furthermore, as the average request incurs approximately 30ms in latency overhead, achieving near-optimal throughput will require a significant level of concurrency.
- Cost of writes: As long as the number of reads is similar to (or slightly higher than) the number of writes, the cost of issuing a PUT request per file dominates the cost of reads. If the number of reads is lower than the number of writes (i.e. we don’t read every file we store), the write cost dominates even more.
As an example: if we were to store 1B files and only read every second file, the cost would be 1B * $0.005/1k = $5,000 for writing and 1B * 0.5 * $0.0004/1k = $200 for reading (see the sketch after this list).
- Non-atomic deletion: Deleting a set of files (e.g. via a lifecycle policy) doesn’t happen atomically. This isn’t an insurmountable problem, but it does expose some possible concurrency issues around deletion that need to be handled carefully.
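The write-cost example above, spelled out as a quick calculation (same prices as in the pricing section):

```python
# Store 1B files, read only every second one.
NUM_FILES = 1_000_000_000
READ_FRACTION = 0.5

PUT_COST_PER_1K = 0.005    # USD per 1,000 PUT requests
GET_COST_PER_1K = 0.0004   # USD per 1,000 GET requests

write_cost = NUM_FILES * PUT_COST_PER_1K / 1_000
read_cost = NUM_FILES * READ_FRACTION * GET_COST_PER_1K / 1_000
print(f"writes: ${write_cost:,.0f}, reads: ${read_cost:,.0f}")  # writes: $5,000, reads: $200
```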
Packed tarball
This solution makes use of the immutability of the uploaded set of files.
Instead of uploading file-by-file, we can:
- Utilize a unique prefix to scope our upload
- Create the following entries in S3:
- A tarball containing all files
- An index of all files including the length and offset of their respective tarball entry
- List files by reading the index
- Retrieve individual file contents by issuing a range request with the respective offset and length.

Figure: Tarball of stored files and associated index
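A sketch of the write path under this schema, again using boto3 (the bucket, prefix, and key names are placeholders, and the JSON index format shown is just one possible layout):

```python
import io
import json
import tarfile

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"           # placeholder
PREFIX = "file-sets/set-123/"  # unique prefix scoping this set

def store_file_set(files: dict[str, bytes]) -> None:
    # Pack all files into a single uncompressed tarball in memory.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    # Re-read the tarball to record where each member's data lives.
    # offset_data is the byte offset of the member's contents within the archive.
    index = {}
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        for member in tar.getmembers():
            index[member.name] = {"offset": member.offset_data, "length": member.size}

    # Two PUT requests for the whole set, regardless of the number of files.
    s3.put_object(Bucket=BUCKET, Key=PREFIX + "data.tar", Body=buf.getvalue())
    s3.put_object(Bucket=BUCKET, Key=PREFIX + "index.json",
                  Body=json.dumps(index).encode())
```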
Compared to the “entry per file” solution, the “read individual file contents” operation now requires knowledge of a file’s offset and length. This can be achieved in two ways:
- If all read operations are preceded by a list operation, we can have the list operation return an opaque key referencing the file. In this key (e.g. a signed JWT), we can encode the offset and length of the tarball entry.
- If read operations can occur without a preceding list operation, we need to issue at least one additional lookup to resolve this information. The simplest possible approach is to fetch the entire index, but there are possibilities for optimization. See cotar for a possible approach.
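Assuming an index like the one sketched above, listing becomes a single GET for the index and reading a file becomes a ranged GET into the tarball (again a sketch with placeholder names):

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"           # placeholder
PREFIX = "file-sets/set-123/"  # unique prefix scoping this set

def list_files() -> dict:
    # One GET for the index lists the whole set.
    body = s3.get_object(Bucket=BUCKET, Key=PREFIX + "index.json")["Body"].read()
    return json.loads(body)

def read_file(index: dict, name: str) -> bytes:
    # Fetch only this file's bytes out of the tarball via an HTTP range request.
    entry = index[name]
    start = entry["offset"]
    end = start + entry["length"] - 1  # Range header bounds are inclusive.
    resp = s3.get_object(Bucket=BUCKET, Key=PREFIX + "data.tar",
                         Range=f"bytes={start}-{end}")
    return resp["Body"].read()
```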
Assuming that the individual files are significantly smaller than the maximum payload size of a PUT request, this allows us to significantly reduce the number of write requests. This directly reduces the associated cost and, as there are now fewer but larger requests, less concurrency is required to achieve the same throughput.
Correctly deleting the set is now significantly simpler, as all files are packed into just two keys (the tarball and its index).
On the other hand, reading individual files (particularly directly by file name) has now become more complex and requires additional work. Furthermore, we have removed the possibility of easily modifying a stored set of files by e.g. adding another file.
References and Related Work
- S3 pricing
- S3 GetObject Range requests
- cotar implements a similar approach