In this post, we’re going to look at how we can optimize S3 performance.
First, we’re going to examine S3 prefixes: what they are, and how we can use them to optimize S3 performance. We’ll also examine the limitations that apply when we use KMS, Amazon’s encryption key management service.
Second, we’ll analyze S3 performance for uploads and for downloads.
Optimize S3 Performance with Prefixes
When we create a new S3 bucket, we define a bucket name. In the object’s path or URL we can then have folders, for example a folder FolderA with a subfolder SubB, and finally the object name, which could be documentA.pdf.
Basically, the S3 prefix is just the folder path inside our bucket.
So, in the example above, the S3 prefix would be /FolderA/SubB.
S3 Prefix: /FolderB
In the example above, we could also have a prefix /FolderB with nothing inside it, no other subfolder.
How can S3 prefixes give us a better performance?
S3 has remarkably low latency: you can get the first byte out of S3 within approximately 100-200 ms. You can also achieve a high request rate: 3,500 PUT/COPY/POST/DELETE requests and 5,500 GET/HEAD requests per second per prefix.
So, the more prefixes you have inside your S3 bucket, the more aggregate performance you can get, and the essential number to look at is the 5,500 GET requests. If we access an object in a specific S3 bucket with GET requests, we get 5,500 requests per second per prefix. This means that if we want better GET performance out of S3, we spread our reads across various prefixes, that is, across various folders.
Let’s see an example: if you use 3 prefixes, you can reach 3 × 5,500 = 16,500 requests per second.
If we use 4 different prefixes, we get 5,500 × 4 = 22,000 requests per second.
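The arithmetic above, and one way of spreading reads across prefixes, can be sketched in a few lines of Python. The shard-style prefix names and the hash-based assignment below are illustrative assumptions, not something S3 requires; any stable partitioning of keys across prefixes works.

```python
import hashlib

# Per-prefix request limits documented by AWS.
GET_LIMIT_PER_PREFIX = 5_500   # GET/HEAD requests per second per prefix
PUT_LIMIT_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE requests per second per prefix

# Hypothetical prefixes we spread reads across.
PREFIXES = ["shard-0", "shard-1", "shard-2", "shard-3"]

def prefixed_key(object_name: str) -> str:
    """Deterministically assign an object to one of the prefixes."""
    digest = hashlib.md5(object_name.encode()).hexdigest()
    shard = int(digest, 16) % len(PREFIXES)
    return f"{PREFIXES[shard]}/{object_name}"

def max_get_rate(num_prefixes: int) -> int:
    """Aggregate GET/HEAD requests per second across that many prefixes."""
    return num_prefixes * GET_LIMIT_PER_PREFIX

print(prefixed_key("documentA.pdf"))
print(max_get_rate(4))  # 22000
```

Because the same object name always hashes to the same prefix, readers and writers agree on where each object lives without any coordination.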
So the fundamental idea to take away is: the more folders and subfolders you spread your objects across in your S3 bucket, the better performance you can get out of that bucket for your project.
S3 Limitations with KMS
If we use the Key Management Service (KMS), which is Amazon’s encryption service, for example if you enable SSE-KMS to encrypt and decrypt your objects in S3, keep in mind that KMS has built-in request limits. When we upload a file, behind the scenes we are also calling the GenerateDataKey operation of the KMS API; the same thing happens when we download a file, which calls the Decrypt operation of the KMS API. The essential point is that these built-in limits are region-specific, at around 5,500, 10,000, or 30,000 requests per second, and both uploading and downloading count towards your KMS quota. Historically, you also couldn’t request a quota increase for KMS, so check the current limits for your region.
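As a rough back-of-the-envelope sketch of why this matters: with SSE-KMS, every download also makes one KMS Decrypt call, so the achievable rate is the minimum of the aggregate S3 per-prefix limit and the regional KMS quota. The quota value used below is just one of the possible regional figures.

```python
KMS_QUOTA = 5_500              # requests/s in this example region (5,500/10,000/30,000)
GET_LIMIT_PER_PREFIX = 5_500   # S3 GET/HEAD requests per second per prefix

def max_sse_kms_downloads(num_prefixes: int, kms_quota: int = KMS_QUOTA) -> int:
    """Each SSE-KMS download triggers one KMS Decrypt call, so the
    achievable download rate is the smaller of the S3 and KMS limits."""
    s3_limit = num_prefixes * GET_LIMIT_PER_PREFIX
    return min(s3_limit, kms_quota)

print(max_sse_kms_downloads(1))  # 5500 -- S3 and KMS limits coincide
print(max_sse_kms_downloads(4))  # still 5500 -- capped by KMS, not by S3
```

In other words, in a 5,500 req/s KMS region, adding prefixes stops helping SSE-KMS downloads past the first one: the KMS quota, not S3, is the bottleneck.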
If you need performance and encryption at the same time, you might want to consider using the native S3-managed encryption (SSE-S3) that’s built in, rather than KMS.
If you are troubleshooting a KMS issue, it could simply be that you’re hitting the KMS limit, and that’s what is making your downloads or your requests considerably slower.
Would you like to learn the difference between AWS KMS and CloudHSM?
Optimize S3 Performance on Upload
To improve the S3 upload process, we’re going to look at multipart uploads. These are recommended for files over 100 megabytes and required for any file over 5 gigabytes in size. What multipart uploads do is essentially let you parallelize your uploads, which improves your efficiency.
For a big file, you cut it into parts and upload those parts all at the same time: parallel uploads, which improve your throughput.
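A minimal sketch of the idea in Python: split the payload into parts and push them concurrently. The `upload_part` function here is a hypothetical stand-in for the real S3 UploadPart call (in practice, boto3’s `upload_file` handles all of this for you); only the split/parallelize/reassemble shape is the point.

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 5 * 1024 * 1024  # 5 MB, the minimum S3 multipart part size

def split_into_parts(data: bytes, part_size: int = PART_SIZE) -> list[bytes]:
    """Cut the payload into fixed-size parts, as a multipart upload does."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def upload_part(part_number: int, body: bytes) -> dict:
    """Hypothetical stand-in for UploadPart; just echoes a fake ETag."""
    return {"PartNumber": part_number, "ETag": f"etag-{part_number}"}

def multipart_upload(data: bytes) -> list[dict]:
    """Upload all parts in parallel; return them ordered by part number,
    which is what CompleteMultipartUpload expects."""
    parts = split_into_parts(data)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(upload_part, range(1, len(parts) + 1), parts)
    return sorted(results, key=lambda p: p["PartNumber"])
```

Because each part is uploaded independently, a failed part can be retried on its own without restarting the whole transfer.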
So, how can you get better performance on uploads? Simple: use multipart uploads. We have a very similar option for downloads.
For downloads, the technique is called S3 byte-range fetches: parallel downloads of specific byte ranges of an object, just as the upload version is called S3 multipart uploads. One extra advantage is that if a failure occurs, it only affects one specific byte range, which can be retried on its own. We can use byte-range fetches to accelerate our downloads, and they can also be used to download only part of a file.
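The download side can be sketched the same way. The `fetch_range` function below is a hypothetical stand-in for a ranged GET (the real request carries a `Range: bytes=start-end` header, and HTTP byte ranges are inclusive); the sketch just shows the split/parallel-fetch/reassemble pattern.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_range(blob: bytes, start: int, end: int) -> bytes:
    """Stand-in for a ranged GET; HTTP byte ranges are inclusive of 'end'."""
    return blob[start:end + 1]

def byte_range_download(blob: bytes, chunk_size: int) -> bytes:
    """Fetch fixed-size byte ranges in parallel and reassemble them in order."""
    ranges = [(start, min(start + chunk_size - 1, len(blob) - 1))
              for start in range(0, len(blob), chunk_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        chunks = pool.map(lambda r: fetch_range(blob, *r), ranges)
    return b"".join(chunks)
```

Requesting only the ranges you need is also how you download a partial file, for example just the header of a large object.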
So, we have learned how to optimize S3 performance using prefixes, which are just the folders and subfolders within your S3 bucket. The more prefixes you use in your project, the better performance you are going to get, because you can consistently reach a high number of requests by spreading your data and your reads across different prefixes.
We have also learned that if you are using SSE-KMS to encrypt your objects in S3, you must keep in mind that there are built-in, region-specific limits, and that uploading and downloading data both count towards your KMS quota, which historically couldn’t be increased on request. And remember, when you’re uploading objects to S3, use multipart uploads to improve your performance.
So that is it for this article. If you have any questions, please let me know. Thank you.