Amazon Web Service (AWS) S3 is all based on the object naming mechanism. Any object in a S3 bucket has its name as an identity. Even though S3 console give us a view of the filing system with depth folders, it actually store the data all at the root. And it provides a way to allow us leveraging a concept of prefix to structure our files. This is the basic storage mechanism of S3.
When talking about listing (scanning) the objects in a pretty large S3 bucket, AWS forth you to get the results with pagination. There is a 1000 keys upper limit for the S3 list operation. Regarding the virtually unlimited number of keys support, obviously the operation is not atomic. So it is important to know how it works. So based on the AWS documents, there is a concept called marker
which indicates the position of the pagination scrolling. I envision that S3 will order the files naturally. Then a marker could divide the whole files to two parts: one has been scanned and the other hasn't. And when you provide a marker, S3 could know which files should be put into the next page results. So there are some cases that S3 would return an empty page if at the point no file exists in unscanned part.
So assuming we are in the middle of scrolling, and we have a marker, the files in both parts could be changes in three ways:
Put
into it.Deleted
from it.Updated
from it.And what will happen to the rest list results is illustrated as following:
Put | Deleted | Updated | |
---|---|---|---|
Scanned | Nothing | Nothing | Nothing |
Unscanned | With the new file | Without the new file | With the updated file |
One thing to keep in mind here is that it is all about the file names because it will determine which part should your file be divided in.