Years ago, when on-premise Unix servers with large file systems were a thing, companies were building extensive folder management rules and strategies for administering access rights to different folders for different people.
Usually, an organization’s platform serves different groups of users with completely distinct interests, confidentiality level restrictions, or content definitions. In global organizations, this can even mean separating content by location, that is, between users belonging to different countries.
Further typical examples might include:
- data separation between development, test, and production environments
- sales content not accessible to a wide audience
- country-specific legislative content that can’t be seen or accessed from within another region
- project-related content where “leadership data” is to be provided only to a limited group of people etc.
There is a potentially endless list of such examples. The point is there is always some kind of need to orchestrate access rights to files and data between all the users to which the platform provides access.
In the case of on-premise solutions, this was a routine task. The administrator of the file system just set up some rules, used a tool of choice, and then people were mapped into user groups, and user groups were mapped into a list of folders or mount points they shall be able to access. Along the way, the level of access was defined as read-only or read & write access.
Looking at AWS cloud platforms, people can be expected to have similar requirements for content access restrictions. The solution, however, must now be different. Files no longer reside on Unix servers but in the cloud (and are potentially accessible not only to the whole organization but even to the whole world), and the content is not stored in folders but in S3 buckets.
Described below is one way to approach this problem. It is built on real-world experience I gained while designing such solutions for a concrete project.
Simple But Vastly Manual Approach
One way to resolve this problem without any automation is relatively straightforward:
- Create a new bucket for each distinct group of people.
- Assign access rights to the bucket so that only this specific group can access the S3 bucket.
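As a rough illustration of the second step, the following Python sketch builds an IAM policy document that limits access to exactly one bucket. The function name and the bucket name `sales-emea-data` are hypothetical, and the resulting document would still have to be attached to the group in question.

```python
import json


def bucket_policy_for_group(bucket_name: str, read_only: bool = True) -> dict:
    """Build an IAM policy document limited to a single S3 bucket.

    Illustrative only: the result is meant to be attached to the IAM group
    that should be the sole user of the bucket.
    """
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {   # Reading and writing apply to the objects inside it.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "sales-emea-data" is a made-up bucket name.
    print(json.dumps(bucket_policy_for_group("sales-emea-data"), indent=2))
```

One such document per group and bucket is manageable for a handful of groups, which is exactly why this approach only suits simple setups.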
This is certainly possible if the requirement is to go with a very simple and quick resolution. There are, however, some limits to be aware of.
At the time of writing, only up to 100 S3 buckets can be created under one AWS account by default. This limit can be raised to 1000 by submitting a service limit increase request to AWS. If those limits are not a concern for your particular implementation, then you can let each of your distinct user domains operate on a separate S3 bucket and call it a day.
Problems might arise if some groups of people have cross-functional responsibilities, or simply if some people need access to the content of several domains at the same time. For example:
- Data analysts evaluating the data content for several different areas, regions, etc.
- A shared testing team serving different development teams.
- Reporting users who need to build dashboard analyses on top of different countries inside the same region.
As you might imagine, this list can again grow without limit, and organizations’ needs can generate all kinds of use cases.
The more complex this list gets, the more complex the access rights orchestration needed to grant all those different groups different access rights to different S3 buckets in the organization. Additional tools will be required, and maybe even a dedicated resource (an administrator) will be needed to maintain the access rights lists and update them whenever a change is requested (which will be very often, especially if the organization is large).
So how can the same thing be achieved in a more organized and automated way?
Introduce Tags For Buckets
If the bucket-per-domain approach does not work, any other solution will end up with buckets shared by several user groups. In such cases, it is necessary to build the whole logic of assigning access rights in a place that is easy to change or update dynamically.
One way to achieve that is by using tags on the S3 buckets. Using tags is recommended in any case (if for nothing else than easier billing categorization). More importantly, a tag can be changed at any time in the future for any bucket.
If the whole logic is built on the bucket tags, and everything behind it is configuration that depends on the tag values, the setup stays dynamic: one can redefine the purpose of a bucket just by updating its tag values.
What kind of tags to use to make this work?
This depends on your concrete use case. For example:
- You may need to separate buckets per environment type. In that case, one of the tag names would be something like “ENV”, with possible values “DEV”, “TEST”, “PROD”, etc.
- Maybe you want to separate the teams based on the country. In that case, another tag will be “COUNTRY”, with a country name as its value.
- Or you might want to separate the users based on the functional department they belong to, like business analysts, data warehouse users, data scientists, etc. So you create a tag with the name “USER_TYPE” and the respective value.
- Another option could be that you want to explicitly define a fixed folder structure for specific user groups that they are required to use (to not create their own clutter of folders and get lost there over time). You can do that again with tags, where you can specify several working directories like: “data/import”, “data/processed”, “data/error”, etc.
Ideally, you want to define the tags so that they can be logically combined and make them form a whole folder structure on the bucket.
For example, you can combine the tags from the examples above to construct a dedicated folder structure for different types of users from various countries, with predefined import folders they are expected to use.
Just by changing the <ENV> value, you can redefine the purpose of the bucket (whether it is assigned to the test environment ecosystem, dev, prod, etc.).
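A minimal Python sketch of how such a combination could look. The ordering of the tags in the path (ENV, then USER_TYPE, then COUNTRY), the tag values, and the helper names are assumptions made for this illustration, not a prescribed layout.

```python
# Assumed ordering of tag values in the path; pick whatever suits your case.
TAG_ORDER = ["ENV", "USER_TYPE", "COUNTRY"]


def build_prefix(tags: dict[str, str], working_dir: str) -> str:
    """Join tag values and a working directory into one folder-like path."""
    parts = [tags[name] for name in TAG_ORDER]
    return "/".join(parts + [working_dir.strip("/")]) + "/"


def to_tag_set(tags: dict[str, str]) -> list[dict[str, str]]:
    """Convert a plain dict into the Key/Value list used for S3 bucket tagging."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


# Hypothetical tag values for one group of users.
tags = {"ENV": "DEV", "USER_TYPE": "DATA_SCIENTIST", "COUNTRY": "DE"}

print(build_prefix(tags, "data/import"))
# DEV/DATA_SCIENTIST/DE/data/import/

# Changing only the ENV value moves the same structure to another environment.
print(build_prefix({**tags, "ENV": "PROD"}, "data/import"))
# PROD/DATA_SCIENTIST/DE/data/import/
```

The `to_tag_set` helper produces the Key/Value list that S3 expects when tags are written to a bucket, for example through boto3's `put_bucket_tagging`.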
This makes it possible to use the same bucket for many different users. S3 buckets do not support folders explicitly, but object keys can carry prefixes (called “labels” here). Those labels work like subfolders in the end, because users need to go through a series of labels to reach their data, just as they would with subfolders.
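The idea can be mimicked in a few lines of Python. The object keys below are made up, and the filtering is done locally rather than against S3, purely to show that a "folder" is nothing more than a shared beginning of the key.

```python
# Object keys in one bucket: the namespace is flat, slashes are just characters.
keys = [
    "DEV/DATA_SCIENTIST/DE/data/import/orders.csv",
    "DEV/DATA_SCIENTIST/DE/data/processed/orders.parquet",
    "DEV/BUSINESS_ANALYST/FR/data/import/leads.csv",
]


def list_folder(all_keys: list[str], prefix: str) -> list[str]:
    """Return the keys that sit 'inside' a prefix, as a folder listing would."""
    return [key for key in all_keys if key.startswith(prefix)]


# Everything a German data scientist would see in the DEV environment.
print(list_folder(keys, "DEV/DATA_SCIENTIST/DE/"))
```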
Create Dynamic Policies And Map Bucket Tags Inside
Having defined the tags in some usable form, the next step is to build S3 bucket policies that would use the tags.
If the policies use the tag names, you are creating something called “dynamic policies”. This basically means your policy will behave differently for buckets with different tag values, which the policy refers to in the form of placeholders.
This step obviously involves some custom coding of the dynamic policies, but you can simplify it by using the AWS policy editor tool, which will guide you through the process.
In the policy itself, you will want to code the concrete access rights to be applied to the bucket and the level of those rights (read, write). The logic will read the tags on the buckets and build up the folder structure on the bucket (creating labels based on the tags). Based on the concrete tag values, the subfolders will be created and the required access rights assigned along the way.
The nice thing about such a dynamic policy is that you can create just one and then assign the very same policy to many buckets. It will behave differently for buckets with different tag values, but always in line with what you expect for a bucket with those tag values.
It is a really effective way to manage access rights in an organized, centralized way for a large number of buckets, where every bucket is expected to follow template structures that are agreed upon upfront and used by your users across the whole organization.
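One possible reading of such a dynamic policy is a template that gets rendered from the tag values of each bucket. The Python sketch below follows that reading; the function name, the bucket name, the role ARN, and the ENV/USER_TYPE/COUNTRY prefix convention are all assumptions for illustration, not the only way to implement the idea.

```python
import json


def render_bucket_policy(bucket: str, tags: dict[str, str],
                         principal_arn: str, can_write: bool) -> dict:
    """Render a bucket policy whose scope is derived from the bucket's tag values."""
    # Assumed convention: the prefix is ENV/USER_TYPE/COUNTRY.
    prefix = "/".join(tags[name] for name in ("ENV", "USER_TYPE", "COUNTRY"))
    actions = ["s3:GetObject"] + (["s3:PutObject"] if can_write else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow listing, but only below the tag-derived prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {   # Allow object access only below the same prefix.
                "Sid": "ObjectAccessOwnPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = render_bucket_policy(
    bucket="shared-analytics",  # hypothetical bucket name
    tags={"ENV": "TEST", "USER_TYPE": "DATA_SCIENTIST", "COUNTRY": "DE"},
    principal_arn="arn:aws:iam::123456789012:role/test-data-scientist-de",  # placeholder
    can_write=True,
)
print(json.dumps(policy, indent=2))
```

The same function serves every bucket: only the tag values passed in differ. A bucket shared by several groups would get one pair of statements per group.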
Automate The Onboarding of New Entities
After defining dynamic policies and assigning them to the existing buckets, the users can start using the same buckets without the risk that users from one group will access content (stored on the same bucket) located under a folder structure they are not entitled to.
Also, for user groups with wider access, it will be easy to reach the data because it will all be stored on the same bucket.
The final step is to make onboarding new users, new buckets, and even new tags as simple as possible. This leads to more custom coding, which doesn’t need to be overly complex, assuming your onboarding process has clear rules that can be encapsulated in simple, straightforward algorithmic logic (at least you can prove this way that your process has some logic and is not done in an overly chaotic way).
This can be as simple as creating a script that wraps AWS CLI commands and takes the parameters needed to successfully onboard a new entity into the platform. It can even be a series of CLI scripts, executed in a specific order, for example:
- create_new_bucket(<ENV>,<ENV_VALUE>,<COUNTRY>,<COUNTRY_VALUE>, ..)
You get the point. 😃
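A minimal sketch of such an onboarding helper in Python. It only assembles the AWS CLI calls and prints them instead of running them; the function name, the bucket name, and the `policy.json` file are assumptions for this example.

```python
import json
import shlex


def onboarding_commands(bucket: str, region: str,
                        tags: dict[str, str]) -> list[list[str]]:
    """Return the AWS CLI calls for onboarding one bucket, without running them."""
    create = ["aws", "s3api", "create-bucket", "--bucket", bucket, "--region", region]
    if region != "us-east-1":
        # Outside us-east-1 an explicit location constraint is required.
        create += ["--create-bucket-configuration", f"LocationConstraint={region}"]

    tagging = json.dumps({"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]})
    tag = ["aws", "s3api", "put-bucket-tagging", "--bucket", bucket,
           "--tagging", tagging]

    # policy.json is assumed to hold the policy already rendered for this bucket.
    policy = ["aws", "s3api", "put-bucket-policy", "--bucket", bucket,
              "--policy", "file://policy.json"]
    return [create, tag, policy]


# Dry run: print the commands in the order they would be executed.
for command in onboarding_commands("shared-analytics", "eu-central-1",
                                   {"ENV": "DEV", "COUNTRY": "DE"}):
    print(shlex.join(command))
```

Keeping the command assembly separate from execution makes the onboarding rules easy to review before anything is created in the account.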
A Pro Tip 👨‍💻
Here is one more tip, if you like, which can easily be applied on top of the above.
The dynamic policies can be leveraged not only for assigning access rights for folder locations but also to assign service rights for the buckets and user groups automatically!
All that would be needed is to extend the list of tags on the buckets and then add dynamic policy access rights to use specific services for concrete groups of users.
For example, there might be a group of users that also needs access to a specific database cluster server. This can certainly be achieved by dynamic policies leveraging bucket tags, more so if access to the services is driven by a role-based approach. Just add to the dynamic policy code a part that processes the tags describing the database cluster and assigns access privileges for that particular DB cluster to the user group directly.
This way, the onboarding of a new user group will be executable just by this single dynamic policy. Moreover, since it is dynamic, the same policy can be reused for onboarding many different user groups (expected to follow the same template but not necessarily the same services).
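As a sketch only: assuming the database is an Amazon RDS or Aurora cluster with IAM database authentication enabled (the text above does not say which database is meant), the extra part could generate a statement like the one below. The tag names `DB_CLUSTER_ID` and `DB_USER`, the region, the account ID, and the function name are hypothetical.

```python
import json


def db_connect_statement(tags: dict[str, str], region: str, account_id: str) -> dict:
    """Build an IAM statement that lets a group connect to one tagged DB cluster."""
    # DB_CLUSTER_ID and DB_USER are made-up tag names introduced for this sketch.
    resource = (
        f"arn:aws:rds-db:{region}:{account_id}:dbuser:"
        f"{tags['DB_CLUSTER_ID']}/{tags['DB_USER']}"
    )
    return {"Effect": "Allow", "Action": "rds-db:connect", "Resource": resource}


statement = db_connect_statement(
    {"ENV": "DEV", "DB_CLUSTER_ID": "cluster-EXAMPLEID", "DB_USER": "analyst"},
    region="eu-central-1",
    account_id="123456789012",  # placeholder account ID
)
print(json.dumps(statement, indent=2))
```

Note that a statement like this belongs in a policy attached to the group's role rather than in the bucket policy itself, since bucket policies only govern access to S3.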