Database Groups
Database groups or roles provide a way to associate a group of users with a set of permissions. By organizing users into a group you can grant permissions once rather than granting them to each individual user.
It is considered best practice to grant permissions to users via groups rather than granting permissions directly to users. This practice reduces the overhead associated with managing permissions and makes it easier to monitor who has access to what.
Organizing into Groups
Given the best practice outlined above, we now have the challenge of organizing both users and permissions into groups. In order to do this, you will need a general understanding of not only how your data is organized in your database (see Schema Organization), but also what requirements you have for managing data access. For example, certain data may be sensitive and require restricted access by team or department.
Depending on your specific data access requirements and organizational structure, you may find one of the below approaches more useful. Regardless, you want to give your database groups descriptive names that make it easy to understand what access they provide and who should be a member.
By Team or Role
Oftentimes members of the same team or users with the same role should have similar access to the same data. For example, you may only want members of the HR team to have access to sensitive employee data or you may not want data analysts or reporting users to have write access to any data (see Schema Organization).
Example Groups:
- data_eng
- Users - data engineering team; responsible for loading and preparing raw data
- Permissions - read and write permissions to most data
- Exceptions
- Sensitive data
- Reporting data (if another teams manages reporting)
- data_analyst
- Users - data analyst team(s); responsible for reporting and data analysis
- Permissions - read permissions to most data and perhaps write permissions to reporting data
- Exceptions
- Sensitive data
- Sometimes the same users occupy the role of a data engineer and data analyst
- reporting
- Users - any user or team that only consumes reports or analysis for making decisions
- Permissions - read permissions to reporting data only
- Exceptions
- Sensitive data
- Sometimes access is restricted by department (see By Data Category below)
By Data Category
Many times data access requirements are more consistent by data category (hr data, financial data, etc.). For example, if you don’t work in finance or accounting you probably don’t need access to granular data on company finances, accounts receivable, etc. You can always make aggregate data or analysis available more broadly via an aggregate view of the data in a reporting schema (see Schema Organization).
In this category it's often helpful to differentiate between read and write permissions as well (see Restrict Write Access to Production Data for more information).
Example Groups:
- financials_read
- Users - anyone that should have read access to financial or accounting data
- Permissions - read permissions only
- financials_write
- Users - anyone that maintains detailed financial or accounting data
- Permissions - write permissions only
- sales_read / sales_write - same as above
- customers_read / customers_write - customers, donors, subscribers, etc.; same as above
- etc.
Hybrid Approach
Many times the same data engineering team may manage most of your data or the same data analyst team may provide reporting and/or data analysis for most of your data. In these cases, it may be helpful to take a hybrid approach where you use database groups organized by team or role for some users and permissions, and database groups organized by data category for other users and permissions.
Restrict Write Access to Production Data
Given the importance and sensitivity of protecting production data from being corrupted and impacting your reporting, analysis and/or business operations, write access to production data should be restricted. This also includes restricting who can create new production tables and views.
It is considered best practice to reserve a database user account not associated with any end user for running your production data pipelines and for writing to and managing your production data. You then restrict access to this generic database user account and ensure that changes to production data pipelines and data are code reviewed prior to being promoted.
The benefits of this model are as follows:
- Significantly less risk of unintended changes to production data by limiting the number of users with write access
- Easier to manage database permissions since less users need write access
- All database objects are owned by the generic database user account limiting the risk of permission related failures in your production data pipelines (certain permissions cannot be granted and are only available to the database user that owns the database object, such as drop schema, drop table, truncate table, etc.)
- Production database objects are only created by the generic database user so there are less issues to manage with future database objects (see Future Database Objects below)
In Civis Platform, the above model is supported via Civis Robot Users. Please reach out to CS for more information.
Separating Read and Write Permissions
If you do not restrict write access to your data, then you may want to consider separating read and write permissions into different database groups. This makes it easier to monitor who has access to see data versus who has access to change it. This is especially useful when you have different groups of users that have read access to the same data, but only some of the groups have write access.
Depending on your strategy for organizing your database groups this may or may not make sense.
Superusers
Many databases provide a “superuser” privilege that can be granted on a user by user basis. In Redshift, the superuser privilege gives a user access to all permissions and database objects regardless of the permissions granted to them directly or inherited via group membership. In Postgres, the superuser privilege does not provide additional access to database objects.
While it may be easy to grant the superuser privilege to users rather than investing in a permission provisioning strategy detailed above, it undermines the data security on your database and puts your production data and data pipelines at risk of failing or reporting incorrect data.
For this reason, it is considered best practice to reserve the superuser privilege only for those admin users that administrate your database.
Future Database Objects
One of the big pain points that exist with managing database permissions in databases like Redshift and Postgres is that none of the above setup applies to new database objects, such as tables or views, that you create in the future.
This issue can be mitigated and almost eliminated by restricting who has access to create new database objects (see Restrict Write Access to Production Data above).
Luckily, both Redshift and Postgres provide a solution for this problem, alter default privileges. Default privileges allows you to tell the database what permissions to grant when a database object is created.
Therefore, for each user that can create database objects in a database schema that other users should have access to, you will need to alter their database privileges based on your database permission strategy discussed above.
Comments
0 comments
Please sign in to leave a comment.