In Redshift/Postgres, you can think of database schemas as folders that you use to organize database objects, such as tables and views. Spending some time up front determining how to organize your database objects into schemas will pay dividends later as the number of database objects and users in your database grows.

There are a few different approaches that are popular for organizing database schemas. Sometimes people use a single approach or they may combine several approaches together into a hybrid strategy.

Before we review these, it is important to note that database permissions can be assigned at a schema level as well as an object level. Organizing database objects into schemas such that you can assign permissions to your users at the schema level will make setting up and managing your database permissions much easier. Therefore, regardless of the strategy you choose, you should keep this in mind.

By Environment: Production vs Development

It is important to keep your production data separate from ongoing development or analysis of new data sets. This minimizes the risk of someone accidentally making changes to a production table that you may depend on for daily reporting or some other production process in your organization.

In order to do this, you should create one or more schemas to be used solely for development and one-time analysis. Any one-off tables should be created in these schemas.

Provisioning Options:

1 development schema for everyone
1 development schema for each team
1 development schema for each developer

Database Permissions: This approach allows you to limit the permissions that users have to production database schemas so that they don’t even have the access required to accidentally make changes to production tables.

By ETL Phase

Data is often loaded, transformed and processed in various stages until it is finally aggregated and consumed by reports and/or dashboards. Since each data set goes through similar processing, oftentimes with various teams or team members owning a stage, organizing your tables by ETL stage can be a useful approach.

Common Phases:

Landing/Raw - used for loading raw data, typically as a generic string data type, prior to detecting data types or splitting raw data into multiple columns
Staging - used for staging data, typically with the appropriate data types, prior to applying any transformation rules
Final Raw - used for finalized raw data after applying any transformation rules
Reporting - used for aggregate rollups of the finalized raw data, typically for reports and/or dashboards

Naming Conventions: It is common to apply specific naming conventions to each stage so that you can easily distinguish related tables across different stages/schemas. For example, imagine you are loading some donor data.

Schema name: "landing" / "raw"

Table suffix: "_raw"
Table name: "donor_raw"

Schema name: “staging”

Table suffix: "_stg"
Table name: "donor_stg"

Schema name: "final"

Table suffix: none
Table name: "donor"

Schema name: “reporting”

Table suffix: "_rpt" / "_agg" / "_monthly"
Table name: "donor_rpt"

Database Permissions: Since specific teams or team members often focus on specific stages, this makes it easier to grant permissions while limiting the risk of each team accidentally impacting data that they do not maintain. For example,

data engineering team - responsible for loading and preparing data

landing, staging, final schemas - read and write permission

data analyst team - responsible for reporting

final schema - read permission
reporting schema - read and write permission

reporting users - consume reports to make decisions

reporting schema - read permission

By Functional Area

The data loaded into a database often falls into different functional areas or categories (e.g., HR, sales, CRM, etc.). The data in these categories tends to come from similar systems and is maintained and understood by similar teams or team members. This makes it another great way to organize your data into schemas. Since these schema names are more descriptive, it also helps improve data discoverability.

Hybrid Example

This uses a hybrid approach of all three of the above.

staging

If there are enough tables then you could break this out into hr_staging, sales_staging, etc.

hr
sales
donors
accounting
reporting

Similar to staging above you could break this out further depending on the amount of reporting

<username> - development schemas

Creating one per user, if possible, makes them easier to clean up as users transition between different teams since you don’t need to try and figure out who created which table.

Articles in this section

Schema Organization

By Environment: Production vs Development

By ETL Phase

By Functional Area

Hybrid Example

Comments

Articles in this section

By Environment: Production vs Development

By ETL Phase

By Functional Area

Hybrid Example

Related articles