Datasets are collections of data store locations that are governed as a single unit. Data engineers create datasets as part of the data development lifecycle and can assign data stewards to manage day-to-day data access operations.
When data consumers query data, Satori associates the query with the relevant datasets and applies the permissions and policies that are defined on them.
To create a dataset, you need either the Admin or Editor role in the management console. Go to the Datasets view and click the Add button. Provide a name and description for the dataset, and optionally assign data stewards.
Select the data store locations to include in the dataset and, optionally, the locations to exclude. Satori uses a longest-match approach when checking whether a data store location is included in the dataset. For example, consider the following dataset:
- Included location: Finance Snowflake Account / Forecast database / Q2 schema
- Excluded location: Finance Snowflake Account / Forecast database / Q2 schema / Orders
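When a data consumer queries any table in the Q2 schema other than the Orders table, Satori associates the query with this dataset and applies any permissions or policies that are defined on it. Queries on the Orders table are not associated with the dataset, because the excluded Orders location is a longer match than the included Q2 schema.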
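The longest-match rule can be illustrated with a minimal Python sketch. This is not Satori's implementation or API. The function names (`is_prefix`, `longest_match_length`, `in_dataset`), the modelling of locations as tuples of path segments, and the tie-breaking rule (a location listed as both included and excluded is treated as excluded) are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# A data store location modelled as path segments (hypothetical representation),
# e.g. ("Finance Snowflake Account", "Forecast", "Q2", "Orders").
Location = Tuple[str, ...]


def is_prefix(prefix: Location, path: Location) -> bool:
    """Return True if `prefix` equals `path` or is an ancestor of it."""
    return len(prefix) <= len(path) and path[:len(prefix)] == prefix


def longest_match_length(path: Location, candidates: Sequence[Location]) -> int:
    """Length of the longest candidate that is a prefix of `path`, or -1 if none match."""
    return max((len(c) for c in candidates if is_prefix(c, path)), default=-1)


def in_dataset(path: Location,
               include: Sequence[Location],
               exclude: Sequence[Location]) -> bool:
    """Decide membership by comparing the longest include and exclude matches.

    Assumption: when both matches have the same length, exclusion wins.
    """
    included = longest_match_length(path, include)
    excluded = longest_match_length(path, exclude)
    return included > excluded


account = "Finance Snowflake Account"
include = [(account, "Forecast", "Q2")]
exclude = [(account, "Forecast", "Q2", "Orders")]

# "Revenue" and "Q1" are made-up names used only to exercise the rule.
revenue_in = in_dataset((account, "Forecast", "Q2", "Revenue"), include, exclude)
orders_in = in_dataset((account, "Forecast", "Q2", "Orders"), include, exclude)
other_schema_in = in_dataset((account, "Forecast", "Q1", "Revenue"), include, exclude)
```

In this sketch, a table in the Q2 schema matches only the included location and is part of the dataset. The Orders table matches both locations, but the excluded location is the longer match, so it is left out. A table in another schema matches neither location.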
Managing technical metadata
Using the Inventory view of a dataset, data engineers or data stewards can review the results of the automatic data classification and override, remove or add any necessary tags. See the Data Inventory section for more details.
Implementing custom policies
Using the Custom Policy view of a dataset, data engineers or data stewards can implement custom data access policies using the Policy Engine. See the Policy Engine Overview section for more details.