Datasets

Datasets are collections of data store locations that are governed as a single unit. Data engineers create datasets as part of the data development lifecycle and can assign data stewards to manage day-to-day access to the data.

When data consumers query data, Satori associates the query with the relevant datasets and applies the permissions and policies that are defined on them.

Creating datasets

To create a dataset, you need either the Admin or Editor role in the management console. Go to the Datasets view and click the Add button. Provide a name and description for the dataset, and optionally assign data stewards.

Select the data store locations to include in the dataset and, optionally, locations to exclude. Satori uses a longest-match approach to decide whether a data store location belongs to the dataset: the most specific (longest) included or excluded location that contains it determines the result. For example, consider the following dataset:

Included locations:

  • Finance Snowflake Account / Forecast database / Q2 schema

Excluded locations:

  • Finance Snowflake Account / Forecast database / Q2 schema / Orders

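When a query accesses any table in the Q2 schema other than Orders, Satori associates the query with this dataset and applies the permissions and policies defined on it. Queries against the Orders table are not associated with this dataset, because the excluded location is the longer match.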
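The longest-match rule can be illustrated with a short Python sketch. This is a minimal model under stated assumptions, not Satori's implementation: it assumes a location is a tuple of path components, that a rule matches when it is a prefix of the queried location, and that an exclusion wins if an included and an excluded location are equally long. `Dataset`, `is_in_dataset`, `datasets_for`, and the `Revenue` table are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical model: a location is a path of components, e.g.
# ("Finance Snowflake Account", "Forecast database", "Q2 schema", "Orders").
Location = tuple[str, ...]


@dataclass
class Dataset:
    name: str
    included: list[Location] = field(default_factory=list)
    excluded: list[Location] = field(default_factory=list)


def _match_length(rule: Location, location: Location) -> int:
    """Return len(rule) if rule is a prefix of location, else -1."""
    if location[: len(rule)] == rule:
        return len(rule)
    return -1


def is_in_dataset(dataset: Dataset, location: Location) -> bool:
    """Longest match: the most specific matching rule decides membership."""
    best_include = max((_match_length(r, location) for r in dataset.included), default=-1)
    best_exclude = max((_match_length(r, location) for r in dataset.excluded), default=-1)
    if best_include < 0:
        return False  # no included location covers this one
    # Assumption: when include and exclude are equally specific, exclusion wins.
    return best_include > best_exclude


def datasets_for(datasets: list[Dataset], location: Location) -> list[str]:
    """Names of all datasets a queried location is associated with."""
    return [d.name for d in datasets if is_in_dataset(d, location)]


# The example dataset from the text.
q2 = ("Finance Snowflake Account", "Forecast database", "Q2 schema")
finance = Dataset(
    name="Q2 forecast",
    included=[q2],
    excluded=[q2 + ("Orders",)],
)

print(is_in_dataset(finance, q2 + ("Revenue",)))  # True
print(is_in_dataset(finance, q2 + ("Orders",)))   # False
```

Because the decision is made by the longest matching rule rather than by the order in which rules were added, narrowing a dataset only requires adding a more specific exclusion.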

Managing technical metadata

Using the Inventory view of a dataset, data engineers or data stewards can review the results of the automatic data classification and override, remove, or add tags as needed. See the Data Inventory section for more details.

Implementing custom policies

Using the Custom Policy view of a dataset, data engineers or data stewards can implement custom data access policies using the Policy Engine. See the Policy Engine Overview section for more details.