Datasets

A dataset is a collection of data store objects, such as tables or schemas, from one or more data stores, to which you want to govern access as a single unit.

Screenshot

For example, a set of tables in a Snowflake account that contain private customer information, such as name, address and purchase history, can be represented in Satori as a "Customer Data" dataset.

Data engineers create datasets as part of the data development lifecycle. Once a dataset is defined, you can assign dataset stewards to manage the day-to-day operations of access to data.

When data consumers query data, Satori associates the query with the relevant datasets and applies the access rule permissions and policies that are defined on them.

Creating and Managing Datasets

To create a dataset, you need the Admin or Editor role, which is defined in the management console.

Dataset Stewards

To help you manage and maintain your dataset, you can assign dataset stewards to it. Dataset stewards perform the day-to-day operations of access to data.

A dataset steward can create, approve or deny user access rules, and can create security policies and masking profiles and assign them to the dataset. In addition, the dataset steward can edit the categories in the Data Inventory tab of the dataset.

NOTE: The dataset steward cannot give Satori control over access to the dataset or change the default security policy.

Dataset Access Approvers

In addition to dataset stewards, you can assign dataset access approvers, who are tasked with approving or denying access requests to the dataset. Access approvers cannot view or edit the dataset in the management console.

Adding a Dataset

To create a dataset, perform the following steps:

  1. Go to the Datasets view and click the Add button.

  2. Provide a name and description for the dataset. Optionally, assign dataset stewards and dataset access approvers.

  3. Select the data store locations to include in the dataset and, optionally, define the locations to exclude.

Screenshot

Checking Data Store Locations

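Satori uses a longest-match approach when checking whether a data store location is included in the dataset: the most specific included or excluded location that matches the queried object determines the result. Consider the following dataset example: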

Included Locations

Finance Snowflake Account / Forecast database / Q2 schema

Excluded Locations

Finance Snowflake Account / Forecast database / Q2 schema / Orders

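When a query targets any table in the Q2 schema other than the "Orders" table, Satori associates the query with this dataset and applies any permissions or policies that are defined on it. A query on the "Orders" table matches the longer, excluded location and is therefore not associated with the dataset.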
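The longest-match check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Satori's implementation: the function name `is_in_dataset`, the representation of a location as a tuple of path parts, the "Invoices" table name, and the rule that an exclusion wins a tie are all assumptions made for the example.

```python
from typing import Sequence, Tuple

Location = Tuple[str, ...]  # hypothetical representation: one element per path level


def is_in_dataset(
    target: Location,
    included: Sequence[Location],
    excluded: Sequence[Location],
) -> bool:
    """Return True if `target` belongs to the dataset under longest-match rules."""

    def longest_prefix(locations: Sequence[Location]) -> int:
        # Length of the longest location that is a prefix of the target,
        # or -1 if no location matches at all.
        lengths = [
            len(loc)
            for loc in locations
            if tuple(target[: len(loc)]) == tuple(loc)
        ]
        return max(lengths, default=-1)

    # The more specific (longer) match decides. On a tie, this sketch
    # assumes the exclusion wins.
    return longest_prefix(included) > longest_prefix(excluded)


INCLUDED = [("Finance Snowflake Account", "Forecast database", "Q2 schema")]
EXCLUDED = [("Finance Snowflake Account", "Forecast database", "Q2 schema", "Orders")]

q2 = ("Finance Snowflake Account", "Forecast database", "Q2 schema")
invoices_in_dataset = is_in_dataset(q2 + ("Invoices",), INCLUDED, EXCLUDED)
orders_in_dataset = is_in_dataset(q2 + ("Orders",), INCLUDED, EXCLUDED)
```

With the example locations from this section, a hypothetical "Invoices" table in the Q2 schema is part of the dataset, while the "Orders" table is not, because the excluded location is the longer match.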

Dataset User Access Rules

Permissions to access datasets are defined for individual users or groups and are limited to a predefined time range. In addition, Satori can automatically revoke permissions if they are unused. This helps organizations avoid excess and unused permissions.
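To make the two expiry conditions concrete, the following Python sketch models a time-limited access rule that is also revoked when it goes unused. It is a conceptual illustration only: the `AccessRule` class, its field names, and the choice to measure inactivity from the grant time when the rule has never been used are assumptions, not Satori's data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AccessRule:
    """Hypothetical model of a dataset access rule for a user or group."""

    identity: str                       # user or group name
    granted_at: datetime
    expires_at: datetime                # end of the predefined time range
    revoke_if_unused_for: Optional[timedelta] = None
    last_used_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        # Condition 1: the rule is limited to a predefined time range.
        if now >= self.expires_at:
            return False
        # Condition 2: the rule is revoked if it has not been used recently.
        if self.revoke_if_unused_for is not None:
            reference = self.last_used_at or self.granted_at
            if now - reference > self.revoke_if_unused_for:
                return False
        return True


rule = AccessRule(
    identity="analysts",
    granted_at=datetime(2024, 1, 1),
    expires_at=datetime(2024, 3, 1),
    revoke_if_unused_for=timedelta(days=14),
)
active_early = rule.is_active(datetime(2024, 1, 10))
active_after_idle = rule.is_active(datetime(2024, 1, 20))
```

In this example the rule is still active on January 10, but by January 20 it has been unused for more than 14 days and is treated as revoked, even though its time range has not yet ended.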

Screenshot

Satori provides three main capabilities for controlling dataset access. These access controls can be used in parallel to streamline the process of managing access to data.

Dataset Permissions

Dataset access rule permissions enable data engineers and dataset stewards to grant access to datasets without requiring users to ask for access. Satori recommends that you use this method for providing access if you know which users or groups require access to a dataset and your organization's policy does not require an approval process.

NOTE: When users query data, Satori searches for the required permissions. If the permissions are found, Satori sends the query to the data store.

Data Inventory

Satori provides a rich, out-of-the-box taxonomy. The dataset's data inventory provides a holistic view of the sensitive data in the dataset and of its access patterns. You can also extend the provided taxonomy by creating custom classifier categories and custom classifiers.

Screenshot

User Access History

Satori audits every change to permissions and every access request.

Screenshot

User Access Requests

Enable access requests to allow users who do not have the required permissions to request access. When these users query data, they receive an access request notification via their Data Portal.

Screenshot

User Access Requests via Slack

User access requests can also be made via Slack. Users with access to the Satori Slack App can make data access requests by using the Slack command /satori access.

User Access Requests

User access requests are sent via email to the dataset's stewards and appear in the management console.

Screenshot

Self-Service Access

Enable self-service access to allow users who do not have the required permissions to grant themselves predefined permissions. When these users query data, they receive a URL link to a form in which they specify why they need to access the dataset, so that their access is audited. Once they submit the form, they are granted the relevant permissions that were defined on the dataset.

This method is the recommended alternative to the standard dataset permissions because it audits users' access to datasets.

Screenshot
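The three access controls described in this section (dataset permissions, user access requests, and self-service access) can be pictured as a single decision made when a user queries data. The Python sketch below is a conceptual model only. The names `Outcome` and `resolve_access` are hypothetical, and both the precedence of self-service over access requests and the final "block" outcome are assumptions; this section does not specify how Satori orders these options.

```python
from enum import Enum


class Outcome(Enum):
    FORWARD_QUERY = "send the query to the data store"
    OFFER_SELF_SERVICE = "send the user a self-service link"
    OFFER_ACCESS_REQUEST = "send the user an access request notification"
    BLOCK = "do not forward the query"  # assumed fallback, not from the docs


def resolve_access(
    has_permission: bool,
    self_service_enabled: bool,
    access_requests_enabled: bool,
) -> Outcome:
    """Decide what happens to a query on a dataset (illustrative only)."""
    if has_permission:
        # An existing access rule covers the user: forward the query.
        return Outcome.FORWARD_QUERY
    if self_service_enabled:
        # Assumed precedence: self-service is offered before access requests.
        return Outcome.OFFER_SELF_SERVICE
    if access_requests_enabled:
        return Outcome.OFFER_ACCESS_REQUEST
    return Outcome.BLOCK


decision = resolve_access(
    has_permission=False,
    self_service_enabled=False,
    access_requests_enabled=True,
)
```

Here a user without permissions, on a dataset where only access requests are enabled, ends up with `Outcome.OFFER_ACCESS_REQUEST`.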

Managing Technical Metadata

Using the Data Inventory view of a dataset, data engineers or dataset stewards can review the results of the automatic data classification and override, remove or add tags as necessary. See the Data Inventory section for more details.

Implementing Custom Policies

The Custom Policy view of a dataset enables data engineers or dataset stewards to implement custom data access policies using the Policy Engine.

Screenshot

See the Policy Engine Overview section for more details.