
Datasets

A dataset is a collection of data assets, such as tables or schemas, from one or more data stores that you want to govern access to as a single unit.


For example, a set of tables in a Snowflake account that contain private customer information, such as name, address and purchase history, can be represented in Satori as a "Customer Data" dataset.

Data engineers create datasets as part of the data development lifecycle. Once a dataset is defined, you can assign dataset stewards to manage the day-to-day operations of data access.

When data consumers query data, Satori associates the query with the relevant datasets and applies the access rule permissions and policies that are defined on them.

Creating and Managing Datasets

To create a dataset, you need the Admin or Editor role, as defined in the management console.

Dataset Stewards

To help manage and maintain a dataset, you can assign dataset stewards who handle the day-to-day operations of data access.

The dataset steward can create, approve or deny user access rules, create security policies and masking profiles, and assign them to the dataset. In addition, the dataset steward can edit the categories in the Data Inventory tab of the dataset.

NOTE: The dataset steward cannot give Satori control over access to the dataset or change the default security policy.

Dataset Access Approvers

In addition to dataset stewards, you can assign dataset access approvers, who approve or deny access requests to the dataset. Access approvers cannot view or edit the dataset in the management console.

Adding a Dataset

To add a new dataset to Satori, perform the following steps:

  1. Go to the Datasets view and click the Add button.

  2. Provide a name and description for the dataset, and optionally assign dataset stewards and dataset access approvers.

  3. Select the data store locations to include in the dataset and, optionally, define the locations to exclude.


Checking Data Store Locations

Satori uses a longest-match approach when checking whether a data store location is included in the dataset. Consider the following example:

Included Locations

Finance Snowflake Account / Forecast database / Q2 schema

Excluded Locations

Finance Snowflake Account / Forecast database / Q2 schema / Orders

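When a user queries any table in the Q2 schema other than the Orders table, Satori associates the query with this dataset and applies any permissions or policies defined on it. Queries on the Orders table are not associated with this dataset, because the excluded location is the longer, more specific match.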
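To make the longest-match rule concrete, the following Python sketch models each location as a path of names and lets the longest matching included or excluded path decide membership. The `in_dataset` function, the tuple representation, and the tie-breaking rule (an exclusion wins over an inclusion of equal length) are illustrative assumptions, not Satori's implementation; "Customers" is a hypothetical table name.

```python
from typing import Sequence, Tuple

# Hypothetical model: a location is a path of names, from data store down to table.
Location = Tuple[str, ...]


def is_prefix(prefix: Location, path: Location) -> bool:
    """True if `prefix` equals `path` or is one of its ancestors."""
    return len(prefix) <= len(path) and path[: len(prefix)] == prefix


def in_dataset(path: Location, included: Sequence[Location],
               excluded: Sequence[Location]) -> bool:
    """Return True if the longest location matching `path` is an included one."""
    best_len, best_is_included = -1, False
    for loc in included:
        if is_prefix(loc, path) and len(loc) > best_len:
            best_len, best_is_included = len(loc), True
    for loc in excluded:
        # Assumption: on a tie in length, the exclusion wins.
        if is_prefix(loc, path) and len(loc) >= best_len:
            best_len, best_is_included = len(loc), False
    return best_is_included


Q2 = ("Finance Snowflake Account", "Forecast database", "Q2 schema")
included = [Q2]
excluded = [Q2 + ("Orders",)]

print(in_dataset(Q2 + ("Customers",), included, excluded))  # True
print(in_dataset(Q2 + ("Orders",), included, excluded))     # False
```

Because the comparison is by path length, an excluded schema could still contain an included table: the table path is the longer match, so it stays in the dataset.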

User Access Rules Tab

Permissions to access datasets are defined for individual users or groups and can be limited to a predefined time range. In addition, Satori can automatically revoke permissions if they are unused. This helps organizations avoid excess and unused permissions.
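As an illustration of how a time-limited rule with automatic revocation of unused access could be evaluated, here is a minimal Python sketch. The `AccessRule` fields and the choice to treat the grant time as the last use when a rule has never been used are hypothetical; Satori's actual settings and revocation logic may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AccessRule:
    """Hypothetical access rule model; field names are illustrative, not Satori's schema."""
    grantee: str                                   # a user or group identity
    granted_at: datetime
    expires_at: Optional[datetime] = None          # end of the time range, if any
    revoke_if_unused_for: Optional[timedelta] = None
    last_used_at: Optional[datetime] = None

    def is_active(self, now: datetime) -> bool:
        # Outside the predefined time range: no access.
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.revoke_if_unused_for is not None:
            # Assumption: a never-used rule counts from its grant time.
            last = self.last_used_at or self.granted_at
            if now - last >= self.revoke_if_unused_for:
                return False
        return True


start = datetime(2024, 1, 1)
rule = AccessRule("analysts", granted_at=start,
                  expires_at=start + timedelta(days=90),
                  revoke_if_unused_for=timedelta(days=30))
print(rule.is_active(start + timedelta(days=10)))  # True
print(rule.is_active(start + timedelta(days=45)))  # False: unused for 45 days
rule.last_used_at = start + timedelta(days=40)
print(rule.is_active(start + timedelta(days=45)))  # True: used 5 days earlier
print(rule.is_active(start + timedelta(days=95)))  # False: time range has ended
```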


Data Access Request Types

Satori provides three main capabilities for controlling dataset access. These access controls can be used in parallel to streamline the process of managing access to data.

Access Rules

Dataset access rules are similar to database privileges; they enable admins to grant immediate access to datasets without requiring users to ask for access. Satori recommends that you use this method for providing access if you know which users or groups require access to a dataset and your organization's policy does not require an approval process.


User Access Requests

User Access Requests allow users that do not have the required permissions to request access to the dataset. Users can request access in the data portal or the Slack application. In data stores that use the proxy-based integration, users receive a direct URL to the data portal to request access when their query is blocked by Satori.


Pending User Access Requests

When users make dataset access requests, approvers receive a notification about a pending access request with all the information they need to approve or deny the request. Approvers can configure their notification preferences in the user profile view.
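The approve-or-deny flow can be sketched as a small state machine. This is a hypothetical model: the `AccessRequest` class, the approver set (standing in for dataset stewards and dataset access approvers), and the rule that a request can be decided only once are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class RequestState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    """Hypothetical model of a dataset access request; not Satori's schema."""
    requester: str
    dataset: str
    state: RequestState = RequestState.PENDING

    def decide(self, approver: str, approvers: Set[str], approve: bool) -> None:
        # Only designated approvers for this dataset may decide.
        if approver not in approvers:
            raise PermissionError(f"{approver} cannot decide requests for {self.dataset}")
        # Assumption: a request is decided exactly once.
        if self.state is not RequestState.PENDING:
            raise ValueError("request has already been decided")
        self.state = RequestState.APPROVED if approve else RequestState.DENIED


approvers = {"steward@example.com", "approver@example.com"}
req = AccessRequest("analyst@example.com", "Customer Data")
req.decide("approver@example.com", approvers, approve=True)
print(req.state.value)  # approved
```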


Self-Service Access

Self-service access rules allow users to grant themselves access to a dataset based on a set of predefined permissions. Self-service access rules are similar to access request rules, except they do not require approval.

This method is the recommended alternative to the standard dataset user access rules. Users are required to provide a reason for access, which is then stored in the access history.
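A minimal sketch of self-service access, assuming a hypothetical `self_grant` helper: the user receives the rule's predefined permission immediately, no approver is involved, and the mandatory reason is appended to an in-memory access history. The data structures and example values are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set, Tuple


@dataclass(frozen=True)
class SelfServiceRule:
    """Hypothetical self-service rule: a predefined permission on a dataset."""
    dataset: str
    permission: str  # illustrative value, e.g. "read"


@dataclass
class AccessHistory:
    # Each entry: (timestamp, user, dataset, reason)
    entries: List[Tuple[datetime, str, str, str]] = field(default_factory=list)


def self_grant(user: str, rule: SelfServiceRule, reason: str,
               grants: Set[Tuple[str, str, str]], history: AccessHistory,
               now: datetime) -> None:
    """Grant `rule.permission` to `user` without approval, recording the reason."""
    if not reason.strip():
        raise ValueError("a reason for access is required")
    grants.add((user, rule.dataset, rule.permission))
    history.entries.append((now, user, rule.dataset, reason))


grants: Set[Tuple[str, str, str]] = set()
history = AccessHistory()
rule = SelfServiceRule("Customer Data", "read")
self_grant("analyst@example.com", rule, "Q2 churn analysis", grants, history,
           datetime(2024, 1, 1))
print(len(history.entries))  # 1
```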


Data Portal User Access Requests

Enable access requests to allow users that do not have the required permissions to request access. When such users query data, they receive an access request notification via the Data Portal.


NOTE: When users query data, Satori checks for the required permissions; if they are available, Satori sends the query to the data store.
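The permission check in the note above can be sketched as a simple routing decision. The `route_query` function, the grant set, and the placeholder portal URL are hypothetical; for proxy-based integrations, where a blocked user receives a direct data portal URL, the sketch approximates that response with a message string.

```python
from typing import Optional, Set, Tuple

# Hypothetical grant table: (user, dataset) pairs that currently have access.
Grant = Tuple[str, str]


def route_query(user: str, dataset: str, grants: Set[Grant],
                request_url: str) -> Tuple[bool, Optional[str]]:
    """Return (forwarded, message) for a query that touches `dataset`."""
    if (user, dataset) in grants:
        # Required permissions found: the query continues to the data store.
        return True, None
    # No permissions: block the query and point the user to an access request.
    return False, f"Access to {dataset} is denied. Request access at {request_url}"


grants = {("analyst@example.com", "Customer Data")}
url = "https://portal.example.com/request"  # placeholder URL
print(route_query("analyst@example.com", "Customer Data", grants, url))  # (True, None)
print(route_query("guest@example.com", "Customer Data", grants, url)[0])  # False
```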

User Access Requests via Slack

User access requests can also be made via Slack. Users with access to the Satori Slack App can make data access requests by using the Slack command /satori access.

Data Inventory Tab

Satori provides a rich out-of-the-box taxonomy. The dataset data inventory provides a holistic view of the sensitive data and access patterns. You can extend the provided taxonomy by creating custom classifier categories and custom classifiers.


Managing Technical Metadata

Using the Data Inventory view of a dataset, data engineers or dataset stewards can review the results of the automatic data classification and override, remove or add any necessary tags. See the Data Inventory section for more details.

Security Policies Tab

This is where you select which security policies you wish to assign to the selected dataset. You can assign multiple security policies to a single dataset.


Custom Policies Tab

The Custom Policies view of a dataset enables data engineers or dataset stewards to implement custom data access policies using the Policy Engine.


See the Policy Engine Overview section for more details.

User Access History Tab
