PrivaceraCloud Documentation

Databricks SQL Overview and Configuration

One purpose of PolicySync for Databricks SQL is to limit user access to your entire Databricks data source or to portions of it, such as Delta external tables, views, entire tables, or only certain columns or rows.

Planning and general process

The general process for connecting with JDBC to a Databricks SQL data source, creating a policy, and limiting user access is as follows. Gather the necessary information before you begin the specific steps described here.

  1. Add the privacera_tag service.

  2. Create an endpoint in Databricks SQL for PrivaceraCloud to connect to, with JDBC username, password, and URL.

  3. Add Databricks SQL as a service in PrivaceraCloud.

  4. Define a data source for the Databricks SQL endpoint in PrivaceraCloud using the JDBC username, password, and URL from the endpoint you created, along with the other required fields.

  5. Define the Databricks SQL service.

  6. Determine the users, groups, or roles who need access from PrivaceraCloud to your Databricks SQL.

    1. Ensure that all users in PrivaceraCloud who will access Databricks SQL have an email address in their PrivaceraCloud account.

    2. Define those users with appropriate permissions in Databricks.

    3. Create a resource policy to assign users, groups, or roles the necessary permissions to access the Databricks SQL data source at the appropriate depth.

    4. Decide the depth of the data access you will give to users: views, source tables, columns, or rows. See Allowable Privileges.

Prerequisites

Make sure the Privacera Tag Service and Databricks SQL Endpoint configuration are updated before you configure Databricks SQL PolicySync.

Enable PrivaceraCloud tag service

In PrivaceraCloud, the administrator must add the privacera_tag service to enable PolicySync with Databricks SQL.

See the steps in Add the privacera_tag Service.

Create endpoint in Databricks SQL

In Databricks SQL, an administrator must create a Databricks SQL endpoint for connecting from PrivaceraCloud. This process is described in Create an Endpoint in Databricks SQL.

Make note of the following values for entering into the fields in PrivaceraCloud as detailed in Connect Application and Databricks SQL PolicySync Fields:

  • The email address of the user defined in the endpoint. This is the value of the JDBC username (Service jdbc username) in PrivaceraCloud.

  • The Databricks-generated access token. This is the value of the JDBC password (Service jdbc password) for the defined JDBC username in PrivaceraCloud.

  • The JDBC URL (Service jdbc url) defined for the endpoint.
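Before entering the three values in PrivaceraCloud, it can help to sanity-check them. The following is a minimal sketch, not part of PrivaceraCloud or Databricks: the `JdbcSettings` class and `check_jdbc_settings` function are hypothetical names, and the checks only confirm that the values look plausible (an email-shaped username, a non-empty token, and a URL that starts with `jdbc:`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JdbcSettings:
    """Hypothetical container for the three values noted from the endpoint."""
    username: str  # Service jdbc username: email address of the endpoint user
    password: str  # Service jdbc password: Databricks-generated access token
    url: str       # Service jdbc url: JDBC URL defined for the endpoint


def check_jdbc_settings(settings: JdbcSettings) -> List[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    if "@" not in settings.username:
        problems.append("JDBC username should be an email address")
    if not settings.password.strip():
        problems.append("JDBC password (access token) is empty")
    if not settings.url.startswith("jdbc:"):
        problems.append("JDBC URL should start with 'jdbc:'")
    return problems
```

An empty result does not prove that the endpoint accepts the credentials; it only catches obvious copy-and-paste mistakes before you save the application.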

Databricks SQL with Privacera Hive

To use Databricks SQL with Privacera Hive, see Databricks SQL Hive Service Def.

Connect Databricks SQL application

With the values for the JDBC username, JDBC password, and JDBC URL that you noted in Create endpoint in Databricks SQL, define the data source connection in PrivaceraCloud to the Databricks SQL endpoint.

Follow these steps to connect the Databricks SQL application to PrivaceraCloud:

  1. Go to Settings > Applications.

  2. In the Applications screen, select Databricks SQL.

  3. Select the platform type (AWS or Azure) on which you want to configure the Databricks application.

  4. Enter the application Name and Description, and then click Save.

  5. Click the toggle button to enable either Access Management or Data Discovery for Databricks SQL.

    Note

    If you don't see Data Discovery in your application, enable it in Settings > Account > Discovery.

  6. In the BASIC tab, enter values in the fields. For more information on the fields and their values, see Databricks SQL PolicySync Fields.

  7. Click Save.

  8. In the ADVANCED tab, you can add custom properties.

  9. Using the IMPORT PROPERTIES button, you can browse and import application properties.

Grant Databricks SQL permissions to PrivaceraCloud users

For each PrivaceraCloud user that needs access to Databricks SQL, the administrator needs to define that user with appropriate access permissions in Databricks.

Ensure all PrivaceraCloud users have an email address

All PrivaceraCloud users who will access Databricks SQL must have an email address in their user account on PrivaceraCloud. This email address is required to log in to Databricks SQL.
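If you keep a list of the users who need access, a short script can flag accounts that lack an email address. This is only an illustrative sketch: the `User` record and the `users_missing_email` function are hypothetical and do not correspond to any PrivaceraCloud API. How you obtain the user list is left to you.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    """Hypothetical record for a PrivaceraCloud user account."""
    name: str
    email: Optional[str] = None


def users_missing_email(users: List[User]) -> List[str]:
    """Return the names of users whose email is absent or blank."""
    return [u.name for u in users if not (u.email or "").strip()]
```

Any name this function returns belongs to an account that needs an email address added before that user can log in to Databricks SQL.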

Grant Databricks SQL access

  1. In your Databricks account, navigate to Data science and engineering.

  2. Click Workspace on the top right.

  3. To open the Admin Console, go to the top right of the Workspace, click the user account icon, and select Admin Console.

  4. In the Databricks SQL access column, select the checkbox for the user.

Grant Databricks SQL endpoint access

  1. In the Databricks SQL Dashboard, navigate to SQL > Endpoints.

  2. Click the name of the Endpoint for which you want to add user permission.

  3. In the top right, click Permissions.

  4. In the SQL Endpoint Permissions dialog, select the intended user from the drop-down list.

  5. Give the user Can Use permission.

  6. Click Add.

  7. Click Save.
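If you need to grant Can Use to many users, the same change can in principle be scripted instead of clicked through. The sketch below only builds the request and sends nothing. It assumes the Databricks Permissions REST API with a `sql/endpoints` object type and the `CAN_USE` permission level; the exact path may differ in your workspace version, so verify it against the Databricks documentation before use. The function name `build_can_use_request` is hypothetical.

```python
from typing import Any, Dict


def build_can_use_request(endpoint_id: str, user_email: str) -> Dict[str, Any]:
    """Describe a request that would grant Can Use on one SQL endpoint.

    The path below is an assumption about the Databricks Permissions API;
    confirm it for your workspace before sending anything.
    """
    return {
        "method": "PATCH",  # PATCH adds to the existing permissions
        "path": f"/api/2.0/permissions/sql/endpoints/{endpoint_id}",
        "body": {
            "access_control_list": [
                {"user_name": user_email, "permission_level": "CAN_USE"}
            ]
        },
    }
```

Keeping request construction separate from sending makes it easy to review the payload for each user before applying it with the HTTP client of your choice.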

Define a resource policy

In PrivaceraCloud, define a resource policy to grant access to the Databricks SQL data source to users, groups, or roles.

Follow the steps in Resource Policies and the details about allowed privileges described here.

Allowable privileges

The following privileges can be specified for a Databricks SQL resource policy:

  • SELECT: Allows read access to an object.

  • CREATE: Provides ability to create an object (for example, a table in a database).

  • MODIFY: Provides ability to add data to, delete data from, and modify data in an object.

  • USAGE: An additional requirement to perform any action on a database object.

  • READ_METADATA: Provides ability to view an object and its metadata.

  • CREATE_NAMED_FUNCTION: Provides ability to create a named UDF in an existing catalog or database.

  • ALL PRIVILEGES: Gives all privileges, equivalent to all the above privileges.

  • Data_Admin Privilege for Secure Views: With the Data_Admin privilege, access policies are applied to source tables. If you want to restrict the access policies only to the views and not to the source tables, enable the following property in the PolicySync configuration, as detailed in Connect Application and Databricks SQL PolicySync Fields:

    Secure view Access by Table policies: true
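To make the privilege names above more concrete, the following sketch formats the kind of Databricks SQL `GRANT` statement that corresponds to one privilege on one object. It is an illustration only: the statements PolicySync actually issues may differ, and the `grant_statement` helper and the example object and user names are hypothetical.

```python
# Privilege names taken from the list above.
PRIVILEGES = {
    "SELECT",
    "CREATE",
    "MODIFY",
    "USAGE",
    "READ_METADATA",
    "CREATE_NAMED_FUNCTION",
    "ALL PRIVILEGES",
}


def grant_statement(privilege: str, object_type: str,
                    object_name: str, principal: str) -> str:
    """Format a GRANT statement for one privilege, object, and principal."""
    if privilege not in PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege}")
    if "`" in principal:
        raise ValueError("principal must not contain a backtick")
    # The principal is quoted with backticks because email addresses
    # contain characters that are not valid in a bare identifier.
    return f"GRANT {privilege} ON {object_type} {object_name} TO `{principal}`"
```

For example, `grant_statement("SELECT", "TABLE", "sales.orders", "analyst@example.com")` returns ``GRANT SELECT ON TABLE sales.orders TO `analyst@example.com` ``. Because USAGE is an additional requirement for any action on a database object, a matching `USAGE` grant on the containing database would typically accompany it.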

Test the policy

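To assign privileges to users, groups, or roles, follow the steps in Resource Policies.

To test the policy, log in to Databricks SQL as a non-administrator user who is covered by the policy and confirm that the user can access only the permitted objects.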

Databricks SQL PolicySync fields

For a description of all fields that must or can be set for resource policy, see Databricks SQL PolicySync Fields.

Configuring column-level access control

To enable column-level access control, set the following fields when you define the PolicySync fields:

  • Column Level Access Control: true.

  • In custom fields, add the following, where # REDACTED # is any string of your choice:

    ranger.policysync.connector.4.access.control.number.value=0
    ranger.policysync.connector.4.access.control.double.value=0
    ranger.policysync.connector.4.access.control.text.value='# REDACTED #'      
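If you maintain these custom properties in a script, the three lines can be generated from the replacement values you choose. This is a minimal sketch: the function `column_masking_properties` is hypothetical, and the connector index 4 and the default values of 0 are taken from the example above rather than from any documented default.

```python
def column_masking_properties(text_value: str, connector: int = 4,
                              number_value: int = 0,
                              double_value: int = 0) -> str:
    """Render the three custom properties, one per line."""
    prefix = f"ranger.policysync.connector.{connector}.access.control"
    lines = [
        f"{prefix}.number.value={number_value}",
        f"{prefix}.double.value={double_value}",
        # The text value is wrapped in single quotes, as in the example above.
        f"{prefix}.text.value='{text_value}'",
    ]
    return "\n".join(lines)
```

Calling `column_masking_properties("MASKED")` yields the same three properties shown above with `MASKED` as the text replacement, ready to paste into the custom fields.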

View-based masking functions and row-level filtering

For supported masking functions and supported row-level filtering, see Databricks SQL Masking Functions.