Databricks is a data analytics platform that lets you easily integrate with open source libraries. It offers a simple collaborative environment to run interactive and scheduled data analysis workloads.
RudderStack supports Databricks as a source from which you can ingest data and route it to your desired downstream destinations.
Granting permissions
RudderStack requires you to grant certain user permissions on Databricks to successfully access data from it.
Complete the steps in the following sections, in the order given, to grant these permissions:
Step 1: Add a user
- Add a new user (for example, user@example.com) by following the steps in the Databricks documentation.
Step 2: Creating the RudderStack schema and granting permissions
- Create a dedicated schema named _rudderstack:
CREATE SCHEMA `_rudderstack`;
RudderStack uses the _rudderstack schema to store the state of each data sync. Do not change this name.
- Grant full access on the _rudderstack schema to the user you added in Step 1:
GRANT ALL PRIVILEGES ON SCHEMA `_rudderstack` TO `user@example.com`;
Replace user@example.com with the user you added in Step 1.
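To confirm that the grant took effect, you can list the privileges the user holds on the schema. This is a minimal sketch that assumes a Databricks SQL environment where SHOW GRANTS is available (Unity Catalog or table access control enabled). The exact output columns depend on your setup.

```sql
-- List the privileges that user@example.com holds on the _rudderstack schema.
-- Replace user@example.com with the user you added in Step 1.
SHOW GRANTS `user@example.com` ON SCHEMA `_rudderstack`;
```

Depending on the governance model, the output should show ALL PRIVILEGES or the individual privileges it covers for that user.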
Setting up the Databricks source in RudderStack
To set up Databricks as a source in RudderStack, follow these steps:
Naming the source
- Log into your RudderStack dashboard.
- From the left panel, go to Source > New Source > Reverse ETL. Then, select Databricks, as shown:
- Assign a name to your source.
Configuring the connection credentials
- Enter the relevant settings from Databricks in the Connection Credentials section as shown below:
- Host - Enter the server hostname.
- Port - Enter the port number.
- Path - Enter the HTTP path.
- Token - Enter the personal access token.
- Click on Continue to proceed.
Schedule settings
- Specify the Schedule Settings to schedule the data syncs from your Databricks source.
- After specifying the schedule type and run settings, click on Continue to finish the setup.
Databricks is now successfully configured as a source in your RudderStack dashboard. You can now connect this source to your preferred destination by clicking the Add Destination button, as shown:
Specifying the data to import
While connecting a destination to your Databricks source, you can use the default JSON mapping feature.
FAQ
Where can I obtain the connection credentials for Databricks?
To obtain the Host, Path, and Port number, go to your Databricks account and follow these steps:
- Go to the Compute tab and select your Databricks cluster.
- Click on Advanced options > JDBC/ODBC tab to find the required settings:
To obtain the Token, go to Settings > User Settings in your Databricks account and generate a new personal access token, as shown:
What do the three validations under Verifying Credentials imply?
When setting up a Reverse ETL source, after you enter the connection credentials and click Continue, you will see the following three validations under the Verifying Credentials option:
These options are explained below:
- Verifying Connection: This option indicates that RudderStack is trying to connect to the warehouse with the information specified in the connection credentials.
- Able to List Schema: This option checks if RudderStack is able to fetch all the schema details using the provided credentials.
- Able to Access RudderStack Schema: This option implies that RudderStack is able to access the _rudderstack schema you created by running all the commands in the Creating the RudderStack schema and granting permissions section.
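To troubleshoot a failed validation, you could run comparable checks by hand while authenticated as the RudderStack user (user@example.com in the earlier example). This is a sketch under assumptions: this guide does not document the exact queries RudderStack runs, and the table name rudder_permission_check is hypothetical.

```sql
-- Run these as the user you added in Step 1.

-- 1. Connection check: a trivial query succeeds only if the connection works.
SELECT 1;

-- 2. Schema listing check: returns the schemas visible to this user.
SHOW SCHEMAS;

-- 3. RudderStack schema check: create and drop a throwaway table in _rudderstack.
--    rudder_permission_check is a hypothetical name used only for this test.
CREATE TABLE `_rudderstack`.`rudder_permission_check` (id INT);
DROP TABLE `_rudderstack`.`rudder_permission_check`;
```

If the third check fails while the first two succeed, the user most likely lacks privileges on the _rudderstack schema.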
Make sure you have created the _rudderstack schema and given RudderStack the required permissions to access it. For more information, refer to the Creating the RudderStack schema and granting permissions section.
Contact us
For queries on any of the sections covered in this guide, you can contact us or start a conversation in our Slack community.