ClickHouse is an open-source, column-oriented database management system mainly used for online analytical processing (OLAP). It is fast, and allows for real-time analysis of your data.
RudderStack lets you configure ClickHouse as a destination to which you can send your event data seamlessly.
Setting user permissions in ClickHouse
Make sure your ClickHouse user has read access to system.columns
table for fetching the table schemas.
You also to need to set date_time_input_format
to best_effort
for ClickHouse to parse all the ISO 8601 date and time formats, as shown:
set date_time_input_format = 'best_effort';
UInt8
datatype to set Boolean values and map UInt8
to Boolean internally. So when a schema is fetched from ClickHouse, RudderStack treats UInt8
as Boolean.UInt8
as a datatype except for Boolean values (0,1).Configuring ClickHouse in RudderStack
To send event data to ClickHouse, you first need to add it as a destination in RudderStack and connect it to your data source. Once the destination is enabled, events will automatically start flowing to ClickHouse via RudderStack.
To configure ClickHouse as a destination in RudderStack, follow these steps:
- In your RudderStack dashboard, set up the data source. Then, select ClickHouse from the list of destinations.
- Assign a name to your destination and then click on Next.
Connection Settings
Enter the following credentials in the Connection Credentials page:
- Host: The host name of your ClickHouse database.
- Port: The TCP port of your ClickHouse host. If you want the connection to be secure, use the secure TCP port
9440
. Refer to the ClickHouse guide for more information. - Database: The database name in your ClickHouse instance where the data gets loaded.
- Cluster: The name of your ClickHouse cluster. If you are running a single host ClickHouse cluster, leave this field blank.
- User: The name of the user with the required read/write access to the above database.
- Password: The password for the above user.
- Secure: Enable this setting to establish a secure connection.
- Sync Frequency: Specify how often RudderStack should sync the data to your ClickHouse database.
- Sync Starting At: This optional setting lets you specify the particular time of the day (in UTC) when you want RudderStack to sync the data to the warehouse.
- Exclude Window: This optional setting lets you set a time window when RudderStack will not sync the data to your database.
Configuring the object storage
RudderStack lets you configure the following object storage configuration settings while setting up your ClickHouse destination:
- Use RudderStack-managed object storage: Enable this setting to use RudderStack-managed buckets for object storage.
Choose your storage provider: If Use RudderStack-managed object storage is disabled in the dashboard, select the cloud provider for your object storage and enter the relevant settings:
IPs to be allowlisted
To enable network access to RudderStack, you will need to allowlist the following RudderStack IPs:
- 3.216.35.97
- 34.198.90.241
- 54.147.40.62
- 23.20.96.9
- 18.214.35.254
- 35.83.226.133
- 52.41.61.208
- 44.227.140.138
- 54.245.141.180
- 3.66.99.198
- 3.64.201.167
- 3.66.99.198
- 3.64.201.167
FAQ
How does RudderStack de-duplicate the events that are loaded into the warehouse?
RudderStack creates tables with the engine ReplacingMergeTree order by (received_at, id)
and column dataType
as Nullable(dataType)
.ReplacingMergeTree
replaces the latest event which has the same received_at, id
while merging. Note that Nullable
is not applicable for sortKeys
.
How does RudderStack merge the user properties in the user's table?
For the user's table, RudderStack creates a table with an engine AggregatingMergeTree
ordered by id
and a column dataType
as SimpleAggregateFunction(anyLast, Nullable(dataType))
. Merging the columns with the sameid
picks the last value which is not null. Note that Nullable
is not applicable for sortKeys
.
How does RudderStack handle cases when loading data into ClickHouse?
RudderStack converts the event keys into the lower case before exporting the data into ClickHouse. This is so that it does not create two tables in case the event name has two different cases.
Contact us
For queries on any of the sections covered in this guide, you can contact us or start a conversation in our Slack community.