Apache Kafka connector
Apache Kafka is an open-source distributed data streaming platform, originally developed at LinkedIn and later donated to the Apache Software Foundation. Kafka is designed for high-throughput, fault-tolerant, real-time data streaming.
StarTree customers can use the Apache Kafka connector to create a table in Apache Pinot by mapping it to a particular topic. As soon as the integration is set up, StarTree automatically begins to consume data from the connected Kafka topic.
To begin, from the StarTree Data Manager overview page, click Create a dataset. Then, on the Select Connection Type page, click Kafka.
Connect StarTree to Apache Kafka
You will need the following information to create a connection with Apache Kafka:
- Broker URL: You can find this in your Kafka cluster configuration. The broker URL includes the port number at the end.
- Authentication Type:
- Default: Select this option if you've deployed the StarTree Cloud BYOC version and Kafka in the same VPC. In this case, StarTree connects to your Kafka cluster without any authentication, and no further information is needed to create a connection between StarTree Data Manager and the Kafka broker.
- SASL: SASL is a framework for data security and authentication. Choose this option to ensure that only authenticated actors can connect to the Kafka cluster. This option supports both unencrypted and encrypted data. Enter the following information when selecting this option:
- Security Protocol: The communication type. Select from the following options:
- SASL_PLAINTEXT: The communication between the client (StarTree Data Manager in this case) and Kafka broker is not encrypted.
- SASL_SSL: The communication between the client (StarTree Data Manager in this case) and Kafka broker is encrypted.
- SASL Mechanism: The authentication mechanism, which is independent of the security protocol. Select from the following options:
- PLAIN: In this mode, authentication credentials are exchanged between StarTree Data Manager and the Kafka broker exactly as entered in the UI.
- SCRAM-SHA-256: Authentication is established by exchanging a 32-byte hash token generated from the username and password.
- SCRAM-SHA-512: Authentication is established by exchanging a 64-byte hash token generated from the username and password.
- Username: Username to connect to the broker.
- Password: Password associated with the username to connect to broker.
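As a quick illustration of where the 32- and 64-byte figures come from, the sketch below uses Python's standard `hashlib`. SCRAM (RFC 5802 / RFC 7677) derives its salted password hash with PBKDF2, and the digest size of the underlying hash function determines the token size; the password and salt here are placeholders:

```python
import hashlib

# SCRAM (RFC 5802 / RFC 7677) derives a salted password hash via PBKDF2.
# The digest size of the underlying hash function sets the token size:
# SHA-256 yields 32 bytes, SHA-512 yields 64 bytes.
# Placeholder credentials, for illustration only.
salted_256 = hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 4096)
salted_512 = hashlib.pbkdf2_hmac("sha512", b"password", b"salt", 4096)

print(len(salted_256))  # 32
print(len(salted_512))  # 64
```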
Advanced Options are available if you are using a schema registry. This is currently supported only for Confluent Cloud; schema registries hosted in other services, such as AWS Glue, are not supported. For Confluent Cloud, enter the following information:
- Schema Registry URL:
- Username:
- Password:
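For context on what these credentials do: Confluent Cloud's Schema Registry REST API authenticates clients with HTTP Basic auth over the API key and secret. A small sketch, with placeholder URL and credential values:

```python
import base64

# Placeholder values; a real registry URL and API key/secret come from
# your Confluent Cloud cluster.
registry_url = "https://psrc-example.us-east-1.aws.confluent.cloud"
username = "SR_API_KEY"
password = "SR_API_SECRET"

# Basic auth header: base64("username:password")
token = base64.b64encode(f"{username}:{password}".encode()).decode()
headers = {"Authorization": f"Basic {token}"}
# e.g. a GET to {registry_url}/subjects with these headers lists
# the registered schema subjects.
```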
SSL Setup: SSL can be set up in two ways: using an open certificate authority such as Let’s Encrypt, or using a commercial certificate authority. If your SSL is set up with an open certificate authority, you can connect StarTree Data Manager to your Kafka cluster without any additional steps. If it is set up with a commercial certificate authority, StarTree Data Manager can't be used to ingest data from that Kafka cluster. Work with StarTree support to configure the connection between StarTree Data Manager and your Kafka cluster, then use the REST API to create a table.
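To make the mapping between the fields above and an actual Kafka client concrete, here is a minimal sketch using kafka-python parameter names. The broker address and credentials are placeholders, and this illustrates the settings involved rather than how Data Manager is implemented:

```python
# Sketch of how the connection fields map onto Kafka client settings,
# using kafka-python parameter names. All values are placeholders.
def kafka_client_config(broker_url, auth_type="DEFAULT",
                        security_protocol=None, sasl_mechanism=None,
                        username=None, password=None):
    config = {"bootstrap_servers": broker_url}  # Broker URL, host:port
    if auth_type == "SASL":
        config.update({
            "security_protocol": security_protocol,  # SASL_PLAINTEXT or SASL_SSL
            "sasl_mechanism": sasl_mechanism,        # PLAIN, SCRAM-SHA-256, SCRAM-SHA-512
            "sasl_plain_username": username,
            "sasl_plain_password": password,
        })
    return config

cfg = kafka_client_config("broker.example.com:9092", "SASL",
                          "SASL_SSL", "SCRAM-SHA-256", "user", "secret")
# Passing a config like this to kafka.KafkaConsumer(**cfg) would open an
# authenticated connection to the cluster.
```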
Select a Topic and map it to a table in Apache Pinot
When you provide schema registry information, only topics associated with that schema are displayed in the topic selection screen.
- From the first dropdown menu, select the desired Topic Name as it exists in your Kafka cluster.
- From the second dropdown menu, select the appropriate Data Format. This maps the content from your Kafka topic to the new table in Apache Pinot. Note: If you select AVRO as the data format without providing schema registry information, an error results; go back to the previous screen and provide the schema registry details. The following data formats are currently supported in Data Manager:
- Avro
- JSON
- Parquet
- Protobuf
- (Optional) Use the Record reader config to provide additional details about the source schema (intended for advanced users).
- (Optional) Improve query performance by adding indexes to the appropriate columns and choosing an encoding type for each column.
- (Optional) Configure additional details such as tenants, scheduling, data retention, and a primary key for upsert.
- Click Next.
- Check the details and preview the data. When ready, click Create Dataset.
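For readers who end up on the REST API path mentioned in the SSL section, the table that gets created is a Pinot realtime table whose streamConfigs carry the same connection details entered above. A hypothetical sketch (key names follow the Apache Pinot stream ingestion documentation; topic and broker values are placeholders):

```python
# Hypothetical sketch of the streamConfigs section of a realtime Pinot
# table config for Kafka ingestion. Key names follow the Apache Pinot
# docs; the topic and broker values are placeholders.
stream_configs = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "my-topic",
    "stream.kafka.broker.list": "broker.example.com:9092",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.consumer.factory.class.name":
        "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    # Decoder must match the data format chosen above (JSON here).
    "stream.kafka.decoder.class.name":
        "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
}
```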