Apache Kafka connector

Apache Kafka is an open-source distributed data streaming platform, originally developed at LinkedIn and later donated to the Apache Software Foundation. Kafka is designed for high-throughput, fault-tolerant, real-time data streaming.

StarTree customers can use the Apache Kafka connector to create a table in Apache Pinot by mapping it to a particular topic. As soon as the connection is set up, StarTree automatically begins to consume data from the connected Kafka topic.
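
Under the hood, this mapping is expressed through the Pinot table's stream configuration. As a rough illustration only, the following Python sketch shows the general shape of the streamConfigs section a Kafka-backed real-time table uses; the broker address and topic name are placeholders, and the exact configuration Data Manager generates may differ.

```python
# Illustrative sketch of Pinot streamConfigs for a Kafka-backed table.
# Broker address and topic name below are placeholders, not values
# Data Manager prescribes.
stream_configs = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "my-topic",                    # topic mapped to the table
    "stream.kafka.broker.list": "broker-1.example.com:9092",  # Kafka broker URL
    "stream.kafka.consumer.type": "lowlevel",                 # partition-level consumption
    "stream.kafka.consumer.factory.class.name":
        "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name":
        "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
}
```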

To begin, from the StarTree Data Manager overview page, click Create a dataset. Then, on the Select Connection Type page, click Kafka.

Connect StarTree to Apache Kafka

You will need the following information to create a connection with Apache Kafka:

  • Broker URL: The address of your Kafka broker, which you can find in your Apache Kafka deployment.
  • Security Protocol: The communication protocol to use; select from the list of available options.
  • SASL Mechanism: The authentication mechanism to use; select from the list of available options.
  • Username: The SASL username to authenticate with.
  • Password: The SASL password to authenticate with.
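
Before entering these values in Data Manager, you can optionally verify them outside StarTree. This is a minimal sketch using the open-source kafka-python client, assuming SASL_SSL with the PLAIN mechanism; the broker URL and credentials are placeholders.

```python
# Minimal connectivity check with kafka-python (pip install kafka-python).
# All connection values below are placeholders; substitute your own.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers="broker-1.example.com:9092",  # Broker URL
    security_protocol="SASL_SSL",                   # Security Protocol
    sasl_mechanism="PLAIN",                         # SASL Mechanism
    sasl_plain_username="my-username",              # Username
    sasl_plain_password="my-password",              # Password
)
print(sorted(consumer.topics()))  # lists the topics this principal can see
consumer.close()
```

If the call succeeds and prints your topics, the same values should work in the connection form.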

Advanced Options are available if you are using a schema registry. If you use one, you must also provide the following information:

  • Schema Registry URL: The address of your schema registry.
  • Username: The username used to authenticate with the schema registry.
  • Password: The password used to authenticate with the schema registry.
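
Similarly, a schema registry connection can be checked with a single HTTP request, assuming a Confluent-compatible registry that uses HTTP basic authentication; the URL and credentials below are placeholders.

```python
# Minimal schema registry check using the requests package.
import requests

resp = requests.get(
    "https://schema-registry.example.com:8081/subjects",  # Schema Registry URL + /subjects
    auth=("my-username", "my-password"),                  # basic auth credentials
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. ["my-topic-value", ...], one subject per registered schema
```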

Apache Kafka connection settings

Select a Topic and map it to a table in Apache Pinot

When you provide schema registry information, only topics with schemas registered in that schema registry are displayed on the topic selection screen.

Apache Kafka input format

  1. From the first dropdown menu, select the desired Topic Name as it exists in Apache Kafka.

  2. From the second dropdown menu, select the appropriate Data Format. This determines how the content from Apache Kafka is mapped to the new table in Apache Pinot. Note: If schema registry information is not provided and you select AVRO as the data format, an error will result; if this happens, go back to the previous screen and provide schema registry information. To sanity-check a topic's format before selecting one, see the inspection sketch after these steps. The following data formats are currently supported in Data Manager:

    • Avro
    • JSON
    • Parquet
    • Protobuf
  3. (Optional) Record reader config lets advanced users provide additional details about the source schema.

  4. (Optional) Improve query performance by adding indexes to the appropriate columns and choosing an encoding type for each column.

  5. (Optional) Configure additional details such as tenants, scheduling, data retention, and a primary key for upsert (see the upsert configuration sketch after these steps).

  6. Click Next.

  7. Review the details and preview the data. When ready, click Create Dataset.
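
If you are unsure which Data Format to select in step 2, one way to check is to peek at a message from the topic. This sketch reuses the placeholder connection values from the earlier kafka-python example and simply tests whether the payload parses as JSON; Avro, Parquet, and Protobuf payloads are binary and will not.

```python
# Peek at one record from a topic to sanity-check its data format.
# Connection values are placeholders, as in the earlier sketch.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="broker-1.example.com:9092",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="my-username",
    sasl_plain_password="my-password",
    auto_offset_reset="earliest",  # start from the oldest retained record
    consumer_timeout_ms=10000,     # give up after 10 s if the topic is idle
)
for record in consumer:
    try:
        print("Parses as JSON:", json.loads(record.value))
    except (UnicodeDecodeError, json.JSONDecodeError):
        print("Binary payload (likely Avro/Protobuf/Parquet):", record.value[:20])
    break  # one record is enough for a format check
consumer.close()
```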
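
Likewise, if you set a primary key for upsert in step 5, the resulting Pinot table and schema carry an upsert configuration. As a rough, Python-shaped sketch of the relevant fragments (the column name is a placeholder, not something Data Manager prescribes):

```python
# Hypothetical fragments of a Pinot upsert setup.
table_config_fragment = {
    "upsertConfig": {"mode": "FULL"},  # newer records fully replace older ones
}
schema_fragment = {
    "primaryKeyColumns": ["event_id"],  # placeholder primary key column
}
```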