
Confluent Cloud connector

Confluent Cloud is a fully managed, cloud-native data streaming platform built on Apache Kafka. It lets you build real-time data pipelines and stream data between applications, and it provides scalable, reliable infrastructure that can ingest and process large volumes of streaming data in real time.

Use the Confluent Cloud connector to create a table in Apache Pinot by mapping it to a particular topic. As soon as the integration is set up, StarTree automatically begins to consume data from the connected topic in Confluent Cloud.

Connect StarTree to Confluent Cloud

Create a dataset

Let's go through the steps to create a dataset with Confluent Cloud as the source.

  1. In StarTree Cloud, on the Data Manager overview page, click Create a dataset. Then, on the Select Connection Type page, click Confluent Cloud.
  2. Enter the following information to create a connection with Confluent Cloud:
  • Broker URL: The Bootstrap server endpoint from the Cluster Settings page of your Confluent Cloud cluster. For more information, see Obtain bootstrap server endpoint below.
  • API Key: The API key from your Confluent Cloud account. For more information, see [Create the API key and secret](#create-the-api-key-and-secret) below.
  • API Secret: The API secret that goes with the API key.
  • Authentication Type: SASL
  3. (Optional) To use Confluent Cloud Schema Registry, choose Advanced Options when creating the connection and provide the following information:
  • Schema Registry URL
  • Schema Registry API Key
  • Schema Registry API Secret
  4. Click Test Connection to verify the connection, and then click Create Connection. (If you want to verify the same credentials outside Data Manager first, see the sketch after this procedure.)
  5. In the Topic Name drop-down list, select the name of a topic that exists in Confluent Cloud. Note: When you provide schema registry information, only topics associated with that schema registry are displayed on the Topic Selection page.
  6. In the Data Format drop-down list, select the appropriate format. This maps the content from Confluent Cloud to the new table in Apache Pinot.

Note: If you select AVRO as the data format without providing schema registry information, an error occurs. If this happens, go back to the previous screen and provide the schema registry information.

  7. (Optional) In Record reader config, provide additional details about the source schema.
  8. (Optional) Improve query performance by adding indexes to the appropriate columns and choosing encoding types for each column.
  9. (Optional) Configure dataset-specific details such as tenants, scheduling, data retention, and a primary key for upsert.
  10. Click Next.
  11. Check the details and preview the data. When ready, click Create Dataset.
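
Before creating the connection in Data Manager, you can optionally confirm that the Broker URL, API key, and API secret (and, if used, the schema registry credentials) work outside StarTree. The sketch below is a minimal example assuming the confluent-kafka Python client; every constant (BOOTSTRAP_SERVER, API_KEY, the topic name, and so on) is a placeholder to replace with your own values, and the consumer group name is arbitrary as long as your ACLs allow it.

```python
# Minimal connectivity check using the same values entered in Data Manager.
# Assumes the confluent-kafka Python package; the registry check may also
# need its schema-registry extras. All constants below are placeholders.
from confluent_kafka import Consumer
from confluent_kafka.schema_registry import SchemaRegistryClient

BOOTSTRAP_SERVER = "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092"  # Broker URL
API_KEY = "YOUR_CLUSTER_API_KEY"
API_SECRET = "YOUR_CLUSTER_API_SECRET"
TOPIC = "your-topic"

# Confluent Cloud API keys authenticate with SASL_SSL and the PLAIN mechanism.
consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP_SERVER,
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": API_KEY,
    "sasl.password": API_SECRET,
    # Prefixed with "startree-ingestion" so it passes the consumer-group ACL
    # suggested later on this page.
    "group.id": "startree-ingestion-connectivity-check",
    "auto.offset.reset": "earliest",
})

# list_topics() fails fast if the endpoint or the credentials are wrong.
metadata = consumer.list_topics(timeout=10)
print("topic visible" if TOPIC in metadata.topics else "topic NOT visible to this API key")
consumer.close()

# Optional: verify the Schema Registry credentials used under Advanced Options.
SR_URL = "https://psrc-xxxxx.us-east-2.aws.confluent.cloud"
SR_KEY, SR_SECRET = "YOUR_SR_API_KEY", "YOUR_SR_API_SECRET"
registry = SchemaRegistryClient({"url": SR_URL,
                                 "basic.auth.user.info": f"{SR_KEY}:{SR_SECRET}"})
print(registry.get_subjects())  # expect subjects such as "<topic>-value"
```

If list_topics() times out, re-check the Bootstrap server endpoint; an authentication error usually points at the API key or secret.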

Obtain bootstrap server endpoint

  1. Log in to your Confluent Cloud account.
  2. In the left navigation pane, click Environments, and then click the name of your environment.
  3. On the Environment page, select the Clusters tab, and then click the name of your cluster.
  4. In the left navigation pane, under Cluster Overview, click Cluster Settings.
  5. On the Cluster settings page, copy the Bootstrap server endpoint and save it for later. You'll need this endpoint for the Broker URL when you create a connection (step 2 in Create a dataset).
  6. Continue to the next procedure to create the API key and secret.

Create the API key and secret

  1. Follow steps 1-3 in the section above to open the Confluent Cloud Cluster page.

  2. In the left navigation pane, under Cluster Overview, click API Keys.

  3. On the API keys page, click Create key.

  4. On the Create key page, do the following:

    a. For Access control, select Granular access, and click Next.

    b. For Service account, specify the following to create a service account, and then click Next:

    • New service account name: A unique name for your service account.
    • Description: A brief description of your service account.

    c. To add ACLs to the service account, specify details for the Cluster, Consumer group, and Topic ACLs, including the following:

    • For Cluster, select the following:
      • ACL category: Cluster
      • Operation: Describe
      • Permission: Allow
    • For Consumer group, select the following:
      • Consumer group ID: startree-ingestion
      • Pattern type: Prefixed
      • Operation: Describe or Read
      • Permission: Allow
    • For Topic, select the following:
      • Topic name: The topic (or topic prefix) that StarTree is allowed to read. By default, all topics can be accessed.
      • Pattern type: Literal or Prefixed. If the specified topic name is *, select Literal.
      • Operation: Read or Describe_configs
      • Permission: Allow

    Note: ACLs are required to assign permissions for each category of the API key. To define additional access control lists, click +Add ACLs. (If you prefer to manage these ACLs programmatically, see the sketch at the end of this section.)

    d. Click Next, and then under Get your API key, click the Copy icon next to the key and secret, and save them securely like any other password.

    e. (Optional) Click Download and Continue.


    Note: Once you exit this screen, you cannot view the API key and secret again.

  5. Return to the Create a dataset procedure and use the API key and secret you just created to configure the connection.
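
The ACLs above can also be created programmatically instead of through the Confluent Cloud UI. The following is a minimal sketch assuming the confluent-kafka Python AdminClient and an API key that is allowed to manage ACLs; the service account ID (sa-xxxxx), bootstrap endpoint, and topic name are placeholders, and the cluster-resource naming in the comments is an assumption about how librdkafka exposes that resource.

```python
# Hypothetical sketch: create the ACLs from step 4c with the Kafka Admin API.
# Assumes the confluent-kafka Python package and an admin-capable API key.
from confluent_kafka.admin import (
    AdminClient, AclBinding, AclOperation, AclPermissionType,
    ResourcePatternType, ResourceType,
)

admin = AdminClient({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "ADMIN_API_KEY",
    "sasl.password": "ADMIN_API_SECRET",
})

PRINCIPAL = "User:sa-xxxxx"  # the service account created for StarTree
TOPIC = "your-topic"

acls = [
    # Cluster -> Describe -> Allow (librdkafka exposes the cluster ACL resource
    # as BROKER with the fixed name "kafka-cluster").
    AclBinding(ResourceType.BROKER, "kafka-cluster", ResourcePatternType.LITERAL,
               PRINCIPAL, "*", AclOperation.DESCRIBE, AclPermissionType.ALLOW),
    # Consumer group -> prefix "startree-ingestion" -> Read -> Allow
    AclBinding(ResourceType.GROUP, "startree-ingestion", ResourcePatternType.PREFIXED,
               PRINCIPAL, "*", AclOperation.READ, AclPermissionType.ALLOW),
    # Topic -> Read and Describe_configs -> Allow
    AclBinding(ResourceType.TOPIC, TOPIC, ResourcePatternType.LITERAL,
               PRINCIPAL, "*", AclOperation.READ, AclPermissionType.ALLOW),
    AclBinding(ResourceType.TOPIC, TOPIC, ResourcePatternType.LITERAL,
               PRINCIPAL, "*", AclOperation.DESCRIBE_CONFIGS, AclPermissionType.ALLOW),
]

# create_acls() returns one future per binding; result() raises if creation failed.
for binding, future in admin.create_acls(acls).items():
    future.result()
    print(f"created: {binding}")
```

The Confluent CLI and the Confluent Terraform provider offer equivalent, fully supported ways to manage the same ACLs.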