Custom connector

Create a custom connection

  • On the StarTree Data Manager overview page, click Create a dataset. You'll be prompted to complete the following steps:

    1. Select the connection type
    2. Enter dataset details
    3. Perform data modeling
    4. Add indexes and additional configuration
    5. Review and create your dataset

This document covers steps 1-2 above. For information about steps 3-5, see the corresponding sections in the Ingest data doc.

Select the custom connection type

  • Select Custom Connection as the connection type.

Enter dataset details

  1. Under Dataset Details, enter the dataset name and optional description of the dataset.

  2. Under Datasource Details, the GCS data source template is selected by default. Do one of the following:

    • If you do not want to use the GCS template, click Clear Template, and then enter the connection details to connect to your data source.

    • If you’re connecting to a Google Cloud Storage data source and want to use the template, enter the following connection and input configuration details in JSON format:

      | Configure | Option | JSON key | Description |
      |---|---|---|---|
      | Connection configuration details | Custom connection name | name | Provide a name for your custom connection. |
      | | GCP project ID | projectId | Enter your Google project ID. Find your project ID in the Google Cloud console. |
      | | GCP credentials | gcpKey, jsonKey, jsonNodeKey | To obtain a string-encoded service account key, see the Google documentation on how to create a service account key. Include the appropriate format for your selected JSON key. |
      | Input configuration details | Input file format | format | Specify your input file format: csv, json, avro, parquet, or orc. |
      | | | inputUri | Enter the path to the input file(s). |
      | | | includeFilePatternMatch | Optional. Use to fetch data only from files in the inputUri path that match this pattern. |
      | | | excludeFilePatternMatch | Optional. Use to fetch data only from files in the inputUri path that do not match this pattern. |
      | | | recordReaderConfig: {key: ... } | Provide additional details about the source schema. For more information, see supported input formats. |
  3. Click Check Connection & Sample Data to validate the connection and fetch sample data, and then click Next to configure the schema.
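As a rough illustration of step 2, the JSON keys listed there might be combined as in the sketch below. Everything here is an assumption made for illustration: the flat layout, the choice of jsonKey among the three credential keys, and every value (the connection name, project ID, bucket path, and the glob-style pattern syntax) are placeholders rather than values taken from the template. Treat the template shown in the Data Manager UI as the authority on the actual structure.

```json
{
  "name": "my-gcs-connection",
  "projectId": "my-gcp-project",
  "jsonKey": "<string-encoded service account key>",
  "format": "csv",
  "inputUri": "gs://my-bucket/path/to/data/",
  "includeFilePatternMatch": "glob:**/*.csv"
}
```

The recordReaderConfig key is left out of this sketch because its contents depend on the input format you select.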

Update dataset configuration

You can configure the type of Pinot table and properties such as replication factor, retention period, primary keys, dataset type, and tenants. A sample configuration is shown below:

    {
        "replication": 1,
        "retentionInDays": 180,
        "primaryKey": [],
        "primaryTimeColumn": "timestamp",
        "datasetType": "standard",
        "tenants": {
            "brokerTenant": "DefaultTenant",
            "serverTenant": "DefaultTenant"
        }
    }

Configure ingestion

Set the Mode to append or sync, and then specify the ingestion schedule as a cron expression. The following sample configuration appends ingested data at the 20th minute of every hour, every day:

    {
        "mode": "append",
        "schedule": "0 20 * ? * * *"
    }
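For comparison, here is a sketch of a sync configuration that runs once a day at 02:00. It assumes the schedule uses the seven-field, Quartz-style layout that the sample above appears to follow (seconds, minutes, hours, day of month, month, day of week, year). The 02:00 run time is an arbitrary choice for illustration.

```json
{
  "mode": "sync",
  "schedule": "0 0 2 ? * * *"
}
```

Reading the fields left to right: second 0, minute 0, hour 2, no specific day of month (?), every month, every day of the week, every year.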

Review and verify the dataset

  1. Review and verify the Pinot schema and table configuration, and make edits as needed.
  2. When ready, click Create Dataset.