Ingest data
Complete the following steps to ingest data from any supported source:
- Create a connection to the source
- Perform data modeling
- Add indexes and perform additional configuration
- Review and submit
Create a connection to the source
The data source and connection are defined separately, so you can reuse connection configurations.
- Select the type of data source from the data source catalog, for example S3 or Confluent Cloud.
- Drill down to select the exact data source from that connection, for example, the directory in S3 or topic in Confluent Cloud. The data source is mapped to a specific Pinot table.
- Click Next.
- Enter configuration information specific to the data source you selected; an illustrative example of typical streaming settings follows this list. For configuration details, see the documentation for your selected connector:
- Apache Kafka
- AWS Kinesis
- AWS S3
- Confluent Cloud
- Delta Lake
- Google BigQuery
- Snowflake
- File Upload
- Custom connection
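The exact fields vary by connector, and the UI collects them for you rather than requiring raw configuration. As a rough, illustrative sketch of what a streaming connection amounts to, the following shows typical Apache Pinot stream settings for a Kafka source; the topic name and broker list here are placeholders, not values from this guide:

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "example_topic",
    "stream.kafka.broker.list": "your-broker-1:9092,your-broker-2:9092",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
  }
}
```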
Perform data modeling
Update the schema by adding new fields, removing fields that come from the source, or changing fields that already exist in the source.
- Do the following as needed to model the data:
- Delete a column: Click the Delete button at the end of the row for the column you want to delete.
- Edit a column: To change the column's field type or data type, or to specify whether the column is multi-value, find the column to update, and then click the Edit button at the end of the row. Make changes as needed, including transformations. For information about transformation functions, see supported transformations in Apache Pinot.
- Add a new column: This opens a modal window similar to the edit column window, where you can define the new column and use transformation functions to provide the logic that fills its values.
- Alternatively, provide the schema in JSON format and preview the changes to the data model (see the example sketch after this list).
- Click Next.
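If you choose to provide the schema as JSON, it follows the standard Apache Pinot schema format. The following is a minimal, illustrative sketch with placeholder table and column names; note that `"singleValueField": false` is how a multi-value column is declared:

```json
{
  "schemaName": "orders",
  "dimensionFieldSpecs": [
    { "name": "order_id", "dataType": "STRING" },
    { "name": "country", "dataType": "STRING" },
    { "name": "tags", "dataType": "STRING", "singleValueField": false }
  ],
  "metricFieldSpecs": [
    { "name": "amount", "dataType": "DOUBLE" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "order_ts", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }
  ]
}
```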
Add indexes and perform additional configuration
- Do the following as needed:
- Select from the available index types for one or more fields in the schema. Configure the star-tree index as needed, and apply advanced configurations (such as upsert, data retention, and batch schedules).
- Alternatively, provide the configuration as JSON to apply all indexes and other settings at once (see the sketch after this list).
- Improve query performance by adding indexes to the appropriate columns and choosing an encoding type for each column.
- Configure table-specific details such as tenants, scheduling, data retention, and a primary key for upsert.
- Click Next.
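These index and configuration choices ultimately map onto the Apache Pinot table configuration. The snippet below is an illustrative sketch rather than a complete table config: the column names reuse the placeholder schema above, and the retention, tenant, and star-tree values are examples only. Note that upsert also requires a primary key declared in the schema (`primaryKeyColumns`):

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["country"],
    "rangeIndexColumns": ["amount"],
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["country"],
        "functionColumnPairs": ["SUM__amount"],
        "maxLeafRecords": 10000
      }
    ]
  },
  "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant" },
  "segmentsConfig": { "retentionTimeUnit": "DAYS", "retentionTimeValue": "30" },
  "upsertConfig": { "mode": "FULL" }
}
```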
Review and submit
Preview and create a Pinot table for your dataset.
- Review the schema and table configuration details and preview data.
- When ready, click Create Dataset.