Recipes
Recipes help you learn how to solve common problems with Apache Pinot. To use recipes, do the following:
- Download recipes
- Browse recipes by category to find where you'd like to start
Download recipes
To download Apache Pinot recipes, do one of the following:
- Clone the pinot-recipes repository using SSH
- Clone the pinot-recipes repository using HTTPS
- If you don't have a Git client or don't want to clone the repository, download a zip file that contains the recipes.
Clone the pinot-recipes repository using SSH
To clone the pinot-recipes repository using SSH, do the following:
- Set up SSH keys associated with your GitHub account on your local machine. For more information, see Connecting to GitHub with SSH.
- Run the following command:
git clone git@github.com:startreedata/pinot-recipes.git
Clone the pinot-recipes repository using HTTPS
To clone the pinot-recipes repository using HTTPS, do the following:
- Run the following command:
git clone https://github.com/startreedata/pinot-recipes.git
- At the prompt, enter your GitHub username and a personal access token. If you forgot your GitHub username, see Remembering your GitHub username or email. If you need a personal access token, see Creating a personal access token.
Batch ingestion
To learn about batch ingestion in Pinot, see the following recipes:
- Importing CSV files with columns containing spaces
- Import data files from different directories
- Ingest CSV files from an S3 bucket
- Ingest JSON files
- Ingest Parquet files from an S3 bucket into Pinot using Spark
- Backfill offline segment
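The recipes above all build on Pinot's batch ingestion job spec. As a rough sketch of what one looks like, here is a minimal spec for loading CSV files from a local directory; the input/output paths and the table name `events` are placeholders you would replace for your own setup:

```yaml
# Minimal batch ingestion job spec (hypothetical paths and table name)
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data/input'              # directory containing the CSV files
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/data/segments'          # where generated segments are written
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
tableSpec:
  tableName: 'events'                   # hypothetical table name
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

Swapping the `pinotFSSpecs` and `recordReaderSpec` entries is what the S3, JSON, and Parquet recipes above change.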
Streaming ingestion
To learn about stream ingestion in Pinot, see the following recipes:
- Ingest simple JSON data from Kafka
- Ingest data from Kafka configured with SASL authentication
- Ingest data from Kafka configured with SSL and SASL authentication
- Ingest GitHub API events using Kinesis
- Ingest data from Pulsar
- Configuring segment threshold
- Ingest Avro messages with Confluent Schema registry
- Ingest CDC data using PostgreSQL, Kafka, and Debezium
- Ingest CDC data using DynamoDB and Kinesis
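All of the streaming recipes center on the `streamConfigs` block of a real-time table config. As a minimal sketch for plain JSON messages from Kafka (the table name, topic, and broker address are placeholders):

```json
{
  "tableName": "events",
  "tableType": "REALTIME",
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "events",
      "stream.kafka.broker.list": "localhost:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "100000"
    }
  }
}
```

The SASL/SSL, Kinesis, Pulsar, and Avro recipes vary the consumer, decoder, and security properties in this block, while the segment threshold recipe tunes the `realtime.segment.flush.*` settings.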
Transformation functions
To learn about transforming data with functions in Pinot, see the following recipes:
- Groovy transformation functions
- JSON transformation functions
- Chaining transformation functions
- Filtering functions
- DateTime strings to timestamps
- Combine source fields
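Transformation and filtering functions are declared in the table config's `ingestionConfig`. A small sketch combining a Groovy transform, a date-time conversion, and a filter (all column names here are hypothetical):

```json
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "fullName",
      "transformFunction": "Groovy({firstName + ' ' + lastName}, firstName, lastName)"
    },
    {
      "columnName": "tsMillis",
      "transformFunction": "FromDateTime(dateString, 'yyyy-MM-dd HH:mm:ss')"
    }
  ],
  "filterConfig": {
    "filterFunction": "Groovy({tsMillis < 0}, tsMillis)"
  }
}
```

Chaining works by referencing the output column of one transform as an input to another, as the chaining recipe shows.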
Deep storage
To learn about deep storage for Pinot, see the following recipes:
Upserts
To learn about upserts in Pinot, see the following recipes:
Real-time to offline job
To learn about a real-time (streaming) to offline job (batch job) in Pinot, see the following recipes:
- Manually scheduling real-time to offline job
- Automatically scheduling real-time to offline job
- Upserts and the real-time to offline job
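The real-time to offline job is configured as a task on the real-time table. A minimal sketch (the time periods shown are illustrative values): the controller's periodic task scheduler picks this up for automatic runs, while the manual recipe triggers the same task on demand.

```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "2d"
    }
  }
}
```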
JSON documents
To learn about handling JSON data in Pinot, see the following recipes:
- Unnest arrays in JSON documents
- Rename fields when unnesting arrays in JSON documents
- Flattening nested objects
- Index JSON columns
- Update JSON index
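The unnesting and indexing recipes above map to two parts of the table config: `complexTypeConfig` for flattening/unnesting during ingestion, and `jsonIndexColumns` for indexing JSON columns. A sketch, with `payload` and `items` as hypothetical column and field names:

```json
{
  "tableIndexConfig": {
    "jsonIndexColumns": ["payload"]
  },
  "ingestionConfig": {
    "complexTypeConfig": {
      "fieldsToUnnest": ["items"],
      "delimiter": "."
    }
  }
}
```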
Geospatial
To learn about geospatial data in Pinot, see the following recipes:
Merge and roll up
To learn about merge and roll-up in Pinot, see the following recipes:
Clickstream Analytics Dashboard App
This guide walks you through creating a real-time clickstream analytics dashboard using StarTree Cloud Free Tier (Apache Pinot) and Streamlit.
Other
See other recipes available: