How to get the time boundary for a hybrid table
Hybrid tables consist of real-time and offline tables with the same name. When querying these tables, the Pinot Broker uses the time boundary to work out which records to read from the offline table and which to read from the real-time table.
| Pinot Version | 0.9.3 |
| Code | startreedata/pinot-recipes/time-boundary-hybrid-table |
For background reading on time boundaries, see Concepts: Time Boundary
Prerequisites
You will need to install Docker locally to follow the code examples in this guide.
Download Recipe
First, clone the GitHub repository to your local machine and navigate to this recipe:
git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/time-boundary-hybrid-table
If you don't have a Git client, you can also download a zip file that contains the code and then navigate to the recipe.
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
docker-compose up
This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka, and Zookeeper. You can find the docker-compose.yml file on GitHub.
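Once the containers are running, you can check that the cluster is ready by calling the Pinot Controller's health endpoint. This assumes the Controller's port 9000 is mapped to localhost, which you can confirm in the docker-compose.yml file:

curl http://localhost:9000/health
# Should print OK once the Controller is up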
Pinot Schema and Tables
Now let's create a Pinot schema, as well as real-time and offline tables that share it. The two tables have the same name, and we'll ingest data into each of them separately, so both need their own table config.
Schema
Our schema is going to capture some simple events, and looks like this:
{
"schemaName": "events",
"dimensionFieldSpecs": [
{
"name": "uuid",
"dataType": "STRING"
}
],
"metricFieldSpecs": [
{
"name": "count",
"dataType": "INT"
}
],
"dateTimeFieldSpecs": [{
"name": "ts",
"dataType": "TIMESTAMP",
"format" : "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}]
}
You can create the schema by running the following command:
docker exec -it pinot-controller-rt bin/pinot-admin.sh AddSchema \
-schemaFile /config/schema.json \
-exec
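To confirm that the schema was created, you can call the Controller's REST API (again assuming that port 9000 is mapped to localhost):

curl "http://localhost:9000/schemas/events" \
  -H "accept: application/json" 2>/dev/null | jq '.'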
Offline Table
The offline table config is defined below:
{
"tableName": "events",
"tableType": "OFFLINE",
"segmentsConfig": {
"timeColumnName": "ts",
"schemaName": "events",
"replication": "1",
"replicasPerPartition": "1"
},
"ingestionConfig": {
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "HOURLY"
}
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"tenants": {},
"metadata": {}
}
You can create the table by running the following command:
docker exec -it pinot-controller-rt bin/pinot-admin.sh AddTable \
-tableConfigFile /config/table-offline.json \
-exec
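You can confirm that the table exists by asking the Controller for its config:

curl "http://localhost:9000/tables/events" \
  -H "accept: application/json" 2>/dev/null | jq '.'

Once we've added the real-time table below, this endpoint will return both the OFFLINE and REALTIME configs.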
Real-Time Table
And the real-time table is defined below:
{
"tableName": "events",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "ts",
"schemaName": "events",
"replication": "1",
"replicasPerPartition": "1",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "1"
},
"tableIndexConfig": {
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "events",
"stream.kafka.broker.list": "kafka-rt:9093",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
}
},
"tenants": {},
"metadata": {}
}
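You can create this table with the same AddTable command. The file name below is an assumption that mirrors the offline example, so check the recipe's config directory for the exact name:

docker exec -it pinot-controller-rt bin/pinot-admin.sh AddTable \
  -tableConfigFile /config/table-realtime.json \
  -exec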
Data Ingestion
Let's import some data into our tables.
Offline Table
We'll start by ingesting the following file of JSON documents:
{"ts": "1646993042165", "uuid": "5befc828ccf8457889c7104f1b0c9143", "count": 23}
{"ts": "1646993042172", "uuid": "56c76d09c9c54e12a15ff42500e153e9", "count": 124}
{"ts": "1646993042178", "uuid": "58fb1b4390f14e748c13056f02900c87", "count": 200}
{"ts": "1646993042182", "uuid": "f15ab0daf0344a5497ff1743fd38cc1d", "count": 638}
{"ts": "1646993042183", "uuid": "b561e1dde0dc4b3fba294198937b71f7", "count": 177}
{"ts": "1646996614169", "uuid": "5befc828ccf8457889c7104f1b0c9143", "count": 724}
{"ts": "1646996614173", "uuid": "56c76d09c9c54e12a15ff42500e153e9", "count": 623}
{"ts": "1646996614177", "uuid": "58fb1b4390f14e748c13056f02900c87", "count": 313}
{"ts": "1646996614181", "uuid": "f15ab0daf0344a5497ff1743fd38cc1d", "count": 836}
{"ts": "1646996614184", "uuid": "b561e1dde0dc4b3fba294198937b71f7", "count": 175}
All ten records have timestamps from 11th March 2022: the first five from just after 10:04am (UTC), and the next five from just after 11:03am.
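If you want to double-check those dates, GNU date can convert the epoch timestamps for us (dropping the milliseconds first):

date -u -d @1646993042 +"%Y-%m-%dT%H:%M:%SZ" # first batch  -> 2022-03-11T10:04:02Z
date -u -d @1646996614 +"%Y-%m-%dT%H:%M:%SZ" # second batch -> 2022-03-11T11:03:34Z

Now run the following ingestion job to load these records into the offline table: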
docker exec -it pinot-controller-rt bin/pinot-admin.sh LaunchDataIngestionJob \
-jobSpecFile /config/job-spec.yml
You can find the ingestion job spec in the pinot-recipes/time-boundary-hybrid-table recipe's GitHub repository.
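Once the job completes, you can sanity-check the import by counting the records in the offline table via the Broker's query endpoint (assuming the Broker's port 8099 is mapped to localhost, as in the time boundary call later in this guide):

curl -X POST "http://localhost:8099/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) FROM events_OFFLINE"}' 2>/dev/null | jq '.resultTable'
# The count should be 10, one per document in the file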
Real-Time Table
Now let's ingest some data into the real-time table via the events Kafka topic:
while true; do
  ts=$(date +%s%N | cut -b1-13)                           # current epoch time in milliseconds
  uuid=$(cat /proc/sys/kernel/random/uuid | sed 's/-//g')  # random ID with the dashes removed
  count=$((RANDOM % 1000))                                 # random count between 0 and 999
  echo "{\"ts\": \"${ts}\", \"uuid\": \"${uuid}\", \"count\": ${count}}"
done |
docker exec -i kafka-rt /opt/kafka/bin/kafka-console-producer.sh \
--bootstrap-server localhost:9092 \
--topic events
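If you want to check that events are actually reaching Kafka, you can run a console consumer from another terminal, using the same kafka-rt container as the producer command above:

docker exec -it kafka-rt /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic events \
  --from-beginning \
  --max-messages 5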
Compute Time Boundary
Now we can retrieve the time boundary by calling the HTTP API on the Pinot Broker, as shown below:
tableName="events"
curl "http://localhost:8099/debug/timeBoundary/${tableName}" \
-H "accept: application/json" 2>/dev/null | jq '.'
{
"timeColumn": "ts",
"timeValue": "1646993014184"
}
Let's convert this timestamp to a more friendly DateTime string using the following Python script:
import datetime
ts = 1646993014184  # the timeValue returned by the Broker
datetime.datetime.utcfromtimestamp(ts / 1000.0).isoformat()
'2022-03-11T10:03:34.184000'
So any records with a timestamp of 10:03:34 on 11th March 2022 (UTC) or earlier will be served by the offline table. Anything after that time will be served by the real-time table.
And finally, let's understand how this time boundary was determined.
The time boundary is computed based on the value of ingestionConfig.batchIngestionConfig.segmentIngestionFrequency in the offline table config, which in our case was:
"ingestionConfig": {
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "HOURLY"
}
}
Because our ingestion frequency is HOURLY, the time boundary is set to one hour before the maximum end time across the offline table's segments. For any other ingestion frequency, such as DAILY, the offset is one day instead.
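We can check this against the data that we imported. The maximum value of ts in the offline table is 1646996614184 (the last record in our JSON file), and subtracting one hour gives exactly the timeValue that the Broker returned:

echo $((1646996614184 - 60 * 60 * 1000))
# 1646993014184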
For more on time boundaries, including how they're computed and used by the Pinot Broker, see Concepts: Time Boundary