Configuring the segment threshold

In this guide we'll learn how to configure the segment threshold for real-time tables.

What is the segment threshold?

The segment threshold determines when a segment is committed in real-time tables.

When data is first ingested from a streaming provider like Kafka, it gets stored in a consuming segment. The data in the consuming segment is stored on the disk of the server(s) that are processing a particular partition from the streaming provider.

However, it's not until a segment is committed that the segment is written to the deep store. The segment threshold decides when that should happen.

Why should we care about it?

We care about the segment threshold because we want to make sure that our segments are a reasonable size.

  • Queries are processed at the segment level, so if segments are too small, query latencies may increase because of the per-segment overhead (threads spawned, metadata processing, and so on).
  • If segments are too big, servers may run out of memory. And if a server is restarted, the consuming segment has to start consuming from its first row again, which causes lag between Pinot and the streaming provider.

Configuration parameters

The segment threshold is configured using the following parameters:

Property                                      | Description
realtime.segment.flush.threshold.rows         | Row count flush threshold.
realtime.segment.flush.threshold.time         | Maximum time a consuming segment is kept open before it is flushed.
realtime.segment.flush.threshold.segment.size | The desired size of a completed segment.

There are two main approaches for configuring these parameters:

Desired row threshold

We can define a manual row threshold by specifying a value for realtime.segment.flush.threshold.rows.

Pinot will complete/flush segments as soon as the consuming segment contains the specified number of rows. This will generally result in each segment having the same number of rows.

However, if the time threshold defined by realtime.segment.flush.threshold.time is reached, a segment will be completed even if the row count flush threshold has not yet been reached.

tip

If realtime.segment.flush.threshold.rows is set to a value greater than 0, realtime.segment.flush.threshold.segment.size is ignored.
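
For example, a row-based threshold could be configured with a streamConfigs excerpt like the one below. The 100,000-row and 6h values are purely illustrative (pick values that suit your ingestion rate), and the Kafka connection settings are omitted:

"streamConfigs": {
  "realtime.segment.flush.threshold.rows": "100000",
  "realtime.segment.flush.threshold.time": "6h"
}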

Desired segment size

Alternatively we can set realtime.segment.flush.threshold.rows to 0, in which case Pinot will instead attempt to make sure that every segment has the desired size defined by realtime.segment.flush.threshold.segment.size.

When configuring the segment threshold this way, the minimum number of rows in a segment is 10,000.

The first segment for a new partition will have 100,000 rows. For subsequent segments Pinot will slowly adjust the number of rows to get closer to the desired segment size. This means that the first few segments might differ in size, but over time the segment size will approach the desired size.

info

The algorithm used in this approach is described in more detail in the Auto-tuning Pinot real-time consumption blog post.
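
In streamConfigs, the size-based approach looks like the excerpt below. This is exactly what the worked example later in this guide uses: a row threshold of 0, a 1 hour time threshold, and a 5M desired segment size.

"streamConfigs": {
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.time": "1h",
  "realtime.segment.flush.threshold.segment.size": "5M"
}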

A worked example

Let's see how this works with a worked example.

Download Recipe

First, clone the GitHub repository to your local machine and navigate to this recipe:

git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/configuring-segment-threshold

If you don't have a Git client, you can also download a zip file that contains the code and then navigate to the recipe.

Launch Pinot Cluster

You can spin up a Pinot Cluster by running the following command:

docker-compose up

This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, and Zookeeper. You can find the docker-compose.yml file on GitHub.

Pinot Schema and Table

Let's create a Pinot Schema and Table.

The schema is defined below:

config/schema.json
{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    {
      "name": "uuid",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "count",
      "dataType": "INT"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "ts",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}

And the table config below:

config/table.json
{
  "tableName": "events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "schemaName": "events",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "events",
      "stream.kafka.broker.list": "kafka-segment:9093",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "1h",
      "realtime.segment.flush.threshold.segment.size": "5M"
    }
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    }
  },
  "tenants": {},
  "metadata": {}
}

Pinot will try to ensure that each completed segment is about 5 MB in size. It will complete a segment after 1 hour even if it hasn't yet reached that size.

Create the table and schema by running the following command:

docker exec -it pinot-controller-segment bin/pinot-admin.sh AddTable   \
-tableConfigFile /config/table.json \
-schemaFile /config/schema.json \
-exec
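
If you want to confirm that the table was created, you can optionally ask the Pinot Controller to list its tables:

curl -X GET "http://localhost:9000/tables" \
  -H "accept: application/json"

You should see events in the returned list.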

Ingesting Data

Next, we're going to ingest some data into Kafka:

while true; do
  # current timestamp in epoch milliseconds
  ts=`date +%s%N | cut -b1-13`;
  # random UUID with the dashes stripped out
  uuid=`cat /proc/sys/kernel/random/uuid | sed 's/[-]//g'`
  # random count between 0 and 999
  count=$(( RANDOM % 1000 ))
  echo "{\"ts\": \"${ts}\", \"uuid\": \"${uuid}\", \"count\": $count}"
done | docker exec -i kafka-segment /opt/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic events
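
If you want to check that events are reaching Kafka before Pinot consumes them, you can optionally read a few messages back from the topic in another terminal, using the console consumer that ships in the kafka-segment container:

docker exec -i kafka-segment /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic events \
  --from-beginning \
  --max-messages 5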

Inspecting segment sizes

After the ingestion script has run for a while we can inspect the size of the segments that have been completed.

You can return the size of each completed segment by running the following command:

curl -X GET "http://localhost:9000/tables/events/size?detailed=true" \
-H "accept: application/json" 2>/dev/null |
jq -r '.realtimeSegments .segments | .[] | .serverInfo | .[]' |
jq -sc 'sort_by(.segmentName) | .[]'
Output
{"segmentName":"events__0__0__20220301T1457Z","diskSizeInBytes":5144515}
{"segmentName":"events__0__1__20220301T1504Z","diskSizeInBytes":5242744}
{"segmentName":"events__0__2__20220301T1511Z","diskSizeInBytes":5242744}
{"segmentName":"events__0__3__20220301T1518Z","diskSizeInBytes":5242744}
{"segmentName":"events__0__4__20220301T1525Z","diskSizeInBytes":5242744}
{"segmentName":"events__0__5__20220301T1532Z","diskSizeInBytes":5242744}
{"segmentName":"events__0__6__20220301T1539Z","diskSizeInBytes":5242797}
{"segmentName":"events__0__7__20220301T1546Z","diskSizeInBytes":5242797}

We can see that all the segments have a size of just over 5 million bytes, which is close to 5 MB. The first segment came in at about 4.9 MB, the next five at 4.9998 MB, and the last two at 4.9999 MB. Presumably adding an extra row would take us further away from 5 MB.
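
If you want to do that conversion yourself, a quick one-liner (assuming awk is available, and treating 1 MB as 1,048,576 bytes, which is how the sizes above line up) turns the reported byte counts into megabytes:

awk 'BEGIN { printf "%.5f MB\n", 5242744 / 1048576 }'
Output
4.99987 MB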

Now let's see how many rows are contained in each segment. You can do that for one of the segments by running the following command:

curl -X GET "http://localhost:9000/segments/events/events__0__1__20220301T1504Z/metadata" \
-H "accept: application/json" 2>/dev/null
Output
{
"segment.crc": "1090576205",
"segment.creation.time": "1646147045318",
"segment.end.time": "1646147463422",
"segment.flush.threshold.size": "101912",
"segment.index.version": "v3",
"segment.name": "events__0__1__20220301T1504Z",
"segment.realtime.download.url": "http://192.168.0.3:9000/segments/events/events__0__1__20220301T1504Z",
"segment.realtime.endOffset": "201912",
"segment.realtime.numReplicas": "1",
"segment.realtime.startOffset": "100000",
"segment.realtime.status": "DONE",
"segment.start.time": "1646147042226",
"segment.table.name": "events",
"segment.time.unit": "MILLISECONDS",
"segment.total.docs": "101912",
"segment.type": "REALTIME"
}

The segment.flush.threshold.size property indicates that this segment contains 101,912 rows.
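
As a quick cross-check, the row count matches the Kafka offset range that this segment consumed (segment.realtime.endOffset minus segment.realtime.startOffset), since in this recipe each Kafka message produces exactly one row:

echo $(( 201912 - 100000 ))
Output
101912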

We can check how many rows are stored in all segments by running the following script:

segments=`curl -X GET "http://localhost:9000/tables/events/size?detailed=true" \
  -H "accept: application/json" 2>/dev/null |
  jq -r '.realtimeSegments .segments | .[] | .serverInfo | .[]' |
  jq -sc 'sort_by(.segmentName) | .[]'`

for segment in ${segments}; do
  segmentName=`echo $segment | jq -r '.segmentName'`
  rows=`curl -X GET "http://localhost:9000/segments/events/${segmentName}/metadata" \
    -H "accept: application/json" 2>/dev/null |
    jq '."segment.flush.threshold.size" | tonumber'`
  echo $segment | jq --arg rows $rows -c '. + {"rows": $rows}'
done
Output
{"segmentName":"events__0__0__20220301T1457Z","diskSizeInBytes":5144515,"rows":"100000"}
{"segmentName":"events__0__1__20220301T1504Z","diskSizeInBytes":5242744,"rows":"101912"}
{"segmentName":"events__0__2__20220301T1511Z","diskSizeInBytes":5242744,"rows":"101912"}
{"segmentName":"events__0__3__20220301T1518Z","diskSizeInBytes":5242744,"rows":"101912"}
{"segmentName":"events__0__4__20220301T1525Z","diskSizeInBytes":5242744,"rows":"101912"}
{"segmentName":"events__0__5__20220301T1532Z","diskSizeInBytes":5242744,"rows":"101912"}
{"segmentName":"events__0__6__20220301T1539Z","diskSizeInBytes":5242797,"rows":"101913"}
{"segmentName":"events__0__7__20220301T1546Z","diskSizeInBytes":5242797,"rows":"101913"}

It looks like Pinot has settled on roughly 101,912 to 101,913 rows per segment. The row count would drop if we ingested much larger strings into the uuid column, but for this data it looks fairly stable.
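
As a rough back-of-envelope check (the real sizing algorithm, described in the blog post linked earlier, is more sophisticated than this), scaling the first segment's 100,000 rows by the ratio of the 5 MB target to the 5,144,515 bytes that segment actually occupied lands very close to the value Pinot converged on:

echo $(( 100000 * 5 * 1024 * 1024 / 5144515 ))
Output
101911

That's within a couple of rows of the 101,912 to 101,913 rows per segment that the later segments settled on.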