StarTree Cloud Coding Competition With Grand Prizes

Release Note: The Prometheus Message Decoder feature is available in StarTree Platform releases after 0.9.0.

Prometheus Message Decoder

Prometheus is a widely-used open-source monitoring and alerting toolkit that collects and stores time series data from various sources.

The Prometheus Message Decoder in Pinot provides native support for ingesting Prometheus-formatted metrics data into pinot tables.

Prometheus Message Format

<metric_name>{<label1>=<value1>,<label2>=<value2>,...} <metric_value> <timestamp>

This format encapsulates the four main components of our schema:

<metric_name> corresponds to the metric field (STRING)
{<label1>=<value1>,<label2>=<value2>,...} represents the labels field (JSON)
<metric_value> maps to the value field (DOUBLE)
<timestamp> corresponds to the ts field (TIMESTAMP)

Table schema

Column	Type	Field type	Description
metric	STRING	Dimension	The name of the Prometheus metric
labels	JSON	Dimension	A JSON object containing key-value pairs of Prometheus labels
value	DOUBLE	Metric	The numeric value of the metric
ts	TIMESTAMP	Date-Time	The timestamp when the metric was recorded, in milliseconds since the Unix epoch

The schema created for ingesting Prometheus metrics MUST adhere to the format specified above, particularly with respect to the field names and data types.
If you require different field names in your Pinot table, you can add transform functions to alias the names.
- However, the incoming data must match this specified schema for the Prometheus Message Decoder to function correctly.

Schema Configuration

{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    {
      "name": "metric",
      "dataType": "STRING"
    },
    {
      "name": "labels",
      "dataType": "JSON"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "value",
      "dataType": "DOUBLE"
    }
  ],
"dateTimeFieldSpecs": [{
    "name": "ts",
    "dataType": "TIMESTAMP",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}

Sample Table Stream Configuration

When ingesting a Prometheus Message formatted payload from a stream, the decoder used for the stream must be ai.startree.pinot.plugin.inputformat.prometheus.PrometheusMessageDecoder

{
  "tableName": "events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "schemaName": "events",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "events",
      "stream.kafka.decoder.class.name": "ai.startree.pinot.plugin.inputformat.prometheus.PrometheusMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9092",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.segment.size": "50M",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  }
}

In the above sample, the Kafka consumer factory used is org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory and the decoder associated with this stream is ai.startree.pinot.plugin.inputformat.prometheus.PrometheusMessageDecoder".

Dynamodb Message Decoder Query data