Release Version 0.5.0: Jun-Aug 2022

Apache Pinot

Enhanced null value support

Users can now push segments to a real-time table, thus simplifying onboarding when ingesting from a hybrid source (realtime+offline). This makes it very easy to bootstrap or backfill a real-time table. For more details, see the GitHub issue.

Deduplication support in real-time Pinot table

Added proper null value support in all Pinot layers for raw and dictionary-encoded single-value columns, for all data types. For more information, see the GitHub issue and documentation.

Upsert Improvements

Added the following enhancements:

  • Prevent data corruption when messages arrive out of order. See PR #9159.
  • Improve performance when offloading segments. See PR #8674 and PR #9132.
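
As a rough illustration of why out-of-order messages matter for upserts, the sketch below keeps, for each primary key, only the record with the greatest comparison value, so a late-arriving older message cannot overwrite newer data. This is a simplified model and not Pinot's implementation; the record shape (key, comparison value, payload) and the function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (primary key, comparison value, payload); a hypothetical record shape.
Record = Tuple[str, int, str]


def apply_upserts(records: Iterable[Record]) -> Dict[str, Record]:
    """Keep the latest record per primary key, ignoring out-of-order arrivals."""
    latest: Dict[str, Record] = {}
    for record in records:
        key, ts, _ = record
        current = latest.get(key)
        # Replace only when the incoming record is at least as new as the stored one.
        if current is None or ts >= current[1]:
            latest[key] = record
    return latest


# The third message for key "a" arrives late (ts=20 after ts=30) and is ignored.
stream = [("a", 10, "v1"), ("a", 30, "v3"), ("a", 20, "v2"), ("b", 5, "w1")]
state = apply_upserts(stream)
```

Without the comparison check, the late message would silently replace the newer value for key "a".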

Multi-stage query engine enhancements

Made further progress in implementing full JOIN support as well as supporting existing query constructs with multi-stage engine, including:

  • Distributed shuffle joins
  • Group by
  • Sub-query
  • Complex queries using a mix of the above

See GitHub issue #8260

For more details, see the V2 Multi-Stage Query Engine documentation.

tip

  • Requires separate configuration to enable and is in beta release.
  • Feature support is limited to the constructs listed above.
  • No concurrent multi-stage query support.

API for segment reload status

Added a new controller API for checking the status of a segment reload operation. More details in this PR and docs.

Helix version upgrade to 1.0.4

Upgraded Helix version to 1.0.4 in Pinot.

Query cancellation support

Added support for cancelling a submitted query by its ID. This is useful for cancelling long-running queries to conserve resources on the Pinot cluster. It is controlled by the following configs:

  • pinot.server.enable.query.cancellation, false by default
  • pinot.broker.enable.query.cancellation, false by default

More details in this PR or read about it in the docs.

Declarative support for query options

Added support for declaratively specifying query options in the SQL query string. Example:

SET skipUpsert = 'false';
SELECT count(*) FROM tbl

More details in this PR and docs.
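
For clients that build queries programmatically, the same pattern can be sketched as a small helper that prefixes a query with one SET statement per option. This is a minimal sketch that follows the syntax of the example above; the function name is hypothetical, and values are inserted without escaping, so it is only suitable for trusted option values.

```python
from typing import Mapping


def with_query_options(sql: str, options: Mapping[str, str]) -> str:
    """Prefix a SQL query with one SET statement per option."""
    # Values are quoted as in the example above; no escaping is performed.
    statements = [f"SET {name} = '{value}';" for name, value in options.items()]
    return "\n".join(statements + [sql])


query = with_query_options("SELECT count(*) FROM tbl", {"skipUpsert": "false"})
print(query)
```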

Pause support for realtime tables

Added support to pause / resume ingestion for a real-time Pinot table.

More details in this PR.

Bloom filter for no dictionary columns

Added support for configuring a bloom filter on no-dictionary single-value columns in the Pinot table config.
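
As a hedged sketch of what such a table config fragment might look like, the snippet below lists the same column under both `noDictionaryColumns` and `bloomFilterColumns` in `tableIndexConfig`. The column name is hypothetical, and the fragment omits every other table config field; check the table config reference for your Pinot version before relying on it.

```python
import json

column = "memberId"  # hypothetical column name

# Partial table config: only the indexing section relevant to this feature.
table_index_config = {
    "tableIndexConfig": {
        "noDictionaryColumns": [column],
        "bloomFilterColumns": [column],
    }
}

fragment = json.dumps(table_index_config, indent=2)
print(fragment)
```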

Protobuf decoder

Added Protobuf decoder for realtime ingestion with file or schema registry based descriptor support.

More details in this wiki.

Kafka ingestion from specific period/timestamp

Added the ability to begin consumption from a specific period (for example, 2d or 12h ago) or from a timestamp string (in the format 2022-08-09T12:31:38.222Z).

More details in PR and docs.
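
To make the two forms concrete, the sketch below resolves either a period or a timestamp string into the instant from which consumption would begin. It only illustrates the semantics and is not Pinot's parser; the function name is hypothetical, and only the `d` and `h` units shown above are handled.

```python
import re
from datetime import datetime, timedelta, timezone

_PERIOD = re.compile(r"^(\d+)([dh])$")
_UNITS = {"d": "days", "h": "hours"}


def resolve_start(spec: str, now: datetime) -> datetime:
    """Return the UTC instant from which consumption would begin."""
    match = _PERIOD.match(spec)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # A period such as "2d" means "this long before now".
        return now - timedelta(**{_UNITS[unit]: amount})
    # Otherwise expect a timestamp such as 2022-08-09T12:31:38.222Z.
    parsed = datetime.strptime(spec, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


now = datetime(2022, 8, 10, 0, 0, tzinfo=timezone.utc)
two_days = resolve_start("2d", now)
exact = resolve_start("2022-08-09T12:31:38.222Z", now)
```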

Minion UI in the Pinot Admin UI

Added a Minion tab to the Pinot UI that shows more details about task queues, tasks, and subtasks, along with exceptions.

More details in PR.

Hide Query Console from Pinot UI

Added a cluster config to hide the Query Console from the Pinot UI. The cluster config to disable it can be found here.

StarTree Extensions for Apache Pinot

info

Only available in StarTree Cloud service

Tier storage enhancements

The following enhancements were made to StarTree’s cloud tiered storage feature:

Performance

  1. Async S3 client for supporting higher fetch volume and parallelism

Observability

  1. API to check tier status and metrics

Ease of use

  1. Ability to use same bucket as deepstore with custom path
  2. Ability to use same tenant for local and tier

PubSub correctness and operational improvements

The following enhancements were made to the PubSub connector:

  1. Auto delete old snapshots to reduce cost
  2. Handling message re-delivery for better correctness

StarTree Cloud - includes BYOC (Bring Your Own Cloud) and SaaS

Self serve Portal

Launched a self serve portal for creating a StarTree workspace (BYOC and SaaS) without any involvement from StarTree personnel. You can find the portal here: https://startree.cloud/.

The corresponding documentation can be found here: https://dev.startree.ai/docs/startree-enterprise-edition/startree-cloud/getting-started/

Authentication Service

Created a new unified authentication mechanism across platform and services within a StarTree workspace. This new model makes it safer and easier for all components to implement authentication.

Improved Pinot Token generation workflow

Created a new, simpler workflow for generating tokens required for securely accessing a Pinot cluster. The earlier workflow required a cluster restart (transparent to user).

Support for custom storage profile

Added support for overriding the pre-configured storage spec (based on cluster size) with a custom spec. This is useful to customize Persistent Volume spec for different components like Pinot server, minion, Zookeeper and so on.

Improved monitoring and alerting for StarTree Cloud components

Added new metrics and corresponding alerts for monitoring critical things such as Zookeeper health and I/O profile.

SSL/TLS

Announcing GA for encrypted network access to a StarTree workspace via SSL/TLS.

Dataset Manager: Self Service Ingestion tool

BigQuery connector

Users can now self-serve data ingestion from BigQuery using the Data Manager no-code experience with a few clicks. See documentation.

Dataset ingestion status

Users can now monitor the data ingestion status and view ingestion logs after submitting ingestion jobs. This helps users debug and fix issues as needed.

See documentation.

Kinesis connector

Users can now self-serve data ingestion from AWS Kinesis using the Data Manager no-code experience with a few clicks.

Please see this doc for more details on the connector itself.

Automated segment name generator support for time columns with simple date format

The segment name generator type is automatically updated for time columns in simple date format. Users no longer need to configure this manually.

Support enableSync for offline batch ingestion

Data Manager now supports enableSync for offline batch ingestion, so the data in Pinot can be refreshed whenever needed or on a schedule.

ThirdEye: Anomaly Detection and Root Cause Analysis Tool

StarTree ThirdEye Community Edition

Users can now explore, learn, and evaluate ThirdEye for free before purchasing, using StarTree ThirdEye Community Edition. Useful links:

Alert creation (Derived/transformed metrics support)

Users can now create “derived or transformed metrics” during alert creation. See documentation.

Alert creation (Improvements to Timeout and Timezone support)

During alert creation:

  • Users no longer need to update the time column format and the time column name during alert creation. These are derived automatically from the Pinot configurations, and users can override them if needed.
  • Users of StarTree ThirdEye now have an increased timeout window and can use long-running aggregations during anomaly detection.

Holiday pattern recognition

Users can now identify holiday patterns during anomaly detection to improve detection accuracy. (See Event fetcher and StarTree ThirdEye template)

Slack notification and subscription to Slack channels

Users can now subscribe to Slack channels using the StarTree ThirdEye subscription user interface to send anomaly notifications to those channels. See documentation.

Authentication on StarTree ThirdEye APIs

Users can now use a generated token to get secure access to StarTree ThirdEye APIs. These tokens have an extended life so users can use these tokens for automation such as bulk creation of alerts or ingestion of “Events” to StarTree ThirdEye using external scripts. See documentation.

Auto onboard datasets

Users can now auto onboard all datasets from “Pinot” to ThirdEye with one click during data source creation. See documentation.

Bulk delete anomalies

Users can now bulk delete anomalies from the anomalies list view.

Duplicate and reset alerts

Users can now duplicate and reset alerts in StarTree ThirdEye from the alert detail page or the alerts list view page.

Create alert (No Code)

Users can now create alerts and detect anomalies in a few seconds without writing a single line of code. See documentation.

RCA events

Users can now perform root-cause analysis in ThirdEye using a self-serve events UI, which helps identify the events that may have caused an anomaly. Users can upload any type of event (for example, holidays or Jira events) using the ThirdEye APIs (link).

Examples of events that help with root-cause analysis of an anomalous event in a KPI:

  • Custom events (for example, A/B test-related events)
  • Public events (for example, a holiday, political event, or competition)
  • Internal events (for example, a feature ramp, software deployment, or metric definition change)

Anomaly filtering

Users can now filter anomalies by alert, dataset, metric, and so on in the anomaly list and report views.