Release Version 0.5.0: Jun-Aug 2022
Apache Pinot
Enhanced null value support
Added proper null value support in all Pinot layers for raw and dictionary-encoded single-value columns, for all data types. For more information, see the GitHub issue.
Deduplication support in real-time Pinot table
Users can now push segments to a real-time table, thus simplifying onboarding when ingesting from a hybrid source (real-time and offline). This makes it very easy to bootstrap or backfill a real-time table. For more details, see the GitHub issue.
Upsert Improvements
Added the following enhancements:
- Prevent data corruption when getting out-of-order messages. See PR #9159.
- Improve performance when offloading segments. See PR #8674 and PR #9132.
Multi-stage query engine enhancements
Made further progress in implementing full JOIN support as well as supporting existing query constructs with multi-stage engine, including:
- Distributed shuffle joins
- Group by
- Sub-query
- Complex queries using a mix of the above
See GitHub issue #8260.
For more details, see the V2 Multi-Stage Query Engine documentation.
Notes:
- Requires separate configuration to enable and is in beta release
- Feature support is limited to those listed above
- No concurrent multi-stage query support
API for segment reload status
Added a new controller API for checking the status of a segment reload operation. More details in this PR and the docs.
Helix version upgrade to 1.0.4
Upgraded Helix version to 1.0.4 in Pinot.
Query cancellation support
Added support for cancelling a submitted query by its ID. This is useful for cancelling long-running queries to conserve resources on the Pinot cluster. This is controlled by the following configs:
- pinot.server.enable.query.cancellation, false by default
- pinot.broker.enable.query.cancellation, false by default
More details in this PR, or read about it in the docs.
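As a rough illustration, the two flags above could be rendered as properties lines like this. The key names come from this release note; the helper `render_properties` and the assumption that the server and broker each read their own properties file are illustrative only.

```python
# Config keys copied from the release note; both default to false.
SERVER_KEY = "pinot.server.enable.query.cancellation"
BROKER_KEY = "pinot.broker.enable.query.cancellation"


def render_properties(settings: dict) -> str:
    """Render key/value pairs as Java-style .properties lines (hypothetical helper)."""
    return "\n".join(f"{key}={str(value).lower()}" for key, value in settings.items())


# Assumption: the server and broker are configured separately,
# so each flag is rendered for its own component.
server_conf = render_properties({SERVER_KEY: True})
broker_conf = render_properties({BROKER_KEY: True})

print(server_conf)
print(broker_conf)
```

Both flags are shown enabled here because the feature stays off unless they are set explicitly.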
Declarative support for query options
Added support for declaratively specifying query options in the SQL query string. Example:
SET skipUpsert = 'false';
SELECT count(*) FROM tbl
More details in this PR and the docs.
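A client that builds queries programmatically might prepend the options itself. The following is a minimal sketch that follows the `SET name = 'value';` shape of the example above; the helper name `with_query_options` is hypothetical, and the sketch does no escaping or validation of option names or values.

```python
def with_query_options(sql: str, options: dict) -> str:
    """Prefix a SQL query with one SET statement per query option."""
    # Each option becomes its own line, in the quoted form used in the example above.
    set_lines = [f"SET {name} = '{value}';" for name, value in options.items()]
    return "\n".join(set_lines + [sql])


query = with_query_options("SELECT count(*) FROM tbl", {"skipUpsert": "false"})
print(query)
```

Keeping the options in a dictionary makes it straightforward to apply the same set of options to several queries.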
Pause support for real-time tables
Added support for pausing and resuming ingestion for a real-time Pinot table.
More details in this PR.
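The sketch below only assembles the controller requests one might send for pause and resume; it does not send them. The endpoint paths (`pauseConsumption`, `resumeConsumption`, `pauseStatus`), the controller address, and the table name `myTable` are assumptions that are not stated in this release note, so check them against the linked PR before relying on them.

```python
from urllib.parse import quote

# Assumed controller address; replace with the real one for your cluster.
CONTROLLER = "http://localhost:9000"


def pause_requests(table: str, controller: str = CONTROLLER) -> dict:
    """Return (HTTP method, URL) pairs for pausing, resuming, and checking a table."""
    base = f"{controller}/tables/{quote(table, safe='')}"
    return {
        # Endpoint names below are assumptions, not taken from this document.
        "pause": ("POST", f"{base}/pauseConsumption"),
        "resume": ("POST", f"{base}/resumeConsumption"),
        "status": ("GET", f"{base}/pauseStatus"),
    }


requests_for_table = pause_requests("myTable")  # "myTable" is a hypothetical table name
for action, (method, url) in requests_for_table.items():
    print(action, method, url)
```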
Bloom filter for no dictionary columns
Added support for configuring a bloom filter for no-dictionary single-valued columns in the Pinot table config.
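A table config fragment for this case might look like the output of the sketch below. It assumes the usual `tableIndexConfig` section with `noDictionaryColumns` and `bloomFilterColumns` lists; the column name `playerID` and the helper name are hypothetical.

```python
import json


def bloom_filter_fragment(column: str) -> dict:
    """Build a table config fragment that marks a raw (no-dictionary) column for a bloom filter."""
    return {
        "tableIndexConfig": {
            # The same column appears in both lists: stored without a dictionary,
            # and indexed with a bloom filter.
            "noDictionaryColumns": [column],
            "bloomFilterColumns": [column],
        }
    }


fragment = bloom_filter_fragment("playerID")  # hypothetical column name
print(json.dumps(fragment, indent=2))
```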
Protobuf decoder
Added a Protobuf decoder for real-time ingestion, with file-based or schema-registry-based descriptor support.
More details in this wiki.
Kafka ingestion from specific period/timestamp
Added the ability to begin consumption from a specific period (2d ago, 12h ago) or timestamp string (in the format 2022-08-09T12:31:38.222Z).
More details in the PR and the docs.
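To make the two accepted forms concrete, the sketch below resolves either a period or a timestamp string to a start time in epoch milliseconds. It is an illustration of the idea, not Pinot's parser: the function name, the supported unit letters, and the compact period spelling (`2d`, `12h`) are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def resolve_start_millis(spec: str, now_ms: int) -> int:
    """Resolve a period such as '2d' or a UTC timestamp string to epoch milliseconds."""
    match = re.fullmatch(r"(\d+)([dhms])", spec)
    if match:
        # Period form: count back from the current time.
        amount, unit = int(match.group(1)), match.group(2)
        return now_ms - amount * UNIT_SECONDS[unit] * 1000
    # Timestamp form, e.g. 2022-08-09T12:31:38.222Z (always UTC in this sketch).
    moment = datetime.strptime(spec, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


now_ms = 1_700_000_000_000  # fixed "current time" so the example is reproducible
print(resolve_start_millis("2d", now_ms))
print(resolve_start_millis("2022-08-09T12:31:38.222Z", now_ms))
```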
Minion UI in the Pinot Admin UI
Added a Minion tab to the Pinot UI to show more details about task queues, tasks, and subtasks, along with exceptions.
More details in the PR.
Hide Query Console from Pinot UI
Added a cluster config to hide the Query Console from the Pinot UI. The cluster config to disable it can be found here.
StarTree Extensions for Apache Pinot
Available only in StarTree Cloud
Tier storage enhancements BETA
The following enhancements were made to StarTree’s cloud tiered storage feature:
Performance
- Async S3 client for supporting higher fetch volume and parallelism
Observability
- API to check tier status and metrics
Ease of use
- Ability to use same bucket as deepstore with custom path
- Ability to use same tenant for local and tier
PubSub correctness and operational improvements ALPHA
The following enhancements were made to the PubSub connector:
- Auto delete old snapshots to reduce cost
- Handling message re-delivery for better correctness
StarTree Cloud - includes BYOC (Bring Your Own Cloud) and SaaS
Self serve Portal BETA
Launched a self-serve portal for creating a StarTree workspace (BYOC and SaaS) without any involvement from StarTree personnel. You can find the portal here: https://startree.cloud/.
The corresponding documentation can be found here: https://dev.startree.ai/docs/startree-enterprise-edition/startree-cloud/getting-started/
Authentication Service BETA
Created a new unified authentication mechanism across platform and services within a StarTree workspace. This new model makes it safer and easier for all components to implement authentication.
Improved Pinot Token generation workflow ALPHA
Created a new, simpler workflow for generating tokens required for securely accessing a Pinot cluster. The earlier workflow required a cluster restart (transparent to user).
Support for custom storage profile ALPHA
Added support for overriding the pre-configured storage spec (based on cluster size) with a custom spec. This is useful to customize Persistent Volume spec for different components like Pinot server, minion, Zookeeper and so on.
Improved monitoring and alerting for StarTree Cloud components ALPHA
Added new metrics and corresponding alerts for monitoring critical things such as Zookeeper health and I/O profile.
SSL/TLS GA
Announcing GA for encrypted network access to a StarTree workspace via SSL/TLS.
Data Manager: Self-Service Ingestion tool
BigQuery connector BETA
Users can now self-serve data ingestion from BigQuery using the Data Manager no-code experience with a few clicks. See documentation.
Dataset ingestion status BETA
Users can now monitor the data ingestion status and view ingestion logs after submitting the ingestion jobs. This will help users to debug issues and fix them as needed.
See documentation.
Kinesis connector BETA
Users can now self-serve data ingestion from AWS Kinesis using the Data Manager no-code experience with a few clicks.
See this doc for more details on the connector itself.
Automated segment name generator support for time columns with simple date format GA
The segment name generator type is automatically updated for time columns in simple date format. Users no longer need to configure this manually.
Support enableSync for offline batch ingestion GA
Data Manager now supports enableSync for offline batch ingestion, which refreshes the data in Pinot on demand or on a schedule.
ThirdEye: Anomaly Detection and Root Cause Analysis Tool
StarTree ThirdEye Community Edition GA
Users can now explore, learn, and evaluate ThirdEye for free before purchasing, using StarTree Community Edition. Useful links:
- Blog to learn more
- Quick start guide
- StarTree Community Channel
- StarTree ThirdEye Community vs Enterprise edition features
Alert creation (Derived/transformed metrics support) GA
Users can now create “derived or transformed metrics” during alert creation. See documentation.
Alert creation (Improvements to Timeout and Timezone support) GA
During alert creation:
- Users no longer need to update the time column format and the time column name. These are derived automatically from the Pinot configurations, and users can override them if needed.
- Users of StarTree ThirdEye now have an increased timeout window and can use long-running aggregations during anomaly detection.
Holiday pattern recognition GA
Users can now identify holiday patterns during anomaly detection to improve detection accuracy. (See Event fetcher and StarTree ThirdEye template.)
Slack notification and subscription to Slack channels GA
Users can now subscribe to Slack channels using the StarTree ThirdEye subscription user interface to send anomaly notifications to those channels. See documentation.
Authentication on StarTree ThirdEye APIs GA
Users can now use a generated token to get secure access to StarTree ThirdEye APIs. These tokens have an extended life, so they can be used for automation such as bulk creation of alerts or ingestion of “Events” into StarTree ThirdEye using external scripts. See documentation.
Auto onboard datasets GA
Users can now auto-onboard all datasets from Pinot to ThirdEye with one click during data source creation. See documentation.
Bulk delete anomalies GA
Users can now bulk delete anomalies from the anomalies list view.
Duplicate and reset alerts GA
Users can now create “Duplicate and Reset Alerts” using StarTree ThirdEye from the Alerts detail page or Alerts list view page.
Create alert (No Code) BETA
Users can now create alerts and detect anomalies in a few seconds without writing a single line of code. See documentation.
RCA events BETA
Users can now perform root-cause analysis in ThirdEye using a self-serve events UI, which helps identify the events that caused an anomaly. Users can upload any type of event (for example, holidays or Jira events) using ThirdEye APIs.
Examples of events that help with root-cause analysis of an anomalous event in a KPI:
- Custom events (Ex: A/B test-related specific events)
- Public events (Ex: holiday or political event or competition etc)
- Internal events (Ex: Feature ramp, software deployment, metric definition changes etc)
Anomaly filtering BETA
Users can now filter anomalies by alert, dataset, metric, and so on in the anomaly list and report views.