Release version 0.10.0: November 2024
Apache Pinot updates since the last StarTree release
For details on Pinot changes, see Releases.
- Enhanced support for backfilling data in an upsert-enabled table. Users can now upload externally partitioned segments to an upsert-enabled table. [link]
- Improved UI load times, especially in environments with a large number of tables or very large tables. [link]
- Added OOM protection to the multi-stage query engine. Expensive queries are killed before they can crash a server, so other applications are not disrupted. [link] [link]
- Improved scalability by allowing multiple segments to be uploaded together instead of one by one. This is especially useful in cases where a large number of segments are added during initial ingestion or data backfill. [link] [link]
- Improved observability and stability by providing an API for checking the segment state. Users can now determine if some segments need to be reloaded to ensure all replicas are in the correct state. [link]
- Introduced a database-level query rate limiter, which applies to all tables in the database in aggregate. [link] [link]
- Added MAP type support with string keys and typed values, along with the MapItem function, which extracts a map value by key. The Pinot UI also supports the MAP type. [link] [link]
- Added implementations of comparison (=, !=, >, >=, <, <=, BETWEEN) and binary arithmetic scalar functions for the multi-stage query engine. This resolves issues such as string comparison failures caused by the lack of polymorphism support, and incorrect result types for numeric arithmetic. [link]
- Added parameters to support aggregation functions like DISTINCTCOUNTHLL (log2m) in the star-tree index. [link] [link]
- Added a Lookup Join strategy, enabled through a query hint, to improve performance when the right table in the JOIN is a dimension table. [link] [link]
- Introduced a more detailed query execution plan that also describes the physical operators used in the multi-stage query engine. [link] [link]
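To illustrate the database-level query rate limiter listed above, the sketch below shows one way a single queries-per-second budget can be shared by every table in a database. It is a generic token-bucket illustration, not Pinot's implementation: the class names, the `allow` method, and the injectable clock are hypothetical and exist only for this example.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket that refills at `rate_qps` tokens per second (hypothetical helper)."""

    def __init__(self, rate_qps: float, clock: Callable[[], float] = time.monotonic):
        self.rate_qps = rate_qps
        self.clock = clock
        self.tokens = rate_qps          # start full: allows a burst of up to rate_qps queries
        self.last_refill = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the budget is exhausted."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.rate_qps, self.tokens + elapsed * self.rate_qps)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class DatabaseRateLimiter:
    """Keeps one bucket per database, so all tables in it draw from the same budget."""

    def __init__(self, quotas_qps: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic):
        self._buckets = {db: TokenBucket(qps, clock) for db, qps in quotas_qps.items()}

    def allow(self, database: str, table: str) -> bool:
        # The table name is deliberately ignored: the quota is enforced in aggregate.
        bucket = self._buckets.get(database)
        if bucket is None:
            return True                 # no quota configured for this database
        return bucket.try_acquire()


if __name__ == "__main__":
    limiter = DatabaseRateLimiter({"sales": 2.0})
    for table in ("orders", "refunds", "orders"):
        print(table, limiter.allow("sales", table))
```

Because the bucket is keyed by database rather than by table, a burst of queries against one table reduces the budget left for its sibling tables, which is the aggregate behavior the release note describes.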
StarTree Cloud
StarTree extensions for Apache Pinot
- Added the ability for users to set the number of retries on failure when atomic sync is configured. This also allows users to upgrade their StarTree Cloud environment while data is being ingested.
- Improved performance when ingesting data from Delta Lake or using SegmentImportTask by changing the default value of the "push.mode" parameter to "metadata".
- Added several improvements to the Delta Lake 3.0 connector to support Delta Protocol Reader version 3 and Writer version 7.
- Added support for ingesting data from DynamoDB CDC streams using the DynamoDB message decoder. [link]
- Added native support for ingesting Prometheus-formatted metrics data into tables in StarTree Cloud. Users can now leverage the price/performance of StarTree Cloud for their metrics solution built on Prometheus. [link]
- Added the ability to merge small segments into larger segments using SegmentRefreshTask, improving performance in upsert-enabled tables. [link]
- Added TTL for metadata and deleted keys for upsert-enabled tables using off-heap upsert. This improves scalability and manageability by reducing the size of managed keys and metadata.
- Added data consistency guarantees for queries that run while upserts are being processed. Without this guarantee, result sets could sometimes be inconsistent. [link]
- Reduced server restart time by preloading a snapshot of primary keys in upsert-enabled tables. Without this feature, the primary keys are rebuilt during server restart, resulting in long restart times. [link]
- Improved scalability and reliability for Dedup by moving the metadata from an on-heap implementation to an off-heap implementation, similar to off-heap upsert.
- Added several health checks to ensure tables in StarTree Cloud are always optimized for best performance. The list of health checks includes a check to ensure no table in production is running with a single replica of data. [link]
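Several of the upsert items above (TTL for metadata and deleted keys, snapshot preloading) are controlled through table configuration. The sketch below builds an illustrative upsertConfig fragment. The field names (`metadataTTL`, `deletedKeysTTL`, `enableSnapshot`, `enablePreload`, `comparisonColumns`, `deleteRecordColumn`) are taken from the open-source Apache Pinot upsert configuration; that StarTree off-heap upsert accepts the same names is an assumption, and the helper function, column names, and TTL values are hypothetical.

```python
import json


def build_upsert_config(metadata_ttl: float, deleted_keys_ttl: float) -> dict:
    """Return an illustrative upsertConfig fragment (not a complete table config)."""
    if metadata_ttl < 0 or deleted_keys_ttl < 0:
        raise ValueError("TTL values must be non-negative")
    return {
        "mode": "FULL",
        # Hypothetical column names; TTLs are assumed to use the comparison column's unit.
        "comparisonColumns": ["event_time"],
        "deleteRecordColumn": "is_deleted",
        # Assumed meaning: drop upsert metadata for keys older than the TTL.
        "metadataTTL": metadata_ttl,
        # Assumed meaning: drop keys marked as deleted once they are older than the TTL.
        "deletedKeysTTL": deleted_keys_ttl,
        # Assumed meaning: persist a primary-key snapshot and preload it on restart.
        "enableSnapshot": True,
        "enablePreload": True,
    }


fragment = {"upsertConfig": build_upsert_config(86400, 86400)}
print(json.dumps(fragment, indent=2))
```

Check the exact keys and units against the StarTree documentation for your release before applying a fragment like this to a real table.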
Data Manager
- Added the ability for users to modify the schema and table configuration even after a table has been created, enabling greater flexibility. Users can optimize their table for better performance using Data Manager.
- Added enhanced validation to ensure accurate field type and data type configurations during table creation, reducing errors and improving data integrity.
ThirdEye
- ThirdEye is now available in the StarTree Cloud Free Tier.
- Improved onboarding with a new alert creation flow that makes creating alerts simpler and faster. Dimension exploration alerts can now be created without writing code.
- Added a new Impact dashboard. This dashboard gives managers and alert owners a clear and intuitive understanding of the health and performance of all monitored metrics.