Release Version 0.7.0: August 2023
Apache Pinot updates since last StarTree release
For details on Pinot changes, see Releases.
- Delete records in upsert-enabled tables
- Pipeline breaker and dynamic broadcast support for semi-joins
- Null value support in aggregate and filters
- Order null values
- Support ranking window functions for v2 engine:
  - ROW_NUMBER window function support
- Multistage query updates:
  - Table-level access validation
  - Queries per second (QPS) quota check for all tables
  - Phase timers for
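As an illustration of the ROW_NUMBER ranking window function listed above, here is a minimal sketch of the standard SQL form. It runs against an in-memory SQLite database rather than Pinot, so it only demonstrates the semantics; the `orders` table and its columns are hypothetical, and the exact syntax accepted by the Pinot v2 engine should be checked against the Pinot documentation.

```python
import sqlite3

# SQLite stands in for Pinot here (assumption): it only illustrates
# what ROW_NUMBER() OVER (...) computes. Requires SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30), ("alice", 50), ("bob", 20), ("bob", 10)],
)

# Number each customer's orders from largest to smallest amount.
rows = conn.execute(
    """
    SELECT customer,
           amount,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn
    FROM orders
    ORDER BY customer, rn
    """
).fetchall()

for customer, amount, rn in rows:
    print(customer, amount, rn)
```

Numbering restarts at 1 for each `customer` partition, which is what makes the function useful for "top N per group" queries.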
The following updates apply to both bring-your-own-cloud (BYOC) and SaaS deployments.
- Configure tags to add to all StarTree-managed AWS resources, which can be useful for auditing and compliance initiatives.
- StarTree's Pinot deep store S3 bucket has additional default protections. The bucket policy now actively denies any non-SSL traffic.
- All new and existing clusters will be upgraded to use Kubernetes version 1.24.
- StarTree-provisioned Google Cloud Platform (GCP) clusters are now VPC-native. Specify subnet CIDRs for nodes, pods, and services.
- We've added automated renewal of TLS certificates to improve the architecture's stability and reliability.
- In a disaster, recover a workspace from a region failure by recovering the StarTree cluster state.
- Administrators can release individual components, like Pinot or Data Manager, without requiring a full release.
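For readers auditing the deep store protection mentioned above, the following is a minimal sketch of what an S3 bucket policy that denies non-SSL traffic commonly looks like. It is not StarTree's actual policy, which these notes do not show; the helper name and the bucket name are hypothetical. The sketch relies on the standard AWS `aws:SecureTransport` condition key.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects any request not sent over TLS.

    Hypothetical helper for illustration only; not StarTree's policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-deep-store-bucket" is a placeholder name (assumption).
policy = deny_insecure_transport_policy("example-deep-store-bucket")
print(json.dumps(policy, indent=2))
```

Because the statement is an explicit deny, it takes precedence over any allow statements that would otherwise permit the same request over plain HTTP.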
StarTree extensions for Apache Pinot
- Delete selected records from a Pinot table to meet GDPR requirements. See Segment purge for details.
- Pinot server restart time is optimized for large upsert-enabled tables.
- Tiered storage:
  - Tiered storage supports sparse indexes, leading to performance improvements for queries running against high-cardinality columns. For details, see [Use sparse index](https://dev.startree.ai/docs/procedures/set-up-tiered-storage/use-sparse-index).
  - Google Cloud Storage (GCS) is available as a cloud tier.
  - Stale segments are automatically cleaned up from the object tier.
  - Trace remote reads with newly available statistics.
  - Overwrite index configurations for the object tier.
- Push data into Pinot to create a table with the write API. Send data from any application to a REST API endpoint.
- Data Manager now masks connection credentials to improve security.
- We've optimized minion segment creation to avoid out-of-memory (OOM) and out-of-disk (OOD) issues during ingestion, increasing batch ingestion reliability.
- The segment refresh task is more stable and reliable when users change the configuration of an existing table.
- Integrate with Delta Lake and Databricks Lakehouse to ingest data and keep it in sync in Pinot. For details, see Delta Lake connector.
- Configure batch ingestion to be performed atomically (either all or none of the data is ingested), preventing inconsistency in query results. For details, see Sync mode with atomic switch.
- Continuously backfill data from BigQuery with each periodic sync.
- Connect to Amazon Web Services (AWS) Kinesis and S3 using an AWS cross-account IAM role. For details, see AWS Kinesis connector and AWS S3 connector.
- Integrate with Confluent Cloud. For details, see Confluent Cloud connector.
- Backfill data from SQL sources on an ad hoc basis.
- We've simplified the workflow for creating and configuring new alerts.
- ThirdEye alerts support data mutability, meaning they can handle recent changes in underlying data. For details, see Streaming upsert use case.
- Configure subscription groups on a per-dimension basis within an alert to notify specific teams of anomalies in different slices of the underlying dataset. For details, see Create a subscription group.
- Filter and sort dimensions by name and latest anomaly from the UI. We've also improved performance for dimension exploration.
- Anomaly detector algorithm updates:
- Default alert templates are now automatically refreshed on new releases instead of requiring an API request.