Release Version 0.7.0: August 2023

Apache Pinot updates since last StarTree release

For details on Pinot changes, see Releases.

  • Delete records in upsert-enabled tables
  • Pipeline breaker and dynamic broadcast support for semi-joins
  • Null value support in aggregations and filters
  • Order null values
  • Support UNION, INTERSECT, and EXCEPT operators
  • Support ranking window functions for the v2 engine:
    • ROW_NUMBER
    • RANK
    • DENSE_RANK
  • Multistage query updates:
    • Table-level access validation
    • Queries per second (QPS) quota check for all tables
    • Phase timers for REQUEST_COMPILATION, AUTHORIZATION, and QUERY_EXECUTION
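The three ranking window functions listed above differ only in how they number tied rows. The following is a minimal sketch in plain Python, not Pinot code: the function name `rank_rows` and the sample scores are hypothetical, and the logic follows the standard SQL definitions of ROW_NUMBER, RANK, and DENSE_RANK over a descending sort.

```python
from typing import List, Tuple


def rank_rows(scores: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (score, row_number, rank, dense_rank) for scores ordered descending."""
    ordered = sorted(scores, reverse=True)
    result = []
    rank = 0
    dense_rank = 0
    previous = None
    for row_number, score in enumerate(ordered, start=1):
        if score != previous:
            rank = row_number   # RANK takes the row position, so it skips after ties
            dense_rank += 1     # DENSE_RANK grows by one per distinct value, no gaps
            previous = score
        result.append((score, row_number, rank, dense_rank))
    return result


# Hypothetical sample data with one tie at the top.
rows = rank_rows([90, 85, 90, 70])
for row in rows:
    print(row)
```

With the tie at 90, ROW_NUMBER still assigns 1 and 2, RANK assigns 1 and 1 and then jumps to 3, and DENSE_RANK assigns 1 and 1 and continues with 2.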

StarTree Cloud

The following updates apply to both bring-your-own-cloud (BYOC) and SaaS deployments.

  • Configure tags to add to all StarTree-managed AWS resources, which can be useful for auditing and compliance initiatives.
  • StarTree's Pinot deep store S3 bucket has additional default protections. The bucket policy now actively denies any non-SSL traffic.
  • All new and existing clusters will be upgraded to use Kubernetes version 1.24.
  • StarTree-provisioned Google Cloud Platform (GCP) clusters are now VPC-native. Specify subnet CIDRs for nodes, pods, and services.
  • We've added automated renewal of TLS certificates to improve the architecture's stability and reliability.
  • In a disaster, recover a workspace from a region failure by recovering the StarTree cluster state.
  • Administrators can release individual components, like Pinot or Data Manager, without requiring a full release.
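For readers unfamiliar with the deep store protection mentioned above, a bucket policy that denies non-SSL traffic usually follows a common AWS pattern built on the `aws:SecureTransport` condition key. The sketch below builds such a policy document in Python. It is an illustration only: the actual policy StarTree applies is not shown in these notes, and the bucket name and the helper `deny_non_ssl_policy` are hypothetical.

```python
import json


def deny_non_ssl_policy(bucket_name: str) -> dict:
    """Build an S3 bucket policy that rejects any request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyNonSslRequests",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" for plain HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical bucket name, for illustration only.
policy = deny_non_ssl_policy("example-deep-store-bucket")
print(json.dumps(policy, indent=2))
```

An explicit Deny takes precedence over any Allow statement, which is why this pattern blocks plain HTTP access regardless of other grants on the bucket.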

StarTree extensions for Apache Pinot

  • Delete selected records from a Pinot table to meet GDPR requirements. See Segment purge for details.

Beta extensions

  • BETA Pinot server restart time is optimized for large upsert-enabled tables.
  • BETA Tiered storage:
    • Tiered storage supports sparse indexes, leading to performance improvements for queries running against high cardinality columns. For details, see Use sparse index.
    • Google Cloud Storage (GCS) is available as a cloud tier.
    • Stale segments are automatically cleaned up from the object tier.
    • Trace remote reads with newly available statistics.
    • Overwrite index configurations for the object tier.

Alpha extensions

  • ALPHA Push data into Pinot to create a table with the write API. Send data from any application to a REST API endpoint.

Data Manager

  • Data Manager now masks connection credentials to improve security.
  • We've optimized minion segment creation to avoid out-of-memory (OOM) and out-of-disk (OOD) issues during ingestion, increasing batch ingestion reliability.
  • The segment refresh task is more stable and reliable when users change the configuration of an existing table.
  • Integrate with Delta Lake and Databricks Lakehouse to ingest data and keep it in sync in Pinot. For details, see Delta Lake connector.
  • Configure batch ingestion to be performed atomically (either all or none of the data is ingested), preventing inconsistency in query results. For details, see Sync mode with atomic switch.
  • Continuously backfill data from BigQuery with each periodic sync.
  • Connect to Amazon Web Services (AWS) Kinesis and S3 using an AWS cross-account IAM role. For details, see AWS Kinesis connector and AWS S3 connector.
  • Integrate with Confluent Cloud. For details, see Confluent Cloud connector.
  • BETA Backfill data from SQL sources on an ad hoc basis.
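The all-or-none behavior of atomic batch ingestion described above can be pictured as "stage everything, then switch once". The sketch below is a toy in-memory model, not Data Manager's implementation: the `Table` class, the `atomic_sync` method, and the segment names are hypothetical, and it assumes a sync replaces the whole visible segment set.

```python
from typing import Callable, Dict, List


class Table:
    """Toy table: queries only ever read the segments in `live`."""

    def __init__(self) -> None:
        self.live: Dict[str, List[int]] = {}

    def atomic_sync(self, batch: Dict[str, Callable[[], List[int]]]) -> bool:
        """Build every segment off to the side, then publish all of them at once."""
        staged: Dict[str, List[int]] = {}
        try:
            for name, build in batch.items():
                staged[name] = build()
        except Exception:
            # A single failure discards the staged segments; `live` is untouched.
            return False
        # One reference swap makes the new segment set visible together.
        self.live = staged
        return True


def failing_build() -> List[int]:
    raise RuntimeError("simulated segment build failure")


table = Table()
ok = table.atomic_sync({"seg_0": lambda: [1, 2], "seg_1": lambda: [3]})
failed = table.atomic_sync({"seg_2": lambda: [4], "seg_3": failing_build})
print(ok, failed, sorted(table.live))
```

After the second call fails partway through, the table still serves only the two segments from the first sync, so a query never sees a half-ingested batch.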

ThirdEye

  • We've simplified the workflow for creating and configuring new alerts.
  • ThirdEye alerts support data mutability, meaning they can handle recent changes in underlying data. For details, see Streaming upsert use case.
  • Configure subscription groups on a per-dimension basis within an alert to notify specific teams of anomalies in different slices of the underlying dataset. For details, see Create a subscription group.
  • Filter and sort dimensions by name and latest anomaly from the UI. We've also improved performance for dimension exploration.
  • Anomaly detector algorithm updates:
  • Default alert templates are now automatically refreshed on new releases instead of requiring an API request.