Off-Heap upserts

Off-heap upserts scale easily on each server while using a fraction of the memory of on-heap upserts. Perform upserts on large datasets at a fraction of the cost.

Because on-heap upserts use an in-memory map on each server to store metadata for every unique key in the table, heap memory usage increases quickly as the number of primary keys in the table grows. Another limitation of keeping the upsert state in memory is that the system must recreate the state every time a server restarts.

With off-heap upserts, the upsert state is managed on disk. Because the system uses a cache on top of the disk storage, recent reads are served from memory, so the slowdown in ingestion speed is negligible. New writes are appended to an in-memory store and flushed to disk later, which improves write speed.

The query path remains unchanged, so you should still get the same p99 latencies. Latencies may even improve because of the reduced pressure on heap memory.

The off-heap upsert implementation supports any data store as the state backend.

Usage

Off-heap upsert supports two deployment modes:

  • Server-level store: Each server has a single backend, and all tables write their state to it. This is more efficient because the backend uses less CPU and memory overall.

  • Table-level store: Each table has a separate backend, which provides better performance for that table.

We recommend the server-level store by default. Use the table-level store only in the rare case of a performance bottleneck.

Enable off-heap in a new table

There are two deployment modes: server-level store and table-level store.

Use server-level store

Self-service configuration of the server-level store is under development. Currently, enabling it requires a change in server configuration:

  • To enable the off-heap store in StarTree Cloud, raise a request in the Support portal. In the ticket, include the approximate number of distinct primary keys across all tables that will use off-heap upsert. An approximate global count is enough; a per-table breakdown is not required.

  • Once the off-heap store is enabled in your deployment, you can start using it in your tables.

For a new table, add the following configuration in the table config:

    "upsertConfig" : {
        "mode": "FULL",
        "hashFunction": "NONE",
        "enableSnapshot": true,
        "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager"
    }

Partial upserts are also supported. The configuration above is only an example. To use off-heap upserts with an existing upsert configuration, add enableSnapshot and metadataManagerClass to it.
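
As an illustration, a partial-upsert configuration with the same two off-heap fields added might look like the sketch below. The mode, defaultPartialUpsertStrategy, and partialUpsertStrategies fields are the standard Apache Pinot partial-upsert settings; the column name clickCount is hypothetical, and the strategies shown are placeholders rather than recommendations.

    "upsertConfig" : {
        "mode": "PARTIAL",
        "defaultPartialUpsertStrategy": "OVERWRITE",
        "partialUpsertStrategies": {
            "clickCount": "INCREMENT"
        },
        "hashFunction": "NONE",
        "enableSnapshot": true,
        "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager"
    }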

Use table-level store

Not recommended. Use the table-level store only if you experience major ingestion lag because of off-heap upsert. It provides better performance, but at the cost of higher CPU and memory usage.

For a new table, add the following configuration in the table config:

    "upsertConfig" : {
        "mode": "FULL",
        "hashFunction": "NONE",
        "enableSnapshot": true,
        "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager",
        "metadataManagerConfigs": {
            "useIsolatedStore": "true"
        }
    }

(Optional) Tune the backend as needed for performance. For example, to change the cache size to 2GB (default is 1GB), include the following:

    "metadataManagerConfigs": {
        "rocksdb.blockcache.size_bytes": "2147483648"
    }
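
The tuning option uses the same metadataManagerConfigs key as useIsolatedStore, so the two settings presumably belong in one object. A sketch of the merged table-level configuration, assuming the keys can be combined this way:

    "upsertConfig" : {
        "mode": "FULL",
        "hashFunction": "NONE",
        "enableSnapshot": true,
        "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager",
        "metadataManagerConfigs": {
            "useIsolatedStore": "true",
            "rocksdb.blockcache.size_bytes": "2147483648"
        }
    }

The value 2147483648 is 2 × 1024³ bytes. Both values are written as strings, matching the examples in this section.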

Partial upserts are also supported. The configuration above is only an example. To use the table-level store with an existing upsert configuration, add enableSnapshot, metadataManagerClass, and metadataManagerConfigs to it.

Enable off-heap in an existing table

The steps to enable off-heap upserts in an existing table are the same as for a new table. However, you must restart the server to transfer the existing metadata to the new backend.

In your support ticket, list the existing tables for which you want to enable off-heap upserts.