Off-Heap upsert

Motivation

The current upsert implementation relies on an in-memory map on each server to store metadata for every unique primary key in the table. Heap memory usage therefore grows quickly as the number of primary keys in the table increases.

Another limitation of keeping this state in memory is that it has to be rebuilt every time a server restarts.

Solution

We designed off-heap upsert to solve this problem. With this feature, upsert can be scaled on each server while using a fraction of the memory.

The upsert state is now managed on disk. The resulting slowdown in ingestion speed is negligible because a cache sits on top of the disk storage, so recent reads should be served from memory. New writes are likewise appended to an in-memory store first and flushed to disk later, which improves write speed as well.

The query path is unchanged, so users should still see the same p99 latencies. Latencies may even improve because of the reduced pressure on heap memory.

The off-heap upsert implementation follows the extensible plugin ecosystem principle of Apache Pinot. We have designed it to be able to support any data store as the state backend.

Usage

Off-heap upsert supports two deployment modes:

By default, users should rely on the server-level store and only use the table-level store in case of extremely rare performance bottlenecks.

  • Server-level store - Each server has a single backend, and all tables write their state to it. This is efficient because it lowers the overall CPU and memory utilization of the backend.

  • Table-level store - Each table has its own backend, which provides better performance for that table.

Enable off-heap upsert in a new table

Each deployment mode, server-level store and table-level store, has its own instructions below.

Use server-level store

The server-level store requires a change in server configs at the moment. StarTree is working to make this configurable by the user.

  • Raise a request via the Support portal to enable the off-heap store in StarTree. The ticket should contain the approximate count of distinct primary keys across all the tables in which off-heap upsert will be enabled. Only an approximate global count is required, not a granular table-level count.

  • Once the off-heap store is enabled in your deployment, you can start using it in your tables.

For a new table, add the following configuration in the table config:

    "upsertConfig": {
      "mode": "FULL",
      "hashFunction": "NONE",
      "enableSnapshot": true,
      "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager"
    }

The configuration above is only an example. Partial upserts are also supported: add enableSnapshot and metadataManagerClass to your existing upsert configuration.
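As an illustration, a partial-upsert table using the server-level store might look like the sketch below. The column name clickCount and the chosen strategies are hypothetical placeholders, not values from this guide; only enableSnapshot and metadataManagerClass are the off-heap additions, and the rest stands in for whatever partial upsert settings your table already has.

    "upsertConfig": {
      "mode": "PARTIAL",
      "defaultPartialUpsertStrategy": "OVERWRITE",
      "partialUpsertStrategies": {
        "clickCount": "INCREMENT"
      },
      "enableSnapshot": true,
      "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager"
    }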

Use table-level store

Not recommended. Use the table-level store only if you experience major ingestion lag because of off-heap upsert. It leads to higher CPU and memory usage, but does provide better performance.

For a new table, add the following configuration in the table config:

    "upsertConfig": {
      "mode": "FULL",
      "hashFunction": "NONE",
      "enableSnapshot": true,
      "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager",
      "metadataManagerConfigs": {
        "useIsolatedStore": "true"
      }
    }

(Optional) Tune the backend as needed for performance. For example, to change the cache size to 2GB (default is 1GB), include the following:

    "metadataManagerConfigs": {
      "rocksdb.blockcache.size_bytes": "2147483648"
    }
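Put together, a table-level store with the larger cache might be configured as in the sketch below. It assumes that the cache-size key belongs in the same metadataManagerConfigs map as useIsolatedStore, which this guide implies but does not state outright. The value 2147483648 is 2 × 1024 × 1024 × 1024 bytes.

    "upsertConfig": {
      "mode": "FULL",
      "hashFunction": "NONE",
      "enableSnapshot": true,
      "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager",
      "metadataManagerConfigs": {
        "useIsolatedStore": "true",
        "rocksdb.blockcache.size_bytes": "2147483648"
      }
    }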

The configuration above is only an example. Partial upserts are also supported: add enableSnapshot, metadataManagerClass, and metadataManagerConfigs to your existing upsert configuration.

Enable off-heap upsert in an existing table

The steps to enable off-heap upsert in an existing table are the same as for a new table. However, a server restart is required to transfer the existing metadata to the new backend.

In the same support ticket, you can also list the existing tables in which the feature needs to be enabled.

Conclusion

With off-heap upsert, users should now be able to use Apache Pinot's powerful real-time upsert feature on even larger datasets at a fraction of the cost.