presto-connection-proxy

Motivation

In a StarTree deployment, Pinot components are hosted within a Kubernetes cluster. As part of this setup, the Pinot server endpoints are not exposed outside the Kubernetes cluster. This is fine for most use cases, but it creates a problem for external services such as PrestoDB/Trino.

Pinot Proxy is our solution for enabling PrestoDB/Trino connections into a Pinot cluster that runs inside a Kubernetes cluster. Without Pinot Proxy, we would have to create a dedicated Kubernetes route for every single Pinot Server host, and such a solution does not scale on Kubernetes.

High Level Architecture

At a very high level, Pinot Proxy behaves much like a reverse proxy such as nginx: it proxies RPC requests into Pinot clusters inside Kubernetes.

What makes Pinot Proxy special is that it can forward PrestoDB/Trino gRPC requests to a specific Pinot Server instead of routing them with a load-balancing algorithm.
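
The difference between the two routing styles can be illustrated with a small sketch. This is not Pinot Proxy's actual implementation: the server addresses and the `forward-host` / `forward-port` metadata keys are hypothetical, chosen only to show that a directed request names its own destination while a load-balanced request does not.

```python
from itertools import cycle
from typing import Dict, Iterator, List

# Hypothetical Pinot Server gRPC endpoints inside the Kubernetes cluster.
SERVERS: List[str] = [
    "pinot-server-0:8090",
    "pinot-server-1:8090",
    "pinot-server-2:8090",
]


def load_balanced_target(pool: Iterator[str]) -> str:
    """Round-robin routing: the caller has no say in which server answers."""
    return next(pool)


def directed_target(metadata: Dict[str, str], servers: List[str]) -> str:
    """Directed routing: the request metadata names the server it is meant for.

    The metadata keys used here are assumptions made for this illustration.
    """
    target = f'{metadata["forward-host"]}:{metadata["forward-port"]}'
    if target not in servers:
        raise LookupError(f"unknown Pinot Server: {target}")
    return target


if __name__ == "__main__":
    pool = cycle(SERVERS)
    print("load balanced:", [load_balanced_target(pool) for _ in range(4)])
    request_metadata = {"forward-host": "pinot-server-1", "forward-port": "8090"}
    print("directed:", directed_target(request_metadata, SERVERS))
```

With directed routing, every request in the example reaches `pinot-server-1`, regardless of how many other servers are in the pool.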

Below is the diagram of its routing when talking to PrestoDB Pinot drivers.

There are detailed slides explaining the technical aspect of this design.

What this means for Pinot clients

The main benefit is that, with Pinot Proxy, PrestoDB/Trino can finally run queries in streaming mode (pinot.use-streaming-for-segment-queries=true in the pinot.properties file) when connecting to Pinot inside a Kubernetes cluster.

Connection Settings for PrestoDB/Trino

First of all, PrestoDB must be version 0.273 or higher to connect to Pinot in Kubernetes properly.

To connect via Pinot Proxy, change the PrestoDB connection URLs to point to the Pinot Proxy host names. Suppose your Pinot cluster is named pug, the environment name is prod, and the StarTree cloud domain is awesome-company.startree.cloud. The following settings are then needed in the pinot.properties config in PrestoDB:

## Replace pug.prod.awesome-company.startree.cloud with the address of your Pinot cluster
## For clusters without TLS enabled, use port 80 instead of 443
pinot.controller-urls=proxy.broker.pug.prod.awesome-company.startree.cloud:443

pinot.proxy-enabled=true
## Query the Pinot streaming server for segment queries via gRPC
pinot.use-streaming-for-segment-queries=true

## Replace pug.prod.awesome-company.startree.cloud with the address of your Pinot cluster
pinot.grpc-host=proxy-grpc.broker.pug.prod.awesome-company.startree.cloud
## For clusters without TLS enabled, use port 80 instead of 443
pinot.grpc-port=443
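
The host names above follow a regular pattern, so they can be derived from the cluster name, environment name, and domain. The following is a minimal sketch of that derivation. The function names `proxy_settings` and `render_properties` are made up for this example, and the pattern is inferred only from the sample values shown above, so check the output against your own cluster.

```python
from typing import Dict


def proxy_settings(cluster: str, env: str, domain: str, tls: bool = True) -> Dict[str, str]:
    """Build the Pinot Proxy connection settings for pinot.properties.

    Assumes the naming pattern from the example above:
    proxy.broker.<cluster>.<env>.<domain> and proxy-grpc.broker.<cluster>.<env>.<domain>.
    """
    base = f"{cluster}.{env}.{domain}"
    port = 443 if tls else 80  # clusters without TLS use port 80
    return {
        "pinot.controller-urls": f"proxy.broker.{base}:{port}",
        "pinot.proxy-enabled": "true",
        "pinot.use-streaming-for-segment-queries": "true",
        "pinot.grpc-host": f"proxy-grpc.broker.{base}",
        "pinot.grpc-port": str(port),
    }


def render_properties(settings: Dict[str, str]) -> str:
    """Render the settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(render_properties(proxy_settings("pug", "prod", "awesome-company.startree.cloud")))
```

Passing `tls=False` switches both the controller URL and the gRPC port to 80.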

Extra settings for clusters with TLS enabled

If the cluster has TLS enabled, add the following extra property to the config:

## Enable secure connection for all traffic
pinot.secure-connection=true

Extra settings for clusters with the "Secure" flag enabled (auth enabled)

## The bearer token must be retrieved manually and should be long-lived
pinot.extra-http-headers=Authorization: Bearer eyJhbGciOiJSUz(truncated)
pinot.extra-grpc-metadata=Authorization: Bearer eyJhbGciOiJSUz(truncated)
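
Because these settings depend on each other (port, TLS flag, auth headers), a quick consistency check before restarting PrestoDB can catch simple mistakes. The sketch below is an illustration only. The helper names are hypothetical, the parser handles only plain `key=value` lines and `#` comments rather than the full Java properties syntax, and the rules encode nothing beyond what this page states.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_proxy_config(props: Dict[str, str]) -> List[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems: List[str] = []
    for key in ("pinot.controller-urls", "pinot.grpc-host", "pinot.grpc-port"):
        if key not in props:
            problems.append(f"missing {key}")
    if props.get("pinot.proxy-enabled") != "true":
        problems.append("pinot.proxy-enabled should be true")
    secure = props.get("pinot.secure-connection") == "true"
    port = props.get("pinot.grpc-port")
    if port == "443" and not secure:
        problems.append("grpc-port 443 implies TLS, but pinot.secure-connection is not true")
    if port == "80" and secure:
        problems.append("pinot.secure-connection is true, but grpc-port 80 is the non-TLS port")
    has_http = "pinot.extra-http-headers" in props
    has_grpc = "pinot.extra-grpc-metadata" in props
    if has_http != has_grpc:
        problems.append("auth-enabled clusters need both extra-http-headers and extra-grpc-metadata")
    return problems


if __name__ == "__main__":
    with open("pinot.properties", encoding="utf-8") as handle:
        for problem in check_proxy_config(parse_properties(handle.read())):
            print("WARNING:", problem)
```

The check is deliberately conservative: it only warns, and it does not contact the cluster or validate the bearer token.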