Connecting via Python and Java to StarTree Cloud

Python

Applications can use the pinotdb Python client library to query Apache Pinot.

Prerequisite

Ensure you have the latest pinotdb library installed. If it is not installed, run the following command in your terminal to install the library.

pip install pinotdb
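
If you are unsure whether pinotdb is already present in your environment, a small check like the following can help. This is a minimal sketch that relies only on the standard library; the helper name installed_version is an illustrative assumption and is not part of pinotdb.

```python
from importlib import metadata


def installed_version(package="pinotdb"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pinotdb is not installed; run: pip install pinotdb")
else:
    print(f"pinotdb {version} is installed")
```

If an older version is reported, pip install --upgrade pinotdb should bring it up to date.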

Usage

You can query the Pinot broker directly using the DB API:

from pinotdb import connect

conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    SELECT place,
           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
      FROM places
     LIMIT 10
""")
for row in curs:
    print(row)
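
Rows come back as plain sequences, so it is often convenient to pair each value with its column name. The sketch below assumes the pinotdb cursor exposes the standard DB API description attribute; to stay runnable without a Pinot cluster it uses the standard library's sqlite3 module as a stand-in connection. The helper name rows_as_dicts and the sample row are hypothetical.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Yield each remaining row of a DB API cursor as a {column: value} dict."""
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# sqlite3 stands in for a Pinot connection here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE places (place TEXT)")
curs.execute("INSERT INTO places VALUES ('37.77,-122.42')")
curs.execute("SELECT place FROM places LIMIT 10")

results = list(rows_as_dicts(curs))
print(results)
```

With a real Pinot connection, the same helper would be called on the cursor returned by conn.cursor() after curs.execute(...).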

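You can also use SQLAlchemy to query Pinot.

The database engine connection string is formatted like this: pinot://<pinot-broker-host>:<pinot-broker-port><pinot-broker-path>?controller=<pinot-controller-protocol>://<pinot-controller-host>:<pinot-controller-port>/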

from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *

engine = create_engine('pinot://localhost:8099/query/sql?controller=http://localhost:9000/')  # uses HTTP by default
# engine = create_engine('pinot+http://localhost:8099/query/sql?controller=http://localhost:9000/')
# engine = create_engine('pinot+https://localhost:8099/query/sql?controller=http://localhost:9000/')

places = Table('places', MetaData(bind=engine), autoload=True)
print(select([func.count('*')], from_obj=places).scalar())
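
The connection string described above can be assembled from its individual parts, which helps when the broker and controller addresses come from configuration. This is a minimal sketch; the function name pinot_sqlalchemy_uri and its parameter names are assumptions made for illustration, and the defaults simply mirror the localhost example.

```python
def pinot_sqlalchemy_uri(broker_host, broker_port, controller_host, controller_port,
                         broker_path="/query/sql", scheme="http",
                         controller_scheme="http"):
    """Build a pinot+<scheme>:// SQLAlchemy URI from broker and controller parts."""
    return (
        f"pinot+{scheme}://{broker_host}:{broker_port}{broker_path}"
        f"?controller={controller_scheme}://{controller_host}:{controller_port}/"
    )


uri = pinot_sqlalchemy_uri("localhost", 8099, "localhost", 9000)
print(uri)
```

The resulting string can be passed to create_engine in place of the hard-coded URI.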

Connecting to StarTree Cloud to query Pinot

from pinotdb import connect

conn = connect(host='broker.pinot.docwrkspc.docorg.startree-staging.cloud', port=443, path='/query/sql', username='xxxx', password='xxxx=', scheme='https')
curs = conn.cursor()
curs.execute("""
    select * from website limit 10
""")
for row in curs:
    print(row)
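
To avoid hard-coding credentials in source files, the connection arguments can be read from the environment. The sketch below only builds the keyword arguments shown in the example above; the environment variable names (PINOT_BROKER_HOST, PINOT_BROKER_PORT, PINOT_USERNAME, PINOT_PASSWORD) and the helper name are hypothetical choices, not something StarTree Cloud or pinotdb defines.

```python
import os


def startree_connect_kwargs(env=None):
    """Build keyword arguments for pinotdb.connect() from environment variables."""
    env = os.environ if env is None else env
    required = ("PINOT_BROKER_HOST", "PINOT_USERNAME", "PINOT_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["PINOT_BROKER_HOST"],
        "port": int(env.get("PINOT_BROKER_PORT", "443")),  # HTTPS port by default
        "path": "/query/sql",
        "scheme": "https",
        "username": env["PINOT_USERNAME"],
        "password": env["PINOT_PASSWORD"],
    }


# Example with an explicit mapping instead of the real environment.
example_env = {
    "PINOT_BROKER_HOST": "broker.example.invalid",
    "PINOT_USERNAME": "xxxx",
    "PINOT_PASSWORD": "xxxx",
}
kwargs = startree_connect_kwargs(example_env)
print(kwargs["host"], kwargs["port"], kwargs["scheme"])
```

The result would then be used as conn = connect(**startree_connect_kwargs()).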

Examples with Pinot Quickstart

Clone the Pinot DB repository

git clone git@github.com:python-pinot-dbapi/pinot-dbapi.git
cd pinot-dbapi

Pinot Batch Quickstart

Run the command below to start Pinot Batch Quickstart in Docker and expose Pinot controller port 9000 and Pinot broker port 8000.

docker run \
  --name pinot-quickstart \
  -p 2123:2123 \
  -p 9000:9000 \
  -p 8000:8000 \
  apachepinot/pinot:latest QuickStart -type batch

Once Pinot Batch Quickstart is up, you can run the sample code snippet to query Pinot:

python3 examples/pinot-quickstart-batch.py

Sample Output:

Sending SQL to Pinot: SELECT * FROM baseballStats LIMIT 5
[0, 11, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 11, 11, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SFN', 0, 2004]
[2, 45, 0, 0, 0, 0, 0, 0, 0, 0, 'NL', 45, 43, 'aardsda01', 'David Allan', 1, 0, 0, 0, 1, 0, 0, 'CHN', 0, 2006]
[0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 25, 2, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'CHA', 0, 2007]
[1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 47, 5, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 1, 'BOS', 0, 2008]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'AL', 73, 3, 'aardsda01', 'David Allan', 1, 0, 0, 0, 0, 0, 0, 'SEA', 0, 2009]

Sending SQL to Pinot: SELECT playerName, sum(runs) FROM baseballStats WHERE yearID>=2000 GROUP BY playerName LIMIT 5
['Scott Michael', 26.0]
['Justin Morgan', 0.0]
['Jason Andre', 0.0]
['Jeffrey Ellis', 0.0]
['Maximiliano R.', 16.0]

Sending SQL to Pinot: SELECT playerName,sum(runs) AS sum_runs FROM baseballStats WHERE yearID>=2000 GROUP BY playerName ORDER BY sum_runs DESC LIMIT 5
['Adrian', 1820.0]
['Jose Antonio', 1692.0]
['Rafael', 1565.0]
['Brian Michael', 1500.0]
['Alexander Emmanuel', 1426.0]

Using parameters:

from pinotdb import connect

conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()

curs.execute("""
    SELECT * 
    FROM baseballStats
    WHERE league IN (%(leagues)s)
    """, {"leagues": ["AA", "NL"]})
for row in curs:
    print(row)
    
curs.execute("""
    SELECT *
    FROM baseballStats
    WHERE baseOnBalls > (%(score)d)
    """, {"score": 0})
for row in curs:
    print(row)

Pinot Hybrid Quickstart

Run the command below to start Pinot Hybrid Quickstart in Docker and expose Pinot controller port 9000 and Pinot broker port 8000.

docker run \
  --name pinot-quickstart \
  -p 2123:2123 \
  -p 9000:9000 \
  -p 8000:8000 \
  apachepinot/pinot:latest QuickStart -type hybrid

Once Pinot Hybrid Quickstart is up, you can run the sample code snippet to query Pinot:

python3 examples/pinot-quickstart-hybrid.py

Sample Output:

Sending SQL to Pinot: SELECT * FROM airlineStats LIMIT 5
[171, 153, 19393, 0, 8, 8, 1433, '1400-1459', 0, 1425, 1240, 165, 'null', 0, 'WN', -2147483648, 1, 27, 17540, 0, 2, 2, 1242, '1200-1259', 0, 'MDW', 13232, 1323202, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 861, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 402, 1, -2147483648, -2147483648, 1, -2147483648, 'BOS', 10721, 1072102, 30721, 'Boston, MA', 'MA', 25, 'Massachusetts', 13, 1, ['null'], -2147483648, 'N556WN', 6, 12, -2147483648, 'WN', -2147483648, 1254, 1427, 2014]
[183, 141, 20398, 1, 17, 17, 1302, '1200-1259', 1, 1245, 1005, 160, 'null', 0, 'MQ', 0, 1, 27, 17540, 0, -6, 0, 959, '1000-1059', -1, 'CMH', 11066, 1106603, 31066, 'Columbus, OH', 'OH', 39, 'Ohio', 44, 990, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 3574, 1, 0, -2147483648, 1, 17, 'MIA', 13303, 1330303, 32467, 'Miami, FL', 'FL', 12, 'Florida', 33, 1, ['null'], 0, 'N605MQ', 13, 29, -2147483648, 'MQ', 0, 1028, 1249, 2014]
[-2147483648, -2147483648, 20304, -2147483648, -2147483648, -2147483648, -2147483648, '2100-2159', -2147483648, 2131, 2005, 146, 'null', 0, 'OO', -2147483648, 1, 27, 17541, 1, 52, 52, 2057, '2000-2059', 3, 'COS', 11109, 1110902, 30189, 'Colorado Springs, CO', 'CO', 8, 'Colorado', 82, 809, 4, -2147483648, [11292], 1, [1129202], ['DEN'], -2147483648, 73, [9], 0, ['null'], [9], [-2147483648], [2304], 1, -2147483648, '2014-01-27', 5554, 1, -2147483648, -2147483648, 1, -2147483648, 'IAH', 12266, 1226603, 31453, 'Houston, TX', 'TX', 48, 'Texas', 74, 1, ['SEA', 'PSC', 'PHX', 'MSY', 'ATL', 'TYS', 'DEN', 'CHS', 'PDX', 'LAX', 'EWR', 'SFO', 'PIT', 'RDU', 'RAP', 'LSE', 'SAN', 'SBN', 'IAH', 'OAK', 'BRO', 'JFK', 'SAT', 'ORD', 'ACY', 'DFW', 'BWI'], -2147483648, 'N795SK', -2147483648, 19, -2147483648, 'OO', -2147483648, 2116, -2147483648, 2014]
[153, 125, 20436, 1, 41, 41, 1442, '1400-1459', 2, 1401, 1035, 146, 'null', 0, 'F9', 2, 1, 27, 17541, 1, 34, 34, 1109, '1000-1059', 2, 'DEN', 11292, 1129202, 30325, 'Denver, CO', 'CO', 8, 'Colorado', 82, 967, 4, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 658, 1, 8, -2147483648, 1, 31, 'SFO', 14771, 1477101, 32457, 'San Francisco, CA', 'CA', 6, 'California', 91, 1, ['null'], 0, 'N923FR', 11, 17, -2147483648, 'F9', 0, 1126, 1431, 2014]
[-2147483648, -2147483648, 20304, -2147483648, -2147483648, -2147483648, -2147483648, '1400-1459', -2147483648, 1432, 1314, 78, 'B', 1, 'OO', -2147483648, 1, 27, 17541, -2147483648, -2147483648, -2147483648, -2147483648, '1300-1359', -2147483648, 'EAU', 11471, 1147103, 31471, 'Eau Claire, WI', 'WI', 55, 'Wisconsin', 45, 268, 2, -2147483648, [-2147483648], 0, [-2147483648], ['null'], -2147483648, -2147483648, [-2147483648], -2147483648, ['null'], [-2147483648], [-2147483648], [-2147483648], 0, -2147483648, '2014-01-27', 5455, 1, -2147483648, -2147483648, 1, -2147483648, 'ORD', 13930, 1393003, 30977, 'Chicago, IL', 'IL', 17, 'Illinois', 41, 1, ['null'], -2147483648, 'N903SW', -2147483648, -2147483648, -2147483648, 'OO', -2147483648, -2147483648, -2147483648, 2014]

Sending SQL to Pinot: SELECT count(*) FROM airlineStats LIMIT 5
[17772]

Sending SQL to Pinot: SELECT AirlineID, sum(Cancelled) FROM airlineStats WHERE Year > 2010 GROUP BY AirlineID LIMIT 5
[20409, 40.0]
[19930, 16.0]
[19805, 60.0]
[19790, 115.0]
[20366, 172.0]

Sending SQL to Pinot: select OriginCityName, max(Flights) from airlineStats group by OriginCityName ORDER BY max(Flights) DESC LIMIT 5
['Casper, WY', 1.0]
['Deadhorse, AK', 1.0]
['Austin, TX', 1.0]
['Chicago, IL', 1.0]
['Monterey, CA', 1.0]

Sending SQL to Pinot: SELECT OriginCityName, sum(Cancelled) AS sum_cancelled FROM airlineStats WHERE Year>2010 GROUP BY OriginCityName ORDER BY sum_cancelled DESC LIMIT 5
['Chicago, IL', 178.0]
['Atlanta, GA', 111.0]
['New York, NY', 65.0]
['Houston, TX', 62.0]
['Denver, CO', 49.0]

Sending Count(*) SQL to Pinot
17773

Sending SQL: "SELECT OriginCityName, sum(Cancelled) AS sum_cancelled FROM "airlineStats" WHERE Year>2010 GROUP BY OriginCityName ORDER BY sum_cancelled DESC LIMIT 5" to Pinot
[('Chicago, IL', 178.0), ('Atlanta, GA', 111.0), ('New York, NY', 65.0), ('Houston, TX', 62.0), ('Denver, CO', 49.0)]

Java

Pinot provides a native Java client to execute queries on the cluster. The client is tenant-aware, so it can direct each query to the correct broker, which makes querying data easier.

Prerequisite

You can use the client by adding it as a dependency to your Maven or Gradle build.

For information on how to install Maven, visit Installing Apache Maven.

Example of how to include the Maven dependency to use the client:

<dependency>
    <groupId>org.apache.pinot</groupId>
    <artifactId>pinot-java-client</artifactId>
    <version>0.9.3</version>
</dependency>

For information on how to install Gradle, visit Gradle Build Tool Installation.

Example of how to include the Gradle dependency to use the client:

implementation 'org.apache.pinot:pinot-java-client:0.9.3'

Usage

Here's an example of how to use the pinot-java-client to query Pinot.

import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.Request;
import org.apache.pinot.client.ResultSetGroup;
import org.apache.pinot.client.ResultSet;

/**
 * Demonstrates the use of the pinot-client to query Pinot from Java
 */
public class PinotClientExample {

  public static void main(String[] args) {

    // pinot connection
    String zkUrl = "localhost:2181";
    String pinotClusterName = "PinotCluster";
    Connection pinotConnection = ConnectionFactory.fromZookeeper(zkUrl + "/" + pinotClusterName);

    // Select two columns so that column index 1 (the count) exists in the result
    String query = "SELECT foo, COUNT(*) FROM myTable GROUP BY foo";

    // set queryType=sql for querying the sql endpoint
    Request pinotClientRequest = new Request("sql", query);
    ResultSetGroup pinotResultSetGroup = pinotConnection.execute(pinotClientRequest);
    ResultSet resultTableResultSet = pinotResultSetGroup.getResultSet(0);

    int numRows = resultTableResultSet.getRowCount();
    int numColumns = resultTableResultSet.getColumnCount();
    String columnValue = resultTableResultSet.getString(0, 1);
    String columnName = resultTableResultSet.getColumnName(1);

    System.out.println("ColumnName: " + columnName + ", ColumnValue: " + columnValue);
  }
}

Connection Factory

The client provides a ConnectionFactory class to create connections to a Pinot cluster. The factory supports the following methods to create a connection:

  • ZooKeeper (recommended) - Comma-separated list of the cluster's ZooKeeper servers. This is the recommended method, because it can redirect queries to the appropriate brokers based on tenant/table.
  • Broker list - Comma-separated list of the brokers in the cluster. This should only be used in standalone setups or for POC, unless you have a load balancer set up for brokers.
  • Properties file - You can also put the broker list as brokerList in a properties file and provide the path to that file to the factory. This should only be used in standalone setups or for POC, unless you have a load balancer set up for brokers.

Here's an example demonstrating each ConnectionFactory method:

Connection connection = ConnectionFactory.fromZookeeper
  ("some-zookeeper-server:2191/zookeeperPath");
 
Connection connection = ConnectionFactory.fromProperties("demo.properties");
 
Connection connection = ConnectionFactory.fromHostList
  ("broker-1:1234", "broker-2:1234", ...);

Query Methods

You can run queries in either a blocking or an asynchronous manner. Use:

  • Connection.execute(org.apache.pinot.client.Request) for blocking queries
  • Connection.executeAsync(org.apache.pinot.client.Request) for asynchronous queries that return a future object.

ResultSetGroup resultSetGroup =
  connection.execute(new Request("sql", "select * from foo..."));
// OR
Future<ResultSetGroup> futureResultSetGroup = 
  connection.executeAsync(new Request("sql", "select * from foo..."));

You can also use a PreparedStatement to escape query parameters. The prepared statement is not stored in the database, so it does not improve the performance of subsequent queries.

PreparedStatement statement = 
    connection.prepareStatement(new Request("sql", "select * from foo where a = ?"));
// Parameter indexes in the Pinot client's PreparedStatement are zero-based
statement.setString(0, "bar");
 
ResultSetGroup resultSetGroup = statement.execute();
// OR
Future<ResultSetGroup> futureResultSetGroup = statement.executeAsync();

Result Set

Results can be obtained with the various get methods in the first ResultSet, obtained through the getResultSet(int) method:

Request request = new Request("sql", "select foo, bar from baz where quux = 'quuux'");
ResultSetGroup resultSetGroup = connection.execute(request);
ResultSet resultSet = resultSetGroup.getResultSet(0);
 
for (int i = 0; i < resultSet.getRowCount(); ++i) {
  System.out.println("foo: " + resultSet.getString(i, 0));
  System.out.println("bar: " + resultSet.getInt(i, 1));
}

PQL Queries

If queryFormat pql is used in the Request, there are some differences in how the results can be accessed, depending on the query.

In the case of aggregation, each aggregation function is within its own ResultSet. A query with multiple aggregation functions will return one result set per aggregation function, as they are computed in parallel.

ResultSetGroup resultSetGroup = 
    connection.execute(new Request("pql", "select max(foo), min(foo) from bar"));
 
System.out.println("Number of result groups:" +
    resultSetGroup.getResultSetCount()); // 2, max(foo) and min(foo)
ResultSet resultSetMax = resultSetGroup.getResultSet(0);
System.out.println("Max foo: " + resultSetMax.getInt(0));
ResultSet resultSetMin = resultSetGroup.getResultSet(1);
System.out.println("Min foo: " + resultSetMin.getInt(0));

In case of aggregation with GROUP BY, there will be as many ResultSets as the number of aggregations, each of which will contain multiple results grouped by a grouping key.

ResultSetGroup resultSetGroup = 
    connection.execute(
        new Request("pql", "select min(foo), max(foo) from bar group by baz"));
 
System.out.println("Number of result groups:" +
    resultSetGroup.getResultSetCount()); // 2, min(foo) and max(foo)
 
ResultSet minResultSet = resultSetGroup.getResultSet(0);
for (int i = 0; i < minResultSet.getRowCount(); ++i) {
    // Group key column 0 is baz, the only GROUP BY column
    System.out.println("Minimum foo for " + minResultSet.getGroupKeyString(i, 0) +
        ": " + minResultSet.getInt(i));
}
 
ResultSet maxResultSet = resultSetGroup.getResultSet(1);
for (int i = 0; i < maxResultSet.getRowCount(); ++i) {
    // Group key column 0 is baz, the only GROUP BY column
    System.out.println("Maximum foo for " + maxResultSet.getGroupKeyString(i, 0) +
        ": " + maxResultSet.getInt(i));
}

⚠️ This section applies only to the PQL endpoint, which is deprecated and will be removed soon. For more information about the endpoints, visit Querying Pinot.

Authentication

Pinot supports basic HTTP authorization, which can be enabled for your cluster using configuration. To support basic HTTP authorization in your client-side Java applications, make sure you are using Pinot Java Client 0.10.0+ or building from the latest Pinot snapshot. The following code snippet shows you how to connect to and query a Pinot cluster that has basic HTTP authorization enabled when using the Java client.

final String username = "admin";
final String password = "verysecret";
 
// Concatenate username and password and use base64 to encode the concatenated string
String plainCredentials = username + ":" + password;
String base64Credentials = new String(
    Base64.getEncoder().encode(plainCredentials.getBytes()));
 
String authorizationHeader = "Basic " + base64Credentials;
 
Map<String, String> headers = new HashMap<>();
headers.put("Authorization", authorizationHeader);
JsonAsyncHttpPinotClientTransportFactory factory = 
    new JsonAsyncHttpPinotClientTransportFactory();
factory.setHeaders(headers);
PinotClientTransport clientTransport = factory
    .buildTransport();
 
// A broker list is passed here, so use fromHostList rather than fromProperties
Connection connection = ConnectionFactory.fromHostList(
        Collections.singletonList("localhost:8000"), clientTransport);
String query = "select count(*) FROM baseballStats limit 1";
 
ResultSetGroup rs = connection.execute(query);
System.out.println(rs);
connection.close();
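
For readers working from the Python side of this page, the header value built in the Java snippet above can be reproduced with the standard library, which is handy for checking what a client should send. This is only a sketch of the Basic scheme's encoding; the helper name basic_auth_header is hypothetical, and the credentials are the same placeholders used in the Java example.

```python
import base64


def basic_auth_header(username, password):
    """Return an HTTP header dict carrying Basic credentials for username:password."""
    plain = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(plain).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("admin", "verysecret")
print(headers)
```

Decoding the token with base64.b64decode gives back the original username:password pair, so the value is an encoding, not encryption, and should only travel over HTTPS.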

Configuring client time-out

The following timeouts can be set:

  • brokerConnectTimeoutMs (default 2000)
  • brokerReadTimeoutMs (default 60000)
  • brokerHandshakeTimeoutMs (default 2000)
  • controllerConnectTimeoutMs (default 2000)
  • controllerReadTimeoutMs (default 60000)
  • controllerHandshakeTimeoutMs (default 2000)

Timeouts for the Java connector can be set as connection properties. The following example, which uses the Pinot JDBC driver, configures a very low timeout of 10 ms:

Properties connectionProperties = new Properties();
connectionProperties.setProperty("controllerReadTimeoutMs", "10");
connectionProperties.setProperty("controllerHandshakeTimeoutMs", "10");
connectionProperties.setProperty("controllerConnectTimeoutMs", "10");
connectionProperties.setProperty("brokerReadTimeoutMs", "10");
connectionProperties.setProperty("brokerHandshakeTimeoutMs", "10");
connectionProperties.setProperty("brokerConnectTimeoutMs", "10");
 
// Register new Pinot JDBC driver
DriverManager.registerDriver(new PinotDriver());
 
// Get a client connection and set the connection timeouts
Connection connection = DriverManager.getConnection(DB_URL, connectionProperties);
 
// Test that your query successfully times out
Statement statement = connection.createStatement();
ResultSet rs = statement.executeQuery("SELECT count(*) FROM baseballStats LIMIT 1;");
 
while (rs.next()) {
    String result = rs.getString("count(*)");
    System.out.println(result);
}
