Extract parts of date

Extract parts of date

In this recipe we'll learn how to extract parts of a date from a timestamp column.

Launch Pinot Cluster

You can spin up a Pinot Cluster by running the following command, which starts up the Hybrid QuickStart:

docker run \
    -p 9000:9000 \
    apachepinot/pinot:1.0.0 QuickStart \
    -type hybrid

This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka, and Zookeeper all in one container.

The Hybrid Quickstart

The Hybrid Quickstart contains an offline and real-time table called airlineStats. These tables contain details of various flights from 2014.

We can view the schema of the table by calling the /schemas/<schemaName> command.

To get a list of the dimension fields, we can filter the results using jq like this:

curl -X GET "http://localhost:9000/schemas/airlineStats" 2>/dev/null | jq -c '.dimensionFieldSpecs[]'

Output:

{"name":"ActualElapsedTime","dataType":"INT"}
{"name":"AirTime","dataType":"INT"}
{"name":"AirlineID","dataType":"INT"}
{"name":"ArrDel15","dataType":"INT"}
{"name":"ArrDelay","dataType":"INT"}
...

To get a list of date time fields, we can filter the results using jq like this:

curl -X GET "http://localhost:9000/schemas/airlineStats" 2>/dev/null | jq -c '.dateTimeFieldSpecs[]'

Output:

{"name":"DaysSinceEpoch","dataType":"INT","format":"1:DAYS:EPOCH","granularity":"1:DAYS"}
{"name":"ts","dataType":"TIMESTAMP","format":"1:MILLISECONDS:TIMESTAMP","granularity":"1:SECONDS"}
{"name":"tsRaw","dataType":"TIMESTAMP","format":"1:MILLISECONDS:TIMESTAMP","granularity":"1:SECONDS"}

ts and tsRaw are the timestamp columns in this table.

Extracting date components

We can extract various components of the ts column using the EXTRACT (opens in a new tab) function that was added in Apache Pinot 0.12. An example of all the available components is shown below:

select ts, 
       EXTRACT(YEAR FROM ts) AS year,
       EXTRACT(MONTH FROM ts) AS month,
       EXTRACT(DAY FROM ts) AS day, 
       EXTRACT(HOUR FROM ts) AS hour,
       EXTRACT(MINUTE FROM ts) AS min,
       EXTRACT(SECOND FROM ts) AS sec
from airlineStats
order by ts DESC
limit 10
tsyearmonthdayhourminsec
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000
2014-02-14 00:00:00.02014214000

Query Results

We can also group by the extracted values, as shown below:

select EXTRACT(MONTH FROM ts) AS month,
       EXTRACT(DAY FROM ts) AS day, 
	   count(*)
from airlineStats
group by month, day
order by count(*) DESC
limit 10
monthdaycount(*)
15422
13409
12403
16354
124343
19340
110338
130333
129322
126319

Query Results