Using a star-tree index

Using a star-tree index

In this recipe we'll learn how to use the StarTree index.

To understand how this recipe processes data, examine the commands executed in the Makefile.

Makefile

Run this recipe using make.

make recipe

This will build up the infrastructure: Pinot and Kafka, create the tables, and produce streaming data.

Three tables will be generated:

  • webtraffic - a real-time table with any Pinot indexes.
  • webstraffic_inverted - a real-time table with an inverted index on the columns: country, browserType, and deviceBrand.
"invertedIndexColumns": [
  "country",
  "browserType",
  "deviceBrand"
],
  • webtraffic_startree - a real-time table with a startree index.
"starTreeIndexConfigs": [
  {
    "dimensionsSplitOrder": [
      "country",
      "browserType",
      "deviceBrand"
    ],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": [
      "COUNT__*",
      "SUM__timeSpent",
      "AVG__timeSpent"
    ],
    "maxLeafRecords": 10000
  }
],

Open your browser to the Pinot console (opens in a new tab) and execute the SQL statements below.

select browserType, count(*)
from webtraffic 
WHERE country = 'Uruguay'
GROUP BY browserType
limit 10
select country, sum(timeSpent) AS totalTime
from webtraffic
group by country
order by totalTime DESC
limit 10
select count(*), sum(timeSpent) AS totalTime
from webtraffic
where country = 'United Kingdom'
order by totalTime DESC
limit 10
select browserType, count(*)
from webtraffic
WHERE country IN ('Germany', 'United Kingdom', 'Spain')
GROUP BY browserType
limit 10

Try changing FROM webtraffic to FROM webtraffic_inverted or FROM webtraffic_stree

Clean up

make clean

Troubleshooting

To clean up old Docker installations that may be interfering with your testing of this recipe, run the following command:

docker system prune