StarTree Cloud Coding Competition With Grand Prizes

How to combine source fields into one column

In this recipe we'll learn how to combine the data from fields in our data source into a single column in Apache Pinot.

Pinot Version	0.10.0
Code	startreedata/pinot-recipes/combine-fields

Prerequisites

To follow the code examples in this guide, you must install Docker (opens in a new tab) locally and download recipes.

Navigate to recipe

If you haven't already, download recipes.
In terminal, go to the recipe by running the following command:

cd pinot-recipes/recipes/combine-fields

Launch Pinot Cluster

You can spin up a Pinot Cluster by running the following command:

docker-compose up

This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, and Zookeeper. You can find the docker-compose.yml (opens in a new tab) file on GitHub.

Dataset

We're going to import the following JSON file:

{"name":"Pete", "surname": "Smith"}
{"name":"John", "surname": "Jones"}

data/movies.json

Pinot Schema and Table

Now let's create a Pinot Schema and Table.

First, the schema:

{
  "schemaName": "people",
  "dimensionFieldSpecs": [
    {
      "name": "fullName",
      "dataType": "STRING"
    }
  ]
}

config/schema.json

You can create the schema by running the following command:

docker exec -it pinot-controller-json bin/pinot-admin.sh AddSchema \
  -schemaFile /config/schema.json \
  -exec

We'll also have the following table config:

{
  "tableName": "people",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": 1,
    "schemaName": "people"
  },
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "fullName",
        "transformFunction": "concat(name, surname, ' ')"
      }
    ],
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    }
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}

config/table.json

The highlighted section contains a transformation function (opens in a new tab) that concatenates the name and surname fields, separated by a space.

You can create the table by running the following command:`

docker exec -it pinot-controller-json bin/pinot-admin.sh AddTable \
  -tableConfigFile /config/table.json \
  -exec

Ingestion Job

Now we’re going to import the JSON file into Pinot. We'll do this with the following ingestion spec:

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data'
includeFileNamePattern: 'glob:**/import.json'
outputDirURI: '/opt/pinot/data/people/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'people'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller-combine:9000'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000

config/job-spec.yml

You can run the following command to run the import:

docker exec -it pinot-controller-json bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /config/job-spec.yml

Querying

Once that's completed, navigate to localhost:9000/#/query (opens in a new tab) and click on the people table or copy/paste the following query:

select * 
from people

You will see the following output:

fullName
Pete Smith
John Jones

Query Results

We can see that the name and surname fields from our JSON file have been combined into a single fullName column for each person.

Handle null values MinIO as Deep Store