Combine fields during import

How to combine source fields into one column

In this recipe we'll learn how to combine the data from fields in our data source into a single column in Apache Pinot.

Prerequisites

To follow the code examples in this guide, you'll need to install Docker locally and download the recipes repository.

Navigate to recipe

  1. If you haven't already, download the recipes repository.
  2. In a terminal window, navigate to the recipe directory by running the following command:
cd pinot-recipes/recipes/combine-fields

Launch Pinot Cluster

You can spin up a Pinot Cluster by running the following command:

docker-compose up

This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, and Zookeeper. You can find the docker-compose.yml file on GitHub.
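
As a rough sketch, the compose file looks something like the following. The service names, image tags, and port mappings here are assumptions for illustration only; refer to the actual docker-compose.yml in the recipe:

```yaml
version: "3"
services:
  zookeeper:
    image: zookeeper:3.8
    ports:
      - "2181:2181"
  pinot-controller:
    image: apachepinot/pinot:latest
    command: "StartController -zkAddress zookeeper:2181"
    ports:
      - "9000:9000"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:latest
    command: "StartBroker -zkAddress zookeeper:2181"
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:latest
    command: "StartServer -zkAddress zookeeper:2181"
    depends_on:
      - pinot-broker
```

Each Pinot component registers itself with Zookeeper, which is why every service is started with the `-zkAddress` flag.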

Dataset

We're going to import the following JSON file:

{"name":"Pete", "surname": "Smith"}
{"name":"John", "surname": "Jones"}

data/import.json

Pinot Schema and Table

Now let's create a Pinot Schema and Table.

First, the schema:

{
  "schemaName": "people",
  "dimensionFieldSpecs": [
    {
      "name": "fullName",
      "dataType": "STRING"
    }
  ]
}

config/schema.json

You can create the schema by running the following command:

docker exec -it pinot-controller-combine bin/pinot-admin.sh AddSchema \
  -schemaFile /config/schema.json \
  -exec

We'll also have the following table config:

{
  "tableName": "people",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": 1,
    "schemaName": "people"
  },
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "fullName",
        "transformFunction": "concat(name, surname, ' ')"
      }
    ],
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    }
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}

config/table.json

The transformConfigs section contains a transformation function that concatenates the name and surname fields, separated by a space, and writes the result into the fullName column.
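
To illustrate what the transform does, here's a small Python sketch (not part of the recipe) that applies the same logic as concat(name, surname, ' ') to each line of the source file:

```python
import json

def combine_fields(line: str, separator: str = " ") -> dict:
    """Mimic Pinot's concat(name, surname, ' ') transform:
    join the two source fields with the separator into fullName."""
    record = json.loads(line)
    return {"fullName": record["name"] + separator + record["surname"]}

# The two records from the dataset above
source = [
    '{"name":"Pete", "surname": "Smith"}',
    '{"name":"John", "surname": "Jones"}',
]

for line in source:
    print(combine_fields(line))
```

Note that fullName exists only in the Pinot schema, not in the source data; the transform populates it at ingestion time.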

You can create the table by running the following command:

docker exec -it pinot-controller-combine bin/pinot-admin.sh AddTable \
  -tableConfigFile /config/table.json \
  -exec

Ingestion Job

Now we’re going to import the JSON file into Pinot. We'll do this with the following ingestion spec:

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data'
includeFileNamePattern: 'glob:**/import.json'
outputDirURI: '/opt/pinot/data/people/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'people'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller-combine:9000'
pushJobSpec:
  pushAttempts: 2
  pushRetryIntervalMillis: 1000

config/job-spec.yml

Run the following command to launch the import job:

docker exec -it pinot-controller-combine bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile /config/job-spec.yml

Querying

Once the job has completed, navigate to localhost:9000/#/query and click on the people table, or copy/paste the following query:

select * 
from people 

You will see the following output:

fullName
Pete Smith
John Jones

Query Results

We can see that the name and surname fields from our JSON file have been combined into a single fullName column for each person.
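
The same query can also be issued over HTTP against the broker's SQL endpoint. Here's a hedged Python sketch; it assumes the broker is mapped to localhost:8099 (Pinot's default broker port), which may differ in your compose file:

```python
import json
import urllib.request

def build_query_request(sql: str,
                        broker_url: str = "http://localhost:8099") -> urllib.request.Request:
    """Build a POST request for Pinot's broker query endpoint (/query/sql)."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{broker_url}/query/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires the cluster from this recipe to be up and the segment ingested
    request = build_query_request("select * from people")
    with urllib.request.urlopen(request) as response:
        print(json.load(response))
```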