How to combine source fields into one column
In this recipe we'll learn how to combine the data from fields in our data source into a single column in Apache Pinot.
| Pinot Version | 0.10.0 |
| Code | startreedata/pinot-recipes/combine-fields |
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download the recipes.
Navigate to recipe
- If you haven't already, download the recipes.
- In a terminal, navigate to the recipe by running the following command:
cd pinot-recipes/recipes/combine-fields
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
docker-compose up
This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, and Zookeeper. You can find the docker-compose.yml file on GitHub.
Dataset
We're going to import the following JSON file:
{"name":"Pete", "surname": "Smith"}
{"name":"John", "surname": "Jones"}
data/import.json
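Each line of this file is a standalone JSON document (the JSON Lines format), which is what Pinot's JSONRecordReader expects: one record per line. As a quick sanity check, you can parse the file the same way; the sketch below is purely illustrative and not part of the recipe:

```python
import json

# The dataset is JSON Lines: one JSON document per line.
raw = '''{"name":"Pete", "surname": "Smith"}
{"name":"John", "surname": "Jones"}'''

# Parse each line independently, just as Pinot's JSONRecordReader does.
records = [json.loads(line) for line in raw.splitlines()]
print(records[0]["name"])     # Pete
print(records[1]["surname"])  # Jones
```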
Pinot Schema and Table
Now let's create a Pinot Schema and Table.
First, the schema:
{
"schemaName": "people",
"dimensionFieldSpecs": [
{
"name": "fullName",
"dataType": "STRING"
}
]
}
config/schema.json
You can create the schema by running the following command:
docker exec -it pinot-controller-combine bin/pinot-admin.sh AddSchema \
-schemaFile /config/schema.json \
-exec
We'll also have the following table config:
{
"tableName": "people",
"tableType": "OFFLINE",
"segmentsConfig": {
"replication": 1,
"schemaName": "people"
},
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "fullName",
"transformFunction": "concat(name, surname, ' ')"
}
],
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "DAILY"
}
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"metadata": {}
}
config/table.json
The transformConfigs section contains a transformation function that concatenates the name and surname fields, separated by a space.
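To make the transform's behavior concrete, here's a minimal Python sketch of what concat(name, surname, ' ') produces for each record at ingestion time. The concat helper below is our illustration of the function's semantics, not Pinot's actual implementation:

```python
def concat(first, second, separator):
    # Mirrors Pinot's concat(col1, col2, separator) transform:
    # the two input values joined by the separator.
    return f"{first}{separator}{second}"

records = [
    {"name": "Pete", "surname": "Smith"},
    {"name": "John", "surname": "Jones"},
]

# Derive the fullName column the same way the ingestion transform does.
full_names = [concat(r["name"], r["surname"], " ") for r in records]
print(full_names)  # ['Pete Smith', 'John Jones']
```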
You can create the table by running the following command:
docker exec -it pinot-controller-combine bin/pinot-admin.sh AddTable \
-tableConfigFile /config/table.json \
-exec
Ingestion Job
Now we’re going to import the JSON file into Pinot. We'll do this with the following ingestion spec:
executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data'
includeFileNamePattern: 'glob:**/import.json'
outputDirURI: '/opt/pinot/data/people/'
overwriteOutput: true
pinotFSSpecs:
- scheme: file
className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
dataFormat: 'json'
className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
tableName: 'people'
pinotClusterSpecs:
- controllerURI: 'http://pinot-controller-combine:9000'
pushJobSpec:
pushAttempts: 2
pushRetryIntervalMillis: 1000
config/job-spec.yml
You can run the following command to run the import:
docker exec -it pinot-controller-combine bin/pinot-admin.sh LaunchDataIngestionJob \
-jobSpecFile /config/job-spec.yml
Querying
Once that's completed, navigate to localhost:9000/#/query and click on the people table, or copy/paste the following query:
select *
from people
You will see the following output:
| fullName |
|---|
| Pete Smith |
| John Jones |

Query Results
We can see that the name and surname fields from our JSON file have been combined into a single fullName column for each person.