How to combine source fields into one column
In this recipe we'll learn how to combine the data from fields in our data source into a single column in Apache Pinot.
Pinot Version | 0.10.0 |
Code | startreedata/pinot-recipes/combine-fields |
Pre-requisites
You will need to install Docker locally to follow the code examples in this guide.
Download Recipe
First, clone the GitHub repository to your local machine and navigate to this recipe:
git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/combine-fields
If you don't have a Git client, you can also download a zip file that contains the code and then navigate to the recipe.
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
docker-compose up
This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, and Zookeeper. You can find the docker-compose.yml file on GitHub.
Dataset
We're going to import the following JSON file:
{"name":"Pete", "surname": "Smith"}
{"name":"John", "surname": "Jones"}
Pinot Schema and Table
Now let's create a Pinot Schema and Table.
First, the schema:
{
"schemaName": "people",
"dimensionFieldSpecs": [
{
"name": "fullName",
"dataType": "STRING"
}
]
}
You can create the schema by running the following command:
docker exec -it pinot-controller-json bin/pinot-admin.sh AddSchema \
-schemaFile /config/schema.json \
-exec
We'll also have the following table config:
{
"tableName": "people",
"tableType": "OFFLINE",
"segmentsConfig": {
"replication": 1,
"schemaName": "people"
},
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "fullName",
"transformFunction": "concat(name, surname, ' ')"
}
],
"batchIngestionConfig": {
"segmentIngestionType": "APPEND",
"segmentIngestionFrequency": "DAILY"
}
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
},
"metadata": {}
}
The highlighted section contains a transformation function that concatenates the name
and surname
fields, separated by a space.
You can create the table by running the following command:`
docker exec -it pinot-controller-json bin/pinot-admin.sh AddTable \
-tableConfigFile /config/table.json \
-exec
Ingestion Job
Now we’re going to import the JSON file into Pinot. We'll do this with the following ingestion spec:
executionFrameworkSpec:
name: 'standalone'
segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/data'
includeFileNamePattern: 'glob:**/import.json'
outputDirURI: '/opt/pinot/data/people/'
overwriteOutput: true
pinotFSSpecs:
- scheme: file
className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
dataFormat: 'json'
className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
tableName: 'people'
pinotClusterSpecs:
- controllerURI: 'http://localhost:9000'
You can run the following command to run the import:
docker exec -it pinot-controller-json bin/pinot-admin.sh LaunchDataIngestionJob \
-jobSpecFile /config/job-spec.yml
Querying
Once that's completed, navigate to localhost:9000/#/query and click on the people
table or copy/paste the following query:
select *
from people
You will see the following output:
fullName |
---|
Pete Smith |
John Jones |
Query Results
We can see that the name
and surname
fields from our JSON file have been combined into a single fullName
column for each person.