
S3

info

In this guide we'll learn how to import files stored in S3. Before you start, you should have created an environment. You will also need to upload your data files to an S3 bucket and create an AWS user that has access to that bucket.

The StarTree Data Manager can import files stored in S3 in CSV, Avro, JSON, or Parquet format.
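The AWS user mentioned above needs some level of access to the bucket. As a minimal sketch, the following builds a read-only IAM policy document for a single bucket. The choice of `s3:ListBucket` and `s3:GetObject` is an assumption about what is sufficient for importing files, not a documented StarTree requirement, and `my-example-bucket` is a hypothetical bucket name.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-example-bucket" is a placeholder; substitute your own bucket name.
    print(json.dumps(read_only_bucket_policy("my-example-bucket"), indent=2))
```

The printed JSON can be attached to the AWS user as an inline policy. Adjust the actions if your setup needs something different.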

Select data source

Click on the S3 button under the Which data do you want to use? heading:

Configure S3


Enter the name of your S3 bucket, the region where it resides, and your AWS Access Key ID and AWS Secret Access Key. Click TEST CONNECTION to check that StarTree Cloud can access the bucket.

Test Connection


You will see a success message if the bucket has been configured correctly. Click NEXT to go to the next screen.

Data Transformation

Next you'll need to specify the location of the files that you'd like to ingest from the S3 bucket.

tip

The files that you want to ingest must be placed in a folder inside the S3 bucket. Files at the root level will be ignored.
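To check a list of object keys against this rule before ingesting, a small helper like the one below may be useful. It is only a sketch: `is_ingestable` is a hypothetical name, the sample keys are made up, and treating files in nested folders as acceptable is an assumption.

```python
def is_ingestable(key: str) -> bool:
    """Return True if the object key names a file inside at least one folder."""
    # S3 has no real directories; a "folder" is the part of the key before the last "/".
    folder, _, filename = key.rpartition("/")
    # Root-level files have no folder part; folder placeholder keys have no file name.
    return bool(folder) and bool(filename)


# Hypothetical object keys for illustration.
keys = ["events.csv", "events/events.csv", "events/2023/part-0.csv", "events/"]
skipped = [k for k in keys if not is_ingestable(k)]
print(skipped)
```

Here `events.csv` is skipped because it sits at the root of the bucket, and `events/` is skipped because it is a folder placeholder rather than a file.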


Our file is in the events folder inside the bucket, so we'll enter that into the box and select the CSV format. The Data Manager will then make an educated guess at the field and data types for each of the columns in the CSV file.
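To give a feel for what an educated guess at column types can look like, here is a simplified sketch that samples rows from a CSV and picks the narrowest type that parses every value. This is not the Data Manager's actual algorithm. The function names and the sample data are hypothetical, and only `LONG`, `DOUBLE`, and `STRING` are considered.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that parses every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    # Try the narrowest type first, then widen.
    for name, parse in (("LONG", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                parse(v)
            return name
        except ValueError:
            continue
    return "STRING"


def infer_schema(csv_text, sample_size=100):
    """Guess a type for each column from the first sample_size rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_size), reader)]
    columns = reader.fieldnames or []
    # Missing cells come back as None from DictReader; treat them as empty.
    return {col: infer_type([(r[col] or "") for r in rows]) for col in columns}


# Hypothetical sample data for illustration.
sample = (
    "ts,userId,amount,country\n"
    "1647000000000,42,9.99,US\n"
    "1647000001000,43,12.5,GB\n"
)
print(infer_schema(sample))
```

A guesser of this kind sees a numeric timestamp column as a plain number, which is one reason a guessed type may need to be adjusted by hand.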

Columns and field/data types


We'll change the ts field type to be DATETIME. The updated data transformation is shown below:

Updated Columns and field/data types


Once you're happy with the data transformations, scroll down, and click on the NEXT button.

Advanced Settings

On this screen you'll be able to configure indexes, tenants, ingestion scheduling, and data retention for this data source.

Configure indexes, tenants, ingestion scheduling, and data retention


For more information on the different types of indexes and when to use them, see the Apache Pinot Indexing Documentation.
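For readers who are curious how settings like these look in an Apache Pinot table config, the sketch below assembles the relevant fragment as a Python dictionary. The keys (`tableIndexConfig`, `tenants`, `segmentsConfig`) come from the open-source Pinot table config format. Whether the Data Manager produces exactly this structure is an assumption, the `country` column and the 30-day retention are hypothetical values, and ingestion scheduling is left out.

```python
import json


def advanced_settings(inverted_index_columns, retention_days,
                      broker_tenant="DefaultTenant",
                      server_tenant="DefaultTenant"):
    """Assemble a partial Pinot table config covering indexes, tenants, and retention."""
    return {
        # Columns that should get an inverted index.
        "tableIndexConfig": {
            "invertedIndexColumns": list(inverted_index_columns),
        },
        # Which broker and server tenants serve this table.
        "tenants": {"broker": broker_tenant, "server": server_tenant},
        # How long segments are kept before they are removed.
        "segmentsConfig": {
            "retentionTimeUnit": "DAYS",
            "retentionTimeValue": str(retention_days),
        },
    }


# Hypothetical values for illustration.
print(json.dumps(advanced_settings(["country"], 30), indent=2))
```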

Once you're happy with the configuration, scroll down, and click on the NEXT button.

Review

You'll now see the review and submit screen, where you can review everything that we've configured in the previous steps.

Review Data Source


If anything doesn't look right, click on the PREV button to go back to the previous screen.

Once you're ready to create the data source, click on the FINISH button. You'll then see the following screen:

Data Source Created


Query Data Source

To have a look at the data that we've imported, click on the Query Console link, which will open the Pinot Data Explorer. Click on the events table and then click RUN QUERY to run a basic query against the data source:

Query events Data Source
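The same kind of query can also be prepared outside the console. As a rough sketch, the code below builds an HTTP request for a Pinot broker's `/query/sql` endpoint, assuming the basic query has the form `select * from events limit 10`. The broker URL is a hypothetical local address, and a StarTree Cloud deployment will most likely also require authentication, which is not shown here.

```python
import json
import urllib.request


def build_query_request(broker_url, table, limit=10):
    """Build (but do not send) a POST request that runs a basic query on a table."""
    sql = f"select * from {table} limit {int(limit)}"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url.rstrip('/')}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "http://localhost:8099" is a placeholder broker address.
req = build_query_request("http://localhost:8099", "events")
print(req.full_url)
print(req.data.decode("utf-8"))
# To send it, pass req to urllib.request.urlopen(); this needs a reachable broker.
```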
