StarTree ThirdEye getting started checklist
To get started with StarTree ThirdEye, do the following:
Prerequisites
- Apache Pinot must be installed and running. To install Pinot, do one of the following:
  - Install the open-source version of Apache Pinot on your own infrastructure (called BYOC for bring your own cloud).
  - Use StarTree Cloud, which includes Pinot hosted as software-as-a-service (SaaS) along with a few other perks.
- StarTree ThirdEye must be installed and running. To install ThirdEye, do one of the following:
  - Install the open-source version of ThirdEye. For more information, see infrastructure requirements.
  - Use StarTree Cloud, which includes an available instance of ThirdEye, ready to connect to data you've uploaded into your StarTree Cloud instance of Pinot.
- Upload your data into Apache Pinot
  Your data must be structured correctly for ThirdEye to make use of it. See ThirdEye data requirements for more information.
- Work through the ThirdEye use case planning template to ensure you understand what you intend to accomplish in the following checklist.
- Data readiness and validation checklist:
  - Timestamps need to be in epoch milliseconds for ThirdEye to work efficiently. If the time column is in another format, convert it to epoch milliseconds using a derived column, and apply time range indexes that match the time granularity used in ThirdEye (daily, hourly, 15 minutes, or 1 minute).
  - The data cannot have gaps. See the completeness delay documentation and its definition, and the external blog post on how to run data quality (DQ) checks using ThirdEye.
  - Apply the Pinot indexes that match the query patterns ThirdEye sends to Pinot.
  - Joins are not supported in ThirdEye. Before ingestion, create derived columns for any values that would otherwise require a join, so that the table supports your ThirdEye use cases.
  - ThirdEye needs a denormalized schema for the following reasons:
    - For running alerts alone, ThirdEye can run any SQL query that Pinot supports. To do this, clone one of the ThirdEye templates to create a custom template, and update the query against Pinot in its DAG. However, every template you want to use with the custom query must be cloned.
    - The complexity lies in everything around the query: how the metric is defined, how users choose different detection algorithms for it, how it is exposed in the UI, and how it is shared in root cause analysis (RCA) and notifications. Alerts might work, but the overall experience suffers, and RCA may break. If RCA features such as top contributors and the heatmap need to provide accurate insights quickly, transformed data is ideal.
  - If the table has too many dimensions, the RCA UI slows down. To make RCA more performant, limit the number of dimensions in the schema, add indexes on the dimensions, and use rcaExcludedDimensions. See the rcaExcludedDimensions documentation.
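The first two data readiness items (epoch-millisecond timestamps and no data gaps) can be sanity-checked on a sample of your data before you load it into Pinot. The following is a minimal sketch, not part of ThirdEye or Pinot: the function names `to_epoch_millis` and `find_gaps`, the timestamp format, and the assumption that source timestamps are in UTC are all illustrative. It does not replace ThirdEye's own data quality checks.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Parse a timestamp string, assumed to be in UTC, into epoch milliseconds."""
    dt = datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def find_gaps(timestamps_ms, granularity_ms):
    """Return (first_missing, last_missing) bucket starts for each interior gap."""
    # Snap every timestamp to the start of its bucket and deduplicate.
    buckets = sorted({t - t % granularity_ms for t in timestamps_ms})
    gaps = []
    for prev, curr in zip(buckets, buckets[1:]):
        if curr - prev > granularity_ms:
            gaps.append((prev + granularity_ms, curr - granularity_ms))
    return gaps


if __name__ == "__main__":
    HOUR_MS = 3_600_000  # hourly granularity, as an example
    rows = ["2024-01-01 00:00:00", "2024-01-01 01:00:00", "2024-01-01 04:00:00"]
    millis = [to_epoch_millis(r) for r in rows]
    # Reports one gap covering the 02:00 and 03:00 buckets.
    print(find_gaps(millis, HOUR_MS))
```

Pass the granularity you plan to use in ThirdEye as `granularity_ms`, so that the check looks for missing buckets at the same resolution your alerts will query.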
Make your data accessible by ThirdEye
To make your data accessible by ThirdEye, you must connect ThirdEye to a datasource.
If you're using StarTree Cloud, ThirdEye automatically connects to Pinot when you open it and guides you through the connection process. For an example, see the section on how to load a sample dataset.
Add a notification system
Create one or more groups for alerts to be sent to, each with one or more routes for sending the notifications, such as email, Slack, or PagerDuty. These groups are called Subscription Groups in ThirdEye.
Notifications are sent to subscription groups. The entire set of subscription groups and routes is referred to as a notification system.
Create your notification system first so that you have a place to send alert notifications when you create the alerts.
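The relationship between routes, subscription groups, and the notification system can be pictured with a small model. This is only a conceptual sketch: the class names, fields, and example targets below are hypothetical and are not ThirdEye's configuration schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    channel: str  # for example "email", "slack", or "pagerduty"
    target: str   # hypothetical address, channel name, or integration key


@dataclass
class SubscriptionGroup:
    name: str
    routes: list = field(default_factory=list)


@dataclass
class NotificationSystem:
    """The entire set of subscription groups and their routes."""
    groups: dict = field(default_factory=dict)

    def add_group(self, group: SubscriptionGroup) -> None:
        # Each group needs at least one route to deliver notifications.
        if not group.routes:
            raise ValueError(f"group {group.name!r} has no routes")
        self.groups[group.name] = group

    def routes_for(self, group_name: str) -> list:
        """Return the routes an alert subscribed to this group would use."""
        return list(self.groups[group_name].routes)


if __name__ == "__main__":
    system = NotificationSystem()
    system.add_group(SubscriptionGroup(
        name="oncall",
        routes=[Route("email", "oncall@example.com"), Route("slack", "#alerts")],
    ))
    for route in system.routes_for("oncall"):
        print(route.channel, route.target)
```

In this picture, an alert only needs to name a subscription group, which is why the groups are worth setting up before you create any alerts.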
Create alerts
Read Create your first alert for a basic example you can use to guide you through the process.
The Create alerts section of the docs contains some practical recipes and patterns you can use.
To learn more about how to configure alerts, see Alert configuration and execution.