Dimension Exploration

Dimension Exploration gives users the ability to create alerts for every value of a certain dimension or combination of dimensions.

Please see the concept doc if some of these terms don't look familiar.

Dimension Exploration uses the Enumerator, ForkJoin, and Combiner operators to build loops inside the detection DAG.

Here's a simple pipeline assuming threshold detection.

"nodes": [
  {
    "name": "enumerator",
    "type": "Enumerator",
    "params": {
      "items": [
        {
          "name": "US Only",
          "params": {
            "queryFilters": " AND country = 'US'",
            "max": 300000,
            "min": 100000
          }
        },
        {
          "name": "Overall",
          "params": {
            "queryFilters": "",
            "max": 900000,
            "min": 300000
          }
        }
      ]
    }
  },
  {
    "name": "combiner",
    "type": "Combiner"
  },
  {
    "name": "root",
    "type": "ForkJoin",
    "params": {
      "dryRun": false,
      "enumerator": "enumerator",
      "root": "anomalyDetector",
      "combiner": "combiner"
    }
  },
  {
    "name": "anomalyDetector",
    "type": "AnomalyDetector",
    "params": {
      "component.min": "${min}",
      "component.metric": "met",
      "component.monitoringGranularity": "${monitoringGranularity}",
      "component.max": "${max}",
      "anomaly.dataset": "${dataset}",
      "component.timestamp": "ts",
      "anomaly.metric": "${aggregationColumn}",
      "anomaly.source": "threshold-template/root",
      "type": "THRESHOLD"
    },
    "inputs": [
      {
        "targetProperty": "current",
        "sourcePlanNode": "currentDataFetcher",
        "sourceProperty": "currentData"
      }
    ],
    "outputs": []
  },
  {
    "name": "currentDataFetcher",
    "type": "DataFetcher",
    "params": {
      "component.dataSource": "${dataSource}",
      "component.query": "SELECT __timeGroup(\"${timeColumn}\", '${timeColumnFormat}', '${monitoringGranularity}') as ts, ${aggregationFunction}(\"${aggregationColumn}\") as met FROM ${dataset} WHERE __timeFilter(\"${timeColumn}\", '${timeColumnFormat}') ${queryFilters} GROUP BY ts ORDER BY ts LIMIT ${queryLimit}"
    },
    "inputs": [],
    "outputs": [
      {
        "outputKey": "currentData",
        "outputName": "currentData"
      }
    ]
  }
]
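
To make the loop in this pipeline concrete, here is a minimal conceptual sketch in Python. It is not ThirdEye's implementation: the function names (run_sub_dag, fork_join, combine) and the sample series are hypothetical, and the threshold check simply assumes that a point is anomalous when it falls below min or above max. The idea it illustrates is that the ForkJoin node runs the sub-DAG rooted at anomalyDetector once per enumeration item, with that item's params applied, and hands the per-item results to the combiner.

```python
from typing import Any, Callable

Item = dict[str, Any]

# Hypothetical data standing in for what the DataFetcher would return,
# keyed by the queryFilters value of each enumeration item.
SAMPLE_SERIES: dict[str, list[int]] = {
    " AND country = 'US'": [150_000, 320_000, 90_000],
    "": [500_000, 950_000, 400_000],
}

# Same enumeration items as in the pipeline above.
ITEMS: list[Item] = [
    {"name": "US Only",
     "params": {"queryFilters": " AND country = 'US'", "max": 300000, "min": 100000}},
    {"name": "Overall",
     "params": {"queryFilters": "", "max": 900000, "min": 300000}},
]


def run_sub_dag(params: dict[str, Any]) -> list[int]:
    """Stand-in for DataFetcher + threshold detector: return anomalous indices."""
    values = SAMPLE_SERIES[params["queryFilters"]]
    return [i for i, v in enumerate(values)
            if v < params["min"] or v > params["max"]]


def combine(per_item: dict[str, list[int]]) -> dict[str, list[int]]:
    """Stand-in for the combiner: keep results keyed by enumeration item name."""
    return dict(per_item)


def fork_join(items: list[Item],
              sub_dag: Callable[[dict[str, Any]], list[int]],
              combiner: Callable[[dict[str, list[int]]], dict[str, list[int]]],
              ) -> dict[str, list[int]]:
    """Run the sub-DAG once per enumeration item, then combine the results."""
    per_item = {item["name"]: sub_dag(item["params"]) for item in items}
    return combiner(per_item)


anomalies = fork_join(ITEMS, run_sub_dag, combine)
print(anomalies)  # {'US Only': [1, 2], 'Overall': [1]}
```

Each enumeration item gets its own thresholds and its own query filter, so the same value can be anomalous for one item and normal for another.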

Enumeration Item Templatization

The pipeline above uses the simple default enumerator, which lists all the enumeration items inline.

The same JSON can be templatized by using ThirdEye's ability to replace strings with objects. In this case, the enumerator node, which lists all the enumeration items, can be split into a template and its values.

The template can expose the items array using a variable "${enumerationItems}".

{
  "name": "enumerator",
  "type": "Enumerator",
  "params": {
    "items": "${enumerationItems}"
  }
}

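ThirdEye detects that enumerationItems is an object rather than a plain string and replaces the entire "${enumerationItems}" string with that object. The resulting pipeline behaves exactly like the initial pipeline JSON shown above. The values are supplied through templateProperties: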

{
  "templateProperties": {
    "enumerationItems": [
      {
        "name": "US Only",
        "params": {
          "queryFilters": " AND country = 'US'",
          "max": 300000,
          "min": 100000
        }
      },
      {
        "name": "Overall",
        "params": {
          "queryFilters": "",
          "max": 900000,
          "min": 300000
        }
      }
    ]
  }
}
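
The object replacement described in this section can be sketched in a few lines of Python. This is an illustration under assumptions, not ThirdEye's actual resolver: apply_properties is a hypothetical helper, and the rule it encodes (a string that consists of exactly one "${name}" placeholder is replaced by the property value itself, while placeholders embedded in longer strings are replaced textually) is inferred from the behavior described above.

```python
import re
from typing import Any

WHOLE = re.compile(r"^\$\{(\w+)\}$")   # string is exactly one placeholder
PARTIAL = re.compile(r"\$\{(\w+)\}")   # placeholder inside a longer string


def apply_properties(node: Any, props: dict[str, Any]) -> Any:
    """Recursively substitute ${name} placeholders in a JSON-like structure."""
    if isinstance(node, dict):
        return {key: apply_properties(value, props) for key, value in node.items()}
    if isinstance(node, list):
        return [apply_properties(value, props) for value in node]
    if isinstance(node, str):
        whole = WHOLE.match(node)
        if whole and whole.group(1) in props:
            # The whole string is one placeholder: return the value itself,
            # which may be a list or dict rather than a string.
            return props[whole.group(1)]
        # Otherwise substitute textually; unknown placeholders are left as-is.
        return PARTIAL.sub(
            lambda m: str(props.get(m.group(1), m.group(0))), node)
    return node


template_node = {
    "name": "enumerator",
    "type": "Enumerator",
    "params": {"items": "${enumerationItems}"},
}

template_properties = {
    "enumerationItems": [
        {"name": "US Only",
         "params": {"queryFilters": " AND country = 'US'", "max": 300000, "min": 100000}},
        {"name": "Overall",
         "params": {"queryFilters": "", "max": 900000, "min": 300000}},
    ]
}

resolved = apply_properties(template_node, template_properties)
print(type(resolved["params"]["items"]).__name__)  # list
```

After resolution, the enumerator node carries the same items array as the inline version of the pipeline.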

Sample Alert

Using these tricks, dimension exploration can be templatized and expressed entirely through templateProperties. Here is a sample dimension exploration alert that looks at pageviews with a different combination of threshold parameters and query filters for each enumeration item.

{
  "name": "pageviews-dx-country",
  "description": "Sample description payload for testing",
  "template": {
    "name": "startree-threshold-dx"
  },
  "templateProperties": {
    "dataSource": "pinot",
    "dataset": "pageviews",
    "aggregationFunction": "sum",
    "aggregationColumn": "views",
    "monitoringGranularity": "P1D",
    "timeColumn": "date",
    "timeColumnFormat": "1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd",
    "max": "${max}",
    "min": "${min}",
    "queryFilters": "${queryFilters}",
    "enumerationItems": [
      {
        "name": "US",
        "params": {
          "queryFilters": " AND country = 'US'",
          "max": 300000,
          "min": 100000
        }
      },
      {
        "name": "Overall",
        "params": {
          "queryFilters": "",
          "max": 900000,
          "min": 300000
        }
      },
      {
        "name": "IN",
        "params": {
          "queryFilters": " AND country = 'IN'",
          "max": 900000,
          "min": 0
        }
      }
    ]
  },
  "cron": "0 0 * * * ? *"
}
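
As a rough illustration of what this alert produces per enumeration item, the Python sketch below renders the WHERE clause of the DataFetcher query for each item. It assumes, as a simplification, that each item's params override the alert-level templateProperties of the same name; render_where is a hypothetical helper, the query is abbreviated to its WHERE clause, and the actual order in which ThirdEye applies properties may differ.

```python
from string import Template

# Abbreviated form of the DataFetcher query: only the WHERE clause is kept.
WHERE_CLAUSE = Template(
    "WHERE __timeFilter(\"${timeColumn}\", '${timeColumnFormat}') ${queryFilters}"
)

# Alert-level properties from the sample alert that the clause depends on.
ALERT_PROPS = {
    "timeColumn": "date",
    "timeColumnFormat": "1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd",
    "queryFilters": "${queryFilters}",  # left open for each enumeration item
}

# Enumeration items from the sample alert.
ITEMS = [
    {"name": "US",
     "params": {"queryFilters": " AND country = 'US'", "max": 300000, "min": 100000}},
    {"name": "Overall",
     "params": {"queryFilters": "", "max": 900000, "min": 300000}},
    {"name": "IN",
     "params": {"queryFilters": " AND country = 'IN'", "max": 900000, "min": 0}},
]


def render_where(alert_props: dict, item_params: dict) -> str:
    """Merge item params over the alert properties and fill in the clause."""
    merged = {**alert_props, **item_params}  # assumption: item params win
    return WHERE_CLAUSE.substitute(merged)


rendered = {item["name"]: render_where(ALERT_PROPS, item["params"]) for item in ITEMS}
for name, clause in rendered.items():
    print(f"{name}: {clause}")
```

The "Overall" item uses an empty filter, so its query covers every country, while the "US" and "IN" items each restrict the same query to a single country and apply their own min and max thresholds.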