Dimension exploration

Dimension exploration lets you create alerts for a specified dimension or combination of dimensions.

See the procedural doc if some of these terms don't look familiar.

Dimension exploration uses the enumerator, fork join and combiner operators to build loops inside the detection DAG.

Here's a simple pipeline for threshold detection.

 "nodes": [
        {
          "name": "enumerator",
          "type": "Enumerator",
          "params": {
            "items": [
              {
                "name":"US Only",
                "params": {
                "queryFilters": " AND country = 'US'",
                  "max": 300000,
                  "min": 100000
                }
              },
              {
                "name": "Overall",
                "params": {
                "queryFilters": "",
                  "max": 900000,
                  "min": 300000
                }
              }
            ]
          }
        },
        {
          "name": "combiner",
          "type": "Combiner"
        },
        {
          "name": "root",
          "type": "ForkJoin",
          "params": {
            "dryRun": false,
            "enumerator": "enumerator",
            "root": "anomalyDetector",
            "combiner": "combiner"
          }
        },
        {
          "name": "anomalyDetector",
          "type": "AnomalyDetector",
          "params": {
            "component.min": "${min}",
            "component.metric": "met",
            "component.monitoringGranularity": "${monitoringGranularity}",
            "component.max": "${max}",
            "anomaly.dataset": "${dataset}",
            "component.timestamp": "ts",
            "anomaly.metric": "${aggregationColumn}",
            "anomaly.source": "threshold-template/root",
            "type": "THRESHOLD"
          },
          "inputs": [
            {
              "targetProperty": "current",
              "sourcePlanNode": "currentDataFetcher",
              "sourceProperty": "currentData"
            }
          ],
          "outputs": []
        },
        {
          "name": "currentDataFetcher",
          "type": "DataFetcher",
          "params": {
            "component.dataSource": "${dataSource}",
            "component.query": "SELECT __timeGroup(\"${timeColumn}\", '${timeColumnFormat}', '${monitoringGranularity}') as ts, ${aggregationFunction}(\"${aggregationColumn}\") as met FROM ${dataset} WHERE __timeFilter(\"${timeColumn}\", '${timeColumnFormat}') ${queryFilters} GROUP BY ts ORDER BY ts LIMIT ${queryLimit}"
          },
          "inputs": [],
          "outputs": [
            {
              "outputKey": "currentData",
              "outputName": "currentData"
            }
          ]
        }
      ]

Enumeration item templatization

This is a simple default enumerator which lays out all the enumeration items.

The same json can be templatized by using thirdeye's ability to replace objects in strings. In this case, the enumerator node lists all the enumerations and can be split into the template and the value.

The template can expose the items array using a variable "${enumerationItems}".

        {
          "name": "enumerator",
          "type": "Enumerator",
          "params": {
            "items": "${enumerationItems}"
          }
        }

ThirdEye can detect that enumerationItems is an object and replaces the entire string with an object and behaves exactly like the initial pipeline json as shown above.

{
  "templateProperties": {
   "enumerationItems": [
     {
       "name":"US Only",
       "params": {
         "queryFilters": " AND country = 'US'",
         "max": 300000,
         "min": 100000
       }
     },
     {
       "name": "Overall",
       "params": {
         "queryFilters": "",
         "max": 900000,
         "min": 300000
       }
     }
   ]
  }
}

Sample Alert

Using these methods, dimension exploration can be templatized and expressed with templateProperties. Here is a sample dimension exploration alert look at pageviews with different combinations of threshold parameters and query filters.

{
  "name": "pageviews-dx-country",
  "description": "Sample description payload for testing",
  "template": {
    "name": "startree-threshold-dx"
  },
  "templateProperties": {
    "dataSource": "pinot",
    "dataset": "pageviews",
    "aggregationFunction": "sum",
    "aggregationColumn": "views",
    "monitoringGranularity": "P1D",
    "timeColumn": "date",
    "timeColumnFormat": "1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd",
    "max": "${max}",
    "min": "${min}",
    "queryFilters": "${queryFilters}",
    "enumerationItems": [
      {
        "name": "US",
        "params": {
          "queryFilters": " AND country = 'US'",
          "max": 300000,
          "min": 100000
        }
      },
      {
        "name": "Overall",
        "params": {
          "queryFilters": "",
          "max": 900000,
          "min": 300000
        }
      },
      {
        "name": "IN",
        "params": {
          "queryFilters": " AND country = 'IN'",
          "max": 900000,
          "min": 0
        }
      }
    ]
  },
  "cron": "0 0 * * * ? *"
}