startree-matrix-profile-query-dx

Description

Detect an anomaly if the anomaly score returned by a matrix profile algorithm is above the threshold. Supports aggregation functions with one operand, such as SUM or MAX. Use the enumeratorQuery property to feed in a query that outputs the different dimensions to explore.

Parameters

DATA

name	description	default value
aggregationColumn	The column to aggregate. Can be a derived metric.	-
aggregationFunction	The aggregation function to apply on the aggregationColumn. Example: `AVG`.	-
dataSource	The Pinot datasource to use.	-
dataset	The dataset to query.	-
monitoringGranularity	The period of aggregation of the timeseries. In ISO-8601 format. Example: `PT1H`.	-
timezone	Timezone used to group by time. In TZ-identifier format. For instance, `UTC` or `US/Pacific`.	UTC
timeColumn	TimeColumn used to group by time. If set to AUTO (the default value), the Pinot primary time column is used.	AUTO
timeColumnFormat	Required if timeColumn is not AUTO.
completenessDelay	The time for your data to be considered complete and ready for anomaly detection. In ISO-8601 format. Example: `PT2H`.	P0D
queryFilters	Filters to apply when fetching data. Prefix with `AND`. Example: `AND country='US'`	${queryFilters}
queryLimit	Maximum number of timeseries points to fetch.	100000001
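
As a rough illustration of how the DATA parameters fit together, the sketch below builds an alert payload as a Python dictionary and prints it as JSON. The wrapper fields (`name`, `template`, `templateProperties`) and every value (`pinot`, `pageviews`, `views`) are assumptions made for the example, not values taken from this page. Required parameters from the other sections are left out for brevity.

```python
import json

# Hypothetical alert payload. The wrapper fields and all values below are
# assumptions for illustration; only the property names come from the table.
alert = {
    "name": "pageviews-matrix-profile-dx",
    "template": {"name": "startree-matrix-profile-query-dx"},
    "templateProperties": {
        "dataSource": "pinot",            # assumed datasource name
        "dataset": "pageviews",           # assumed dataset name
        "aggregationColumn": "views",     # assumed metric column
        "aggregationFunction": "SUM",
        "monitoringGranularity": "PT1H",  # one point per hour
        "timezone": "UTC",
        "completenessDelay": "PT2H",      # wait 2 hours before detecting
    },
}

print(json.dumps(alert, indent=2))
```

`queryFilters` is not set here because its default, `${queryFilters}`, is meant to be filled in per enumeration item.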

DETECTION

name	description	default value
lookback	Historical time period to use to train the model. In ISO-8601 format. Example: `P21D`.	-
seasonalityPeriod	Biggest seasonality period to learn. In ISO-8601 format. Example: `P7D`.
matrixProfileDistance	For advanced users. Type of distance to use for the matrix profile computation. `NORMALIZED` or `NON_NORMALIZED`.	NORMALIZED
sensitivity	Anomaly score threshold. If the anomaly score returned by the algorithm is greater than this threshold, an anomaly is returned. The smaller the value, the more anomalies are detected.	1
computeBounds	For advanced users. If false, the algorithm returns the anomaly scores instead of the predicted upper/lower bounds. Helpful for alert finetuning.	true
scoringMethod	Experimental. Post-processing of the matrix profile anomaly scores. `FIRST_ORDER_DIFFERENCE` is good at detecting spikes. `DIRECT` is good at detecting drifts.	FIRST_ORDER_DIFFERENCE
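
To make the interaction between `sensitivity` and `scoringMethod` more concrete, here is a minimal sketch of one plausible reading of the two scoring methods. It is not ThirdEye's implementation: the functions `post_process` and `flag_anomalies`, the use of an absolute difference, and the sample scores are all assumptions.

```python
def post_process(scores, scoring_method="FIRST_ORDER_DIFFERENCE"):
    """Post-process raw anomaly scores (hypothetical interpretation)."""
    if scoring_method == "DIRECT":
        # Use the raw scores as they are: a sustained high score stays high.
        return list(scores)
    if scoring_method == "FIRST_ORDER_DIFFERENCE":
        # Use the change between consecutive scores: only sudden jumps stand out.
        # Assumption: the first point has no predecessor, so its score is 0.
        return [0.0] + [abs(b - a) for a, b in zip(scores, scores[1:])]
    raise ValueError(f"unknown scoring method: {scoring_method}")


def flag_anomalies(scores, sensitivity=1.0, scoring_method="FIRST_ORDER_DIFFERENCE"):
    """Flag points whose post-processed score is greater than the threshold."""
    return [s > sensitivity for s in post_process(scores, scoring_method)]


# Made-up raw scores: a spike at index 2, then a sustained drift from index 4.
raw_scores = [0.2, 0.3, 2.5, 0.4, 1.2, 1.3, 1.4]

print(flag_anomalies(raw_scores, scoring_method="FIRST_ORDER_DIFFERENCE"))
print(flag_anomalies(raw_scores, scoring_method="DIRECT"))
```

In this toy series the difference-based score reacts to the jump into and out of the spike but stays quiet during the drift, while the direct score keeps flagging the drift for as long as the raw score stays above the threshold.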

FILTER

Time of week

name	description	default value
daysOfWeek	Used to ignore anomalies that happen at specific time periods. A list of days. Anomalies happening on these days are ignored if timeOfWeekIgnore is true. Example: `["MONDAY", "SUNDAY"]`.	[]
hoursOfDay	Used to ignore anomalies that happen at specific time periods. A list of hours. Anomalies happening on these hours are ignored. Example: `[0,1,2,23]`	[]
dayHoursOfWeek	Used to ignore anomalies that happen at specific time periods. A mapping of `{DAY: [hours]}`. Anomalies happening on these timeframes are ignored if timeOfWeekIgnore is true. Example: `{"FRIDAY": [22, 23], "SATURDAY": [0, 1, 2]}`	{}
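
The three time-of-week parameters can be read as three independent ways of matching an anomaly's start time. The sketch below is a hypothetical illustration of that reading, assuming `timeOfWeekIgnore` is true and that the timestamp is already expressed in the alert's timezone. The function name `is_ignored` is not part of ThirdEye.

```python
from datetime import datetime

DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY"]


def is_ignored(ts, days_of_week=(), hours_of_day=(), day_hours_of_week=None):
    """Return True if an anomaly starting at `ts` falls in an ignored period."""
    day = DAY_NAMES[ts.weekday()]  # datetime.weekday(): Monday is 0
    if day in days_of_week:
        return True
    if ts.hour in hours_of_day:
        return True
    # dayHoursOfWeek maps a day name to the hours ignored on that day.
    return ts.hour in (day_hours_of_week or {}).get(day, [])


friday_night = datetime(2024, 1, 5, 22)  # a Friday, 22:00
wednesday_noon = datetime(2024, 1, 3, 12)  # a Wednesday, 12:00
late_hours = {"FRIDAY": [22, 23], "SATURDAY": [0, 1, 2]}

print(is_ignored(friday_night, day_hours_of_week=late_hours))    # True
print(is_ignored(wednesday_noon, day_hours_of_week=late_hours))  # False
```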

Threshold

name	description	default value
thresholdFilterMin	Used to ignore anomalies that don't meet the thresholdFilter min and max. Example: set `thresholdFilterMin = 10` to ignore anomalies when the metric is smaller than 10. Can help ignore anomalies happening in low data regimes. Filter threshold minimum. If `-1`, no minimum threshold is applied.	-1
thresholdFilterMax	Used to ignore anomalies that don't meet the thresholdFilter min and max. Example: set `thresholdFilterMin = 10` to ignore anomalies when the metric is smaller than 10. Can help ignore anomalies happening in low data regimes. Filter threshold maximum. If `-1`, no maximum threshold is applied.	-1
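
The threshold filter can be pictured as a simple range check in which `-1` switches a bound off. This is a minimal sketch under that reading; the function name is hypothetical, and whether the bounds are inclusive is an assumption.

```python
def passes_threshold_filter(value, threshold_min=-1, threshold_max=-1):
    """Return True if an anomaly with this metric value is kept."""
    # -1 means "no bound", as described in the table above.
    if threshold_min != -1 and value < threshold_min:
        return False
    if threshold_max != -1 and value > threshold_max:
        return False
    return True


print(passes_threshold_filter(4, threshold_min=10))   # False: low data regime
print(passes_threshold_filter(42, threshold_min=10))  # True
print(passes_threshold_filter(42))                    # True: no bounds set
```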

Guardrail metric

name	description	default value
guardrailMetricMin	Used to ignore anomalies that don't meet the guardrail threshold. Minimum threshold of the guardrail metric. If `-1`, no minimum threshold is applied.	-1
guardrailMetricMax	Used to ignore anomalies that don't meet the guardrail threshold. Maximum threshold of guardrailMetric. If `-1`, no maximum threshold is applied.	-1
guardrailMetric	Used to ignore anomalies that don't meet the guardrail threshold. Metric to use as a threshold guardrail. Example: use `COUNT(*)` and set `guardrailMetricMin = 100` to ignore anomalies detected when there are fewer than 100 observations in the period.	COUNT(*)

Simple baseline

name	description	default value
offsetBaselineFilterPattern	Used to ignore anomalies that are not detected as anomalies by a simple model. Whether to detect an anomaly if it is a drop, a spike, or either of the two.	UP_OR_DOWN
offsetBaselineFilterSensitivity	Used to ignore anomalies that are not detected as anomalies by a simple model. Detection sensitivity. For instance with `offsetBaselineFilterIntervalsMethod=PERCENTAGE`, set 50 for a 50% percentage change threshold. With `offsetBaselineFilterIntervalsMethod=ABSOLUTE`, set 200 for a 200 absolute difference threshold between the metric and the baseline.	-1
offsetBaselineFilterIntervalsMethod	Used to ignore anomalies that are not detected as anomalies by a simple model. Method to compute intervals. `PERCENTAGE` or `ABSOLUTE`.	ABSOLUTE
offsetBaselineFilterModelOffsets	Used to ignore anomalies that are not detected as anomalies by a simple model. A list of offsets in ISO-8601 format to use as baseline. Example: `[P7D, P14D]` compares the current value to the aggregation of the values of the 2 previous weeks.	['P7D']
offsetBaselineFilterModelAggregation	Used to ignore anomalies that are not detected as anomalies by a simple model. The aggregation function to use to combine historical values. One of `MEDIAN`, `AVERAGE`, `MIN`, `MAX`, or any `PCTXXXXX` percentile, for example `PCT05` (5th percentile), `PCT95`, `PCT999` (99.9th percentile).	MEDIAN
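
The simple-baseline filter can be sketched as: aggregate the values found at each offset, compare the current value to that baseline, and keep the anomaly only if the difference exceeds the sensitivity. The code below is an illustration under several assumptions: the function name is hypothetical, `UP` and `DOWN` are assumed pattern values (only `UP_OR_DOWN` appears on this page), a negative sensitivity is assumed to disable the filter, and percentile aggregations are left out.

```python
from statistics import mean, median

# Percentile aggregations (PCTXXXXX) are omitted from this sketch.
AGGREGATIONS = {"MEDIAN": median, "AVERAGE": mean, "MIN": min, "MAX": max}


def confirmed_by_baseline(current, offset_values, sensitivity,
                          intervals_method="ABSOLUTE",
                          aggregation="MEDIAN",
                          pattern="UP_OR_DOWN"):
    """Return True if the simple baseline model also sees an anomaly."""
    if sensitivity < 0:
        return True  # assumption: the default -1 means the filter is off
    baseline = AGGREGATIONS[aggregation](offset_values)
    diff = current - baseline
    if intervals_method == "PERCENTAGE":
        if baseline == 0:
            raise ValueError("percentage change is undefined for a zero baseline")
        diff = 100.0 * diff / abs(baseline)
    if pattern == "UP":      # assumed value: spikes only
        return diff > sensitivity
    if pattern == "DOWN":    # assumed value: drops only
        return diff < -sensitivity
    return abs(diff) > sensitivity


# Values observed 7 and 14 days ago (offsets [P7D, P14D]), made up for the example.
history = [100, 110]
print(confirmed_by_baseline(150, history, 50, intervals_method="PERCENTAGE"))  # False
print(confirmed_by_baseline(150, history, 40, intervals_method="ABSOLUTE"))    # True
```

With a median baseline of 105, a current value of 150 is a change of about 43%, below the 50% threshold, but an absolute difference of 45, above the threshold of 40.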

Special events

name	description	default value
eventFilterSqlFilter	Used to ignore anomalies that happen during events. SQL filter to apply on the events.
eventFilterLookaround	Used to ignore anomalies that happen during events. Offset to apply on startTime and endTime to look around the timeframe. In ISO-8601 format. Example: `P1D`.	P2D
eventFilterTypes	Used to ignore anomalies that happen during events. List of event types to fetch by. Example: `["HOLIDAY", "DEPLOYMENT"]`. `[]` fetches all events. Use `["__NO_EVENTS"]` to disable.	['__NO_EVENTS']
eventFilterBeforeEventMargin	Used to ignore anomalies that happen during events. A period in ISO-8601 format that corresponds to a period that is also impacted by the event. Example: if beforeEventMargin is `P1D` and the event happens on `[Dec 24 0:00, Dec 25 0:00[`, the label will be applied to anomalies happening on `[Dec 23 0:00, Dec 25 0:00[`.	P0D
eventFilterAfterEventMargin	Used to ignore anomalies that happen during events. Same as eventFilterBeforeEventMargin, but applied after the end of the event.	P0D
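
The effect of the before and after margins can be worked through with plain date arithmetic. In this sketch the ISO-8601 periods are written directly as `timedelta` values (`P1D` becomes `timedelta(days=1)`), the function names are hypothetical, and the year is arbitrary.

```python
from datetime import datetime, timedelta


def impacted_window(event_start, event_end,
                    before_margin=timedelta(0), after_margin=timedelta(0)):
    """Widen the event's [start, end[ interval by the two margins."""
    return event_start - before_margin, event_end + after_margin


def is_impacted(anomaly_start, window):
    """Check membership in the half-open interval [start, end[."""
    start, end = window
    return start <= anomaly_start < end


# Event on [Dec 24 0:00, Dec 25 0:00[ with eventFilterBeforeEventMargin = P1D.
window = impacted_window(datetime(2023, 12, 24), datetime(2023, 12, 25),
                         before_margin=timedelta(days=1))

print(window[0], window[1])                           # Dec 23 0:00 to Dec 25 0:00
print(is_impacted(datetime(2023, 12, 23, 15), window))  # True
print(is_impacted(datetime(2023, 12, 25, 0), window))   # False: end is exclusive
```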

Impact

name	description	default value
impactThreshold	Used to ignore anomalies that don't meet the impact threshold. Impact filter threshold.	-1

POSTPROCESS

Data mutability

name	description	default value
mutabilityPeriod	Use if your data is mutable. ThirdEye keeps the detection results up to date over the mutable period. For instance, if your last 10 days of data are mutable, set `P10D`. At each cron detection job, the detection results for the last 10 days are updated.	P0D
reNotifyPercentageThreshold	For detection replay when data is mutable. If the percentage difference between an existing anomaly and a new anomaly on the same time frame is above this threshold, re-notify. Combined with `reNotifyAbsoluteThreshold`. Both thresholds must pass to re-notify. If zero, always re-notify. If null or negative, never re-notify.	-1
reNotifyAbsoluteThreshold	For detection replay when data is mutable. If the absolute difference between an existing anomaly and a new anomaly on the same time frame is above this threshold, re-notify. Combined with `reNotifyPercentageThreshold`. Both thresholds must pass to re-notify. If zero, always re-notify. If null or negative, never re-notify.	-1
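
The way the two re-notification thresholds combine can be sketched as follows. This is one possible reading of the table, not the actual ThirdEye logic: the function names are hypothetical, the compared quantity is assumed to be a single metric value per anomaly, and the "zero" and "null or negative" rules are assumed to apply to each threshold separately before the two results are combined.

```python
def threshold_passes(difference, threshold):
    """Evaluate one threshold using the rules from the table above."""
    if threshold is None or threshold < 0:
        return False  # null or negative: never re-notify
    if threshold == 0:
        return True   # zero: always passes
    return difference > threshold


def should_renotify(old_value, new_value, pct_threshold=-1, abs_threshold=-1):
    """Both thresholds must pass for a re-notification to be sent."""
    abs_diff = abs(new_value - old_value)
    # Assumption: a change from zero counts as an infinite percentage change.
    pct_diff = 100.0 * abs_diff / abs(old_value) if old_value else float("inf")
    return (threshold_passes(pct_diff, pct_threshold)
            and threshold_passes(abs_diff, abs_threshold))


print(should_renotify(100, 130, pct_threshold=20, abs_threshold=10))  # True
print(should_renotify(100, 130, pct_threshold=20, abs_threshold=50))  # False
print(should_renotify(100, 130))                                      # False (defaults)
```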

Anomaly merger

name	description	default value
mergeMaxGap	Maximum gap between 2 anomalies for anomalies to be merged. In ISO-8601 format. Example: `PT2H`. The default behavior is to merge consecutive anomalies only. To disable anomaly merging entirely, set this value to `P0D`.
mergeMaxDuration	Maximum duration of a merged anomaly. At merge time, if merging would make an anomaly longer than this limit, the anomalies are not merged. In ISO-8601 format. Example: `P7D`.
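
The two merger parameters can be illustrated with a small interval-merging routine. It is a sketch under assumptions rather than the actual merger: anomalies are represented as `(start, end)` tuples, the ISO-8601 periods are written as `timedelta` values, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def merge_anomalies(anomalies, max_gap, max_duration):
    """Merge (start, end) anomalies that are close enough and not too long."""
    ordered = sorted(anomalies)
    if max_gap <= timedelta(0):
        return ordered  # P0D disables merging entirely
    merged = []
    for start, end in ordered:
        if merged:
            prev_start, prev_end = merged[-1]
            new_end = max(end, prev_end)
            gap_ok = start - prev_end <= max_gap
            duration_ok = new_end - prev_start <= max_duration
            if gap_ok and duration_ok:
                merged[-1] = (prev_start, new_end)
                continue
        merged.append((start, end))
    return merged


t0 = datetime(2024, 1, 1)
hour = timedelta(hours=1)
anomalies = [(t0, t0 + hour), (t0 + hour, t0 + 2 * hour), (t0 + 5 * hour, t0 + 6 * hour)]

# mergeMaxGap = PT2H, mergeMaxDuration = P7D: the first two anomalies merge,
# the third one starts 3 hours after the merged anomaly ends and stays separate.
for start, end in merge_anomalies(anomalies, 2 * hour, timedelta(days=7)):
    print(start, "->", end)
```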

RCA

name	description	default value
rcaAggregationFunction	The aggregation function to use for RCA. If the detection metric name is known to ThirdEye, this parameter is optional.
rcaIncludedDimensions	List of the dimensions (columns in the dataset) to use in RCA drill-downs. If not set or empty, all dimensions of the table are used.	[]
rcaExcludedDimensions	List of dimensions (columns in the dataset) to ignore in RCA drill-downs. If not set or empty, all dimensions of the table are used. rcaExcludedDimensions and rcaIncludedDimensions cannot be used at the same time.	[]
rcaEventTypes	A list of event types to filter on for RCA. Only events that match these types are shown in the RCA related events tab.	[]
rcaEventSqlFilter	A SQL filter for RCA events. Only events that match the filter are shown in the RCA related events tab.

DIMENSION_EXPLORATION

name	description	default value
enumeratorQuery	A SQL query that runs on the data source and builds enumeration items from the result. Example: `SELECT DISTINCT country, device FROM pageviews LIMIT 100`. In this case, the enumerator generates one enumeration item for each country/device combination.	-
enumerationItemIdKeys	List of keys to use to identify the enumeration. The format is the following: `[ "queryFilters" ]` The keys must be present in the `params` object of each enumeration. The keys will be used to generate the dimension exploration id. The id will be used to identify the enumeration in the detection pipeline.	['queryFilters']
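
To picture what the enumerator produces, the sketch below turns the rows of the example query into enumeration items whose `params` carry a `queryFilters` string, then derives an identifier from the keys listed in `enumerationItemIdKeys`. How ThirdEye actually maps query rows to `params` is not described on this page, so the mapping, the item structure beyond `params`, and both function names are assumptions.

```python
def build_enumeration_items(rows):
    """Build one hypothetical enumeration item per row of the enumerator query."""
    items = []
    for row in rows:
        clauses = []
        for column, value in row.items():
            escaped = str(value).replace("'", "''")  # naive SQL quote escaping
            clauses.append(f"AND {column}='{escaped}'")
        # The filter string follows the `AND country='US'` format of queryFilters.
        items.append({"params": {"queryFilters": " ".join(clauses)}})
    return items


def item_id(item, id_keys=("queryFilters",)):
    """Identify an item by the values of its id keys (assumed id scheme)."""
    return tuple(item["params"][key] for key in id_keys)


# Made-up rows standing in for the result of
# SELECT DISTINCT country, device FROM pageviews LIMIT 100
rows = [
    {"country": "US", "device": "mobile"},
    {"country": "FR", "device": "desktop"},
]

items = build_enumeration_items(rows)
for item in items:
    print(item_id(item))
```

Each item's `queryFilters` value would then replace the `${queryFilters}` placeholder that is the default of the `queryFilters` data parameter.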
