Anomaly Detection in Time Series
In recent years there has been an explosion of papers on time series anomaly detection appearing in SIGKDD and other data mining, machine learning, and database conferences. Most of these papers test on one or more of a handful of benchmark datasets, including datasets created by NASA, Yahoo, Numenta, and Tsinghua-OMNI (Pei's Lab).
While the community should greatly appreciate the efforts of these teams to share data, a handful of recent papers [a] have suggested that these datasets are unsuitable for gauging progress in anomaly detection.
In brief, the two most compelling arguments against using these datasets are:
· Triviality: Almost all of the benchmark datasets mentioned above can be perfectly solved, without the need to look at any training data, and with decade-old algorithms (see the sketch after this list).
· Mislabeling: The possibility of mislabeling in anomaly detection benchmarks can never be completely eliminated. However, some of the datasets mentioned above appear to have a significant number of false positives and false negatives in the ground truth. Papers have been published arguing that method A is better than method B because it is 5% more accurate on benchmark X. However, a careful examination of benchmark X suggests that more than 25% of its labels are wrong, a number that dwarfs the claimed difference between the algorithms being compared.
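To make the triviality argument concrete, here is a minimal sketch of the kind of training-free, decade-old baseline that [a] argues can solve many of these benchmarks. The function name, window size, and synthetic series below are our own illustration, not part of the contest or of any of the benchmarks discussed:

```python
import numpy as np

def trivial_anomaly_score(ts: np.ndarray, window: int = 50) -> np.ndarray:
    """Score each point by its absolute deviation from a moving average.
    No training data and no learned model: just classic smoothing."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(ts, kernel, mode="same")  # centered moving average
    return np.abs(ts - smoothed)

# Illustrative usage on a synthetic series with one injected spike.
rng = np.random.default_rng(0)
ts = np.sin(np.linspace(0, 60, 3000)) + 0.1 * rng.standard_normal(3000)
ts[1500] += 3.0  # inject an obvious point anomaly
print("predicted anomaly index:", int(np.argmax(trivial_anomaly_score(ts))))
```

If a simple baseline of this kind scores perfectly on a benchmark, that benchmark cannot discriminate between competing state-of-the-art methods.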
Beyond the issues listed above, and the possibility of a file drawer effect [b] and/or cherry-picking [c], we believe that the community has been left with a set of unsuitable benchmarks. With this in mind, we have created new benchmarks for time series anomaly detection as part of this contest.
The benchmark datasets created for this contest are designed to mitigate these problems. It is important to note that our claim is “mitigate”, not “solve”. We think it would be wonderful for a large and diverse group of researchers to address this issue, much in the spirit of CASP [d].
In the meantime, the 250 datasets that are part of this challenge reflect more than 20 years of work surveying the time series anomaly detection literature and collecting datasets. Beyond the life of this competition, we hope that they can serve as a resource for the community for years to come, and inspire deeper introspection about the evaluation of anomaly detection.
We hope you will enter the contest, and have lots of fun! Please use this forum to share, ask questions and collaborate.
Best wishes,
Prof. Eamonn Keogh, UC Riverside and Taposh Roy, Kaiser Permanente
[a] R. Wu and E. Keogh, “Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress,” arXiv:2009.13807.
[b] “Publication bias,” Wikipedia.
[c] “Cherry picking,” Wikipedia.
[d] “CASP,” Wikipedia.
