tags : Database, Prometheus, Machine Learning, Statistics, Probability, Representing Time and Date

See TS section in Machine Learning

FAQ

What are the different methods?

Comparison Table

| Method | Category | Description/Features | Use Cases/Strengths | Weaknesses | Introduced |
|---|---|---|---|---|---|
| **Classical/Statistical** | | | | | |
| Naive/Seasonal Naive | Statistical | Forecast = last / last seasonal value | Baseline; simple; fast | Assumes persistence; often inaccurate | Foundational |
| Simple Exp. Smoothing (SES) | Statistical | Weighted average; models level | Univariate; no trend/season; simple | No trend/season handling | ~1950s |
| Holt’s Linear Trend | Statistical | SES + linear trend | Univariate; trend; simple | Assumes linear trend; no season | ~1957 |
| Holt-Winters | Statistical | Holt + seasonality (add/mult) | Univariate; trend & season; good benchmark | Assumes fixed patterns | ~1960 |
| ETS | Statistical | State-space framework for exponential smoothing; auto-selects | General univariate; robust; auto-select; probabilistic | Univariate only; assumes state-space form | ~2002 |
| ARIMA/SARIMA | Statistical | Models autocorrelation (AR + I + MA); SARIMA adds seasonality | Univariate; models autocorrelation; benchmark; probabilistic | Requires stationarity; parameter tuning | ~1970 |
| Theta Method | Statistical | Decompose + damped linear extrapolation | Univariate; strong M3/M4 performance; simple | Less intuitive; mainly univariate | ~2000 |
| VAR | Statistical | Multivariate AR; models linear interdependence | Multivariate linear interactions; interpretable | Assumes linearity; needs stationarity | ~1980 |
| TAR/SETAR/STAR | Statistical | Threshold AR; regime-switching; nonlinear | Nonlinear univariate w/ regimes | Complex thresholds; mainly univariate | ~1978 |
| INLA | Bayesian Stat. | Approximate Bayesian inference; latent Gaussian models | Complex models; hierarchy; uncertainty (probabilistic) | Approximate method; learning curve | ~2009 |
| Prophet | Statistical/Curve Fit | Decomposes trend/season/holidays; Bayesian fitting | Univariate; strong season/holidays; robust; probabilistic | Less accurate on some benchmarks | ~2017 |
| **Machine Learning & DL** | | (Often need more data; less interpretable) | (Can model complex nonlinearity/interactions) | (Compute intensive; tuning crucial) | |
| Tree-based (RF, XGBoost, …) | ML | Uses lagged/derived features in trees/ensembles | Nonlinearity/interactions; feature importance; robust | Needs feature engineering; no trend extrapolation | ~1984+ |
| SVR | ML | SVM for regression; uses tolerance margin | Robust to outliers; high-dimensional features | Less intuitive; kernel/parameter sensitive | ~1996 |
| Gaussian Processes (GP) | Bayesian ML | Non-parametric; models a distribution over functions | Probabilistic; complex nonlinear; flexible | Slow (cubic); kernel tuning difficult | ~2006 |
| MLP | DL | Feedforward NN; needs lagged features | General nonlinear; covariates | Needs features; tuning; can overfit | ~1980s |
| RNN | DL | NN w/ loops for sequence processing | Sequential data; time dependencies | Vanishing gradients; often outperformed | ~1980s |
| LSTM | DL | RNN w/ gates for long dependencies | Complex sequences; long dependencies; multivariate | Needs data; slow; tuning; can overfit | ~1997 |
| GRU | DL | Simpler LSTM variant; similar performance | Like LSTM; potentially faster | Like LSTM; needs data; tuning | ~2014 |
| CNN (1D) | DL | Uses convolutions for sequence feature extraction | Feature extraction; fast pattern recognition | Less natural for long dependencies | ~1989/2012 |
| DeepAR/DeepVAR | DL | Autoregressive RNN that outputs distribution parameters | Probabilistic forecasts; covariates; global | Needs lots of data; complex; slow to train | ~2017 |
| N-BEATS | DL | Non-recurrent NN; basis expansion; interpretable | Univariate; state-of-the-art on M4/M3; interpretable | Mainly univariate; compute intensive | ~2019 |
| Transformer Variants | DL | Self-attention mechanism; parallel processing | Long dependencies; parallel; multivariate | Data hungry; quadratic complexity | ~2017+ |
| Samformer | DL | Transformer variant | (Specific capabilities TBD) | (Likely transformer limitations) | Recent |
| TabPFN (Time Series) | DL | Transformer for small tabular data; zero-shot TS | Small datasets; little tuning needed | Newer; focused on a specific niche | ~2024 |
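
A minimal sketch of how two of the classical rows above (Holt-Winters and SARIMA) might be fit with statsmodels; the synthetic monthly series and the hand-picked orders are placeholders, not a recommendation:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with trend + yearly seasonality (stand-in for real data)
idx = pd.date_range("2018-01-01", periods=72, freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(
    50 + 0.5 * np.arange(72)
    + 10 * np.sin(2 * np.pi * np.arange(72) / 12)
    + rng.normal(0, 2, 72),
    index=idx,
)

# Holt-Winters: additive trend and additive yearly seasonality
hw = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
hw_fc = hw.forecast(12)

# SARIMA: (p,d,q)(P,D,Q,s) orders picked by hand here; normally chosen via AIC/auto-ARIMA
sarima = ARIMA(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
sarima_fc = sarima.forecast(12)

print(hw_fc.head(), sarima_fc.head(), sep="\n")
```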

Additional notes

For time-series forecasting, we can use any of the following:

  • Deep learning
  • Traditional ML/stats methods
  • “In my projects, DL models outperform both statistical and ML methods in datasets with higher frequencies (hourly or more). I use TFT, NHITS, and a customized TSMixer. The most underrated statistical model that I often use is DynamicOptimizedTheta.” (a statsforecast sketch of this model follows this list)
  • LLM Based

    The fundamental challenge is that LLMs like o1 and Claude 3.5 simply aren’t built for the unique structure of tabular data. The inefficiency shows up as soon as a table is pushed through an LLM: tokenizing a 10,000 x 100 table as a sequence, with numeric values split into tokens, is massively wasteful.

    There’s some interesting work on using LLMs for tabular data (TabLLM: Few-shot Classification of Tabular Data with Large Language Models), but this only works for datasets with tens of samples rather than the thousands of rows needed in real-world applications.

    What o1 and other LLMs typically do is wrap around existing tabular tools like XGBoost or scikit-learn. While this works, they’re ultimately constrained by these tools’ limitations. We’re taking a fundamentally different approach - building foundation models that natively understand tabular relationships and patterns. Our approach combines the benefits of foundation models with architectures specifically designed for tabular data structures.
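
The DynamicOptimizedTheta model mentioned in the quote above ships with Nixtla’s statsforecast package. A minimal sketch of running it on hourly data; the file name, column layout, and season_length=24 are assumptions for illustration:

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import DynamicOptimizedTheta

# statsforecast expects long-format data with columns: unique_id, ds (timestamp), y (value)
df = pd.read_csv("hourly_demand.csv", parse_dates=["ds"])  # hypothetical input file

sf = StatsForecast(
    models=[DynamicOptimizedTheta(season_length=24)],  # 24 = daily cycle for hourly data
    freq="h",
)
fc = sf.forecast(df=df, h=48)  # point forecasts 48 hours ahead, one row per series per step
print(fc.head())
```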

Things ppl say

  • An aha moment for me was realizing that the way you can think of anomaly models working is that they’re effectively forecasting the next N steps, and then noticing when the actual measured values are “different enough” from the expected. This is simple to draw on a whiteboard for one signal, but it’s pretty neat that it still works when it’s multivariate.
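
A toy sketch of that forecast-then-compare idea, using a simple k-sigma rule on past residuals (real detectors often use quantile forecasts, dynamic thresholds, or joint scoring across signals):

```python
import numpy as np

def flag_anomalies(actual, forecast, history_residuals, k=3.0):
    """Flag points where the measured value is 'different enough' from the forecast.

    'Different enough' is a k-sigma rule on residuals from past backtests;
    multivariate setups typically score all signals jointly instead.
    """
    sigma = np.std(history_residuals)
    deviation = np.abs(np.asarray(actual) - np.asarray(forecast))
    return deviation > k * sigma

# Tiny example: the spike at the third step gets flagged
past_resid = np.random.default_rng(1).normal(0, 1.0, 500)  # residuals from backtesting
forecast = np.array([10.0, 10.5, 11.0, 11.5])
actual = np.array([10.2, 10.4, 18.0, 11.6])
print(flag_anomalies(actual, forecast, past_resid))  # [False False  True False]
```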