Real-Time Anomaly Detection


This is the documentation site for the Real-Time Anomaly Detection service. The service is part of the wider DataGEMS platform.

The Real-Time Anomaly Detection service is designed to monitor meteorological stations and distinguish between genuine device failures and extreme weather events using a dual-verification strategy combining temporal and spatial analysis.

Key Features

Dual-Verification Strategy: Combines temporal self-checks with spatial neighbor verification to minimize false alarms
Long-Term Health Monitoring 🆕: Tracks sensor health over days/weeks to detect chronic issues like stalled sensors, data loss, and degradation
Multi-Method Support: Includes ARIMA, Z-Score, MAD, IQR, Isolation Forest, STL, and LOF detection methods
Real-Time Processing: Streaming architecture with 10-minute data ingestion intervals
Spatial Intelligence: Automatically detects and correlates anomalies across neighboring stations within a 100 km radius
Scalable Architecture: Supports both SQLite for standalone deployment and TimescaleDB for enterprise scale
Interactive Visualization: Generates station network maps showing spatial relationships
JSON Export: Machine-readable reports for integration with monitoring dashboards
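
As an illustration of the 100 km neighbor radius mentioned under Spatial Intelligence, the following minimal sketch selects neighbors by great-circle (haversine) distance. The function names and the station coordinates are assumptions made for this example (the coordinates are approximate city locations, not actual NOA station positions), and the service may compute neighborhoods differently.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def neighbors_within(stations, station_id, radius_km=100.0):
    """Return the IDs of all other stations within radius_km of station_id, sorted by ID."""
    lat, lon = stations[station_id]
    return sorted(
        other
        for other, (olat, olon) in stations.items()
        if other != station_id and haversine_km(lat, lon, olat, olon) <= radius_km
    )


# Hypothetical stations: approximate city coordinates, for illustration only.
stations = {
    "athens": (37.9838, 23.7275),
    "piraeus": (37.9429, 23.6470),
    "thessaloniki": (40.6401, 22.9444),
}

print(neighbors_within(stations, "athens"))
```

With these illustrative coordinates, only the nearby station falls inside the radius; the distant one is excluded from spatial verification.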

How It Works

The Real-Time Anomaly Detection service provides two complementary detection modes:

Mode 1: Short-Term Anomaly Detection (Hours)

Real-time detection using a two-step verification process:

Step 1: Temporal Detection
Analyzes each station's current readings against its own historical data using time series methods (e.g., ARIMA) to detect deviations from expected patterns.
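
To make the idea of a temporal self-check concrete, here is a minimal sketch using a robust z-score based on the median absolute deviation (MAD), one of the methods listed under Key Features. It is not the service's ARIMA implementation; the function names and the 3.5 cutoff are assumptions chosen for illustration.

```python
from statistics import median


def mad_score(history, value):
    """Robust z-score of `value` against `history`, based on the median absolute deviation."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # History is constant: any deviation from it is treated as unbounded.
        return 0.0 if value == med else float("inf")
    # 1.4826 rescales MAD so it is comparable to a standard deviation for normal data.
    return abs(value - med) / (1.4826 * mad)


def is_temporal_anomaly(history, value, threshold=3.5):
    """Flag `value` when its robust z-score exceeds `threshold` (an assumed cutoff)."""
    return mad_score(history, value) > threshold


# Hypothetical temp_out history (°C) for one station.
history = [21.0, 21.4, 20.9, 21.2, 21.1, 21.3, 20.8, 21.0]

print(is_temporal_anomaly(history, 21.2))  # within the usual range
print(is_temporal_anomaly(history, 35.0))  # far outside the usual range
```

A reading flagged here would only be a suspect; under the two-step process it still has to pass through spatial verification before being classified.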

Step 2: Spatial Verification
Compares the suspect station's behavior with neighboring stations to determine if the anomaly is:

Weather Event: Neighboring stations show similar patterns (high correlation > 0.6)
Device Failure: Only this station is anomalous (low correlation < 0.3)
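
The classification rule above can be sketched as follows. The 0.6 and 0.3 thresholds come from this page; everything else is an assumption for illustration: the use of Pearson correlation, averaging the correlation across neighbors, the function names, and the "uncertain" label for values between the two thresholds, which this page does not specify.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def classify_anomaly(suspect, neighbors, high=0.6, low=0.3):
    """Classify a temporally anomalous station by its mean correlation with its neighbors."""
    if not neighbors:
        return "unverified"  # assumed label: no neighbors available for comparison
    mean_corr = sum(pearson(suspect, n) for n in neighbors) / len(neighbors)
    if mean_corr > high:
        return "weather_event"
    if mean_corr < low:
        return "device_failure"
    return "uncertain"  # assumed label for the band between the two thresholds


# Hypothetical wind_speed windows (km/h) over the same six time steps.
suspect = [10, 12, 15, 19, 24, 30]
rising_neighbors = [[9, 11, 16, 18, 25, 29], [11, 13, 14, 20, 23, 31]]
steady_neighbors = [[10, 10.5, 9.8, 10.2, 10.1, 9.9], [10.2, 9.9, 10.4, 10.0, 10.3, 9.8]]

print(classify_anomaly(suspect, rising_neighbors))
print(classify_anomaly(suspect, steady_neighbors))
```

When the neighbors rise together with the suspect station, the sketch reports a weather event; when they stay steady while the suspect climbs, it reports a device failure.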

Mode 2: Long-Term Health Check (Days/Weeks) 🆕

Monitors sensor health over extended periods to detect chronic problems:

Stalled Sensors: Detects sensors stuck at zero (>30% zero readings)
Data Loss: Identifies excessive missing data (>50% loss rate)
Sensor Degradation: Flags abnormally low variance indicating stuck sensors
Completeness Tracking: Monitors overall data quality per station

This mode generates comprehensive JSON reports for integration with monitoring systems.
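
As a rough illustration of these health checks, the sketch below computes the metrics for one variable of one station. The 30% zero-reading and 50% loss thresholds are taken from the list above; the function name, the result keys, and the variance floor are assumptions, since this page does not state how low variance must be to count as degradation.

```python
from statistics import pvariance


def health_check(readings, expected_count,
                 zero_ratio_max=0.30, loss_rate_max=0.50, min_variance=1e-6):
    """Summarize sensor health for one variable; `None` entries mark missing readings."""
    present = [r for r in readings if r is not None]
    loss_rate = (1.0 - len(present) / expected_count) if expected_count else 1.0
    zero_ratio = (sum(1 for r in present if r == 0) / len(present)) if present else 0.0
    variance = pvariance(present) if present else None
    return {
        "completeness": 1.0 - loss_rate,
        "loss_rate": loss_rate,
        "zero_ratio": zero_ratio,
        "variance": variance,
        "stalled": zero_ratio > zero_ratio_max,    # >30% zero readings
        "data_loss": loss_rate > loss_rate_max,    # >50% missing
        # min_variance is a hypothetical floor, not a documented threshold.
        "degraded": variance is not None and variance < min_variance,
    }


# Hypothetical readings: two missing values and many zeros.
week = [0, 0, 0, 0, 5.1, None, None, 4.9, 0, 5.0]
report = health_check(week, expected_count=len(week))
print(report["stalled"], report["data_loss"], report["degraded"])
```

Note that a zero-ratio rule would need per-variable care in practice: a variable such as `rain` is legitimately zero much of the time.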

Data Flow

graph LR
    A[NOA API] -->|Every 10 min| B[Data Collector]
    B -->|Store| C[SQLite/TimescaleDB]
    C -->|Query Window| D[Anomaly Detector]
    D -->|Temporal Check| E{Anomalous?}
    E -->|No| F[Normal]
    E -->|Yes| G[Spatial Verify]
    G -->|High Corr| H[Weather Event]
    G -->|Low Corr| I[Device Failure]

Target Deployment

The service currently monitors 14 meteorological stations operated by the National Observatory of Athens (NOA), with data updates every 10 minutes from the NOA DataGEMS Feed.

Quick Start

# Install dependencies
pip install -r requirements.txt

# Start data collection
./manage_collector.sh start

# Short-term detection (real-time)
python anomaly_detector.py \
  --end "NOW" \
  --window 6 \
  --temporal-method arima \
  --spatial-verify

# Long-term health check (weekly)
python anomaly_detector.py \
  --health-check \
  --days 7 \
  --save health_report.json

For detailed installation instructions, see the Installation Guide.

Data Format

The service processes five core meteorological variables:

Variable	Description	Unit
`temp_out`	Outdoor Temperature	°C
`out_hum`	Outdoor Humidity	%
`wind_speed`	Wind Speed	km/h
`bar`	Barometric Pressure	hPa
`rain`	Rainfall Rate	mm
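
For readers integrating with the service, one possible in-memory representation of a single observation is sketched below. Only the five variable names and their units come from the table above; the `Reading` class, the `station_id` and `timestamp` fields, and the `parse_reading` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

VARIABLES = ("temp_out", "out_hum", "wind_speed", "bar", "rain")


@dataclass(frozen=True)
class Reading:
    station_id: str               # hypothetical identifier field
    timestamp: datetime           # hypothetical observation time
    temp_out: Optional[float]     # °C
    out_hum: Optional[float]      # %
    wind_speed: Optional[float]   # km/h
    bar: Optional[float]          # hPa
    rain: Optional[float]         # mm


def parse_reading(station_id, timestamp, raw):
    """Build a Reading from a dict, mapping missing or unparsable values to None."""
    values = {}
    for name in VARIABLES:
        value = raw.get(name)
        try:
            values[name] = float(value) if value is not None else None
        except (TypeError, ValueError):
            values[name] = None
    return Reading(station_id=station_id, timestamp=timestamp, **values)


sample = parse_reading(
    "station-01",
    datetime(2024, 1, 1, 12, 0),
    {"temp_out": "18.4", "out_hum": 62, "wind_speed": 12.5, "bar": "1013.2"},
)
print(sample)
```

Keeping missing values as `None` rather than zero avoids confusing a gap in the data with a genuine zero reading.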

Architecture Overview

The system follows a pull-based streaming architecture:

Collector: Background daemon fetching data from the NOA API every 10 minutes
Database: SQLite for standalone or TimescaleDB for enterprise deployment
Detector: On-demand or scheduled analysis using a sliding-window mechanism
Reporter: Console and JSON output with detailed anomaly classifications

For more details, see the Architecture documentation.
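
To illustrate the sliding-window idea in the standalone SQLite configuration, the sketch below queries the most recent N hours of readings for one station. The `readings` table, its columns, and the `fetch_window` helper are hypothetical; the service's actual schema is not described on this page.

```python
import sqlite3
from datetime import datetime, timedelta


def fetch_window(conn, station_id, end, hours):
    """Return (ts, temp_out) rows for station_id in the half-open window (end - hours, end]."""
    start = end - timedelta(hours=hours)
    return conn.execute(
        "SELECT ts, temp_out FROM readings "
        "WHERE station_id = ? AND ts > ? AND ts <= ? ORDER BY ts",
        (station_id, start.isoformat(), end.isoformat()),
    ).fetchall()


# In-memory database with a hypothetical schema; timestamps are stored as ISO 8601
# strings of identical format, so lexicographic comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station_id TEXT, ts TEXT, temp_out REAL)")

end = datetime(2024, 1, 1, 12, 0)
rows = [
    ("station-01", (end - timedelta(minutes=10 * i)).isoformat(), 20.0 + 0.1 * i)
    for i in range(72)  # 12 hours of readings at 10-minute intervals
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

window = fetch_window(conn, "station-01", end, hours=6)
print(len(window), window[0][0], window[-1][0])
```

With 10-minute ingestion, a 6-hour window such as `--window 6` in the Quick Start would cover 36 readings per station when no data is missing.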

Support

For questions, issues, or contributions:

GitHub Issues: Report a bug or request a feature
Documentation: Browse this documentation site
FAQ: Check the Frequently Asked Questions