Key Features

Two Detection Modes

This system provides two complementary detection capabilities:

Short-Term Detection: Real-time anomaly detection on a timescale of hours, using dual verification
Long-Term Health Monitoring: Sensor health tracking over days to weeks to catch chronic issues 🆕

Dual-Verification Strategy (Short-Term)

The core innovation of the short-term detection system is its ability to distinguish between device failures and extreme weather events through a two-step verification process.

Why This Matters

Traditional anomaly detection systems generate numerous false alarms when extreme weather events occur, because they cannot distinguish between:

Genuine Equipment Failure: Only one station is malfunctioning
Extreme Weather Event: Multiple stations show similar anomalous patterns

Our dual-verification approach solves this by combining:

Temporal Analysis (Self-Check): "Is this station behaving differently than usual?"
Spatial Verification (Neighbor-Check): "Are nearby stations behaving similarly?"

Multi-Method Detection

The system supports seven different temporal detection algorithms:

Method	Best For	Computational Cost
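ARIMA (AutoRegressive Integrated Moving Average)	Complex trends, seasonal patterns	High
3-Sigma	Quick outlier detection	Low
MAD (Median Absolute Deviation)	Robust to outliers	Medium
IQR (Interquartile Range)	Exploratory analysis	Low
Isolation Forest	Multidimensional patterns	High
STL (Seasonal-Trend decomposition using Loess)	Strong seasonality	High
LOF (Local Outlier Factor)	Density-based outliers	Medium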
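The three low-cost methods in the table can be sketched with nothing more than Python's statistics module. This is an illustrative sketch, not the system's own implementation: the function names are hypothetical, and the cut-offs (k = 3 for 3-Sigma, a modified z-score of 3.5 for MAD, 1.5 × IQR for the fences) are common textbook defaults that may differ from the project's configuration.

```python
import statistics
from typing import List, Sequence


def three_sigma(values: Sequence[float], k: float = 3.0) -> List[int]:
    """Indices of points more than k population standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def mad_outliers(values: Sequence[float], k: float = 3.5) -> List[int]:
    """Indices whose modified z-score (based on the median absolute deviation) exceeds k."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    # 0.6745 rescales MAD so the score is comparable to a z-score for normal data
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > k]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```

On short windows, plain 3-Sigma can miss a single large spike because the spike itself inflates the standard deviation (with 8 points, no population z-score can exceed about 2.47). The MAD variant is much less affected, which is the sense in which the table calls it robust to outliers.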

Recommended Method

For weather data, ARIMA provides the best balance between accuracy and false alarm reduction.

Real-Time Processing

Streaming Architecture

Ingestion Frequency: Data collected every 10 minutes
Detection Window: Configurable (default: 6 hours)
Sliding Stride: Moves forward with each new data point
Latency: Near real-time detection (< 1 minute processing time)
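With 10-minute ingestion and the default 6-hour window, each detection run sees 36 points. A bounded deque is one simple way to get the "slide by one point per new observation" behaviour described above. The class and method names are hypothetical; this is a sketch of the idea, not the project's streaming code.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import List

INGEST_INTERVAL = timedelta(minutes=10)
DETECTION_WINDOW = timedelta(hours=6)                 # configurable default
WINDOW_POINTS = DETECTION_WINDOW // INGEST_INTERVAL  # 36 points


class SlidingWindow:
    """Hypothetical fixed-size window: appending a new point evicts the oldest one."""

    def __init__(self, size: int = WINDOW_POINTS):
        self._buf = deque(maxlen=size)

    def push(self, timestamp: datetime, value: float) -> bool:
        """Add one observation; return True once the window is full enough to run detection."""
        self._buf.append((timestamp, value))
        return self.is_full()

    def is_full(self) -> bool:
        return len(self._buf) == self._buf.maxlen

    def values(self) -> List[float]:
        return [v for _, v in self._buf]
```

Each call to push advances the window by one stride, so a detector can simply run whenever push returns True.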

Memory Efficiency

The sliding window mechanism ensures:

Bounded Memory Usage: Memory depends only on the window size, so it is O(1) with respect to total database size
Fast Query Performance: Only queries relevant time ranges
Scalable Storage: Old data remains accessible but not loaded into memory
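"Only queries relevant time ranges" amounts to pushing the window bounds into the SQL query so older rows never leave the database. The sketch below uses the standard sqlite3 module. The observations table, its columns, and fetch_window are hypothetical; the real schema is not described in this section.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: timestamps stored as ISO-8601 text, which sorts chronologically
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    station_id TEXT NOT NULL,
    ts         TEXT NOT NULL,
    value      REAL
);
CREATE INDEX IF NOT EXISTS idx_obs_station_ts ON observations (station_id, ts);
"""


def fetch_window(conn: sqlite3.Connection, station_id: str, end: datetime,
                 window: timedelta = timedelta(hours=6)):
    """Load only rows in (end - window, end] for one station."""
    start = end - window
    cursor = conn.execute(
        "SELECT ts, value FROM observations "
        "WHERE station_id = ? AND ts > ? AND ts <= ? ORDER BY ts",
        (station_id, start.isoformat(), end.isoformat()),
    )
    return cursor.fetchall()
```

The composite index on (station_id, ts) lets SQLite answer the range query without scanning the whole table, which is what keeps query time tied to the window rather than to the archive size.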

Spatial Intelligence

Neighbor Detection

The system automatically:

Calculates distances between all station pairs
Identifies neighbors within a 100 km radius
Computes correlation coefficients between a flagged station and its neighbors over the anomaly window
Interpolates missing data to ensure a robust comparison
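The distance and neighbor steps can be sketched with the haversine great-circle formula. The source does not say which distance formula the system uses, so haversine on a spherical Earth (radius 6371 km) is an assumption here, and the station IDs and coordinates in the usage example are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_neighbors(stations: Dict[str, Tuple[float, float]],
                   radius_km: float = 100.0) -> Dict[str, List[Tuple[str, float]]]:
    """Compare every station pair once and keep pairs within radius_km (symmetric)."""
    neighbors: Dict[str, List[Tuple[str, float]]] = {sid: [] for sid in stations}
    for (a, pos_a), (b, pos_b) in combinations(stations.items(), 2):
        d = haversine_km(*pos_a, *pos_b)
        if d <= radius_km:
            neighbors[a].append((b, d))
            neighbors[b].append((a, d))
    return neighbors
```

For example, with hypothetical stations at latitudes 40.0, 40.5 and 41.5 on the same meridian, only the first two (about 56 km apart) are neighbors; one degree of latitude is roughly 111 km, just outside the 100 km radius.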

Correlation Thresholds

Correlation	Interpretation	Action
> 0.6	High correlation - Weather event	Ignore
0.3 - 0.6	Uncertain - Requires manual review	Flag as "Suspected"
< 0.3	Low correlation - Device failure	Alert
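The thresholds table maps directly to a small decision function. The sketch below computes a Pearson correlation by hand and applies the 0.6 / 0.3 cut-offs. Two points are assumptions not stated in the source: correlations are averaged across all neighbors, and a station with no neighbors is treated as "suspected". The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classify_correlation(r: float, high: float = 0.6, low: float = 0.3) -> str:
    """Map a correlation to the action in the thresholds table."""
    if r > high:
        return "ignore"     # neighbors move together: likely a weather event
    if r < low:
        return "alert"      # station deviates alone: likely a device failure
    return "suspected"      # 0.3 to 0.6 inclusive: manual review


def verify_anomaly(station: Sequence[float], neighbors: Dict[str, Sequence[float]]) -> str:
    """Hypothetical aggregation: average the correlation with every neighbor's series."""
    if not neighbors:
        return "suspected"  # assumption: nothing to verify against
    mean_r = sum(pearson(station, s) for s in neighbors.values()) / len(neighbors)
    return classify_correlation(mean_r)
```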

Missing Data Handling

When neighbor data has gaps, the system uses linear interpolation to fill missing values before computing correlations.
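Linear interpolation fills each interior gap along the straight line between the nearest known readings on either side. In this minimal sketch, gaps are represented as None, and leading or trailing gaps are left untouched because there is no second point to interpolate towards. Both choices are assumptions, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def interpolate_linear(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps linearly; edge gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        delta = values[right] - values[left]
        for i in range(left + 1, right):
            # position of i between the two known points, scaled onto their difference
            out[i] = values[left] + delta * (i - left) / span
    return out
```

For example, [1.0, None, None, 4.0] becomes [1.0, 2.0, 3.0, 4.0], so the neighbor series can be correlated point by point with the station under review.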

Scalable Database Backend

SQLite Mode (Default)

Perfect for:

Standalone deployment
Development and testing
Single-server installations
< 100 stations

TimescaleDB Mode (Enterprise)

Recommended for:

Multi-server deployment
> 100 stations
Years of historical data
Advanced analytics queries

Switching between backends requires minimal code changes: only the connection string needs to be updated.

Interactive Visualization

The system generates:

Station Network Maps: Interactive HTML maps showing station locations and neighbor connections
Anomaly Reports: JSON format for integration with monitoring dashboards
Console Output: Human-readable summaries for manual inspection

View an example: Station Network Map

API-First Design

While currently used as a command-line tool, the detection engine is designed with clear interfaces:

Input: Time window + detection parameters
Output: Structured anomaly reports
Future-ready: Can be easily wrapped in a REST API
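The input/output contract above can be made explicit with plain dataclasses, which also makes a later REST wrapper straightforward, since the output serialises directly to JSON. All class and field names here are hypothetical illustrations of "time window + detection parameters" in and "structured anomaly reports" out, not the engine's actual types.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DetectionRequest:
    """Hypothetical input: a time window plus detection parameters."""
    window_start: datetime
    window_end: datetime
    method: str = "arima"
    params: dict = field(default_factory=dict)


@dataclass
class AnomalyReport:
    """Hypothetical output record for one detected anomaly."""
    station_id: str
    variable: str
    timestamp: datetime
    neighbor_correlation: Optional[float]
    action: str  # "ignore", "suspected" or "alert"


def reports_to_json(reports: List[AnomalyReport]) -> str:
    # default=str converts datetime fields to strings such as "2024-01-01 12:00:00"
    return json.dumps([asdict(r) for r in reports], default=str)
```

A REST endpoint would only need to build a DetectionRequest from the request body and return reports_to_json(...) as the response.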

Long-Term Health Monitoring 🆕

Overview

In addition to real-time anomaly detection, the system now provides long-term health monitoring to detect chronic sensor problems that develop over days or weeks.

What It Detects

Stalled Sensors

Detects sensors that are physically stuck or malfunctioning:

Metric: Zero Ratio - percentage of zero readings
Threshold: > 30% zero values over analysis period
Common Cause: Wind speed sensors stuck at zero due to mechanical failure
Example: Station "grevena" showed 71.6% zero readings over 7 days

Data Loss

Identifies communication failures or sensor outages:

Metric: Null Ratio - percentage of missing observations
Threshold: > 50% missing data
Common Cause: Network issues, power failures, sensor disconnection
Impact: Unreliable data for analysis and forecasting

Sensor Degradation

Flags sensors that are stuck or not responding to environmental changes:

Metric: Variance - statistical measure of data variability
Threshold: < 0.1 for variables that should naturally fluctuate
Common Cause: Sensor aging, calibration drift, physical obstruction
Example: Wind sensor showing constant low values despite changing conditions
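The three per-variable checks (zero ratio, null ratio, variance) can be computed in one pass over a variable's readings. This sketch uses the thresholds quoted above. Computing the zero ratio over the readings that actually arrived, using the sample variance, and the exact wording of the data-loss and degradation messages are all assumptions; only the stalled-sensor message mirrors the report examples below. The function name is hypothetical.

```python
import statistics
from typing import List, Optional

ZERO_RATIO_MAX = 0.30  # stalled sensor
NULL_RATIO_MAX = 0.50  # data loss
VARIANCE_MIN = 0.1     # degradation (only for variables that should fluctuate)


def variable_health(name: str, readings: List[Optional[float]],
                    check_variance: bool = True) -> dict:
    """Hypothetical per-variable health check; None marks a missing observation."""
    total = len(readings)
    present = [r for r in readings if r is not None]
    null_ratio = 1 - len(present) / total if total else 1.0
    # Assumption: zero ratio is measured over readings that arrived, not all slots
    zero_ratio = sum(1 for r in present if r == 0) / len(present) if present else 0.0
    variance = statistics.variance(present) if len(present) >= 2 else None

    issues = []
    if zero_ratio > ZERO_RATIO_MAX:
        issues.append(f"High zero ratio ({zero_ratio:.1%}) - sensor may be stalled")
    if null_ratio > NULL_RATIO_MAX:
        issues.append(f"High null ratio ({null_ratio:.1%}) - possible data loss")
    if check_variance and variance is not None and variance < VARIANCE_MIN:
        issues.append(f"Low variance ({variance:.3f}) - possible sensor degradation")

    return {
        "variable": name,
        "zero_ratio": round(zero_ratio, 3),
        "null_ratio": round(null_ratio, 3),
        "variance": None if variance is None else round(variance, 2),
        "issues": issues,
    }
```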

Data Completeness

Tracks overall data availability per station:

Metric: Percentage of expected observations received
Expected: ~144 observations per day (10-minute intervals)
Analysis: Shows trends in data reliability over time
Use Case: Identify stations requiring maintenance
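Completeness is received observations divided by expected observations: 24 h × 6 readings per hour = 144 per day, so a 7-day window expects 1008. Assuming total_data_points in the JSON report below counts received observations, 585 / 1008 ≈ 0.580, which matches its data_completeness of 0.58. A minimal sketch (function names hypothetical; capping at 1.0 to absorb duplicate rows is an assumption):

```python
from datetime import timedelta

INGEST_INTERVAL = timedelta(minutes=10)


def expected_observations(days: int, interval: timedelta = INGEST_INTERVAL) -> int:
    """Number of readings a station should deliver over the given number of days."""
    return timedelta(days=days) // interval


def completeness(received: int, days: int, interval: timedelta = INGEST_INTERVAL) -> float:
    """Share of expected observations actually received, capped at 1.0."""
    expected = expected_observations(days, interval)
    return min(received / expected, 1.0) if expected else 0.0
```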

Usage

# Check all stations for the last 7 days
python anomaly_detector.py --health-check --days 7

# Check specific station over 30 days
python anomaly_detector.py --health-check --days 30 --station grevena

# Generate JSON report for monitoring integration
python anomaly_detector.py --health-check --days 7 --save health_report.json

Output Format

Console Summary

Station              Status       Completeness    Issues
--------------------------------------------------------------------------------
grevena              🔴 CRITICAL  58.0%           1 problems
  └─ wind_speed: High zero ratio (71.6%) - sensor may be stalled
dodoni               ✅ HEALTHY   57.6%           0 problems
volos                ✅ HEALTHY   57.9%           0 problems

JSON Report

{
  "station_id": "grevena",
  "analysis_period_days": 7,
  "data_completeness": 0.58,
  "total_data_points": 585,
  "overall_status": "critical",
  "variable_reports": [
    {
      "variable": "wind_speed",
      "zero_ratio": 0.716,
      "null_ratio": 0.0,
      "variance": 1.37,
      "issues": ["High zero ratio (71.6%) - sensor may be stalled"],
      "severity": "critical"
    }
  ]
}
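A downstream consumer, such as an alerting hook, only needs the standard json module to pull critical findings out of a report. The sketch accepts either a single station object, as shown above, or a list of them; whether the file written by --save holds one object or a list is an assumption, so both shapes are handled. The function name is hypothetical.

```python
import json
from typing import List


def critical_issues(report_text: str) -> List[str]:
    """Return 'station/variable: issue' strings for every critical variable report."""
    data = json.loads(report_text)
    stations = data if isinstance(data, list) else [data]
    found = []
    for station in stations:
        for var in station.get("variable_reports", []):
            if var.get("severity") == "critical":
                for issue in var.get("issues", []):
                    found.append(f'{station["station_id"]}/{var["variable"]}: {issue}')
    return found
```

For example, critical_issues(open("health_report.json").read()) applied to the report above yields a single entry for grevena's wind_speed sensor.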

Severity Levels

Level	Criteria	Action
Healthy	All metrics within normal ranges	Routine monitoring
Warning	Minor issues detected	Schedule inspection
Critical	Severe problems detected	Immediate maintenance required
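In the JSON example, the station's overall_status matches its worst variable severity. One plausible rule, assumed here rather than confirmed by the source, is to take the maximum over an ordered scale and then look up the action from the table. Names are hypothetical.

```python
from typing import Iterable

SEVERITY_ORDER = {"healthy": 0, "warning": 1, "critical": 2}
ACTIONS = {
    "healthy": "Routine monitoring",
    "warning": "Schedule inspection",
    "critical": "Immediate maintenance required",
}


def overall_status(variable_severities: Iterable[str]) -> str:
    """Assumed rule: a station is as severe as its worst variable."""
    return max(variable_severities, key=SEVERITY_ORDER.__getitem__, default="healthy")
```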

Integration

The JSON reports are designed for easy integration with:

Monitoring Dashboards: Grafana, Kibana, custom dashboards
Alerting Systems: Email, Slack, PagerDuty notifications
Maintenance Scheduling: Automated ticket creation
Quality Assurance: Long-term performance tracking

Complementary to Short-Term Detection

Long-term health monitoring complements real-time detection:

Short-Term: Catches sudden failures (sensor crash, extreme events)
Long-Term: Identifies gradual degradation (sensor drift, increasing data loss)
Together: Broader coverage of both sudden and gradual failure modes

Extensibility

Adding new features is straightforward:

New Detection Methods: Implement the TemporalDetector interface
New Variables: Add to the database schema and detector configuration
New Spatial Methods: Extend the SpatialVerifier class
New Data Sources: Replace the collector module
New Health Metrics: Extend the HealthChecker class with custom thresholds
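As an illustration of the first extension point, a new detection method plugs in by subclassing a detector base class. The TemporalDetector name comes from this section, but its real methods are not documented here: the detect signature, the DETECTORS registry, and the RateOfChangeDetector example are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Type


class TemporalDetector(ABC):
    """Hypothetical shape of the interface; the project's real method names may differ."""

    @abstractmethod
    def detect(self, values: Sequence[float]) -> List[int]:
        """Return indices of anomalous points in the current window."""


class RateOfChangeDetector(TemporalDetector):
    """Example plug-in: flags readings that jump more than max_step from the previous one."""

    def __init__(self, max_step: float):
        self.max_step = max_step

    def detect(self, values: Sequence[float]) -> List[int]:
        return [i for i in range(1, len(values))
                if abs(values[i] - values[i - 1]) > self.max_step]


# Hypothetical registry so a method can be selected by name from configuration
DETECTORS: Dict[str, Type[TemporalDetector]] = {"rate_of_change": RateOfChangeDetector}
```

Because the abstract base class refuses to instantiate until detect is implemented, an incomplete plug-in fails at construction time rather than in the middle of a detection run.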