Key Features
Two Detection Modes
This system provides two complementary detection capabilities:
- Short-Term Detection: Real-time anomaly detection (hours) using dual-verification
- Long-Term Health Monitoring: Sensor health tracking (days/weeks) for chronic issues 🆕
Dual-Verification Strategy (Short-Term)
The core innovation of the short-term detection system is its ability to distinguish between device failures and extreme weather events through a two-step verification process.
Why This Matters
Traditional anomaly detection systems generate numerous false alarms when extreme weather events occur, because they cannot distinguish between:
- Genuine Equipment Failure: Only one station is malfunctioning
- Extreme Weather Event: Multiple stations show similar anomalous patterns
Our dual-verification approach solves this by combining:
- Temporal Analysis (Self-Check): "Is this station behaving differently than usual?"
- Spatial Verification (Neighbor-Check): "Are nearby stations behaving similarly?"
Multi-Method Detection
The system supports seven different temporal detection algorithms:
| Method | Best For | Computational Cost |
|---|---|---|
| ARIMA | Complex trends, seasonal patterns | High |
| 3-Sigma | Quick outlier detection | Low |
| MAD | Robust to outliers | Medium |
| IQR | Exploratory analysis | Low |
| Isolation Forest | Multidimensional patterns | High |
| STL | Strong seasonality | High |
| LOF | Density-based outliers | Medium |
Recommended Method
For weather data, ARIMA provides the best balance between accuracy and false alarm reduction.
Real-Time Processing
Streaming Architecture
- Ingestion Frequency: Data collected every 10 minutes
- Detection Window: Configurable (default: 6 hours)
- Sliding Stride: Moves forward with each new data point
- Latency: Near real-time detection (< 1 minute processing time)
Memory Efficiency
The sliding window mechanism ensures:
- Constant Memory Usage: O(1) regardless of database size
- Fast Query Performance: Only queries relevant time ranges
- Scalable Storage: Old data remains accessible but not loaded into memory
Spatial Intelligence
Neighbor Detection
The system automatically:
- Calculates distances between all station pairs
- Identifies neighbors within 100km radius
- Computes correlation coefficients during anomalies
- Interpolates missing data to ensure robust comparison
Correlation Thresholds
| Correlation | Interpretation | Action |
|---|---|---|
| > 0.6 | High correlation - Weather event | Ignore |
| 0.3 - 0.6 | Uncertain - Requires manual review | Flag as "Suspected" |
| < 0.3 | Low correlation - Device failure | Alert |
Missing Data Handling
When neighbor data has gaps, the system uses linear interpolation to fill missing values before computing correlations.
Scalable Database Backend
SQLite Mode (Default)
Perfect for:
- Standalone deployment
- Development and testing
- Single-server installations
- < 100 stations
TimescaleDB Mode (Enterprise)
Recommended for:
- Multi-server deployment
-
100 stations
- Years of historical data
- Advanced analytics queries
Migration between backends requires minimal code changes - only the connection string needs updating.
Interactive Visualization
The system generates:
- Station Network Maps: Interactive HTML maps showing station locations and neighbor connections
- Anomaly Reports: JSON format for integration with monitoring dashboards
- Console Output: Human-readable summaries for manual inspection
View an example: Station Network Map
API-First Design
While currently used as a command-line tool, the detection engine is designed with clear interfaces:
- Input: Time window + detection parameters
- Output: Structured anomaly reports
- Future-ready: Can be easily wrapped in a REST API
Long-Term Health Monitoring 🆕
Overview
In addition to real-time anomaly detection, the system now provides long-term health monitoring to detect chronic sensor problems that develop over days or weeks.
What It Detects
Stalled Sensors
Detects sensors that are physically stuck or malfunctioning:
- Metric: Zero Ratio - percentage of zero readings
- Threshold: > 30% zero values over analysis period
- Common Cause: Wind speed sensors stuck at zero due to mechanical failure
- Example: Station "grevena" showed 71.6% zero readings over 7 days
Data Loss
Identifies communication failures or sensor outages:
- Metric: Null Ratio - percentage of missing observations
- Threshold: > 50% missing data
- Common Cause: Network issues, power failures, sensor disconnection
- Impact: Unreliable data for analysis and forecasting
Sensor Degradation
Flags sensors that are stuck or not responding to environmental changes:
- Metric: Variance - statistical measure of data variability
- Threshold: < 0.1 for variables that should naturally fluctuate
- Common Cause: Sensor aging, calibration drift, physical obstruction
- Example: Wind sensor showing constant low values despite changing conditions
Data Completeness
Tracks overall data availability per station:
- Metric: Percentage of expected observations received
- Expected: ~144 observations per day (10-minute intervals)
- Analysis: Shows trends in data reliability over time
- Use Case: Identify stations requiring maintenance
Usage
# Check all stations for the last 7 days
python anomaly_detector.py --health-check --days 7
# Check specific station over 30 days
python anomaly_detector.py --health-check --days 30 --station grevena
# Generate JSON report for monitoring integration
python anomaly_detector.py --health-check --days 7 --save health_report.json
Output Format
Console Summary
Station Status Completeness Issues
--------------------------------------------------------------------------------
grevena 🔴 CRITICAL 58.0% 1 problems
└─ wind_speed: High zero ratio (71.6%) - sensor may be stalled
dodoni ✅ HEALTHY 57.6% 0 problems
volos ✅ HEALTHY 57.9% 0 problems
JSON Report
{
"station_id": "grevena",
"analysis_period_days": 7,
"data_completeness": 0.58,
"total_data_points": 585,
"overall_status": "critical",
"variable_reports": [
{
"variable": "wind_speed",
"zero_ratio": 0.716,
"null_ratio": 0.0,
"variance": 1.37,
"issues": ["High zero ratio (71.6%) - sensor may be stalled"],
"severity": "critical"
}
]
}
Severity Levels
| Level | Criteria | Action |
|---|---|---|
| Healthy | All metrics within normal ranges | Routine monitoring |
| Warning | Minor issues detected | Schedule inspection |
| Critical | Severe problems detected | Immediate maintenance required |
Integration
The JSON reports are designed for easy integration with:
- Monitoring Dashboards: Grafana, Kibana, custom dashboards
- Alerting Systems: Email, Slack, PagerDuty notifications
- Maintenance Scheduling: Automated ticket creation
- Quality Assurance: Long-term performance tracking
Complementary to Short-Term Detection
Long-term health monitoring complements real-time detection:
- Short-Term: Catches sudden failures (sensor crash, extreme events)
- Long-Term: Identifies gradual degradation (sensor drift, increasing data loss)
- Together: Comprehensive coverage of all failure modes
Extensibility
Adding new features is straightforward:
- New Detection Methods: Implement the
TemporalDetectorinterface - New Variables: Add to the database schema and detector configuration
- New Spatial Methods: Extend the
SpatialVerifierclass - New Data Sources: Replace the collector module
- New Health Metrics: Extend the
HealthCheckerclass with custom thresholds