Time Series Data at Scale
You start with PostgreSQL because you already run it. The monitoring data goes into a metrics table with a timestamp column, a metric name, a few label columns, and a value. Works fine for the first month when you’re tracking 500 time series from 10 services. A filing cabinet for a small office.
Then the Kubernetes migration happens. Suddenly 200 pods are emitting 30 metrics each at 15-second intervals. The metrics table hits 400 million rows. The filing cabinet for a small office is now handling the mail for a stadium. VACUUM takes 45 minutes. Dashboard queries that returned in 200ms now take 12 seconds. B-tree indexes are fighting the write load for I/O. Your DBA sends a pointed email about that one table eating most of the database’s I/O budget.
- PostgreSQL hits the wall around 100M time series rows. The filing cabinet for a stadium. VACUUM takes 45 minutes. Dashboards go from 200ms to 12 seconds. B-tree indexes fight the write load for I/O.
- Purpose-built TSDBs use columnar storage and time-partitioning to handle write-heavy, append-only workloads that relational engines were never designed for.
- Downsampling and retention policies are mandatory, not optional. Raw 15-second data older than 30 days should be downsampled to 5-minute averages. Storage costs drop by 20x or more.
- Cardinality explosion kills TSDBs. A single unbounded label like user_id multiplies your series count into the millions. One filing drawer per user. Monitor active series. Alert before the index exceeds memory.
- Which TSDB matters less than getting partitioning, retention, and cardinality right. Prometheus, VictoriaMetrics, InfluxDB, and ClickHouse all work. The design decisions around them determine success or failure.
Why General-Purpose Databases Break Under Time Series Workloads
A medium infrastructure footprint: 100 nodes, 50 pods each, 20 metrics per pod at 15-second intervals. That is 100,000 active series generating 576 million data points per day from monitoring alone.
B-tree indexes were designed for random access patterns, not append-only, time-ordered, high-throughput writes. Forcing time series data through a relational engine means you’re fighting the storage architecture itself. Purpose-built TSDBs use time-based partitioning, in-memory write buffers, and compression algorithms (delta encoding on timestamps, XOR encoding on values) for far better storage efficiency. A PostgreSQL row storing a single metric sample carries roughly 16 bytes of payload (an 8-byte timestamp plus an 8-byte double), before row headers and index overhead. VictoriaMetrics compresses that same sample to 1-2 bytes.
Don’t: Store time series data in a general-purpose relational database with B-tree indexes and no partitioning. Writes compete with reads for I/O, VACUUM becomes a bottleneck, and query latency degrades sharply as the table grows.
Do: Use a purpose-built TSDB with time-based partitioning and columnar compression. If you must stay on PostgreSQL, use the TimescaleDB extension, whose hypertables provide automatic time partitioning.
Cardinality: The Silent TSDB Killer
http_request_duration_seconds with service (10) × endpoint (50) × status_code (5) = 2,500 series. Manageable. Add user_id (80,000) = 200 million series. The TSDB’s index exceeds available memory, and query latency jumps from milliseconds to seconds. The explosion is invisible until it hits.

Unbounded labels are the root cause of almost every TSDB outage: user_id, session_id, request_id. These fields create millions of new series per day. The correct architecture for request-level granularity is distributed tracing, not metrics. Metrics are for aggregated trends. Traces handle individual requests. Blurring that boundary is the most common mistake in observability implementations.
| Label | Unique Values | Cumulative Cardinality |
|---|---|---|
| Base metric (e.g. http_requests_total) | 1 | 1 series |
| + method (GET, POST, PUT, DELETE) | 4 | 4 series |
| + status_code (200, 201, 400, 404, 500) | 5 | 20 series |
| + endpoint (/api/v1/users, /api/v1/orders, …) | 100 | 2,000 series |
| + user_id (unbounded) | 100,000 | 200,000,000 series |
Labels multiply, not add. One unbounded label (user_id, request_id, trace_id) turns a 20-series metric into 200 million. The TSDB falls over, and nobody understands why until they check cardinality.
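Before adding controls, check where series are concentrated. A minimal PromQL sketch for that check, runnable against any Prometheus-compatible endpoint (the all-metrics matcher is expensive, so run it ad hoc rather than in a dashboard):

```promql
# Ten metric names holding the most active series right now.
topk(10, count by (__name__)({__name__=~".+"}))

# Total active series in the head block; the controls below alert on this.
prometheus_tsdb_head_series
```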
Controls that catch it before it happens (a minimal config sketch follows the table):
| Control | Implementation | Effect |
|---|---|---|
| Label allowlisting | Prometheus metric_relabel_configs drop unlisted labels | Prevents unbounded labels from entering the TSDB |
| Series limits per scrape | sample_limit in scrape config | Hard ceiling on new series per target |
| Active series monitoring | Alert on prometheus_tsdb_head_series | Early warning before index exhaustion |
| Label value bucketing | Replace high-cardinality values with ranges (e.g., latency buckets) | Bounded cardinality with minimal information loss |
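A minimal Prometheus scrape-config sketch implementing the first two controls. The job name, target, limit, and label list are illustrative assumptions; substitute whatever unbounded labels your services actually emit.

```yaml
scrape_configs:
  - job_name: api-service                   # hypothetical job name
    sample_limit: 5000                      # scrape is rejected if a target exposes more samples
    static_configs:
      - targets: ['api-service:9090']       # assumed target address
    metric_relabel_configs:
      # Drop known-unbounded labels before the samples reach the TSDB.
      - action: labeldrop
        regex: user_id|session_id|request_id
```

Strict allowlisting (keep only a named set of labels) uses a labelkeep action instead; the labeldrop shown here is the lighter-touch denylist variant.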
Query Optimization and Recording Rules
A Grafana panel scanning 30 days of raw data with no aggregation will DDoS your own monitoring stack. During an incident. When you need it most.
Three query disciplines prevent this. Aggregation functions must match the visualization. A 7-day chart doesn’t need 40 million raw points. Label matchers reduce scan scope early by filtering on specific services or endpoints before aggregation. Step intervals should align to the time range: 15-second steps for a 1-hour window, 5-minute steps for a 24-hour window, 1-hour steps for a 7-day view.
Recording rules are the underrated performance lever. They pre-compute expensive aggregations and store results as new time series. A dashboard panel that runs rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]) across 200 services on every page load can be replaced by a recording rule that computes and stores service:http_request_duration_seconds:mean5m continuously. Multi-second dashboard loads become sub-second responses. An afternoon of configuration for a huge improvement in usability.
Recording rule configuration example for Prometheus
Recording rules belong in a separate rule file loaded by Prometheus. Group related rules together and evaluate them at an interval matching your dashboard refresh rate (typically 30s or 1m).
```yaml
groups:
  - name: service_latency_aggregations
    interval: 30s
    rules:
      # Pre-aggregate to the service level so dashboards read one series per service.
      - record: service:http_request_duration_seconds:mean5m
        expr: |
          sum by (service) (rate(http_request_duration_seconds_sum[5m]))
            / sum by (service) (rate(http_request_duration_seconds_count[5m]))
      - record: service:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (service, le) (rate(http_request_duration_seconds_bucket[5m])))
```
Audit dashboards quarterly for panels querying more data than they display. Any panel running a raw PromQL query that touches more than 10,000 series is a candidate for a recording rule.
The full end-to-end architecture (a short remote_write wiring sketch follows the table):
| Architecture Layer | Component | Purpose | Scaling Consideration |
|---|---|---|---|
| Collection | Prometheus / OTel Collector | Scrape or receive metrics from services | Horizontal: shard by target. Vertical: increase scrape interval at scale |
| Short-term storage | Prometheus local TSDB | Fast queries for recent data (hours to days) | 2-week retention typical. Beyond that, use long-term storage |
| Long-term storage | VictoriaMetrics / Thanos / Cortex | Durable, compressed, queryable across months | Tiered: raw (7d) + 1min aggregate (90d) + hourly (2y) |
| Downsampling | VictoriaMetrics downsampling / Thanos compactor | Reduce resolution and storage for old data | Automatic. Configure retention tiers at setup, not after the bill arrives |
| Query | Grafana + PromQL / MetricsQL | Dashboards, alerting, ad-hoc exploration | Query performance degrades with cardinality. Pre-aggregate high-cardinality metrics |
| Alerting | Alertmanager / Grafana Alerting | Route alerts based on severity and ownership | Dedup, group, route. Avoid alert storms by grouping related metrics |
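As a wiring sketch for the collection-to-long-term-storage hop: Prometheus keeps its local TSDB for recent data and streams samples to the remote store over remote_write. The hostname, port, and queue settings below are illustrative assumptions, not tuned recommendations.

```yaml
# Prometheus remote_write to a long-term store (VictoriaMetrics shown as one example).
remote_write:
  - url: http://victoriametrics:8428/api/v1/write   # assumed service name and port
    queue_config:
      max_samples_per_send: 10000   # batch size per shard
      capacity: 20000               # in-memory buffer per shard
```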
Downsampling and Retention Tiers
10,000 active series at 15-second resolution produces 21 billion data points per year. Without retention policies, storage costs just keep growing. Three tiers solve this.
| Tier | Resolution | Retention | Use case | Storage reduction |
|---|---|---|---|---|
| Raw | 15 seconds | 15-30 days | Incident investigation, debugging | Baseline |
| Medium | 5 minutes | 6-12 months | Capacity planning, SLA reporting | ~20x reduction |
| Long-term | 1 hour | 2-5 years | Trend analysis, year-over-year comparison | ~240x reduction |
Configure downsampling before the storage bill arrives, not after. VictoriaMetrics supports it natively with -retentionPeriod and downsampling flags. Thanos uses a compactor against object storage. Data engineering teams should plan compute capacity for the downsampling pipeline at setup time, not as an afterthought when storage costs spike.
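A hedged sketch of the three tiers, expressed as Thanos compactor retention flags (shown as container args; the durations mirror the table above and are choices, not defaults):

```yaml
# Thanos compact retention flags matching the raw / medium / long-term tiers.
args:
  - compact
  - --wait
  - --objstore.config-file=/etc/thanos/objstore.yml   # assumed path to object storage config
  - --retention.resolution-raw=30d     # raw samples: incident investigation window
  - --retention.resolution-5m=180d     # 5-minute downsamples: capacity planning, SLA reporting
  - --retention.resolution-1h=2y       # 1-hour downsamples: multi-year trend analysis
```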
Alerting Architecture on Time Series Data
Alert on symptoms, not causes. An error rate exceeding 1% captures every failure mode, including ones nobody anticipated. A log-based alert for a specific error string captures only the failure someone predicted.
- SLO targets defined for each critical service (availability, latency percentiles)
- Recording rules pre-computing SLI metrics at 30-second evaluation intervals
- Alert routing configured by severity: page for critical, Slack for warning, dashboard for info (see the routing sketch after this list)
- At least 2 weeks of baseline data for burn-rate calculation
- Documented response procedures for each alerting rule
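A sketch of severity-based routing in Alertmanager, matching the page / Slack / dashboard split above. Receiver names are placeholders and their notification configs are omitted.

```yaml
route:
  receiver: slack-warnings              # default route (severity=warning)
  group_by: [alertname, service]        # group related alerts to avoid storms
  routes:
    - matchers: ['severity="page"']
      receiver: pagerduty-oncall
    - matchers: ['severity="info"']
      receiver: dashboard-only
receivers:
  - name: pagerduty-oncall              # pagerduty_configs omitted in this sketch
  - name: slack-warnings                # slack_configs omitted
  - name: dashboard-only                # no notifier: info alerts stay on dashboards
```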
Multi-burn-rate alerting gets rid of the false-positive noise that comes with simple threshold alerts. A 5-minute window detects acute spikes. A 1-hour window detects sustained degradation. Requiring both windows to breach before paging cuts noise way down. Google’s SRE workbook formalizes this approach, and it works in practice as well as on paper.
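A minimal sketch of that two-window pattern as a Prometheus alerting rule. It assumes a 99% availability SLO (so a 1% error budget) and reuses the http_requests_total metric from the cardinality example; the 14.4x fast-burn factor comes from the SRE workbook’s worked example.

```yaml
groups:
  - name: slo_burn_rate
    rules:
      - alert: ErrorBudgetFastBurn
        # Fire only when BOTH the 5m and 1h windows burn the 1% error budget
        # at more than 14.4x the sustainable rate (a 30-day budget gone in ~2 days).
        expr: |
          (
            sum(rate(http_requests_total{status_code=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m]))
          ) > (14.4 * 0.01)
          and
          (
            sum(rate(http_requests_total{status_code=~"5.."}[1h]))
              / sum(rate(http_requests_total[1h]))
          ) > (14.4 * 0.01)
        labels:
          severity: page
```

The workbook pairs this fast-burn rule with slower windows (for example 6 hours and 3 days) at lower factors for ticket-severity alerts.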
When the alert queue is always full, investigation quality collapses. Teams stop triaging and start ignoring. Every alert should have a documented response. If an alert fires repeatedly without requiring human action, either auto-remediate it or eliminate it. Alerts that train people to ignore alerts are worse than no alerts at all.
What the Industry Gets Wrong About Time Series Data
“PostgreSQL with TimescaleDB handles everything.” TimescaleDB extends PostgreSQL with time-partitioning and compression. It works well up to moderate scale. At high cardinality (millions of unique label combinations) and high write throughput (100K+ samples per second), purpose-built TSDBs like ClickHouse or VictoriaMetrics outperform it by a wide margin. The right tool depends on scale and workload characteristics, not familiarity.
“Store everything at full resolution.” Storing 15-second data for 2 years takes vastly more storage than downsampled equivalents. Raw resolution is valuable for the last 7-30 days when you’re debugging active issues. Beyond that, 5-minute averages serve every dashboard and alert use case. Retention policies with downsampling collapse storage costs without losing analytical value.
That metrics table devouring your database’s I/O budget? With the right TSDB, tiered retention, cardinality controls, and SLO-based alerting, your monitoring scales with the infrastructure instead of against it. Performance and capacity engineering turns metric thresholds into capacity decisions before they become incidents. The DBA stops sending pointed emails. The dashboards load in under a second. And the 576 million daily data points become a well-managed pipeline instead of a storage crisis.