Time Series Data at Scale
You start with PostgreSQL because you already run it. The monitoring data goes into a metrics table with a timestamp column, a metric name, a few label columns, and a value. Works fine for the first month when you’re tracking 500 time series from 10 services. A filing cabinet for a small office.
Then the Kubernetes migration happens. Suddenly 200 pods are emitting 30 metrics each at 15-second intervals. The metrics table hits 400 million rows. The filing cabinet for a small office is now handling the mail for a stadium. VACUUM takes 45 minutes. Dashboard queries that returned in 200ms now take 12 seconds. B-tree indexes are fighting the write load for I/O. Your DBA sends a pointed email about that one table eating most of the database’s I/O budget.
- PostgreSQL hits the wall around 100M time series rows. The filing cabinet for a stadium. VACUUM takes 45 minutes. Dashboards go from 200ms to 12 seconds. B-tree indexes fight the write load for I/O.
- Purpose-built TSDBs use columnar storage and time-partitioning to handle write-heavy, append-only workloads that relational engines were never designed for.
- Downsampling and retention policies are mandatory, not optional. Raw 15-second data older than 30 days should be downsampled to 5-minute averages. Storage costs drop by 20x or more.
- Cardinality explosion kills TSDBs. A single unbounded label like user_id multiplies your series count into the millions. One filing drawer per user. Monitor active series. Alert before the index exceeds memory.
- Which TSDB matters less than getting partitioning, retention, and cardinality right. Prometheus, VictoriaMetrics, InfluxDB, and ClickHouse all work. The design decisions around them determine success or failure.
Why General-Purpose Databases Break Under Time Series Workloads
A medium infrastructure footprint: 100 nodes, 50 pods each, 20 metrics per pod at 15-second intervals. That is 100,000 active series generating 576 million data points per day from monitoring alone.
B-tree indexes were designed for random access patterns, not append-only, time-ordered, high-throughput writes. Forcing time series data through a relational engine means you’re fighting the storage architecture itself. Purpose-built TSDBs use time-based partitioning, in-memory write buffers, and compression algorithms (delta encoding on timestamps, XOR encoding on values) for far better storage efficiency. A PostgreSQL row storing a single metric sample carries roughly 16 bytes of payload (an 8-byte timestamp plus an 8-byte double), before row headers and index overhead. VictoriaMetrics compresses that same sample to 1-2 bytes.
Don’t: Store time series data in a general-purpose relational database with B-tree indexes and no partitioning. Writes compete with reads for I/O, VACUUM becomes a bottleneck, and query latency degrades sharply as the table grows.
Do: Use a purpose-built TSDB with time-based partitioning and columnar compression. If you must stay on PostgreSQL, use the TimescaleDB extension, whose hypertables provide automatic time partitioning.
Cardinality: The Silent TSDB Killer
http_request_duration_seconds with service (10) × endpoint (50) × status_code (5) = 2,500 series. Manageable. Add user_id (80,000) = 200 million series. The TSDB’s index exceeds available memory, and query latency jumps from milliseconds to seconds. The explosion is invisible until it hits.

Unbounded labels are the root cause of almost every TSDB outage: user_id, session_id, request_id. These fields create millions of new series per day. The correct architecture for request-level granularity is distributed tracing, not metrics. Metrics are for aggregated trends. Traces handle individual requests. Blurring that boundary is the most common mistake in observability implementations.
| Label | Unique Values | Cumulative Cardinality |
|---|---|---|
| Base metric (e.g. http_requests_total) | 1 | 1 series |
| + method (GET, POST, PUT, DELETE) | 4 | 4 series |
| + status_code (200, 201, 400, 404, 500) | 5 | 20 series |
| + endpoint (/api/v1/users, /api/v1/orders, …) | 100 | 2,000 series |
| + user_id (unbounded) | 100,000 | 200,000,000 series |
Labels multiply, not add. One unbounded label (user_id, request_id, trace_id) turns a 20-series metric into 200 million. The TSDB falls over, and nobody understands why until they check cardinality.
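Before adding controls, check where series are concentrated. A minimal PromQL sketch for that check, runnable against any Prometheus-compatible endpoint (the all-metrics matcher is expensive, so run it ad hoc rather than in a dashboard):

```promql
# Ten metric names holding the most active series right now.
topk(10, count by (__name__)({__name__=~".+"}))

# Total active series in the head block; the controls below alert on this.
prometheus_tsdb_head_series
```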
Controls that catch it before it happens (a minimal config sketch follows the table):
| Control | Implementation | Effect |
|---|---|---|
| Label allowlisting | Prometheus metric_relabel_configs drop unlisted labels | Prevents unbounded labels from entering the TSDB |
| Series limits per scrape | sample_limit in scrape config | Hard ceiling on new series per target |
| Active series monitoring | Alert on prometheus_tsdb_head_series | Early warning before index exhaustion |
| Label value bucketing | Replace high-cardinality values with ranges (e.g., latency buckets) | Bounded cardinality with minimal information loss |
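A minimal Prometheus scrape-config sketch implementing the first two controls. The job name, target, limit, and label list are illustrative assumptions; substitute whatever unbounded labels your services actually emit.

```yaml
scrape_configs:
  - job_name: api-service                   # hypothetical job name
    sample_limit: 5000                      # scrape is rejected if a target exposes more samples
    static_configs:
      - targets: ['api-service:9090']       # assumed target address
    metric_relabel_configs:
      # Drop known-unbounded labels before the samples reach the TSDB.
      - action: labeldrop
        regex: user_id|session_id|request_id
```

Strict allowlisting (keep only a named set of labels) uses a labelkeep action instead; the labeldrop shown here is the lighter-touch denylist variant.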
Query Optimization and Recording Rules
A Grafana panel scanning 30 days of raw data with no aggregation will DDoS your own monitoring stack. During an incident. When you need it most.
Three query disciplines prevent this. Aggregation functions must match the visualization. A 7-day chart doesn’t need 40 million raw points. Label matchers reduce scan scope early by filtering on specific services or endpoints before aggregation. Step intervals should align to the time range: 15-second steps for a 1-hour window, 5-minute steps for a 24-hour window, 1-hour steps for a 7-day view.
Recording rules are the underrated performance lever. They pre-compute expensive aggregations and store results as new time series. A dashboard panel that runs rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]) across 200 services on every page load can be replaced by a recording rule that computes and stores service:http_request_duration_seconds:mean5m continuously. Multi-second dashboard loads become sub-second responses. An afternoon of configuration for a huge improvement in usability.
Recording rule configuration example for Prometheus
Recording rules belong in a separate rule file loaded by Prometheus. Group related rules together and evaluate them at an interval matching your dashboard refresh rate (typically 30s or 1m).
```yaml
groups:
  - name: service_latency_aggregations
    interval: 30s
    rules:
      # Pre-aggregate to the service level so dashboards read one series per service.
      - record: service:http_request_duration_seconds:mean5m
        expr: |
          sum by (service) (rate(http_request_duration_seconds_sum[5m]))
            / sum by (service) (rate(http_request_duration_seconds_count[5m]))
      - record: service:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (service, le) (rate(http_request_duration_seconds_bucket[5m])))
```
Audit dashboards quarterly for panels querying more data than they display. Any panel running a raw PromQL query that touches more than 10,000 series is a candidate for a recording rule.
The full end-to-end architecture (a short remote_write wiring sketch follows the table):
| Architecture Layer | Component | Purpose | Scaling Consideration |
|---|---|---|---|
| Collection | Prometheus / OTel Collector | Scrape or receive metrics from services | Horizontal: shard by target. Vertical: increase scrape interval at scale |
| Short-term storage | Prometheus local TSDB | Fast queries for recent data (hours to days) | 2-week retention typical. Beyond that, use long-term storage |
| Long-term storage | VictoriaMetrics / Thanos / Cortex | Durable, compressed, queryable across months | Tiered: raw (7d) + 1min aggregate (90d) + hourly (2y) |
| Downsampling | VictoriaMetrics downsampling / Thanos compactor | Reduce resolution and storage for old data | Automatic. Configure retention tiers at setup, not after the bill arrives |
| Query | Grafana + PromQL / MetricsQL | Dashboards, alerting, ad-hoc exploration | Query performance degrades with cardinality. Pre-aggregate high-cardinality metrics |
| Alerting | Alertmanager / Grafana Alerting | Route alerts based on severity and ownership | Dedup, group, route. Avoid alert storms by grouping related metrics |
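As a wiring sketch for the collection-to-long-term-storage hop: Prometheus keeps its local TSDB for recent data and streams samples to the remote store over remote_write. The hostname, port, and queue settings below are illustrative assumptions, not tuned recommendations.

```yaml
# Prometheus remote_write to a long-term store (VictoriaMetrics shown as one example).
remote_write:
  - url: http://victoriametrics:8428/api/v1/write   # assumed service name and port
    queue_config:
      max_samples_per_send: 10000   # batch size per shard
      capacity: 20000               # in-memory buffer per shard
```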
Downsampling and Retention Tiers
10,000 active series at 15-second resolution produces 21 billion data points per year. Without retention policies, storage costs just keep growing. Three tiers solve this.
| Tier | Resolution | Retention | Use case | Storage reduction |
|---|---|---|---|---|
| Raw | 15 seconds | 15-30 days | Incident investigation, debugging | Baseline |
| Medium | 5 minutes | 6-12 months | Capacity planning, SLA reporting | ~20x reduction |
| Long-term | 1 hour | 2-5 years | Trend analysis, year-over-year comparison | ~240x reduction |
Configure downsampling before the storage bill arrives, not after. VictoriaMetrics supports it natively with -retentionPeriod and downsampling flags. Thanos uses a compactor against object storage. Data engineering teams should plan compute capacity for the downsampling pipeline at setup time, not as an afterthought when storage costs spike.
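A hedged sketch of the three tiers, expressed as Thanos compactor retention flags (shown as container args; the durations mirror the table above and are choices, not defaults):

```yaml
# Thanos compact retention flags matching the raw / medium / long-term tiers.
args:
  - compact
  - --wait
  - --objstore.config-file=/etc/thanos/objstore.yml   # assumed path to object storage config
  - --retention.resolution-raw=30d     # raw samples: incident investigation window
  - --retention.resolution-5m=180d     # 5-minute downsamples: capacity planning, SLA reporting
  - --retention.resolution-1h=2y       # 1-hour downsamples: multi-year trend analysis
```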
Alerting Architecture on Time Series Data
Alert on symptoms, not causes. An error rate exceeding 1% captures every failure mode, including ones nobody anticipated. A log-based alert for a specific error string captures only the failure someone predicted.
- SLO targets defined for each critical service (availability, latency percentiles)
- Recording rules pre-computing SLI metrics at 30-second evaluation intervals
- Alert routing configured by severity: page for critical, Slack for warning, dashboard for info (see the routing sketch after this list)
- At least 2 weeks of baseline data for burn-rate calculation
- Documented response procedures for each alerting rule
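A sketch of severity-based routing in Alertmanager, matching the page / Slack / dashboard split above. Receiver names are placeholders and their notification configs are omitted.

```yaml
route:
  receiver: slack-warnings              # default route (severity=warning)
  group_by: [alertname, service]        # group related alerts to avoid storms
  routes:
    - matchers: ['severity="page"']
      receiver: pagerduty-oncall
    - matchers: ['severity="info"']
      receiver: dashboard-only
receivers:
  - name: pagerduty-oncall              # pagerduty_configs omitted in this sketch
  - name: slack-warnings                # slack_configs omitted
  - name: dashboard-only                # no notifier: info alerts stay on dashboards
```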
Multi-burn-rate alerting gets rid of the false-positive noise that comes with simple threshold alerts. A 5-minute window detects acute spikes. A 1-hour window detects sustained degradation. Requiring both windows to breach before paging cuts noise way down. Google’s SRE workbook formalizes this approach, and it works in practice as well as on paper.
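A minimal sketch of that two-window pattern as a Prometheus alerting rule. It assumes a 99% availability SLO (so a 1% error budget) and reuses the http_requests_total metric from the cardinality example; the 14.4x fast-burn factor comes from the SRE workbook’s worked example.

```yaml
groups:
  - name: slo_burn_rate
    rules:
      - alert: ErrorBudgetFastBurn
        # Fire only when BOTH the 5m and 1h windows burn the 1% error budget
        # at more than 14.4x the sustainable rate (a 30-day budget gone in ~2 days).
        expr: |
          (
            sum(rate(http_requests_total{status_code=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m]))
          ) > (14.4 * 0.01)
          and
          (
            sum(rate(http_requests_total{status_code=~"5.."}[1h]))
              / sum(rate(http_requests_total[1h]))
          ) > (14.4 * 0.01)
        labels:
          severity: page
```

The workbook pairs this fast-burn rule with slower windows (for example 6 hours and 3 days) at lower factors for ticket-severity alerts.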
When the alert queue is always full, investigation quality collapses. Teams stop triaging and start ignoring. Every alert should have a documented response. If an alert fires repeatedly without requiring human action, either auto-remediate it or eliminate it. Alerts that train people to ignore alerts are worse than no alerts at all.
What the Industry Gets Wrong About Time Series Data
“PostgreSQL with TimescaleDB handles everything.” TimescaleDB extends PostgreSQL with time-partitioning and compression. It works well up to moderate scale. At high cardinality (millions of unique label combinations) and high write throughput (100K+ samples per second), purpose-built TSDBs like ClickHouse or VictoriaMetrics outperform it by a wide margin. The right tool depends on scale and workload characteristics, not familiarity.
“Store everything at full resolution.” Storing 15-second data for 2 years takes vastly more storage than downsampled equivalents. Raw resolution is valuable for the last 7-30 days when you’re debugging active issues. Beyond that, 5-minute averages serve every dashboard and alert use case. Retention policies with downsampling collapse storage costs without losing analytical value.
That metrics table devouring your database’s I/O budget? With the right TSDB, tiered retention, cardinality controls, and SLO-based alerting, your monitoring scales with the infrastructure instead of against it. Performance and capacity engineering turns metric thresholds into capacity decisions before they become incidents. The DBA stops sending pointed emails. The dashboards load in under a second. And the 576 million daily data points become a well-managed pipeline instead of a storage crisis.