Analytics Engineering: Why the Numbers Disagree
The quarterly business review is underway. The CFO’s dashboard shows one revenue figure for Q3. The CRO’s dashboard shows a number several percent lower. Both pull from Salesforce. The CFO’s Looker tile counts revenue at contract signature. The CRO’s Tableau worksheet counts it at first payment. Both are technically correct by their own definitions. Neither analyst knows the other’s logic exists.
The executive meeting that was supposed to discuss growth strategy turns into a 40-minute argument about which spreadsheet is right. Two chefs, same ingredients, two different soups. Neither one used a recipe.
You’ve been in this meeting. Everyone has. Revenue logic lives in a Looker calculated field. A Tableau formula. A Salesforce filter. Three analysts’ saved queries. No single source of truth. No version history. Nobody can answer “when did this number change and why?”
- Revenue defined in five tools means five numbers. A semantic layer collapses them to one. Not by convention. By architecture.
- Three model layers (staging, intermediate, mart) separate cleaning from business logic from serving. Skip a layer and you pay for it in debugging time later.
- dbt tests block deploys before bad data reaches dashboards. A not-null test on a join key catches null rates that quietly underreport revenue.
- A semantic layer collapses metric disagreements by design. One definition of revenue, consumed by every tool, updated through version-controlled PR review.
- Staff the function before you outgrow it. One analytics engineer manages 30-50 models. Beyond that, PR review turnaround and test coverage start slipping.
When your revenue definition lives in a versioned, tested dbt model called int_revenue_recognized, the debate about which number is right becomes impossible. One recipe. One soup.
The Modular Layer Architecture
Three dbt layers, each with a distinct job. A kitchen with stations.
Staging models are the prep station. Wash, peel, chop. One model per source table. Zero business logic. When Stripe changes charge_amt to amount, you update stg_stripe_charges and every downstream model keeps working. The rename is isolated to a single file. You relabeled the ingredient bin. Nobody’s recipe broke.
Intermediate models are the cooking station. This is where business logic lives: revenue recognition, LTV calculation, attribution, session construction. Version-controlled, testable, peer-reviewed. When the CFO decides revenue recognition shifts from contract signature to first payment, the change happens in one intermediate model. One PR. One review. Every downstream consumer reflects the new logic after merge. One recipe updated. Every plate changes.
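A sketch of what that intermediate model might look like, assuming a staging model named stg_stripe_charges like the one shown below (names and the exact recognition rule are illustrative):

```sql
-- models/intermediate/int_revenue_recognized.sql
-- One place for the recognition policy: revenue counts at first payment,
-- not contract signature. Changing the policy means changing this file.
with charges as (
    select * from {{ ref('stg_stripe_charges') }}
)

select
    customer_id,
    charge_id,
    charge_amount as recognized_revenue,
    charged_at as recognized_at
from charges
where status = 'succeeded'  -- recognize only settled payments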
Mart models are the plating station. Aggregated, denormalized, and built for BI tools. Pairing these with CI/CD pipelines means a PR that breaks the revenue model gets a red check in CI, not a panicked Slack thread two weeks later.
```sql
-- models/staging/stripe/stg_stripe_charges.sql
-- One source table, one staging model. No business logic.
with source as (
    select * from {{ source('stripe', 'charges') }}
),

renamed as (
    select
        id as charge_id,
        cast(amount as decimal(12, 2)) / 100 as charge_amount,
        currency,
        customer as customer_id,
        cast(created as timestamp) as charged_at,
        status
    from source
    where status != 'failed' -- Filter junk at the source
)

select * from renamed
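Completing the picture, a mart model plates the intermediate output for BI tools. This is a sketch; fct_monthly_revenue and the referenced columns are illustrative:

```sql
-- models/marts/finance/fct_monthly_revenue.sql
-- Aggregated and denormalized for dashboards. No new business logic.
with revenue as (
    select * from {{ ref('int_revenue_recognized') }}
)

select
    date_trunc('month', recognized_at) as revenue_month,
    sum(recognized_revenue) as total_revenue,
    count(distinct customer_id) as paying_customers
from revenue
group by 1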
Architecture without tests is an organized pantry with no quality control. Nice labels, bad ingredients.
Tests as Data Contracts
A failing test blocks model deployment. Broken data never reaches dashboards. This is the single highest-ROI investment in analytics engineering, and most teams skip it “because they’ll add tests later.” (They won’t.)
One production dashboard underreported revenue by 12% because a join key had 12% nulls. Not a rounding error. A board-level problem that persisted for weeks before anyone traced it to the join. Tasting the soup before serving would have caught it. A not_null test on that key catches it in CI, before any stakeholder sees the wrong number.
Business logic tests encode domain knowledge that outlives individual team members: sum(refunds) <= sum(charges) catches accounting errors. A “DAU cannot drop more than 30% day-over-day” test catches data pipeline failures before anyone asks why the business apparently fell off a cliff.
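In dbt, that refunds rule can be a singular test: a query that passes when it returns zero rows and fails when it finds violations (the model and column names here are illustrative):

```sql
-- tests/assert_refunds_not_exceeding_charges.sql
-- Returns a row per customer whose refunds exceed their charges.
-- Zero rows = test passes; any rows = deploy blocked.
select
    customer_id,
    sum(refund_amount) as total_refunds,
    sum(charge_amount) as total_charges
from {{ ref('int_customer_payments') }}
group by customer_id
having sum(refund_amount) > sum(charge_amount)
```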
Don’t: Add tests after the model is in production and someone reports a problem. By then, trust is already damaged. You don’t install the smoke detector after the fire.
Do: Add not_null, unique, and accepted_values tests to every model before the first production run. Start with your five most critical tables. The testing framework exists to prevent trust erosion, not repair it.
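A minimal schema file covering those generic tests might look like this, using dbt's built-in unique, not_null, and accepted_values tests (column names and accepted values are illustrative):

```yaml
# models/staging/stripe/_stripe__models.yml
version: 2

models:
  - name: stg_stripe_charges
    description: One row per Stripe charge, cleaned and renamed.
    columns:
      - name: charge_id
        description: Primary key. Stripe's charge identifier.
        tests:
          - unique
          - not_null
      - name: customer_id
        description: Join key to customers. Nulls here silently underreport revenue.
        tests:
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['succeeded', 'pending', 'refunded']
```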
- dbt project initialized with a clear models/ directory structure (staging, intermediate, mart)
- Source freshness checks configured for every raw table (loaded_at_field defined)
- CI pipeline running dbt test on every PR, blocking merge on failure
- At least 3 tests per model: unique on the primary key, not_null on the primary key, and one business-logic test
- YAML schema files next to SQL models containing column descriptions
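The CI item on that checklist can be a single workflow. This GitHub Actions sketch assumes a Snowflake adapter, credentials supplied via repository secrets, and a CI-specific profiles.yml in the repo; adapt all of those to your setup:

```yaml
# .github/workflows/dbt-ci.yml  (illustrative)
name: dbt CI
on: pull_request

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-snowflake   # swap in your warehouse adapter
      - run: dbt deps
      # dbt build runs models and tests together; a failing test fails the job,
      # which blocks the merge.
      - run: dbt build
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```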
The Semantic Layer: One Recipe, Every Kitchen
A semantic layer defines metrics once. It sits between the warehouse and BI tools. Instead of each tool cooking its own version of “monthly recurring revenue,” the definition lives in one place. Measure. Time dimension. Filters. Every tool reads the recipe book instead of improvising.
MetricFlow (built into dbt Cloud), Cube (open-source REST/GraphQL API), and LookML (Looker-native) all do this. The specific tool matters less than the principle: one official definition per metric. The Michelin recipe book, not five chefs’ handwritten notes.
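As one concrete flavor, here is a MetricFlow-style sketch of a revenue metric defined once in YAML. The model name, columns, and exact schema keys are illustrative, and MetricFlow's syntax varies by dbt version; treat this as the shape of the idea, not a copy-paste spec:

```yaml
# models/marts/finance/_finance__semantic.yml  (illustrative)
semantic_models:
  - name: revenue
    model: ref('fct_monthly_revenue')   # hypothetical mart
    defaults:
      agg_time_dimension: revenue_month
    entities:
      - name: customer
        type: primary
        expr: customer_id
    dimensions:
      - name: revenue_month
        type: time
        type_params:
          time_granularity: month
    measures:
      - name: total_revenue
        agg: sum

metrics:
  - name: monthly_recurring_revenue
    label: Monthly Recurring Revenue
    type: simple
    type_params:
      measure: total_revenue
```

Every BI tool then queries monthly_recurring_revenue through the semantic layer instead of re-deriving it in a calculated field.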
Without a semantic layer, that CFO revenue recognition change touches Looker. Tableau. Jupyter. Three analysts’ ad-hoc SQL scripts. Some updates get missed. They always do. With a semantic layer: update one definition, PR review, CI test, deploy. Every consumer reflects the new logic automatically. One change, every kitchen follows.
Data quality testing checks the ingredients. The semantic layer makes sure every chef follows the same recipe.
Documentation and Lineage That Stay Current
Column descriptions live in YAML files right next to the SQL models. dbt docs generate builds a searchable site with lineage graphs, test coverage, and freshness metrics. Docs can’t drift from the code because they live in the same commit. The recipe and the ingredients list are stapled together.
The lineage graph is the quiet superpower. When someone asks “which dashboards break if this source table changes?”, the answer is a graph query, not a manual hunt. Tracing which supplier’s tomatoes went into which dish. New analysts browse the docs site, trace lineage back to raw sources, and understand test coverage before writing their first query. Onboarding in hours instead of weeks.
Making lineage actionable with exposure tracking
dbt exposures let you declare downstream consumers (dashboards, reports, ML models) as first-class objects in the lineage graph. When a staging model changes, the lineage graph shows not just affected intermediate and mart models, but the specific Looker dashboards and Tableau workbooks that will show different numbers. Without exposures, the lineage stops at the mart layer and the “which dashboards break?” question is only half-answered.
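An exposure declaration is a few lines of YAML; the dashboard name, URL, and owner below are illustrative:

```yaml
# models/exposures.yml  (illustrative)
exposures:
  - name: executive_revenue_dashboard
    label: Executive Revenue Dashboard
    type: dashboard
    maturity: high
    url: https://looker.example.com/dashboards/42   # placeholder URL
    owner:
      name: Analytics Engineering
      email: analytics@example.com
    depends_on:
      - ref('fct_monthly_revenue')
```

With this in place, the lineage graph extends past the mart layer to the dashboard itself.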
When Analytics Engineering Is Overkill
A startup with 3 analysts and 10 dashboard queries doesn’t need dbt. Or a semantic layer. Or a three-layer architecture. A well-organized set of views in the warehouse and clear naming conventions carry you further than you’d think. Don’t build the commercial kitchen for a dinner party.
| Invest in analytics engineering | Skip it (for now) |
|---|---|
| Multiple BI tools querying the same metrics | Single BI tool, single analyst |
| 3+ teams consuming the same data | One team, one domain |
| Metric definitions disputed in meetings | Everyone agrees on the numbers |
| 50+ dashboard queries hitting raw tables | Under 20 queries, all straightforward |
| Regulatory or audit requirements on data lineage | Internal analytics only, low stakes |
The tipping point usually hits around 30-50 dbt models and 10+ regular data consumers. Below that, CI pipelines, testing frameworks, and semantic layers cost more in overhead than the problems they solve. Above it, the cost of skipping them grows fast.
What the Industry Gets Wrong About Analytics Engineering
“dbt is analytics engineering.” dbt is a tool. Analytics engineering is a discipline. Adopting dbt without changing how you think about data ownership, testing, and metric definitions produces version-controlled spaghetti instead of unversioned spaghetti. The folder structure is cleaner. The numbers are still wrong. Buying a KitchenAid doesn’t make you a baker.
“Analysts can do this on the side.” Analytics engineering is a full-time role. The failure mode is not subtle: test coverage drops, PR reviews pile up, and the backlog grows until the function is effectively abandoned. Part-time analytics engineering is like part-time hygiene. It shows.
“The semantic layer replaces the transformation layer.” A semantic layer defines what metrics mean. It doesn’t clean, join, or model the data underneath. Skip the staging and intermediate layers, define metrics directly on raw tables, and you’ve built a recipe book for unwashed ingredients.
Start here: add not_null and unique tests to your five most critical tables. Not refactoring into a three-layer architecture. Not building a semantic layer. Tests. They catch the problems that erode trust, and trust is the prerequisite for every other investment in this space. Taste the soup before you redesign the kitchen.

Team Structure and Scaling
Where to put them: embed analytics engineers in domain teams for business context, but have them report into a central function for standards and cross-team coordination. Embedded gives better context. Centralized keeps things consistent. You need both. Same reason a restaurant chain needs chefs who know the local menu and a head office that keeps the recipes consistent.
PR review needs two perspectives. Technical: is the SQL fast, are there enough tests? Business: does the logic match the company’s recognition policy? Skip the business review and you get models that run great but mean the wrong thing. Skip the technical review and you get correct definitions scanning terabytes when gigabytes would do. A perfect recipe that takes six hours to cook a sandwich.
Boundary with data engineers: data engineers own infrastructure (pipelines, warehouse, orchestration). Analytics engineers own the transformation layer (metric definitions, entity relationships, consumer access). Handoff point: source tables. Clean split, clear ownership. The supplier delivers ingredients. The chef turns them into meals.
Scaling signals: one analytics engineer manages 30-50 models comfortably. PR review exceeding 3 days, dropping test coverage, and a growing backlog of stakeholder requests are the early signs you need to hire.
That 40-minute argument about which spreadsheet is right? Impossible now. The CFO and CRO pull from the same dbt mart, with revenue recognition logic defined once in version-controlled SQL, tested on every PR, documented in the semantic layer. Two dashboards. One number. Same recipe. The meeting discusses growth strategy instead.