Analytics Engineering: Why the Numbers Disagree

Metasphere Engineering · 13 min read

The quarterly business review is underway. The CFO’s dashboard shows one revenue figure for Q3. The CRO’s dashboard shows a number roughly ten percent lower. Both pull from Salesforce. The CFO’s Looker tile counts revenue at contract signature. The CRO’s Tableau worksheet counts it at first payment. Both are technically correct by their own definitions. Neither analyst knows the other’s logic exists.

The executive meeting that was supposed to discuss growth strategy turns into a 40-minute argument about which spreadsheet is right. Two chefs, same ingredients, two different soups. Neither one used a recipe.

You’ve been in this meeting. Everyone has. Revenue logic lives in a Looker calculated field. A Tableau formula. A Salesforce filter. Three analysts’ saved queries. No single source of truth. No version history. Nobody can answer “when did this number change and why?”

Key takeaways
  • Revenue defined in five tools means five numbers. A semantic layer collapses them to one. Not by convention. By architecture.
  • Three model layers (staging, intermediate, mart) separate cleaning from business logic from serving. Skip a layer and you pay for it in debugging time later.
  • dbt tests block deploys before bad data reaches dashboards. A not-null test on a join key catches null rates that quietly underreport revenue.
  • A semantic layer collapses metric disagreements by design. One definition of revenue, consumed by every tool, updated through version-controlled PR review.
  • Staff the function before you outgrow it. One analytics engineer manages 30-50 models. Beyond that, PR review turnaround and test coverage start slipping.

When your revenue definition lives in a versioned, tested dbt model called int_revenue_recognized, the debate about which number is right becomes impossible. One recipe. One soup.

[Figure: Conflicting Dashboards to Single Source of Truth. The CFO’s Looker dashboard (revenue recognized at contract signature) shows Q3 revenue of 14.2M; the CRO’s Tableau dashboard (revenue recognized at first payment) shows 12.8M. Three dbt layers build bottom-up, from staging (stg_salesforce_opportunities, stg_stripe_charges: clean and cast) through intermediate (int_revenue_recognized: one tested definition) to mart (mart_monthly_revenue: aggregated, BI-ready), producing a single unified dashboard: one definition, one number, no debate.]

The Modular Layer Architecture

Three dbt layers, each with a distinct job. A kitchen with stations.

Staging models are the prep station. Wash, peel, chop. One model per source table. Zero business logic. When Stripe changes charge_amt to amount, you update stg_stripe_charges and every downstream model keeps working. The rename is isolated to a single file. You relabeled the ingredient bin. Nobody’s recipe broke.

Intermediate models are the cooking station. This is where business logic lives: revenue recognition, LTV calculation, attribution, session construction. Version-controlled, testable, peer-reviewed. When the CFO decides revenue recognition shifts from contract signature to first payment, the change happens in one intermediate model. One PR. One review. Every downstream consumer reflects the new logic after merge. One recipe updated. Every plate changes.

Mart models are the plating station. Aggregated, denormalized, and built for BI tools. Pairing these with CI/CD pipelines means a PR that breaks the revenue model gets a red check in CI, not a panicked Slack thread two weeks later.

-- models/staging/stripe/stg_stripe_charges.sql
-- One source table, one staging model. No business logic.
with source as (
    select * from {{ source('stripe', 'charges') }}
),

renamed as (
    select
        id as charge_id,
        cast(amount as decimal(12, 2)) / 100 as charge_amount,
        currency,
        customer as customer_id,
        cast(created as timestamp) as charged_at,
        status
    from source
    where status != 'failed'  -- Filter junk at the source
)

select * from renamed
[Figure: dbt Model Layers, From Chaos to Contracts. Raw sources (Postgres, S3, APIs; untyped, messy, untested) flow into staging (rename, cast, dedupe; 1:1 with source tables; freshness and not-null tests), then intermediate (business-logic joins, aggregations, reusable building blocks, not exposed to consumers), then marts (domain-owned, documented, contract-tested; dashboards connect here). Each layer has exactly one job.]
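
What does the cooking station look like in code? A minimal sketch, reusing the staging model above; stg_salesforce_opportunities, opportunity_id, and the exact recognition rule are illustrative assumptions, not a canonical implementation.

-- models/intermediate/int_revenue_recognized.sql
-- Illustrative sketch: one version-controlled home for revenue recognition
-- (here: recognized at first successful payment).
-- stg_salesforce_opportunities and opportunity_id are assumed names.
with charges as (
    select * from {{ ref('stg_stripe_charges') }}
),

opportunities as (
    select * from {{ ref('stg_salesforce_opportunities') }}
),

-- The recognition rule lives in this one CTE and nowhere else.
-- Switching back to contract signature is a one-file PR.
recognized as (
    select
        customer_id,
        min(charged_at) as recognized_at,       -- first successful payment
        sum(charge_amount) as recognized_amount
    from charges
    where status = 'succeeded'
    group by customer_id
)

select
    o.opportunity_id,
    r.customer_id,
    r.recognized_at,
    r.recognized_amount
from recognized r
inner join opportunities o
    on o.customer_id = r.customer_id

The point is not the specific SQL. It’s that the policy is one tested file with a Git history.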

Architecture without tests is an organized pantry with no quality control. Nice labels, bad ingredients.

Tests as Data Contracts

A failing test blocks model deployment. Broken data never reaches dashboards. This is the single highest-ROI investment in analytics engineering, and most teams skip it “because they’ll add tests later.” (They won’t.)

One production dashboard underreported revenue by 12% because a join key had 12% nulls. Not a rounding error. A board-level problem that persisted for weeks before anyone traced it to the join. Tasting the soup before serving would have caught it. A not_null test on that key catches it in CI, before any stakeholder sees the wrong number.

Business logic tests encode domain knowledge that outlives individual team members: sum(refunds) <= sum(charges) catches accounting errors. A “DAU cannot drop more than 30% day-over-day” test catches data pipeline failures before anyone asks why the business apparently fell off a cliff.
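
That refunds rule translates directly into a dbt singular test: a SQL file in tests/ that selects the rows violating the rule and fails the build if any come back. A sketch, assuming invented stg_stripe_refunds and refund_amount names:

-- tests/assert_refunds_do_not_exceed_charges.sql
-- Singular dbt test: any returned row is a failure that blocks the deploy.
-- stg_stripe_refunds and refund_amount are assumed names for this sketch.
with charges as (
    select customer_id, sum(charge_amount) as total_charged
    from {{ ref('stg_stripe_charges') }}
    group by customer_id
),

refunds as (
    select customer_id, sum(refund_amount) as total_refunded
    from {{ ref('stg_stripe_refunds') }}
    group by customer_id
)

select
    c.customer_id,
    c.total_charged,
    r.total_refunded
from refunds r
inner join charges c
    on c.customer_id = r.customer_id
where r.total_refunded > c.total_charged   -- the accounting error to catch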

Anti-pattern

Don’t: Add tests after the model is in production and someone reports a problem. By then, trust is already damaged. You don’t install the smoke detector after the fire.

Do: Add not_null, unique, and accepted_values tests to every model before the first production run. Start with your five most critical tables. The testing framework exists to prevent trust erosion, not repair it.
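
In dbt, those tests are a few lines of YAML in the schema file that sits next to the model. A sketch against the staging model above; the accepted status values are illustrative, not Stripe’s actual enum:

# models/staging/stripe/stg_stripe_charges.yml
version: 2

models:
  - name: stg_stripe_charges
    columns:
      - name: charge_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null   # the join-key test that catches silent underreporting
      - name: status
        tests:
          - accepted_values:
              values: ['succeeded', 'pending', 'refunded']   # illustrative list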

Prerequisites
  1. dbt project initialized with a clear models/ directory structure (staging, intermediate, mart)
  2. Source freshness checks configured for every raw table (loaded_at_field defined)
  3. CI pipeline running dbt test on every PR, blocking merge on failure (a minimal sketch follows this list)
  4. At least 3 tests per model: unique on primary key, not_null on primary key, and one business-logic test
  5. YAML schema files next to SQL models containing column descriptions
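
For item 3, a minimal GitHub Actions sketch. The adapter, secret name, and Python version are placeholders, and it assumes a profiles.yml in the repo that reads credentials from environment variables:

# .github/workflows/dbt-ci.yml -- illustrative placeholder names throughout
name: dbt-ci
on: pull_request

jobs:
  dbt-test:
    runs-on: ubuntu-latest
    env:
      SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}   # read by profiles.yml
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-snowflake   # swap in your warehouse adapter
      - run: dbt deps
      - run: dbt build   # runs models and tests; a non-zero exit turns the check red

Marking this check as required in branch protection is what turns a red check into a blocked merge.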

The Semantic Layer: One Recipe, Every Kitchen

A semantic layer defines metrics once. It sits between the warehouse and BI tools. Instead of each tool cooking its own version of “monthly recurring revenue,” the definition lives in one place. Measure. Time dimension. Filters. Every tool reads the recipe book instead of improvising.

MetricFlow (built into dbt Cloud), Cube (open-source REST/GraphQL API), and LookML (Looker-native) all do this. The specific tool matters less than the principle: one official definition per metric. The Michelin recipe book, not five chefs’ handwritten notes.

Without a semantic layer, that CFO revenue recognition change touches Looker. Tableau. Jupyter. Three analysts’ ad-hoc SQL scripts. Some updates get missed. They always do. With a semantic layer: update one definition, PR review, CI test, deploy. Every consumer reflects the new logic automatically. One change, every kitchen follows.
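
Here is roughly what one official definition looks like in MetricFlow’s YAML, building on the intermediate model sketched earlier. Treat it as a sketch of the shape, not exact syntax for every dbt version:

# models/marts/revenue_metrics.yml -- sketch of a MetricFlow-style definition
semantic_models:
  - name: revenue
    model: ref('int_revenue_recognized')
    defaults:
      agg_time_dimension: recognized_at
    entities:
      - name: customer
        type: primary
        expr: customer_id
    dimensions:
      - name: recognized_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: recognized_amount
        agg: sum

metrics:
  - name: revenue_recognized
    label: Revenue (recognized)
    type: simple
    type_params:
      measure: recognized_amount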

Data quality testing checks the ingredients. The semantic layer makes sure every chef follows the same recipe.

The dbt Cargo Cult

Adopting dbt without adopting the discipline it was built to support. The models compile. The tests don’t exist. The DAG looks impressive in demos. The metrics still disagree across dashboards because nobody enforced testing or used the semantic layer. A professional kitchen full of equipment. Everyone still cooking from memory.
[Figure: dbt CI/CD, Test Models Before They Reach Production. A PR that changes model SQL triggers the CI pipeline: dbt compile checks that the SQL compiles and schema refs resolve; dbt test runs unique, not_null, accepted_values, custom business rules, and freshness checks. Pass: merge and deploy. Fail: PR blocked. Bad data models get caught in PR review, not in the morning dashboard panic.]

Documentation and Lineage That Stay Current

Column descriptions live in YAML files right next to the SQL models. dbt docs generate builds a searchable site with lineage graphs, test coverage, and freshness metrics. Docs can’t drift from the code because they live in the same commit. The recipe and the ingredients list are stapled together.

[Figure: dbt Documentation, Generated From Code, Always Current. YAML schema files holding column descriptions and test definitions feed dbt docs generate, which builds the lineage graph and a searchable site, auto-published on every merge to main so it always matches prod. Analysts self-serve: search any model or column, trace lineage upstream and down. Documentation generated from code cannot drift; manual wikis always do.]

The lineage graph is the quiet superpower. When someone asks “which dashboards break if this source table changes?”, the answer is a graph query, not a manual hunt. Tracing which supplier’s tomatoes went into which dish. New analysts browse the docs site, trace lineage back to raw sources, and understand test coverage before writing their first query. Onboarding in hours instead of weeks.

Making lineage actionable with exposure tracking

dbt exposures let you declare downstream consumers (dashboards, reports, ML models) as first-class objects in the lineage graph. When a staging model changes, the lineage graph shows not just affected intermediate and mart models, but the specific Looker dashboards and Tableau workbooks that will show different numbers. Without exposures, the lineage stops at the mart layer and the “which dashboards break?” question is only half-answered.
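
A minimal exposure declaration; the dashboard, owner, and URL are invented for the sketch:

# models/marts/exposures.yml -- names and URL are placeholders
exposures:
  - name: executive_revenue_dashboard
    type: dashboard
    maturity: high
    url: https://looker.example.com/dashboards/42
    owner:
      name: Finance Analytics
      email: finance-data@example.com
    depends_on:
      - ref('mart_monthly_revenue')
    description: >
      CFO-facing revenue tile. Declared here so lineage queries reach
      past the mart layer to the dashboard itself.

Once declared, dbt ls --select stg_stripe_charges+ --resource-type exposure answers the “which dashboards break?” question from the command line.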

When Analytics Engineering Is Overkill

A startup with 3 analysts and 10 dashboard queries doesn’t need dbt. Or a semantic layer. Or a three-layer architecture. A well-organized set of views in the warehouse and clear naming conventions carry you further than you’d think. Don’t build the commercial kitchen for a dinner party.

Invest in analytics engineering | Skip it (for now)
Multiple BI tools querying the same metrics | Single BI tool, single analyst
3+ teams consuming the same data | One team, one domain
Metric definitions disputed in meetings | Everyone agrees on the numbers
50+ dashboard queries hitting raw tables | Under 20 queries, all straightforward
Regulatory or audit requirements on data lineage | Internal analytics only, low stakes

The tipping point usually hits around 30-50 dbt models and 10+ regular data consumers. Below that, CI pipelines, testing frameworks, and semantic layers cost more in overhead than the problems they solve. Above it, the cost of skipping them grows fast.

What the Industry Gets Wrong About Analytics Engineering

“dbt is analytics engineering.” dbt is a tool. Analytics engineering is a discipline. Adopting dbt without changing how you think about data ownership, testing, and metric definitions produces version-controlled spaghetti instead of unversioned spaghetti. The folder structure is cleaner. The numbers are still wrong. Buying a KitchenAid doesn’t make you a baker.

“Analysts can do this on the side.” Analytics engineering is a full-time role. The failure mode is not subtle: test coverage drops, PR reviews pile up, and the backlog grows until the function is effectively abandoned. Part-time analytics engineering is like part-time hygiene. It shows.

“The semantic layer replaces the transformation layer.” A semantic layer defines what metrics mean. It doesn’t clean, join, or model the data underneath. Skip the staging and intermediate layers, define metrics directly on raw tables, and you’ve built a recipe book for unwashed ingredients.

Metrics on Quicksand

A clean definition layer sitting on top of ungoverned, untested data. When the source schema changes, everything downstream breaks quietly. The semantic layer gives you consistency. It can’t give you correctness if the foundation is unreliable.

Our take

Start with tests, not models. The highest-ROI first step in analytics engineering is adding not_null and unique tests to your five most critical tables. Not refactoring into a three-layer architecture. Not building a semantic layer. Tests. They catch the problems that erode trust, and trust is the prerequisite for every other investment in this space. Taste the soup before you redesign the kitchen.

Team Structure and Scaling

Where to put them: embed analytics engineers in domain teams for business context, but have them report into a central function for standards and cross-team coordination. Embedded gives better context. Centralized keeps things consistent. You need both. Same reason a restaurant chain needs chefs who know the local menu and a head office that keeps the recipes consistent.

PR review needs two perspectives. Technical: is the SQL fast, are there enough tests? Business: does the logic match the company’s recognition policy? Skip the business review and you get models that run great but mean the wrong thing. Skip the technical review and you get correct definitions scanning terabytes when gigabytes would do. A perfect recipe that takes six hours to cook a sandwich.

Boundary with data engineers: data engineers own infrastructure (pipelines, warehouse, orchestration). Analytics engineers own the transformation layer (metric definitions, entity relationships, consumer access). Handoff point: source tables. Clean split, clear ownership. The supplier delivers ingredients. The chef turns them into meals.

Scaling signals: one analytics engineer manages 30-50 models comfortably. PR review exceeding 3 days, dropping test coverage, and a growing backlog of stakeholder requests are the early signs you need to hire.

That 40-minute argument about which spreadsheet is right? Impossible now. The CFO and CRO pull from the same dbt mart, with revenue recognition logic defined once in version-controlled SQL, tested on every PR, documented in the semantic layer. Two dashboards. One number. Same recipe. The meeting discusses growth strategy instead.

Your Dashboards Show Different Numbers

When analysts debate which number is right instead of what the number means, analytics is failing. Version-controlled SQL transformations with dbt, automated testing on every merge, and lineage tracking fix the root cause: nobody owns the definition.

Frequently Asked Questions

What is dbt and what problem does it solve?

dbt (data build tool) is a framework that lets data teams write SQL as version-controlled, testable, documented models in Git. It solves the problem where revenue logic lives in a BI tool formula, segmentation lives in a saved query, and nobody can answer ‘why did this number change?’ Teams that adopt dbt see metric disagreements drop fast because every transformation is auditable, testable, and owned.

What are the three standard dbt model layers?

Staging models clean raw source data with one model per source table and no business logic. Intermediate models apply business logic, joins, and calculations like revenue recognition and attribution. Mart models produce final, aggregated, dashboard-ready datasets. This separation means changing a business rule updates one intermediate model, not a dozen BI reports with different versions of the same formula.

What dbt tests should every model have?

At minimum: uniqueness and not-null tests on the primary key, plus referential integrity between related models and accepted_values on enum columns where they apply. Business-critical models add custom tests like ‘monthly revenue cannot drop more than 50% without a known cause.’ Mature projects average 3-5 tests per model, all running in CI to block deployment on failure.

How does dbt lineage help with impact analysis?

dbt builds a directed acyclic graph of all model dependencies from ref() references automatically. When you change a staging model, the lineage graph shows every downstream model, test, and dashboard that’s affected. Without lineage, figuring out ‘which dashboards break if this source changes?’ means relying on manually maintained docs that are almost always wrong.

When does a team need an analytics engineer vs. a data engineer?

Data engineers build infrastructure: pipelines, warehouses, orchestration, moving data around. Analytics engineers define what data means: metric definitions, entity relationships, segmentation logic, attribution models. Teams approaching 30-50 dbt models or 10+ data consumers need dedicated analytics engineers. They own the layer between raw data and business decisions, working mostly in SQL and dbt.