Developer Portals That Don't Go Stale

Aug 14, 2025 Metasphere Engineering 14 min read

You launch your internal developer portal on a Monday. The demo goes great. Standing ovation from the VP. The service catalog shows all 200 services with ownership, dependency graphs, and API documentation. Engineers are genuinely impressed.

Three months later, 40 services have changed teams and nobody updated the catalog. The golden path templates still scaffold services with the old CI/CD pipeline that the platform team deprecated two sprints ago. TechDocs hasn’t been rebuilt since launch because the build step was never wired into CI. The CNCF Backstage project popularized this pattern, but the same failure mode hits every portal regardless of implementation.

The building directory in the lobby. Beautiful on opening day. Three months later, 40 tenants moved and nobody updated the board. Someone looking for the finance team ends up at an empty office during an emergency. They call the receptionist instead. They never check the directory again.

Key takeaways

Catalog rot is the #1 portal failure mode. 40 services change teams in 3 months. Nobody updates ownership manually. The portal becomes a liability during incidents.
Automated catalog sync from CI/CD metadata kills manual updates. If the service deploys, the catalog reflects the current owner, dependencies, and docs without anyone filing a ticket.
Golden path templates must stay current with the platform team’s latest standards. Deprecated templates that still scaffold services create tech debt at creation time.
TechDocs generation wired into CI means documentation updates on every merge. Manual doc rebuilds are doc rebuilds that never happen.
Measure portal adoption weekly. If most engineers still aren’t using the portal at least monthly six months after launch, it’s failing. Investigate why. The usual causes are stale data or slow search.

The developer experience metrics that matter for portals are adoption and accuracy, not feature count.

Why Developer Portals Fail

Three failure modes kill portals. They tend to compound.

Catalog rot is the killer. Wrong ownership during an incident? The on-call engineer checks Slack instead. The directory says floor 3. Floor 3 is empty. Call the receptionist. That habit triggers a death spiral: nobody uses it because it’s wrong, nobody updates it because nobody uses it. Automation breaks the cycle. Entity descriptors live in each service’s repository (catalog-info.yaml) and update on every merge. No manual curation. No “please update the wiki” messages that everyone ignores.

Plugin bloat kills portals quietly. Platform teams install 20 plugins before the catalog is trusted, creating a portal that does everything poorly and nothing well. A building directory with 20 screens showing weather, stock prices, and cafeteria menus. Can’t find the tenant list. Engineers open it, get overwhelmed by dashboards they didn’t ask for, and close the tab.

No team buy-in happens when the portal is built in a vacuum. The most common actual pain point engineers face: “Who owns this service and how do I reach them during an incident?” If the portal answers that accurately and instantly, adoption follows. If it answers with a Confluence link from 2022, adoption dies. (The directory that links to a phone number the person hasn’t used since the office moved.)

Anti-pattern

Don’t: Launch with 15+ plugins, a custom theme, and integrations with every tool in your stack. This creates an overwhelming interface that distracts from the core value. Engineers visit once, see a wall of dashboards they didn’t ask for, and never return. The directory with so many screens that nobody can find the tenant list.

Do: Launch with three capabilities: catalog, golden paths, TechDocs. Get catalog completeness above 80% first. Add plugins one at a time, only when engineers request them. A request signals real need. A speculative install signals guessing.

Backstage Architecture

Backstage, originally built by Spotify and now a CNCF incubating project, is the dominant open source framework. Its architecture maps directly to the capabilities that make or break adoption.

The software catalog ingests entity descriptors from repositories and answers three questions: who owns this, what does it depend on, what’s its status. The building directory. The scaffolder generates production-ready services with CI/CD, observability, and security pre-configured. The move-in kit. TechDocs builds MkDocs sites from repository-local markdown, rebuilt on every merge via CI. The operating manual for each floor, updated every time someone moves a wall.

The Service Catalog Foundation

# catalog-info.yaml - lives in each service repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-service
  description: Handles cart, payment, and order creation
  annotations:
    github.com/project-slug: myorg/checkout-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: group:commerce-team  # Must match identity provider
  providesApis:
    - checkout-api
  dependsOn:
    - component:payment-service
    - resource:orders-database

Entity descriptors should be minimal and machine-verifiable. Ownership must map to your identity provider, not free-text. Dependencies reference catalog entities, not strings. API specs are OpenAPI or gRPC definitions, not Confluence links from 2022. The directory that links to a live phone number, not a post-it note.

Enforcement matters more than schema. A CI check that blocks merges when ownership is missing keeps the catalog accurate automatically. The directory that updates itself every time someone moves. Teams relying on voluntary compliance watch accuracy erode within months as services move and teams reorganize. Without exception. CI enforcement is what keeps accuracy high for the long term.
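
A merge-blocking check of this kind can be quite small. The sketch below is one possible shape, not a Backstage API: `validate_descriptor` and `exit_code_for` are hypothetical helper names, the descriptor is assumed to have already been parsed from catalog-info.yaml into a dictionary by a YAML library, and the set of known groups is assumed to come from your identity provider. Requiring the owner to be a group is also an assumption about your policy.

```python
from typing import Any


def validate_descriptor(entity: dict[str, Any], known_groups: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the descriptor passes."""
    problems: list[str] = []
    metadata = entity.get("metadata") or {}
    spec = entity.get("spec") or {}

    if not metadata.get("name"):
        problems.append("metadata.name is missing")

    owner = str(spec.get("owner") or "")
    if not owner:
        problems.append("spec.owner is missing")
    else:
        # Accept "group:name", "group:namespace/name", or a bare "name".
        kind, _, ref = owner.rpartition(":")
        group = ref.split("/")[-1]
        if kind and kind.lower() != "group":
            problems.append(f"spec.owner '{owner}' must reference a group")
        elif group not in known_groups:
            problems.append(f"spec.owner '{owner}' is not a group in the identity provider")

    if not spec.get("lifecycle"):
        problems.append("spec.lifecycle is missing")
    return problems


def exit_code_for(problems: list[str]) -> int:
    """Non-zero exit code makes the CI job fail, which blocks the merge."""
    return 1 if problems else 0


# Example input mirroring the checkout-service descriptor above (already parsed).
entity = {
    "metadata": {"name": "checkout-service"},
    "spec": {"owner": "group:commerce-team", "lifecycle": "production"},
}
problems = validate_descriptor(entity, known_groups={"commerce-team"})
for problem in problems:
    print(problem)
print("exit code:", exit_code_for(problems))
```

In a real pipeline the job would exit with that code, so a descriptor with a missing or unknown owner never reaches the main branch.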

Measure completeness weekly: valid owner, dependencies declared, API spec attached, lifecycle status set. Display the metric on the portal homepage. Social pressure works. The accuracy percentage on the lobby wall. Nobody wants to be the floor with missing listings.
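
As a rough illustration of that weekly number, the sketch below scores each entity against the four criteria just listed and reports the share that pass all of them. The function names are hypothetical, and two interpretations are assumptions rather than anything the catalog format mandates: "dependencies declared" is read as the `dependsOn` key being present (even if empty), and "API spec attached" as a non-empty `providesApis` list.

```python
from typing import Any


def completeness_checks(entity: dict[str, Any], known_groups: set[str]) -> dict[str, bool]:
    """Evaluate the four completeness criteria for one parsed entity descriptor."""
    spec = entity.get("spec") or {}
    # Strip an optional "group:" prefix and namespace to get the bare group name.
    owner = str(spec.get("owner") or "").split(":")[-1].split("/")[-1]
    return {
        "valid_owner": owner in known_groups,
        "dependencies_declared": "dependsOn" in spec,
        "api_spec_attached": bool(spec.get("providesApis")),
        "lifecycle_set": bool(spec.get("lifecycle")),
    }


def catalog_completeness(entities: list[dict[str, Any]], known_groups: set[str]) -> float:
    """Percentage of entities that satisfy every criterion."""
    if not entities:
        return 0.0
    complete = sum(
        all(completeness_checks(entity, known_groups).values()) for entity in entities
    )
    return 100.0 * complete / len(entities)


catalog = [
    {"spec": {"owner": "group:commerce-team", "dependsOn": [],
              "providesApis": ["checkout-api"], "lifecycle": "production"}},
    {"spec": {"owner": "group:commerce-team", "lifecycle": "production"}},
]
print(f"completeness: {catalog_completeness(catalog, {'commerce-team'}):.0f}%")
```

Returning the per-criterion results separately makes it easy to show teams which field is missing rather than only a pass or fail.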

Prerequisites

Identity provider integration configured (ownership maps to real groups, not free-text strings)
CI pipeline modified to validate catalog-info.yaml on every merge
catalog-info.yaml present in at least 80% of service repositories before launch
On-call integration wired (PagerDuty, Opsgenie) so the catalog answers “who do I page?” during incidents
Search indexed and tested with common queries (“who owns X”, “what depends on Y”)

Golden Paths Through Templates

A golden path template scaffolds a production-ready service. Application code with health checks. Dockerfile with security best practices. CI/CD pipeline. Kubernetes manifests with resource limits and network policies. OpenTelemetry wired in. Documentation skeleton. Security scanning. All of it. An afternoon of senior engineer setup, automated for every team going forward. The move-in kit. Furniture, utilities, keys, mailbox, name on the directory. Done before day one.

The template must be opinionated. If the standard is Go with gRPC, scaffold Go with gRPC. Don’t offer 8 languages and 4 frameworks. Flexibility in golden paths defeats their purpose. The template’s value is that it encodes institutional knowledge about what “production-ready” means for your organization. When the template is faster than copying an existing service and hacking it, adoption happens naturally. When it’s slower or more rigid than the alternative, engineers route around it. (They always find a way around.)

Templates also rot. The golden path must track the platform team’s latest standards. A template that scaffolds services with a deprecated CI pipeline or an outdated base image creates tech debt at creation time. The move-in kit that installs last year’s phone system. Wire template validation into the same CI that validates everything else.
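
One simple form of template validation is a scan for references the platform team has already deprecated. The sketch below assumes the template files have been read into a dictionary of path to content; the function name, the file paths, and the deprecated markers (`ci/legacy-pipeline.yml`, `golang:1.19`) are all hypothetical placeholders for whatever your platform team has retired.

```python
def find_deprecated_references(
    template_files: dict[str, str],
    deprecated_markers: dict[str, str],
) -> list[str]:
    """Return one finding per line that mentions a deprecated marker."""
    findings: list[str] = []
    for path, content in sorted(template_files.items()):
        for line_no, line in enumerate(content.splitlines(), start=1):
            for marker, advice in deprecated_markers.items():
                if marker in line:
                    findings.append(f"{path}:{line_no}: uses '{marker}' ({advice})")
    return findings


# Hypothetical template contents and deprecation list, for illustration only.
template_files = {
    "skeleton/Dockerfile": "FROM golang:1.19\nCOPY . /src\n",
    "skeleton/.ci/pipeline.yml": "include: ci/current-pipeline.yml\n",
}
deprecated_markers = {
    "golang:1.19": "use the current approved base image",
    "ci/legacy-pipeline.yml": "use ci/current-pipeline.yml",
}
for finding in find_deprecated_references(template_files, deprecated_markers):
    print(finding)
```

A string scan only catches known-bad references. A stronger check would also scaffold a throwaway service from the template and run its pipeline, which this sketch does not attempt.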

TechDocs and API Documentation

When docs live in the same repository as the code, they show up in the same PR that changes behavior. A reviewer can say “you changed the retry logic but didn’t update the docs” in the same thread. That workflow change does more for documentation quality than any tooling investment. The operating manual lives on the floor it describes. Not in a filing cabinet at headquarters.

Wire TechDocs builds into CI so docs rebuild on every merge. The documentation that matters most is operational: deployment procedures, debugging guides, runbooks, SLOs. Generate API reference from OpenAPI specs automatically. A solid DevOps practice ensures docs are never more than one commit behind the code.

Make API registration part of the golden path template so every new service automatically shows its API documentation in the catalog. In an organization with 200 services, knowing which service exposes a capability and how to call it saves hours per integration. When the platform engineering investment includes API docs aggregation, it pays dividends across every team.

Alternatives to Backstage

Platform	Best For	Catalog	Golden Paths	Ops Overhead	Cost
Backstage	Teams with engineering capacity	YAML + API	Software templates	High (self-hosted)	Free (OSS)
Port	No-code portal, fast setup	Integration-based	Blueprints	Low (managed)	Commercial
Cortex	Service maturity scoring	Auto-discovered	Scorecards	Low (managed)	Commercial
OpsLevel	Standards compliance	Checks-based	Automated verification	Low (managed)	Commercial

Commercial alternatives trade customization for faster time-to-value. Port provides a no-code portal builder populated through integrations rather than YAML files. Cortex focuses on service maturity scorecards that drive real improvements when scoring criteria align with genuine priorities. OpsLevel validates standards compliance continuously through automated checks.

The decision hinges on one question: can you dedicate 1-2 engineers to ongoing portal maintenance? A half-maintained Backstage instance is worse than a fully managed commercial portal. Backstage gives unlimited customization. Customization without maintenance capacity produces a brittle portal that breaks on every upgrade. A custom-built directory that nobody updates vs. a managed one that handles itself.

Backstage plugin maintenance cost in practice

Backstage releases monthly. Each plugin must be tested against the new release. A portal with 20 plugins has 20 potential breakage points on every upgrade. The typical failure pattern: the team installs plugins enthusiastically in month one, falls behind on upgrades by month three, and is stuck on a version six months old by month six because the upgrade path touches too many plugins to tackle in a single sprint. Effective cloud-native tooling scales with integrations you actively maintain, not integrations you installed once and forgot about. Budget ongoing maintenance hours for every plugin, or don’t install it.

Measuring Portal Adoption

A portal without adoption metrics is a portal flying blind. Track four things weekly.

DAU as a percentage of engineering headcount. Low adoption after six months means the portal isn’t solving real problems. High adoption means the portal has become critical infrastructure. The threshold between “nice to have” and “essential” is whether engineers check the directory before calling the receptionist.

Template usage rate: if most new services skip the golden path templates, they’re too rigid or don’t cover common patterns. When nearly all new services use the golden path, the templates are genuinely easier than the alternative. The move-in kit that tenants actually want to use.

Catalog accuracy: percentage of services with valid ownership, dependencies, and lifecycle status. Track weekly. Display the number on the portal homepage. Social pressure from a visible accuracy metric does more than any number of “please update your catalog entry” emails. The accuracy percentage on the lobby wall.

Incident resolution time before and after portal adoption. Track how often the portal is the first tool opened versus Slack during incidents. This metric justifies the investment to leadership in terms they care about. Measuring developer productivity through portal-specific metrics keeps the roadmap grounded in real needs.
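
The first two metrics above reduce to simple ratios once the raw counts are available. The sketch below assumes a weekly snapshot assembled from portal analytics and repository creation events; the `WeeklySnapshot` type, its field names, and the sample numbers are hypothetical, and no pass or fail threshold is implied.

```python
from dataclasses import dataclass


@dataclass
class WeeklySnapshot:
    engineering_headcount: int
    avg_daily_active_users: float  # mean of the daily active user counts for the week
    new_services: int
    new_services_from_template: int


def dau_percentage(snapshot: WeeklySnapshot) -> float:
    """Daily active users as a percentage of engineering headcount."""
    if snapshot.engineering_headcount <= 0:
        return 0.0
    return 100.0 * snapshot.avg_daily_active_users / snapshot.engineering_headcount


def template_usage_rate(snapshot: WeeklySnapshot) -> float:
    """Percentage of new services created from a golden path template."""
    if snapshot.new_services <= 0:
        return 0.0
    return 100.0 * snapshot.new_services_from_template / snapshot.new_services


week = WeeklySnapshot(
    engineering_headcount=400,
    avg_daily_active_users=120,
    new_services=5,
    new_services_from_template=4,
)
print(f"DAU: {dau_percentage(week):.1f}% of headcount")
print(f"Template usage: {template_usage_rate(week):.1f}% of new services")
```

Tracking the same two ratios every week matters more than the absolute values, since the trend shows whether a change to the catalog or templates moved adoption.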

The Catalog Rot Curve: the rate at which service catalog accuracy degrades after launch. Without automated sync from CI/CD metadata, catalog accuracy erodes within months as teams reorganize, services move, and dependencies evolve. The directory losing one listing per week. Once accuracy drops enough that engineers find wrong data during routine lookups, they stop trusting the portal. Drop further and they stop opening it entirely. The curve is steep and the recovery is painful because rebuilding trust requires sustained accuracy over weeks, not a single cleanup sprint.
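
To put rough numbers on the curve, the sketch below projects accuracy from the opening scenario of 40 out of 200 services changing teams in three months. It is a deliberately crude model under stated assumptions: three months is treated as 13 weeks, the change rate is assumed constant, each change is assumed to hit a different service, and nothing is corrected manually. The function name is hypothetical.

```python
def projected_accuracy(
    total_services: int,
    changed_in_window: int,
    window_weeks: int,
    weeks_elapsed: int,
) -> float:
    """Projected percentage of accurate entries with no automated sync."""
    if total_services <= 0 or window_weeks <= 0:
        raise ValueError("total_services and window_weeks must be positive")
    # Linear extrapolation of the observed change rate, capped at the catalog size.
    stale = min(total_services, changed_in_window * weeks_elapsed / window_weeks)
    return 100.0 * (total_services - stale) / total_services


for weeks in (0, 13, 26, 52):
    accuracy = projected_accuracy(200, 40, 13, weeks)
    print(f"week {weeks:2d}: {accuracy:.0f}% accurate")
```

Under these assumptions, one lookup in five already returns a wrong owner at the three-month mark, which is enough for engineers to hit bad data during routine use.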

What the Industry Gets Wrong About Developer Portals

“Install Backstage and you have a portal.” Backstage is a framework, not a product. Out of the box, the catalog is empty, the templates are examples, and the documentation is a placeholder. A portal requires catalog curation, template maintenance, and adoption measurement. The installation is day one. The actual work is the next 12 months. Hanging a blank directory board in the lobby and calling it done.

“More plugins mean a better portal.” Teams that add 15 plugins before getting the catalog right have a portal that does many things poorly. A portal with an accurate catalog, current documentation, and working golden paths but zero plugins outperforms one with 20 plugins and stale data. Every time. A directory with 20 screens and wrong tenant data vs. a simple board with correct listings.

Our take: Ship the catalog first. Not templates. Not plugins. Not documentation. The catalog with accurate ownership and dependency data. Wire ownership to your identity provider so it can’t go stale through manual neglect. Validate accuracy weekly. Once the catalog is trusted (80%+ completeness, verified by engineers actually using it during incidents), add golden paths. Everything else comes after. The sequence matters because trust is the foundation. No amount of plugin polish compensates for wrong ownership data at 2 AM during an outage. Get the directory right. Everything else is decoration until the directory works.

Those 40 services with wrong ownership? Wire ownership to your identity provider. Build TechDocs into CI. Make accuracy automatic instead of aspirational. Catalog first. Enforce in CI. Measure weekly. Golden paths at 80% completeness. Plugins when engineers ask. Every other sequence produces the same stale portal it was supposed to replace. Same lobby. Updated directory. Every listing correct. People stop calling the receptionist.

Frequently Asked Questions

How long does a Backstage deployment take to reach production readiness?

A minimal Backstage deployment with a software catalog and basic CI/CD integration takes 4-6 weeks. Reaching production readiness with golden path templates, TechDocs, and 3-5 essential plugins typically takes 3-4 months. Teams that try to launch with 15+ plugins before the catalog is trusted consistently fail. Start with the catalog, get ownership data above 80% completeness, then add capabilities one at a time.

What is catalog completeness and why does it matter?

Catalog completeness measures the share of services with accurate entity descriptors including ownership, dependencies, API specs, and lifecycle status. When completeness is low, engineers don’t trust the catalog and revert to Slack for service discovery. Once completeness is high enough that engineers consistently find accurate data, the catalog becomes the default reference for incident response, onboarding, and dependency analysis. Measure it weekly and treat it like an SLO.

How do golden path templates differ from project scaffolding?

Project scaffolding generates boilerplate code. Golden path templates generate a production-ready service with CI/CD pipelines, observability, security scanning, network policies, and documentation pre-configured. Scaffolding gives you a starting point. A golden path gives you a service that meets production standards from its first commit. Teams using golden paths deploy new services to production in a fraction of the time it takes teams starting from scratch.

Should we build a custom portal or adopt Backstage?

Build custom only if your needs are genuinely unique and you can commit 2-3 full-time engineers to portal development for the long term. Backstage has 150+ open source plugins and a large contributor community. Commercial alternatives like Port, Cortex, and OpsLevel offer faster time-to-value with less engineering investment. Most organizations underestimate the maintenance cost of custom portals. A custom portal that isn’t actively maintained becomes the same stale documentation it was supposed to replace.

What metrics indicate a developer portal is actually being adopted?

Track daily active users as a percentage of engineering headcount. Low DAU after 6 months means the portal isn’t solving real problems. Track template usage: if most new services bypass golden path templates, the templates either don’t cover common patterns or are too rigid. Track catalog search volume versus Slack channel questions about service ownership. When Slack questions drop noticeably for topics the catalog covers, the portal is working.