Developer Portals That Don't Go Stale
You launch your internal developer portal on a Monday. The demo goes great. Standing ovation from the VP. The service catalog shows all 200 services with ownership, dependency graphs, and API documentation. Engineers are genuinely impressed.
Three months later, 40 services have changed teams and nobody updated the catalog. The golden path templates still scaffold services with the old CI/CD pipeline that the platform team deprecated two sprints ago. TechDocs hasn’t been rebuilt since launch because the build step was never wired into CI. The CNCF Backstage project popularized this pattern, but the same failure mode hits every portal regardless of implementation.
Picture the building directory in the lobby. Beautiful on opening day. Three months later, 40 tenants have moved and nobody has updated the board. Someone looking for the finance team ends up at an empty office during an emergency. They call the receptionist instead. They never check the directory again.
- Catalog rot is the #1 portal failure mode. 40 services change teams in 3 months. Nobody updates ownership manually. The portal becomes a liability during incidents.
- Automated catalog sync from CI/CD metadata kills manual updates. If the service deploys, the catalog reflects the current owner, dependencies, and docs without anyone filing a ticket.
- Golden path templates must stay current with the platform team’s latest standards. Deprecated templates that still scaffold services create tech debt at creation time.
- TechDocs generation wired into CI means documentation updates on every merge. Manual doc rebuilds are doc rebuilds that never happen.
- Measure portal adoption weekly. If most engineers still aren’t using the portal at least once a month six months after launch, it’s failing. Investigate why. The cause is usually stale data or slow search.
The developer experience metrics that matter for portals are adoption and accuracy, not feature count.
Why Developer Portals Fail
Three failure modes kill portals. They tend to compound.
Catalog rot is the killer. Wrong ownership during an incident? The on-call engineer checks Slack instead. The directory says floor 3. Floor 3 is empty. Call the receptionist. That habit triggers a death spiral: nobody uses it because it’s wrong, nobody updates it because nobody uses it. Automation breaks the cycle. Entity descriptors live in each service’s repository (catalog-info.yaml) and update on every merge. No manual curation. No “please update the wiki” messages that everyone ignores.
Plugin bloat kills portals quietly. Platform teams install 20 plugins before the catalog is trusted, creating a portal that does everything poorly and nothing well. A building directory with 20 screens showing weather, stock prices, and cafeteria menus. Can’t find the tenant list. Engineers open it, get overwhelmed by dashboards they didn’t ask for, and close the tab.
Missing team buy-in is what you get when the portal is built in a vacuum. The most common pain point engineers actually face is “Who owns this service and how do I reach them during an incident?” If the portal answers that accurately and instantly, adoption follows. If it answers with a Confluence link from 2022, adoption dies. (The directory that links to a phone number the person hasn’t used since the office moved.)
Don’t: Launch with 15+ plugins, a custom theme, and integrations with every tool in your stack. This creates an overwhelming interface that distracts from the core value. Engineers visit once, see a wall of dashboards they didn’t ask for, and never return. The directory with so many screens that nobody can find the tenant list.
Do: Launch with three capabilities: catalog, golden paths, TechDocs. Get catalog completeness above 80% first. Add plugins one at a time, only when engineers request them. A request signals real need. A speculative install signals guessing.
Backstage Architecture
Backstage, originally built by Spotify and now a CNCF incubating project, is the dominant open source framework for building developer portals. Its architecture maps directly to the capabilities that make or break adoption.
The software catalog ingests entity descriptors from repositories and answers three questions: who owns this, what does it depend on, what’s its status. The building directory. The scaffolder generates production-ready services with CI/CD, observability, and security pre-configured. The move-in kit. TechDocs builds MkDocs sites from repository-local markdown, rebuilt on every merge via CI. The operating manual for each floor, updated every time someone moves a wall.
The Service Catalog Foundation
```yaml
# catalog-info.yaml - lives in each service repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-service
  description: Handles cart, payment, and order creation
  annotations:
    github.com/project-slug: myorg/checkout-service
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  lifecycle: production
  owner: group:commerce-team # Must match identity provider
  providesApis:
    - checkout-api
  dependsOn:
    - component:payment-service
    - resource:orders-database
```
Entity descriptors should be minimal and machine-verifiable. Ownership must map to your identity provider, not free-text. Dependencies reference catalog entities, not strings. API specs are OpenAPI or gRPC definitions, not Confluence links from 2022. The directory that links to a live phone number, not a post-it note.
Enforcement matters more than schema. A CI check that blocks merges when ownership is missing keeps the catalog accurate automatically. The directory that updates itself every time someone moves. Teams relying on voluntary compliance watch accuracy erode within months as services move and teams reorganize. Without exception. CI enforcement is what keeps accuracy high for the long term.
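A merge-blocking check of this kind can be small. The sketch below is one possible shape, not Backstage's own validation: it assumes the descriptor has already been parsed into a dictionary (for example with a YAML parser), and the names `validate_entity`, `KNOWN_GROUPS`, and `ENTITY_REF` are hypothetical. Requiring an explicit `group:` prefix on the owner is a policy choice made for this illustration.

```python
import re

# Hypothetical: in a real pipeline this set would be fetched from the identity provider.
KNOWN_GROUPS = {"commerce-team", "payments-team"}

# Matches refs such as "group:commerce-team" or "component:default/payment-service".
ENTITY_REF = re.compile(
    r"^(?P<kind>[a-z]+):(?:(?P<namespace>[\w.-]+)/)?(?P<name>[\w.-]+)$"
)


def validate_entity(entity: dict, known_groups: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the descriptor passes."""
    problems = []
    spec = entity.get("spec") or {}

    # Ownership must be a group reference that exists in the identity provider.
    owner = ENTITY_REF.match(str(spec.get("owner", "")))
    if owner is None or owner["kind"] != "group":
        problems.append("spec.owner must be a group reference, e.g. group:commerce-team")
    elif owner["name"] not in known_groups:
        problems.append(f"spec.owner refers to unknown group '{owner['name']}'")

    if not spec.get("lifecycle"):
        problems.append("spec.lifecycle is missing")

    # Dependencies must be entity references, not free-text strings.
    for ref in spec.get("dependsOn") or []:
        if ENTITY_REF.match(str(ref)) is None:
            problems.append(f"dependsOn entry '{ref}' is not an entity reference")

    return problems
```

A CI job would call `validate_entity` on the parsed `catalog-info.yaml`, print any problems, and exit non-zero when the list is not empty so the merge is blocked.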
Measure completeness weekly: valid owner, dependencies declared, API spec attached, lifecycle status set. Display the metric on the portal homepage. Social pressure works. The accuracy percentage on the lobby wall. Nobody wants to be the floor with missing listings.
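As a rough illustration, the four checks above can be turned into a single percentage. This is a minimal sketch with hypothetical function names. It assumes entities are already parsed into dictionaries, treats a `dependsOn` key (even an empty list) as "dependencies declared", and uses a non-empty `providesApis` as a stand-in for "API spec attached". Whether the owner actually exists in the identity provider is out of scope here.

```python
def completeness_checks(entity: dict) -> dict[str, bool]:
    """Evaluate the four completeness criteria for one catalog entity."""
    spec = entity.get("spec") or {}
    return {
        "owner": bool(spec.get("owner")),
        # Assumption: an explicit empty list counts as "declared, none needed".
        "dependencies": "dependsOn" in spec,
        # Assumption: a registered API is a proxy for an attached spec.
        "api_spec": bool(spec.get("providesApis")),
        "lifecycle": bool(spec.get("lifecycle")),
    }


def catalog_completeness(entities: list[dict]) -> float:
    """Percentage of entities that pass all four checks."""
    if not entities:
        return 0.0
    complete = sum(all(completeness_checks(e).values()) for e in entities)
    return 100.0 * complete / len(entities)
```

Keeping the per-entity checks separate from the aggregate makes it possible to show each team which field is missing, not only the overall number.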
- Identity provider integration configured (ownership maps to real groups, not free-text strings)
- CI pipeline modified to validate catalog-info.yaml on every merge
- catalog-info.yaml present in at least 80% of service repositories before launch
- On-call integration wired (PagerDuty, Opsgenie) so the catalog answers “who do I page?” during incidents
- Search indexed and tested with common queries (“who owns X”, “what depends on Y”)
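To sanity-check a “what depends on Y” query before launch, it can help to compute the expected answer independently from the descriptors. The sketch below builds a reverse index from `dependsOn` entries. It is an illustration only, not how Backstage resolves relations internally, and `build_dependents_index` is a hypothetical name.

```python
from collections import defaultdict


def build_dependents_index(entities: list[dict]) -> dict[str, set[str]]:
    """Map each dependency reference to the names of entities that declare it."""
    index = defaultdict(set)
    for entity in entities:
        name = entity["metadata"]["name"]
        for ref in (entity.get("spec") or {}).get("dependsOn") or []:
            index[ref].add(name)
    return dict(index)
```

Comparing this index with what the portal search returns for a handful of well-known services is a cheap pre-launch test.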
Golden Paths Through Templates
A golden path template scaffolds a production-ready service. Application code with health checks. Dockerfile with security best practices. CI/CD pipeline. Kubernetes manifests with resource limits and network policies. OpenTelemetry wired in. Documentation skeleton. Security scanning. All of it. An afternoon of senior engineer setup, automated for every team going forward. The move-in kit. Furniture, utilities, keys, mailbox, name on the directory. Done before day one.
The template must be opinionated. If the standard is Go with gRPC, scaffold Go with gRPC. Don’t offer 8 languages and 4 frameworks. Flexibility in golden paths defeats their purpose. The template’s value is that it encodes institutional knowledge about what “production-ready” means for your organization. When the template is faster than copying an existing service and hacking it, adoption happens naturally. When it’s slower or more rigid than the alternative, engineers route around it. (They always find a way around.)
Templates also rot. The golden path must track the platform team’s latest standards. A template that scaffolds services with a deprecated CI pipeline or an outdated base image creates tech debt at creation time. The move-in kit that installs last year’s phone system. Wire template validation into the same CI that validates everything else.
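One way to catch template rot in CI is to compare what a template scaffolds against a deprecation list the platform team maintains. The sketch below assumes the template's key choices (base image, pipeline version) have been extracted into a flat dictionary. The setting names, the values in `DEPRECATED`, and `find_stale_settings` are all hypothetical.

```python
# Hypothetical deprecation list maintained by the platform team.
DEPRECATED = {
    "base_image": {"internal/go-base:2023"},
    "ci_pipeline": {"pipeline-v1"},
}


def find_stale_settings(
    template_settings: dict[str, str], deprecated: dict[str, set[str]]
) -> list[str]:
    """Return one message per template setting that uses a deprecated value."""
    stale = []
    for key, value in sorted(template_settings.items()):
        if value in deprecated.get(key, set()):
            stale.append(f"{key}={value} is deprecated")
    return stale
```

Run against every template on each change to the deprecation list, a non-empty result fails the build, so a template cannot outlive the standard it encodes.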
TechDocs and API Documentation
When docs live in the same repository as the code, they show up in the same PR that changes behavior. A reviewer can say “you changed the retry logic but didn’t update the docs” in the same thread. That workflow change does more for documentation quality than any tooling investment. The operating manual lives on the floor it describes. Not in a filing cabinet at headquarters.
Wire TechDocs builds into CI so docs rebuild on every merge. The documentation that matters most is operational: deployment procedures, debugging guides, runbooks, SLOs. Generate API reference from OpenAPI specs automatically. A solid DevOps practice ensures docs are never more than one commit behind the code.
Make API registration part of the golden path template so every new service automatically shows its API documentation in the catalog. In an organization with 200 services, knowing which service exposes a capability and how to call it saves hours per integration. When the platform engineering investment includes API docs aggregation, it pays dividends across every team.
Alternatives to Backstage
| Platform | Best For | Catalog | Golden Paths | Ops Overhead | Cost |
|---|---|---|---|---|---|
| Backstage | Teams with engineering capacity | YAML + API | Software templates | High (self-hosted) | Free (OSS) |
| Port | No-code portal, fast setup | Integration-based | Blueprints | Low (managed) | Commercial |
| Cortex | Service maturity scoring | Auto-discovered | Scorecards | Low (managed) | Commercial |
| OpsLevel | Standards compliance | Checks-based | Automated verification | Low (managed) | Commercial |
Commercial alternatives trade customization for faster time-to-value. Port provides a no-code portal builder populated through integrations rather than YAML files. Cortex focuses on service maturity scorecards that drive real improvements when scoring criteria align with genuine priorities. OpsLevel validates standards compliance continuously through automated checks.
The decision hinges on one question: can you dedicate 1-2 engineers to ongoing portal maintenance? A half-maintained Backstage instance is worse than a fully managed commercial portal. Backstage gives unlimited customization. Customization without maintenance capacity produces a brittle portal that breaks on every upgrade. A custom-built directory that nobody updates vs. a managed one that handles itself.
Backstage Plugin Maintenance Cost in Practice
Backstage releases monthly. Each plugin must be tested against the new release. A portal with 20 plugins has 20 potential breakage points on every upgrade. The typical failure pattern: the team installs plugins enthusiastically in month one, falls behind on upgrades by month three, and is stuck on a version six months old by month six because the upgrade path touches too many plugins to tackle in a single sprint. Effective cloud-native tooling scales with integrations you actively maintain, not integrations you installed once and forgot about. Budget ongoing maintenance hours for every plugin, or don’t install it.
Measuring Portal Adoption
A portal without adoption metrics is a portal flying blind. Track four things weekly.
Daily active users (DAU) as a percentage of engineering headcount. Low adoption after six months means the portal isn’t solving real problems. High adoption means the portal has become critical infrastructure. The threshold between “nice to have” and “essential” is whether engineers check the directory before calling the receptionist.
Template usage rate: if most new services skip the golden path templates, they’re too rigid or don’t cover common patterns. When nearly all new services use the golden path, the templates are genuinely easier than the alternative. The move-in kit that tenants actually want to use.
Catalog accuracy: percentage of services with valid ownership, dependencies, and lifecycle status. Track weekly. Display the number on the portal homepage. Social pressure from a visible accuracy metric does more than any number of “please update your catalog entry” emails. The accuracy percentage on the lobby wall.
Incident resolution time before and after portal adoption. Track how often the portal is the first tool opened versus Slack during incidents. This metric justifies the investment to leadership in terms they care about. Measuring developer productivity through portal-specific metrics keeps the roadmap grounded in real needs.
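The first two metrics above reduce to simple ratios. A minimal sketch follows, assuming the set of active portal users comes from access logs and the service counts from the scaffolder or repository creation history. All names here are hypothetical. Active users are intersected with the engineering roster so that visits from outside engineering do not inflate the figure.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal; 0.0 when there is nothing to measure."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def adoption_report(
    active_users: set[str],
    engineers: set[str],
    new_services: int,
    templated_services: int,
) -> dict[str, float]:
    """Summarize portal adoption and golden path usage for one reporting period."""
    return {
        # Only count active users who are on the engineering roster.
        "active_user_pct": percent(len(active_users & engineers), len(engineers)),
        "template_usage_pct": percent(templated_services, new_services),
    }
```

Running the same report every week with the same definitions matters more than the exact thresholds, because the trend is what shows whether the portal is gaining or losing trust.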
What the Industry Gets Wrong About Developer Portals
“Install Backstage and you have a portal.” Backstage is a framework, not a product. Out of the box, the catalog is empty, the templates are examples, and the documentation is a placeholder. A portal requires catalog curation, template maintenance, and adoption measurement. The installation is day one. The actual work is the next 12 months. Hanging a blank directory board in the lobby and calling it done.
“More plugins mean a better portal.” Teams that add 15 plugins before getting the catalog right have a portal that does many things poorly. A portal with an accurate catalog, current documentation, and working golden paths but zero plugins outperforms one with 20 plugins and stale data. Every time. A directory with 20 screens and wrong tenant data vs. a simple board with correct listings.
Those 40 services with wrong ownership? Wire ownership to your identity provider. Build TechDocs into CI. Make accuracy automatic instead of aspirational. Catalog first. Enforce in CI. Measure weekly. Golden paths once the catalog reaches 80% completeness. Plugins when engineers ask. Every other sequence produces the same stale portal it was supposed to replace. Same lobby. Updated directory. Every listing correct. People stop calling the receptionist.