Internal Developer Portals: Backstage and Beyond

Aug 14, 2025 Metasphere Engineering

You launch your internal developer portal on a Monday. The demo goes great. Standing ovation from the VP. The service catalog shows all 200 services with ownership, dependency graphs, and API documentation. Engineers are genuinely impressed. Three months later, 40 services have changed teams and nobody updated the catalog. The golden path templates still scaffold services with the old CI/CD pipeline that the platform team deprecated two sprints ago. TechDocs has not been rebuilt since launch because the build step was never wired into CI. The portal that was supposed to be the single source of truth is now the single source of stale data that nobody trusts. Engineers open it, see wrong ownership data during an incident, and never come back.

This is the default outcome. Not the exception. It happens because portals get treated as a one-time deployment rather than a product that needs continuous curation and measurement. The developer experience metrics that matter most for portals are adoption and accuracy, not feature count.

Why Developer Portals Fail

Portal failures cluster around three root causes. None of them are technical.

Catalog rot is the killer. A software catalog is only as valuable as its accuracy. The moment ownership data is wrong during a 2 AM incident, the on-call engineer learns to ignore the catalog and check Slack instead. Once that habit forms, the catalog enters a death spiral: nobody uses it because it is wrong, and nobody updates it because nobody uses it. The fix is automation. Entity descriptors live in each service’s repository (catalog-info.yaml) and update the catalog on every merge to main. Ownership changes propagate automatically when teams update the file in their repo. No manual curation. No “please update the wiki” Slack messages that everyone ignores.

Plugin bloat kills portals in a subtler way. The Backstage plugin ecosystem has 150+ plugins. Platform teams install 20 of them before the catalog is trusted, creating a dashboard that does everything poorly instead of a catalog that does one thing well. Engineers open the portal, see a wall of half-configured widgets, and close the tab. This happens at organization after organization. Start with three capabilities: service catalog, golden path templates, and TechDocs. Add plugins only when engineers request them and the catalog completeness is above 80%.

Lack of team buy-in is what you get when the portal is built in a vacuum. If the platform team ships a portal without interviewing engineering teams about their actual pain points, the portal solves problems that do not exist while ignoring the ones that do. The most common actual pain point is dead simple: “I do not know who owns this service and how to contact them during an incident.” If the portal answers that question accurately and instantly, adoption follows. Everything else is secondary.

Backstage Architecture

So if the failure modes are clear, what do you actually build with? Backstage, originally built by Spotify and now a CNCF incubating project, has become the dominant open source framework for internal developer portals. Understanding its architecture matters because the components map directly to the capabilities that make or break portal adoption.

The software catalog is the foundation. Everything else depends on it. The catalog ingests entity descriptors (catalog-info.yaml files) from service repositories, normalizes them into a unified data model, and exposes them through the frontend and API. Entity types include Component (services), API (specs), Resource (databases, queues), and Group/User (teams and people). The catalog becomes valuable when it answers three questions without leaving the browser: who owns this service, what does it depend on, and what is its current status.
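To make the data model concrete, here is a minimal sketch of two catalog entries and a lookup that answers the owner and dependency questions. The field names (apiVersion, kind, metadata, spec.owner, spec.dependsOn, spec.lifecycle) follow the Backstage descriptor format; the service, team, and database names are hypothetical, the entries are shown as already-parsed Python dicts rather than YAML, and lifecycle is used here only as a stand-in for "current status".

```python
# Hypothetical catalog entries, shown as already-parsed catalog-info.yaml content.
CATALOG = [
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": "orders-service"},
        "spec": {
            "type": "service",
            "lifecycle": "production",
            "owner": "group:default/payments-team",
            "dependsOn": ["resource:default/orders-db"],
            "providesApis": ["orders-api"],
        },
    },
    {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Resource",
        "metadata": {"name": "orders-db"},
        "spec": {"type": "database", "owner": "group:default/payments-team"},
    },
]


def summarize(catalog, name):
    """Return owner, dependencies, and lifecycle for one entity, or None if absent."""
    for entity in catalog:
        if entity["metadata"]["name"] == name:
            spec = entity["spec"]
            return {
                "owner": spec.get("owner"),
                "depends_on": spec.get("dependsOn", []),
                "lifecycle": spec.get("lifecycle"),
            }
    return None


print(summarize(CATALOG, "orders-service"))
```

In a real portal the catalog backend does this lookup for you; the point of the sketch is only that every answer comes from a structured field, not from free text.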

The scaffolder (Software Templates) turns the portal from a read-only catalog into a productivity tool. Templates are YAML definitions that generate new services, libraries, or infrastructure components with all the production requirements baked in. A well-designed template creates a new repository, configures CI/CD, sets up observability, registers the service in the catalog, and opens a PR for the team to review. The engineer writes zero boilerplate. That is the promise, and when it works, it changes how fast teams ship.

TechDocs brings documentation into the portal by building MkDocs sites from markdown files that live alongside the service code. Because the docs live in the same repository as the code, they actually get updated when the code changes. The critical integration is wiring TechDocs builds into CI so that documentation is rebuilt on every merge, not manually. Skip this step and your docs will be stale within a month.

The Service Catalog Foundation

A portal without a trusted catalog is a bookmarks page. Getting the catalog right is 60% of the work and determines whether everything else succeeds or fails.

Entity descriptors should be minimal and machine-verifiable. Ownership must map to an actual team in your identity provider, not a free-text name that could be misspelled six different ways. Dependencies should reference other catalog entities, not free-text strings. API specs should be OpenAPI or gRPC definitions that the catalog can render, not links to Confluence pages that were last updated in 2022.

Here is the most important point in this article: the enforcement mechanism matters more than the schema. A catalog-info.yaml check in CI that blocks merges when ownership is missing or the team does not exist in the identity provider keeps the catalog accurate automatically. Teams that rely on voluntary compliance see catalog accuracy degrade to below 60% within 6 months. Every single time. Teams that enforce it in CI maintain 90%+ accuracy indefinitely. Automation beats good intentions.
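A CI check along these lines might look like the sketch below. It assumes the descriptor has already been parsed into a dict (for example by a YAML parser in an earlier CI step) and that known_teams was fetched from your identity provider; both of those steps, and the function names, are hypothetical and not part of Backstage.

```python
import sys


def check_descriptor(entity, known_teams):
    """Return a list of problems; an empty list means the descriptor passes."""
    problems = []
    owner = (entity.get("spec") or {}).get("owner")
    if not owner:
        problems.append("spec.owner is missing")
    else:
        # Accept "group:default/team", "group:team", and the short "team" form.
        team = owner.split("/")[-1].split(":")[-1]
        if team not in known_teams:
            problems.append(f"owner '{owner}' is not a team in the identity provider")
    return problems


def main(entity, known_teams):
    """Return an exit code for a CI step: 0 passes, 1 blocks the merge."""
    problems = check_descriptor(entity, known_teams)
    for problem in problems:
        print(f"catalog-info.yaml: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

A pipeline step would call sys.exit(main(entity, known_teams)) so that a non-zero exit code fails the merge check.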

Measure catalog completeness weekly. Track the percentage of services with: valid owner (team exists), at least one dependency declared, API spec attached (for services that expose APIs), lifecycle status set (production, experimental, deprecated). Display this metric prominently in the portal itself. Teams with low completeness see their score and feel social pressure to fix it. Sounds simple because it is. It works.
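The weekly completeness number can be computed with something as small as the following sketch. It applies the four criteria above to parsed descriptors. The exposes_api flag is a hypothetical marker for "this service exposes an API" and is not a Backstage field; known_teams is again assumed to come from the identity provider.

```python
VALID_LIFECYCLES = {"production", "experimental", "deprecated"}


def is_complete(entity, known_teams):
    """Apply the four completeness criteria to one parsed descriptor."""
    spec = entity.get("spec") or {}
    owner = (spec.get("owner") or "").split("/")[-1].split(":")[-1]
    has_owner = owner in known_teams
    has_dependency = len(spec.get("dependsOn") or []) >= 1
    # Only services that expose an API need a spec attached.
    has_api = bool(spec.get("providesApis")) or not entity.get("exposes_api", False)
    has_lifecycle = spec.get("lifecycle") in VALID_LIFECYCLES
    return has_owner and has_dependency and has_api and has_lifecycle


def completeness_percent(entities, known_teams):
    """Share of catalog entities that meet every criterion, as a percentage."""
    if not entities:
        return 0.0
    complete = sum(1 for entity in entities if is_complete(entity, known_teams))
    return 100.0 * complete / len(entities)
```

Grouping the same calculation by owning team gives the per-team score that drives the social pressure.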

Golden Paths Through Templates

Golden paths are the highest-leverage feature a developer portal offers. Nothing else comes close. A golden path template does not just create a repository. It creates a production-ready service that meets every organizational standard from its first commit.

Think about what a well-designed golden path template for a new backend service actually scaffolds: the application code with a health check endpoint, a Dockerfile with security best practices (non-root user, minimal base image), a CI/CD pipeline (GitHub Actions, GitLab CI, or ArgoCD Application), Kubernetes manifests with resource limits, network policies, and pod disruption budgets, an OpenTelemetry instrumentation stub, a catalog-info.yaml with correct ownership, an mkdocs.yml with a documentation skeleton, and security scanning configuration (Trivy, Snyk, or Grype). That is an afternoon of senior engineer setup time, automated and handed to every team for free.

The template must be opinionated. If the organization standard is Go with gRPC, the template scaffolds Go with gRPC. Do not offer a menu of 8 languages and 4 frameworks. Flexibility in golden paths defeats their entire purpose. The goal is to make the right thing the easy thing. When creating a new service through the template is faster and more reliable than copying an existing service and modifying it, adoption happens naturally. You do not need to mandate it. Engineers will choose the path that gets them to production fastest.

Backstage Software Templates use a YAML definition with steps that call built-in or custom actions. The fetch:template action copies a skeleton directory and renders it with Nunjucks; Cookiecutter templates go through the separate fetch:cookiecutter action instead. The publish:github action creates the repository. Custom actions can trigger Terraform to provision infrastructure, register DNS records, or create monitoring dashboards. The template execution is auditable and reproducible, which matters for compliance.
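For orientation, the shape of such a template is sketched below as a Python dict that mirrors the YAML. The apiVersion, kind, and the fetch:template, publish:github, and catalog:register actions are standard Backstage names; the template name, owner, parameters, and skeleton path are hypothetical, and the small helper is only an illustration of reading the step list, not part of the scaffolder.

```python
# Mirrors the shape of a template.yaml; names and values are hypothetical.
TEMPLATE = {
    "apiVersion": "scaffolder.backstage.io/v1beta3",
    "kind": "Template",
    "metadata": {"name": "go-grpc-service", "title": "Go gRPC service"},
    "spec": {
        "owner": "group:default/platform-team",
        "type": "service",
        "parameters": [
            {
                "title": "Service details",
                "required": ["name", "repoUrl"],
                "properties": {
                    "name": {"type": "string"},
                    "repoUrl": {"type": "string"},
                },
            }
        ],
        "steps": [
            {
                "id": "fetch",
                "action": "fetch:template",
                "input": {
                    "url": "./skeleton",
                    "values": {"name": "${{ parameters.name }}"},
                },
            },
            {
                "id": "publish",
                "action": "publish:github",
                "input": {"repoUrl": "${{ parameters.repoUrl }}"},
            },
            {
                "id": "register",
                "action": "catalog:register",
                "input": {
                    "repoContentsUrl": "${{ steps.publish.output.repoContentsUrl }}",
                    "catalogInfoPath": "/catalog-info.yaml",
                },
            },
        ],
    },
}


def step_actions(template):
    """Return the actions in execution order, failing on malformed steps."""
    actions = []
    for step in template["spec"]["steps"]:
        if "id" not in step or "action" not in step:
            raise ValueError(f"step is missing id or action: {step}")
        actions.append(step["action"])
    return actions


print(step_actions(TEMPLATE))
```

The final catalog:register step is what keeps new services from starting life outside the catalog.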

TechDocs: Documentation That Stays Current

Documentation rots because of proximity. When docs live in Confluence, updating them requires switching context away from the code. Nobody does that. When docs live in the same repository as the code, in a docs/ directory with an mkdocs.yml configuration, they show up in the same PR that changes the behavior they describe. A code reviewer can say “you changed the retry logic but did not update the docs” in the same review thread. That single workflow change is worth more than any documentation tooling investment.

TechDocs builds MkDocs sites from those repository-local markdown files and serves them through the Backstage frontend. The build step can run in CI (recommended) or on-demand in the Backstage backend. CI-built docs update automatically on every merge. On-demand builds are simpler to set up but stale until someone triggers a rebuild.

The documentation that matters most for a service catalog is operational: how to deploy the service, how to debug common failure modes, what the runbook looks like for incidents, and what the SLOs are. API reference documentation matters too, but generate it from OpenAPI specs rather than writing it by hand. A solid DevOps practice connects TechDocs builds to the same pipeline that deploys the service, ensuring docs are never more than one commit behind the code.

With the catalog trusted and docs flowing, the next question is whether to build on Backstage or buy something off the shelf.

Alternatives to Backstage

Backstage is the dominant open source option, but it is not the only game in town. Commercial alternatives trade customization for faster time-to-value, and for some teams, that trade-off is the right one.

Port provides a no-code portal builder with blueprints that map to your data model. The catalog is populated through integrations rather than YAML files in repositories. For organizations that want a portal without dedicating engineering time to Backstage maintenance, Port reduces the operational burden. The trade-off is less flexibility in customization and vendor dependency.

Cortex focuses on service maturity scorecards. It ingests data from CI/CD, monitoring, and security tools to score each service against defined standards. Teams see which services are below the maturity bar and what specific actions would raise the score. This gamification approach drives real improvements, but only when the scoring criteria align with genuine engineering priorities. Score vanity metrics and teams will game the system. Score things that actually matter and the competitive instinct works in your favor.

OpsLevel combines service catalog with a checks system that validates standards compliance automatically. It can verify that every service has an owner, a runbook, recent deployments, and passing security scans. The checks run continuously, so compliance is measured in real time rather than audited quarterly.

The build-versus-buy decision hinges on one question: do you have 1-2 engineers who can dedicate ongoing time to portal maintenance, plugin upgrades, and template development? If yes and you want maximum flexibility, Backstage is the right choice. If you need a portal running in weeks rather than months and cannot dedicate engineers to portal maintenance, pick a commercial alternative. Be honest about your capacity. A half-maintained Backstage instance is worse than a fully managed commercial portal.

Now, regardless of which platform you choose, there is one trap that catches nearly every team.

The Plugin Trap

Backstage’s plugin ecosystem is both its greatest strength and the most common cause of portal failure. The temptation is irresistible: Kubernetes cluster visualization, cost dashboards, security scan results, CI/CD status, PagerDuty integration, Grafana dashboards, SonarQube metrics. Each plugin adds value in isolation. Together, they create an overwhelming interface that engineers avoid. This exact pattern kills portal adoption regularly.

The pattern that works: launch with three capabilities (catalog, templates, TechDocs). After the catalog reaches 80%+ completeness and engineers are using it regularly, add one plugin per month based on the most frequent request from engineering teams. The request signals a real need. Adding plugins speculatively signals that the platform team is guessing.

Each plugin also carries a maintenance cost that compounds fast. Backstage upgrades frequently (monthly releases). Plugins that fall behind the Backstage API version require updates or replacements. A portal with 20 plugins has 20 potential points of breakage on every upgrade. A portal with 5 well-maintained plugins upgrades cleanly. The engineering investment in effective cloud-native tooling, including developer portals, scales with the number of integrations you are willing to maintain. Not install. Maintain.

API Documentation Aggregation

Plugins aside, one capability multiplies the catalog’s value: surfacing API documentation alongside service catalog entries. When an engineer needs to call another team’s service, they should find the API spec, example requests, and authentication requirements in the same place they found the service owner.

Backstage supports this through API entity types linked to components. An OpenAPI spec registered in the catalog renders an interactive API explorer directly in the portal. gRPC services can register their proto files. GraphQL services can expose their schema. The key is making API registration part of the golden path template so every new service automatically exposes its API documentation.

The aggregation value compounds with scale. In an organization with 200 services, knowing which service exposes a particular capability and how to call it saves hours per integration. Without a portal, that knowledge lives in Slack messages, outdated Confluence pages, or the heads of senior engineers who happened to build the service three years ago and are now on a different team. When the platform engineering investment includes API documentation aggregation, it pays dividends in reduced integration time across every team.

All of this only matters if people actually use the portal. Here is how you know whether they are.

Measuring Portal Adoption

Portals that are not measured like products die like projects. Every time. The metrics that matter are usage, accuracy, and impact.

Daily active users (DAU) as a percentage of engineering headcount is the primary adoption metric. Below 30% after 6 months means the portal is not solving problems engineers care about. Between 30% and 60% indicates the portal is useful but not yet the default tool. Above 60% means the portal has become essential infrastructure.

Template usage rate measures what percentage of new services are created through golden path templates versus manually. Below 50% means the templates are either missing common patterns, too rigid, or engineers do not know they exist. Above 80% means the golden path is genuinely easier than the alternative.

Catalog accuracy is the percentage of services with valid ownership, dependencies, and lifecycle status. Track it weekly and display it on the portal homepage. Social pressure is a legitimate adoption mechanism.

Incident resolution time, compared before and after portal adoption, is the impact metric. If the portal surfaces the right owner and runbook during incidents, mean time to engagement drops measurably. Track the number of incidents where the portal was the first tool opened versus Slack. This is the metric that justifies the entire investment to leadership.
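The usage thresholds above translate into a few lines of arithmetic. The sketch below uses the bands stated in this section; the function names are hypothetical, and the 50% to 80% range for template usage is labeled "in between" because the article does not characterize it.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def adoption_band(dau_pct):
    """Classify DAU as a percentage of engineering headcount."""
    if dau_pct < 30:
        return "not solving problems engineers care about"
    if dau_pct <= 60:
        return "useful but not yet the default tool"
    return "essential infrastructure"


def template_band(usage_pct):
    """Classify the share of new services created through golden path templates."""
    if usage_pct < 50:
        return "templates missing patterns, too rigid, or unknown"
    if usage_pct > 80:
        return "golden path is easier than the alternative"
    return "in between"


# Example: 90 daily users out of 200 engineers, 17 of 20 new services templated.
print(adoption_band(percent(90, 200)))
print(template_band(percent(17, 20)))
```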

The metrics feed directly into the portal roadmap. If engineers try the portal but few come back regularly (low trial-to-regular conversion), the catalog data is not trustworthy. If regular users do not turn into habitual ones (low regular-to-habitual conversion), the portal does not cover the workflows engineers need. If template usage is low, the templates are too rigid or need to cover more service types. Measuring developer productivity through portal-specific metrics keeps the investment accountable and the roadmap grounded in real engineering needs.

Build the Catalog First

The internal developer portal is a product, not a project. It requires a product owner, a backlog, user research, and continuous iteration. The portals that succeed share one characteristic: they got the service catalog right before adding anything else. A trusted catalog with accurate ownership, dependency data, and API documentation is valuable even without golden paths, TechDocs, or plugins. A portal with 20 plugins and an untrustworthy catalog is a waste of everyone’s time.

Start with the catalog. Enforce accuracy in CI. Measure completeness weekly. Add golden paths when the catalog reaches 80% completeness. Add TechDocs when golden paths are adopted. Add plugins when engineers ask for them. That sequence works. Every other sequence produces the same stale portal that the organization launched the project to replace. Do not be the team that learns this the hard way.

Frequently Asked Questions

How long does a Backstage deployment take to reach production readiness?

A minimal Backstage deployment with a software catalog and basic CI/CD integration takes 4-6 weeks. Reaching production readiness with golden path templates, TechDocs, and 3-5 essential plugins typically takes 3-4 months. Teams that try to launch with 15+ plugins before the catalog is trusted consistently fail. Start with the catalog, get ownership data above 80% completeness, then add capabilities incrementally.

What is catalog completeness and why does it matter?

Catalog completeness measures the percentage of services with accurate entity descriptors including ownership, dependencies, API specs, and lifecycle status. Below 70% completeness, engineers do not trust the catalog and revert to Slack for service discovery. Above 90%, the catalog becomes the default reference for incident response, onboarding, and dependency analysis. Measure it weekly and treat it like an SLO.

How do golden path templates differ from project scaffolding?

Project scaffolding generates boilerplate code. Golden path templates generate a production-ready service with CI/CD pipelines, observability, security scanning, network policies, and documentation pre-configured. The difference is that scaffolding gives you a starting point. A golden path gives you a service that meets production standards from its first commit. Teams using golden paths deploy new services to production 5-10x faster than teams starting from scratch.

Should we build a custom portal or adopt Backstage?

Build custom only if your requirements are genuinely unique and you can commit 2-3 full-time engineers to portal development indefinitely. Backstage has 150+ open source plugins and a 2,000+ contributor community. Commercial alternatives like Port, Cortex, and OpsLevel offer faster time-to-value with less engineering investment. Most organizations underestimate the maintenance cost of custom portals. A custom portal that is not actively maintained becomes the same stale documentation it was supposed to replace.

What metrics indicate a developer portal is actually being adopted?

Track daily active users as a percentage of engineering headcount. Below 30% DAU after 6 months indicates the portal is not solving real problems. Track template usage: if fewer than 50% of new services use golden path templates, the templates either do not cover common patterns or are too rigid. Track catalog search volume versus Slack channel questions about service ownership. When Slack questions drop by 40%+ for topics the catalog covers, the portal is working.