The Challenge
A logistics technology startup had validated their business model with scrappy solutions - spreadsheets, manual processes, and a prototype that was held together with enthusiasm. They’d raised their Series A and needed to build the real platform.
The requirements were substantial: ingest data from hundreds of freight carriers in various formats, normalize it, run complex analytics, and expose it through APIs for their customers. And it needed to be built by a small team on a startup timeline.
Our Approach
Architecture First
Before writing code, we spent three weeks on architecture. We mapped data flows, identified integration patterns, and documented the key decisions:
- Data Lake + Warehouse hybrid: Raw data lands in S3, then flows through transformation pipelines into Snowflake. This gave us flexibility for new data sources while keeping analytics fast.
- Event-driven ingestion: Each carrier integration publishes to a message queue, letting us handle variable volumes without backpressure problems (a sketch of this pattern follows the list).
- API-first design: Internal services and external APIs were designed together, using the same patterns and contracts.
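To make the event-driven ingestion concrete, here is a minimal sketch of how a carrier adapter might normalize a record and publish it onto the shared contract. The use of JSON over Kafka, the topic name, and the event fields are illustrative assumptions, not the client's actual schema:

```python
# Hypothetical sketch: a carrier adapter normalizes one raw record and
# publishes it to Kafka. Topic name and event fields are illustrative.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

from confluent_kafka import Producer  # assumed client library


@dataclass
class FreightEvent:
    """Shared contract used by internal consumers and the external API."""
    carrier_id: str
    shipment_id: str
    status: str
    event_time: str  # ISO-8601, UTC


def normalize(raw: dict, carrier_id: str) -> FreightEvent:
    # Each carrier adapter maps its own format onto the shared contract.
    return FreightEvent(
        carrier_id=carrier_id,
        shipment_id=str(raw["shipmentRef"]),
        status=raw.get("status", "UNKNOWN").upper(),
        event_time=raw.get("timestamp")
        or datetime.now(timezone.utc).isoformat(),
    )


producer = Producer({"bootstrap.servers": "localhost:9092"})


def publish(raw: dict, carrier_id: str) -> None:
    event = normalize(raw, carrier_id)
    producer.produce(
        "carrier.events.v1",            # hypothetical topic name
        key=event.shipment_id,          # keeps a shipment's events ordered
        value=json.dumps(asdict(event)),
    )
    producer.poll(0)  # serve delivery callbacks without blocking
```

Keying messages by shipment ID keeps each shipment's events ordered within a partition, which is what lets downstream consumers absorb very different volumes per carrier without backpressure.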
We presented three architectural options to the team - each with different trade-offs around complexity, cost, and capability. They chose the option whose trade-offs best fit a small team building on a startup timeline.
Building in Phases
Phase 1 (3 months): Core data pipeline and initial carrier integrations. By the end, they were processing real data.
Phase 2 (4 months): Analytics engine, customer-facing APIs, and scaling work. Queries that had taken minutes were now returning in seconds.
Phase 3 (5 months): Advanced features, additional integrations, and operational hardening. Focus shifted to reliability and maintainability.
Each phase delivered usable capability. No big-bang release.
Knowledge Transfer Throughout
We paired with their engineers from day one. Design decisions were made collaboratively, documented in ADRs (Architecture Decision Records), and discussed in regular architecture reviews. When we stepped back, their team didn’t just inherit code - they inherited context.
Technical Decisions
Data Processing: Apache Spark for heavy transformation, dbt for analytics engineering. The combination handles both the volume of raw processing and the complexity of business logic.
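As a rough illustration of the split, a Spark job like the one below handles the raw-to-normalized step, leaving business logic to dbt models once the data lands in Snowflake. Bucket paths and column names are placeholders, not the client's schema:

```python
# Hypothetical Spark job: read raw carrier JSON from the data lake,
# normalize a few fields, and write partitioned Parquet back to S3.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("normalize-carrier-events").getOrCreate()

raw = spark.read.json("s3://example-data-lake/raw/carrier_events/dt=2024-01-01/")

normalized = (
    raw
    .withColumn("event_time", F.to_timestamp("event_time"))
    .withColumn("status", F.upper(F.trim(F.col("status"))))
    .dropDuplicates(["carrier_id", "shipment_id", "event_time"])
)

(
    normalized.write
    .mode("overwrite")
    .partitionBy("carrier_id")
    .parquet("s3://example-data-lake/normalized/carrier_events/dt=2024-01-01/")
)
```

Heavy, schema-messy work stays in Spark; once the data is in the warehouse, dbt expresses the business logic as versioned, tested SQL.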
Streaming: Kafka for event ingestion, with careful schema evolution practices to handle carrier format changes without breaking downstream consumers.
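Most of that schema-evolution discipline amounts to additive, backward-compatible changes: new fields are optional with defaults, and existing fields are never renamed or repurposed in place. A sketch of a tolerant consumer, assuming JSON payloads and made-up field names:

```python
# Hypothetical consumer that tolerates additive schema changes: fields added
# later are read with defaults, so a carrier format change can't break it.
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics-ingest",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["carrier.events.v1"])

REQUIRED = ("carrier_id", "shipment_id", "status", "event_time")


def handle(event: dict) -> None:
    # Placeholder for the real downstream work (e.g. writing to the lake).
    print(event["shipment_id"], event["status"])


try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        if any(field not in event for field in REQUIRED):
            # Route contract-incomplete events aside rather than crashing.
            continue
        # Later additions (e.g. an optional "eta" field) are never required.
        event.setdefault("eta", None)
        handle(event)
finally:
    consumer.close()
```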
Storage: S3 for the data lake, Snowflake for the warehouse. The managed service costs were worth avoiding operational burden for a small team.
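In practice the lake-to-warehouse handoff can stay simple: files land in S3 under a partitioned prefix, and a COPY from an external stage pulls them into Snowflake. The bucket, stage, table, and credential details below are placeholders:

```python
# Hypothetical load step: land a normalized file in S3, then COPY it into
# Snowflake from an external stage. All object names are placeholders.
import boto3
import snowflake.connector

s3 = boto3.client("s3")
s3.upload_file(
    "normalized/carrier_events.parquet",
    "example-data-lake",
    "normalized/carrier_events/dt=2024-01-01/part-000.parquet",
)

conn = snowflake.connector.connect(
    account="example_account",
    user="loader",
    password="...",          # in practice: key-pair auth or a secrets manager
    warehouse="LOAD_WH",
    database="FREIGHT",
    schema="RAW",
)
try:
    conn.cursor().execute("""
        COPY INTO raw.carrier_events
        FROM @carrier_events_stage/dt=2024-01-01/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
finally:
    conn.close()
```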
Observability: Datadog for infrastructure monitoring, custom metrics for business visibility. Alerts tied to SLOs that the team defined.
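Business-level metrics were emitted alongside the infrastructure telemetry. A minimal sketch using the DogStatsD client, with metric names, tags, and the SLO threshold invented for the example:

```python
# Hypothetical business metrics emitted via DogStatsD alongside standard
# infrastructure telemetry. Metric names, tags, and thresholds are examples.
from datadog import initialize, statsd

initialize(statsd_host="localhost", statsd_port=8125)


def record_ingest(carrier_id: str, record_count: int, lag_seconds: float) -> None:
    tags = [f"carrier:{carrier_id}"]
    statsd.increment("freight.ingest.records", value=record_count, tags=tags)
    # Freshness feeds an SLO such as "95% of carrier data visible within 15 minutes".
    statsd.gauge("freight.ingest.lag_seconds", lag_seconds, tags=tags)
```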
The Outcome
The platform went from concept to production in 12 months. It now processes over 5 million freight transactions daily with sub-second analytics query performance. More importantly, their engineering team owns it completely - we haven’t been needed for operational support since handoff.
The architecture has proven extensible. They’ve added carrier integrations, new analytics features, and customer-facing capabilities without requiring architectural changes.
What We’d Do Differently
If we did it again, we’d invest more in integration testing infrastructure earlier. The system worked well, but we discovered some carrier edge cases in production that better test coverage would have caught sooner.
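Concretely, the earlier investment we have in mind is a library of captured carrier payloads - including the malformed ones - replayed through the adapters on every change. The fixture layout and the normalize_payload adapter below are hypothetical:

```python
# Hypothetical pytest suite replaying captured carrier payloads (including
# malformed ones) against a carrier adapter.
import json
import pathlib

import pytest

from ingest.adapters import normalize_payload  # hypothetical adapter module

FIXTURES = pathlib.Path("tests/fixtures/carrier_payloads")


@pytest.mark.parametrize("fixture_file", sorted(FIXTURES.glob("*.json")))
def test_adapter_handles_captured_payloads(fixture_file):
    payload = json.loads(fixture_file.read_text())
    event = normalize_payload(payload)
    # Every payload, however odd, must yield a contract-complete event
    # or an explicit rejection - never an unhandled exception.
    assert event is None or {"carrier_id", "shipment_id", "status"} <= set(event)
```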
We’d also recommend building observability in from day one rather than adding it in Phase 2. The instrumentation we retrofitted later would have been easier to build in from the start.