When Your Engineering Team Fixes the Wrong Problem for a Year

Most engineering cost problems don’t announce themselves. They just compound.

The Setup: Six Features, One Blind Spot

A three-engineer team running a B2B SaaS platform for enterprise document workflows was spending $4,200 per month on AI infrastructure twelve months ago. They had Datadog. They had two monitoring tools. They had dashboards that answered exactly one question with great precision: how much. What those tools never answered was which feature, which user, or which service. For a team trying to make sound architectural decisions, that gap is career-defining - the kind of gap that quietly shapes whether you get promoted for shipping efficient systems or get blamed when the CFO finally looks at the infrastructure bill.

The platform had six features calling GPT-4o regularly. Contract Analyzer pulled clauses and flagged risk on document upload. Executive Summary Generator ran on demand. Smart Search handled semantic lookup across the document corpus. Compliance Checker ran on every document save. Inline Redline Suggester fired on text selection. Audit Trail Narrator generated human-readable audit logs every night. Each feature had a clear function. None had a cost identity.

The team’s instinct, before any measurement, was that Contract Analyzer was the cost driver - it felt heavy, it processed complex legal text, it had been the subject of two full weeks of optimization work. Smart Search ranked second in the gut model. Compliance Checker and Audit Trail Narrator barely registered as concerns. Engineers working without attribution data don’t optimize systems. They optimize their assumptions about systems. That distinction matters enormously when you’re trying to build a track record of sound technical judgment.

The gut ranking was wrong. Not close. Not directionally off. Completely inverted.

What Forty-Eight Hours of Real Data Showed

The team instrumented every LLM call with feature-level, service-level, and user-level tags. After 48 hours, the actual cost breakdown looked nothing like the mental model anyone had been carrying.
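
The article does not show how the calls were tagged, so the following is only a minimal sketch of the idea. The record fields, the tag values, and the two per-million-token prices are illustrative assumptions, not the team's actual schema or the provider's current rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCallRecord:
    """One LLM call plus the attribution tags attached at call time."""
    feature: str
    service: str
    user_id: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in dollars per million tokens (assumed for illustration).
PRICE_PER_M_INPUT = 2.50
PRICE_PER_M_OUTPUT = 10.00


def call_cost(record: LLMCallRecord) -> float:
    """Dollar cost of a single call, computed from its token counts."""
    return (record.input_tokens * PRICE_PER_M_INPUT
            + record.output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


def cost_by(records, key: str) -> dict:
    """Sum call cost grouped by one tag: 'feature', 'service', or 'user_id'."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += call_cost(record)
    return dict(totals)


# Hypothetical sample records.
records = [
    LLMCallRecord("compliance_checker", "doc-service", "u1", 20_000, 1_000),
    LLMCallRecord("compliance_checker", "doc-service", "u2", 20_000, 1_000),
    LLMCallRecord("smart_search", "search-service", "u1", 2_000, 200),
]

print(cost_by(records, "feature"))
print(cost_by(records, "user_id"))
```

Because every record carries all three tags, the same log answers "which feature", "which service", and "which user" without re-instrumenting anything.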

Compliance Checker was running $1,890 per month - 45% of the total budget. Audit Trail Narrator came in at $1,102, accounting for 26%. Together, two features nobody had flagged in a single cost review were consuming 71% of the spend. Contract Analyzer, the feature that had received the most engineering attention, sat at $672 and 16%. Executive Summary Generator cost $294. Smart Search, the second entry in the team’s gut ranking, was $168 - 4% of total. Inline Redline Suggester ran $74. The average cost per call told an equally sharp story: Contract Analyzer’s $0.310 per call looked expensive until you understood that Compliance Checker was making vastly more calls at $0.087 each, and Audit Trail Narrator was hitting $0.240 per call on a nightly batch that nobody had interrogated.
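
To make the volume-versus-unit-cost point concrete, this sketch recomputes each feature's share of spend from the monthly figures above. It also divides monthly cost by average cost per call to get an implied call volume. Those volumes are back-of-envelope derivations, not numbers the team reported.

```python
# Monthly cost per feature, in dollars, as quoted in the article.
monthly_cost = {
    "Compliance Checker": 1890,
    "Audit Trail Narrator": 1102,
    "Contract Analyzer": 672,
    "Executive Summary Generator": 294,
    "Smart Search": 168,
    "Inline Redline Suggester": 74,
}

# Average cost per call; the article quotes only these three.
cost_per_call = {
    "Compliance Checker": 0.087,
    "Audit Trail Narrator": 0.240,
    "Contract Analyzer": 0.310,
}

total = sum(monthly_cost.values())

# Derived, not reported: monthly cost divided by average cost per call.
implied_calls = {
    feature: monthly_cost[feature] / unit_cost
    for feature, unit_cost in cost_per_call.items()
}

for feature, cost in sorted(monthly_cost.items(), key=lambda kv: -kv[1]):
    line = f"{feature:<28} ${cost:>5,}  {cost / total:6.1%}"
    if feature in implied_calls:
        line += f"  ~{implied_calls[feature]:,.0f} calls/month (derived)"
    print(line)
```

On these figures, Compliance Checker's implied volume is roughly ten times Contract Analyzer's, which is why the cheaper call ends up as the larger line item.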

This is the kind of data that reframes how you think about technical prioritization as a career skill. Optimization work applied to the wrong target isn’t neutral - it consumes engineering time, generates false confidence, and delays the real fix. Two weeks spent tuning Contract Analyzer while Compliance Checker quietly burned $22,680 annually is a concrete illustration of what misallocated technical effort costs.

The Compliance Checker’s problem was architectural, not functional. It ran on every document save, and the autosave interval was 30 seconds. With 40 active enterprise users, each autosaving 120 times an hour, that meant up to 4,800 GPT-4o calls per hour whenever everyone was editing. No errors fired. No timeouts. No failed requests. Every log looked clean because the feature was working - it was just working at a cadence nobody had deliberately chosen. The system had no mechanism to distinguish between “operating correctly” and “operating expensively” because at the response level, those states are identical. Restricting the compliance check to manual triggers and document submission brought the monthly cost from $1,890 to $190. One logic change.
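
The fix described here amounts to gating the check on the kind of save event. A minimal sketch follows, assuming hypothetical event names (the article does not describe the team's actual event model), along with the arithmetic behind the hourly call ceiling.

```python
from enum import Enum


class SaveEvent(Enum):
    """Hypothetical event types; the names are assumptions for illustration."""
    AUTOSAVE = "autosave"
    MANUAL_CHECK = "manual_check"
    SUBMISSION = "submission"


# After the fix, only these events pay for an LLM compliance check.
CHECKED_EVENTS = {SaveEvent.MANUAL_CHECK, SaveEvent.SUBMISSION}


def should_run_compliance_check(event: SaveEvent) -> bool:
    """Return True only for events that warrant an LLM call."""
    return event in CHECKED_EVENTS


def peak_calls_per_hour(active_users: int, autosave_interval_s: int) -> int:
    """Upper bound on hourly calls if every save triggers a check."""
    saves_per_user_per_hour = 3600 // autosave_interval_s
    return active_users * saves_per_user_per_hour


print(peak_calls_per_hour(active_users=40, autosave_interval_s=30))
print(should_run_compliance_check(SaveEvent.AUTOSAVE))
print(should_run_compliance_check(SaveEvent.SUBMISSION))
```

Neither path ever raises an error, so the only place the difference shows up is in call counts and cost. That is why attribution caught it and error monitoring did not.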

The Audit Trail Narrator’s problem was different. It ran nightly and generated a full GPT-4o narrative for every document with any recorded activity - including documents touched by automated processes, system integrations, and background jobs. Approximately 60% of those documents had audit logs that no human ever read. Prose was being generated for an audience that did not exist, every single night. Scoping the narrator to human-triggered activity only, with a minimum of three human edits before narration runs, dropped the cost from $1,102 to $310 per month. Combined, the two fixes recovered $2,492 per month ($1,700 from Compliance Checker and $792 from Audit Trail Narrator). No feature was cut. No model was downgraded. No user noticed a change.
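
The scoping rule can be expressed as a filter applied before the nightly batch runs. The sketch below assumes a hypothetical per-document activity record. Only the three-edit threshold comes from the article.

```python
from dataclasses import dataclass

MIN_HUMAN_EDITS = 3  # threshold stated in the article


@dataclass
class DocumentActivity:
    """Hypothetical daily activity summary for one document."""
    doc_id: str
    human_edits: int       # edits made by people
    automated_events: int  # integrations, background jobs, system processes


def needs_narration(activity: DocumentActivity) -> bool:
    """Narrate only documents with enough human-triggered activity."""
    return activity.human_edits >= MIN_HUMAN_EDITS


# Hypothetical sample of one night's activity.
nightly = [
    DocumentActivity("doc-1", human_edits=5, automated_events=2),
    DocumentActivity("doc-2", human_edits=0, automated_events=40),
    DocumentActivity("doc-3", human_edits=2, automated_events=1),
]

to_narrate = [a.doc_id for a in nightly if needs_narration(a)]
print(to_narrate)
```

Automated events are deliberately ignored by the filter: a document touched forty times by background jobs still produces no narrative unless people edited it.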

Both issues had an error rate of zero. That’s what made them invisible for so long.

The Pricing Problem Attribution Uncovered Next

Once per-feature attribution was operational, the team rolled the data up into cost-per-user by plan tier. What came back wasn’t an engineering problem anymore - it was a business model problem with direct career implications for every person involved in pricing decisions.

Starter plan customers cost $8.40 per seat per month to serve against $149 MRR per seat - a 94% margin. Growth plan customers cost $67.00 per seat against the same $149 price - 55% margin, still healthy. Enterprise customers cost $198.00 per seat per month to serve. The Enterprise price was also $149. That’s a negative 33% margin, meaning every Enterprise seat was losing $49 per month. Enterprise users were the platform’s most active, heaviest on Compliance Checker and Audit Trail Narrator - the two features that had just been identified as the cost drivers. Flat pricing made this loss invisible at the feature level. Per-user attribution made it impossible to continue ignoring.
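
The margin figures follow directly from price and cost to serve. As a quick check, this sketch recomputes them from the per-seat numbers above, with margin taken as (price minus cost) divided by price.

```python
PRICE_PER_SEAT = 149.00  # same list price on every tier, per the article

# Monthly cost to serve one seat, in dollars, per the article.
cost_to_serve = {
    "Starter": 8.40,
    "Growth": 67.00,
    "Enterprise": 198.00,
}


def gross_margin(price: float, cost: float) -> float:
    """Margin as a fraction of price; negative when cost exceeds price."""
    return (price - cost) / price


for tier, cost in cost_to_serve.items():
    margin = gross_margin(PRICE_PER_SEAT, cost)
    per_seat = PRICE_PER_SEAT - cost
    print(f"{tier:<10} cost ${cost:>6.2f}  margin {margin:6.1%}  per seat {per_seat:+.2f}")
```

A flat price hides the spread because revenue per seat is identical across tiers. Only the cost column, which requires per-user attribution, reveals that Enterprise is underwater.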

The team repriced Enterprise to usage-based. The customer conversation wasn’t contentious because the data was exact - per user, per feature, per month figures with no estimates involved. That’s a meaningful career lesson about how technical work creates negotiating leverage. When an engineer can walk into a pricing conversation carrying granular, attributable cost data rather than approximations, the dynamic of that conversation changes entirely. The data does the arguing.

There’s a version of this story where the team never instruments at the feature level, continues optimizing Contract Analyzer for another two quarters, and eventually gets a difficult meeting about infrastructure costs in which nobody can explain where the money went. Attribution work prevented that meeting. It also surfaced a pricing structure that was actively subsidizing the company’s most demanding customers at a $49-per-seat monthly loss - a fact that no amount of application performance monitoring would have revealed, because APM tools measure latency and errors, not economic exposure.

The $2,492 monthly recovery came entirely from fixing the design of two features that had never generated a single error alert. The lesson for anyone building AI-integrated systems isn’t just technical - it’s about what kind of visibility you build into your stack before you need it, and whether you’ve structured your instrumentation to answer the questions that actually matter when the bill arrives.

Contract Analyzer’s average cost per call was $0.310. Compliance Checker’s was $0.087. Volume was the weapon, not unit cost.
