Build vs Buy Is the Wrong Question

Movement 01

The Question Every Leader Is Asking

The conversation happens in every architecture review...every vendor pitch...every strategic planning session. Someone puts a workflow on the screen... claims processing, sponsor onboarding, customer service escalation... and the question lands:

"Do we build this or buy something off the shelf?"

And the answers are predictable. The vendor says buy. The engineers say build. The architects say "it depends." And the leader makes a decision based on incomplete information because nobody in the room has named the thing that actually changed.

Here's the thing. The question itself is outdated. "Build vs buy" assumes a world where workflow engines are hard to build, where the orchestration logic is the expensive part, where the differentiating value lives in the plumbing. That world ended about eighteen months ago. Most people just haven't updated their mental model yet.

Since I joined Sun Life, this question has come at me from every direction. We have Pega powering a chunk of our Operations workflows... claims routing, case management, the kind of process orchestration that's been running for years. We have a legacy platform handling document generation, and the license is aging, so someone asks: do we replace it with another vendor product or build something ourselves? We have teams exploring open-source orchestration tools for lighter-weight automation. And now, layered on top of all of it, the agentic AI conversation... what happens when Pega and Salesforce start embedding their own AI into these platforms? Do we use theirs? Do we build our own? How does any of this fit together?

Every one of these conversations defaults to the same framing. Buy for commodity. Build for competitive advantage. That's the traditional playbook. And it's not wrong... it's just incomplete. It doesn't account for what changed in the last eighteen months. The people in the room are trying to make a sound decision with a framework that predates the thing that rewired the economics.

I think there's a massive opportunity to elevate this conversation. Not to answer "build or buy" better, but to ask a fundamentally different question. That's what this piece is about.

Movement 02

What AI Actually Dissolved

To understand why the question is wrong, you need to understand what a traditional workflow engine actually solved. Not what the marketing says. What it actually did that was hard.

Deterministic Step Sequencing

If step A succeeds, do step B. If it fails, do step C. Branch on conditions...loop on errors...the logic tree that routes work through a process.

Dissolved

Intelligence at Each Node

Read this document...extract these fields...classify this request...decide if it's compliant...route to the right team. This was the expensive part. Months of rules engineering, custom NLP, OCR pipelines.

Dissolved

Integration Glue

Connect Guidewire to Salesforce to ServiceNow. Transform data between systems. Handle authentication, pagination, error mapping.

Dissolving

Human Approval Gates

Pause the workflow...notify a human...wait for approval...resume or reject...audit the decision.

Dissolved

Visual Design Canvas

Drag and drop...connect boxes...non-technical users can "see" the workflow. This was the selling point of every n8n, Zapier, and ServiceNow flow designer.

Dissolved

LLMs dissolved four of these outright, and the fifth, integration glue, is well on its way. And not in the way vendors dissolve things on a roadmap slide... actually dissolved them, to the point where rebuilding them the old way feels like commissioning a horse-drawn carriage.

Start with the intelligence at each node. That was always the hard part... "read this PDF, extract the sponsor details, validate against the policy rules, flag exceptions." That used to be a six-month integration project. Custom NLP pipelines, OCR tuning, rules engines that took a team of three just to maintain. Now it's a prompt. The thing that made workflow engines expensive is now a commodity API call.

Step sequencing went with it. An LLM reasons about what to do next based on context...it doesn't need a hard-coded decision tree. It handles branching, edge cases, and novel situations that no flowchart anticipated. The intelligence IS the orchestration. And once you see that, the visual canvas... the drag-and-drop flow designer that was the selling point of n8n, Zapier, and ServiceNow... stops making sense. "Show me what happens when a sponsor setup request comes in" is now a prompt. And unlike a visual designer, the LLM can explain, modify, and reason about the workflow in natural language.

Integration glue is dissolving fast too, though I'll be honest... it's the one that's furthest from done. MCP and tool-use patterns are collapsing it, but complex enterprise systems with legacy APIs aren't there yet. An agent with the right connectors can talk to Salesforce and Guidewire through structured tool calls. But "the right connectors" still requires real engineering for systems that were never designed to be talked to this way. The trajectory is clear. The last mile is messy.

Approval gates are already solved. Tool-use with human-in-the-loop patterns. The agent calls a "request_approval" tool...the human sees the request in context...approves or rejects...the agent continues. No workflow engine required.
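To make the pattern concrete, here is a minimal sketch of an approval gate exposed as a tool. Everything in it is an assumption for illustration: the in-memory store, the `ApprovalRequest` shape, and the `decide` and `resume` helpers are hypothetical names, and a real agent framework would supply its own tool interface.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ApprovalRequest:
    request_id: str
    summary: str                      # what the human sees, in context
    status: str = "pending"           # pending -> approved | rejected
    decided_by: Optional[str] = None


# Hypothetical in-memory store; a real system would persist pending requests.
PENDING: Dict[str, ApprovalRequest] = {}


def request_approval(summary: str) -> str:
    """Tool the agent calls: records the request and returns its id."""
    req = ApprovalRequest(request_id=str(uuid.uuid4()), summary=summary)
    PENDING[req.request_id] = req
    return req.request_id


def decide(request_id: str, approver: str, approved: bool) -> None:
    """Called from the human-facing side when the reviewer approves or rejects."""
    req = PENDING[request_id]
    if req.status != "pending":
        raise ValueError("request already decided")
    req.status = "approved" if approved else "rejected"
    req.decided_by = approver


def resume(request_id: str) -> str:
    """What the agent does next depends on the human's decision."""
    req = PENDING[request_id]
    if req.status == "pending":
        return "waiting"
    return "continue" if req.status == "approved" else "stop"
```

The agent calls `request_approval(...)`, the reviewer's decision arrives through `decide(...)`, and the agent checks `resume(...)` before moving on. Recording who decided is what lets the decision be audited later.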

"The hard part of workflows was never the plumbing...it was the intelligence at each step. LLMs dissolved the intelligence problem. The plumbing was always commodity."

Movement 03

What Didn't Dissolve

Now here's where the nuance matters...and where the "just have the LLM do everything" crowd gets dangerous. Especially in regulated financial services.

LLMs handle the intelligence. They do NOT natively handle the operational guarantees that regulated industries require. And if you hand-wave past these, you're building a demo, not a system.

Durability

What if the server restarts mid-workflow?

LLM calls are stateless. If a sponsor setup is halfway through and the process dies, where does it resume? A pure LLM workflow has no answer. State persistence requires infrastructure... a database, a state machine, a recovery mechanism. This isn't sexy, but in a financial institution (FI), it's non-negotiable.
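A minimal sketch of what "where does it resume?" looks like in practice: checkpoint the next step after each one completes, and read the checkpoint on startup. The step names, the `workflow_state` table, and the `crash_before` switch are all hypothetical, and SQLite stands in here for whatever database you actually run.

```python
import sqlite3
from typing import Callable, List, Optional

# Hypothetical step names for a sponsor setup workflow.
STEPS = ["collect_documents", "validate_policy", "create_account", "notify_sponsor"]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # ":memory:" keeps the sketch self-contained; surviving a real restart
    # needs a file path or a database server.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow_state ("
        "workflow_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    return conn


def run(
    conn: sqlite3.Connection,
    workflow_id: str,
    do_step: Callable[[str], None],
    crash_before: Optional[str] = None,
) -> List[str]:
    """Run the remaining steps, committing a checkpoint after each one."""
    row = conn.execute(
        "SELECT next_step FROM workflow_state WHERE workflow_id = ?",
        (workflow_id,),
    ).fetchone()
    start = row[0] if row else 0      # resume from the last checkpoint
    executed: List[str] = []
    for i in range(start, len(STEPS)):
        if crash_before == STEPS[i]:
            return executed           # simulate the process dying here
        do_step(STEPS[i])
        executed.append(STEPS[i])
        conn.execute(
            "INSERT OR REPLACE INTO workflow_state (workflow_id, next_step) "
            "VALUES (?, ?)",
            (workflow_id, i + 1),
        )
        conn.commit()
    return executed
```

Note the gap this leaves: if the process dies after a step runs but before its checkpoint commits, that step runs again on resume. That is exactly the double-send problem the next section describes.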

Exactly-Once Execution

Did that approval email send... or did it send twice?

Idempotency and delivery guarantees aren't LLM problems. They're distributed systems problems. If a workflow sends a notification, hits a timeout, and retries... does the customer get two emails? Does the payment process twice? These failure modes require careful engineering.
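One common way to handle the retry case is an idempotency key: record a unique key for each side effect and refuse to repeat it. The sketch below assumes a hypothetical `sent` table and a caller-supplied `send` function, with SQLite standing in for a real database.

```python
import sqlite3
from typing import Callable


def open_outbox() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sent (idempotency_key TEXT PRIMARY KEY)")
    return conn


def send_once(
    conn: sqlite3.Connection,
    idempotency_key: str,
    send: Callable[[], None],
) -> bool:
    """Call send() only if this key has not been recorded. Returns True if sent."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "INSERT INTO sent (idempotency_key) VALUES (?)",
                (idempotency_key,),
            )
            send()  # if this raises, the key is rolled back so a retry can run
    except sqlite3.IntegrityError:
        return False  # key already recorded: this is a duplicate attempt
    return True
```

A key such as `"wf-1:approval-email"` ties the side effect to one workflow step, so a retry of that step becomes a no-op. This narrows the failure window rather than closing it: if `send()` succeeds and the commit then fails, a retry will send again, which is why passing the same key to the downstream provider, where it supports one, is the usual complement.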

Audit-Grade Logging

Prove to OSFI that step 4 was executed before step 5.

Regulators don't want probabilistic reasoning. They want deterministic proof. What happened...when...in what order...who approved it...what data was used. The audit trail needs to be immutable and complete. LLM chain-of-thought is not an audit trail.
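As an illustration of what "immutable and complete" can mean at the data level, here is a sketch of a hash-chained, append-only log: each entry's hash covers the previous entry's hash, so editing or reordering history breaks the chain. The entry fields are assumptions, and this makes a log tamper-evident, not tamper-proof. It would still need write-once storage and access controls around it.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: List[Dict], event: Dict) -> Dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "seq": len(log) + 1,
        "event": event,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    }
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Recompute the chain; any edited, dropped, or reordered entry fails."""
    prev_hash = GENESIS
    for i, entry in enumerate(log, start=1):
        if entry["seq"] != i or entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

The sequence number and chain together are what let you show that step 4 was recorded before step 5, independent of anything the model said about its own reasoning.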

Compensating Transactions

Step 4 failed. Roll back steps 1-3.

If a workflow partially completes and then fails, you need to undo the completed steps. Reverse the provisional account credit...cancel the notification...remove the flag. This is transactional logic that requires explicit handling, not LLM reasoning.
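The usual name for this is the saga pattern: pair every step with an explicit undo, and on failure run the undos for completed steps in reverse order. A minimal sketch, with hypothetical step names and a trace list added only so the behavior is visible:

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    trace: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            trace.append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                trace.append(f"undo:{done_name}")
            return trace
        completed.append(step)
        trace.append(f"done:{name}")
    return trace
```

The sketch assumes compensations themselves succeed. In a real system they can fail too, so they need the same retry and logging treatment as forward steps.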

Add to this: SLA enforcement (this workflow MUST complete in 4 hours or escalate), concurrent execution at scale (200 sponsor setups running simultaneously), and graceful degradation (what happens when the LLM provider goes down mid-workflow?).

These are boring infrastructure problems. But in a regulated FI, they're THE problems. If a sponsor setup half-completes and nobody knows, that's a compliance incident. If an approval gate gets skipped because of a race condition, that's an audit finding. If you can't prove the sequence of events to a regulator, you have a material gap.

"People hear 'LLMs handle everything' and stop thinking. That's wrong. LLMs handle the intelligence... which was the expensive part. The execution infrastructure is simple now. But it still has to exist."

Movement 04

The New Architecture

So what does the right architecture look like? If you're not buying a workflow engine and you're not building one from scratch... what are you actually building?

The pattern I keep seeing from teams that have shipped this... Spotify, Shopify, the teams I've worked with directly... is surprisingly consistent. And simpler than you'd expect.

Intelligence

Agent Orchestration · LLM Reasoning · Tool Calls · Context Engineering

This is 90% of the value. What to do, how to reason about it, which tools to invoke. This is what you own.

Governance

HITL Approval Gates · Audit Trail · Risk Tiering · Compliance Rules

Human checkpoints at blast-radius boundaries. Every decision logged. Regulatory requirements as code. You own this.

Execution

State Machine · Durable Queue · Retry Logic · Compensation

Thin layer. Postgres table + state tracking. Or Temporal/Inngest if you need scale. Commodity infrastructure.

Connectors

MCP Servers · API Integrations · System Connectors · Data Adapters

How agents talk to your systems. Guidewire, Salesforce, ServiceNow, internal APIs. Increasingly standardized via MCP.

Why This Matters

Look at the proportions. The intelligence layer is 90% of the value and 90% of what differentiates you. Which agents, which reasoning patterns, which context assembly, which tool chains, which approval logic. This is where your domain expertise lives...where competitive advantage lives. You can't buy this. No vendor sells your operational judgment.

The execution layer? That's a state machine. A Postgres table with workflow_id, current_step, status, started_at, completed_at...a background worker that checks for stuck workflows...a retry mechanism with exponential backoff. This isn't a platform purchase. It's a few days of engineering.

Stripe learned this. They built their own orchestration on Temporal (an open-source durable execution engine), with their own intelligence layer on top. The workflow "engine" is almost trivially thin. The value is in the agent logic, the prompt engineering, the tool integrations, the governance gates.

You don't need to build Temporal from scratch either. A thin durable execution layer plus the AI orchestration that your agents provide. That's the whole thing.
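To show how thin that execution layer can be, here is a sketch of the table, the stuck-workflow check, and the backoff calculation described above. SQLite stands in for Postgres so the example is self-contained. The `attempts` and `updated_at` columns, the status values, and the default backoff numbers are assumptions added for illustration.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS workflows (
    workflow_id   TEXT PRIMARY KEY,
    current_step  TEXT NOT NULL,
    status        TEXT NOT NULL,              -- running | completed | failed
    attempts      INTEGER NOT NULL DEFAULT 0,
    started_at    REAL NOT NULL,              -- Unix seconds
    updated_at    REAL NOT NULL,              -- last recorded progress
    completed_at  REAL
)
"""


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: base * 2^attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def find_stuck(conn: sqlite3.Connection, now: float, timeout: float) -> List[str]:
    """Ids of running workflows with no recorded progress for `timeout` seconds."""
    rows = conn.execute(
        "SELECT workflow_id FROM workflows "
        "WHERE status = 'running' AND updated_at < ? "
        "ORDER BY workflow_id",
        (now - timeout,),
    ).fetchall()
    return [row[0] for row in rows]
```

A background worker would call `find_stuck(...)` on a timer and reschedule each result after `backoff_seconds(attempts)`. Production code usually adds random jitter to the delay so retries don't all land at once.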

90% Value = Intelligence

~10% Value = Execution

0% Value = Vendor Lock

Movement 05

The Decision Framework

So if "build vs buy" is the wrong framing... what's the right one? Here's the decision framework I use. Three questions. Answer them honestly and the path becomes clear.

Question 1: Where is the intelligence?

If the workflow is purely deterministic... "if field X equals Y, route to queue Z"... a traditional engine is fine. Use n8n. Use ServiceNow. Use whatever. There's no intelligence to own. But if the workflow requires judgment... reading documents, classifying requests, making contextual decisions, handling edge cases... you need an LLM at the nodes. And the moment you need an LLM at the nodes, the "engine" becomes irrelevant. The intelligence IS the workflow.

Deterministic Workflow

Route, branch, loop

No reasoning required. Rules are static. Conditions are binary. The engine does the work. Traditional platforms serve this fine. This is a decreasing percentage of valuable workflows.

Intelligent Workflow

Reason, decide, adapt

Every step requires judgment. Conditions are contextual. Edge cases are common. The LLM does the work. The "engine" is just state tracking. This is where the value is migrating.

Question 2: Where does the competitive advantage live?

If your workflow is a commodity process... basic HR onboarding, standard IT ticketing... buying makes sense. The workflow isn't a differentiator. But if the workflow encodes your operational expertise... how YOU process claims, how YOU onboard sponsors, how YOU assess risk... then buying a platform means encoding your competitive advantage in someone else's abstractions. You're renting your own judgment back.

Question 3: What's the regulatory surface?

In financial services, the audit trail matters. The governance matters. The ability to explain exactly what happened and why matters. When you buy a platform, you inherit their governance model. When you own the intelligence layer, you design governance that maps to YOUR regulatory requirements... OSFI, SOX, internal audit. The governance layer isn't overhead. It's the product.

In Practice

Here's how it plays out.

Deterministic + commodity + low regulatory surface? Buy. Use n8n, Zapier, ServiceNow flows. Don't over-engineer it.

Intelligent + differentiating + high regulatory surface? Own the intelligence. Use commodity execution infrastructure. Build the thin orchestration layer that connects your agents to your systems with your governance.

Mixed? Split. Use bought platforms for commodity workflows. Build the intelligent layer for differentiating ones. Don't force everything into one paradigm.
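The three questions reduce to a small decision function. This is only a sketch of the framework as stated here: the field names are hypothetical, and treating any workflow with mixed answers as a "split" candidate is my assumption about how to read the third case.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    needs_judgment: bool     # Question 1: is there intelligence at the nodes?
    differentiating: bool    # Question 2: does it encode your operational expertise?
    high_regulatory: bool    # Question 3: is the regulatory surface large?


def recommend(w: Workflow) -> str:
    answers = (w.needs_judgment, w.differentiating, w.high_regulatory)
    if not any(answers):
        return "buy"      # deterministic + commodity + low regulatory surface
    if all(answers):
        return "own"      # own the intelligence, compose the execution layer
    return "split"        # mixed answers: decide layer by layer
```

For example, `recommend(Workflow("IT ticketing", False, False, False))` returns `"buy"`, while a sponsor onboarding workflow that answers yes to all three returns `"own"`. The value of writing it down is less the function than the forced honesty about each answer.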

The Call

Stop Debating. Start Decomposing.

The next time someone puts a workflow on the screen and asks "build or buy?"... stop. Reframe. The workflow engine as a category is dissolving. What remains are two distinct things: intelligence that requires judgment, and execution infrastructure that doesn't.

The expensive mistake is licensing a platform to solve a problem that LLMs have already dissolved. You end up encoding your operational expertise in someone else's abstractions and paying annually for the privilege. The other expensive mistake is assuming LLMs handle everything and shipping something that can't survive a server restart or explain itself to a regulator.

The right move is knowing which layer is which. We're doing this now at Sun Life... looking at our Pega footprint, our document generation stack, our orchestration experiments, our agentic AI ambitions... and instead of asking "which vendor replaces which vendor," asking where the intelligence actually lives in each workflow and who should own it. Some of those workflows are commodity...some of them ARE our competitive advantage. The answer is different for each one, and that's the point.

"Build vs buy" was the right question in 2020. In 2026, the right question is: "What do I need to own, and what can I compose?"

"Own the intelligence. Compose the infrastructure. And stop debating a question the technology already answered."