API-First AI Agentic Patterns: Building Smarter Systems Without the Framework Overhead
We’ve worked with dozens of teams building LLM agents across industries. Consistently, the most successful implementations use simple, composable patterns rather than complex frameworks. — Erik Schluntz and Barry Zhang
Let’s get real: frameworks are like those hipster, overpriced coffee houses. Yes, they’re cosy and make you feel part of something cool, but do you really need a barista to pour milk into your coffee? No way. You can make it stronger, quicker, and just how you want it at home, with no extra expense and no waiting in line.
The same goes for building AI agents. You don’t necessarily need LangChain, Haystack, or some other heavy-duty library to get the job done. All you need is direct API access, a dash of ingenuity, and a cup of coffee to keep you awake.
Taking a leaf from Anthropic’s recent blog post on AI agentic patterns (huge thanks to their team for the idea), I set out to take matters into my own hands and build my own agentic systems: no frameworks, no superfluous layers, just pure API-driven wizardry. And you know what? It’s not only feasible, it’s supremely empowering.
Why Should You Care About Agentic Patterns?
Before we get down to the nitty-gritty, let’s take a second to ask why agentic patterns are necessary in the first place.
AI agents are fantastic, but agentic patterns? That’s pure magic. Think of agentic patterns as reusable blueprints for building intelligent, autonomous systems. Rather than constructing single-use bots that do one thing and are never seen again, you’re designing workflows that adapt and improve over time.
For instance, consider a customer service agent that answers FAQs, reviews past cases, spots patterns in complaints, and generates product development ideas on its own initiative. Or a data processing agent that discovers anomalies and predicts trends, turning raw data into actionable insights without continuous human intervention.
The strength of agentic patterns lies in their scalability and flexibility. They let you build systems that grow with you instead of being bogged down by the limits of whatever framework you happened to pick.
Why Go Framework-Free?
So, you may ask: if frameworks exist to simplify our lives, why shun them?
In reality, frameworks are helpful: they abstract complexity, offer pre-built components, and usually have strong community backing. But they come with trade-offs. Frameworks can be bloated, rigid, and sometimes plain overkill for what you’re trying to do. They add overhead, hide what’s going on under the hood, and restrict your ability to tailor the system.
By going framework-free and relying on direct APIs, you get complete control over your AI agents. You can optimise for performance, customise the system to your requirements, debug easily when needed, and know precisely how everything works. It’s like having your own coffee machine rather than depending on Starbucks: you choose the strength, flavour, and timing, with no middlemen.
Before applying the agentic patterns, we first need to understand the fundamental building blocks behind them. Anthropic’s blog post describes these patterns, and we’ll implement them with the OpenAI SDK and Node.js to talk to the LLMs directly.
Building Blocks
The entire purpose of AI agents, or what I prefer to call agentic workflows, is to get our computers to do the heavy lifting for us. We don’t merely want them to sit there and tell us what to do; we want them to roll up their sleeves and get it done. Imagine a personal assistant who doesn’t hand you a to-do list but checks off the items for you. That’s the dream, right?
To achieve this, we should step back and consider how we, as humans, get work done. We don’t magically know everything: we look up information, take actions, and learn from what we did before. If we can give AI agents those same essential capabilities, looking things up (search), performing actions (tools), and recalling past actions (memory), we’re halfway there.
This is where the concept of augmenting LLMs comes into play. Large language models are impressive right out of the box, but they’re not ready to dominate the world yet. They’re more like a bright intern: full of knowledge, but in need of guidance to actually get things done. By granting them a few select superpowers, we can turn them into complete agents that act independently and simplify our lives.
What Does It Mean to Augment an LLM?
Augmenting an LLM means giving it more than just the ability to generate text. It’s about transforming it from a passive conversationalist into an active problem solver.
Face it: no human, and certainly no LLM, is omniscient. By letting your agent search the web or query a database, you put the world’s knowledge within its reach. Need current stock prices? A customer’s purchase history? An agent with search capabilities can get them in seconds.
A talk-only LLM is like a chef who can write recipes but can’t cook. To be truly useful, it has to be able to do things. That means wiring in APIs and functions that let the agent act: send an email, update a spreadsheet, or even control smart home and IoT devices.
Beyond that, one of the biggest limitations of vanilla LLMs is that they have no memory. They’re like goldfish: every encounter starts from scratch. With memory, your agent can remember past actions, learn from them, and build context over time. That’s the key to handling advanced, multi-step workflows.
Basic LLM Calls
Okay, now for the exciting part: coding. If you’re here, I’ll assume you already know how to call an LLM. Still, to cover the basics, here’s a brief rundown to get you started. No frills, just the bare necessities.
Invoking an LLM directly is as simple as it comes. Whether you’re calling OpenAI, Hugging Face, Anthropic, or some other provider, the steps are essentially the same: authenticate, submit a prompt, and receive a response.
With OpenAI, for instance, you set your API key, craft your prompt, and issue a completion call. Hugging Face? Load a model, feed it some text, and let it generate. Anthropic’s API works much the same way, with a few differences in request format.
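To make that concrete, here’s a minimal sketch using the OpenAI Node.js SDK. The model name is just an example; swap in whatever you have access to.

```javascript
// npm install openai  (run as an ES module, e.g. a .mjs file)
import OpenAI from "openai";

// The SDK picks up OPENAI_API_KEY from the environment by default.
const client = new OpenAI();

async function ask(prompt) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // example model name
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
}

console.log(await ask("In one sentence, what is an AI agent?"));
```

That’s the whole ceremony: authenticate, prompt, response. Every pattern below is just this call arranged in smarter shapes.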
And for what it’s worth, Gemini offers a generous free tier and is perfectly acceptable for building and experimenting with agents.
Structured Output
Let’s discuss something underrated but worthwhile: structured output. When working with LLMs, you don’t always want a wall of text back. There are plenty of places where you’d like something cleaner, more predictable, and more actionable, like JSON, XML, or a good old-fashioned list. That’s where structured output comes in.
Think about it. If you’re building an AI agent that makes decisions, reads data, or talks to other systems, you can’t accept sloppy free-form text. You need clean, machine-readable output to feed into the next phase of your pipeline.
For instance, if you are building a weather bot, rather than the LLM replying with “It will be sunny with a high of 75 degrees,” it could respond with something like:
```json
{
  "weather": "sunny",
  "temperature": 75,
  "unit": "F"
}
```
This structured output makes it easy to parse, store, or use in downstream tasks. There is no need to write complex regex or natural language processing logic to extract the information — it’s already neatly organized for you.
Most modern LLMs support structured output out of the box. You can nudge the model to answer in a specific shape by adjusting your prompts, or enforce it through features like JSON mode and function calling (OpenAI). Guide the model in the proper direction: “Hey, don’t just chat; format your responses like so.”
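As a sketch of the weather-bot example, here’s JSON mode with the OpenAI SDK. The key names in the system prompt are our own convention, not something the API knows about.

```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  // JSON mode constrains the model to emit syntactically valid JSON.
  response_format: { type: "json_object" },
  messages: [
    {
      role: "system",
      content:
        'Answer in JSON with keys "weather" (string), "temperature" (number), and "unit" ("C" or "F").',
    },
    { role: "user", content: "What's the weather usually like in Phoenix in July?" },
  ],
});

// Safe to parse: JSON mode guarantees valid JSON (though not our exact keys).
const report = JSON.parse(response.choices[0].message.content);
console.log(report.weather, report.temperature, report.unit);
```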
Ultimately, this is about making your AI agents more dependable. The more predictable your outputs, the better your agents can tackle complex tasks, slot neatly into other systems, and deliver real value.
Invoking Tools
An agent that only produces text is a bit like an AI note-taker: helpful, sure, but it isn’t rolling up its sleeves and getting things done. That’s where tools come in. They turn your agent from a passive observer into an active participant.
Tools are external actions or APIs your agent can invoke to make something happen. Need to read from a database? There’s a tool for that. Want to send an email or fill out a spreadsheet? Tools can do all of that, too. They’re the bridge between your agent’s reasoning and the real world.
Say you’re building a customer support agent. Rather than simply making suggestions, it could:
- Pull a customer’s purchase history from an API.
- Check inventory in real time.
- Issue a follow-up or a refund, all handled automatically.
The magic here is integration. Connecting your LLM to these tools lets it both think and act. It’s like going from a GPS that tells you where to go to one that drives the car for you (shades of Tesla?).
Most contemporary LLMs support this kind of functionality. You declare the tools and describe what they do, and your agent decides when and how to use them. You’re not scripting every step; you’re giving your agent room to come up with creative solutions.
Plug in the necessary tools and let your agent do the rest. When your agent can not only talk but also act, that’s when it becomes powerful.
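Here’s a condensed sketch of that flow with OpenAI-style function calling. The getOrderStatus helper is a hypothetical stand-in for your real order API.

```javascript
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical backend helper; swap in your real order API.
async function getOrderStatus(orderId) {
  return { orderId, status: "shipped", eta: "2 days" };
}

const tools = [
  {
    type: "function",
    function: {
      name: "get_order_status",
      description: "Look up the shipping status of a customer's order",
      parameters: {
        type: "object",
        properties: { orderId: { type: "string" } },
        required: ["orderId"],
      },
    },
  },
];

const messages = [{ role: "user", content: "Where is my order 8812?" }];

// First call: the model decides whether (and how) to use the tool.
const first = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
  tools,
});

const call = first.choices[0].message.tool_calls?.[0];
if (call) {
  const args = JSON.parse(call.function.arguments);
  const result = await getOrderStatus(args.orderId);

  // Second call: feed the tool result back so the model can answer in prose.
  messages.push(first.choices[0].message);
  messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });

  const second = await client.chat.completions.create({ model: "gpt-4o-mini", messages, tools });
  console.log(second.choices[0].message.content);
}
```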
And when we talk about calling tools, it’s not just about hitting APIs or functions. It also covers letting your agent search a knowledge base, database, or vector store. Think of that as one more tool in your agent’s toolbox, right alongside sending an email or fetching data.
For instance, suppose your agent has to answer a customer question like, “Where is my order?” Rather than guessing or fabricating, it can search the knowledge base for the user’s order, retrieve the correct information, and use that to craft an accurate response. It’s not much different from invoking a function or doing RAG: it’s about fetching the right information at the right moment, whether that comes from an API, a database, or a document. And when you combine knowledge-base search with other tools, your agent isn’t merely answering questions; it’s resolving issues, making decisions, and improving as it learns. You’re giving your agent both a better memory and a master key to your knowledge.
Workflow Patterns
Prompt Chaining
Prompt chaining is one of those techniques that feels like the only sensible choice once you’ve started using it. Rather than having your LLM attempt one gigantic task in a single shot, you string together small, discrete steps. One by one, each brings the model closer to where you need it to be. It’s like solving a puzzle, piece by piece.
For instance, if you’re creating an agent to help customers plan a holiday, rather than giving the model one long prompt like “Plan a 5-day trip to Japan including flights, accommodations, and activities,” you do it step by step. First, you’d ask, “What are the best places to visit in Japan?” After getting that list, you’d ask, “What are the best flights to Tokyo?” Then, “Find hotels in Tokyo for 3 nights,” and finally, “Give me a day-by-day itinerary in Tokyo.”
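A minimal sketch of that chain, with the prompts abbreviated; each step feeds earlier answers into the next:

```javascript
import OpenAI from "openai";

const client = new OpenAI();
const ask = async (prompt) =>
  (
    await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    })
  ).choices[0].message.content;

// Each step builds on the output of the previous ones.
const places = await ask("List the best places to visit in Japan, briefly.");
const flights = await ask(`Given these destinations:\n${places}\nSuggest flight options to Tokyo.`);
const hotels = await ask("Suggest three well-located hotels in Tokyo for 3 nights.");
const itinerary = await ask(
  `Places:\n${places}\nFlights:\n${flights}\nHotels:\n${hotels}\nWrite a day-by-day 5-day Tokyo itinerary.`
);

console.log(itinerary);
```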
This strategy isn’t just easier on the model; it’s easier on you, too. Breaking the job into smaller portions gives you better control over the process. If something goes wrong, you can re-tune the follow-up prompt or go back and fix the previous step. It’s like a conversation, where you set the tone and direction as you go.
The second benefit is flexibility. With prompt chaining, adding, removing, and resequencing steps is simple. You can insert a budgeting step, or revisit and adjust an earlier response. It’s a pliable process that adapts to your requirements.
It’s a simple, natural way of tackling complex tasks, and it’s how we naturally get things done: step by step, with room to adjust along the way.
Gating
When creating an AI agent, it’s easy to get enthusiastic about what it can accomplish: responding to questions, extracting data, getting things done. But just as important is how it knows what to do and when. That’s where gating comes into play. Gating means building decision points, or checkpoints, into your agent’s workflow so it stays on track instead of wasting time or heading down the wrong path.
Picture your agent in a maze. Without gating, it would push through blindly, taking random turns and hoping for the best. With gating, it stops at each crossroads, looks around, and decides whether to proceed, turn back, or call it quits. It’s an easy way to build smarts and efficiency into your agent’s journey.
Take a customer support agent, for instance. When a question comes in, the agent would first determine whether it’s simple enough to answer. If not, it would ask for additional details before responding. Or, if it queries a database and gets nothing back, it might stop and tell you rather than carrying on without all the facts.
Gating is especially helpful in complex workflows where a single mistake could sabotage the whole job. Picture an invoice-processing agent: it can gate each step (extraction, validation, handoff to accounting), checking for errors or missing fields along the way. If something is amiss, it stops and flags the problem rather than pushing ahead and making a mess.
Gates can be as simple as a couple of conditional checks or as sophisticated as a full decision module. Build them in where they matter, and your agent becomes deliberate instead of merely busy. When designing an AI agent, think about how it will make decisions, not just what it will do. Gating is a simple but effective way of keeping your agent in line and always moving towards the right goal.
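A gate really can be a couple of conditionals. Here’s a sketch around the invoice example, where extractInvoice is a hypothetical extraction step (say, an LLM call with structured output, as shown earlier):

```javascript
// Hypothetical extraction step; imagine an LLM call returning structured JSON.
async function extractInvoice(documentText) {
  return { total: 199.99, vendor: "Acme Corp", dueDate: null };
}

// The gate: refuse to continue if required fields are missing.
function gate(invoice) {
  const missing = ["total", "vendor", "dueDate"].filter((key) => invoice[key] == null);
  return { ok: missing.length === 0, missing };
}

const invoice = await extractInvoice("...raw invoice text...");
const check = gate(invoice);

if (!check.ok) {
  // Stop and flag the problem instead of passing bad data to accounting.
  console.error(`Gate failed: missing ${check.missing.join(", ")}`);
} else {
  // ...validated; hand off to the next step...
}
```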
Routing
Okay, now routing, because even AI agents need a little direction in life. Routing is the GPS of the system, quietly steering tasks, queries, and information to the right place at the right time. It isn’t flashy, but it keeps your agent from getting stuck in the weeds.
When a task comes in, your agent doesn’t just throw darts at a board to decide what to do. It pauses (figuratively) and asks: “Do I make an API call? Hand this to another agent? Or process it myself?” Routing makes those choices, sending each task where it belongs.
For example, say you’re building a customer support agent and a user asks, “What’s my account balance?” Routing says, “Billing API, you’re up.” But if the user asks, “What’s a decent noodle recipe?” it might send that to a recipe database or let the LLM handle it in-house. It’s all about matching each task to the right tool or path.
What’s great about routing is how flexible it is. Your agent isn’t bound to a fixed script and can turn on a dime. One minute it’s fetching data; the next it’s generating a report; the next, drafting an email. Routing makes sure each task lands in the right place, no matter how random the mix.
And the good news? Routing doesn’t have to be complicated. It can be as simple as a few if-else statements or an LLM classification call, or as evolved as a machine learning model that picks the best route based on past behaviour. The point is to keep it flexible so your agent can handle whatever you throw at it. Routing isn’t the star of the show, but it’s the behind-the-scenes hero that makes it all happen: the decision-maker, the problem-solver, the glue that holds your agent together. And honestly, that’s what makes it so darn crucial.
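A sketch of LLM-based routing: one cheap classification call, then a plain switch. The route names and handlers here are hypothetical.

```javascript
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical handlers; replace with real integrations.
const lookupBalance = async (q) => "(result from the billing API)";
const searchRecipes = async (q) => "(result from the recipe database)";
const answerInHouse = async (q) =>
  (
    await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: q }],
    })
  ).choices[0].message.content;

async function route(query) {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content: 'Classify the query. Reply in JSON: {"route": "billing" | "recipes" | "general"}',
      },
      { role: "user", content: query },
    ],
  });
  const { route } = JSON.parse(res.choices[0].message.content);

  switch (route) {
    case "billing":
      return lookupBalance(query);
    case "recipes":
      return searchRecipes(query);
    default:
      return answerInHouse(query);
  }
}

console.log(await route("What's my account balance?"));
```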
Parallelization: Sectioning
Sectioning shines when the subtasks you break a job into can run in parallel, which gives your AI agent a serious speedup. The idea is simple: split a giant task into small, independent pieces and have your agent work through them all at once. It’s like having several experts work together, each handling their own portion of the job.
LLMs often perform better when each subtask gets its own call. Instead of making one call juggle many considerations, you let each call focus on a single aspect. That improves precision and makes the whole process quicker and more effective.
For example, suppose one LLM instance handles end-user questions while another screens them for offensive content or requests. This tends to work better than having a single LLM call handle both the guardrails and the core answer. By splitting the work, each call can play to its strengths, and you get better results overall.
Another great example is evaluating LLM performance. Instead of having one LLM call judge everything at once, several calls can each assess a different aspect of the model’s output for a given prompt. One call checks the facts, another the tone, another the relevance. Running them in parallel speeds up the process and gives you more granular, actionable feedback.
Sectioning isn’t merely splitting work into smaller pieces; it’s about running those pieces in parallel for the best possible efficiency. Done right, it’s a game-changer for speed, accuracy, and performance.
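In Node, the parallelism is almost free via Promise.all. A sketch of the guardrail-plus-answer split from above:

```javascript
import OpenAI from "openai";

const client = new OpenAI();
const ask = async (system, user) =>
  (
    await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: system },
        { role: "user", content: user },
      ],
    })
  ).choices[0].message.content;

const question = "How do I reset my router?";

// Two focused calls run concurrently: one screens, one answers.
const [verdict, answer] = await Promise.all([
  ask('Reply only "OK" or "FLAGGED": is this request safe to answer?', question),
  ask("You are a helpful support agent.", question),
]);

console.log(verdict.trim().toUpperCase().startsWith("OK") ? answer : "Request flagged.");
```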
Parallelization: Voting
Sometimes one pass at a task isn’t enough. That’s where voting comes in. It simply means running the same task multiple times to get different outputs or perspectives, especially when you need more confidence in the result. Think of it as asking several experts for their opinions: each brings a different perspective, which adds up to a more trustworthy answer.
Voting helps with nuanced problems that involve many factors, such as balancing accuracy against risk. Say you’re analysing code for security vulnerabilities. Instead of a single LLM call, you could issue a series of calls with individual prompts, each checking the code for a specific class of vulnerability. If even one flags a problem, you know it’s worth investigating further.
Another excellent use case is deciding whether content is inappropriate. You could have several prompts judge different things (tone, language, or context) and set different vote thresholds to trade off false positives against false negatives. One prompt might look for offensive language, another for sensitive topics, and a third for overall context. By combining their votes, you get a more nuanced and precise judgement.
Voting is extremely powerful because it adds depth and predictability to your AI agent’s decision-making. You’re not stuck with a single response; by gathering several, you make fewer mistakes and gain more confidence in the outcome. It’s like asking several friends about one tough decision: each contributes something different, and together they give you a broader picture. Use voting to get diverse outputs, compare them, and make smarter, better-informed decisions. It’s a low-key idea, but a giant leap towards building an agent you can trust. And come on, a little confidence goes a long way in AI (shades of DeepSeek?).
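A sketch of voting on the code-review example: three focused reviewers run in parallel, and a threshold decides. The prompts and the threshold are knobs you’d tune for your own use case.

```javascript
import OpenAI from "openai";

const client = new OpenAI();

const snippet =
  'app.get("/user", (req, res) => db.query("SELECT * FROM users WHERE id=" + req.query.id));';

// Each reviewer checks one class of vulnerability.
const reviewers = [
  "Check this code for SQL injection. Reply only YES or NO:",
  "Check this code for missing input validation. Reply only YES or NO:",
  "Check this code for unsafe data exposure. Reply only YES or NO:",
];

const votes = await Promise.all(
  reviewers.map(async (prompt) =>
    (
      await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: `${prompt}\n\n${snippet}` }],
      })
    ).choices[0].message.content.trim().toUpperCase().startsWith("YES")
  )
);

const flags = votes.filter(Boolean).length;
// Threshold of 1: any single flag earns a human look.
console.log(flags >= 1 ? `Flagged by ${flags}/${reviewers.length} reviewers` : "Looks clean");
```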
Orchestrator-Workers
Picture it: you’ve got a messy project and no idea where to begin. That’s what the orchestrator-workers workflow is for. You’ve got a project manager (the orchestrator) who takes the chaos, breaks it into smaller, manageable pieces, and hands each one to a team of specialists (the workers). Together, they get the project done: efficiently and quickly.
That’s exactly how the orchestrator LLM operates. It doesn’t follow a fixed plan; it makes decisions along the way. It takes the job, figures out how to break it down, and sends each piece to a worker LLM. The workers get going, and the orchestrator assembles their output into one finished product.
For instance, say you’re building a tool that has to perform complicated edits across a codebase. The orchestrator decomposes the work, identifies which files must change, and assigns each to a worker. The workers make their edits, and the orchestrator merges the changes.
Or take a research task that requires gathering and reviewing information from multiple sources. The orchestrator can identify the best sources, assign each to a worker to query, and then summarise the results into a single, digestible answer.
What’s excellent about this workflow is its flexibility. Unlike parallelization, where the subtasks are pre-specified and simply run side by side, the orchestrator-workers model adapts to the job at hand. The brain and the muscle cooperate to get the job done well, wisely, and smoothly.
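A compact sketch: the orchestrator plans subtasks as JSON at runtime, workers run them in parallel, and a final call assembles the result. The task and prompts are just placeholders.

```javascript
import OpenAI from "openai";

const client = new OpenAI();
const chat = async (messages, extra = {}) =>
  (
    await client.chat.completions.create({ model: "gpt-4o-mini", messages, ...extra })
  ).choices[0].message.content;

const task = "Write a short report on the pros and cons of framework-free AI agents.";

// 1. Orchestrator: decide the breakdown on the fly.
const plan = JSON.parse(
  await chat(
    [
      { role: "system", content: 'Break the task into subtasks. Reply in JSON: {"subtasks": ["..."]}' },
      { role: "user", content: task },
    ],
    { response_format: { type: "json_object" } }
  )
);

// 2. Workers: each subtask runs as its own focused call, in parallel.
const results = await Promise.all(
  plan.subtasks.map((sub) => chat([{ role: "user", content: sub }]))
);

// 3. Orchestrator: stitch the workers' pieces into one deliverable.
console.log(
  await chat([
    { role: "user", content: `Combine these sections into one coherent report:\n\n${results.join("\n---\n")}` },
  ])
);
```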
Evaluator-Optimizer
Sometimes getting things done isn’t enough; you want them done well. This workflow is about taking what your AI agent produces, reviewing it, and improving it until it’s the best it can be. It’s like having an editor who tells you what isn’t working and helps you make it better.
Here’s how it works. First, the evaluator steps in. It inspects the output, spots errors, and checks how closely the output adheres to the task’s standards. Think of it as the quality control department, knocking everything into shape. Once the evaluator has done its pass, the optimizer takes over. Based on the evaluator’s feedback, it revises, sharpens, and fine-tunes the output. Between them, they turn good results into great ones.
For example, if you are developing an AI agent to compose marketing copy, the evaluator would assess the text for tone, clarity, and relevance, while the optimizer would reshape the language into something more compelling or persuasive. Or take a coding assistant: the evaluator could highlight potential bugs or inefficiencies, and the optimizer would clean up and optimise the code.
Instead of settling for the first draft, you iterate and refine until the output is as good as possible. That feedback loop means your agent improves over time. If you’re building an AI agent where quality matters (writing, coding, or anything in between), the evaluator-optimizer process is the way to go. It’s not just about getting the job done; it’s about doing it well.
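A sketch of the loop on the marketing-copy example: draft, evaluate against criteria, revise, and repeat until the evaluator passes it or a retry cap hits. The criteria and cap are assumptions to tune.

```javascript
import OpenAI from "openai";

const client = new OpenAI();
const chat = async (messages, extra = {}) =>
  (
    await client.chat.completions.create({ model: "gpt-4o-mini", messages, ...extra })
  ).choices[0].message.content;

const brief = "Write a 50-word marketing blurb for a framework-free AI agent toolkit.";
let draft = await chat([{ role: "user", content: brief }]);

for (let round = 0; round < 3; round++) {
  // Evaluator: judge the draft and say what's weak.
  const review = JSON.parse(
    await chat(
      [
        {
          role: "system",
          content:
            'Judge the copy for tone, clarity, relevance. Reply in JSON: {"pass": boolean, "feedback": "..."}',
        },
        { role: "user", content: draft },
      ],
      { response_format: { type: "json_object" } }
    )
  );
  if (review.pass) break;

  // Optimizer: revise the draft using the evaluator's feedback.
  draft = await chat([
    { role: "user", content: `Rewrite this copy. Fix: ${review.feedback}\n\n${draft}` },
  ]);
}

console.log(draft);
```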
And that’s the end of our wild ride through the world of API-first AI agentic patterns. We’ve gone from prompt chaining and parallelization to orchestrator-workers and evaluator-optimizers. And, of course, routing, sectioning, and voting, because who doesn’t love some multitasking AI?
The best news? You don’t need a full-fledged framework to do any of this. With direct APIs, some creative imagination, and maybe a lot of coffee, you can build AI agents as smart, fast, and friendly as you ever imagined. Whether you’re automating support, handling information, or building the next big coding utility, these patterns give you the tools to make it happen.
I’ve included a few code samples (thanks to the OpenAI SDK and Node.js devs!) to kick-start you, but the real trick is extending these ideas into your own projects. The patterns the Anthropic authors and I have outlined are like LEGO pieces: combine them, customise them, and make them your own.
Before we wrap up, a big thank-you to the Anthropic blog and team for the inspiration. Their ideas on AI agentic patterns were the spark that ignited this whole operation. If you haven’t already, check out their post; it’s a goldmine of ideas.
So, now what? Your call. Take these patterns, fire up your IDE, and go code. Whether you’re an old pro at coding or just dipping your toes into AI, there’s never been a better time to experiment, improve, and make something amazing.
And hey, if you do build something amazing, share it with the world. Because the best part of this ride isn’t just what you create; it’s that you’ll inspire other folks to build, too. So go create. The future of AI agents is yours to shape, and for real, it will be awesome.