Technology

How Agents Work in Omnilude's ai-service

Anonymous
12 min read

Introduction

This article is the follow-up to Why Omnilude's ai-service chose the Assistant / Thread / Run structure. In the previous post, I explained what the assistant, thread, and run each do. This time, I want to show how an agent actually works on top of that structure.

When I explain this structure, there is one misunderstanding I want to clear up first. Omnilude's AiAgent is not some giant magic box. It is not a layer that turns assistants into something completely different. In fact, it is closer to the opposite. AiAgent is a structure that groups multiple assistants as nodes and stores their execution order as a workflow.

To put it more simply:

If AiAssistant is an individual LLM preset, AiAgent is the workflow that connects those presets.

In this article, I will explain how that flow is implemented in code, as simply as I can.

An Agent is not a separate AI but a workflow

The first thing to look at is the AiAgent entity. It contains a name, description, and type, but the real core is the workflow field. This value is stored as JSONB instead of a separate table.

That means in Omnilude, an agent is less like "an object that calls one model more intelligently" and more like an execution graph expressed with nodes and edges.

For example, one agent can look like this:

  • It receives input at a start node.
  • A generate-text node uses one assistant to create text.
  • A router node decides the next path.
  • The final finish node organizes the result.
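For the hypothetical agent above, the workflow JSONB might look roughly like this. To be clear, the field names and overall shape here are my assumption for illustration, not the actual schema:

```json
{
  "nodes": [
    { "id": "n1", "type": "text-input", "startNode": true },
    { "id": "n2", "type": "generate-text", "config": { "aiAssistantId": "asst-123" } },
    { "id": "n3", "type": "router" },
    { "id": "n4", "type": "finish" }
  ],
  "edges": [
    { "from": "n1", "to": "n2" },
    { "from": "n2", "to": "n3" },
    { "from": "n3", "to": "n4" }
  ]
}
```

The useful property of storing this as JSONB rather than in separate tables is that the whole graph is versioned and loaded as one value.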

Once you see that, the difference between assistant and agent becomes much clearer.

  • An assistant is a preset for one inference.
  • An agent is a workflow that defines the order and conditions that connect those inferences.

An Agent is not a separate AI but a workflow (Text reference)

AiAssistant
  - preset for a single LLM call
  - defines which model/provider/instructions to use

AiAgent
  - a workflow connecting multiple nodes
  - each node may use an assistant or behave like a tool
  - the key is not "one call" but "execution order and connections"

A request does not run immediately

In Omnilude, AiAgent execution usually does not end as a synchronous function call. The important component here is DTE (Distributed Task Executor).

The flow looks roughly like this.

  1. A client or the backoffice requests agent execution.
  2. Instead of processing it immediately, the server turns it into a DTE job.
  3. WorkflowTaskHandler picks up the job from the queue.
  4. SingleWorkflowExecutor takes over the actual workflow execution.
  5. Inside it, BasicWorkflowEngine advances through the nodes one by one.

This structure matters for a simple reason. Agent execution can take longer than expected, may include streaming, needs to keep intermediate state, and in some cases must run in the background. Rather than forcing all of that into a normal request-response cycle, it is far more stable to treat execution itself as a job.

A request does not run immediately (Text reference)

Controller
  -> AiAgentExecutor
  -> DistributedTaskQueue
  -> WorkflowTaskHandler
  -> SingleWorkflowExecutor
  -> BasicWorkflowEngine

Key points
  - execution is handled as a job
  - streaming and background processing become easier
  - long-running execution can be managed within one structure
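The handoff above can be sketched in plain Java. The class and method names below are my assumptions that loosely mirror the article; only the idea (enqueue now, execute later) comes from the source:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of "execution is a job, not a synchronous call".
public class JobHandoffSketch {

    // A job carries just enough for a worker to start the workflow later.
    record WorkflowJob(String agentId, String input) {}

    static final BlockingQueue<WorkflowJob> queue = new LinkedBlockingQueue<>();

    // Controller side: enqueue the request and return immediately.
    static void requestExecution(String agentId, String input) {
        queue.add(new WorkflowJob(agentId, input));
    }

    // Worker side: a handler picks the job up and hands it to the engine.
    // In the real service this step is roughly WorkflowTaskHandler ->
    // SingleWorkflowExecutor -> BasicWorkflowEngine.
    static String handleNext() {
        WorkflowJob job = queue.poll();
        if (job == null) return null;
        return "executed workflow of " + job.agentId() + " with input: " + job.input();
    }

    public static void main(String[] args) {
        requestExecution("agent-1", "hello");
        System.out.println(handleNext());
    }
}
```

The design point is the decoupling: the controller only proves the job was accepted, while streaming, retries, and long-running state all live on the worker side.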

It starts from one start node

Having a workflow does not mean the engine simply walks through a list from beginning to end. BasicWorkflowEngine first finds the node where startNode=true and whose type ends with -input. It then injects the initial input into that node and starts execution from there.

I like this part quite a lot. It treats an agent not as a single linear command, but as an execution graph with a clear entry point.

For example, the text-input node prepares the string provided by the user and passes that value to the next node. Then the generate-text node receives that value and triggers an LLM call. After that, the output propagates to the next node along the edge.

In other words, an agent is not one giant function. It is a flow made of small execution units connected together.
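The entry-point lookup described above can be sketched like this. The node shape and field names are my assumptions based on the description, not the actual entity:

```java
import java.util.List;

// Sketch: finding the single entry point of a workflow graph.
public class StartNodeSketch {

    // Hypothetical node shape; in the real service the nodes live inside JSONB.
    record Node(String id, String type, boolean startNode) {}

    // Mirrors the described rule: startNode=true and a type ending with "-input".
    static Node findStartNode(List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.startNode() && n.type().endsWith("-input"))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("workflow has no start node"));
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("n2", "generate-text", false),
                new Node("n1", "text-input", true));
        System.out.println(findStartNode(nodes).id()); // prints "n1"
    }
}
```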

What actually executes nodes is NodeExecutor

This is where the really important layer appears: NodeExecutor. Each node has a type string, and NodeExecutorFactory finds the executor that matches that type. NodeExecutorRegistry scans the @NodeType annotation and automatically registers which executor handles which type.

Because of that structure, adding a new node does not require rewriting the whole engine. You make one node, declare its type, attach its provider, and it becomes part of the runtime.

Four representative node types are especially worth understanding for this article.

  • TextInputNode: prepares the initial input.
  • GenerateTextNode: uses an assistant to generate actual text.
  • RouterNode: uses the LLM to decide the next path.
  • FinishNode: collects the final result and ends execution.

Once you understand these four, you can already get a good sense of how Omnilude's agent structure works.
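The dispatch mechanism can be sketched as a map from type string to executor. In the real code the registry is populated by scanning @NodeType; here I mimic that by reading the annotation reflectively, and the executor interface shape is my assumption:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

// Sketch of type-string -> executor dispatch, as described for NodeExecutorRegistry.
public class RegistrySketch {

    // Hypothetical stand-in for the real @NodeType annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface NodeType { String value(); }

    interface NodeExecutor { String execute(String input); }

    @NodeType("text-input")
    static class TextInputExecutor implements NodeExecutor {
        // The text-input node just prepares and forwards its input.
        public String execute(String input) { return input; }
    }

    static final Map<String, NodeExecutor> registry = new HashMap<>();

    // Registry side: store the executor under the type its annotation declares.
    static void register(NodeExecutor executor) {
        NodeType type = executor.getClass().getAnnotation(NodeType.class);
        registry.put(type.value(), executor);
    }

    // Factory side: look the executor up by a node's type string.
    static NodeExecutor forType(String type) {
        return registry.get(type);
    }

    public static void main(String[] args) {
        register(new TextInputExecutor());
        System.out.println(forType("text-input").execute("hello"));
    }
}
```

This is why adding a new node type does not touch the engine: a new executor class with its own annotation simply becomes one more entry in the map.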

Even inside an Agent, the actual inference is done by assistants

This is where the previous post and this post connect again.

Even if AiAgent exists, the agent itself is not what directly calls the model. The actual LLM call happens inside a node, and at that point AiAssistantService appears again.

For example, GenerateTextNode reads aiAssistantId from its configuration. It then loads the assistant by that id, combines AiModel, AiProvider, and AiApiKey, and creates a RunnableAiAssistant. Only after that does the actual chat model call happen.

So the right way to read the structure is this:

  • assistants are components for inference
  • agents are workflows that decide the order in which those components run

Once you understand that, it becomes natural to stop seeing assistant and agent as competing concepts. They are not alternatives. They are layers.

Even inside an Agent, the actual inference is done by assistants (Text reference)

AiAgent
  - upper-level execution unit that has a workflow

Node
  - selects and calls an assistant when needed

RunnableAiAssistant
  - the real executable object composed of assistant + model + provider + key
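As a sketch, the composition step might look like this. Every class shape below is my assumption; only the names come from the article:

```java
// Sketch: a node resolves a plain aiAssistantId into a fully wired,
// executable object before any model call happens.
public class RunnableSketch {

    // Hypothetical minimal shapes for the entities named in the article.
    record AiAssistant(String id, String modelId, String instructions) {}
    record AiModel(String id, String providerId) {}
    record AiProvider(String id, String baseUrl) {}
    record AiApiKey(String providerId, String secret) {}

    // The "real executable object": assistant + model + provider + key in one place.
    record RunnableAiAssistant(AiAssistant assistant, AiModel model,
                               AiProvider provider, AiApiKey key) {
        String describe() {
            return assistant.id() + " -> " + model.id() + " @ " + provider.baseUrl();
        }
    }

    static RunnableAiAssistant compose(AiAssistant a, AiModel m, AiProvider p, AiApiKey k) {
        return new RunnableAiAssistant(a, m, p, k);
    }

    public static void main(String[] args) {
        var runnable = compose(
                new AiAssistant("asst-1", "model-1", "be concise"),
                new AiModel("model-1", "prov-1"),
                new AiProvider("prov-1", "https://api.example.com"),
                new AiApiKey("prov-1", "secret"));
        System.out.println(runnable.describe());
    }
}
```

The separation matters: the node only stores an id, and everything needed to actually call a model is assembled at execution time.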

On the /ai-runs path, the agent sits on top of conversation context

One more interesting part is the /ai-runs path. This path is not just an API that directly executes a stored agent. It first reads the messages in a thread, then executes a specific AiAgent workflow on top of that context.

In the current implementation, ChatAgentTaskHandler selects an agent type like AiAgentType.USING_WEB_SEARCH_TOOL and passes it to SingleWorkflowExecutor. That means a run is no longer just a simple LLM call at this point. It becomes a command that starts an agent workflow on top of conversation context.

This is one of the reasons Omnilude's ai-service feels more like a platform than a thin model wrapper. The same inference structure takes on a different product meaning once it is placed on top of the thread/message/run interface.

I am intentionally leaving multiagent out of this article

If you read the code, you will also find the multiagent package. But its direction is a little different from the AiAgent workflow explained here. It is closer to a separate layer that orchestrates multiple agents and handles bigger concerns such as dynamic routing, retries, and human review.

More importantly, there are still parts in progress, so it is not yet something I want to explain on exactly the same level as the current AiAgent runtime. That is why I intentionally left it out of this article. The scope here is strictly the structure for running one agent workflow by combining assistants.

Narrowing the scope this way makes the article easier to follow and keeps the explanation from getting blurry.

I think this structure is quite realistic

If you look at it again, the agent in Omnilude's ai-service is not a grandly named separate AI, and it is not just another wrapper around assistants. If anything, it is more practical than either label suggests.

  • The execution flow is stored in AiAgent.workflow
  • Actual execution is handed off as a DTE job
  • The engine moves from the start node along edges
  • Each node calls an assistant when it needs an LLM

I think this structure works because it does not exaggerate extensibility. Assistant, thread, run, and agent all have different roles, and the boundary of each responsibility is relatively clear.

The more AI features you put into a product, the more important this kind of structure becomes. What lasts longer is not just calling a model well once, but deciding in what unit things are stored, through what interface they run, and through what flow they connect.

Closing

In the previous article, assistant / thread / run were the basic concepts that formed Omnilude's execution platform. In this article, agent is the composition layer placed on top of them. AiAgent is not a separate AI. It is an entity that stores a workflow, and its execution flows through DTE -> WorkflowExecutor -> NodeExecutor.

And the actual inference is still handled by assistants. In the end, Omnilude's ai-service grows not by putting assistants and agents in opposition, but by building agents out of assistants as components.

Next time, I will take this one step further into practice and explain, through concrete examples, how I design actual workflow JSON and node compositions.