Why Did Omnilude's ai-service Choose an Assistant / Thread / Run Structure?
Introduction
In this post, I want to share how Omnilude's ai-service is implemented. There is a famous saying most of us have heard at least once: "Don't reinvent the wheel." I agree with it. When I designed the AI execution model for ai-service, I had already used OpenAI's Assistants API (Beta), and that experience was not bad. I did not have a more brilliant idea at the time, so the initial structure started out almost as an imitation of it.
OpenAI's Assistants API had a fairly simple structure. It had a skeleton of assistant / thread / message / run / run step, and you could understand Assistant as a preset, Thread as a conversation container, and Run as the actual execution. That conceptual axis is clearly visible in OpenAI's official deep dive and FAQ.
However, Omnilude did not copy that structure as-is. We were not building a service that only looked at OpenAI. We had to handle multiple execution paths in one platform, including local models, OpenAI-compatible servers, Anthropic, Ollama, and LM Studio.
So Omnilude's ai-service borrows concepts from OpenAI Assistants, but the actual implementation moved in a more separated direction. In this post, I will walk through that structure as simply as I can.
First, What Does It Store?
Omnilude's ai-service contains several AI-related objects. The names can look complicated at first, but the roles are simpler than they seem once you split them apart.
- AiProvider: stores which provider is being used. This is the axis for OpenAI, Anthropic, Ollama, LM Studio, and so on.
- AiApiKey: stores the authentication key connected to that provider.
- AiModel: stores the actual model information, such as model name, reasoning support, pricing, and context size.
- AiAssistant: defines an execution preset for what purpose the model is used and what instructions it follows.
- AiThread, AiMessage, AiRun, AiRunStep: store conversation and execution history.
- AiAgent: the upper layer that combines multiple assistants into a workflow.
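To make the shape of these objects concrete, here is a minimal, hypothetical sketch of how they might reference each other. Field names are illustrative only; the real JPA entities carry far more attributes.

```java
// Hypothetical, simplified sketch of the stored objects and their references.
// Field names are illustrative; the real JPA entities are richer.
record AiProvider(long id, String name) {}                    // OpenAI, Anthropic, Ollama, ...
record AiApiKey(long id, long providerId, String encryptedKey) {}
record AiModel(long id, long providerId, String name,
               boolean supportsReasoning, int contextSize) {}
record AiAssistant(long id, long modelId, long apiKeyId,
                   String instructions) {}                    // the execution preset
record AiThread(long id) {}                                   // conversation container
record AiMessage(long id, long threadId, String role, String content) {}
record AiRun(long id, long threadId, long assistantId) {}     // one execution
record AiAgent(long id, java.util.List<Long> assistantIds) {} // workflow over assistants
```

The point of the sketch is only the direction of the references: assistants point at models and keys, models point at providers, and runs point back into threads.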
If I reduce it to one sentence, it looks like this.
Provider and Model are the ingredients, Assistant is the recipe, and Thread and Run are the actual order and cooking record.
The relationship becomes easier to understand when you see it as a diagram.
This is also where it resembles OpenAI's older Assistants API. The skeleton itself is similar: Assistant, Thread, Message, Run, and Run Step. The difference is that Omnilude manages provider / model / apiKey outside the Assistant and treats them as separate assets.
Why Split It This Way?
The reason is simple. In Omnilude, I did not want to treat an Assistant as just a lump of prompt text.
Suppose we create one assistant. What we really want to store is not just instructions.
- which provider it uses
- which model it uses
- whether it supports reasoning
- whether the response format is text or JSON
- how temperature and top-p are set
- even if the model changes later, which model is treated as the original baseline
You can see that idea directly in the actual AiAssistant entity. modelId and primaryModelId are separate, and apiKeyId and primaryApiKeyId are also separate. But this is not only about remembering the original baseline model. The more important reason is fallback. If a specific provider goes into a panic state, a certain key gets blocked, or the currently connected model becomes unstable, the service needs to keep going by switching to another model and key. In other words, separating the currently attached model and key from the primary model and key makes it possible to operate more robustly even during failures.
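A tiny sketch of that fallback idea, with hypothetical names (the real entity tracks keys the same way it tracks models): the currently attached model can be swapped out during an incident, while the primary field remembers the baseline to restore later.

```java
// Hypothetical sketch of the fallback idea: the assistant keeps both the
// currently attached model (modelId) and the original baseline (primaryModelId).
// If the attached model or its provider becomes unstable, modelId is switched
// to a healthy fallback; when the baseline recovers, it can be restored.
class AssistantBinding {
    long modelId;                 // currently attached model (may be a fallback)
    final long primaryModelId;    // original baseline model

    AssistantBinding(long modelId, long primaryModelId) {
        this.modelId = modelId;
        this.primaryModelId = primaryModelId;
    }

    void switchToFallback(long fallbackModelId) {
        modelId = fallbackModelId;    // keep serving on a healthy model
    }

    void restorePrimary() {
        modelId = primaryModelId;     // baseline recovered: switch back
    }
}
```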
Because of this structure, an assistant is not just a prompt template. It becomes an operable execution preset.
This is one of the biggest places where Omnilude differs from OpenAI Assistants. OpenAI's Assistant gave the impression of bundling model + instructions + tools together. Omnilude, by contrast, makes provider, model, and API key independent assets first, and then designs Assistant as the composition layer built on top of them.
Assistant Does Not End as a DB Entity
There is one more important step here. A stored AiAssistant is not executed directly.
Right before execution, AiAssistantService combines the following information at once.
- the assistant's own preset values
- the connected model information
- the connected provider information
- the encrypted API key stored in the system
Then it turns that result into an execution object called RunnableAiAssistant.
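A minimal sketch of that assembly step, under assumed names: the stored entity plus its model, provider, and decrypted key become a plain execution object with no JPA ties. The decryption function is injected so the execution layer never sees key storage details.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: a plain, JPA-free object ready for a model call.
record RunnableAiAssistant(String instructions, String providerName,
                           String modelName, String apiKey) {}

class AssistantAssembler {
    // Combines preset values, model/provider info, and the decrypted key
    // into a single execution object, right before the call.
    static RunnableAiAssistant assemble(String instructions, String providerName,
                                        String modelName, String encryptedKey,
                                        UnaryOperator<String> decrypt) {
        return new RunnableAiAssistant(instructions, providerName, modelName,
                                       decrypt.apply(encryptedKey));
    }
}
```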
That name matters quite a bit. In Omnilude, the assistant stored in the database and the assistant actually used for execution are treated as different things.
- DB-side: AiAssistant, an entity that stores configuration
- Execution-side: RunnableAiAssistant, an object ready to be used for a model call
This separation has obvious advantages.
First, the execution layer no longer has to care about JPA entities.
Second, provider-specific differences can be hidden inside RunnableAiModel implementations.
Third, reasoning options, response formats, and sampling parameters can be interpreted in one place at execution time.
In short, Omnilude separates storing an assistant as data from actually running an assistant.
How Does a Real Call Flow?
The simplest path for directly executing an assistant looks roughly like this.
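As a minimal, hypothetical sketch of that path: the stored preset is resolved into a runnable object, and the call is routed through a gateway abstraction instead of hitting a model client directly. `EchoGateway` here is only a stand-in for the real `LlmGateway`.

```java
// Hypothetical sketch of the direct execution path: the caller talks to a
// gateway interface, never to a provider SDK directly.
interface Gateway {
    String execute(String model, String instructions, String userInput);
}

class EchoGateway implements Gateway {   // stand-in for the real LlmGateway
    @Override
    public String execute(String model, String instructions, String userInput) {
        // A real gateway would dispatch to the provider-specific client here.
        return "[" + model + "] " + instructions + " :: " + userInput;
    }
}
```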
The key piece in this flow is LlmGateway. In Omnilude, most direct execution paths now go through this gateway.
And the gateway is doing more than it first appears.
- applying provider-specific rate limits
- recording execution start and end
- creating and managing the state of AiRun and AiRunStep
- recording request and response payloads
- collecting token and cost information
In other words, it is not just calling assistant.chat(). It wraps that call as a platform execution with tracking attached.
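A sketch of that wrapping idea, with illustrative names: a run record is opened before the raw call and closed after it, with the payloads captured in between. Rate limiting and cost collection would hook into the same wrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a gateway that wraps a raw model call with tracking.
class TrackingGateway {
    static class RunRecord {
        final String request;
        String response;
        String status = "RUNNING";
        RunRecord(String request) { this.request = request; }
    }

    final List<RunRecord> runs = new ArrayList<>();
    private final UnaryOperator<String> modelCall;   // the raw provider call

    TrackingGateway(UnaryOperator<String> modelCall) { this.modelCall = modelCall; }

    String execute(String prompt) {
        // (1) provider-specific rate limiting would be applied here
        RunRecord run = new RunRecord(prompt);       // (2) record execution start
        runs.add(run);
        String result = modelCall.apply(prompt);     // (3) the actual model call
        run.response = result;                       // (4) record the payload
        run.status = "COMPLETED";                    // (5) close the run
        return result;
    }
}
```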
That difference becomes important very quickly. Once you start using AI features more seriously, it matters less which prompt was used and more which assistant produced which result with which model at what cost.
Why Keep Thread, Message, and Run?
This is the point where many people will ask a natural question.
If you can run an assistant directly anyway, why keep Thread, Message, and Run separately?
I think this is a very important part of Omnilude's structure.
An Assistant is only a preset. But real user experience does not end with one preset. Users continue conversations, accumulate messages, request execution at a specific point, and receive the result back as another message.
Omnilude splits that flow like this.
- AiThread: the conversation room
- AiMessage: each turn exchanged between the user and the assistant
- AiRun: an execution request based on a specific message
- AiRunStep: the internal step-by-step trace of that execution
I think this is a fairly practical way of bringing the structure shown by OpenAI Assistants into a real product runtime.
For example, a client can first create a thread, then add a user message inside it, and then call /ai-runs. From that point on, it is no longer just a chat completion call. It becomes an execution unit with conversation context.
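The sequence above can be sketched with a toy in-memory API. Only the /ai-runs flow comes from the text; the method names and the returned string are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the client-facing flow: thread -> message -> run.
class ThreadApi {
    private long nextId = 1;
    private final Map<Long, List<String>> messages = new HashMap<>();

    long createThread() {                             // 1. create a thread
        long id = nextId++;
        messages.put(id, new ArrayList<>());
        return id;
    }

    void addMessage(long threadId, String content) {  // 2. add a user message
        messages.get(threadId).add(content);
    }

    String createRun(long threadId) {                 // 3. call /ai-runs
        // A real run would enqueue an execution carrying this conversation context.
        return "run over " + messages.get(threadId).size() + " message(s)";
    }
}
```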
Why does that matter?
First, you can connect execution history to the conversation. Second, you can manage where to restart from within the same thread. Third, you can attach additional features like title generation, favorites, and reset at the thread level.
The more you turn AI into a product, the longer this kind of interface tends to survive compared with a single direct call.
But Pressing Run Does Not Always Mean One Assistant Runs
This is another place where Omnilude differs from OpenAI Assistants.
The impression of OpenAI Assistants was relatively clear: there is an Assistant, there is a Thread, and Run executes that assistant on top of the thread.
But Omnilude's /ai-runs is now closer to a workflow entry point. In the actual implementation, AiRunController does not call a model right away. It pushes a task into the DTE queue. Then ChatAgentTaskHandler takes that task, reads the thread messages, and executes the workflow of AiAgent.
In other words, in Omnilude today, Run does not stop at meaning run one assistant. Depending on the case, it can be the starting point that executes a workflow containing multiple assistants.
Seen this way, the structure becomes clearer.
- Assistant: one LLM preset
- Agent: a workflow composed of multiple assistants
- Run: the execution unit that starts that workflow in reality
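The run-as-workflow-entry idea can be sketched like this, with illustrative stand-ins for AiRunController, the DTE queue, and ChatAgentTaskHandler: the endpoint only enqueues a task, and a handler later drains the queue and executes each assistant in the agent's workflow in order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: one run task carries the whole agent workflow.
record RunTask(long threadId, List<String> agentAssistants) {}

class RunQueue {
    private final Deque<RunTask> queue = new ArrayDeque<>();

    void enqueue(RunTask task) { queue.addLast(task); }       // what the run endpoint does

    List<String> handleNext(UnaryOperator<String> execute) {  // the task handler
        RunTask task = queue.pollFirst();
        List<String> results = new ArrayList<>();
        if (task == null) return results;
        for (String assistant : task.agentAssistants()) {
            results.add(execute.apply(assistant));            // one workflow step
        }
        return results;
    }
}
```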
This difference is why Omnilude's ai-service feels one step closer to a platform than to a simple assistant repository.
Not Every Path Is Exactly the Same Yet
There is one realistic note worth leaving here. Even if the structure is well organized, not every entry point currently follows the exact same execution path.
For example, the backoffice playground and direct inference paths go through LlmGateway, so tracking and cost logging are attached relatively well. On the other hand, the internal system chat API /system/ai/chat uses a more direct path. That path loads the assistant, applies the rate limiter, and calls chat directly, so it has a slightly different texture from the gateway-based tracking path.
I actually think that is fine. In real platforms, things rarely start fully organized around one perfect pattern. More often, they converge toward a shared structure over time as usage expands.
What matters is that the central axis is already in place.
- management objects are separated
- execution objects are being unified as RunnableAiAssistant
- conversation and execution history remain as Thread / Message / Run / RunStep
- upper-level orchestration is handled by AiAgent
I Like This Structure Quite a Bit
When I look at this structure again, Omnilude's ai-service is not just a service that calls models well.
It is trying to do four things at the same time.
- handle multiple providers and models within one platform
- turn assistants into reusable execution presets
- preserve conversations and execution as a thread/message/run structure
- later expand into agent workflows by composing assistants together
I think this direction is very practical. Once you start putting AI into a product, what you eventually need is not a few helper functions that call models. What you need is an executable interface.
And that interface still resembles the conceptual structure that OpenAI Assistants once showed. Omnilude simply went one step further by separating provider and model more clearly, and by pushing tracking and workflow more deeply into the design.
Closing
In Omnilude's ai-service, AiAssistant is not just a prompt store. It is closer to an execution preset that bundles which provider and model to call, how to call them, what format to receive back, and on what runtime that result should be recorded.
And AiThread, AiMessage, AiRun, and AiRunStep are the runtime interface that lets that preset operate inside a real product flow. Once AiAgent is added on top, Omnilude's ai-service grows beyond merely managing a few assistants. It becomes a platform that can operate workflows.
Next time, I will go one step further and come back with the story of how assistants are combined to build agents.