
Introducing the Enhanced Chatbot Agent Workflow

Anonymous

Introduction

This article is the follow-up to Why Omnilude's ai-service chose the Assistant / Thread / Run structure and How Agents Work in Omnilude's ai-service.

The earlier posts explained the structure. This time I want to look at how that structure actually runs inside the backoffice canvas. The subject here is Enhanced Chatbot.

This article is not built on an abstract toy example. It uses the actual workflow configuration and explains, as it is, what input this agent receives, where it branches, and which assistants and tools it passes through before it produces an answer.

What is the enhanced chatbot in one sentence

This agent is a fairly explicit branching chatbot that splits questions into four paths.

  • If the question is general, it answers directly.
  • If fresh information is needed, it goes down the search path.
  • If the input is a YouTube URL, it goes down the transcript-summary path.
  • If the input is a regular web URL, it goes down the article-analysis path.

If you simplify the actual workflow, it reduces to exactly that shape: one router fanning out into those four paths.

In other words, the point of this workflow is not to call one model as well as possible with brute force. It first decides what kind of question it is dealing with, then chooses the right processing pipeline.

I would like to open this up so people can try it directly, but LLM usage has a real cost, so I am not distributing it publicly yet. When there is enough room, I want to expose it in a form that people can actually run themselves.

It starts from one text input

The start node of this agent is text-input, and its example value is "What are the recent social issues?". That is only a demo phrase, but it also shows very clearly what kind of problem this workflow is trying to solve.

Instead of throwing the question straight into an LLM, it first sends it to a router node. The assistant used there is Question Router, and four routing sources are configured.

  • llm_direct: direct answer
  • web_search: answer after web search
  • youtube_summary: YouTube summary
  • article_analyze: webpage analysis

This router is not a plain if statement. The assistant instructions explicitly describe which source should be selected in which situation, and the response is forced to come back as JSON. For example, questions that need fresh information, date-sensitive information, or fact checking are sent to web_search; YouTube URLs go to youtube_summary; regular URLs go to article_analyze; and everything else goes to llm_direct.
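
The routing contract described above can be sketched as follows. This is a minimal Python stand-in, not the actual assistant: the real router is an LLM call, but its output is forced into a JSON object, and a deterministic classifier is enough to show the shape of that contract. The field names `source` and `reason`, the regexes, and the freshness keywords are all assumptions for illustration.

```python
import json
import re

# Hypothetical stand-in for the Question Router assistant. The real router is
# an LLM call whose response is forced into JSON; this sketch mimics only the
# output contract: {"source": ..., "reason": ...}.
YOUTUBE_RE = re.compile(r"(youtube\.com|youtu\.be)")
URL_RE = re.compile(r"https?://\S+")
FRESHNESS_HINTS = ("recent", "today", "latest", "news")

def route_question(question: str) -> dict:
    """Return a routing decision shaped like the router's JSON response."""
    if YOUTUBE_RE.search(question):
        source, reason = "youtube_summary", "input contains a YouTube URL"
    elif URL_RE.search(question):
        source, reason = "article_analyze", "input contains a web URL"
    elif any(hint in question.lower() for hint in FRESHNESS_HINTS):
        source, reason = "web_search", "question needs fresh information"
    else:
        source, reason = "llm_direct", "general question, answer directly"
    return {"source": source, "reason": reason}

decision = route_question("What are the recent social issues?")
print(json.dumps(decision))
```

The point is not the heuristics, which the real assistant handles with instructions, but that downstream nodes can rely on a fixed JSON shape.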

At that point the router leaves behind not only the selected source but also a reason. A text-visualize node attached to the canvas exists only so you can read that reason directly on screen. That means this screen is not just a design diagram. It is a working screen where you can directly inspect execution traces.

The direct-answer path is the shortest

When the branch is llm_direct, the structure is simple. The router output goes into a generate-text node that uses the Question Answer assistant, and the output of that node goes straight to finish.

The important point is that simplicity does not make the path any less structured. The generate-text node reloads the assistant by aiAssistantId at execution time, combines the system prompt and the user prompt, and only then makes the actual model call. In other words, even if assistant metadata is attached to the canvas, the real runtime reference is still the assistant id.
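That reload-at-execution-time idea can be sketched like this. It is a minimal Python stand-in under stated assumptions: the in-memory `ASSISTANTS` dict plays the role of the DB table, and `call_model` is an injected fake; the real executor's names and signatures are not shown in the article.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the assistant table; in the real service
# the executor reloads the assistant record from the DB by its id.
@dataclass
class Assistant:
    id: int
    name: str
    system_prompt: str

ASSISTANTS = {
    2: Assistant(2, "Question Answer", "You answer general questions concisely."),
}

def run_generate_text(node_config: dict, user_input: str, call_model) -> str:
    """Sketch of a generate-text executor: resolve the assistant at run time,
    then combine its system prompt with the user prompt for the model call."""
    assistant = ASSISTANTS[node_config["aiAssistantId"]]
    messages = [
        {"role": "system", "content": assistant.system_prompt},
        {"role": "user", "content": user_input},
    ]
    return call_model(messages)

# A fake model call is enough to show the wiring.
fake_model = lambda messages: f"[{len(messages)} messages] {messages[-1]['content']}"
answer = run_generate_text({"aiAssistantId": 2}, "What is a thread?", fake_model)
```

Because only `aiAssistantId` lives in the node config, swapping the assistant swaps the node's role without touching the executor.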

The assistants connected to Enhanced Chatbot are as follows.

  • Router: Question Router (assistantId=1)
  • Direct answer: Question Answer (assistantId=2)
  • Search query writer: Web Search Expert (assistantId=12)
  • Search-based answer: AdaptiveAnswerer (assistantId=3)
  • YouTube writer: YoutubeSummarizer (assistantId=15)
  • Article writer: ArticleAnalizer (assistantId=16)

Because of that setup, the very same generate-text node can play a completely different role depending on which assistant you plug into it. The node is the generic executor. The assistant carries the role.

The search path is not two stages but four

The real center of this agent is the web_search path. In the current implementation, search-based answers are not generated in one shot. The stages are deliberately split apart.

The first stage is Web Search Expert. This assistant rewrites the user question into a sentence that can be thrown directly into the search engine. Instead of searching with the raw question, it condenses it into one search-friendly query.

The second stage is web-search-tool. In the stored workflow config, this node is set with searchTool: "searx", and the actual node implementation calls SearXng, collects up to ten search-result snippets, and uses a 200-second timeout so that it does not wait forever.
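The behavior of that node can be sketched as below. This is a hedged stand-in: the endpoint path and result field names follow SearXNG's JSON API (`/search` with `format=json`), but the transport is injected as a fake so the sketch stays self-contained, and the constants simply restate the config described above.

```python
# Hypothetical sketch of the web-search-tool node. The stored config says
# searchTool: "searx"; the node queries a SearXNG instance, keeps at most ten
# result snippets, and bounds the wait with a timeout. The endpoint and field
# names follow SearXNG's JSON API but are assumptions here.
MAX_RESULTS = 10
TIMEOUT_SECONDS = 200

def search_snippets(query: str, fetch_json) -> list[str]:
    """fetch_json stands in for an HTTP GET returning SearXNG's JSON body."""
    body = fetch_json(
        "/search",
        params={"q": query, "format": "json"},
        timeout=TIMEOUT_SECONDS,
    )
    results = body.get("results", [])
    return [r.get("content", "") for r in results[:MAX_RESULTS]]

# Fake transport returning twelve results, to show the cap at ten.
fake = lambda path, params, timeout: {
    "results": [{"content": f"snippet {i}"} for i in range(12)]
}
snippets = search_snippets("recent social issues", fake)
```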

The third stage is prompt-crafter. This node does not paste the search results directly into the final answer. It injects {{results}} into a saved template and produces a system prompt for answering. That template already contains the current time, reference materials, answer rules, and the target writing tone.

The fourth stage is AdaptiveAnswerer. The interesting part is that the original user question stays as prompt, while the context built from search goes into system. So the results are not just appended blindly. The question and the evidence are kept separate, and then a writer assistant composes the final answer again.
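The third and fourth stages together can be sketched like this. The template text is illustrative only; what matters, and what the sketch reproduces, is that search results are injected into a `{{results}}` slot, and the rendered text becomes the system message while the untouched user question stays as the user message.

```python
# Sketch of prompt-crafter plus the AdaptiveAnswerer call. The template text
# is a made-up example; the real saved template holds the current time,
# reference materials, answer rules, and target tone.
TEMPLATE = (
    "Current time: {{now}}\n"
    "Reference materials:\n{{results}}\n"
    "Answer using only the references above, in a neutral tone."
)

def craft_system_prompt(template: str, values: dict) -> str:
    """Fill every {{key}} slot in the template with its value."""
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{{" + key + "}}", value)
    return rendered

def build_messages(question: str, system_prompt: str) -> list[dict]:
    # Question and evidence stay separate: evidence in system, question in user.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

system = craft_system_prompt(
    TEMPLATE, {"now": "2024-01-01", "results": "- snippet A\n- snippet B"}
)
messages = build_messages("What are the recent social issues?", system)
```

Keeping the two roles apart is what lets a writer assistant treat the search context as ground rules rather than as part of the question.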

Drawn again, the flow is a straight four-stage chain: query rewriting, search, context assembly, and final writing.

I think this part matters quite a lot. Instead of making it look like RAG immediately just because search is involved, the design separates question refinement, evidence assembly, and final writing. That also makes it much easier to see which stage is misbehaving when you operate it.

The YouTube and article paths are tool-first

The URL branches are more direct.

If the input is a YouTube URL, youtube-summary-tool runs first. This node uses YoutubeTranscriptTool to fetch a transcript with time information attached. Then the YoutubeSummarizer assistant rewrites that transcript into readable content.

If the input is a normal web URL, article-analyzer-tool runs first. This node crawls the page with Crawl4Ai, and if possible it uses Readability4J to extract only the main body and convert it into markdown with title and text. After that, the ArticleAnalizer assistant rewrites it into a readable article.

So both URL branches share the same philosophy.

  • First the tool fetches the raw material.
  • Then the assistant reorganizes it into something a person can read.

This is similar to the search path as well. In Omnilude’s current agent design, tools and assistants are not treated as competing concepts. Tools fetch material. Assistants interpret and organize that material.
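That shared philosophy reduces to one tiny shape, sketched below. The function and the two stand-ins are hypothetical; the point is only the ordering: the tool runs first and produces raw material, the assistant runs second and makes it readable.

```python
# Generic sketch of the tool-first pattern both URL branches follow.
# fetch_transcript and summarize are hypothetical stand-ins for the real
# tool node and writer assistant.
def tool_then_assistant(url: str, tool, assistant) -> str:
    raw_material = tool(url)        # step 1: the tool fetches raw material
    return assistant(raw_material)  # step 2: the assistant reorganizes it

fetch_transcript = lambda url: f"transcript of {url}"
summarize = lambda text: f"summary: {text}"

result = tool_then_assistant("https://youtu.be/abc123", fetch_transcript, summarize)
```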

This screen is not just a pretty diagram

If you look at the backoffice canvas, text-visualize nodes are placed in the middle of the graph. They are not essential for producing the final answer. Their job is to let you see what value is flowing through the graph right now.

The frontend is built in the same direction. When a workflow execution receives a NODE_COMPLETED event, the frontend injects the output value of that node into the canvas state. A text-visualize node shows that value as it is, and if streaming deltas arrive, it accumulates the text.
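A reducer for that canvas state could look like the sketch below. The event field names (`type`, `nodeId`, `output`, `delta`) are assumptions, not the real payload shape, and the real frontend is TypeScript; this Python version only shows the two behaviors the article describes: set on NODE_COMPLETED, accumulate on streaming deltas.

```python
# Hypothetical sketch of canvas-state updates driven by execution events.
def apply_event(canvas_state: dict, event: dict) -> dict:
    node_id = event["nodeId"]
    outputs = dict(canvas_state.get("outputs", {}))
    if event["type"] == "NODE_COMPLETED":
        # A text-visualize node shows this value as-is.
        outputs[node_id] = event["output"]
    elif event["type"] == "NODE_DELTA":
        # Streaming deltas are accumulated into the node's text.
        outputs[node_id] = outputs.get(node_id, "") + event["delta"]
    return {**canvas_state, "outputs": outputs}

state = {}
state = apply_event(state, {"type": "NODE_DELTA", "nodeId": "n1", "delta": "Hel"})
state = apply_event(state, {"type": "NODE_DELTA", "nodeId": "n1", "delta": "lo"})
state = apply_event(state, {"type": "NODE_COMPLETED", "nodeId": "n2", "output": "done"})
```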

That difference matters more than it seems. Many workflow tools stop at drawing nodes. But the current Omnilude backoffice agent screen combines two jobs: looking at the stored graph and observing the intermediate values of a real execution. So this canvas is both a diagram and a debugger.

This is how the execution engine sees it

Now step outside the canvas. This workflow is stored in the DB as ai.ai_agent.workflow JSONB. It is not kept in separate node tables. It is carried as one graph JSON.

When execution starts, BasicWorkflowEngine first looks for a node whose type ends with -input and whose startNode=true. In this case that start point is text-input. After that, NodeExecutorFactory picks the executor that matches each node type and keeps the execution moving.
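The start-node lookup is simple enough to sketch directly. The JSON shape (`nodes`, `type`, `startNode`) is assumed from the description above, not taken from the actual schema.

```python
# Minimal sketch of how an engine like BasicWorkflowEngine could locate the
# start node in the stored graph JSON: a node whose type ends with "-input"
# and which is flagged startNode=true. The JSON shape is an assumption.
def find_start_node(workflow: dict) -> dict:
    for node in workflow["nodes"]:
        if node["type"].endswith("-input") and node.get("startNode") is True:
            return node
    raise ValueError("workflow has no start node")

workflow = {
    "nodes": [
        {"id": "router-1", "type": "router"},
        {"id": "input-1", "type": "text-input", "startNode": True},
    ]
}
start = find_start_node(workflow)
```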

For Enhanced Chatbot, the important executors can be summarized roughly like this.

  • TextInputNode: prepares the initial input
  • RouterNode: chooses the branch using an assistant
  • GenerateTextNode: generates text using an assistant
  • WebSearchToolNode: runs SearXng search
  • YoutubeSummaryToolNode: extracts a YouTube transcript
  • ArticleAnalyzerToolNode: cleans webpage body text
  • PromptCrafterNode: builds a template-based system prompt
  • FinishNode: gathers branch results and ends execution
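
The factory idea behind that list can be sketched as a lookup table. The executor bodies here are throwaway lambdas, and the registry keys simply mirror the node types above; the real NodeExecutorFactory presumably holds full executor classes.

```python
# Sketch of the NodeExecutorFactory idea: node types map to executors, and
# the engine just looks them up. The executor bodies are stand-ins.
EXECUTORS = {
    "text-input": lambda node, ctx: ctx["input"],
    "router": lambda node, ctx: {"source": "llm_direct", "reason": "demo"},
    "finish": lambda node, ctx: ctx.get("result"),
}

def executor_for(node_type: str):
    try:
        return EXECUTORS[node_type]
    except KeyError:
        raise ValueError(f"no executor registered for node type {node_type!r}")

run_input = executor_for("text-input")({"id": "input-1"}, {"input": "hello"})
```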

The most important detail here is that both generate-text and router reload the assistant at execution time. The agent itself does not own intelligence directly. Each node pulls in an assistant only when it needs one. In the end, the agent is orchestration, and the actual inference is still done by assistants.

The finish node is less trivial than it looks

The last finish node is not just an end button. It checks the incoming edges, waits if there are still input handles in progress, and forwards only the outputs of the source nodes that are actually ready.

This structure is necessary because of branching. When the router splits into four paths, those four paths do not always complete in the same way. Some questions end with a direct answer, while others pass through search and summarization steps. finish acts as the last collector that absorbs those differences.
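The collecting half of that behavior can be sketched as follows. This simplified stand-in skips the waiting logic and shows only the merge rule: outputs are forwarded solely from source nodes that actually completed, which is what lets four uneven branches converge on one finish node.

```python
# Sketch of the finish node's merge rule: forward only the outputs of source
# nodes that are actually ready. Unchosen router branches never complete and
# are simply skipped. (The real node also waits on in-progress handles.)
def finish_outputs(incoming_edges: list, completed: dict) -> list:
    ready = []
    for edge in incoming_edges:
        source = edge["source"]
        if source in completed:          # branch finished: collect its output
            ready.append(completed[source])
    return ready

edges = [{"source": "direct-answer"}, {"source": "search-answer"}]
outputs = finish_outputs(edges, {"search-answer": "final text"})
```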

So this agent is not just a function that takes one question and returns one answer. It runs on top of a graph executor with branching, waiting, and merging.

The intention of this agent

I think even this simple agent shows Omnilude’s current direction quite well.

First, it does not exaggerate the idea of a general-purpose agent. It first classifies the question type, then attaches the right tool and writer in a practical way.

Second, it does not crush the search path into one opaque block. Question refinement, search, context assembly, and final answer writing are all separated.

Third, the backoffice canvas is not just an editor. It also shows intermediate execution values, so prompt experiments and debugging happen on the same screen.

Fourth, the roles of assistant and agent are clear. The assistant is the inference component. The agent is the workflow that connects those components.

For that reason, I think this implementation is closer to saying, "we made a visually operable question-processing pipeline," than to saying, "we made an AI agent."

Closing

If the earlier posts explained the structure of assistant, thread, run, and agent, this article is the case study that shows what that explanation looks like as an actual canvas and an actual node graph. Enhanced Chatbot works as a fairly practical chatbot by combining one router, several writer assistants, and search, YouTube, and article tools.

When I have more room, I would like to come back with an easier explanation of the agent composition and implementation, based on images and on the screen that actually runs.