RAG Chatbot Architecture

Dortha Franecki, Computer Science Student

Walk through the full request lifecycle of a production-ready RAG (Retrieval-Augmented Generation) chatbot — from input sanitization through vector retrieval, LLM inference, and response delivery. Designed for developers, system architects, and technical interviewers who need to communicate how a modern AI system handles context, memory, and safety in a single sequence.

How to create a RAG Chatbot Architecture

To model a RAG chatbot architecture as a sequence diagram, follow these steps:

01. Map the layers first
Identify your core components: UI, safety/guardrails layer, backend API, session cache, vector database, and LLM. Each becomes a participant in the sequence diagram.
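In Mermaid sequence-diagram notation, that layer map might be declared like the sketch below. The participant names and aliases are illustrative choices, not fixed by any standard:

```mermaid
sequenceDiagram
    participant User
    participant UI
    participant Guard as Safety Layer
    participant API as Backend API
    participant Cache as Session Cache
    participant VDB as Vector Database
    participant LLM
```

Declaring participants up front also fixes their left-to-right order, which keeps the finished diagram readable.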
02. Start with the safety gate
Model input validation as the first step, before the backend ever sees a prompt. Use alt blocks to show the rejected vs. safe paths.
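A minimal `alt` block for the safety gate could look like this (the message labels and branch conditions are assumptions for illustration):

```mermaid
sequenceDiagram
    UI->>Guard: raw user prompt
    alt prompt fails validation
        Guard-->>UI: rejection message
    else prompt passes
        Guard->>API: sanitized prompt
    end
```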
03. Add session memory
Show the backend querying a cache (e.g., Redis) to retrieve recent conversation history before calling the LLM. This is what makes the chatbot feel coherent across turns.
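Sketched in the same notation, the memory lookup might be (the key-naming scheme shown is hypothetical):

```mermaid
sequenceDiagram
    API->>Cache: read chat history for session_id
    Cache-->>API: last N conversation turns
```

Using a dashed arrow (`-->>`) for the reply is the conventional way to distinguish responses from requests.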
04. Model the RAG step
Insert a vector DB query between the memory lookup and the LLM call: the backend embeds the sanitized prompt and retrieves relevant context.
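A self-message is a natural way to show the embedding step happening inside the backend before the retrieval call. A minimal sketch, with assumed message labels:

```mermaid
sequenceDiagram
    API->>API: embed sanitized prompt
    API->>VDB: similarity search on embedding
    VDB-->>API: top-k relevant chunks
```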
05. Build the LLM call
Pass the combination of history, retrieved context, and current prompt to the model. Show the response flowing back through the chain to the user.
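The call and the return path might be sketched as follows; whether the response passes back through the safety layer for output moderation is a design choice, shown here as one plausible option:

```mermaid
sequenceDiagram
    API->>LLM: history + retrieved context + prompt
    LLM-->>API: generated answer
    API-->>Guard: answer
    Guard-->>UI: moderated answer
    UI-->>User: rendered response
```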
06. Use autonumber
Add autonumber at the top of the sequence. It labels every step automatically and makes the diagram easy to reference in documentation.
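In Mermaid, `autonumber` is a single keyword placed right after the diagram declaration:

```mermaid
sequenceDiagram
    autonumber
    UI->>API: request
    API-->>UI: response
```

Every arrow then renders with a sequential number, so documentation can say "step 4" unambiguously.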
07. Use critical blocks for multi-step processing
Wrap the backend processing steps in a critical block to visually group the core request logic.
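The grouping might look like the sketch below. The block label and the `option` branch (handling a cache miss) are illustrative assumptions:

```mermaid
sequenceDiagram
    critical Assemble prompt and generate answer
        API->>Cache: fetch history
        API->>VDB: retrieve context
        API->>LLM: generate answer
    option cache miss
        API->>API: proceed with empty history
    end
```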


Tags

AI, Chatbot, RAG, Sequence Diagram, Architecture, LLM, Backend, Vector DB