RAG
Retrieval-Augmented Generation (RAG) is a method of reducing LLM hallucinations and extending the model's context without consuming a significant portion of the context length. It uses documents and other additional resources that you provide to give the model more context for all of your prompts.
Loki has a built-in vector database and full-text search engine to support RAG knowledge bases for your queries.
The generated knowledge bases are stored in the rag subdirectory of your Loki configuration directory. The location of
this directory varies by system, so you can use the following command to find your RAG directory:
loki --info | grep 'rags_dir' | awk '{print $2}'
Usage
There are two ways to use RAG in Loki: a persistent RAG that can be loaded on demand for queries, and an ephemeral one for adding RAG to a single specific query.
Persistent RAG
In the REPL, persistent RAG is initialized via the .rag command.
The generated RAG is then saved to the rag subdirectory of the Loki configuration directory, and can be loaded whenever you
want that knowledge base via either .rag <name> or loki --rag <name>.
Ephemeral RAG
Short-lived RAG that is only used for a single session or query is loaded using .file/--file.
You can use .file/--file either to execute a prompt from a file or to create a temporary RAG. The difference is the --
separator. If you specify only a filename and no -- separator, Loki reads the file contents and passes them to the
model as the query. Otherwise, the -- separator marks the end of the list of documents to load into the ephemeral RAG,
and what follows is the query to pass to the model.
.file prompt.md # Read the file as a prompt
.file %% -- translate the last reply to italian
.file `git diff` -- generate a commit message
This RAG is visible only to the current session and is no longer accessible once the session ends.
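The role of the -- separator can be illustrated with a minimal Python sketch. This is not Loki's actual parser; split_file_args is a hypothetical helper, and it assumes the arguments have already been split into tokens.

```python
from typing import Optional


def split_file_args(args: list[str]) -> tuple[list[str], Optional[str]]:
    """Split .file arguments into (documents, prompt) at the first `--`.

    Hypothetical helper for illustration only; Loki's real parsing may differ.
    """
    if "--" not in args:
        # No separator: the listed files are read and used as the prompt itself.
        return args, None
    idx = args.index("--")
    # Everything before `--` is a document; everything after is the prompt.
    return args[:idx], " ".join(args[idx + 1:])


print(split_file_args(["prompt.md"]))
print(split_file_args(["%%", "--", "translate", "the", "last", "reply", "to", "italian"]))
```

The first call returns no prompt, which corresponds to reading the file as the query; the second returns %% as the only document and the remaining words as the query.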
The %% Document Type
In addition to the usual documents that can be specified for persistent RAG, ephemeral RAG has a special %% value.
This value references the content of the last reply. So you can use it like this:
.file %% -- translate the last reply to italian
The -- indicates that this is the end of your documents and the beginning of your prompt.
The cmd Document Type
Loki also lets you use command outputs for ephemeral RAG input. Simply enclose the command in backticks:
.file `git diff` -- generate a commit message
The -- indicates that this is the end of your documents and the beginning of your prompt.
How It Works
1. Build
When you define RAG, Loki will first "build" the RAG. This means that Loki will consume the documents you specified and generate embeddings for that text. Embeddings are numeric vectors that capture the meaning of the text, so you can think of this step as Loki translating the documents into a language the LLM can understand.
These embeddings are then stored in an in-memory vector database.
2. Lookup
Loki sits between you and the model. When you submit a prompt, Loki first converts it into embeddings (LLM language) and looks for relevant snippets of text in the vector database before anything is sent to the model.
Loki then passes the top n snippets of text that it finds in the vector database to the model as additional context
before your prompt.
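The lookup step can be sketched in a few lines of Python. This is only an illustration of similarity search over embeddings, not Loki's implementation: the three-dimensional vectors are made-up toy values (real embeddings come from an embedding model and have far more dimensions), and cosine similarity is assumed as the distance measure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n(query_vec: list[float], store: list[tuple[str, list[float]]], n: int) -> list[str]:
    """Return the n snippets whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:n]]


# Toy in-memory "vector database": (snippet, embedding) pairs with invented values.
store = [
    ("How to build a knowledge base", [0.9, 0.1, 0.0]),
    ("Bananas are yellow", [0.0, 0.2, 0.9]),
    ("RAG adds context to prompts", [0.8, 0.3, 0.1]),
]
query_embedding = [1.0, 0.2, 0.0]  # pretend embedding of the user's prompt

print(top_n(query_embedding, store, 2))
# ['How to build a knowledge base', 'RAG adds context to prompts']
```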
2a. Reranking (Optional)
The lookup for relevant snippets of text uses embeddings to find text that is semantically similar to your prompt, and
returns the top n results. This often works fairly well, but these top results aren't always the most relevant for
answering the specific query.
Reranking takes these initial results (say, the top 20-100 text snippets) and re-scores them using a more sophisticated model. The reranker model ranks the snippets by their actual usefulness for answering the query, so that the most relevant context is passed to the model alongside your query.
This reranking model can be customized for each RAG you build in Loki. See the Reranker section below for more details on how to customize this.
3. Prompt
Finally, the text snippets that were looked up in RAG are passed to the model as additional context to your prompt, giving the model query-specific context to answer your question.
Supported Document Sources
Loki supports a number of document sources that can be used for RAG:
| Source | Example | Comments |
|---|---|---|
| Files | /tmp/dir1/file1;/tmp/dir1/file2 | |
| Directory | /tmp/dir | Picks up all files in a directory and all its subdirectories |
| Directory (extensions) | /tmp/dir2/**/*.{md,txt} | Finds all files in all subdirectories with the specified extensions |
| Recursive Filename | /tmp/*/LOKI.md | The following files will be picked up: |
| URL | https://www.ohdsi.org/data-standardization/ | Downloads and loads the specified webpage into the knowledge base |
| Recursive URL (Websites) | https://github.com/OHDSI/Vocabulary-v5.0/wiki/** | Crawls all pages under the given URL and loads them into the knowledge base |
| Document Loader (custom) | jina:https://cloud.google.com/bigquery/docs/reference/standard-sql/ | Use a custom document loader to parse the given document |
Document Loaders
Loki only has built-in support for loading text files, but that functionality can be extended to read all kinds of files
into your knowledge bases. These custom loaders are used both by RAG and for documents specified using the
.file/--file flags.
In the global configuration file, you can specify loaders for specific document types using the document_loaders
setting. Each loader is defined by specifying a name and then a command that Loki will execute to load the document.
The following variables are interpolated at runtime by Loki and can be used as placeholders in your command definitions:
- $1 (Required) - The input file
- $2 (Optional) - The output file. If omitted, stdout is used as the output destination
Note: It is your responsibility to ensure that any tools used to parse documents into text that Loki can read are
installed on your system and are available on your $PATH. Loki does not have any built-in way of installing
dependencies for document loaders for you.
The following are some example loaders:
document_loaders:
  pdf: 'pdftotext $1 -' # Use pdftotext to convert a PDF file to text
                        # (see https://poppler.freedesktop.org for details on how to install pdftotext)
  docx: 'pandoc --to plain $1' # Use pandoc to convert a .docx file to text
                               # (see https://pandoc.org for details on how to install pandoc)
  jina: 'curl -fsSL https://r.jina.ai/$1 -H "Authorization: Bearer {{JINA_API_KEY}}"' # Use Jina to translate a website into text;
                                                                                      # Requires a Jina API key to be added to the Loki vault
  git: > # Use yek to load a git repository into the knowledge base (https://github.com/bodo-run/yek)
    sh -c "yek $1 --json | jq 'map({ path: .filename, contents: .content })'"
Document Loader Usage
Once you have your loaders defined, you can specify when Loki should use them by prefixing any RAG file/directory/URI with the name of the loader.
Example: Load a git repo into RAG

Example: Use pdf loader for ephemeral RAG
$ loki --file pdf:some-file.pdf
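The loader prefix can be thought of as a lookup against the names defined in document_loaders. The following Python sketch illustrates that idea under stated assumptions: resolve_loader is a hypothetical helper, and the rule that a prefix only counts when it matches a configured loader name is an assumption made here so that plain URLs such as https://... are not mistaken for a loader. Loki's real resolution logic may differ.

```python
from typing import Optional


def resolve_loader(source: str, loaders: dict[str, str]) -> tuple[Optional[str], str]:
    """Split 'loader:path' into (loader_name, path) when the prefix is a known loader.

    Hypothetical helper for illustration; not part of Loki.
    """
    prefix, sep, rest = source.partition(":")
    if sep and prefix in loaders:
        return prefix, rest
    # No recognized prefix: treat the whole string as a plain file, directory, or URL.
    return None, source


# Loader names mirror the example configuration; the commands are not executed here.
loaders = {
    "pdf": "pdftotext $1 -",
    "jina": "curl -fsSL https://r.jina.ai/$1",
}

print(resolve_loader("pdf:some-file.pdf", loaders))
# ('pdf', 'some-file.pdf')
print(resolve_loader("https://www.ohdsi.org/data-standardization/", loaders))
# (None, 'https://www.ohdsi.org/data-standardization/')
```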
Advanced Customizations
For those familiar with RAG, Loki exposes a handful of advanced global settings that can be used to tweak your default RAG configurations.
Embedding Model
When Loki queries your RAG knowledge bases, it needs to first convert your query into embeddings. By default, Loki uses the same embedding model that was used to create the knowledge base in the first place.
This can be customized to any other embedding model available in your configured clients by setting the
rag_embedding_model setting in your global Loki configuration file:
rag_embedding_model: null # Specifies the embedding model used for context retrieval
Reranker
By default, Loki uses Reciprocal Rank Fusion (RRF) to merge vector and keyword search results.
You can change the default reranker model to any other reranking model in your configured clients. To change the default
reranker model, simply change the value of the rag_reranker_model setting in your global configuration file:
rag_reranker_model: null # By default (null), results are merged using Reciprocal Rank Fusion (RRF)
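For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal Python sketch of the general technique. It is not taken from Loki's source: the constant k=60 is the value commonly used in the RRF literature and is an assumption here, and the chunk IDs are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each item scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["chunk-a", "chunk-b", "chunk-c"]   # ranked by embedding similarity
keyword_hits = ["chunk-b", "chunk-d", "chunk-a"]  # ranked by full-text search

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['chunk-b', 'chunk-a', 'chunk-d', 'chunk-c']
```

Chunks that appear near the top of both lists (chunk-b and chunk-a) end up ahead of chunks that appear in only one list, which is the point of fusing the two searches.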
Chunk Size
In the context of RAG, the chunk size is the maximum length of each text chunk (measured in characters) that is created
when splitting documents. In Loki, this defaults to 2000 characters.
You can specify a different global default by setting the rag_chunk_size property in your global configuration file:
rag_chunk_size: null # Defines the size of chunks for document processing in characters
Chunk Size Trade-Offs
Keep in mind the following trade-offs when changing the chunk size:
- Smaller chunks (e.g. 256 characters): More precise retrieval, better semantic focus, but may lack context or split important information
- Larger chunks (e.g. 1024 characters): More context preserved, fewer chunks to manage, but less precise matching and more noise in the retrieved documents
Chunk Overlap
Chunk overlap in RAG is the number of characters that overlap between consecutive chunks to maintain continuity.
Example: If the following sentence is cut off at the end of one chunk
I was doing fine until someone brought up
You'll ideally want that full sentence to be picked up at the beginning of the next chunk to make sure the full meaning is captured. So in this example, if your chunk overlap is 42 characters, then the start of the next chunk would look like this:
I was doing fine until someone brought up the game. <next sentence>
Often, this value is 10%-20% of the chunk size.
By default, in Loki, this value is 5% of the chunk size. You can override this and specify the default chunk overlap (in
characters) that Loki should use as a global default by setting the rag_chunk_overlap property in the global Loki
configuration file:
rag_chunk_overlap: null # Defines the overlap between chunks
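How chunk size and chunk overlap interact can be shown with a simplified Python sketch. It splits purely on character offsets, which is an assumption made for brevity; real splitters (possibly including Loki's) usually prefer to break on paragraph or sentence boundaries. The function name split_into_chunks is hypothetical.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Each chunk starts chunk_overlap characters before the previous chunk ended.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']

# With the defaults described above (2000 characters, 5% overlap),
# consecutive chunks would share 100 characters.
print(int(2000 * 0.05))
# 100
```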
Top K
In RAG, top_k is the number of top-ranked chunks to return from the vector database query. It is similar to searching
for something on Google and only reading the top 10 results: only those results are used as context.
In Loki, the default value for this is 5. You can customize this global default by setting the rag_top_k property in
your global configuration file:
rag_top_k: 5 # Specifies the number of documents to retrieve for answering queries
Top K Trade-Offs
When customizing this value, keep in mind the following trade-offs so you get the best performance:
- Lower top_k (e.g. 3): Faster, more focused context, lower cost, but risks missing relevant information
- Higher top_k (e.g. 10): More comprehensive coverage, but more noise, higher latency, increased token costs, and potential context window constraints
RAG Template
When you use RAG in Loki, Loki looks up the chunks of text relevant to your query and adds them as context before
sending the query to the model. The format of this context is determined by the rag_template setting in your global
Loki configuration file.
This template utilizes three placeholders:
- __INPUT__: The user's actual query
- __CONTEXT__: The context retrieved from RAG
- __SOURCES__: A numbered list of the source file paths or URLs that the retrieved context came from
These placeholders are replaced with their corresponding values, and the resulting text is what's actually passed to
the model at query time. The __SOURCES__ placeholder enables the model to cite which documents its answer is based on,
which is especially useful when building knowledge-base assistants that need to provide verifiable references.
The default template that Loki uses is the following:
Answer the query based on the context while respecting the rules. (user query, some textual context and rules, all inside xml tags)
<context>
__CONTEXT__
</context>
<sources>
__SOURCES__
</sources>
<rules>
- If you don't know, just say so.
- If you are not sure, ask for clarification.
- Answer in the same language as the user query.
- If the context appears unreadable or of poor quality, tell the user then answer as best as you can.
- If the answer is not in the context but you think you know the answer, explain that to the user then answer with your own knowledge.
- Answer directly and without using xml tags.
- When using information from the context, cite the relevant source from the <sources> section.
</rules>
<user_query>
__INPUT__
</user_query>
You can customize this template by specifying the rag_template setting in your global Loki configuration file. Your
template must include both the __INPUT__ and __CONTEXT__ placeholders in order for it to be valid. The
__SOURCES__ placeholder is optional. If it is omitted, source references will not be included in the prompt.
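The placeholder rules can be sketched in Python as follows. This is an illustration of the substitution and validation described above, not Loki's code: render_rag_template is a hypothetical function, and the exact numbering format used for the source list ("1. path") is an assumption.

```python
import re

REQUIRED_PLACEHOLDERS = ("__INPUT__", "__CONTEXT__")


def render_rag_template(template: str, user_input: str, context: str, sources: list[str]) -> str:
    """Fill a rag_template-style string; __SOURCES__ is optional, the other two are required."""
    missing = [p for p in REQUIRED_PLACEHOLDERS if p not in template]
    if missing:
        raise ValueError(f"template is missing required placeholders: {missing}")

    # Assumed format for the numbered source list.
    numbered_sources = "\n".join(f"{i}. {src}" for i, src in enumerate(sources, start=1))
    values = {
        "__INPUT__": user_input,
        "__CONTEXT__": context,
        "__SOURCES__": numbered_sources,
    }
    # Single pass, so placeholder-like text inside the context or query is left untouched.
    return re.sub(r"__(?:INPUT|CONTEXT|SOURCES)__", lambda m: values[m.group(0)], template)


template = "<context>\n__CONTEXT__\n</context>\n<user_query>\n__INPUT__\n</user_query>"
print(render_rag_template(template, "What is RAG?", "RAG adds retrieved text to prompts.", ["/tmp/dir/notes.md"]))
```

Because this template omits __SOURCES__, the source list is simply not included in the rendered prompt, matching the optional behavior described above.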

