docs: documented RAG
# RAG
Retrieval-Augmented Generation (RAG) is a method of reducing LLM hallucinations and extending the model's context
without consuming a significant portion of the context window. It uses documents and other resources that you provide
to give the model more context for all of your prompts.

Loki has a built-in vector database and full-text search engine to support RAG knowledge bases for your queries.

The generated knowledge bases are stored in the `rag` subdirectory of your Loki configuration directory. The location of
this directory varies by system, so you can use the following command to find your RAG directory:

```shell
loki --info | grep 'rags_dir' | awk '{print $2}'
```

## Quick Links
<!--toc:start-->
- [Usage](#usage)
  - [Persistent RAG](#persistent-rag)
  - [Ephemeral RAG](#ephemeral-rag)
- [How It Works](#how-it-works)
  - [1. Build](#1-build)
  - [2. Lookup](#2-lookup)
  - [2a. Reranking (Optional)](#2a-reranking-optional)
  - [3. Prompt](#3-prompt)
- [Supported Document Sources](#supported-document-sources)
- [Document Loaders](#document-loaders)
  - [Document Loader Usage](#document-loader-usage)
- [Advanced Customizations](#advanced-customizations)
  - [Embedding Model](#embedding-model)
  - [Reranker](#reranker)
  - [Chunk Size](#chunk-size)
    - [Trade-Offs](#chunk-size-trade-offs)
  - [Chunk Overlap](#chunk-overlap)
  - [Top K](#top-k)
    - [Trade-Offs](#top-k-trade-offs)
  - [RAG Template](#rag-template)
<!--toc:end-->

---

## Usage
There are two ways to use RAG in Loki: a persistent RAG that can be loaded on demand for queries, and an ephemeral one
for adding RAG to a single query.

### Persistent RAG
In the REPL, persistent RAG is initialized via the `.rag` command:



The generated RAG is then saved to the `rag` subdirectory of the Loki configuration directory, and can be loaded
whenever you want that knowledge base via either `.rag <name>` or `loki --rag <name>`.

### Ephemeral RAG
Short-lived RAG that is only used for a single session or query is loaded using `.file`/`--file`.

You can use `.file`/`--file` either to execute a prompt from a file or to build a temporary RAG. The difference is the
`--` separator. If you specify only a filename and no `--` separator, Loki reads the file contents and passes them
as the query to the model. Otherwise, the `--` separator marks the end of the list of documents to load into the
ephemeral RAG, and what follows is the query to pass to the model.

```shell
.file prompt.md                                   # Read the file as a prompt
.file %% -- translate the last reply to italian   # Use the last reply as the document
.file `git diff` -- generate a commit message     # Use the output of a command as the document
```



This RAG is only visible to the current session and is no longer accessible once the session ends.
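
The role of the `--` separator described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Loki's actual argument parser; the function name `split_file_args` is hypothetical.

```python
from __future__ import annotations


def split_file_args(args: list[str]) -> tuple[list[str], str | None]:
    """Split `.file` arguments into (documents, query) at the first `--`.

    Hypothetical helper: it mirrors the documented rule, not Loki's real code.
    """
    if "--" not in args:
        # No separator: the arguments are files whose contents become the prompt.
        return args, None
    sep = args.index("--")
    # Everything before `--` is a document; everything after it is the query.
    return args[:sep], " ".join(args[sep + 1:])


documents, query = split_file_args(["%%", "--", "translate", "the", "last", "reply"])
print(documents, query)
```

With no `--` present, the sketch returns `None` for the query, which corresponds to the "read the file as a prompt" case.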

#### The `%%` Document Type
In addition to the usual documents that can be specified for persistent RAG, ephemeral RAG accepts a special `%%` value.
This value references the content of the last reply, so you can use it like this:

```shell
.file %% -- translate the last reply to italian
```

The `--` indicates that this is the end of your documents and the beginning of your prompt.

#### The `cmd` Document Type
Loki also lets you use command outputs for ephemeral RAG input. Simply enclose the command in backticks:

```shell
.file `git diff` -- generate a commit message
```

The `--` indicates that this is the end of your documents and the beginning of your prompt.

## How It Works
### 1. Build
When you define a RAG, Loki first "builds" it: Loki reads the documents you specified and generates
[embeddings](https://huggingface.co/spaces/hesamation/primer-llm-embedding) for their text. Embeddings are numeric
vectors that represent the meaning of the text, so that passages with similar meaning can be found by comparing their
vectors.

These embeddings are then stored in an in-memory vector database.

### 2. Lookup
Loki sits between you and the model. When you submit a prompt, Loki first converts it into embeddings and looks for
relevant snippets of text in the vector database, before anything is sent to the model.

Loki then passes the top `n` snippets of text that it finds in the vector database as additional context to the model
before your prompt.
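
To make the lookup step concrete, here is a minimal sketch of a nearest-neighbor search over stored embeddings. It assumes cosine similarity as the comparison (a common choice; this document does not say which measure Loki uses), and the three-dimensional vectors and the names `cosine_similarity` and `top_n` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query_vec, store, n):
    """Return the n snippets whose vectors are most similar to the query vector.

    `store` is a list of (snippet, vector) pairs standing in for a vector database.
    """
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:n]]


# Toy "embeddings": real embedding vectors have hundreds or thousands of dimensions.
store = [
    ("Knowledge bases live in the rag directory.", [0.9, 0.1, 0.0]),
    ("Bananas are rich in potassium.", [0.0, 0.2, 0.9]),
    ("RAG retrieves relevant snippets.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]
print(top_n(query_vec, store, 2))
```

The unrelated snippet about bananas points in a different direction from the query vector, so it is left out of the top two results.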

### 2a. Reranking (Optional)
The lookup for relevant snippets of text uses embeddings to find text that is semantically similar to your prompt, and
returns the top `n` results. This often works fairly well, but these top results aren't always the most relevant for
answering the specific query.

Reranking takes these initial results (say, the top 20-100 text snippets) and re-scores them using a more
sophisticated model. The reranker model ranks documents by their actual usefulness for answering the query, to ensure
the most relevant context is passed to the model alongside your query.

This reranking model can be customized for each RAG you build in Loki. See the [Reranker](#reranker) section
below for more details on how to customize this.

### 3. Prompt
Finally, the text snippets that were retrieved from the RAG are passed to the model as additional context alongside
your prompt, giving the model query-specific context to answer your question.

## Supported Document Sources
Loki supports a number of document sources that can be used for RAG:

| Source                   | Example                                                               | Comments                                                                                                                                                 |
|--------------------------|-----------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| Files                    | `/tmp/dir1/file1;/tmp/dir1/file2`                                     |                                                                                                                                                          |
| Directory                | `/tmp/dir`                                                            | Picks up all files in a directory and all its subdirectories                                                                                             |
| Directory (extensions)   | `/tmp/dir2/**/*.{md,txt}`                                             | Finds all files in all subdirectories with the specified extensions                                                                                      |
| Recursive Filename       | `/tmp/*/LOKI.md`                                                      | The following files will be picked up: <br><ul><li>`/tmp/dir1/LOKI.md`</li><li>`/tmp/dir2/subdir1/LOKI.md`</li><li>`/tmp/dir2/subdir2/LOKI.md`</li></ul> |
| URL                      | `https://www.ohdsi.org/data-standardization/`                         | Downloads and loads the specified webpage into the <br>knowledge base                                                                                    |
| Recursive URL (Websites) | `https://github.com/OHDSI/Vocabulary-v5.0/wiki/**`                    | Crawls all pages under the given URL and loads them <br>into the knowledge base                                                                          |
| Document Loader (custom) | `jina:https://cloud.google.com/bigquery/docs/reference/standard-sql/` | Use a custom document loader to parse the given document                                                                                                 |

## Document Loaders
Loki only has built-in support for loading text files, but that functionality can be extended to read all kinds of
files into your knowledge bases. These custom loaders are used both by RAG and for documents specified using the
`.file`/`--file` flags.

In the global configuration file, you can specify loaders for specific document types using the `document_loaders`
setting. Each loader is defined by specifying a name and then a command that Loki will execute to load the document.

The following variables are interpolated at runtime by Loki and can be used as placeholders in your command definitions:
* `$1` (Required) - The input file
* `$2` (Optional) - The output file. If omitted, `stdout` is used as the output destination

**Note:** It is your responsibility to ensure that any tools used to parse documents into text that Loki can read are
installed on your system and are available on your `$PATH`. Loki does not have any built-in way of installing
dependencies for document loaders for you.

The following are some example loaders:
```yaml
document_loaders:
  pdf: 'pdftotext $1 -'         # Use pdftotext to convert a PDF file to text
                                # (see https://poppler.freedesktop.org for details on how to install pdftotext)
  docx: 'pandoc --to plain $1'  # Use pandoc to convert a .docx file to text
                                # (see https://pandoc.org for details on how to install pandoc)
  jina: 'curl -fsSL https://r.jina.ai/$1 -H "Authorization: Bearer {{JINA_API_KEY}}"' # Use Jina to translate a website into text;
                                # Requires a Jina API key to be added to the Loki vault
  git: >                        # Use yek to load a git repository into the knowledge base (https://github.com/bodo-run/yek)
    sh -c "yek $1 --json | jq 'map({ path: .filename, contents: .content })'"
```
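
As a rough illustration of how the `$1` and `$2` placeholders turn a loader definition into a runnable command, the sketch below expands a loader template into an argument list. How Loki performs this interpolation internally is not described here, so treat this as an assumption; `build_loader_command` is a hypothetical name.

```python
from __future__ import annotations

import shlex


def build_loader_command(template: str, input_path: str, output_path: str | None = None) -> list[str]:
    """Expand `$1` (input) and `$2` (optional output) in a loader command template.

    Hypothetical helper: the template is split into arguments first, so a path
    containing spaces stays a single argument after substitution.
    """
    args = []
    for token in shlex.split(template):
        token = token.replace("$1", input_path)
        if output_path is not None:
            token = token.replace("$2", output_path)
        args.append(token)
    return args


print(build_loader_command("pdftotext $1 -", "my report.pdf"))
```

The resulting list could be handed to something like `subprocess.run(args, capture_output=True)` to collect the loader's `stdout`, provided the tool is installed and on your `$PATH`.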

### Document Loader Usage
Once you have defined your loaders, you can tell Loki when to use them by prefixing any RAG file/directory/URI
with the name of the loader.

**Example: Load a git repo into RAG**


**Example: Use pdf loader for ephemeral RAG**
```shell
$ loki --file pdf:some-file.pdf
```
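
The loader prefix has to be told apart from ordinary sources that also contain a colon, such as URLs. A minimal sketch of one way to do that is shown below: only treat the text before the first `:` as a loader if it matches a configured loader name. The function name and the matching rule are assumptions for illustration, not Loki's documented behavior.

```python
from __future__ import annotations


def split_loader_prefix(source: str, loaders: set[str]) -> tuple[str | None, str]:
    """Return (loader_name, path) if `source` starts with a known loader prefix.

    Hypothetical helper: sources without a known prefix are returned unchanged.
    """
    prefix, sep, rest = source.partition(":")
    if sep and prefix in loaders:
        return prefix, rest
    return None, source


configured = {"pdf", "docx", "jina", "git"}
print(split_loader_prefix("pdf:some-file.pdf", configured))
print(split_loader_prefix("https://www.ohdsi.org/data-standardization/", configured))
```

Checking the prefix against the configured names keeps a plain `https://...` URL from being mistaken for a loader called `https`.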

## Advanced Customizations
For those familiar with RAG, Loki exposes a handful of advanced global settings that can be used to tweak your default
RAG configurations.

### Embedding Model
When Loki queries your RAG knowledge bases, it first needs to convert your query into embeddings. By default, Loki uses
the same embedding model that was used to create the knowledge base in the first place.

You can choose any other embedding model available in your configured clients by setting `rag_embedding_model` in your
global Loki configuration file:

```yaml
rag_embedding_model: null        # Specifies the embedding model used for context retrieval
```

### Reranker
By default, Loki uses [Reciprocal Rank Fusion (RRF)](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/reciprocal-rank-fusion) to merge vector and keyword search results.

You can replace this default with any reranking model available in your configured clients by setting
`rag_reranker_model` in your global configuration file:

```yaml
rag_reranker_model: null         # By default, Loki uses Reciprocal Rank Fusion (RRF)
```
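
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the general technique: each result list contributes `1 / (k + rank)` to a document's score, and documents are ordered by their summed score. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the chunk identifiers; this is not Loki's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each list adds 1 / (k + rank) to an ID's score, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["chunk-a", "chunk-b", "chunk-c"]    # ranked by embedding similarity
keyword_hits = ["chunk-b", "chunk-d", "chunk-a"]   # ranked by full-text search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Chunks that appear near the top of both lists (`chunk-b`, `chunk-a`) end up ahead of chunks that appear in only one, which is the point of fusing vector and keyword results.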

### Chunk Size
In the context of RAG, the chunk size is the maximum length (measured in characters) of each text chunk that is created
when splitting documents. In Loki, this defaults to `2000` characters.

You can specify a different global default by setting the `rag_chunk_size` property in your global configuration file:

```yaml
rag_chunk_size: null             # Defines the size of chunks for document processing in characters
```

#### Chunk Size Trade-Offs
Keep in mind the following trade-offs when changing the chunk size:

* **Smaller chunks (e.g. 256 characters):** More precise retrieval, better semantic focus, but may lack context or split
  important information
* **Larger chunks (e.g. 1024 characters):** More context preserved, fewer chunks to manage, but less precise matching
  and more noise in retrieved documents

### Chunk Overlap
Chunk overlap in RAG is the number of characters that overlap between consecutive chunks to maintain continuity.

---

**Example:** If the following sentence is cut off at the end of one chunk:

`I was doing fine until someone brought up`

You'll ideally want that full sentence to be picked up at the beginning of the next chunk to make sure the full meaning
is captured. So in this example, if your chunk overlap is 42 characters, then the start of the next chunk would look
like this:

`I was doing fine until someone brought up the game. <next sentence>`

---

Often, this value is 10%-20% of the chunk size.

By default, in Loki, this value is 5% of the chunk size. You can override this and specify the default chunk overlap (in
characters) that Loki should use as a global default by setting the `rag_chunk_overlap` property in the global Loki
configuration file:

```yaml
rag_chunk_overlap: null          # Defines the overlap between chunks
```
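
The interaction between chunk size and chunk overlap can be sketched with a plain character-based splitter. This is a simplified illustration: real splitters, possibly including Loki's, typically prefer to break on separators such as paragraphs or sentences, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into chunks of at most `chunk_size` characters.

    Each chunk after the first starts `overlap` characters before the end of the
    previous one, so text cut off at a boundary is repeated in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# With the defaults described above: 2000-character chunks, 5% (100 characters) overlap
print(len(chunk_text("x" * 5000, chunk_size=2000, overlap=100)))
```

A larger overlap shortens the step between chunk starts, so the same document produces more chunks and more embeddings to store.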

### Top K
In RAG, `top_k` is the number of top-ranked chunks to return from the vector database query. It is similar to searching
for something on Google and only caring about the top 10 results: only those results are used as your context.

In Loki, the default value for this is `5`. You can customize this global default by setting the `rag_top_k` property in
your global configuration file:

```yaml
rag_top_k: 5                     # Specifies the number of documents to retrieve for answering queries
```

#### Top K Trade-Offs
When customizing this value, keep in mind the following trade-offs so you get the best performance:

* **Lower top_k (e.g. 3):** Faster, more focused context, lower cost, but risks missing relevant information
* **Higher top_k (e.g. 10):** More comprehensive coverage, but more noise, higher latency, increased token costs, and
  potential context window constraints
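
A quick back-of-the-envelope calculation helps when weighing these trade-offs: the retrieved context is at most `top_k` chunks of `chunk_size` characters each. The sketch below uses the defaults stated in this document (`5` and `2000`); the conversion of roughly 4 characters per token is only a rule of thumb for English text and varies by model and tokenizer.

```python
def max_context_chars(top_k, chunk_size):
    """Upper bound on the number of characters of retrieved context."""
    return top_k * chunk_size


def approx_tokens(chars, chars_per_token=4.0):
    """Very rough token estimate; chars_per_token is an assumed rule of thumb."""
    return round(chars / chars_per_token)


default_chars = max_context_chars(top_k=5, chunk_size=2000)
print(default_chars, approx_tokens(default_chars))

larger_chars = max_context_chars(top_k=10, chunk_size=2000)
print(larger_chars, approx_tokens(larger_chars))
```

Doubling `top_k` doubles the upper bound on added context, which is where the higher latency and token costs come from.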

### RAG Template
When you use RAG in Loki, Loki looks up the chunks of text that are relevant to your query and adds them as context to
the query before sending it to the model. The format of this context is determined by the `rag_template` setting in
your global Loki configuration file.

This template uses two placeholders:
* `__INPUT__`: The user's actual query
* `__CONTEXT__`: The context retrieved from RAG

At query time, these placeholders are replaced with the corresponding values, and the result is what's actually passed
to the model.

The default template that Loki uses is the following:

```text
Answer the query based on the context while respecting the rules. (user query, some textual context and rules, all inside xml tags)

<context>
__CONTEXT__
</context>

<rules>
- If you don't know, just say so.
- If you are not sure, ask for clarification.
- Answer in the same language as the user query.
- If the context appears unreadable or of poor quality, tell the user then answer as best as you can.
- If the answer is not in the context but you think you know the answer, explain that to the user then answer with your own knowledge.
- Answer directly and without using xml tags.
</rules>

<user_query>
__INPUT__
</user_query>
```

You can customize this template by specifying the `rag_template` setting in your global Loki configuration file. Your
template *must* include both the `__INPUT__` and `__CONTEXT__` placeholders to be valid.
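
The placeholder substitution and the "both placeholders are required" rule can be sketched as follows. This is an illustration of the idea under the assumption of simple text replacement, not Loki's actual rendering code; `render_rag_prompt` and the shortened template are hypothetical.

```python
import re

PLACEHOLDERS = ("__CONTEXT__", "__INPUT__")


def render_rag_prompt(template, context, user_input):
    """Fill a RAG template, rejecting templates that lack a required placeholder."""
    missing = [p for p in PLACEHOLDERS if p not in template]
    if missing:
        raise ValueError("template is missing placeholders: " + ", ".join(missing))
    values = {"__CONTEXT__": context, "__INPUT__": user_input}
    # Substitute in a single pass so placeholder-like text inside the retrieved
    # context or the query is never substituted a second time.
    return re.sub("__CONTEXT__|__INPUT__", lambda m: values[m.group(0)], template)


template = "<context>\n__CONTEXT__\n</context>\n\n<user_query>\n__INPUT__\n</user_query>"
print(render_rag_prompt(template, "Chunk overlap defaults to 5%.", "What is the default overlap?"))
```

Validating the template up front surfaces a broken custom `rag_template` immediately, rather than silently sending the model a prompt with no context or no query.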