
Prompt Preprocessor

Overview

The prompt preprocessor handles user queries with attached files for a RAG (Retrieval-Augmented Generation) system. It automatically selects the optimal context injection strategy based on file sizes and the available context window of the language model.


Main Function: preprocess

Parameters:

  • ctl: PromptPreprocessorController — preprocessor controller
  • userMessage: ChatMessage — user message

Workflow:

  • Extracts the text from the user message
  • Loads chat history and appends the new message
  • Filters files from the message (excludes images)
  • Determines the context processing strategy:
    • If new files are present → selects strategy via chooseContextInjectionStrategy()
    • If only existing files are present → uses retrieval
  • Returns the processed prompt
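The file-filtering step above can be sketched as follows. Note that `AttachedFile` and `filterDocumentFiles` are illustrative names for this sketch, not the SDK's actual types:

```typescript
// Hypothetical shape of an attached file; the real SDK type is FileHandle.
interface AttachedFile {
  name: string;
  type: "image" | "document";
}

// Keep only non-image attachments: images are excluded from context
// injection, and only parseable documents reach the strategies below.
function filterDocumentFiles(files: AttachedFile[]): AttachedFile[] {
  return files.filter((f) => f.type !== "image");
}
```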

Context Injection Strategies

1. inject-full-content — Full Content Injection

When used:

  • All files combined with the prompt fit within the model's context window

Process:

1. Parse each file via ctl.client.files.parseDocument()
2. Extract full content
3. Format with headers: ** filename full content **
4. Inject into prompt with instructions
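A minimal sketch of the header wrapping in step 3, matching the output format shown below (`formatFullContent` is a hypothetical helper, not part of the SDK):

```typescript
// Wrap a parsed file's content in the "** ... full content **" markers
// used by the inject-full-content strategy.
function formatFullContent(fileName: string, content: string): string {
  return [
    `** ${fileName} full content **`,
    "",
    content,
    "",
    `** end of ${fileName} **`,
  ].join("\n");
}
```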

Output format:

This is an Enriched Context Generation scenario.

The following content was found in the files provided by the user.

** document.pdf full content **

[full file content]

** end of document.pdf **

Based on the content above, please provide a response to the user query.

User query: [user query]

2. retrieval — Retrieval-Based Injection

When used:

  • Files are too large to fit entirely in the context
  • Only relevant fragments need to be retrieved

Process:

1. Load embedding model (nomic-embed-text-v1.5-GGUF)
2. Perform semantic search via ctl.client.files.retrieve()
3. Filter results by retrievalAffinityThreshold
4. Add found citations to the prompt
5. Attach citations via ctl.addCitations()
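The filtering and formatting steps (3–4) might look like the sketch below; `RetrievalCitation` and both function names are assumptions for illustration, not the SDK's API:

```typescript
// Hypothetical shape of one retrieval result.
interface RetrievalCitation {
  text: string;
  score: number; // similarity/affinity in [0, 1]
}

// Step 3: drop citations below the configured affinity threshold.
function filterByAffinity(
  citations: RetrievalCitation[],
  threshold: number,
): RetrievalCitation[] {
  return citations.filter((c) => c.score >= threshold);
}

// Step 4: render citations in the numbered, quoted format shown below.
function formatCitations(citations: RetrievalCitation[]): string {
  return citations
    .map((c, i) => `Citation ${i + 1}: "${c.text}"`)
    .join("\n\n");
}
```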

Output format (with results):

The following citations were found in the files provided by the user:

Citation 1: "[citation text]"

Citation 2: "[citation text]"

Use the citations above to respond to the user query, only if they are relevant. Otherwise, respond to the best of your ability without them.

User Query:

[user query]

Output format (no results):

Important: No citations were found in the user files for the user query. In less than one sentence, inform the user of this. Then respond to the query to the best of your ability.

User Query:

[user query]

3. none — No Context

When used:

  • No files are attached
  • No relevant citations found (affinity threshold not met)

Strategy Selection Algorithm

The chooseContextInjectionStrategy() function makes decisions based on token calculations:

Algorithm Steps

1. Load the LLM model via ctl.client.llm.model()
2. Measure current context usage via measureContextWindow()
3. Parse files and count total tokens
4. Calculate available tokens with 70% target utilization
5. Compare: totalFilePlusPromptTokenCount > availableContextTokens

Calculation Formula

const contextOccupiedFraction = contextOccupiedPercent / 100;
const targetContextUsePercent = 0.7;
const targetContextUsage = targetContextUsePercent * (1 - contextOccupiedFraction);
const availableContextTokens = Math.floor(modelRemainingContextLength * targetContextUsage);

Selection Criteria

If totalFileTokenCount + userPromptTokenCount > availableContextTokens
    → retrieval
Else
    → inject-full-content
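Putting the formula and the selection criteria together, a self-contained sketch of the decision (`chooseStrategy` is an illustrative name for this sketch; the source calls it chooseContextInjectionStrategy()):

```typescript
type Strategy = "inject-full-content" | "retrieval";

// Decide the strategy from token counts, using the 70% target
// utilization formula shown above.
function chooseStrategy(
  totalFileTokenCount: number,
  userPromptTokenCount: number,
  modelRemainingContextLength: number,
  contextOccupiedPercent: number,
): Strategy {
  const contextOccupiedFraction = contextOccupiedPercent / 100;
  const targetContextUsePercent = 0.7;
  const targetContextUsage =
    targetContextUsePercent * (1 - contextOccupiedFraction);
  const availableContextTokens = Math.floor(
    modelRemainingContextLength * targetContextUsage,
  );
  return totalFileTokenCount + userPromptTokenCount > availableContextTokens
    ? "retrieval"
    : "inject-full-content";
}
```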

Helper Functions

measureContextWindow()

Measures context window utilization:

Returns:

{
  totalTokensInContext: number,      // total tokens in context
  modelContextLength: number,        // model context size
  modelRemainingContextLength: number, // remaining tokens available
  contextOccupiedPercent: number     // percentage filled
}
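Assuming the fields relate as their comments suggest, the whole measurement can be derived from two inputs (`summarizeContext` is a hypothetical helper, not the SDK's implementation):

```typescript
// Derive the measureContextWindow() result shape from the token count
// currently in context and the model's total context length.
function summarizeContext(
  totalTokensInContext: number,
  modelContextLength: number,
) {
  const modelRemainingContextLength = modelContextLength - totalTokensInContext;
  const contextOccupiedPercent =
    (totalTokensInContext / modelContextLength) * 100;
  return {
    totalTokensInContext,
    modelContextLength,
    modelRemainingContextLength,
    contextOccupiedPercent,
  };
}
```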

getEffectiveContextFormatted()

Applies the model's prompt template:

  • Calls model.applyPromptTemplate(ctx)
  • On error (no user messages), adds placeholder "?" and retries
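The retry-on-error pattern can be sketched as follows; `applyTemplate` stands in for model.applyPromptTemplate, and the messages are simplified to plain strings:

```typescript
// Apply a prompt template; if it rejects a history with no user
// message, append a "?" placeholder and retry once.
function formatWithFallback(
  applyTemplate: (messages: string[]) => string,
  messages: string[],
): string {
  try {
    return applyTemplate(messages);
  } catch {
    return applyTemplate([...messages, "?"]);
  }
}
```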

prepareRetrievalResultsContextInjection()

Handles the retrieval strategy:

  • Creates UI statuses (loading, parsing, embedding)
  • Processes files with progress tracking
  • Filters results by retrievalAffinityThreshold
  • Formats citations into the prompt

prepareDocumentContextInjection()

Handles the full-content injection strategy:

  • Parses files from cache
  • Formats content with headers
  • Replaces message text via input.replaceText()

Configuration

Parameters from configSchematics:

  • retrievalLimit: number — Maximum number of citations to retrieve
  • retrievalAffinityThreshold: number — Relevance threshold for filtering citations (0.0–1.0)
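A typed sketch of these parameters; the interface name and the default values shown are illustrative only, not taken from the source:

```typescript
// Configuration surface exposed via configSchematics (sketch).
interface PreprocessorConfig {
  retrievalLimit: number; // maximum number of citations to retrieve
  retrievalAffinityThreshold: number; // relevance cutoff in [0, 1]
}

// Illustrative defaults, purely for this example.
const defaultConfig: PreprocessorConfig = {
  retrievalLimit: 5,
  retrievalAffinityThreshold: 0.5,
};
```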

User Status Messages

The preprocessor displays progress via PredictionProcessStatusController:

  • Deciding: "Deciding how to handle the document(s)..."
  • Loading parser: "Loading parser for (unknown)..."
  • Parser loaded: "{library} loaded for (unknown)..."
  • Processing: "Parsing file (unknown)... ({progress}%)"
  • Retrieval: "Retrieving relevant citations for user query..."
  • Done: "Retrieved {N} relevant citations for user query"

Debug Output

The preprocessor outputs debug information via ctl.debug():

  • Retrieval results
  • Processed content
  • Performance metrics (read time, tokenization time)
  • Strategy selection details

Dependencies

import {
  text,
  type Chat,
  type ChatMessage,
  type FileHandle,
  type LLMDynamicHandle,
  type PredictionProcessStatusController,
  type PromptPreprocessorController,
} from "@lmstudio/sdk";

Embedding Model: nomic-ai/nomic-embed-text-v1.5-GGUF


Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    User Message + Files                     │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                        preprocess()                         │
│  - Load history                                             │
│  - Filter files (no images)                                 │
│  - Choose strategy                                          │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
    ┌──────────────────────┐       ┌─────────────────────┐
    │ inject-full-content  │       │      retrieval      │
    │                      │       │                     │
    │ - Parse all files    │       │ - Load embeddings   │
    │ - Format content     │       │ - Semantic search   │
    │ - Build prompt       │       │ - Filter by score   │
    └──────────────────────┘       │ - Build prompt      │
                                   └─────────────────────┘
                                              │
                              ┌───────────────┴───────────────┐
                              │                               │
                              ▼                               ▼
                    ┌─────────────────┐           ┌─────────────────┐
                    │ Results found   │           │ No results      │
                    │ - Add citations │           │ - Inform user   │
                    │ - Continue      │           │ - Continue      │
                    └─────────────────┘           └─────────────────┘