
Prompt Preprocessor

Overview

The prompt preprocessor handles user queries with attached files for a RAG (Retrieval-Augmented Generation) system. It automatically selects the optimal context injection strategy based on file sizes and the available context window of the language model.


Main Function: preprocess

Parameters:

  • ctl: PromptPreprocessorController — preprocessor controller
  • userMessage: ChatMessage — user message

Workflow:

  • Extracts the text from the user message
  • Loads chat history and appends the new message
  • Filters files from the message (excludes images)
  • Determines the context processing strategy:
    • If new files are present → selects strategy via chooseContextInjectionStrategy()
    • If only existing files are present → uses retrieval
  • Returns the processed prompt
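The file-filtering step above can be sketched as follows. Note that `AttachedFile` and `filterDocumentFiles` are illustrative names for this sketch, not the SDK's actual types:

```typescript
// Hypothetical shape of an attached file; the real SDK type is FileHandle.
interface AttachedFile {
  name: string;
  type: "image" | "document";
}

// Keep only non-image attachments: images are excluded from context
// injection, and only parseable documents reach the strategies below.
function filterDocumentFiles(files: AttachedFile[]): AttachedFile[] {
  return files.filter((f) => f.type !== "image");
}
```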

Context Injection Strategies

1. inject-full-content — Full Content Injection

When used:

  • All files combined with the prompt fit within the model's context window

Process:

1. Parse each file via ctl.client.files.parseDocument()
2. Extract full content
3. Format with headers: ** filename full content **
4. Inject into prompt with instructions
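A minimal sketch of the header wrapping in step 3, matching the output format shown below (`formatFullContent` is a hypothetical helper, not part of the SDK):

```typescript
// Wrap a parsed file's content in the "** ... full content **" markers
// used by the inject-full-content strategy.
function formatFullContent(fileName: string, content: string): string {
  return [
    `** ${fileName} full content **`,
    "",
    content,
    "",
    `** end of ${fileName} **`,
  ].join("\n");
}
```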

Output format:

This is an Enriched Context Generation scenario.

The following content was found in the files provided by the user.

** document.pdf full content **

[full file content]

** end of document.pdf **

Based on the content above, please provide a response to the user query.

User query: [user query]

2. retrieval — Retrieval-Based Injection

When used:

  • Files are too large to fit entirely in the context
  • Only relevant fragments need to be retrieved

Process:

1. Load embedding model (nomic-embed-text-v1.5-GGUF)
2. Perform semantic search via ctl.client.files.retrieve()
3. Filter results by retrievalAffinityThreshold
4. Add found citations to the prompt
5. Attach citations via ctl.addCitations()
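The filtering and formatting steps (3–4) might look like the sketch below; `RetrievalCitation` and both function names are assumptions for illustration, not the SDK's API:

```typescript
// Hypothetical shape of one retrieval result.
interface RetrievalCitation {
  text: string;
  score: number; // similarity/affinity in [0, 1]
}

// Step 3: drop citations below the configured affinity threshold.
function filterByAffinity(
  citations: RetrievalCitation[],
  threshold: number,
): RetrievalCitation[] {
  return citations.filter((c) => c.score >= threshold);
}

// Step 4: render citations in the numbered, quoted format shown below.
function formatCitations(citations: RetrievalCitation[]): string {
  return citations
    .map((c, i) => `Citation ${i + 1}: "${c.text}"`)
    .join("\n\n");
}
```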

Output format (with results):

The following citations were found in the files provided by the user:

Citation 1: "[citation text]"

Citation 2: "[citation text]"

Use the citations above to respond to the user query, only if they are relevant. Otherwise, respond to the best of your ability without them.

User Query:

[user query]

Output format (no results):

Important: No citations were found in the user files for the user query. In less than one sentence, inform the user of this. Then respond to the query to the best of your ability.

User Query:

[user query]

3. none — No Context

When used:

  • No files are attached
  • No relevant citations found (affinity threshold not met)

Strategy Selection Algorithm

The chooseContextInjectionStrategy() function makes decisions based on token calculations:

Algorithm Steps

1. Load the LLM model via ctl.client.llm.model()
2. Measure current context usage via measureContextWindow()
3. Parse files and count total tokens
4. Calculate available tokens with 70% target utilization
5. Compare: totalFilePlusPromptTokenCount > availableContextTokens

Calculation Formula

const contextOccupiedFraction = contextOccupiedPercent / 100;
const targetContextUsePercent = 0.7;
const targetContextUsage = targetContextUsePercent * (1 - contextOccupiedFraction);
const availableContextTokens = Math.floor(modelRemainingContextLength * targetContextUsage);

Selection Criteria

If totalFileTokenCount + userPromptTokenCount > availableContextTokens
    → retrieval
Else
    → inject-full-content
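Putting the formula and the selection criteria together, a self-contained sketch of the decision (`chooseStrategy` is an illustrative name for this sketch; the source calls it chooseContextInjectionStrategy()):

```typescript
type Strategy = "inject-full-content" | "retrieval";

// Decide the strategy from token counts, using the 70% target
// utilization formula shown above.
function chooseStrategy(
  totalFileTokenCount: number,
  userPromptTokenCount: number,
  modelRemainingContextLength: number,
  contextOccupiedPercent: number,
): Strategy {
  const contextOccupiedFraction = contextOccupiedPercent / 100;
  const targetContextUsePercent = 0.7;
  const targetContextUsage =
    targetContextUsePercent * (1 - contextOccupiedFraction);
  const availableContextTokens = Math.floor(
    modelRemainingContextLength * targetContextUsage,
  );
  return totalFileTokenCount + userPromptTokenCount > availableContextTokens
    ? "retrieval"
    : "inject-full-content";
}
```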

Helper Functions

measureContextWindow()

Measures context window utilization:

Returns:

{
  totalTokensInContext: number,      // total tokens in context
  modelContextLength: number,        // model context size
  modelRemainingContextLength: number, // remaining tokens available
  contextOccupiedPercent: number     // percentage filled
}
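Assuming the fields relate as their comments suggest, the whole measurement can be derived from two inputs (`summarizeContext` is a hypothetical helper, not the SDK's implementation):

```typescript
// Derive the measureContextWindow() result shape from the token count
// currently in context and the model's total context length.
function summarizeContext(
  totalTokensInContext: number,
  modelContextLength: number,
) {
  const modelRemainingContextLength = modelContextLength - totalTokensInContext;
  const contextOccupiedPercent =
    (totalTokensInContext / modelContextLength) * 100;
  return {
    totalTokensInContext,
    modelContextLength,
    modelRemainingContextLength,
    contextOccupiedPercent,
  };
}
```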

getEffectiveContextFormatted()

Applies the model's prompt template:

  • Calls model.applyPromptTemplate(ctx)
  • On error (no user messages), adds placeholder "?" and retries
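The retry-on-error pattern can be sketched as follows; `applyTemplate` stands in for model.applyPromptTemplate, and the messages are simplified to plain strings:

```typescript
// Apply a prompt template; if it rejects a history with no user
// message, append a "?" placeholder and retry once.
function formatWithFallback(
  applyTemplate: (messages: string[]) => string,
  messages: string[],
): string {
  try {
    return applyTemplate(messages);
  } catch {
    return applyTemplate([...messages, "?"]);
  }
}
```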

prepareRetrievalResultsContextInjection()

Handles the retrieval strategy:

  • Creates UI statuses (loading, parsing, embedding)
  • Processes files with progress tracking
  • Filters results by retrievalAffinityThreshold
  • Formats citations into the prompt

prepareDocumentContextInjection()

Handles the full-content injection strategy:

  • Parses files from cache
  • Formats content with headers
  • Replaces message text via input.replaceText()

Configuration

Parameters from configSchematics:

  • retrievalLimit: number — Maximum number of citations to retrieve
  • retrievalAffinityThreshold: number — Relevance threshold for filtering citations (0.0–1.0)
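A typed sketch of these parameters; the interface name and the default values shown are illustrative only, not taken from the source:

```typescript
// Configuration surface exposed via configSchematics (sketch).
interface PreprocessorConfig {
  retrievalLimit: number; // maximum number of citations to retrieve
  retrievalAffinityThreshold: number; // relevance cutoff in [0, 1]
}

// Illustrative defaults, purely for this example.
const defaultConfig: PreprocessorConfig = {
  retrievalLimit: 5,
  retrievalAffinityThreshold: 0.5,
};
```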

User Status Messages

The preprocessor displays progress via PredictionProcessStatusController:

  • Deciding: "Deciding how to handle the document(s)..."
  • Loading parser: "Loading parser for (unknown)..."
  • Parser loaded: "{library} loaded for (unknown)..."
  • Processing: "Parsing file (unknown)... ({progress}%)"
  • Retrieval: "Retrieving relevant citations for user query..."
  • Done: "Retrieved {N} relevant citations for user query"

Debug Output

The preprocessor outputs debug information via ctl.debug():

  • Retrieval results
  • Processed content
  • Performance metrics (read time, tokenization time)
  • Strategy selection details

Dependencies

import {
  text,
  type Chat,
  type ChatMessage,
  type FileHandle,
  type LLMDynamicHandle,
  type PredictionProcessStatusController,
  type PromptPreprocessorController,
} from "@lmstudio/sdk";

Embedding Model: nomic-ai/nomic-embed-text-v1.5-GGUF


Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    User Message + Files                     │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                        preprocess()                         │
│  - Load history                                             │
│  - Filter files (no images)                                 │
│  - Choose strategy                                          │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
    ┌──────────────────────┐       ┌─────────────────────┐
    │ inject-full-content  │       │      retrieval      │
    │                      │       │                     │
    │ - Parse all files    │       │ - Load embeddings   │
    │ - Format content     │       │ - Semantic search   │
    │ - Build prompt       │       │ - Filter by score   │
    └──────────────────────┘       │ - Build prompt      │
                                   └─────────────────────┘
                                              │
                              ┌───────────────┴───────────────┐
                              │                               │
                              ▼                               ▼
                    ┌─────────────────┐           ┌─────────────────┐
                    │ Results found   │           │ No results      │
                    │ - Add citations │           │ - Inform user   │
                    │ - Continue      │           │ - Continue      │
                    └─────────────────┘           └─────────────────┘