docs / promptPreprocessor.md
The prompt preprocessor handles user queries with attached files for a RAG (Retrieval-Augmented Generation) system. It automatically selects the optimal context injection strategy based on file sizes and the available context window of the language model.
preprocess

Parameters:
- ctl (PromptPreprocessorController): preprocessor controller
- userMessage (ChatMessage): user message

Workflow:
- chooseContextInjectionStrategy()
- retrieval

inject-full-content: Full Content Injection

When used:
Process:
1. Parse each file via ctl.client.files.parseDocument()
2. Extract full content
3. Format with headers: ** filename full content **
4. Inject into prompt with instructions
Output format:
This is a Enriched Context Generation scenario.

The following content was found in the files provided by the user.

** document.pdf full content **

[full file content]

** end of document.pdf **

Based on the content above, please provide a response to the user query.

User query: [user query]
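The layout above can be assembled with plain string templating. The sketch below is a minimal illustration, not the preprocessor's actual code: `ParsedFile` and `buildFullContentPrompt` are hypothetical names, the blank-line spacing between parts is an assumption, and the wording is copied verbatim from the output format.

```typescript
// Hypothetical shape for an already-parsed file; the real preprocessor
// works with SDK file handles and parsed documents instead.
interface ParsedFile {
  name: string;
  content: string;
}

// Builds the full-content prompt: one delimited section per file,
// followed by the instruction and the user query.
function buildFullContentPrompt(files: ParsedFile[], userQuery: string): string {
  const sections = files
    .map((f) => `** ${f.name} full content **\n\n${f.content}\n\n** end of ${f.name} **`)
    .join("\n\n");

  return [
    "This is a Enriched Context Generation scenario.",
    "The following content was found in the files provided by the user.",
    sections,
    "Based on the content above, please provide a response to the user query.",
    `User query: ${userQuery}`,
  ].join("\n\n"); // assumption: parts are separated by blank lines
}

const fullContentPrompt = buildFullContentPrompt(
  [{ name: "document.pdf", content: "[full file content]" }],
  "[user query]",
);
console.log(fullContentPrompt);
```

Keeping the begin and end markers symmetric per file makes it easier for the model to tell where one document stops and the next begins when several files are injected.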
retrieval: Semantic Search

When used:
Process:
1. Load embedding model (nomic-embed-text-v1.5-GGUF)
2. Perform semantic search via ctl.client.files.retrieve()
3. Filter results by retrievalAffinityThreshold
4. Add found citations to the prompt
5. Attach citations via ctl.addCitations()
Output format (with results):
The following citations were found in the files provided by the user:

Citation 1: "[citation text]"

Citation 2: "[citation text]"

Use the citations above to respond to the user query, only if they are relevant. Otherwise, respond to the best of your ability without them.

User Query: [user query]
Output format (no results):
Important: No citations were found in the user files for the user query. In less than one sentence, inform the user of this. Then respond to the query to the best of your ability.

User Query: [user query]
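The two retrieval outputs differ only in whether any result survives the threshold filter. The sketch below illustrates that branch under stated assumptions: `RetrievalEntry` and `buildRetrievalPrompt` are hypothetical names, the strict `>` comparison against `retrievalAffinityThreshold` is a guess at the filtering rule, and the blank-line spacing is illustrative.

```typescript
// Hypothetical shape for one search hit; the SDK's own result type may differ.
interface RetrievalEntry {
  content: string;
  score: number; // similarity score, assumed to lie in 0.0-1.0
}

function buildRetrievalPrompt(
  entries: RetrievalEntry[],
  retrievalAffinityThreshold: number,
  userQuery: string,
): string {
  // Assumption: an entry is kept when its score is strictly above the threshold.
  const relevant = entries.filter((e) => e.score > retrievalAffinityThreshold);

  if (relevant.length === 0) {
    // No usable citations: ask the model to say so briefly, then answer anyway.
    return [
      "Important: No citations were found in the user files for the user query. " +
        "In less than one sentence, inform the user of this. " +
        "Then respond to the query to the best of your ability.",
      `User Query: ${userQuery}`,
    ].join("\n\n");
  }

  // Number the surviving citations starting from 1.
  const citations = relevant
    .map((e, i) => `Citation ${i + 1}: "${e.content}"`)
    .join("\n\n");

  return [
    "The following citations were found in the files provided by the user:",
    citations,
    "Use the citations above to respond to the user query, only if they are relevant. " +
      "Otherwise, respond to the best of your ability without them.",
    `User Query: ${userQuery}`,
  ].join("\n\n");
}
```

Calling `buildRetrievalPrompt(hits, 0.5, query)` with a mix of high- and low-scoring hits yields the citation form; raising the threshold above every score yields the no-results form.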
none: No Context

When used:
The chooseContextInjectionStrategy() function makes decisions based on token calculations:
| Step | Description |
|---|---|
| 1 | Load LLM model via ctl.client.llm.model() |
| 2 | Measure current context usage via measureContextWindow() |
| 3 | Parse files and count total tokens |
| 4 | Calculate available tokens with 70% target utilization |
| 5 | Compare: totalFilePlusPromptTokenCount > availableContextTokens |
const contextOccupiedFraction = contextOccupiedPercent / 100;
const targetContextUsePercent = 0.7;
const targetContextUsage = targetContextUsePercent * (1 - contextOccupiedFraction);
const availableContextTokens = Math.floor(modelRemainingContextLength * targetContextUsage);
- If totalFileTokenCount + userPromptTokenCount > availableContextTokens → retrieval
- Else → inject-full-content
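To make the arithmetic concrete, the decision can be wrapped in a pure function and run on sample numbers. This is a sketch under stated assumptions: `chooseStrategy` and `ContextUsage` are hypothetical names, and the token counts are made-up inputs; only the formula and the comparison come from the description above.

```typescript
type Strategy = "inject-full-content" | "retrieval";

// Subset of the measureContextWindow() result that the decision needs.
interface ContextUsage {
  modelRemainingContextLength: number;
  contextOccupiedPercent: number;
}

function chooseStrategy(
  usage: ContextUsage,
  totalFileTokenCount: number,
  userPromptTokenCount: number,
): { strategy: Strategy; availableContextTokens: number } {
  const contextOccupiedFraction = usage.contextOccupiedPercent / 100;
  const targetContextUsePercent = 0.7; // 70% target utilization
  const targetContextUsage = targetContextUsePercent * (1 - contextOccupiedFraction);
  const availableContextTokens = Math.floor(
    usage.modelRemainingContextLength * targetContextUsage,
  );

  // Files plus prompt that do not fit the budget fall back to retrieval.
  const strategy: Strategy =
    totalFileTokenCount + userPromptTokenCount > availableContextTokens
      ? "retrieval"
      : "inject-full-content";

  return { strategy, availableContextTokens };
}

// 8192 tokens remaining, context half full: floor(8192 * 0.7 * 0.5) = 2867 available.
const sampleUsage: ContextUsage = { modelRemainingContextLength: 8192, contextOccupiedPercent: 50 };
const smallFiles = chooseStrategy(sampleUsage, 2000, 100); // 2100 <= 2867
const largeFiles = chooseStrategy(sampleUsage, 5000, 100); // 5100 > 2867
console.log(smallFiles.strategy, largeFiles.strategy);
```

With these inputs the first call selects inject-full-content and the second selects retrieval, since only the first total fits within the 2867-token budget.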
measureContextWindow()

Measures context window utilization:
Returns:
{
  totalTokensInContext: number,        // total tokens in context
  modelContextLength: number,          // model context size
  modelRemainingContextLength: number, // remaining tokens available
  contextOccupiedPercent: number       // percentage filled
}
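The four fields are related by simple arithmetic. The sketch below shows one plausible derivation, assuming the remaining length is the context size minus the tokens in use and the percentage is their ratio times 100. `summarizeContextWindow` is a hypothetical helper; in the real function the two inputs would come from the loaded model rather than from arguments.

```typescript
interface ContextWindowMeasurement {
  totalTokensInContext: number;
  modelContextLength: number;
  modelRemainingContextLength: number;
  contextOccupiedPercent: number;
}

// Hypothetical helper: derives the remaining length and fill percentage
// from the token count in use and the model's context size.
function summarizeContextWindow(
  totalTokensInContext: number,
  modelContextLength: number,
): ContextWindowMeasurement {
  // Assumption: remaining = size - used, percent = used / size * 100.
  const modelRemainingContextLength = modelContextLength - totalTokensInContext;
  const contextOccupiedPercent = (totalTokensInContext / modelContextLength) * 100;
  return {
    totalTokensInContext,
    modelContextLength,
    modelRemainingContextLength,
    contextOccupiedPercent,
  };
}

// 2048 of 8192 tokens used: 6144 remaining, 25% occupied.
const measurement = summarizeContextWindow(2048, 8192);
console.log(measurement);
```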
getEffectiveContextFormatted()

Applies the model's prompt template:

- model.applyPromptTemplate(ctx)
- "?" and retries

prepareRetrievalResultsContextInjection()

Handles the retrieval strategy:

- retrievalAffinityThreshold

prepareDocumentContextInjection()

Handles the full-content injection strategy:

- input.replaceText()

Parameters from configSchematics:
| Parameter | Type | Description |
|---|---|---|
| retrievalLimit | number | Maximum number of citations to retrieve |
| retrievalAffinityThreshold | number | Relevance threshold for filtering citations (0.0–1.0) |
The preprocessor displays progress via PredictionProcessStatusController:
| Status | Message |
|---|---|
| Deciding | Deciding how to handle the document(s)... |
| Loading parser | Loading parser for {filename}... |
| Parser loaded | {library} loaded for {filename}... |
| Processing | Parsing file {filename}... ({progress}%) |
| Retrieval | Retrieving relevant citations for user query... |
| Done | Retrieved {N} relevant citations for user query |
The preprocessor outputs debug information via ctl.debug():
import {
  text,
  type Chat,
  type ChatMessage,
  type FileHandle,
  type LLMDynamicHandle,
  type PredictionProcessStatusController,
  type PromptPreprocessorController,
} from "@lmstudio/sdk";
Embedding Model: nomic-ai/nomic-embed-text-v1.5-GGUF
┌─────────────────────────────────────────────────────────────┐
│                    User Message + Files                     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                        preprocess()                         │
│  - Load history                                             │
│  - Filter files (no images)                                 │
│  - Choose strategy                                          │
└─────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┴───────────────┐
               │                               │
               ▼                               ▼
    ┌─────────────────────┐         ┌─────────────────────┐
    │ inject-full-content │         │      retrieval      │
    │                     │         │                     │
    │ - Parse all files   │         │ - Load embeddings   │
    │ - Format content    │         │ - Semantic search   │
    │ - Build prompt      │         │ - Filter by score   │
    └─────────────────────┘         │ - Build prompt      │
                                    └─────────────────────┘
                                               │
                               ┌───────────────┴───────────────┐
                               │                               │
                               ▼                               ▼
                      ┌─────────────────┐             ┌─────────────────┐
                      │  Results found  │             │   No results    │
                      │  - Add citations│             │  - Inform user  │
                      │  - Continue     │             │  - Continue     │
                      └─────────────────┘             └─────────────────┘