IMPLEMENTATION_PLAN.md

RAG Plugin Implementation Plan

Objective

Upgrade the current LM Studio document RAG plugin from a simple prompt-preprocessor flow into a stronger, measurable, and safer RAG system while preserving a working fast path.

The plan is intentionally staged:

  • keep the current prompt preprocessor working first
  • improve retrieval quality and reliability inside the current architecture
  • only later add agentic retrieval with a tools provider

Active execution task list

Workspace migration tranche

  • Review the saved handoff notes, current repo layout, and existing implementation plan.
  • Research current npm workspace and TypeScript project-reference guidance before changing repo structure.
  • Identify package-boundary blockers in the current codebase.
  • Decouple core runtime contracts from MCP-specific schema types.
  • Create the root npm workspace configuration and package skeletons for packages/core, packages/adapter-lmstudio, and packages/mcp-server.
  • Move transport-agnostic core sources into packages/core while preserving compatibility through temporary re-export shims.
  • Move LM Studio adapter-specific sources into packages/adapter-lmstudio behind stable entrypoints.
  • Move MCP server-specific sources into packages/mcp-server behind stable entrypoints.
  • Update scripts and TypeScript config for the workspace layout.
  • Update docs for the workspace layout.
  • Re-run smoke tests and type-check for the migrated layout.

MCP extraction tranche

  • Review the MCP-oriented architecture note and current repo state.
  • Confirm the live integration constraints: LM Studio prompt preprocessors/tools providers are plugin hooks, while MCP uses separate servers and transports.
  • Turn the MCP note into a staged task list that preserves the existing plugin while carving out a reusable core.
  • Introduce transport-agnostic core contracts for documents, candidates, evidence blocks, and rerank outputs.
  • Extract the retrieval post-processing pipeline behind those contracts (fusion, hybrid merge, heuristic rerank, dedupe, evidence assembly).
  • Add an LM Studio bridge layer that converts between LM Studio retrieval entries and the new core contracts.
  • Rewire the prompt preprocessor to use the extracted core pipeline without changing current plugin behavior.
  • Move gating, rewrite generation, corrective assessment, and safety helpers behind the same core contract boundary.
  • Add first-pass MCP request/response schemas for rag_answer, rag_search, corpus_inspect, and rerank_only.
  • Add smoke coverage for the extracted core pipeline, policy helpers, and bridge conversions.
  • Sketch the next extraction step: adapter-specific retrieval/file-loading interfaces.
  • Only after that, scaffold the MCP server package and first tool handlers.

Working task list after this slice

  • Add src/core/pipeline.ts-level eval cases independent of LM Studio runtime objects.
  • Move gating, rewrite, corrective assessment, and safety behind the same core contract boundary.
  • Introduce explicit MCP request/response schemas.
  • Add adapter-specific retrieval and document-loading runtime interfaces.
  • Scaffold a minimal stdio MCP server entrypoint and tool handler layer using the new schemas.
  • Replace the current stub MCP runtime with real filesystem loading and shared lexical retrieval adapters.
  • Decide whether the first real MCP transport should stay hand-rolled JSON-RPC or switch to the official MCP TypeScript SDK once the runtime shape stabilizes.
  • Add packages/ workspace split only after the runtime boundary stops moving.

Current baseline

Existing behavior

  • src/index.ts
    • registers config schematics
    • registers the prompt preprocessor
  • src/config.ts
    • plugin config fields for embedding selection, unloading, retrieval limit, and threshold
  • src/promptPreprocessor.ts
    • decides whether to inject full file content or run retrieval
    • parses documents
    • retrieves relevant chunks
    • adds citations

Recently fixed issues

  • missing LLMDynamicHandle import
  • downloaded embedding model lookup now uses the typed embedding API and modelKey
  • context measurement now appends the active user prompt before applying the model prompt template

Constraint to keep in mind

The prompt preprocessor is still the best fit for the simple fast path. More advanced iterative retrieval should be added later via a tools provider rather than overloading the preprocessor.


Deliverables by phase

Phase 0 — Baseline evaluation and instrumentation

Goal

Create a repeatable way to measure whether changes improve the plugin.

Work items

  • Add an evaluation corpus.
  • Add a simple runner for end-to-end plugin pipeline evaluation.
  • Add metrics capture for retrieval quality, answer grounding, and latency.

Files to add

  • eval/cases/basic.jsonl
  • eval/cases/hard.jsonl
  • scripts/eval.ts
  • src/metrics.ts
  • src/types/eval.ts

Eval case schema

Each case should include:

  • id
  • files
  • question
  • expected_answer_points
  • expected_sources
  • answerability
  • difficulty

Metrics to track

  • retrieval hit rate
  • citation coverage
  • unsupported-claim count
  • no-match correctness
  • average latency
  • average injected tokens
  • chunk redundancy rate

Acceptance criteria

  • a single command runs the eval suite
  • baseline metrics are saved to JSON
  • current plugin behavior is captured before further changes

Suggested task checklist

  • create eval/ folder structure
  • define JSONL schema
  • add 30–50 initial cases
  • add scripts/eval.ts
  • write eval output to eval/results/
  • document how to run evals in README.md

Phase 1 — v2.1 fast-path upgrade inside the prompt preprocessor

Goal

Improve answerability handling, retrieval quality, evidence quality, and safety without changing plugin type.

1A. Answerability / retrieval-utility gate

Goal

Predict whether retrieval is likely useful before paying the full retrieval cost.

Behavior

Classify each request into one of:

  • no retrieval needed
  • retrieval likely useful
  • likely unanswerable from provided files
  • ambiguous / clarification needed

Files to add

  • src/gating.ts
  • src/types/gating.ts

Changes to existing files

  • src/promptPreprocessor.ts
    • call gate before retrieval strategy selection
    • allow early return for no-match / clarification cases
  • src/config.ts
    • add gate-related config fields

New config fields

  • answerabilityGateEnabled
  • answerabilityGateThreshold
  • ambiguousQueryBehavior

Suggested implementation order

  • heuristic gate first
  • optional small-model judge later

Acceptance criteria

  • no-match questions avoid unnecessary retrieval
  • ambiguous questions can produce a clarification instruction path
  • eval shows improved no-match handling

Checklist

  • implement query heuristic features
  • add gate result type
  • wire gate into preprocessor
  • add config fields
  • add eval cases for no-match and ambiguous prompts

1B. Multi-query rewrite and fusion

Goal

Retrieve from multiple query variants and fuse the results.

Query variants

  • literal rewrite
  • keyword-focused rewrite
  • acronym-expanded rewrite
  • decomposed sub-question rewrite

Files to add

  • src/queryRewrite.ts
  • src/fusion.ts
  • src/types/retrieval.ts

Changes to existing files

  • src/promptPreprocessor.ts
    • replace single-query retrieval path with multi-query candidate generation
  • src/config.ts
    • add fusion configuration

New config fields

  • multiQueryEnabled
  • multiQueryCount
  • fusionMethod
  • maxCandidatesBeforeRerank

Acceptance criteria

  • retrieval can run on multiple rewrites
  • results are deduplicated and fused
  • citation recall improves on eval set without large latency regression

Checklist

  • implement rewrite generator interface
  • implement initial deterministic rewrites
  • add reciprocal-rank fusion
  • add dedupe step after fusion
  • expose config fields

1C. Evidence packaging, dedupe, and neighbor expansion

Goal

Pass better evidence to the model than raw top chunks.

Behavior

Each evidence block should include:

  • file name
  • section or heading if available
  • page number if available
  • matched chunk text
  • optional neighboring chunk text
  • provenance label for citation formatting

Files to add

  • src/evidence.ts
  • src/types/evidence.ts

Changes to existing files

  • src/promptPreprocessor.ts
    • use evidence packaging before injecting retrieval content

New config fields

  • neighborWindow
  • dedupeSimilarityThreshold
  • maxEvidenceBlocks

Acceptance criteria

  • repeated or near-identical chunks are reduced
  • answers get better local context around retrieved evidence
  • citation quality improves

Checklist

  • add evidence block type
  • add near-duplicate filtering
  • add neighbor expansion
  • include structural metadata in formatted retrieval content

1D. Retrieved-text safety and injection hardening

Goal

Treat file content as untrusted data.

Files to add

  • src/safety.ts
  • src/types/safety.ts

Changes to existing files

  • src/promptPreprocessor.ts
    • sanitize and wrap evidence before injecting it
  • src/config.ts
    • add safety options

New config fields

  • sanitizeRetrievedText
  • stripInstructionalSpans
  • strictGroundingMode

Acceptance criteria

  • retrieved text is normalized before injection
  • the injected prompt explicitly treats retrieved content as data, not instructions
  • malicious-looking spans are reduced or flagged

Checklist

  • normalize unicode and spacing
  • sanitize markdown/html-ish content
  • add instruction wrapper around evidence blocks
  • optionally downweight imperative text spans

Phase 2 — v2.2 retrieval core rebuild

Goal

Upgrade the underlying retrieval engine with adaptive chunking, hybrid retrieval, and reranking.

2A. Adaptive and structure-aware chunking

Goal

Move from flat text assumptions to document-structure-aware chunking.

Files to add

  • src/chunking.ts
  • src/documentModel.ts
  • src/types/document.ts

Chunking modes

  • prose mode
  • section-heading mode
  • page-plus-section mode
  • fallback fixed-token mode

New config fields

  • chunkingMode
  • targetChunkTokens
  • maxChunkTokens
  • structureAwareChunking

Acceptance criteria

  • chunks preserve major structure boundaries where possible
  • section metadata is preserved for later ranking and citation formatting

Checklist

  • define normalized document section model
  • add parser-to-structure conversion helpers
  • implement heading-aware chunker
  • implement prose chunker
  • add chunk metadata

2B. Hybrid retrieval

Goal

Combine semantic retrieval with lexical retrieval.

Files to add

  • src/lexicalRetrieve.ts
  • src/hybridRetrieve.ts
  • src/indexing.ts

Approach

  • semantic retrieval from LM Studio embedding/file APIs
  • local lexical scoring over parsed chunks
  • merge and score both candidate sets

New config fields

  • hybridEnabled
  • lexicalWeight
  • semanticWeight
  • hybridCandidateCount

Acceptance criteria

  • lexical-only matches improve for exact terms and rare phrases
  • merged candidate pool outperforms semantic-only baseline on evals

Checklist

  • implement lexical scoring over chunk text and headings
  • merge semantic and lexical candidate lists
  • expose weights in config
  • add eval cases for exact-match terminology queries

2C. Reranking for evidence suitability

Goal

Select evidence that is sufficient and complementary, not just topically similar.

Files to add

  • packages/adapter-lmstudio/src/rerank.ts
  • packages/lmstudio-shared/src/rerankTypes.ts

Reranking strategy

Version 1:

  • heuristic reranker using:
    • lexical overlap
    • heading match
    • completeness score
    • diversity penalty
    • section relevance

Version 2:

  • optional model-based reranker for top candidate set

New config fields

  • rerankEnabled
  • rerankTopK
  • rerankStrategy

Acceptance criteria

  • top evidence set is less redundant
  • answer-supporting evidence improves on eval set

Checklist

  • define rerank feature set
  • implement heuristic reranker
  • integrate with evidence packaging
  • add optional model-based rerank hook

Phase 3 — v3 agentic retrieval with tools provider

Goal

Support iterative, multi-hop, or clarification-heavy retrieval workflows.

Rationale

This should be implemented as a tools provider rather than forcing it into the prompt preprocessor.

Files to add

  • src/toolsProvider.ts
  • src/tools/searchFiles.ts
  • src/tools/readSection.ts
  • src/tools/readNeighbors.ts
  • src/tools/listHeadings.ts
  • src/tools/verifyClaim.ts
  • src/types/tools.ts

Changes to existing files

  • src/index.ts
    • register tools provider when enabled
  • src/config.ts
    • add agentic mode settings

New config fields

  • agenticModeEnabled
  • maxToolCalls
  • toolReadWindow
  • verificationEnabled

Initial tool set

  • search_files(query)
  • read_section(file, sectionId)
  • read_neighbors(file, chunkId, window)
  • list_headings(file)
  • verify_claim(claim, evidenceIds)

Acceptance criteria

  • complex questions can retrieve iteratively
  • multi-hop answers improve compared with single-shot retrieval
  • tool traces are observable in LM Studio

Checklist

  • add tools-provider registration
  • define tool schemas
  • implement file search tool
  • implement structured read tools
  • gate advanced mode behind config

Phase 4 — Verification and grounded generation

Goal

Reduce unsupported claims in generated answers.

Files to add

  • src/verify.ts
  • src/claimSplit.ts
  • src/types/verify.ts

Behavior

  • generate a draft answer
  • split into claims or sentences
  • verify each claim against selected evidence
  • rewrite, remove, or downgrade unsupported claims

New config fields

  • claimVerificationEnabled
  • maxClaimsToVerify
  • unsupportedClaimBehavior

Acceptance criteria

  • unsupported-claim rate decreases on evals
  • citation linkage remains intact

Checklist

  • add claim splitter
  • add evidence-check function
  • add unsupported-claim policy behaviors
  • record verification outcomes in eval logs

Phase 5 — Reliability and security hardening

Goal

Make the plugin safer and more robust against hostile or messy documents.

Files to add

  • src/sanitize.ts
  • src/policy.ts
  • src/types/policy.ts

Work items

  • parse-time sanitization
  • suspicious-span detection
  • attribution-gated answering
  • quarantine handling for risky document types

Acceptance criteria

  • risky content is flagged or neutralized before reaching generation
  • grounded-answer behavior stays consistent under hostile inputs

Checklist

  • add document sanitization pipeline
  • add suspicious span rules
  • add strict attribution mode
  • add security-focused eval cases

Config roadmap

Immediate config additions for v2.1

Add these to src/config.ts:

  • answerabilityGateEnabled
  • answerabilityGateThreshold
  • ambiguousQueryBehavior
  • multiQueryEnabled
  • multiQueryCount
  • fusionMethod
  • maxCandidatesBeforeRerank
  • neighborWindow
  • dedupeSimilarityThreshold
  • maxEvidenceBlocks
  • sanitizeRetrievedText
  • stripInstructionalSpans
  • strictGroundingMode

Later config additions

  • chunkingMode
  • targetChunkTokens
  • maxChunkTokens
  • structureAwareChunking
  • hybridEnabled
  • lexicalWeight
  • semanticWeight
  • hybridCandidateCount
  • rerankEnabled
  • rerankTopK
  • rerankStrategy
  • agenticModeEnabled
  • maxToolCalls
  • toolReadWindow
  • verificationEnabled
  • claimVerificationEnabled
  • maxClaimsToVerify
  • unsupportedClaimBehavior

Milestone 1

Phase 0 + Phase 1A

  • baseline eval harness
  • answerability gate

Milestone 2

Phase 1B + Phase 1C

  • multi-query rewrite
  • fusion
  • evidence dedupe and neighbor expansion

Milestone 3

Phase 1D + cleanup

  • sanitization and strict grounding behavior
  • polish current prompt-preprocessor flow

Milestone 4

Phase 2A + Phase 2B + Phase 2C

  • chunking
  • hybrid retrieval
  • reranking

Milestone 5

Phase 3 + Phase 4

  • tools provider
  • verification pipeline

Milestone 6

Phase 5

  • security and reliability hardening

Definition of success

The plugin should improve along these axes while remaining usable inside LM Studio:

  • better no-match behavior
  • higher citation coverage
  • lower unsupported-claim rate
  • better retrieval for rare or exact terminology
  • less redundant evidence injection
  • manageable latency
  • safer handling of hostile document content

Immediate next action

Start with:

  • Phase 0 baseline eval harness
  • Phase 1A answerability gate
  • Phase 1B multi-query rewrite and fusion
  • Phase 1C evidence packaging and neighbor expansion
  • Phase 1D retrieved-text safety wrapper

This is the best first cut because it materially improves the current plugin without forcing an architectural jump before there is measurement in place.