Forked from yongwei/rag-flex
RAG-Flex

Version: 1.2.0 | License: MIT

English | 繁體中文 | 日本語

A flexible RAG (Retrieval-Augmented Generation) plugin for LM Studio with dynamic embedding model selection, intelligent context management, and multilingual support.

✨ Features

  • 🔄 Dynamic Model Selection: Choose from 4 mainstream embedding models with automatic local model detection
  • 🧠 Smart Context Management: Automatically decides between full-text injection and RAG retrieval based on file size
  • 🌏 Multilingual Support: Full UI and messages in English, Traditional Chinese, and Japanese
  • ⚙️ Flexible Configuration: Adjustable retrieval limits, affinity thresholds, and context usage
  • 🛡️ Robust Error Handling: AI-friendly error messages that guide users to solutions
  • 🔧 Developer Tools: Optional debug logging for troubleshooting and development

🚀 Quick Start

Prerequisites

  • Install LM Studio (v0.2.9 or later)
  • Download at least one embedding model:
    • Recommended: nomic-ai/nomic-embed-text-v1.5-GGUF (built-in, fast)
    • For Chinese/Multilingual: lm-kit/bge-m3-gguf (slower but more accurate)

Installation

From LM Studio Plugin Page (Recommended)

Install directly from the plugin page: lmstudio.ai/yongwei/rag-flex

From GitHub (Development Mode)

git clone https://github.com/henrychen95/rag-flex.git
cd rag-flex
lms dev

The plugin will automatically load into LM Studio. You should see "Register with LM Studio" in the terminal output.

📖 Usage

Basic Workflow

  • Enable the plugin in LM Studio settings (Plugins tab)
  • Upload documents to your chat (PDF, DOCX, TXT, MD)
  • Ask questions - RAG-Flex automatically:
    • Analyzes file size and context usage
    • Chooses between full-text injection (small files) or RAG retrieval (large files)
    • Returns relevant chunks with citations

Example Conversations

Small File (Full-Text Injection)

📎 Upload: meeting-notes.txt (5 KB)
💬 You: "What were the action items from the meeting?"
🤖 AI: [Reviews entire document] "The action items were:
       1. John to prepare Q4 report by Friday
       2. Sarah to schedule follow-up meeting..."

Large File (RAG Retrieval)

📎 Upload: technical-manual.pdf (2 MB)
💬 You: "How do I configure SSL certificates?"
🤖 AI: [Retrieves relevant sections]
       "Based on Citation 1 and Citation 3:
       To configure SSL certificates, you need to..."

       Citation 1: (Page 45) "SSL Configuration involves..."
       Citation 3: (Page 89) "Certificate installation steps..."

⚙️ Configuration Options

Access plugin settings in LM Studio → Plugins → RAG-Flex

| Parameter | Default | Range | Description |
| --- | --- | --- | --- |
| Message Language | Auto-detected | EN / ZH-TW / JA | Language for runtime messages |
| Embedding Model | nomic-ai/nomic-embed-text-v1.5 | 4 presets | Select from preset embedding models |
| Custom Embedding Model | (empty) | Text input | Overrides the selection above; accepts a model key (e.g. text-embedding-bge-m3), identifier (e.g. lm-kit/bge-m3-gguf), or full path |
| Context Usage Threshold | 0.7 | 0.1 - 1.0 | Trigger point for RAG retrieval (lower = more precise) |
| Retrieval Limit | 5 | 1 - 15 | Number of chunks to retrieve |
| Retrieval Affinity Threshold | 0.4 | 0.0 - 1.0 | Similarity threshold (BGE-M3: 0.4-0.6 recommended) |
| Enable Debug Logging | Off | On/Off | Enable debug logs for developers |
| Debug Log Path | ./logs/lmstudio-debug.log | Custom path | Path to the debug log file |
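
Taken together, the settings above correspond to a shape like the following. This is a hypothetical sketch for orientation only: the interface name, field names, and the `"auto"` sentinel are invented here and are not RAG-Flex's actual configuration schema.

```typescript
// Hypothetical settings shape mirroring the table above; names are
// illustrative, not RAG-Flex's real configuration schema.
interface RagFlexSettings {
  messageLanguage: "auto" | "en" | "zh-TW" | "ja"; // "auto" = detect from system locale
  embeddingModel: string;               // one of the 4 preset models
  customEmbeddingModel: string;         // model key, identifier, or path; empty = use preset
  contextUsageThreshold: number;        // 0.1 - 1.0
  retrievalLimit: number;               // 1 - 15
  retrievalAffinityThreshold: number;   // 0.0 - 1.0
  enableDebugLogging: boolean;
  debugLogPath: string;
}

// Defaults as listed in the table above.
const defaults: RagFlexSettings = {
  messageLanguage: "auto",
  embeddingModel: "nomic-ai/nomic-embed-text-v1.5",
  customEmbeddingModel: "",
  contextUsageThreshold: 0.7,
  retrievalLimit: 5,
  retrievalAffinityThreshold: 0.4,
  enableDebugLogging: false,
  debugLogPath: "./logs/lmstudio-debug.log",
};
```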

Embedding Model Comparison

| Model | Size | Speed | Best For | Language Support |
| --- | --- | --- | --- | --- |
| nomic-ai/nomic-embed-text-v1.5-GGUF | 84 MB | ⚡⚡⚡ Fast | English, general use | English |
| NathanMad/sentence-transformers_all-MiniLM-L12-v2-gguf | 133 MB | ⚡⚡⚡ Fast | Lightweight tasks | English |
| groonga/gte-large-Q4_K_M-GGUF | 216 MB | ⚡⚡ Medium | Balanced performance | Multilingual |
| lm-kit/bge-m3-gguf | 1.16 GB | ⚡ Slow (F16) / ⚡⚡ Medium (Q4) | Chinese, multilingual, high precision | 100+ languages |

Note: Due to SDK limitations, the dropdown only shows preset models. Use the Custom Embedding Model field to specify any downloaded model by entering its model key (e.g. text-embedding-qwen3-embedding-8b), identifier, or full path.

💡 Use Cases & Examples

📚 Technical Documentation Analysis

Scenario: Software developer needs API documentation
Upload: FastAPI-documentation.pdf (3.2 MB)
Ask: "What authentication methods does FastAPI support?"

Result: RAG retrieval mode activated
✓ Retrieved 5 relevant citations
✓ Found JWT, OAuth2, API Key sections
✓ Provided code examples from documentation

Configuration Tips:
- Context Threshold: 0.7 (default)
- Retrieval Limit: 5-7 (for comprehensive coverage)
- Affinity Threshold: 0.5 (technical content)

⚖️ Legal Document Review

Scenario: Lawyer reviewing contract terms
Upload: commercial-lease-agreement.docx (250 KB)
Ask: "What are the tenant's responsibilities for maintenance?"

Result: Full-text injection mode (file within threshold)
✓ Entire document injected as context
✓ AI can cross-reference multiple clauses
✓ Comprehensive answer with exact clause numbers

Configuration Tips:
- Context Threshold: 0.8 (allow full injection)
- Language: 繁體中文 (for Traditional Chinese contracts)

💻 Code Understanding & Analysis

Scenario: Understanding database schema
Upload: database-schema.sql (450 KB)
Ask: "Explain the relationship between users and orders tables"

Result: RAG retrieval with lowered threshold
✓ Retrieved relevant CREATE TABLE statements
✓ Found foreign key constraints
✓ Identified junction tables

Configuration Tips:
- Affinity Threshold: 0.3-0.4 (lower for code/SQL)
- Retrieval Limit: 8-10 (capture related tables)
- Model: bge-m3 (better for code with comments in Chinese)

🏛️ Government Document Processing

Scenario: Public servant processing applications
Upload: subsidy-application-guidelines-2024.pdf (1.8 MB)
Ask: "申請資格有哪些限制條件?"

Result: Multilingual RAG retrieval
✓ Language auto-detected as Traditional Chinese
✓ Retrieved eligibility criteria sections
✓ Citations include page numbers and article references

Configuration Tips:
- Language: 繁體中文
- Model: bge-m3 (best for Traditional Chinese)
- Affinity Threshold: 0.5-0.6

📊 Research Paper Analysis

Scenario: Graduate student literature review
Upload: machine-learning-survey-2024.pdf (4.5 MB)
Ask: "What are the current challenges in transformer architectures?"

Result: Precision RAG retrieval
✓ Retrieved sections from "Challenges" and "Future Work"
✓ Cross-referenced with methodology sections
✓ Provided citations with page numbers

Configuration Tips:
- Context Threshold: 0.6 (force RAG for large papers)
- Retrieval Limit: 10-15 (capture diverse viewpoints)
- Model: gte-large (good balance for academic content)

🔧 Advanced Configuration Guide

Understanding Context Usage Threshold

The threshold determines when to switch from full-text injection to RAG retrieval:

Available Context = Remaining Context × Threshold

If (File Tokens + Prompt Tokens) > Available Context:
    → Use RAG Retrieval (precise mode)
Else:
    → Use Full-Text Injection (comprehensive mode)

When to adjust:

| Threshold | Behavior | Use Case |
| --- | --- | --- |
| 0.3-0.5 | Forces RAG more often | Large documents, memory constraints |
| 0.6-0.7 | Balanced (default) | General use |
| 0.8-0.9 | Allows more full injection | Small documents, need full context |
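
The decision rule above can be sketched in a few lines. This is an illustrative approximation of the documented formula, not the plugin's actual code; the function and parameter names are invented.

```typescript
// Hedged sketch of the threshold decision described above; names are
// illustrative, not RAG-Flex's actual API.
function chooseMode(
  fileTokens: number,
  promptTokens: number,
  remainingContext: number,
  threshold: number = 0.7, // default Context Usage Threshold
): "full-text" | "rag" {
  // Available Context = Remaining Context x Threshold
  const availableContext = remainingContext * threshold;
  // If the file plus the prompt exceeds the available context,
  // fall back to RAG retrieval; otherwise inject the full text.
  return fileTokens + promptTokens > availableContext ? "rag" : "full-text";
}
```

For example, with 8,000 tokens of remaining context and the default 0.7 threshold, the available context is 5,600 tokens: a 1,500-token file-plus-prompt fits (full-text injection), while 6,300 tokens triggers RAG retrieval.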

Optimizing Retrieval Affinity Threshold

Different content types require different similarity thresholds:

| Content Type | Recommended Threshold | Reason |
| --- | --- | --- |
| Natural language text | 0.5-0.7 | Clear semantic matching |
| Technical documentation | 0.4-0.6 | Technical terms vary |
| Code/SQL | 0.3-0.4 | Syntax-heavy, lower semantic similarity |
| Mixed language | 0.4-0.5 | Accounts for language switching |
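
Conceptually, the affinity threshold drops any retrieved chunk whose similarity to the query embedding falls below the cutoff. The sketch below assumes cosine similarity, which is the usual choice for embedding retrieval; the plugin's actual similarity metric and data structures may differ.

```typescript
// Illustrative affinity filtering, assuming cosine similarity between the
// query embedding and each chunk embedding. Not RAG-Flex's actual code.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function filterByAffinity(
  queryEmbedding: number[],
  chunks: { text: string; embedding: number[] }[],
  threshold: number = 0.4, // default Retrieval Affinity Threshold
): string[] {
  // Keep only chunks at or above the similarity cutoff.
  return chunks
    .filter((c) => cosineSimilarity(queryEmbedding, c.embedding) >= threshold)
    .map((c) => c.text);
}
```

Lowering the threshold keeps more loosely related chunks, which is why syntax-heavy content like SQL benefits from 0.3-0.4.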

Multilingual Configuration

The plugin automatically detects your system language and localizes its runtime messages accordingly:

  • Windows: Uses Intl API to detect locale
  • Linux/macOS: Checks LANG, LANGUAGE, LC_ALL environment variables
  • Manual Override: Change "Message Language" in plugin settings

Supported Languages:

  • 🇬🇧 English (en)
  • 🇹🇼 繁體中文 (zh-TW)
  • 🇯🇵 日本語 (ja)
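
The detection order above can be sketched as follows. This is an illustrative approximation, not the plugin's actual code: the env-vars-before-Intl ordering and the mapping of any `zh` locale to `zh-TW` are assumptions for the example.

```typescript
// Hedged sketch of locale detection: POSIX env vars first, then the Intl API.
// RAG-Flex's real implementation may order or map these differently.
type Lang = "en" | "zh-TW" | "ja";

function detectLanguage(env: Record<string, string | undefined> = {}): Lang {
  // LC_ALL overrides LANGUAGE, which overrides LANG (POSIX convention);
  // values look like "zh_TW.UTF-8".
  const raw =
    env.LC_ALL ||
    env.LANGUAGE ||
    env.LANG ||
    Intl.DateTimeFormat().resolvedOptions().locale; // e.g. "en-US"
  // Normalize "zh_TW.UTF-8" -> "zh-tw", then match language prefixes.
  const tag = raw.split(".")[0].replace("_", "-").toLowerCase();
  if (tag.startsWith("zh")) return "zh-TW"; // only Traditional Chinese is shipped
  if (tag.startsWith("ja")) return "ja";
  return "en";
}
```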

📖 For developers: See I18N.md for technical details on the internationalization system, adding new languages, and translation guidelines. Also available in 繁體中文 and 日本語.

Developer Mode: Debug Logging

Enable debug logging for troubleshooting or development:

  • Open LM Studio → Plugins → RAG-Flex settings
  • Enable "Enable Debug Logging"
  • (Optional) Set custom "Debug Log Path"
  • Logs will include:
    • System locale detection
    • Model loading events
    • File processing steps
    • Retrieval results
    • Error stack traces

Default log location: ./logs/lmstudio-debug.log

🐛 Troubleshooting

Common Issues

"❌ Embedding model not found"

Cause: Selected model not downloaded in LM Studio

Solution:

  • Open LM Studio → Search (🔍)
  • Search for the model name (e.g., bge-m3)
  • Click Download
  • Wait for download to complete
  • Restart the chat or reload the plugin

Alternative: Select a different model in plugin settings


"No relevant citations found (threshold: 0.4)"

Cause: Retrieval affinity threshold too high for your content

Solutions:

  • For code/SQL files: Lower threshold to 0.3-0.4
  • For mixed-language documents: Try 0.4-0.5
  • For technical jargon: Lower to 0.35-0.45

How to adjust: LM Studio → Plugins → RAG-Flex → Retrieval Affinity Threshold


File processing too slow

Cause: Large file with high-precision embedding model

Solutions:

  • Switch to faster model:
    • Use nomic-embed-text-v1.5 instead of bge-m3
    • 10-20x faster for English content
  • Lower retrieval limit:
    • Reduce from 5 to 3 chunks
    • Faster processing, less context
  • Split large files:
    • Break >5MB files into chapters/sections

Runtime messages in wrong language

Cause: System locale auto-detection doesn't match your preference

Solution:

  • Open plugin settings
  • Manually select "Message Language"
  • Choose: English (en) / 繁體中文 (zh-TW) / 日本語 (ja)

Note: This only changes plugin runtime messages (errors, status updates). LM Studio's UI language is controlled by LM Studio itself.


Debug logs not being created

Possible causes:

  • Debug logging not enabled in settings
  • Insufficient file write permissions
  • Invalid log path

Solutions:

  • Enable "Enable Debug Logging" in plugin settings
  • Check log path exists and is writable
  • Try default path: ./logs/lmstudio-debug.log
  • On Windows, ensure path uses \\ or /

💡 Pro Tip: All error messages are AI-friendly - paste them directly into your LLM chat for automated troubleshooting!

📦 Supported File Formats

| Format | Extension | Processing Method | Notes |
| --- | --- | --- | --- |
| PDF | .pdf | Text extraction | Supports text-based PDFs (not scanned images) |
| Word Documents | .docx | Full document parsing | Preserves structure and formatting |
| Plain Text | .txt | Direct read | UTF-8 encoding recommended |
| Markdown | .md | Markdown parsing | Maintains heading structure |

Not supported: Images, audio, video, Excel spreadsheets, scanned PDFs without OCR

🆚 Improvements Over RAG-v1

| Feature | RAG-v1 | RAG-Flex (v1.2.0) |
| --- | --- | --- |
| Embedding Models | ❌ Hardcoded (nomic only) | ✅ 4 selectable + auto-detection |
| Multilingual Support | ❌ English only | ✅ English, 繁體中文, 日本語 |
| Error Messages | ❌ Technical English | ✅ User-friendly, localized |
| Context Management | ⚙️ Basic threshold | ✅ Smart threshold-based strategy |
| Affinity Threshold | ❌ Fixed at 0.5 | ✅ Configurable (0.0-1.0) |
| No-result Handling | ❌ Exposes system prompt | ✅ Graceful degradation |
| Model Detection | ❌ Manual configuration | ✅ Auto-detects local models |
| Debug Tools | ❌ None | ✅ Optional debug logging |
| Configuration UI | ⚙️ English only | ✅ Multilingual (system language) |

🤝 Contributing

Contributions are welcome! Here's how you can help:

Reporting Issues

  • Use GitHub Issues for bug reports
  • Include debug logs (enable debug logging first)
  • Provide file type, size, and configuration used

Submitting Code

  • Fork the repository
  • Create a feature branch (git checkout -b feature/amazing-feature)
  • Follow existing code style (TypeScript with proper types)
  • Test with multiple embedding models
  • Update documentation if needed
  • Commit changes (git commit -m 'Add amazing feature')
  • Push to branch (git push origin feature/amazing-feature)
  • Open a Pull Request

Adding Translations

To add a new language:

  • Add language code to src/locales/types.ts
  • Create translation file: src/locales/[lang].ts
  • Update src/locales/index.ts
  • Update src/config.ts language options
  • Create README.[lang].md
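
As a hedged illustration of the translation-file step, a locale module might export a flat record of message strings with an English fallback for missing keys. The message ids below (`modelNotFound`, `noCitations`) and the Korean locale are invented for this example; the real keys are defined in src/locales/types.ts.

```typescript
// Hypothetical translation-module sketch; the actual message ids live in
// src/locales/types.ts -- the keys and Korean strings here are invented.
type MessageKey = "modelNotFound" | "noCitations";
type Messages = Record<MessageKey, string>;

const en: Messages = {
  modelNotFound: "Embedding model not found",
  noCitations: "No relevant citations found",
};

// A new locale may be filled in incrementally; untranslated keys fall back.
const ko: Partial<Messages> = {
  modelNotFound: "임베딩 모델을 찾을 수 없습니다",
};

function t(locale: Partial<Messages>, key: MessageKey): string {
  return locale[key] ?? en[key]; // English fallback keeps partial locales usable
}
```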

📝 License

MIT License - see LICENSE file for details.

This means you can:

  • ✅ Use commercially
  • ✅ Modify and distribute
  • ✅ Use privately
  • ✅ Sublicense

Requirements:

  • ⚖️ Include original license and copyright notice

🙏 Acknowledgments

  • LM Studio Team - For the excellent SDK and plugin ecosystem
  • Original RAG-v1 Plugin - Inspiration and foundation
  • Embedding Model Authors & the Hugging Face Community - For creating, hosting, and distributing the models
  • All Contributors - Thank you for your improvements and feedback!

Author: Henry Chen
GitHub: @henrychen95
Repository: rag-flex
LM Studio Plugin Page: lmstudio.ai/yongwei/rag-flex

⭐ If RAG-Flex helps your workflow, please star the repository!

Made with ❤️ for the LM Studio community