Law Office RAG Console

Matter Readiness

Track ingestion, metadata quality, extraction confidence, and source-backed outputs.

Complete extraction profile

Idle: no active ingestion

Start a fast scan or full ingestion when you are ready.

0%

0 files remaining

Background work can yield to active tasks

Current file activity: No active file stages

GPU: Intel B580 (active)

Model: Qwen2.5 14B

Ollama: checking…

CPU: 8/16 cores, 18%

RAM: 9.6 / 24 GB

VRAM: 6.8 / 12 GB

Storage: Browser only

Documents 0

0 ready for retrieval

Indexed passages 0

Page and paragraph anchored

Key facts 0

Dates, names, places, issues

Needs review 0

Low-confidence metadata

Ingestion Queue

Recent Source Findings

Ingest Client Files

Add a NAS folder or a local batch of files, then run OCR, metadata normalization, embedding, and extraction.


Drop files or choose a batch

PDF, DOCX, TXT, email exports, image scans, and mixed disclosure folders.

NAS source

Navigate and add folders from your network share

No NAS folders added — click Browse NAS to add one.

OCR scanned pages

Keep data local

Extraction profile

Pipeline Steps

Idle

Matter Work Queue

Vision OCR


Runs a two-tier vision pass on low-quality scans and image files. GLM-OCR (0.9B) handles typed text; the selected Qwen model handles photographs, handwriting, and complex layouts. Files are flagged automatically during ingestion when their word count or confidence falls below the configured threshold.

Phase 2 model

Document Register

Review dates, authors, document types, relevance, and extraction confidence before relying on answers.

Custom Tags

Build the matter tag vocabulary you want to reuse while reviewing documents.

Date	Document	Author	Type	Description	Tags	Privilege	Affidavit	Relevance	Status	Review	Open	Download

Case Theory and Issues

Maintain the living theory of the case so relevance, privilege review, summaries, and chronologies can be reassessed as the pleadings evolve.

Working notes

Train-of-thought notes for your own use

Matter summary

Neutral summary

Current theory

Live issues

Issues, defences, and relief

Pleadings snapshot

Relevance criteria

What makes a document relevant now?

Hot document signals

Privilege posture

Privilege and review instructions

Require human confirmation before final privilege tagging

Extraction Profiles

Each profile has its own prompt and stores its extraction results separately. The active profile drives the chronology and findings displays.

Extraction Prompt

Saved prompts

What to extract from each document. Use {matter_context}, {doc_name}, and {text} as placeholders. Leave blank to use the system default.
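A minimal sketch of how a saved prompt might be filled in, assuming the backend substitutes exactly the three documented placeholders and falls back to a default when the prompt is blank. The function name and the default prompt text are hypothetical. A single regex pass is used instead of str.format, so literal braces in a saved prompt (for example a JSON example) survive, and placeholder-like text inside a document is never substituted a second time.

```python
import re

# Hypothetical stand-in for the system default prompt; the real default is not shown here.
DEFAULT_PROMPT = (
    "Matter context:\n{matter_context}\n\n"
    "Document: {doc_name}\n\n"
    "Extract key dates, names, places, and issues from this text:\n{text}"
)

# Only the three documented placeholders are recognised.
PLACEHOLDER = re.compile(r"\{(matter_context|doc_name|text)\}")


def render_extraction_prompt(template: str, matter_context: str, doc_name: str, text: str) -> str:
    """Fill the documented placeholders; a blank template falls back to the default."""
    if not template.strip():
        template = DEFAULT_PROMPT
    values = {"matter_context": matter_context, "doc_name": doc_name, "text": text}
    # One pass: substituted values are never rescanned, and other braces are left alone.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```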

AI Draft Assistance

Requires approval

Suggested Issue Updates

From ingestion

Reassessment History

Theory v1

Matter Scratchpad

Persistent working notes — inject selected sections as context into any query. Auto-synced to NAS as scratchpad.md.
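One way selected scratchpad sections could be turned into query context, assuming scratchpad.md marks each section with a level-two Markdown heading (## Title). The heading convention and the function names are hypothetical. In this sketch, text before the first heading is ignored and deeper headings (###) stay inside their parent section.

```python
import re
from typing import Dict, List, Optional

HEADING = re.compile(r"^##\s+(.*\S)\s*$")  # matches "## Title" but not "### Sub"


def split_sections(markdown: str) -> Dict[str, str]:
    """Split a scratchpad into {heading: body} using '## ' headings as delimiters."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    lines: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None:
                sections[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:
            lines.append(line)
    if current is not None:
        sections[current] = "\n".join(lines).strip()
    return sections


def scratchpad_context(markdown: str, selected: List[str]) -> str:
    """Join the selected sections, in the order given, into one context block."""
    sections = split_sections(markdown)
    return "\n\n".join(f"## {name}\n{sections[name]}" for name in selected if name in sections)
```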

Chronology Builder

Source-backed dates and facts with page and paragraph references for review and export.
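A sketch of a source-backed chronology export, assuming each entry carries its date, the fact, and the document, page, and paragraph it came from. The field names and CSV layout are hypothetical. Using the source location as a tie-breaker keeps same-day entries in a stable, reviewable order.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ChronologyEntry:
    when: date
    fact: str
    document: str
    page: int
    paragraph: int


def export_chronology(entries: List[ChronologyEntry]) -> str:
    """Sort entries by date, then by source location, and render them as CSV text."""
    ordered = sorted(entries, key=lambda e: (e.when, e.document, e.page, e.paragraph))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Date", "Fact", "Document", "Page", "Paragraph"])
    for e in ordered:
        writer.writerow([e.when.isoformat(), e.fact, e.document, e.page, e.paragraph])
    return buffer.getvalue()
```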

Ask the Matter

Answers must cite the document, page, paragraph, and confidence for each important proposition.

Legal Research

Find decisions on CanLII, pull their metadata and citator data, and add the full decision text to any matter.

Step 1 — Search CanLII

Not checked

The CanLII API provides metadata only and does not support keyword search, so Search opens the CanLII website in a separate tab. Find your case there, copy its URL, and paste it into Step 2 below.

Open CanLII ↗

Step 2 — Look up a case by URL

Paste any CanLII URL

Paste the URL of any CanLII decision to retrieve its metadata, keywords, and citator links. Every case in the results, including the cases that cite it and the cases it cites, has its own Add to Matter button that pulls the full decision text into your Qdrant index.
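A sketch of how a pasted CanLII URL could be mapped to API requests. It assumes decision URLs follow the pattern /{language}/{jurisdiction}/{database}/doc/{year}/{caseId}/... and uses the CanLII API's caseBrowse and caseCitator endpoints with an api_key query parameter. The website path segment does not always match the API's databaseId; the Supreme Court of Canada is assumed to appear as scc on the site and csc-scc in the API, so treat the override table as an assumption to check against the API's database list. These calls return metadata only; the full decision text has to be fetched separately.

```python
import re
from typing import Dict, Tuple
from urllib.parse import urlencode, urlparse

API_BASE = "https://api.canlii.org/v1"

# Assumption: the site's path segment usually equals the API databaseId, but the
# Supreme Court of Canada is "scc" on the website and "csc-scc" in the API.
DATABASE_ID_OVERRIDES = {"scc": "csc-scc"}

# /{language}/{jurisdiction}/{database}/doc/{year}/{caseId}/...
DECISION_PATH = re.compile(r"^/(en|fr)/[^/]+/([^/]+)/doc/\d{4}/([^/]+)/")


def parse_canlii_url(url: str) -> Tuple[str, str, str]:
    """Return (language, databaseId, caseId) for a CanLII decision URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host != "canlii.org" and not host.endswith(".canlii.org"):
        raise ValueError(f"not a CanLII URL: {url}")
    match = DECISION_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"not a CanLII decision URL: {url}")
    language, site_db, case_id = match.groups()
    return language, DATABASE_ID_OVERRIDES.get(site_db, site_db), case_id


def metadata_urls(url: str, api_key: str) -> Dict[str, str]:
    """Build the caseBrowse (metadata) and caseCitator request URLs for one decision."""
    language, db, case_id = parse_canlii_url(url)
    query = urlencode({"api_key": api_key})
    urls = {"metadata": f"{API_BASE}/caseBrowse/{language}/{db}/{case_id}/?{query}"}
    for kind in ("citedCases", "citingCases", "citedLegislations"):
        urls[kind] = f"{API_BASE}/caseCitator/{language}/{db}/{case_id}/{kind}?{query}"
    return urls
```

For example, metadata_urls(url, key)["citingCases"] is the request behind the list of cases that cite the decision.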

Add cases to:

Future Modules

Pin planned integrations here so the matter system can grow without losing the core ingestion, retrieval, and review workflow.

Local AI Settings

Swap models and define the local services the backend will use on your Ubuntu VM.

Matter manager

Matter list will appear when the server state loads.

Conversation model

Model

Checking installed local models...

Assistant name

Local endpoint

Context window (standard)

Context window (Deep Reasoning)

Deep Reasoning automatically uses at least this many tokens of context. Increase it if reasoning is still being cut off; decrease it if you are running out of RAM. Qwen3 14B supports up to 131,072 tokens (32,768 natively; the longer range relies on YaRN scaling).
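A minimal sketch of how the two context-window settings might combine into an Ollama /api/chat request, where the context size is passed as options.num_ctx. The max-then-cap rule is an assumption about what "at least this many tokens" means in practice, and model_max_ctx is a hypothetical per-model ceiling (131,072 for Qwen3 14B, per the note above).

```python
from typing import Dict, List


def chat_request(model: str, messages: List[Dict[str, str]], standard_ctx: int,
                 deep_min_ctx: int, deep: bool, model_max_ctx: int) -> dict:
    """Build an Ollama /api/chat body with the context window for the chosen mode."""
    num_ctx = standard_ctx
    if deep:
        # Deep Reasoning never drops below its configured minimum...
        num_ctx = max(standard_ctx, deep_min_ctx)
    # ...but is never asked for more than the model supports.
    num_ctx = min(num_ctx, model_max_ctx)
    return {"model": model, "messages": messages, "stream": False,
            "options": {"num_ctx": num_ctx}}
```

The body would be POSTed as JSON to the local endpoint's /api/chat path (Ollama listens on http://localhost:11434 by default).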

KV cache quantization

Controls how the model's attention (KV) cache is stored in VRAM during generation. f16 (default) stores the cache unquantized at 16-bit precision and uses the most memory. q8_0 cuts cache memory roughly in half with no meaningful quality difference; it is recommended if you are hitting VRAM limits or running long contexts. q4_0 halves it again but may slightly degrade coherence on very long answers. Requires Ollama 0.5 or newer; on older versions this setting is silently ignored.
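A back-of-the-envelope estimate of why the cache type matters. The cache holds one key and one value vector per layer, KV head, and token, so its size is 2 × layers × kv_heads × head_dim × context × bytes per element. The bytes-per-element figures follow llama.cpp's block formats (q8_0 and q4_0 store 32 values plus a 16-bit scale per block); the model dimensions in the test are illustrative, not taken from a specific model. As we understand Ollama's FAQ, this setting corresponds to the server-wide OLLAMA_KV_CACHE_TYPE environment variable and only takes effect when flash attention (OLLAMA_FLASH_ATTENTION=1) is enabled.

```python
# Bytes per cached element. q8_0 and q4_0 are llama.cpp block formats that store 32
# values plus one 16-bit scale per block: 34 and 18 bytes per 32 values respectively.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int, context: int, cache_type: str) -> float:
    """Estimate K/V cache size in GiB for one sequence, ignoring runtime overhead."""
    elements = 2 * layers * kv_heads * head_dim * context  # factor 2: keys and values
    return elements * BYTES_PER_ELEMENT[cache_type] / 2**30
```

With 48 layers, 8 KV heads, a head dimension of 128, and a 32,768-token context, this gives 6 GiB at f16, about 3.2 GiB at q8_0, and about 1.7 GiB at q4_0, in line with the "roughly half, then half again" description.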

Download model

Retrieval store

Qdrant URL

Embedding model

Embedding models are cached on the VM. Building a new index preserves older indexes so you can switch back without losing the earlier ingestion.
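A sketch of one naming scheme that would let a new index sit beside older ones: each Qdrant collection name encodes the matter, the embedding model, and a build number, so a rebuild always gets a fresh name and switching back means pointing retrieval at an older collection. The naming scheme and function names are hypothetical; Qdrant's collection aliases are another way to switch which collection a fixed name refers to.

```python
import re
from typing import List


def _slug(model_name: str) -> str:
    """Lower-case the model name and collapse anything non-alphanumeric into dashes."""
    return re.sub(r"[^a-z0-9]+", "-", model_name.lower()).strip("-")


def collection_name(matter_id: str, embedding_model: str, build: int) -> str:
    return f"{matter_id}__{_slug(embedding_model)}__v{build}"


def next_collection(existing: List[str], matter_id: str, embedding_model: str) -> str:
    """Choose a name for a new build that never reuses an existing collection."""
    prefix = f"{matter_id}__{_slug(embedding_model)}__v"
    builds = [int(name[len(prefix):]) for name in existing
              if name.startswith(prefix) and name[len(prefix):].isdigit()]
    return collection_name(matter_id, embedding_model, max(builds, default=0) + 1)
```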

Child chunk size (words, ~150 tokens)

Child chunk overlap (words)

Parent window size is fixed at 650 words (~850 tokens) with no overlap. Children are embedded; the matched child's parent window is returned to the LLM as context. Use Re-index clean after changing these settings.
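A sketch of the parent-child split described above. It assumes whitespace-separated words, and it assumes child chunks are cut inside each parent window so every child maps to exactly one parent (the text does not say whether a child may straddle a parent boundary). Only the child texts would be embedded; a hit on a child returns its parent window. Function names are hypothetical.

```python
from typing import List, Tuple

PARENT_WORDS = 650  # fixed parent window size, no overlap


def build_chunks(text: str, child_size: int, child_overlap: int) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Split text into parent windows and (parent_index, child_text) pairs."""
    if not 0 <= child_overlap < child_size:
        raise ValueError("overlap must be non-negative and smaller than the child size")
    words = text.split()
    parent_words = [words[i:i + PARENT_WORDS] for i in range(0, len(words), PARENT_WORDS)]
    step = child_size - child_overlap
    children: List[Tuple[int, str]] = []
    for p_idx, parent in enumerate(parent_words):
        for start in range(0, len(parent), step):
            children.append((p_idx, " ".join(parent[start:start + child_size])))
            if start + child_size >= len(parent):
                break  # this child already reaches the end of the parent
    return [" ".join(p) for p in parent_words], children


def context_for_hit(parents: List[str], children: List[Tuple[int, str]], child_index: int) -> str:
    """What the LLM receives when a child matches a query: its parent window."""
    return parents[children[child_index][0]]
```

With 1,300 words, a child size of 115, and an overlap of 15 (illustrative values), this yields two parents and seven children per parent, the last child in each parent being shorter.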

OCR and vision

OCR profile

Runtime: Tesseract, 1 thread per document. Worker status will appear when the VM reports telemetry.

Preserve original file path and checksum

Require citations in generated answers

Vision OCR tuning

Phase 2 vision model

Phase 2 runs on documents that Phase 1 (GLM-OCR) could not improve enough, typically handwritten notes, degraded scans, tables, and mixed-language pages. Tesseract+ is the CPU fallback, with layout-aware preprocessing. The VLM options use a vision language model to read the page image directly and produce much better results on difficult scans, but they require a GPU with enough free VRAM to load an 8B model alongside the main LLM.

Auto-trigger on ingest

When enabled, every ingested PDF or image is checked against the low-yield threshold. Files that fall below it are automatically queued for vision OCR in the background. The pipeline pauses whenever you submit a query so the GPU stays responsive. Disable this if you want to choose manually which files are re-processed.
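A minimal sketch of the pause-on-query behaviour, assuming the OCR worker checks a shared gate between pages. Queries mark themselves active, and the worker blocks in wait_until_idle while any query is in flight. In this sketch the worker only yields at page boundaries, so a page already being processed finishes first. The class and method names are hypothetical.

```python
import threading
from typing import Optional


class QueryGate:
    """Lets background vision OCR yield while interactive queries run."""

    def __init__(self) -> None:
        self._idle = threading.Event()
        self._idle.set()  # set means no query is in flight, so background work may run
        self._lock = threading.Lock()
        self._active = 0

    def query_started(self) -> None:
        with self._lock:
            self._active += 1
            self._idle.clear()

    def query_finished(self) -> None:
        with self._lock:
            self._active -= 1
            if self._active == 0:
                self._idle.set()

    def wait_until_idle(self, timeout: Optional[float] = None) -> bool:
        """Called by the OCR worker between pages; returns False if the timeout expires."""
        return self._idle.wait(timeout)
```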

Low-yield threshold (passages per page)

A document is flagged as a candidate for vision OCR if its passage count divided by page count falls below this number. The default of 2 means a 10-page PDF with fewer than 20 indexed passages will be re-processed. Increase this to be more aggressive (re-process more files); decrease it to only catch near-blank pages.
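The low-yield rule reduces to a single comparison. This sketch assumes every document has at least one page (an image file counts as one page); the function name is hypothetical.

```python
def needs_vision_ocr(passage_count: int, page_count: int, threshold: float = 2.0) -> bool:
    """Flag a document when its passages per page fall below the threshold."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return passage_count / page_count < threshold
```

needs_vision_ocr(19, 10) is True and needs_vision_ocr(20, 10) is False, matching the 10-page example.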

GLM improvement factor

Phase 1 (GLM-OCR) runs a fast vision pass on each page. Its result is accepted, and the slower Phase 2 skipped, if it produces at least this many times as many words as the original extraction. A value of 1.5 means GLM must produce at least 50% more words. Set it lower (e.g. 1.1) to accept smaller gains and skip Phase 2 more often; set it higher (e.g. 2.0) to demand a clear improvement before trusting GLM's result.

GLM sufficient words per page

A secondary acceptance criterion for Phase 1. Even if GLM improves only marginally on the original, a result at least this dense (default: 80 words per page) is treated as covering the page well, and Phase 2 is skipped. Increase this if GLM is passing low-quality results; decrease it for sparse documents such as cover pages or indexes.
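Taken together, the two Phase 1 settings amount to an either-or acceptance test. This sketch assumes the test is applied per page with plain word counts, and it adds one guard the settings text does not mention: for the improvement criterion, GLM must produce strictly more words than the original, so a page where both extractions return nothing is not accepted. Function and parameter names are hypothetical.

```python
def accept_glm_page(original_words: int, glm_words: int,
                    improvement_factor: float = 1.5,
                    sufficient_words_per_page: int = 80) -> bool:
    """Return True if GLM-OCR's page text is kept and Phase 2 can be skipped."""
    # Criterion 1: enough relative improvement over the original extraction
    # (the strictly-more-words guard is an addition in this sketch).
    improved = glm_words > original_words and glm_words >= improvement_factor * original_words
    # Criterion 2: the GLM result is dense enough on its own.
    dense = glm_words >= sufficient_words_per_page
    return improved or dense
```

With the defaults, a page that went from 40 to 60 words passes on improvement, 70 to 85 passes on density, and 40 to 59 goes on to Phase 2.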

Pipeline tuning

Tesseract threads per document

OCR worker slots (0 = distributed workers only)

Embedding worker slots

Deep extraction workers (Ollama parallel requests)

Reserved system cores

Prioritize chat while external OCR workers continue

Runtime tuning will appear when the VM reports telemetry.
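Tesseract 4 and later parallelise with OpenMP, and the OMP_THREAD_LIMIT environment variable caps the threads a single process uses, which is one plausible way to enforce "Tesseract threads per document". The budget check is a hypothetical rule for how the other settings might share the CPU (OCR slots × threads, one core per embedding slot, plus reserved cores); it leaves out deep extraction workers on the assumption that they run on the GPU through Ollama. Function names are hypothetical.

```python
import os
from typing import Dict, List, Tuple


def tesseract_call(image_path: str, threads_per_document: int) -> Tuple[List[str], Dict[str, str]]:
    """Return argv and environment for one Tesseract run limited to N OpenMP threads."""
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = str(threads_per_document)
    # "stdout" as the output base makes Tesseract print the recognised text.
    return ["tesseract", image_path, "stdout"], env


def fits_cpu_budget(total_cores: int, ocr_slots: int, threads_per_document: int,
                    embedding_slots: int, reserved_cores: int) -> bool:
    """Hypothetical rule: local OCR threads + embedding slots + reserved cores <= total cores."""
    needed = ocr_slots * threads_per_document + embedding_slots + reserved_cores
    return needed <= total_cores
```

subprocess.run(argv, env=env, capture_output=True, text=True, check=True).stdout would then hold the page text.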

OCR worker pool

Worker status will appear when helper machines check in.

Backups

Backup profile

Prune older than (days)

Include Qdrant vector store

Include local model caches

Backup status will appear here.

CanLII research

CanLII API key

Default language

API call limit per minute

CanLII metadata calls will be rate-limited and cached locally.
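A sketch of the rate-limit-plus-cache behaviour, assuming a rolling 60-second window and that cache hits do not count against the limit. The fetch argument stands in for whatever HTTP call the backend makes, and the class name is hypothetical. The dict cache here is in-memory only; surviving restarts would need something persistent, such as a file or SQLite table. The clock and sleep hooks exist so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class CachedRateLimitedClient:
    """Serve repeat lookups from a cache and allow at most N fresh calls per rolling 60 s."""

    def __init__(self, fetch: Callable[[str], str], calls_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if calls_per_minute < 1:
            raise ValueError("calls_per_minute must be at least 1")
        self._fetch = fetch
        self._limit = calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._recent: Deque[float] = deque()  # start times of calls in the last 60 s
        self._cache: Dict[str, str] = {}      # in-memory stand-in for a persistent cache

    def _wait_for_slot(self) -> float:
        while True:
            now = self._clock()
            # Forget calls that have left the 60-second window.
            while self._recent and now - self._recent[0] >= 60.0:
                self._recent.popleft()
            if len(self._recent) < self._limit:
                return now
            # Sleep until the oldest call in the window expires.
            self._sleep(60.0 - (now - self._recent[0]))

    def get(self, url: str) -> str:
        if url in self._cache:
            return self._cache[url]  # cache hits do not count against the limit
        self._recent.append(self._wait_for_slot())
        result = self._fetch(url)
        self._cache[url] = result
        return result
```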

Setting guide

Recommended defaults included

Select an information button to see what a setting changes, its recommended default, and the trade-offs to expect.