Forgeron3 / Method
May 11, 2026 · 7 min read

RAG: feeding an assistant with your documents

An assistant that doesn’t cite its sources eventually invents them. The three RAG principles that separate a reliable tool from a hallucination machine.

The Forgeron3 team, Marseille & Paris

RAG in one sentence

RAG, for Retrieval-Augmented Generation, is the simple idea that a language model answers your questions better if it has access to your documents at the moment it answers — rather than relying solely on what it learned during training.

In practice: your question is turned into a search signal, the system pulls the relevant passages from your document base, and the model writes an answer grounded in those passages — ideally citing their origin. Nothing more mystical than that.
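
The three steps above can be sketched in a few lines. This is a minimal illustration, not a production retriever: it scores passages by bag-of-words cosine similarity, where a real system would more likely use embeddings. The names `Passage` and `retrieve` and the sample corpus are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical document label, kept so the answer can cite it
    text: str


def tokenize(text: str) -> Counter:
    """Lowercase word counts: a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Return the k passages most similar to the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p.text)), reverse=True)
    return ranked[:k]


# Made-up corpus, for illustration only.
corpus = [
    Passage("travel-policy.pdf, p. 2", "Train travel is reimbursed in second class."),
    Passage("hr-handbook.pdf, p. 7", "Remote work is allowed two days per week."),
    Passage("it-charter.pdf, p. 1", "Passwords must be changed every year."),
]

top = retrieve("How many days of remote work per week?", corpus, k=1)
print(top[0].source)
```

The retrieved passages, together with their `source` labels, are what the model then receives alongside the question.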

The point: your assistant knows your contracts, your procedures, your board decisions, your internal doctrine. Not a generalized version of the internet.

Why chunking changes everything

Before a document becomes “findable” by the assistant, it is split into passages — typically a few hundred words each. Search happens on these passages. Bad chunking makes everything downstream fragile: you retrieve half a clause, a heading without its content, a conclusion without the reasoning.

The parameters that matter:

  • Passage size. Too short (under 200 words) and context is lost. Too long (over 2,000 words) and search loses precision. The sweet spot is 400 to 800 words, tuned to the document type.
  • Overlap. Each passage spills a bit into the next (10 to 15 percent), so an idea that straddles two chunks is never cut in half.
  • Preserved structure. Chunking that respects headings, sections, and bullets works far better than chunking purely by word count. A PDF whose structure was broken by OCR will produce broken answers.

Rule of thumb: if your answers are approximate, check the chunking before blaming the model. In 70 percent of the cases we diagnose, the problem lies in the retrieved passage, not in the generation. A good corpus that’s poorly chunked produces bad answers.
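
To make the size and overlap parameters concrete, here is a minimal sketch of a word-window chunker. It is the word-count baseline only: it does not respect headings or bullets, which a structure-aware splitter would handle first. The function name `chunk_words` and its defaults are assumptions chosen to fall inside the ranges given above.

```python
def chunk_words(text: str, size: int = 600, overlap_ratio: float = 0.12) -> list[str]:
    """Split text into passages of `size` words, each overlapping the previous one."""
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and overlap_ratio in [0, 1)")
    words = text.split()
    # Number of words shared between consecutive passages, always below `size`.
    overlap = min(round(size * overlap_ratio), size - 1)
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 1,000-word text, for illustration only.
sample = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(sample, size=500, overlap_ratio=0.1)
print(len(chunks), [len(c.split()) for c in chunks])
```

With these settings, the last 50 words of one passage are also the first 50 words of the next, so a sentence that falls on a boundary appears whole in at least one of them.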

Force source citation

The rule to enforce from day one: the assistant cites, or it doesn’t answer. Not “according to our archives,” but “according to the board decision of March 14, 2024, article 3, page 2.” Three benefits:

  • The user can verify in one click. Trust is earned that way, and only that way.
  • The assistant disciplines itself: if it can’t find a source, it says it doesn’t know — instead of extrapolating.
  • You get an auditable log: which question, which sources, which answer. Valuable for GDPR, the AI Act, and any future dispute.
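
One way to enforce "cites, or it doesn’t answer" is to put the rule in code around the model rather than only in the prompt. The sketch below assumes retrieval returns scored passages with document metadata. `ScoredPassage`, `answer_with_sources`, the `generate` callable (a placeholder for the model call), and the `min_score` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoredPassage:
    document: str  # e.g. "Board decision of March 14, 2024"
    locator: str   # e.g. "article 3, page 2"
    text: str
    score: float   # retrieval similarity, higher is better


NO_SOURCE = "I could not find this in the indexed documents."


def answer_with_sources(
    question: str,
    hits: list[ScoredPassage],
    generate: Callable[[str, list[ScoredPassage]], str],
    min_score: float = 0.3,  # hypothetical threshold, to be tuned on your corpus
) -> dict:
    """Answer only when at least one passage clears the threshold."""
    kept = [h for h in hits if h.score >= min_score]
    if not kept:
        # No usable source: refuse instead of letting the model extrapolate.
        return {"answer": NO_SOURCE, "citations": []}
    citations = [f"{h.document}, {h.locator}" for h in kept]
    return {"answer": generate(question, kept), "citations": citations}


def stub_generate(question: str, passages: list[ScoredPassage]) -> str:
    """Stand-in for the model call: echoes the best passage."""
    return passages[0].text


hits = [
    ScoredPassage("Board decision of March 14, 2024", "article 3, page 2",
                  "The budget is approved.", 0.8),
    ScoredPassage("Working note", "page 1", "Unrelated.", 0.1),
]
result = answer_with_sources("Was the budget approved?", hits, stub_generate)
empty = answer_with_sources("Who won the match?", [], stub_generate)
print(result["citations"], empty["answer"])
```

The model is never called when nothing relevant was retrieved, which is what turns "it says it doesn’t know" from a hope into a guarantee.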

To prepare a corpus that supports clean citations, see “Good documentation makes a good assistant.”

The three mistakes that cause hallucinations

  • The rotten corpus. You index everything that’s lying around — outdated versions, drafts, personal notes. The assistant can’t tell current doctrine from a working note abandoned in 2019. A clean corpus beats an exhaustive one.
  • No governance. Nobody owns keeping the corpus current. Six months in, the assistant is answering with old procedures and nobody knows when or why. Without an owner, every RAG drifts.
  • The “always answer” instruction. If you prompt the assistant to answer even without a source, it will invent rather than admit it doesn’t know. A good system prompt explicitly says “if the answer is not in the sources, say so.” Forced humility beats artificial confidence.
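
The first and third mistakes can each be guarded against at a specific point: an ingestion filter before indexing, and an explicit instruction in the system prompt. The sketch below assumes every document carries a status and a last-review date. The `Document` fields, the status values, and the two-year window are hypothetical choices, not a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    title: str
    status: str          # hypothetical values: "approved", "draft", "archived"
    last_reviewed: date  # set by whoever owns the corpus


def indexable(docs: list[Document], today: date, max_age_days: int = 730) -> list[Document]:
    """Keep only approved documents reviewed within the allowed window."""
    return [
        d for d in docs
        if d.status == "approved" and (today - d.last_reviewed).days <= max_age_days
    ]


# The last sentence is the instruction quoted in the list above.
SYSTEM_PROMPT = (
    "Answer using only the sources provided below. "
    "Cite the source of every claim. "
    "If the answer is not in the sources, say so."
)

# Made-up documents, for illustration only.
docs = [
    Document("Expense procedure", "approved", date(2025, 9, 1)),
    Document("Working note", "draft", date(2019, 3, 2)),
    Document("Old travel policy", "approved", date(2019, 6, 1)),
]
kept = indexable(docs, today=date(2026, 5, 11))
print([d.title for d in kept])
```

The `last_reviewed` field only stays meaningful if someone is responsible for updating it, which is where the governance point comes back in.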

Production checklist

  • The corpus has a named owner: who owns it, who approves additions, who removes what is no longer valid.
  • Chunking has been tested on ten representative questions before opening it up to users.
  • The assistant cites sources with page, date, or reference — visible in the interface, not just in the technical trace.
  • The system prompt explicitly includes “if no source, say so.”
  • A feedback mechanism is in place: users can flag a wrong answer in two clicks, and the flag reaches a real person.
  • The full log (question, sources, answer, model, user) is retained according to your retention policy.
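
The log entry in the last item can be as simple as one JSON line per interaction. This is a sketch under assumptions: the five fields come from the checklist, while the `timestamp` field, the `LogRecord` name, and the sample values are additions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current UTC time in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class LogRecord:
    question: str
    sources: list[str]
    answer: str
    model: str
    user: str
    timestamp: str = field(default_factory=_utc_now)  # assumption: not in the checklist


def to_json_line(record: LogRecord) -> str:
    """Serialize one interaction as a single JSON line."""
    return json.dumps(asdict(record), ensure_ascii=False)


# Made-up values, for illustration only.
record = LogRecord(
    question="Was the budget approved?",
    sources=["Board decision of March 14, 2024, article 3, page 2"],
    answer="Yes, the budget was approved.",
    model="example-model",
    user="user-42",
)
line = to_json_line(record)
print(line)
```

Appending such lines to a file or a table makes it straightforward to apply your retention policy by date and to trace any flagged answer back to its sources.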

To connect this RAG to your IT systems (SSO, DMS, NAS), see “Integrating an assistant into your infrastructure.”

How many documents do you need for a RAG to be useful?

There is no lower bound. We’ve seen highly useful RAGs built on 50 well-chunked internal procedures. Scale, however, demands real work: past 5,000 documents, chunking quality and corpus organization drive answer quality far more than volume.

Does RAG replace model fine-tuning?

For 95 percent of enterprise use cases, yes. Fine-tuning carries a cost (technical, financial, ongoing maintenance) that most organizations never recoup. RAG is enough as soon as your need is to answer based on your documentation — which is the majority of cases.

Can you trust RAG citations?

Yes, provided the citation is constructed system-side (extracted from the retrieved passage) and not generated by the model. A good RAG hands you the displayable source passage, not just a plausible-sounding reference. Verify this during the demo: clicking the citation should open the document, not an error page.
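
A simple way to check this property is to resolve every citation against the stored documents before displaying it. The sketch below assumes page text is available by document identifier and page number. `Citation`, `resolve`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    doc_id: str
    page: int
    quote: str  # passage shown to the user, copied from the retrieval output


def resolve(citation: Citation, pages: dict[tuple[str, int], str]) -> Optional[str]:
    """Return the page text if the quoted passage really appears on it, else None."""
    page_text = pages.get((citation.doc_id, citation.page))
    if page_text is None or citation.quote not in page_text:
        return None  # unknown page, or the quote is not on it: do not display
    return page_text


# Made-up page store, for illustration only.
pages = {
    ("board-2024-03-14", 2): "Article 3. The budget is approved for fiscal year 2024.",
}
good = Citation("board-2024-03-14", 2, "The budget is approved")
bad = Citation("board-2024-03-14", 9, "The budget is approved")
print(resolve(good, pages) is not None, resolve(bad, pages) is None)
```

A citation that fails to resolve is exactly the "error page" case: better to drop it, or the whole answer, than to show a reference nobody can open.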

Test on your documents

Twenty minutes by video call with your team. We index a few real documents and look at the answers together — not a generic demo.
