The 80 percent rule
The classic mistake before an AI rollout is believing that answer quality depends on the model. It depends far more on the quality of the documentation the assistant is given.
On a properly configured assistant, clean documentation produces 85 to 95 percent correct answers. Average documentation tops out at 60 percent. Chaotic documentation stays at 30 percent. The model, in all three cases, is the same.
A good assistant is not a brain that knows. It’s a memory that retrieves. If the memory is in disorder, what comes back is in disorder.
Audit what you have (before touching anything)
Set aside half a day with two or three people who actually know your sources. List everything, without throwing anything out yet:
- Live sources (documents updated within the last 12 months)
- Reference sources (stable procedures, template contracts, doctrine)
- Archived sources (old case files, meeting notes, emails)
- Obsolete sources (superseded versions, abandoned decisions)
The trap: loading everything in one go. Obsolete sources poison answers, because the assistant has no way of knowing they're stale. Ingest live and reference sources first. The rest can wait.
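To make the triage concrete, here is a minimal Python sketch of the four categories and the first ingestion batch. The `Source` record and its `is_reference` and `superseded` flags are hypothetical: they stand for whatever your reviewers note during the audit. The 12-month window is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Source:
    path: str
    last_modified: date
    is_reference: bool = False  # hypothetical flag: stable procedure, template, doctrine
    superseded: bool = False    # hypothetical flag: a newer version or decision exists


def classify(source: Source, today: date) -> str:
    """Return one of: 'obsolete', 'reference', 'live', 'archived'."""
    if source.superseded:
        return "obsolete"
    if source.is_reference:
        return "reference"
    # "Updated within the last 12 months", approximated as 365 days.
    if today - source.last_modified <= timedelta(days=365):
        return "live"
    return "archived"


def first_batch(sources: list[Source], today: date) -> list[Source]:
    """Keep only what should be ingested first: live and reference sources."""
    return [s for s in sources if classify(s, today) in {"live", "reference"}]
```

Checking `superseded` first is deliberate: a recently edited file that has been replaced is still obsolete, and it is exactly the kind of source that poisons answers.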
Clean, without rewriting
Three mechanical passes before ingestion:
- Deduplication by content hash. On a ten-year-old NAS, expect 30 to 40 percent useless duplicates.
- Triage by last-modified date. Anything older than 5 years and not flagged as “doctrine” goes to a secondary archive.
- Set aside the unreadable. Blurry scans, corrupted .docx files, and password-protected files with no known password are handled case by case.
What not to do: rewrite the documents. That’s a recipe for a two-year project. An AI assistant ingests what you have, not what you wish you had.
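The first two passes are easy to script. The sketch below is one possible approach using only the Python standard library. It assumes files sit on a locally mounted path, that SHA-256 is an acceptable content hash, and that "doctrine" is available as a set of paths. The function names are illustrative, not taken from any tool.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file's bytes, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Return (unique, duplicates), keeping the first file seen for each hash."""
    seen: dict[str, Path] = {}
    duplicates: list[Path] = []
    for path in paths:
        key = content_hash(path)
        if key in seen:
            duplicates.append(path)
        else:
            seen[key] = path
    return list(seen.values()), duplicates


def is_secondary_archive(path: Path, now: datetime, doctrine: set[Path]) -> bool:
    """True if the file is older than 5 years (approximated as 5 * 365 days)
    and is not flagged as doctrine."""
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    return path not in doctrine and now - modified > timedelta(days=5 * 365)
```

A content hash only catches byte-identical copies. The same text saved as .docx and as .pdf will not match, so treat the duplicate list as a floor rather than a complete count.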
Structure the bare minimum
Three items to add to each document before ingestion:
- A date in the filename or the header (YYYY-MM-DD).
- A status: active, archive, draft, expired.
- A business owner (who can answer if the assistant is unsure).
If your documentation already carries these three metadata fields, you’re ready. If not, add them to the 200 most-consulted documents first — not all 5,000.
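A quick way to see where you stand is to check the three fields on the most-consulted documents only. The following Python sketch assumes a hypothetical `Document` record with a `views` count pulled from your DMS logs. It checks the date in the filename only, not the header, and only its YYYY-MM-DD shape.

```python
import re
from dataclasses import dataclass
from typing import Optional

ALLOWED_STATUSES = {"active", "archive", "draft", "expired"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # shape check only, not calendar validity


@dataclass
class Document:
    filename: str
    status: Optional[str] = None
    owner: Optional[str] = None
    views: int = 0  # hypothetical consultation count from DMS logs


def missing_fields(doc: Document) -> list[str]:
    """List which of the three metadata fields are absent or invalid."""
    missing = []
    if not DATE_PATTERN.search(doc.filename):
        missing.append("date")
    if doc.status not in ALLOWED_STATUSES:
        missing.append("status")
    if not doc.owner:
        missing.append("owner")
    return missing


def tagging_backlog(docs: list[Document], limit: int = 200) -> list[tuple[str, list[str]]]:
    """Among the `limit` most-consulted documents, return those still missing fields."""
    most_consulted = sorted(docs, key=lambda d: d.views, reverse=True)[:limit]
    return [(d.filename, missing_fields(d)) for d in most_consulted if missing_fields(d)]
```

An empty backlog for the top 200 is a reasonable signal that you can start ingesting.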
Maintain without overinvesting
Once the assistant is in production, the documentation has to stay alive — without becoming a chore. Three simple routines:
- Quarterly review: a business owner validates or archives documents flagged “to review.”
- Automatic sync: connect the document management system (DMS) to the assistant so new documents flow in without manual steps.
- Feedback loop: users flag wrong answers, you trace back to the source to correct or archive it.
Special case: accounting firms
For an accounting firm, three document families matter more than the others:
- Doctrine (technical opinions, memos, case law) — clean it first; it’s the core of your value.
- Client files — sensitive, must be partitioned by client if the assistant is shared across staff.
- Internal procedures (engagement scoping, methodologies) — most useful for new hires.
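Partitioning by client can be sketched as a filter applied before anything is retrieved. The Python example below is an illustration under stated assumptions: each indexed passage carries its document family and, for client files, a client identifier. The `Chunk` type and the `allowed_clients` set are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    family: str                      # "doctrine", "client", or "procedure"
    client_id: Optional[str] = None  # set only for client files


def visible_chunks(chunks: list[Chunk], allowed_clients: set[str]) -> list[Chunk]:
    """Keep shared families for everyone; keep client files only for allowed clients."""
    return [
        c for c in chunks
        if c.family != "client" or c.client_id in allowed_clients
    ]
```

Filtering before retrieval, rather than after the answer is generated, means text from another client's file never reaches the model for that staff member.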
For a step-by-step guide, see “Accounting firms: three assistants to build before summer.” See also our accounting firms page.