Forgeron3
Accounting firms | Nov 24, 2025 | 6 min read

Turn your PDFs into an AI engagement assistant

You have the client’s PDF stack. Two years of engagement reports, three fiscal years of accounts, agreements, emails. In one weekend, it becomes a searchable assistant. Here’s how.

The Forgeron3 team, Marseille & Paris

A typical engagement

A review, audit, or forensic engagement almost always starts the same way: the client provides two to six gigabytes of documents — PDFs, scans, accounting exports. The assigned junior spends the first three days trying to figure out where things are.

You can save those three days, and stop billing them as low-value work.

Step 1 — Sort the PDFs (Saturday morning, 2 hours)

Not a perfect sort. A quick sort, into four folders:

  • Accounting (FEC files, trial balances, general ledgers, balance sheets)
  • Contracts (articles of association, agreements, leases, registry extracts)
  • Operations (invoices, quotes, engagement reports)
  • Communication (client emails, letters, internal notes)

The assistant works better if each folder has a clear label. It does not work better if you rename every file individually — don’t spend the weekend on that.
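
The quick sort above can be scripted rather than done by hand. Here is a minimal sketch, assuming file names carry a hint about their content. The keyword lists, the Unsorted fallback folder, and the function names are illustrative assumptions, not part of the platform. Matching is a naive substring test, so expect some misfiles.

```python
import shutil
from pathlib import Path

# Hypothetical keyword lists: adapt them to the client's naming habits.
FOLDER_KEYWORDS = {
    "Accounting": ["fec", "trial_balance", "ledger", "balance_sheet"],
    "Contracts": ["articles", "agreement", "lease", "registry"],
    "Operations": ["invoice", "quote", "report"],
    "Communication": ["email", "letter", "note"],
}
UNSORTED = "Unsorted"  # fallback for files that match no keyword


def pick_folder(filename: str) -> str:
    """Return the first folder whose keyword appears in the file name."""
    name = filename.lower()
    for folder, keywords in FOLDER_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return folder
    return UNSORTED


def sort_pdfs(source: Path, target: Path) -> dict[str, int]:
    """Move every *.pdf under `source` into target/<folder>; return counts.

    Assumes `target` is not located inside `source`.
    """
    counts: dict[str, int] = {}
    # sorted() lists all files first, so moving them does not disturb the walk.
    for pdf in sorted(source.rglob("*.pdf")):
        folder = pick_folder(pdf.name)
        destination = target / folder
        destination.mkdir(parents=True, exist_ok=True)
        target_path = destination / pdf.name
        if target_path.exists():
            continue  # never overwrite: leave duplicates for a manual look
        shutil.move(str(pdf), str(target_path))
        counts[folder] = counts.get(folder, 0) + 1
    return counts
```

Whatever lands in Unsorted is the part worth sorting by hand. Files are not renamed, in line with the advice above.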

Step 2 — Check the OCR (Saturday afternoon, 1 hour)

On scans, default OCR misses 5 to 15% of characters. On pre-2015 black-and-white scans, that climbs to 25%. Check by opening three or four PDFs at random and selecting text: if you get gibberish, enable “fine OCR” mode on the affected batch.

It’s ten times slower. It’s ten times more accurate. Essential on sensitive documents (contracts, rulings, leases).
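
The spot check can also be approximated in code once a document's text layer has been extracted with the tool of your choice. This is a rough heuristic sketch, not a measure of OCR error rate. The 0.8 threshold and the function names are assumptions to tune on your own documents.

```python
import re

# A "word" here is a run of two or more letters (accented letters included).
WORD = re.compile(r"[^\W\d_]{2,}")


def readable_ratio(text: str) -> float:
    """Share of non-space characters that are digits or sit in a letter run."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0  # no text layer at all, typical of a raw scan
    letters_in_words = sum(len(match) for match in WORD.findall(text))
    digits = sum(c.isdigit() for c in chars)
    return (letters_in_words + digits) / len(chars)


def needs_fine_ocr(text: str, threshold: float = 0.8) -> bool:
    """Flag a document whose extracted text looks like gibberish."""
    return readable_ratio(text) < threshold
```

A clean sentence such as an invoice line scores close to 1. A page of stray symbols scores close to 0, and so does an empty text layer. Documents in between still deserve a look by eye.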

Step 3 — Ingest in batches (Saturday evening)

In the platform, create an assistant named Engagement [Client] — [Period]. Configure it with watertight scope (never shared with other files).

Upload in 1-2 GB batches. Indexing runs in parallel; allow 10 to 15 minutes per 1,000 standard documents. Launch it Saturday evening, have breakfast on Sunday morning, then come back.

Tip: Semantic indexing takes 2 to 4 times longer than the upload itself. Don’t assume it’s done because the bar is at 100%. Check the ingestion log before testing.
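
To prepare the batches before uploading, a small planning script is enough. The sketch below groups files greedily under a size cap and turns the 10 to 15 minutes per 1,000 documents figure into a rough range. The function names are illustrative, the 2 GB default is the upper end of the range above, and the estimate assumes indexing time scales linearly with document count.

```python
from pathlib import Path

GB = 1024 ** 3


def folder_sizes(folder: Path) -> dict[str, int]:
    """Map each file path under `folder` to its size in bytes."""
    return {str(p): p.stat().st_size for p in folder.rglob("*") if p.is_file()}


def plan_batches(sizes: dict[str, int], max_bytes: int = 2 * GB) -> list[list[str]]:
    """Greedy grouping: close a batch when the next file would exceed the cap.

    A single file larger than the cap ends up alone in its own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in sorted(sizes.items()):
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def indexing_minutes(doc_count: int) -> tuple[float, float]:
    """Low and high estimates, from 10 to 15 minutes per 1,000 documents."""
    return (doc_count * 10 / 1000, doc_count * 15 / 1000)
```

For example, `indexing_minutes(4000)` gives a range of 40 to 60 minutes. Treat it as an order of magnitude only; the ingestion log remains the reference.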

Step 4 — Test with ten questions (Sunday morning, 1 hour)

Ten questions, written before loading. That’s the rule — otherwise you adapt your questions to what you get and the evaluation is worthless.

Three easy questions (a date, an amount, a name), five realistic ones (synthesis, comparison, cross-document search), two hard ones (cross-reference, exception, ambiguity).

For each question, score: correct answer (yes/no), source citation (yes/no), hallucination (yes/no).
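
A spreadsheet is fine for this scoring. If you prefer to keep it next to the question list, the grid can be sketched as a small data structure. The field and function names below are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

EXPECTED_MIX = {"easy": 3, "realistic": 5, "hard": 2}


@dataclass
class Score:
    question: str
    difficulty: str     # "easy", "realistic", or "hard"
    correct: bool       # was the answer right?
    cited: bool         # did it cite a source document?
    hallucinated: bool  # did it invent anything?


def follows_mix(scores: list[Score]) -> bool:
    """True if the set respects the 3 easy / 5 realistic / 2 hard split."""
    return dict(Counter(s.difficulty for s in scores)) == EXPECTED_MIX


def summarize(scores: list[Score]) -> dict[str, int]:
    """Count the yes answers in each of the three columns."""
    return {
        "questions": len(scores),
        "correct": sum(s.correct for s in scores),
        "cited": sum(s.cited for s in scores),
        "hallucinated": sum(s.hallucinated for s in scores),
    }
```

Write the ten `Score` entries with their questions first, then fill in the three flags once the assistant has answered.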

The three essential guardrails

  1. Per-engagement isolation. No client assistant should have access to another client’s data. Non-negotiable; it’s the GDPR baseline.
  2. Systematic citation. Every answer must come with the source document. If the assistant answers without citing, you have a configuration problem — fix it before going further.
  3. Polite refusal. If the information isn’t in the data, the assistant must say so (“I can’t find this information in the file”). No invention, no extrapolation.
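
Two of these guardrails can be spot-checked automatically on test answers. The sketch below assumes a hypothetical answer structure (text plus a list of cited sources, each tagged with its engagement); the platform's real export format may differ. It cannot detect invention on its own: it only flags answers that neither cite a source nor refuse.

```python
from dataclasses import dataclass, field

# Refusal wording taken from the guardrail above, with a plain apostrophe.
REFUSAL = "i can't find this information in the file"


@dataclass
class Source:
    engagement_id: str  # hypothetical identifier of the engagement
    document: str


@dataclass
class Answer:
    text: str
    sources: list[Source] = field(default_factory=list)


def is_refusal(answer: Answer) -> bool:
    """True if the answer contains the expected refusal sentence."""
    normalized = answer.text.replace("\u2019", "'").lower()
    return REFUSAL in normalized


def guardrail_violations(answer: Answer, engagement_id: str) -> list[str]:
    """Return the names of the guardrails this answer appears to break."""
    violations: list[str] = []
    if any(s.engagement_id != engagement_id for s in answer.sources):
        violations.append("isolation")  # cites another engagement's document
    if not answer.sources and not is_refusal(answer):
        violations.append("citation")   # substantive answer with no source
    return violations
```

An empty list means no violation was detected, not that the answer is correct. The yes/no accuracy check from Step 4 still has to be done by a person.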

Bottom line: one weekend between the PDF stack and a working assistant. Three days of engagement work saved, redirected to advisory hours. See also “Good documentation makes a good assistant” for the upstream audit phase.
