Indexed Sources & Chunks

Once the Knowledge Layer is enabled, the indexing cron turns every eligible ir.attachment into an Indexed Source plus a stream of Chunks. This page explains how to monitor and troubleshoot that pipeline.

The Indexed Sources list

MCP Server ‣ Knowledge ‣ Indexed Sources.

Each row corresponds to one attachment that the indexer has seen.

Columns:

  • Attachment — the underlying file name.

  • Mimetype — content type detected by Odoo.

  • Attached Model — what model the attachment is linked to (res.partner, hr.employee…).

  • State: pending, indexing, indexed, failed, skipped.

  • Extraction Method: pdf, text, html / markdown, …

  • Chunks — number of chunks produced from this source.

  • Tokens — total token count across chunks.

  • Last indexed at — timestamp of the most recent successful index.

    Indexed Sources list grouped by state with success rows green, failed rows red, and skipped rows greyed out

The colour coding makes it easy to spot failures at a glance:

  • Green: indexed (success).

  • Red: failed (with error details on the form).

  • Grey: skipped (excluded by mimetype, model, or size).

  • Blue / Info: pending / indexing (in progress).

State machine

  1. Pending — the cron has noticed the attachment but hasn’t processed it yet.

  2. Indexing — the worker is extracting text and producing chunks.

  3. Indexed — chunks are stored and queryable.

  4. Failed — extraction crashed; the form’s Error section explains why. The cron keeps retrying up to the configured limit, then stops.

  5. Skipped — excluded by mimetype, model, or size policy.
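
The states above can be read as a small state machine. The following is a minimal sketch, not the module's actual code: the transition table is inferred from this page, the names State, TRANSITIONS, can_transition and cron_should_retry are hypothetical, and it assumes that a re-index moves a source straight back to indexing.

```python
from enum import Enum


class State(str, Enum):
    PENDING = "pending"
    INDEXING = "indexing"
    INDEXED = "indexed"
    FAILED = "failed"
    SKIPPED = "skipped"


# Hypothetical transition table inferred from the list above.
# Assumption: exclusion by policy is decided before extraction starts,
# and Re-index sends a source directly back to INDEXING.
TRANSITIONS = {
    State.PENDING: {State.INDEXING, State.SKIPPED},
    State.INDEXING: {State.INDEXED, State.FAILED},
    State.FAILED: {State.INDEXING},   # cron retry or manual Re-index
    State.INDEXED: {State.INDEXING},  # manual Re-index
    State.SKIPPED: {State.INDEXING},  # manual Re-index
}


def can_transition(current, target):
    """Return True if the sketch allows moving from current to target."""
    return target in TRANSITIONS[current]


def cron_should_retry(state, retry_count, max_retries):
    """The cron retries failed sources until the configured limit is reached."""
    return state is State.FAILED and retry_count < max_retries
```

With a limit of 3, a failed source that has used 2 retries is picked up again, while one that has used all 3 stays failed until someone re-indexes it manually.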

The Source form

Open a row to see:

  • Smart buttons

    • Chunks — drills into the embedded chunk browser.

    • Attachment — opens the underlying ir.attachment.

    • Retry Count (failed only) — X / Y indicator showing remaining retry budget.

  • Statusbar: pending → indexing → indexed.

  • Re-index (header button): visible in every state except pending and indexing. Click it to retry or rebuild this source.

  • Error group — populated when state = failed.

    Source form for a failed PDF with the Re-index button, the chunks smart button, and an Error group showing the extraction stack

Manually re-indexing

Sometimes you need to force a re-index:

  • A new document was uploaded but you don’t want to wait for the 15-minute cron.

  • A previously-failed extraction now works after a server update.

  • You changed the chunking parameters and want a fresh rebuild.

Two options:

  • Per row — open the source form and click Re-index. The action runs inline — you wait while the worker rebuilds chunks for this source.

  • Bulk — in the list view, tick several rows and click the header Re-index Selected button. The confirmation dialog warns about the wait for large selections.

    Indexed Sources list with multiple rows selected and the Re-index Selected button visible above the list
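
Before a bulk run it can help to know which selected rows can actually be re-indexed. The sketch below is illustrative only: this page does not say how the bulk action treats rows that are still pending or indexing, so it assumes the same rule as the form button (Re-index is unavailable in those two states). The row dictionaries and the split_selection helper are hypothetical.

```python
# States in which the form shows the Re-index button, per the Source form section.
REINDEXABLE_STATES = {"indexed", "failed", "skipped"}


def split_selection(rows):
    """Split selected rows into (eligible, busy) lists.

    Each row is a dict with at least a "state" key. Rows that are
    pending or indexing are treated as busy (assumption, see above).
    """
    eligible = [row for row in rows if row["state"] in REINDEXABLE_STATES]
    busy = [row for row in rows if row["state"] not in REINDEXABLE_STATES]
    return eligible, busy


selection = [
    {"name": "contract.pdf", "state": "failed"},
    {"name": "notes.txt", "state": "indexing"},
    {"name": "handbook.pdf", "state": "indexed"},
]
eligible, busy = split_selection(selection)
print([row["name"] for row in eligible])  # ['contract.pdf', 'handbook.pdf']
print([row["name"] for row in busy])      # ['notes.txt']
```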

The Chunks browser

MCP Server ‣ Knowledge ‣ Chunks.

A read-only browser of every chunk the indexer has produced.

Columns:

  • Attachment — the source file name.

  • Chunk index — 0-based position inside the source.

  • Tokens — token count.

  • Attached model — same as the source row.

  • Content — full chunk text (optional column).

    Chunks list grouped by source showing one source folded open with three chunks visible

Search filters

The search bar is geared toward debugging:

  • Has embedding / No embedding — diagnose hybrid-mode gaps (chunks with no vector mean the embedding call failed).

  • Tiny chunks (<50 tokens) — usually boilerplate headers / footers; harmless but noisy.

  • Huge chunks (>500 tokens) — the chunker couldn’t find a sentence break; consider lowering the chunk size in Configuration.

Group-bys: Source, Attachment, Attached model.
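
The filters above amount to simple threshold checks. A minimal sketch, assuming each chunk exposes a token count and a flag for whether an embedding exists; the function name and flag labels are hypothetical, and only the 50 and 500 token thresholds come from this page.

```python
TINY_MAX_TOKENS = 50   # "Tiny chunks (<50 tokens)"
HUGE_MIN_TOKENS = 500  # "Huge chunks (>500 tokens)"


def classify_chunk(tokens, has_embedding):
    """Return the debugging flags that would match a chunk."""
    flags = []
    if tokens < TINY_MAX_TOKENS:
        flags.append("tiny")
    if tokens > HUGE_MIN_TOKENS:
        flags.append("huge")
    if not has_embedding:
        flags.append("no_embedding")
    return flags


print(classify_chunk(12, True))    # ['tiny']
print(classify_chunk(730, False))  # ['huge', 'no_embedding']
print(classify_chunk(200, True))   # []
```

Note that a chunk of exactly 50 or exactly 500 tokens matches neither size filter under this reading of the thresholds.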

Tip

The Chunks view is the canonical debugger for “the AI said it doesn’t know this document — why?” questions. Filter by Attachment and look at the chunk contents: the extractor may have only captured the cover page.
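
The same check can be approximated offline on exported chunk rows. This is a rough sketch under stated assumptions: the row shape (attachment, tokens), the helper names, and the 200-token floor are all hypothetical, and the floor should be tuned to your documents.

```python
from collections import defaultdict


def summarise_by_attachment(chunks):
    """Count chunks and sum tokens per attachment name."""
    totals = defaultdict(lambda: {"chunks": 0, "tokens": 0})
    for chunk in chunks:
        entry = totals[chunk["attachment"]]
        entry["chunks"] += 1
        entry["tokens"] += chunk["tokens"]
    return dict(totals)


def suspiciously_small(summary, min_tokens=200):
    """Attachments whose total token count is below a hypothetical floor.

    A very low total for a long document suggests the extractor only
    captured part of it, for example the cover page.
    """
    return sorted(name for name, s in summary.items() if s["tokens"] < min_tokens)


chunks = [
    {"attachment": "handbook.pdf", "tokens": 40},
    {"attachment": "policy.pdf", "tokens": 300},
    {"attachment": "policy.pdf", "tokens": 280},
]
summary = summarise_by_attachment(chunks)
print(suspiciously_small(summary))  # ['handbook.pdf']
```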