Report Format
The final report is plain text written to --output (default: stdout).
It contains up to 11 sections, in the order described below; Section 7 appears
only when asset_urls is configured. Each major section is separated from the
next by two blank lines for readability.
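Because sections are delimited by blank lines rather than explicit markers, a script that post-processes the report could split it on runs of two or more blank lines. This is a minimal sketch: split_report_sections is a hypothetical helper, not part of the tool, and it assumes no section contains two consecutive blank lines internally.

```python
import re

# A newline followed by two or more blank (or whitespace-only) lines.
_SECTION_BREAK = re.compile(r"\n(?:[ \t]*\n){2,}")


def split_report_sections(report_text: str) -> list[str]:
    """Split a plain-text report into its major sections (hypothetical helper)."""
    chunks = _SECTION_BREAK.split(report_text.strip("\n"))
    # Drop any chunk that is only whitespace.
    return [chunk for chunk in chunks if chunk.strip()]
```

A single blank line inside a section is kept, so sub-blocks within one section stay together.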
Section 1: Configuration Summary
Shows the effective configuration used for the crawl, including all URL classification lists.
Section 2: Statistics Summary
Overall crawl statistics including elapsed time, total requests, bytes downloaded, and per-domain request breakdown.
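A per-domain breakdown amounts to counting requests by host. The sketch below is illustrative only: per_domain_request_counts is hypothetical, and it assumes "domain" means the URL's host name, which may not match the tool's exact grouping key.

```python
from collections import Counter
from urllib.parse import urlsplit


def per_domain_request_counts(requested_urls):
    """Count requests per host name (hypothetical helper).

    urlsplit(...).hostname is lowercased, so Example.com and example.com
    are counted together.
    """
    return Counter(urlsplit(url).hostname for url in requested_urls)
```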
Section 3: Broken Links
Links whose request returned a 4xx or 5xx HTTP status code or failed with a connection error. Grouped by the source page that contained the link.
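The classification and grouping rule can be sketched as follows. This is an illustration under assumptions, not the tool's code: the (source_page, link_url, status) tuple shape is hypothetical, and a status of None stands in for a connection error.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def is_broken(status: Optional[int]) -> bool:
    """None stands for a connection error (no HTTP response was received)."""
    return status is None or 400 <= status <= 599


def group_broken_links(results: Iterable[Tuple[str, str, Optional[int]]]) -> dict:
    """Group broken links by the page that contained them.

    `results` holds hypothetical (source_page, link_url, status) tuples.
    """
    grouped = defaultdict(list)
    for source_page, link_url, status in results:
        if is_broken(status):
            grouped[source_page].append((link_url, status))
    return dict(grouped)
```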
Section 4: Broken Anchors
Fragment references (e.g. #section) where the target anchor ID does not
exist in the target page. Grouped by the target URL and fragment.
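One way to perform this check is to collect every id attribute from the parsed target page and test the fragment against that set. A minimal sketch, assuming only id attributes count as anchors (not <a name>) and fragments are compared verbatim without percent-decoding; find_broken_anchors and its inputs are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdCollector(HTMLParser):
    """Collects every id attribute value in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def collect_ids(html):
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


def find_broken_anchors(links, parsed_pages):
    """links: (source_page, href) pairs; parsed_pages: target URL -> HTML text.

    Returns {(target_url, fragment): [source pages]} for fragments with no matching id.
    """
    ids_cache = {}
    broken = defaultdict(list)
    for source_page, href in links:
        target_url, fragment = urldefrag(href)
        if not fragment or target_url not in parsed_pages:
            continue  # no fragment, or the target's HTML was never parsed
        if target_url not in ids_cache:
            ids_cache[target_url] = collect_ids(parsed_pages[target_url])
        if fragment not in ids_cache[target_url]:
            broken[(target_url, fragment)].append(source_page)
    return dict(broken)
```

Keying the result on (target_url, fragment) mirrors the grouping described above: every page that links to the same missing anchor is listed under one entry.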
Section 5: Non-200 Responses
All URLs whose final status, after following redirects, is not 200. This includes the 4xx/5xx broken links listed in Section 3 as well as any other non-200 final status. Grouped by HTTP status code.
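Grouping by status code is a simple bucketing step. In this sketch the URL-to-final-status mapping is a hypothetical input, and URLs that never produced an HTTP status (connection errors) are assumed to be absent from it.

```python
from collections import defaultdict


def group_non_200(final_statuses):
    """final_statuses: hypothetical mapping of URL -> final HTTP status after redirects.

    Returns {status: sorted URLs}, ordered by status code.
    """
    groups = defaultdict(list)
    for url, status in final_statuses.items():
        if status != 200:
            groups[status].append(url)
    return {status: sorted(urls) for status, urls in sorted(groups.items())}
```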
Section 6: Redirects
URLs where the server issued a 3xx redirect to a different final URL. Shows the original URL, the final URL, and the HTTP status code of the first redirect hop (e.g. 301, 302). Redirects that ultimately resolve to 200 are informational only and do not cause a non-zero exit code.
Note
Redirects are recorded using the status code of the first redirect response (e.g. 301), not the final 200. Only genuine server-side redirects to different URLs are listed; same-URL redirects caused by URL normalization are suppressed.
When ignore_http_to_https_redirects is enabled, redirects where only
the scheme changes from http to https (same host, path, and query)
are also suppressed. The section header will note
[http→https upgrades suppressed] when this option is active.
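The listing rules above (record the first hop's status, skip same-URL redirects, optionally skip pure http→https upgrades) can be sketched like this. It is a sketch under assumptions: the hop-chain input shape is hypothetical, and URLs are assumed to have already been normalized by the crawler, so a plain string comparison detects same-URL redirects.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def is_http_to_https_upgrade(original: str, final: str) -> bool:
    """True if only the scheme changed from http to https (same host, path, and query)."""
    a, b = urlsplit(original), urlsplit(final)
    return (a.scheme == "http" and b.scheme == "https"
            and a.hostname == b.hostname and a.path == b.path and a.query == b.query)


def redirect_entry(chain: List[Tuple[str, int]],
                   ignore_http_to_https_redirects: bool = False) -> Optional[Tuple[str, str, int]]:
    """chain lists (url, status) per hop in request order; the last hop is the final response.

    Returns (original_url, final_url, first_redirect_status), or None if nothing is listed.
    """
    original_url, first_status = chain[0]
    final_url = chain[-1][0]
    if not 300 <= first_status <= 399:
        return None  # the first response was not a redirect
    if original_url == final_url:
        return None  # same-URL redirect (URLs assumed already normalized)
    if ignore_http_to_https_redirects and is_http_to_https_upgrade(original_url, final_url):
        return None  # suppressed http -> https upgrade
    return (original_url, final_url, first_status)
```

Returning the first hop's status rather than the final one means a 302 → 308 → 200 chain is reported as 302, matching the rule in the note.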
Section 7: Misplaced Assets
Only present when asset_urls is configured. Assets found outside their
expected locations, grouped by asset type (Image, Document, Data,
Infrastructure, Other).
Section 8: Ignore URL Matches
URLs that matched an ignore_urls prefix and were skipped entirely.
Listed so site owners know which ignored URLs are still being referenced.
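Recording ignored URLs together with the pages that reference them might look like the sketch below. It assumes ignore_urls entries are matched as plain string prefixes; record_ignored and its input shapes are hypothetical.

```python
from collections import defaultdict


def record_ignored(links, ignore_urls):
    """links: (source_page, url) pairs; ignore_urls: list of URL prefixes.

    Returns {ignored_url: [referencing pages]} for URLs that match any prefix.
    """
    matches = defaultdict(list)
    for source_page, url in links:
        if any(url.startswith(prefix) for prefix in ignore_urls):
            matches[url].append(source_page)
    return dict(matches)
```

With plain prefix matching, a prefix such as https://example.com/priv would also match https://example.com/privacy, so ending a directory prefix with a trailing slash keeps the match narrow.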
Section 9: Non-HTTP Scheme Links
Links with non-HTTP schemes (mailto:, tel:, ftp:, etc.) that were
encountered during the crawl.
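Detecting a non-HTTP scheme only requires the scheme component of each link. A minimal sketch using the standard library's urlsplit; non_http_scheme is a hypothetical helper. Relative links have an empty scheme and are not reported.

```python
from typing import Optional
from urllib.parse import urlsplit


def non_http_scheme(href: str) -> Optional[str]:
    """Return the scheme of a link if it is present and not http/https, else None."""
    scheme = urlsplit(href).scheme  # urlsplit lowercases the scheme
    if scheme and scheme not in ("http", "https"):
        return scheme
    return None
```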
Section 10: SSL Warnings
SSL certificate errors, grouped by the domain on which they occurred. An SSL error does not stop the crawl.
Section 11: Unvalidated Anchors
Fragment references that could not be validated because the target page’s HTML was not parsed: the page was marked no-crawl, lay beyond the depth limit, or was external.
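The decision of whether a target page's HTML gets parsed, and therefore whether its fragments land in Broken Anchors or here, could be sketched as below. Everything in it is an assumption for illustration: the parameter names, treating "external" as a host other than site_host, plain-prefix no-crawl matching, pages at exactly max_depth still being parsed, and the order of the checks.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def unparsed_reason(target_url: str, depth: int, *, site_host: str, max_depth: int,
                    no_crawl_prefixes: Iterable[str]) -> Optional[str]:
    """Return why a target page's HTML would not be parsed, or None if it would be.

    All parameters are hypothetical; the check order is an assumption.
    """
    if urlsplit(target_url).hostname != site_host:
        return "external"
    if any(target_url.startswith(prefix) for prefix in no_crawl_prefixes):
        return "no-crawl"
    if depth > max_depth:
        return "depth limit"
    return None
```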
Referencing Page Truncation
In every section that lists referencing pages, at most --max-referencing-pages
(default 10) referencing pages are shown per entry. When an entry has more, a
note is appended:
... and N more referencing pages
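The truncation rule can be expressed as a small formatting function. This is a sketch: format_referencing_pages is hypothetical, and the two-space indentation is an assumed layout, not taken from the tool's output.

```python
from typing import List


def format_referencing_pages(pages: List[str], max_referencing_pages: int = 10) -> List[str]:
    """Render referencing pages, truncated with a trailing count note."""
    shown = pages[:max_referencing_pages]
    lines = [f"  {page}" for page in shown]  # two-space indent is an assumed layout
    hidden = len(pages) - len(shown)
    if hidden > 0:
        lines.append(f"  ... and {hidden} more referencing pages")
    return lines
```

For example, 12 referencing pages with the default limit produce 10 page lines followed by "  ... and 2 more referencing pages".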