Usage

Synopsis

link_check [OPTIONS] [ROOT_URL]

ROOT_URL is the URL to start crawling from. It may also be specified in a configuration file.

Options

Flag

Type

Default

Description

ROOT_URL

string

required unless set in config

Root URL to begin crawling when not provided by --config-file.

-o, --output

path

stdout

File path for the final plain-text report.

--log-file

path

stderr

File path for log messages.

--log-level

string

INFO

Minimum log level: DEBUG, INFO, WARNING, ERROR, CRITICAL. Case-insensitive (e.g. debug and DEBUG both work).

--timeout

int

10

Timeout in seconds for each HTTP request.

--retries

int

3

Number of retry attempts for transient failures.

--max-requests

int

unlimited

Maximum total HTTP requests to issue.

--max-depth

int

unlimited

Maximum directory depth to crawl.

--max-threads

int

10

Maximum number of concurrent threads.

--max-referencing-pages

int

10

Max referencing pages per URL in report.

--config-file

path

none

Path to a YAML configuration file.

--version

flag

Print version and exit.

Examples

Check a website with default settings:

link_check https://example.com

Save report to a file, limit depth and threads:

link_check https://example.com --max-depth 3 --max-threads 20 -o report.txt

Use a configuration file:

link_check --config-file config.yaml

Override config file timeout on the command line:

link_check --config-file config.yaml --timeout 30

Enable debug logging to a file:

link_check https://example.com --log-level debug --log-file crawl.log

Progress Reporting

While a crawl is running, a progress line is printed to stderr every 5 seconds:

[Progress] 365/~410 URLs checked | 12.2 URLs/s | 45 in queue | 8 threads active | 0m 30s elapsed

The fields are:

  • checked/~estimate — URLs fully processed so far, and an estimate of the total (checked + currently queued).

  • URLs/s — average request rate since the crawl started.

  • in queue — URLs waiting to be submitted to the thread pool.

  • threads active — worker threads currently executing (bounded by --max-threads).

  • elapsed — wall-clock time since the crawl started.

Interrupting a Crawl

Press Ctrl-C once to abort. Any in-flight HTTP requests are allowed to finish, then a partial report is generated from results collected so far and written to the configured output destination. A second Ctrl-C issues an immediate hard kill.

Exit Codes

Code

Meaning

0

All checks passed: no broken links, no non-200 responses, no broken anchors, no misplaced assets, no SSL errors.

1

One or more problems detected.

2

Fatal error: invalid arguments, config file not found or has unknown keys, etc.

130

Crawl was interrupted with Ctrl-C; report contains partial results.