Usage ===== Synopsis -------- .. code-block:: text link_check [OPTIONS] [ROOT_URL] ``ROOT_URL`` is the URL to start crawling from. It may also be specified in a :doc:`configuration file `. Options ------- .. list-table:: :widths: 25 10 15 50 :header-rows: 1 * - Flag - Type - Default - Description * - ``ROOT_URL`` - string - required unless set in config - Root URL to begin crawling when not provided by ``--config-file``. * - ``-o``, ``--output`` - path - stdout - File path for the final plain-text report. * - ``--log-file`` - path - stderr - File path for log messages. * - ``--log-level`` - string - ``INFO`` - Minimum log level: ``DEBUG``, ``INFO``, ``WARNING``, ``ERROR``, ``CRITICAL``. Case-insensitive (e.g. ``debug`` and ``DEBUG`` both work). * - ``--timeout`` - int - ``10`` - Timeout in seconds for each HTTP request. * - ``--retries`` - int - ``3`` - Number of retry attempts for transient failures. * - ``--max-requests`` - int - unlimited - Maximum total HTTP requests to issue. * - ``--max-depth`` - int - unlimited - Maximum directory depth to crawl. * - ``--max-threads`` - int - ``10`` - Maximum number of concurrent threads. * - ``--max-referencing-pages`` - int - ``10`` - Max referencing pages per URL in report. * - ``--config-file`` - path - none - Path to a YAML configuration file. * - ``--version`` - flag - — - Print version and exit. Examples -------- Check a website with default settings: .. code-block:: bash link_check https://example.com Save report to a file, limit depth and threads: .. code-block:: bash link_check https://example.com --max-depth 3 --max-threads 20 -o report.txt Use a configuration file: .. code-block:: bash link_check --config-file config.yaml Override config file timeout on the command line: .. code-block:: bash link_check --config-file config.yaml --timeout 30 Enable debug logging to a file: .. code-block:: bash link_check https://example.com --log-level debug --log-file crawl.log Progress Reporting ------------------ While a crawl is running, a progress line is printed to stderr every 5 seconds:: [Progress] 365/~410 URLs checked | 12.2 URLs/s | 45 in queue | 8 threads active | 0m 30s elapsed The fields are: - **checked/~estimate** — URLs fully processed so far, and an estimate of the total (checked + currently queued). - **URLs/s** — average request rate since the crawl started. - **in queue** — URLs waiting to be submitted to the thread pool. - **threads active** — worker threads currently executing (bounded by ``--max-threads``). - **elapsed** — wall-clock time since the crawl started. Interrupting a Crawl -------------------- Press **Ctrl-C** once to abort. Any in-flight HTTP requests are allowed to finish, then a partial report is generated from results collected so far and written to the configured output destination. A second Ctrl-C issues an immediate hard kill. Exit Codes ---------- .. list-table:: :widths: 10 90 :header-rows: 1 * - Code - Meaning * - ``0`` - All checks passed: no broken links, no non-200 responses, no broken anchors, no misplaced assets, no SSL errors. * - ``1`` - One or more problems detected. * - ``2`` - Fatal error: invalid arguments, config file not found or has unknown keys, etc. * - ``130`` - Crawl was interrupted with Ctrl-C; report contains partial results.