wget

This code uses the wget command to download a website's documentation section, including the files each page needs to render, using a recursive crawl that is restricted to the documentation pages. The command combines a number of options that tune the download, such as re-fetching only files that have changed on the server and waiting between requests to avoid overwhelming it.

Cell 2

wget --mirror --level=2 --no-parent --convert-links --html-extension --no-host-directories -N -c --accept-regex "/documentation/" --wait=1 --page-requisites https://www.excessiveplus.net/documentation

What the code could have been:

#!/bin/bash

# Mirror the documentation section of the site. Keeping the wget options in
# an array makes the command easier to read and to modify.

# Tunable settings
LEVEL=2
ACCEPT_PATTERN="/documentation/"
WAIT_PERIOD=1
URL="https://www.excessiveplus.net/documentation"

# Assemble the wget options
OPTS=(
  --mirror                            # recursion + timestamping (-r -N -l inf --no-remove-listing)
  --level="${LEVEL}"                  # limit recursion depth
  --no-parent                         # never ascend above the start directory
  --convert-links                     # rewrite links for offline viewing
  --html-extension                    # save HTML pages with an .html suffix
  --no-host-directories               # don't prefix local paths with the hostname
  -c                                  # resume partially downloaded files
  --accept-regex="${ACCEPT_PATTERN}"  # only follow URLs matching this pattern
  --wait="${WAIT_PERIOD}"             # pause between requests
  --page-requisites                   # fetch images, CSS, and scripts for each page
)

# Show the command, then run it
echo "wget ${OPTS[*]} ${URL}"
wget "${OPTS[@]}" "${URL}"

# TODO: Consider moving the option list into a separate file or environment
#       variables so other scripts can reuse it.
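
As a rough sketch of the TODO above, the option list could live in its own file that any script sources before calling wget. The file name wget-doc-opts.sh and the DOC_WGET_OPTS variable are illustrative, not something the original cell defines.

# wget-doc-opts.sh (hypothetical shared file): define the option list once.
DOC_WGET_OPTS=(
  --mirror --level=2 --no-parent --convert-links --html-extension
  --no-host-directories -c --accept-regex="/documentation/"
  --wait=1 --page-requisites
)

# In any script that needs the mirror (also hypothetical):
source ./wget-doc-opts.sh
wget "${DOC_WGET_OPTS[@]}" "https://www.excessiveplus.net/documentation"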

Code Breakdown

This code uses the wget command to download a website's documentation section. Here's a step-by-step explanation:

  1. wget: The command-line utility that performs the download.
  2. --mirror: Shorthand for -r -N -l inf --no-remove-listing; turns on recursion and timestamping so the local copy can be kept in sync with the site.
  3. --level=2: Limit the recursion depth to 2 links from the starting page, overriding the infinite depth implied by --mirror.
  4. --no-parent: Do not follow links that ascend to the parent directory, so the crawl stays inside /documentation.
  5. --convert-links: Rewrite links in the downloaded documents so they point at the local copies and the pages can be viewed offline.
  6. --html-extension: Save files served as HTML with an .html extension, even if their URLs don't end in .html (newer wget releases call this option --adjust-extension).
  7. --no-host-directories: Don't create a top-level directory named after the host (www.excessiveplus.net); files are saved directly under the current directory (see the quick check after this list).
  8. -N: Turn on timestamping, so a file is downloaded only if the copy on the server is newer than the local one; this is already implied by --mirror but harmless to state explicitly.
  9. -c: Continue (resume) partially downloaded files instead of starting them over.
  10. --accept-regex "/documentation/": Only follow URLs whose address matches the regular expression /documentation/, effectively restricting the crawl to the documentation section.
  11. --wait=1: Wait 1 second between requests to avoid overwhelming the server.
  12. --page-requisites: Also download everything needed to render each page, such as images, CSS, and scripts.
  13. https://www.excessiveplus.net/documentation: The URL of the documentation section to download.
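
Because --no-host-directories strips the hostname from local paths, the mirrored pages should land under a documentation/ directory relative to where wget was run. A quick, illustrative check (actual file names depend on what the site serves):

# List a few of the mirrored pages; --html-extension ensures the .html suffix
find documentation -name '*.html' | head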

In summary, this command mirrors the documentation section at https://www.excessiveplus.net/documentation together with its page requisites, recursively following only links that stay within /documentation/.
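
Because --convert-links rewrites the pages for offline viewing, the mirror can be checked by serving it from a local HTTP server; the directory and port below are just an example:

# Serve the mirrored documentation at http://localhost:8000
cd documentation && python3 -m http.server 8000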