The following wget command downloads a website's contents. Its options throttle the download speed, add a wait between requests, recurse through linked pages and their page requisites, set a custom user agent, convert links for offline viewing, and ignore the site's robots.txt file.
wget --wait=2 \
--level=inf \
--limit-rate=20K \
--recursive \
--page-requisites \
--user-agent=Mozilla \
--no-parent \
--convert-links \
--adjust-extension \
--no-clobber \
-e robots=off \
https://example.com
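
For repeated use, the same command can be wrapped in a small bash script that takes the target URL as its only argument:
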
#!/bin/bash
# Constants for wget options
WAIT_TIME=2      # Wait time between downloads, in seconds
MAX_LEVEL=inf    # Maximum recursion level
LIMIT_RATE=20K   # Download rate limit in kilobytes per second

# Function to download a website
download_website() {
    local url=$1

    # Collect the wget options in an array so each one expands as a separate word
    local wget_options=(
        --wait="${WAIT_TIME}"
        --level="${MAX_LEVEL}"
        --limit-rate="${LIMIT_RATE}"
        --recursive
        --page-requisites
        --user-agent=Mozilla
        --no-parent
        --convert-links
        --adjust-extension
        --no-clobber
        -e robots=off
    )

    # Download the website
    echo "Downloading website at ${url}..."
    wget "${wget_options[@]}" "${url}"
    echo "Download complete."
}

# Usage: require exactly one argument
if [ $# -ne 1 ]; then
    echo "Usage: $0 <url>"
    exit 1
fi

download_website "$1"
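
For example, assuming the script is saved as download_site.sh (the file name is arbitrary):

chmod +x download_site.sh
./download_site.sh https://example.com
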
wget Command Breakdown
Here's a breakdown of each option used in the wget command above:
--wait=2: Waits 2 seconds between each download to avoid overwhelming the server.
--level=inf: Sets the recursion depth to infinite, so all linked pages are followed.
--limit-rate=20K: Caps the download speed at 20 kilobytes per second.
--recursive: Enables recursive downloading of linked pages.
--page-requisites: Downloads all files a page needs to display correctly, including images, stylesheets, and scripts.
--user-agent=Mozilla: Sets the user agent string wget sends to identify itself to the server.
--no-parent: Prevents wget from ascending to the parent directory and downloading files outside the starting hierarchy.
--convert-links: Rewrites links in the downloaded HTML documents to point at the local copies, making them suitable for offline viewing.
--adjust-extension: Adjusts the file extension to match the MIME type of the downloaded file.
--no-clobber: Prevents wget from overwriting existing files.
-e robots=off: Executes the startup-file command robots=off, telling wget to disregard the site's robots.txt restrictions (see the sketch after this list).
https://example.com: The URL of the website to download.
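
Because -e executes a command in wget's startup-file (.wgetrc) syntax, the same setting can be made persistent instead of passed on every invocation. A minimal sketch, assuming a per-user ~/.wgetrc:

# ~/.wgetrc: persistent equivalent of passing -e robots=off on the command line
robots = off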