
data collection

Crime reports

This code defines a scrapeAlert function that fetches report data for a given ID from a website, saves it locally, and exports the function for use elsewhere.
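A rough sketch of that fetch-and-save pattern is below; the alert endpoint and output directory are placeholders, not the original values, and global `fetch` (Node 18+) stands in for whatever request library the module actually uses.

```javascript
const fs = require('fs')
const path = require('path')

async function scrapeAlert(id) {
  // hypothetical endpoint keyed by report ID
  const url = `https://example-crime-reports.org/alerts/${id}`
  const response = await fetch(url)   // Node 18+ global fetch
  const body = await response.text()

  // save the raw page alongside other collected reports
  const outFile = path.join(__dirname, 'alerts', `${id}.html`)
  fs.mkdirSync(path.dirname(outFile), { recursive: true })
  fs.writeFileSync(outFile, body)
  return body
}

module.exports = scrapeAlert
```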

https://www.amazon.com/gp/yourstore/iyr/ref=pd_ys_iyr_next?ie=UTF8&collection=watched&iyrGroup=&maxItem=616&minItem=600

meta search all

The searchAll function is the module's main export. It retrieves search results from multiple search engines in parallel using the multiCrawl function and saves them to a JSON file in the user's Collections/searches directory. It takes an optional query parameter and returns a promise that resolves to an object of search results; the file name is constructed from the query string and the current date.
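A minimal sketch of that flow, assuming multiCrawl's call signature and module path (both are placeholders here):

```javascript
const fs = require('fs')
const path = require('path')
const multiCrawl = require('./multi-crawl')   // hypothetical module path

async function searchAll(query = 'search engines') {
  // crawl several engines in parallel for the same query
  const results = await multiCrawl(query)

  // e.g. ~/Collections/searches/<query>-<date>.json
  const date = new Date().toISOString().slice(0, 10)
  const safeQuery = query.replace(/[^a-z0-9]+/gi, '-').toLowerCase()
  const outDir = path.join(process.env.HOME || '', 'Collections', 'searches')
  fs.mkdirSync(outDir, { recursive: true })
  fs.writeFileSync(path.join(outDir, `${safeQuery}-${date}.json`),
                   JSON.stringify(results, null, 2))
  return results
}

module.exports = searchAll
```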

schedule search all

The code imports the necessary modules, defines an options object, and exports a scheduleSearch function that creates a new event on a Google Calendar with a customizable search query. scheduleSearch checks for authentication, creates the event, and returns a Promise that resolves with the event's details.
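The following is an illustrative sketch using the googleapis client; the calendar ID, event shape, and how the OAuth client is supplied are assumptions, not the module's actual configuration.

```javascript
const { google } = require('googleapis')

const options = { calendarId: 'primary' }   // placeholder calendar ID

async function scheduleSearch(search = 'example query', auth) {
  if (!auth) throw new Error('OAuth client required')
  const calendar = google.calendar({ version: 'v3', auth })
  const now = new Date()
  const event = {
    summary: `search all ${search}`,
    start: { dateTime: now.toISOString() },
    end: { dateTime: new Date(now.getTime() + 30 * 60 * 1000).toISOString() }
  }
  const res = await calendar.events.insert({
    calendarId: options.calendarId,
    requestBody: event
  })
  return res.data   // resolves with the created event's details
}

module.exports = scheduleSearch
```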

tell joke

The module makes a GET request to a web page to retrieve a list of jokes, extracting the questions and answers with regular expressions. getJoke then returns a random joke from the list, or resolves with the cached joke data if it has already been loaded.
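A sketch of that cache-then-pick-randomly pattern, assuming a hypothetical joke page URL and markup:

```javascript
let jokes   // module-level cache so the page is only fetched once

async function getJoke() {
  if (!jokes) {
    const html = await (await fetch('https://example.com/jokes')).text()
    // capture "Q: ... A: ..." pairs; the markup pattern is hypothetical
    jokes = [...html.matchAll(/Q:\s*(.*?)\s*A:\s*(.*?)</g)]
      .map(m => ({ question: m[1], answer: m[2] }))
  }
  return jokes[Math.floor(Math.random() * jokes.length)]
}

module.exports = getJoke
```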

multi crawl

The code imports necessary modules, defines constants for timeouts and connections, and implements two key functions: deQueue for recursively dequeuing tasks from an input queue and multiCrawl for parallel crawling using Selenium connections. The multiCrawl function uses deQueue to crawl through an input list and returns a promise with the crawl results.
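A generic sketch of that queue/worker shape is shown below; the Selenium connections are replaced by a placeholder worker function, and the constants are illustrative values rather than the module's own.

```javascript
const MAX_CONNECTIONS = 4
const TIMEOUT = 30 * 1000   // assumed per-task timeout

async function deQueue(queue, worker, results = []) {
  const next = queue.shift()
  if (next === undefined) return results   // queue drained, stop recursing
  const result = await Promise.race([
    worker(next),
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('timeout')), TIMEOUT))
  ]).catch(e => ({ error: e.message, input: next }))
  results.push(result)
  return deQueue(queue, worker, results)   // recurse until the queue is empty
}

async function multiCrawl(inputs, worker) {
  const queue = [...inputs]
  // several lanes pull from the same shared queue in parallel
  const lanes = Array.from({ length: MAX_CONNECTIONS },
                           () => deQueue(queue, worker))
  const results = await Promise.all(lanes)
  return results.flat()
}

module.exports = { deQueue, multiCrawl }
```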

crawl domain

**crawlRecursive(url, depth, searches)**:

The crawlRecursive function is a recursive web crawler that starts at a specified initial URL, iteratively retrieves links from the crawled pages, and stores them in a cache, managing recursion depth and cache updates along the way. It proceeds in stages: crawl a page, update the cache, extract the page's links, recurse into each link, and terminate when the depth limit is reached or a page is already cached.
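A minimal sketch of that recursion, assuming Node 18+ `fetch` and a naive regex for link extraction purely for illustration:

```javascript
async function crawl(url) {
  const html = await (await fetch(url)).text()
  // very rough href extraction, enough for a sketch
  return [...html.matchAll(/href="(https?:\/\/[^"]+)"/g)].map(m => m[1])
}

async function crawlRecursive(url, depth = 2, cache = new Map()) {
  if (depth < 0 || cache.has(url)) return cache   // termination: depth exhausted or already cached
  const links = await crawl(url)
  cache.set(url, links)                           // cache the links found on this page
  for (const link of links) {
    await crawlRecursive(link, depth - 1, cache)  // recurse with reduced depth
  }
  return cache
}

module.exports = crawlRecursive
```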

domain cache tools

This Node.js module handles caching of web pages: it imports various modules and defines functions that cache and retrieve data keyed by URL. The functions cover cache file creation, searching for existing caches, checking cache validity, and storing cache data in files, with options for constraining the cache and sanitizing URLs.
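A hedged sketch of URL-keyed cache files follows; the directory layout, file format, and expiry rule are illustrative assumptions rather than the module's actual behavior.

```javascript
const fs = require('fs')
const path = require('path')

const CACHE_DIR = path.join(process.env.HOME || '', 'Collections', 'cache')

// strip the protocol and unsafe characters so the URL can be used as a file name
function sanitizeUrl(url) {
  return url.replace(/^https?:\/\//, '').replace(/[^a-z0-9.-]+/gi, '_')
}

function cacheFile(url) {
  return path.join(CACHE_DIR, sanitizeUrl(url) + '.json')
}

// a cache entry is considered valid if the file exists and is younger than maxAge
function isCacheValid(url, maxAge = 24 * 60 * 60 * 1000) {
  const file = cacheFile(url)
  if (!fs.existsSync(file)) return false
  return Date.now() - fs.statSync(file).mtimeMs < maxAge
}

function storeCache(url, data) {
  fs.mkdirSync(CACHE_DIR, { recursive: true })
  fs.writeFileSync(cacheFile(url), JSON.stringify(data))
}

module.exports = { sanitizeUrl, cacheFile, isCacheValid, storeCache }
```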

browser crawler tools

This code snippet relies on the puppeteer library and internal modules to extract information from web pages, including style URLs, links, and HTML content. It also includes utility functions to calculate expiration dates based on Cache-Control headers and extract URLs from CSS content using regular expressions.
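Two small utilities matching the behaviors described above are sketched here; the regexes are straightforward reconstructions, not the module's exact code.

```javascript
// derive an expiration timestamp from a Cache-Control header value
function expirationFromCacheControl(header, now = Date.now()) {
  const match = /max-age=(\d+)/.exec(header || '')
  const seconds = match ? parseInt(match[1], 10) : 0
  return new Date(now + seconds * 1000)
}

// pull url(...) references out of a CSS string
function urlsFromCss(css) {
  return [...css.matchAll(/url\(\s*['"]?([^'")]+)['"]?\s*\)/g)].map(m => m[1])
}

module.exports = { expirationFromCacheControl, urlsFromCss }
```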

analyze cache file

The analyzeCache function analyzes the cache file for a given URL, extracting statistics such as the number of cache objects, distinct domains, and repeated URLs. It returns an object with various statistics, including the count of pages, caches, and domains, as well as the URLs for the 10 largest objects and repeated URLs.
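A sketch of that statistics pass is below; the cache file is assumed to be a JSON array of `{ url, content }` objects, which may not match the real on-disk format.

```javascript
const fs = require('fs')

function analyzeCache(cacheFile) {
  const cache = JSON.parse(fs.readFileSync(cacheFile, 'utf8'))

  // count occurrences of each URL to find repeats
  const counts = {}
  for (const entry of cache) {
    counts[entry.url] = (counts[entry.url] || 0) + 1
  }
  const domains = new Set(cache.map(e => new URL(e.url).hostname))

  return {
    caches: cache.length,
    domains: [...domains],
    repeated: Object.keys(counts).filter(url => counts[url] > 1),
    // URLs of the 10 largest cached objects by content length
    largest: cache
      .slice()
      .sort((a, b) => (b.content || '').length - (a.content || '').length)
      .slice(0, 10)
      .map(e => e.url)
  }
}

module.exports = analyzeCache
```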

schedule crawl domain

This JavaScript code imports the Google Calendar API and defines an options object with a calendar ID. It also exports a scheduleSearch function that takes a search parameter and schedules a new event on the specified calendar, using OAuth authentication if it is defined in the options object.
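As a hypothetical usage of that export (the wrapper name, module path, and query format here are placeholders), scheduling a domain crawl amounts to inserting an event whose summary carries the query to run later:

```javascript
const scheduleSearch = require('./schedule-search')   // assumed module path

async function scheduleCrawlDomain(domain, auth) {
  // only attempt the insert when an OAuth client has been configured
  if (!auth) return Promise.reject(new Error('no OAuth client configured'))
  return scheduleSearch(`crawl domain ${domain}`, auth)
}

module.exports = scheduleCrawlDomain
```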

collect all bookmarks

This Node.js script uses various custom modules to scrape websites, save PDFs and screenshots, and collect bookmarks from Google Takeout, with error handling and logging in place.
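A sketch of the per-bookmark capture loop, using puppeteer directly instead of the script's custom modules; the bookmark list and output paths are placeholders.

```javascript
const path = require('path')
const puppeteer = require('puppeteer')

async function saveBookmarks(bookmarks, outDir) {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  for (const url of bookmarks) {
    try {
      await page.goto(url, { waitUntil: 'networkidle2' })
      const name = url.replace(/[^a-z0-9]+/gi, '_')
      await page.pdf({ path: path.join(outDir, name + '.pdf') })
      await page.screenshot({ path: path.join(outDir, name + '.png'), fullPage: true })
    } catch (err) {
      console.error(`failed to capture ${url}:`, err.message)   // log and continue
    }
  }
  await browser.close()
}

module.exports = saveBookmarks
```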

search results as json

The searchResultsToJson(url) function extracts search results from a given URL and returns them in JSON format, containing the URL, query, and results. It logs the URL and session ID, sends a request, extracts the search query and results, maps them to a desired format, and catches any errors that occur during the process.
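A hedged sketch of that extraction step, built on puppeteer rather than the original session handling; the query and result selectors are assumptions that would need to match the actual search page markup.

```javascript
const puppeteer = require('puppeteer')

async function searchResultsToJson(url) {
  console.log(`extracting results from ${url}`)
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url, { waitUntil: 'networkidle2' })

    // the query is assumed to live in an input field, the results in anchor headings
    const query = await page.$eval('input[name="q"]', el => el.value).catch(() => '')
    const results = await page.$$eval('a h3', headings =>
      headings.map(h => ({ title: h.textContent, link: h.closest('a').href })))

    return { url, query, results }
  } catch (err) {
    console.error(`failed to extract results from ${url}:`, err.message)
    throw err
  } finally {
    await browser.close()
  }
}

module.exports = searchResultsToJson
```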