Imagine you're a penetration tester hired by a mid-sized law firm to assess their cybersecurity. They've used the same WiFi password for years and are concerned it might be easy to guess.
While the IT department claims it's secure, they don’t enforce strict password policies. Strict password policies require passwords to be long, complex, unique, and regularly updated—using a mix of uppercase and lowercase letters, numbers, and special characters, while avoiding common words, predictable patterns, or reused passwords. Without these measures, passwords are much easier to crack.
Your job is to test how easily an attacker could break in. Instead of trying every possible password combination, which would take too long, hackers and security professionals use wordlists—collections of commonly used passwords or words people often include in their passwords. These lists help attackers find weak passwords much faster.
How Wordlists Work in Password Cracking
A wordlist is a collection of common words, names, and patterns used to generate likely passwords. Hackers and security professionals use these lists in dictionary attacks, where software systematically tests each word by hashing it and checking if it matches the stored hash.
Example Hashing Process:
Password: Justice2025
SHA-256 Hash: b61f66aecae3165af130b360cfa4152ff885269a8a11ebca17f8e50befd4dd82
When a system stores a password, it doesn’t save the actual text. Instead, it runs the password through a hashing algorithm, converting it into a fixed-length scrambled version. When a user logs in, the system hashes the entered password and compares it to the stored hash. If they match, access is granted.
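The store-and-compare flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular system does it: the function names are made up for this example, and plain unsalted SHA-256 is used only to mirror the example hash shown earlier.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of a password as 64 hex characters."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_login(entered: str, stored_hash: str) -> bool:
    """Hash the entered password and compare it with the stored hash."""
    return sha256_hex(entered) == stored_hash


# At signup the system keeps only the hash, never the plain text.
stored = sha256_hex("Justice2025")

print(len(stored))                         # 64 hex characters (256 bits)
print(check_login("Justice2025", stored))  # True
print(check_login("justice2025", stored))  # False: one changed letter, different hash
```

Note that the output length is always the same no matter how long the password is, which is what "fixed-length" means here.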
How Attackers Crack Hashed Passwords
Since hashes can’t be reversed, the only way to crack them is by guessing the original password, hashing each guess, and checking for a match.
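Here is a rough sketch of that guess, hash, and compare loop. The target hash and the guesses are invented for the example, and SHA-256 stands in for whatever algorithm the real system uses; cracking tools like Hashcat do the same thing far faster.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one that matches, else None."""
    for candidate in wordlist:
        candidate = candidate.strip()
        if candidate and sha256_hex(candidate) == target_hash:
            return candidate
    return None


# Hypothetical target: in a real test this hash comes from the system under audit.
target = sha256_hex("Justice2025")
guesses = ["password", "letmein", "Justice", "Justice2025", "Verdict123"]

print(dictionary_attack(target, guesses))     # Justice2025
print(dictionary_attack(target, ["qwerty"]))  # None
```

The attack only succeeds if the real password is somewhere in the list, which is why the quality of the wordlist matters more than its size.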
In WiFi password cracking, an attacker captures the WPA/WPA2 handshake exchanged when a device connects. The handshake does not contain the password itself; it contains values computed from a key that is derived from the password and the network name. The attacker then runs a dictionary attack, deriving that key for each word in a wordlist and checking whether it reproduces the captured values.
If a hash from a guessed password matches the captured hash, the attacker now has the correct password and can log in to the network.
This is why weak or common passwords are a security risk—if they appear in a wordlist, they can be cracked quickly.
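For WPA2-PSK networks specifically, each guess costs more than a single hash. The passphrase and the network name (SSID) are run through PBKDF2-HMAC-SHA1 with 4,096 iterations to derive a 256-bit master key. The sketch below shows only that first derivation step using Python's standard library. The SSID and guesses are made-up examples, and the later steps (deriving the session key and checking it against the captured handshake) are left to tools like Hashcat and Aircrack-ng.

```python
import hashlib


def wpa2_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 32-byte WPA2-PSK pairwise master key for one guess."""
    return hashlib.pbkdf2_hmac(
        "sha1",                      # WPA2-PSK uses HMAC-SHA1
        passphrase.encode("utf-8"),
        ssid.encode("utf-8"),        # the SSID acts as the salt
        4096,                        # iteration count used by WPA2-PSK
        dklen=32,                    # 256-bit key
    )


ssid = "LawFirm-WiFi"  # hypothetical network name
for guess in ["password", "Justice2025"]:
    print(guess, wpa2_pmk(guess, ssid).hex())
```

Because the SSID is mixed in, the same password produces a different key on every network name, and the 4,096 iterations make each guess noticeably slower than a single hash. A short, relevant wordlist pays off even more in this setting.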
Types of Wordlists
There are two types of wordlists:
General wordlists – These contain common passwords, leaked credentials, and frequently used variations. While useful for testing generic weak passwords, they are inefficient when dealing with passwords that include specific, unique words.
Targeted wordlists – These are customized based on knowledge about the target, significantly reducing the number of guesses needed. Instead of testing millions of unrelated passwords, you focus only on words relevant to the target, such as names, locations, and industry terms.
For example, if the law office’s password is "Justice2025", a generic wordlist might take days or weeks to reach it, or might not contain it at all. But a wordlist generated from the firm's website will include "Justice" because it appears frequently, and attackers know that adding a year is a common pattern.
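The "word plus year" pattern is easy to generate. The sketch below is one way to do it in Python, assuming a short list of words already pulled from the site; the word list, the year range, and the function name are all made up for illustration. In practice this kind of expansion is often handled by rules in the cracking tool instead.

```python
from typing import Iterable, Iterator


def with_year_suffixes(words: Iterable[str], first_year: int, last_year: int) -> Iterator[str]:
    """Yield each word on its own, then with every year in the range appended."""
    for word in words:
        yield word
        for year in range(first_year, last_year + 1):
            yield f"{word}{year}"


site_words = ["Justice", "Verdict", "Attorney"]  # hypothetical scrape results
candidates = list(with_year_suffixes(site_words, 2020, 2025))

print(len(candidates))              # 21 = 3 words x (1 bare + 6 years)
print("Justice2025" in candidates)  # True
```

Twenty-one guesses is a tiny search space compared with a multi-million-line generic list, and it still contains the password from the example.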
Last week, I covered how to crack WiFi passwords using Hashcat and Aircrack-ng. If you missed it, see the full guide: WiFi Password Cracking with Hashcat and Aircrack-ng.
Generating a Targeted Wordlist from a Website
To efficiently generate a targeted wordlist, we need to extract words that are most likely to appear in a password.
A company’s website is often a goldmine of useful terms because businesses frequently use names, locations, phone numbers, slogans, or industry-specific words in their passwords.
Instead of blindly guessing or relying on massive, generic wordlists, which can take forever to process, we can scrape the target website to build a customized wordlist. This method significantly increases the likelihood of cracking a password in less time by focusing only on words that the company is already associated with.
For example, if a law office’s website frequently mentions words like “Justice,” “Attorney,” or “Verdict,” it’s reasonable to assume these could be part of a password. If they also list their phone number or address, elements of those might be included as well (e.g., Justice2025, 555MainStreet, Verdict123).
To streamline this process, I wrote a script called ‘Wordlist Generator from URL’. It automatically scrapes a website, extracts relevant words, removes unnecessary ones, and formats them into a structured wordlist. This eliminates the need for manual wordlist creation and significantly speeds up targeted password cracking.
Password Cracking Script (Wordlist from URL)
The Wordlist Generator from URL automates extracting and refining words from a website for use in password cracking.
This script crawls web pages, extracts text, filters out common words, and prioritizes the most relevant terms. Instead of manually collecting words, the script does all the processing, generating a wordlist optimized for cracking passwords quickly.
The screen recording below demonstrates the script in action.
Here’s how it works step by step:
Crawls a Website – The script starts at a user-specified URL and follows internal links to gather text. You can set it to crawl all linked pages, limit it to a specific number of pages, or process only the given URL by setting the depth to 1.
Extracts Text – It pulls all visible words and numbers from the website.
Handles Phone Numbers and Addresses – The script recognizes and extracts numbers like phone numbers, breaking them into components (e.g., area code, prefix, and line number), which are often used in passwords.
Filters Out Stop Words (Common Words) – The script removes generic words that don’t add value to a password guess, such as "and," "the," or "this." You can customize the list to exclude specific words.
Sorts by Frequency – Words that appear frequently on the website are prioritized, since high-frequency terms are more likely to be part of a password.
Generates a Wordlist File – The final processed words are saved into a wordlist.txt file, ready for use with cracking tools like Hashcat or Aircrack-ng. You can combine this list with other wordlists or apply Hashcat rule-based attacks to modify and expand the guesses dynamically.
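To make the first two steps more concrete, here is a simplified sketch of collecting internal links and visible text from a single page. It is not the script's actual code: it uses Python's built-in html.parser instead of Beautiful Soup so it runs without extra packages, it works on an inline HTML string rather than fetching anything, and the class name and example domain are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collect same-site links and visible text from one HTML page."""

    SKIP = {"script", "style"}  # contents of these tags are not visible text

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.links = set()      # absolute URLs on the same host
        self.text_parts = []    # visible text fragments in page order
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                absolute = urljoin(self.base_url, href).split("#")[0]
                if urlparse(absolute).netloc == self.host:
                    self.links.add(absolute)  # keep internal links only

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


html = """
<html><head><style>p {color: red}</style></head><body>
<p>Justice for every client.</p>
<a href="/attorneys">Our Attorneys</a>
<a href="https://example.org/elsewhere">Partner site</a>
<script>var tracking = 1;</script>
</body></html>
"""

parser = PageParser("https://lawfirm.example/")
parser.feed(html)
print(sorted(parser.links))  # ['https://lawfirm.example/attorneys']
print(parser.text_parts)     # ['Justice for every client.', 'Our Attorneys', 'Partner site']
```

A crawler would repeat this for each collected link until it runs out of pages or hits the page limit.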
Instead of wasting time running a massive, generic wordlist that might not contain relevant words, this approach zeroes in on the words that actually matter. If a company is likely to use a variation of its business name, slogan, or founding year, this script will identify and prioritize those words.
For example, after running the script on a law firm’s website, the wordlist might contain:
Justice
Verdict
Attorney
Courtroom
Lawyer
MainStreet
5551234567
Verdict2025
This targeted approach dramatically reduces the number of guesses needed compared to testing millions of unrelated passwords.
Running the Script: Step-by-Step Guide
See the Wordlist Generator from URL repository on GitHub for details about the script, or clone it to your machine using the commands below.
Prerequisites
Python: Download and install the latest version from python.org.
Beautiful Soup and Requests: These are external packages used for parsing HTML and making HTTP requests.
Setup Instructions
Clone or Download the Repository
git clone https://github.com/dark-marc/password-cracking-wordlist-generator-from-url.git
cd password-cracking-wordlist-generator-from-url
Create a Virtual Environment
python -m venv venv
Activate the Virtual Environment
Windows:
venv\Scripts\activate
macOS/Linux:
source venv/bin/activate
Install Dependencies
pip install -r requirements.txt
How the Script Works
When the script starts, it presents an interactive admin panel that prompts you for:
Target URL: The website you want to scrape. The script automatically corrects the URL by adding "http://" or "https://" if necessary.
Stop Words: A list of common words to ignore. You can use a default list or provide your own custom, comma-separated list.
Page Limit: The number of pages to scan. You can enter a specific number or type "all" to process every linked page.
Save Options:
Confirm whether to save the results as a text file.
Specify how many of the top words to include in the output.
Enter the directory where the file "wordlist.txt" should be saved.
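The input handling behind those prompts might look something like the sketch below. The function names are hypothetical and the real script may behave differently; in particular, defaulting to "https://" when no scheme is given is an assumption made for this example.

```python
from typing import Optional


def normalize_url(raw: str) -> str:
    """Add a scheme when the user typed a bare domain (assumes https by default)."""
    raw = raw.strip()
    if raw.startswith(("http://", "https://")):
        return raw
    return "https://" + raw


def parse_stop_words(raw: str, default: set) -> set:
    """Turn 'and, the, this' into a lowercase set; fall back to the default if blank."""
    words = {w.strip().lower() for w in raw.split(",") if w.strip()}
    return words or set(default)


def parse_page_limit(raw: str) -> Optional[int]:
    """Return None for 'all' (no limit), otherwise a positive integer."""
    raw = raw.strip().lower()
    if raw == "all":
        return None
    limit = int(raw)
    if limit < 1:
        raise ValueError("page limit must be at least 1")
    return limit


DEFAULT_STOP_WORDS = {"and", "the", "this"}  # the examples named in this article

print(normalize_url("lawfirm.example"))                      # https://lawfirm.example
print(sorted(parse_stop_words(" Our, Firm ", DEFAULT_STOP_WORDS)))  # ['firm', 'our']
print(parse_page_limit("all"), parse_page_limit("5"))        # None 5
```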
Data Processing
The script then proceeds through these steps:
Link Collection – It scrapes the target URL to collect all internal links on the site.
Content Extraction – It visits each collected page and extracts all text and numbers.
Phone Number Extraction – The script uses a regular expression to identify phone numbers (with or without country codes). It captures the complete phone number and also extracts parts of the number (such as the area code, prefix, and line number). This is useful because these individual components are often used as password variants in cracking tools.
Filtering – It removes words that match the stop words (either the default set or your custom list).
Counting – It counts the frequency of each remaining word using a counter and sorts the words from most to least frequent.
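The phone number, filtering, and counting steps can be sketched as follows. The regular expression here is an illustrative one for US-style numbers and is not taken from the script; the stop words and sample text are also made up.

```python
import re
from collections import Counter

# US-style numbers such as 555-123-4567, (555) 123-4567 or +1 555 123 4567.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b"
)


def phone_tokens(text):
    """Return each full number plus its area code, prefix and line number."""
    tokens = []
    for area, prefix, line in PHONE_RE.findall(text):
        tokens += [area + prefix + line, area, prefix, line]
    return tokens


def count_words(text, stop_words):
    """Count alphabetic words of two or more letters that are not stop words."""
    words = re.findall(r"[A-Za-z]{2,}", text)
    kept = [w for w in words if w.lower() not in stop_words]
    return Counter(kept)


STOP_WORDS = {"and", "the", "this", "for", "call"}  # hypothetical stop list
sample = "Justice and the law. Justice for this client. Call (555) 123-4567."

print(phone_tokens(sample))                            # ['5551234567', '555', '123', '4567']
print(count_words(sample, STOP_WORDS).most_common(1))  # [('Justice', 2)]
```

Splitting the number into parts matters because people tend to reuse a fragment, such as the last four digits, rather than the whole number.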
Output
Saves the complete, sorted wordlist as wordlist.txt in the specified directory, with each word on its own line.
Displays a summary of the analysis in the terminal by printing the top 10 most frequent words along with their counts in JSON format.
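As a rough picture of this output stage, the sketch below writes a frequency-sorted wordlist.txt and builds a JSON summary of the top words. The function names and counts are invented, and the exact JSON layout the script prints is an assumption.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def save_wordlist(counts, directory, top_n=None):
    """Write words to wordlist.txt, most frequent first, one per line."""
    path = Path(directory) / "wordlist.txt"
    words = [word for word, _ in counts.most_common(top_n)]
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


def summary_json(counts, n=10):
    """Return the top n words and their counts as a JSON object string."""
    return json.dumps(dict(counts.most_common(n)), indent=2)


counts = Counter({"Justice": 12, "Attorney": 7, "Verdict": 4})  # made-up counts

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = save_wordlist(counts, tmp)
    print(out.read_text(encoding="utf-8").splitlines())  # ['Justice', 'Attorney', 'Verdict']

print(summary_json(counts))
```

One word per line is the format Hashcat and Aircrack-ng expect from a wordlist, so the file can be passed to them directly.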
Running the Script
Run the script using the command:
python wordlist-generator.py
Follow the on-screen prompts to generate your wordlist.
Ethical Use:
This script is intended for ethical security testing and research. Use it only on websites where you have explicit permission to test or as part of an authorized security audit. Its purpose is to help identify weak password choices and improve security—not for illegal access.
Next Steps: Using the Wordlist for Cracking
Now that you’ve generated a custom, targeted wordlist, the next step is using it in a real-world password-cracking scenario.
To see how to apply this wordlist with Hashcat and Aircrack-ng to crack WiFi passwords, check out the previous article where we go into detail on the full process:
➡ WiFi Password Cracking with Hashcat and Aircrack-ng
That guide covers everything from capturing WiFi handshakes to running dictionary attacks with your new wordlist. Happy cracking!
This was a fascinating read. It’s always wild to see just how easily weak passwords can be exploited with the right tools. It’s a good reminder that even a “strong” password isn’t always enough if it’s predictable. I’ve been telling people to use passphrases over complex strings, but I’m curious: in your opinion, what’s the best balance between security and practicality when it comes to Wi-Fi passwords? Love getting other cyber professionals’ perspectives.
Good stuff, and a good reminder to change passwords! We tend to get lazy with that. Just as an aside, a pen tester about ten years ago had a list of the most common passwords, and for men “superman” was right up there, usually with birth month and year. lol