Domain Collector

This script extracts the unique domains visited while you browse a given URL. It uses Playwright to open a browser, lets you interact with the page, and then saves the visited domains to a file.

Russian documentation

Features

Opens a URL in a browser.
Allows user interaction with the browser.
Extracts all unique domains visited during the browsing session.
Saves the list of unique domains to a text file.
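
The features above could be implemented roughly as follows. This is a minimal sketch, not the project's actual source: the names `domain_of` and `collect_domains` are hypothetical, and it assumes that domains are gathered by listening to the browser context's `request` event through Playwright's async API.

```python
import asyncio
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase hostname of a URL, or None if it has none (e.g. data: URLs)."""
    return urlparse(url).hostname


async def collect_domains(start_url):
    """Open start_url in Chromium and return the set of domains requested (hypothetical helper)."""
    # Imported lazily so domain_of() can be used without Playwright installed.
    from playwright.async_api import async_playwright

    domains = set()

    def on_request(request):
        host = domain_of(request.url)
        if host:
            domains.add(host)

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        # Assumption: every request made by any page in the context counts as a visited domain.
        context.on("request", on_request)
        page = await context.new_page()
        await page.goto(start_url)
        # Wait for Enter in a worker thread so browser events keep being processed.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, input, "Press Enter to close the browser...")
        await browser.close()
    return domains
```

Under these assumptions, `asyncio.run(collect_domains("https://ya.ru"))` would return a set of hostnames once Enter is pressed.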

Usage

To use the script, you need to have uv installed.

1. Install uv:

   Follow the instructions for your system at: https://docs.astral.sh/uv/getting-started/installation/

2. Install Playwright and Chromium:

   ```
   uvx playwright install chromium
   ```

3. Run the script:

   ```
   uvx domain-collector <URL>
   ```

   Replace <URL> with the URL you want to open in the browser. For example:

   ```
   uvx domain-collector https://ya.ru
   ```

   You can also provide a URL without the scheme (e.g., ya.ru), and the script will automatically add https://.

4. Interact with the browser:

   The script will open a browser window. You can interact with the page as you normally would.

5. Close the browser:

   After you are done browsing, press Enter in the terminal to close the browser and save the domains.

6. Output:

   The script will save the unique domains to a file named <domain>_domains.txt (e.g., ya_ru_domains.txt) in the same directory where you ran the script. If the file already exists, new domains will be added to the existing list, avoiding duplicates.
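
The scheme handling and the "add without duplicates" behaviour described in the steps above can be sketched as follows. The helper names `normalize_url` and `merge_domains` are hypothetical, and writing the merged list in sorted order, one domain per line, is an assumption rather than something the project documents.

```python
from pathlib import Path


def normalize_url(url):
    """Prepend https:// when the URL has no scheme."""
    if "://" not in url:
        return "https://" + url
    return url


def merge_domains(path, new_domains):
    """Merge new_domains into the file at path and return the resulting list."""
    path = Path(path)
    existing = set()
    if path.exists():
        # One domain per line; blank lines are ignored.
        lines = path.read_text(encoding="utf-8").splitlines()
        existing = {line.strip() for line in lines if line.strip()}
    # Assumption: the merged list is stored sorted for stable output.
    merged = sorted(existing | set(new_domains))
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```

For example, `normalize_url("ya.ru")` gives `"https://ya.ru"`, and calling `merge_domains` twice with overlapping sets leaves each domain in the file only once.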

Example

uvx domain-collector https://www.wikipedia.org

This will open the Wikipedia homepage in a browser. After you interact with the page and press Enter in the terminal, the script will close the browser and save the visited domains to a file named wikipedia_org_domains.txt.
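
The output file names in this README (ya_ru_domains.txt for https://ya.ru, wikipedia_org_domains.txt for https://www.wikipedia.org) are consistent with a rule like the one below. This is an inference from those two examples only: the function name `output_filename` is hypothetical, and stripping a leading `www.` and replacing dots with underscores is an assumption about how the script behaves.

```python
from urllib.parse import urlparse


def output_filename(url):
    """Derive '<domain>_domains.txt' from a URL that includes a scheme."""
    host = urlparse(url).hostname or ""
    # Assumption: a leading "www." is dropped, as the Wikipedia example suggests.
    if host.startswith("www."):
        host = host[len("www."):]
    return host.replace(".", "_") + "_domains.txt"
```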

Cleanup

To remove all artifacts, you can use the following commands:

uvx playwright uninstall --all
uv cache clean # Use with caution: this removes the entire uv cache

Dependencies

Python 3.7+
Playwright
argparse (part of the Python standard library)

License

This project is licensed under the MIT License - see the LICENSE file for details.
