In maintenance: disk is full! (2 TB)
The goal is to provide a complete archive of the Chrome Web Store with version history.
You can see the current status of what's archived and download the files here: dam.io/chrome-extensions-archive/
To install an extension, go to chrome://extensions/ and drop the file onto the page.
To avoid auto-updates, load it as an unpacked extension.
Files are named .zip, but they are the exact same .crx files stored on the store.
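Since the archived files open like regular zip archives, extracting one is enough to use the unpacked route. The sketch below is a minimal example and not part of the repo's scripts; the archive file name is a hypothetical placeholder, and it relies on Python's `zipfile` tolerating the small CRX header that may precede the zip data (it locates the archive directory from the end of the file).

```python
# Minimal sketch: extract a downloaded archive so it can be loaded via
# chrome://extensions -> Developer mode -> "Load unpacked".
# "some-extension.zip" is a hypothetical file name, not a real archive entry.
import zipfile
from pathlib import Path

archive = Path("some-extension.zip")
target = Path("unpacked") / archive.stem

target.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(archive) as zf:
    zf.extractall(target)

print("Load", target.resolve(), "via chrome://extensions (Developer mode, 'Load unpacked')")
```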
Scripts are Python 3.5+ only.
Install dependencies: `pip3 install -r req.txt`
Create some folders and initialize some files:

```sh
mkdir data
mkdir -p crawled/sitemap crawled/pages crawled/crx crawled/tmp
mkdir -p ../site/chrome-extensions-archive/ext
echo "{}" > data/not_in_sitemap.json
```
Crawling:

- `crawl_sitemap.py`: gets you the list of all the extensions in `data/sitemap.json`
- `crawl_crx.py`: uses `data/sitemap.json` to download the crx files (a rough download sketch follows below)
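For context on what `crawl_crx.py` has to do, here is a rough, stripped-down sketch of fetching a single .crx. It is not the repo's actual code: the `clients2.google.com` URL pattern and its query parameters are assumptions based on how Chrome itself fetches extensions and may change, and the extension ID below is a made-up placeholder.

```python
# Hedged sketch (not crawl_crx.py): download one .crx by extension ID.
import urllib.request

EXT_ID = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"  # hypothetical 32-char extension ID

# Assumed endpoint and parameters; "prodversion" is any plausible Chrome version.
url = (
    "https://clients2.google.com/service/update2/crx"
    "?response=redirect&acceptformat=crx2,crx3&prodversion=100.0"
    "&x=id%3D" + EXT_ID + "%26uc"
)

# The real scripts pick their own file naming; this path is just an example.
with urllib.request.urlopen(url) as resp, \
        open("crawled/crx/%s.zip" % EXT_ID, "wb") as out:
    out.write(resp.read())
```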
Site & stats:
scan_pages_history_to_big_list.py
: makesdata/PAGES.json
by scanning the pages you crawledcrx_stats.py
: makesdata/crx_stats.json
(what's currently stored)make_site.py
: usedata/crx_stats.json
+data/PAGES.json
to generate the sitemake_json_site.py
:data/crx_stats.json
+data/PAGES.json
to generate JSON
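As a rough illustration of the "what's currently stored" step, the sketch below just scans `crawled/crx/` and counts files and bytes. It is not the repo's `crx_stats.py`; the output keys and file name are made up for illustration, and the real `data/crx_stats.json` format is whatever the repo's scripts produce.

```python
# Hedged sketch (not crx_stats.py): summarize what is stored under crawled/crx/.
import json
from pathlib import Path

files = [p for p in Path("crawled/crx").rglob("*") if p.is_file()]

stats = {
    "file_count": len(files),                             # hypothetical key
    "total_bytes": sum(p.stat().st_size for p in files),  # hypothetical key
}

# Written to a separate example file so it does not clash with data/crx_stats.json.
Path("data/example_stats.json").write_text(json.dumps(stats, indent=2))
print(stats)
```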
Then I serve the files directly with nginx (see the nginx.conf file for an example).
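For a quick local preview before putting nginx in front, the generated site can also be served with Python's built-in static file server; this is only a development convenience, not how the archive is served in production:

```python
# Serve the generated site locally for testing (production uses nginx).
import os
from http.server import HTTPServer, SimpleHTTPRequestHandler

os.chdir("../site/chrome-extensions-archive")  # folder created during setup
HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler).serve_forever()
```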
I have a few things in mind for the future:
- diff of extension versions as a web interface
- malware/adware analysis
- running an alternative web store (better search, Firefox support, ...)
Don't hesitate to reach out (here in the issues, [email protected], or @dam_io on Twitter).
To propose changes, just open a PR.