This repository contains two scripts for downloading and extracting content from publicly accessible Issuu pages. These scripts are intended for educational purposes only. Please ensure you comply with Issuu's terms of service when using these tools.
- `downloader.py`: A Python script that downloads an Issuu document as a PDF by fetching its individual pages and combining them into a single PDF.
- `url-scraper.js`: A JavaScript snippet, run in the browser console, that extracts all publication URLs from an Issuu user's page.
The Python script downloads pages from an Issuu document (provided via a URL) as images and converts them into a single PDF file. The tool supports both individual URLs and batch processing from a file.
Requirements:

- Python 3.x
- `requests` library
- `img2pdf` library
- `re` and `os` (built-in modules)
You can install the necessary dependencies by running:
```
pip install requests img2pdf
```
Usage:

- Direct URL input: when prompted, enter an Issuu document URL in the format `https://issuu.com/username/docs/document_id`.
- Batch processing: if you have a file containing multiple Issuu URLs (one per line), press Enter at the URL prompt and then enter the file path.
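The URL and batch-file handling described above might look roughly like the following minimal sketch. It is not the code in `downloader.py`: the regular expression, the helper names (`parse_issuu_url`, `read_url_file`, `collect_urls`) and the wording of the second prompt are hypothetical, and it assumes URLs follow the `https://issuu.com/username/docs/document_id` format.

```python
import re

# Hypothetical pattern for https://issuu.com/username/docs/document_id URLs;
# any query string or fragment after the document ID is ignored.
ISSUU_DOC_URL = re.compile(r"^https?://(?:www\.)?issuu\.com/([^/]+)/docs/([^/?#]+)")


def parse_issuu_url(url):
    """Return (username, document_id) for an Issuu document URL, or None if it does not match."""
    match = ISSUU_DOC_URL.match(url.strip())
    return match.groups() if match else None


def read_url_file(path):
    """Read a batch file with one URL per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def collect_urls():
    """Mirror the prompt flow: type a URL, or press Enter and then give a file path."""
    entry = input("Enter the Issuu PDF URL (or press Enter to input a file path): ").strip()
    if entry:
        return [entry]
    # Hypothetical second prompt; the real script's wording may differ.
    path = input("Enter the file path: ").strip()
    return read_url_file(path)
```

For example, `parse_issuu_url("https://issuu.com/someuser/docs/document_id")` returns `("someuser", "document_id")`, and the second element is what an output name such as `document_id.pdf` would be derived from.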
The downloaded PDF is saved in the current directory and named after the document ID. For example:

```
Enter the Issuu PDF URL (or press Enter to input a file path): https://issuu.com/someuser/docs/document_id
```

The output file will be saved as `document_id.pdf`.

How it works:
- The script first fetches the document's JSON data.
- It then extracts the image URLs of the document pages.
- The images are downloaded and combined into a single PDF.
- Temporary images are cleaned up after conversion.
This JavaScript snippet can be used to extract all publication URLs from an Issuu user's page. It needs to be run in the browser's console on an Issuu user profile page.
- Navigate to the Issuu user's page (e.g., https://issuu.com/jotunpaintsarabia).
- Open the developer console (press F12 or Ctrl + Shift + I in most browsers).
- Paste the contents of `url-scraper.js` into the console and press Enter.
- The console will output the URLs of all publications found on the page.
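```javascript
/* JavaScript code to paste into the browser console */
// Select every link inside a publication card on the profile page
const links = document.querySelectorAll('div[data-testid="publication-card"] a');
// A card may contain more than one link to the same publication, so keep each URL once
const urls = [...new Set(Array.from(links, (link) => link.href))];
// Log each unique URL with a 1-based index
urls.forEach((url, index) => {
  console.log(`Link ${index + 1}: ${url}`);
});
```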
- Educational Use Only: These tools are intended for educational purposes. Always respect the terms of service of Issuu and other websites.
- Responsibility: The repository author is not responsible for any misuse of these tools.
This project is licensed under the MIT License - see the LICENSE file for details.