artvee-scraper is an easy to use library for fetching public domain artwork from Artvee.
Artvee-scraper is a web scraper which concurrently extracts artwork from Artvee. Callbacks are notified asynchronously for each scraped artwork so that user-defined actions may be taken. These actions are typically used to store the artwork, which can subsequently be used for display, machine learning, or other applications.
If you are seeking a command line utility, please note that it has been relocated to a separate project - artvee-scraper-cli. Alternatively, you may still use artvee-scraper 3.0.1.
Using PyPI
$ python -m pip install artvee-scraper
Python 3.10+ is officially supported.
- Create callbacks (lambda, function, method).
# Use a lambda to log the event log_event = lambda artwork, thrown: logger.info( "Processing '%s' by %s", artwork.title, artwork.artist ) # Write the artwork to a file as JSON format def on_artwork_received(artwork: Artwork, thrown: Exception | None = None) -> None: if thrown is None: with open(f"/tmp/{artwork.resource}.json", "w", encoding="UTF-8") as fout: json.dump(artwork.to_dict(), fout, ensure_ascii=False)
- Initialize the scraper.
scraper = ArtveeScraper() # scrapes all categories by default
- Register callbacks. The callbacks will be notified asynchronously for each event in the order that they are registered.
scraper.register_listener(log_event).register_listener(on_artwork_received)
- Start scraping. Use either the context manager construct, or join to block until done.
Example 1 - using context manager
with scraper as s: s.start() # blocks until done
Example 2 - using join()
scraper.start() ... // do other things scraper.join() # blocks until done
Create app.py
import logging
import os
from artvee_scraper.artvee_client import CategoryType
from artvee_scraper.artwork import Artwork
from artvee_scraper.scraper import ArtveeScraper
# Set up logging configuration
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)s.%(msecs)03d %(levelname)s [%(threadName)s] %(module)s.%(funcName)s(%(lineno)d) | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S"
)
logger = logging.getLogger(__name__)
def handle_event(artwork: Artwork, thrown: Exception | None = None) -> None:
"""A callback for handling the result of an artwork processing event."""
if thrown is not None:
# An error occurred; the artwork is partially populated (missing artwork.image.raw)
logger.error("Failed to process artist=%s, title=%s, url=%s; %s", artwork.artist, artwork.title, artwork.url, thrown)
else:
file_path = os.path.expanduser(f"~/Downloads/{artwork.resource}.jpg") # create a unique filename
logger.info("Writing %s to %s", artwork.title, file_path)
# Write the raw image bytes to a file.
with open(file_path, "wb") as fout:
fout.write(artwork.image.raw)
def main():
# Choose which categories to scrape. Using `list(CategoryType)` creates a list of all categories.
categories = [CategoryType.ABSTRACT, CategoryType.DRAWINGS]
# Initialize the scraper
scraper = ArtveeScraper(categories=categories)
# Register listener functions
scraper.register_listener(handle_event)
# Start scraping
with scraper as s:
s.start() # blocks until done
if __name__ == "__main__":
main()
Run app.py
me@linux-desktop:~$ python app.py
2038-01-19 19:36:36.839 DEBUG [MainThread] scraper.start(125) | Starting
2038-01-19 19:36:36.839 DEBUG [Thread-1 (_exec)] scraper._exec(152) | Executing scraper for categories [<CategoryType.ABSTRACT: 'abstract'>, <CategoryType.DRAWINGS: 'drawings'>]
2038-01-19 19:36:36.839 DEBUG [Thread-1 (_exec)] artvee_client.get_page_count(113) | Retrieving page count; category=abstract
2038-01-19 19:36:36.854 DEBUG [Thread-1 (_exec)] connectionpool._new_conn(1051) | Starting new HTTPS connection (1): artvee.com:443
2038-01-19 19:36:37.737 DEBUG [Thread-1 (_exec)] connectionpool._make_request(546) | https://artvee.com:443 "GET /c/abstract/page/1/?per_page=70 HTTP/11" 301 0
2038-01-19 19:36:37.827 DEBUG [Thread-1 (_exec)] connectionpool._make_request(546) | https://artvee.com:443 "GET /c/abstract/?per_page=70 HTTP/11" 200 19573
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] scraper._exec(160) | Category abstract has 108 page(s)
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] scraper._exec(166) | Processing category abstract, page (1/108)
2038-01-19 19:36:37.955 DEBUG [Thread-1 (_exec)] artvee_client.get_metadata(152) | Retrieving metadata; category=abstract, page=1
...
API documentation is available on Read the Docs.