Added display_crawl_results #20

Open
wants to merge 9 commits into
base: master

Commits on Apr 9, 2021

  1. Removed collect_content from PySparkS3Dataset

    Downloading files via the SparkContext was much slower than
    downloading via boto (which is what S3Dataset does).
    So now both classes use the same method, as PySparkS3Dataset
    inherits from S3Dataset.
    Stefan Zabka committed Apr 9, 2021
    33bb9a2
  2. Added mode parameter to PySparkS3Dataset

    This parameter allows filtering out VisitIds that are part of
    `incompleted_visits` or that had a command with a command_status other than
    "ok", since users probably shouldn't consider them for analysis.
    
    This filtering functionality is extracted into the TableFilter class to
    be reused by other Datasets.
    vringar authored and Stefan Zabka committed Apr 9, 2021
    cb8a25f
  3. Added display_crawl_results

    vringar authored and Stefan Zabka committed Apr 9, 2021
    c43cee8
  4. Rewrote crawlhistory.py

    vringar authored and Stefan Zabka committed Apr 9, 2021
    5925ac9
  5. Used type annotations

    vringar authored and Stefan Zabka committed Apr 9, 2021
    5639239
  6. Fixing display_crawl_history

    Stefan Zabka committed Apr 9, 2021
    2312e0e
  7. Added docstrings

    Stefan Zabka committed Apr 9, 2021
    3deea98
  8. Added demo file

    Stefan Zabka committed Apr 9, 2021
    cb19511
  9. Backporting from next

    Stefan Zabka committed Apr 9, 2021
    247adea
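
The filtering described in commit 2 (dropping VisitIds that were incomplete or that had a command whose command_status is not "ok") might look roughly like the sketch below. This is not the PR's `TableFilter` implementation: it uses plain Python rather than PySpark, and the names `CrawlHistoryRow`, `visit_ids_to_exclude`, and `filter_rows`, as well as the `visit_id` row key, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class CrawlHistoryRow:
    """Hypothetical stand-in for one command record of a visit."""
    visit_id: int
    command_status: str


def visit_ids_to_exclude(
    incomplete_visit_ids: Iterable[int],
    crawl_history: Iterable[CrawlHistoryRow],
) -> Set[int]:
    """Collect VisitIds that are incomplete or had a non-"ok" command."""
    failed = {
        row.visit_id for row in crawl_history if row.command_status != "ok"
    }
    return set(incomplete_visit_ids) | failed


def filter_rows(rows: Iterable[Dict], excluded: Set[int]) -> List[Dict]:
    """Keep only rows whose visit_id is not in the excluded set."""
    return [row for row in rows if row["visit_id"] not in excluded]


# Example data: visit 2 is incomplete, visit 3 had a failing command.
history = [
    CrawlHistoryRow(visit_id=1, command_status="ok"),
    CrawlHistoryRow(visit_id=2, command_status="ok"),
    CrawlHistoryRow(visit_id=3, command_status="timeout"),
]
excluded = visit_ids_to_exclude([2], history)
table = [{"visit_id": 1}, {"visit_id": 2}, {"visit_id": 3}]
kept = filter_rows(table, excluded)
```

Computing the excluded set once and applying it to every table is one way such a filter could be shared between dataset classes, which is the reuse the commit message describes.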