Fix #3314 duplicate code error only shows up with pylint jobs 1 #3458

Commits on Oct 27, 2020

  1. b3f9228
  2. mapreduce| Adds map/reduce functionality to SimilarChecker

    Before adding a new mixin, this proves the concept works, adding tests as
    examples of how this would work in the main linter.
    
    The idea here is that, because `check_parallel()` uses a multiprocess
    `map` function, the natural follow-on is to use a `reduce`
    paradigm. This should demonstrate that; a standalone sketch of the
    paradigm follows this entry.
    doublethefish committed Oct 27, 2020 (796b293)
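The map/reduce paradigm that the commit above refers to can be illustrated with a minimal, self-contained sketch. The names `collect_stats` and `merge_stats` and the toy inputs are hypothetical, chosen only for illustration, and are not part of pylint's code:

```python
# A minimal, standalone illustration of the paradigm described above, not
# pylint code: `collect_stats` (the "map" step) and `merge_stats` (the
# "reduce" step) are hypothetical names, and the inputs are toy data.
import multiprocessing
from collections import Counter

def collect_stats(lines):
    # "map" step: each worker summarises one chunk (e.g. one file's lines)
    return Counter(line.strip() for line in lines if line.strip())

def merge_stats(partials):
    # "reduce" step: combine per-worker results into one cross-file view,
    # which is what a duplicate-code check needs
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    chunks = [["x = 1", "y = 2"], ["x = 1", "z = 3"]]  # toy "files"
    with multiprocessing.Pool(2) as pool:
        partials = pool.map(collect_stats, chunks)
    duplicates = {line: n for line, n in merge_stats(partials).items() if n > 1}
    print(duplicates)  # lines seen in more than one chunk
```

The point is that each worker's `map` result is only partial; only the reduce step has the cross-file view that a duplicate-code check needs.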

Commits on Oct 28, 2020

  1. mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)

    This integrates the map/reduce functionality into lint.check_process().
    
    We previously had `map` being invoked, here we add `reduce` support.
    
    We do this by collecting the map-data by worker and then passing it to a
    reducer function on the Checker object, if available - determined by
    whether it conforms to the `mapreduce_checker.MapReduceMixin` mixin
    interface or not (a sketch of this dispatch follows this entry).
    
    This allows Checker objects to function across file-streams when using
    multiprocessing/-j2+. For example SimilarChecker needs to be able to
    compare data across all files.
    
    The tests, which we also add here, check that a Checker instance returns
    and reports the expected data and errors, such as error messages and
    stats - at least in an exit-ok (0) situation.
    
    On a personal note, as we are copying more data across process
    boundaries, I suspect that the memory implications of this might cause
    issues for large projects already running with -jN and duplicate-code
    detection on. That said, given that it takes a long time to lint large
    code bases, that is an issue for the [near?] future and likely to be
    part of the performance work. Either way, let's get it working first and
    deal with memory and performance considerations later - I say this
    because there are many quick wins we can make here, e.g. file-batching,
    hashing lines, data compression and so on.
    doublethefish committed Oct 28, 2020 (cd21a1d)
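As a rough sketch, under stated assumptions, of the dispatch the commit above describes: map data is collected per worker and then handed to a reducer, but only for checkers that conform to a `MapReduceMixin`-style interface. The class, checker, and helper names below are illustrative and not pylint's actual implementation:

```python
from collections import defaultdict

class MapReduceMixin:
    """Interface a checker conforms to in order to take part in the reduce step."""
    def get_map_data(self):
        raise NotImplementedError
    def reduce_map_data(self, linter, data):
        raise NotImplementedError

class DemoSimilarChecker(MapReduceMixin):
    """Toy stand-in for a cross-file checker such as SimilarChecker."""
    name = "similarities"
    def __init__(self):
        self.lines = []
    def get_map_data(self):
        # per-worker "map" data: the lines this worker saw
        return list(self.lines)
    def reduce_map_data(self, linter, data):
        # flatten the per-worker chunks and report duplicates across all files
        merged = [line for chunk in data for line in chunk]
        duplicates = {line for line in merged if merged.count(line) > 1}
        print(f"[{linter}] duplicate lines across workers: {duplicates}")

def merge_worker_results(linter, checkers, worker_results):
    """Collate {checker_name: map_data} dicts from each -jN worker, then reduce."""
    collated = defaultdict(list)
    for result in worker_results:
        for name, data in result.items():
            collated[name].append(data)
    for checker in checkers:
        # only checkers conforming to the mixin interface get the reduce step
        if isinstance(checker, MapReduceMixin) and checker.name in collated:
            checker.reduce_map_data(linter, collated[checker.name])

if __name__ == "__main__":
    checker = DemoSimilarChecker()
    # pretend two -j2 workers each produced map data for different files
    worker_results = [
        {"similarities": ["x = 1", "y = 2"]},
        {"similarities": ["x = 1", "z = 3"]},
    ]
    merge_worker_results("linter", [checker], worker_results)
```

This mirrors the idea in the commit message: per-worker results stay isolated through the map phase, and only checkers that opt in via the mixin get a chance to see the combined, cross-file data.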