Fix #3314 duplicate code error only shows up with pylint jobs 1 #3458

Commits on Oct 27, 2020

  1. b3f9228
  2. mapreduce| Adds map/reduce functionality to SimilarChecker

    Before adding a new mixin, this proves the concept works, adding tests as
    examples of how this would work in the main linter.
    
    The idea here is that, because `check_parallel()` uses a multiprocess
    `map` function, the natural follow-on is to use a `reduce`
    paradigm. This should demonstrate that; a standalone sketch of the
    paradigm follows this entry.
    doublethefish committed Oct 27, 2020 (796b293)
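The map/reduce paradigm that the commit above refers to can be illustrated with a minimal, self-contained sketch. The names `collect_stats` and `merge_stats` and the toy inputs are hypothetical, chosen only for illustration, and are not part of pylint's code:

```python
# A minimal, standalone illustration of the paradigm described above, not
# pylint code: `collect_stats` (the "map" step) and `merge_stats` (the
# "reduce" step) are hypothetical names, and the inputs are toy data.
import multiprocessing
from collections import Counter

def collect_stats(lines):
    # "map" step: each worker summarises one chunk (e.g. one file's lines)
    return Counter(line.strip() for line in lines if line.strip())

def merge_stats(partials):
    # "reduce" step: combine per-worker results into one cross-file view,
    # which is what a duplicate-code check needs
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    chunks = [["x = 1", "y = 2"], ["x = 1", "z = 3"]]  # toy "files"
    with multiprocessing.Pool(2) as pool:
        partials = pool.map(collect_stats, chunks)
    duplicates = {line: n for line, n in merge_stats(partials).items() if n > 1}
    print(duplicates)  # lines seen in more than one chunk
```

The point is that each worker's `map` result is only partial; only the reduce step has the cross-file view that a duplicate-code check needs.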

Commits on Oct 28, 2020

  1. mapreduce| Fixes -jN for map/reduce Checkers (e.g. SimilarChecker)

    This integrates the map/reduce functionality into lint.check_process().
    
    We previously had `map` being invoked, here we add `reduce` support.
    
    We do this by collecting the map-data by worker and then passing it to a
    reducer function on the Checker object, if available - determined by
    whether it conforms to the `mapreduce_checker.MapReduceMixin` mixin
    interface or not (a sketch of this dispatch follows this entry).
    
    This allows Checker objects to function across file-streams when using
    multiprocessing/-j2+. For example SimilarChecker needs to be able to
    compare data across all files.
    
    The tests, which we also add here, check that a Checker instance returns
    and reports the expected data and errors, such as error messages and
    stats - at least in an exit-ok (0) situation.
    
    On a personal note, as we are copying more data across process
    boundaries, I suspect that the memory implications of this might cause
    issues for large projects already running with -jN and duplicate-code
    detection on. That said, given that it takes a long time to lint large
    code bases, that is an issue for the [near?] future and likely to be
    part of the performance work. Either way, let's get it working first and
    deal with memory and performance considerations later - I say this
    because there are many quick wins we can make here, e.g. file-batching,
    hashing lines, data compression and so on.
    doublethefish committed Oct 28, 2020 (cd21a1d)
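As a rough sketch, under stated assumptions, of the dispatch the commit above describes: map data is collected per worker and then handed to a reducer, but only for checkers that conform to a `MapReduceMixin`-style interface. The class, checker, and helper names below are illustrative and not pylint's actual implementation:

```python
from collections import defaultdict

class MapReduceMixin:
    """Interface a checker conforms to in order to take part in the reduce step."""
    def get_map_data(self):
        raise NotImplementedError
    def reduce_map_data(self, linter, data):
        raise NotImplementedError

class DemoSimilarChecker(MapReduceMixin):
    """Toy stand-in for a cross-file checker such as SimilarChecker."""
    name = "similarities"
    def __init__(self):
        self.lines = []
    def get_map_data(self):
        # per-worker "map" data: the lines this worker saw
        return list(self.lines)
    def reduce_map_data(self, linter, data):
        # flatten the per-worker chunks and report duplicates across all files
        merged = [line for chunk in data for line in chunk]
        duplicates = {line for line in merged if merged.count(line) > 1}
        print(f"[{linter}] duplicate lines across workers: {duplicates}")

def merge_worker_results(linter, checkers, worker_results):
    """Collate {checker_name: map_data} dicts from each -jN worker, then reduce."""
    collated = defaultdict(list)
    for result in worker_results:
        for name, data in result.items():
            collated[name].append(data)
    for checker in checkers:
        # only checkers conforming to the mixin interface get the reduce step
        if isinstance(checker, MapReduceMixin) and checker.name in collated:
            checker.reduce_map_data(linter, collated[checker.name])

if __name__ == "__main__":
    checker = DemoSimilarChecker()
    # pretend two -j2 workers each produced map data for different files
    worker_results = [
        {"similarities": ["x = 1", "y = 2"]},
        {"similarities": ["x = 1", "z = 3"]},
    ]
    merge_worker_results("linter", [checker], worker_results)
```

This mirrors the idea in the commit message: per-worker results stay isolated through the map phase, and only checkers that opt in via the mixin get a chance to see the combined, cross-file data.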