Handle plugin registration failure ContextualVersionConflict with log instead of raising error #1542

Conversation

noklam
Contributor

@noklam noklam commented May 16, 2022

Signed-off-by: noklam [email protected]

Description

Fix #1487

Currently, when kedro is used with plugins, a version conflict between libraries stops kedro from loading at all. A better approach is to skip the affected plugins and emit warnings.

These errors can happen in at least 3 different ways.

  1. Conflicting library version specified in kedro itself. This could happen if you install kedro, then downgrade pip. In theory, if you use the Python API it will run successfully, but anything involving the entry points will fail. The main reason is that pkg_resources always validates the dependencies, and there seems to be no easy way to fix it.
  2. Conflicting library version coming from kedro-xxxxx plugins. This is the direct cause of this PR. The least intrusive way to deal with it is to log the error instead of terminating the entire program. I chose a higher log level because, for certain plugins, a failed load could mean the run is no longer valid (e.g. kedro-mlflow, or any plugin that adds behavior to a kedro run).
  3. Loading entry points when we want to call the __main__ of a kedro package programmatically.

Development notes

  1. This PR replaces the usage of pkg_resources with importlib-xxx for a few reasons.
    1.1. pkg_resources is an old library and may be deprecated soon; even the official pkg_resources website suggests using the importlib-xxx alternatives whenever possible. pluggy and pytest replaced pkg_resources with importlib too.
    1.2. Improve import speed: pkg_resources imports lots of things at the top level, which slows down the CLI. #1476
    1.3. Reduce the chance of ContextualVersionConflict-type errors. Unfortunately this is not fully solved: the validation still seems to happen if one of kedro's dependencies uses pkg_resources. dynaconf is one of them, and importing dynaconf will trigger the same error.
  2. When plugin registration fails, kedro will log the failure instead of terminating the program.
  3. Move setuptools to pyproject.toml (PEP-518; not really related to this PR, but I found this while studying how the Python build system works).
  4. micropkg still uses pkg_resources for requirement parsing. This is not handled now since it is out of scope.
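
The log-instead-of-raise behaviour from note 2 can be sketched roughly as below. This is a minimal sketch, not the code merged in this PR: `load_plugins_safely`, `FakeEntryPoint` and the plugin name `kedro-other` are hypothetical, and the entry points are passed in as a plain iterable so the example runs without any plugin installed.

```python
import logging
from typing import Any, Iterable, List, Optional

logger = logging.getLogger(__name__)


class FakeEntryPoint:
    """Hypothetical stand-in for an entry point object exposing .load()."""

    def __init__(self, name: str, error: Optional[Exception] = None):
        self.name = name
        self._error = error

    def load(self) -> str:
        # Simulate a plugin whose import fails, e.g. on a version conflict.
        if self._error is not None:
            raise self._error
        return f"{self.name} commands"


def load_plugins_safely(entry_points: Iterable[Any]) -> List[Any]:
    """Load every entry point; log and skip the ones that fail."""
    loaded = []
    for entry_point in entry_points:
        try:
            loaded.append(entry_point.load())
        except Exception as exc:  # pylint: disable=broad-except
            logger.warning(
                "Failed to load %s. Full exception: %s", entry_point.name, exc
            )
    return loaded


plugins = load_plugins_safely(
    [
        FakeEntryPoint("kedro-viz", error=RuntimeError("plotly version conflict")),
        FakeEntryPoint("kedro-other"),
    ]
)
```

The failing plugin is skipped with a warning while the remaining one is still loaded, which is the trade-off discussed in the description: the program keeps running, but a plugin that changes run behaviour may be silently missing apart from the log line.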

Compatibility issue:

https://bugs.python.org/issue44246
Unfortunately the entry_points() API is not consistent between Python versions, and importlib.metadata is only available with python>=3.8. I spent quite a while figuring this out and making all the tests pass. For now, the backward-compatible importlib_metadata seems to be the better option, but unlike importlib it is not part of the standard library.
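
A minimal sketch of the inconsistency, using only the standard library: on Python 3.8 and 3.9, `importlib.metadata.entry_points()` returns a plain dict keyed by group, while on 3.10 and later the returned object exposes `select()`. The helper name `entry_points_for_group` and the group name used in the call are hypothetical; this is not what the PR ships, since the PR uses the `importlib_metadata` backport throughout.

```python
from importlib import metadata  # standard library, Python >= 3.8


def entry_points_for_group(group: str) -> list:
    """Return the entry points registered under `group` (hypothetical helper)."""
    all_entry_points = metadata.entry_points()
    if hasattr(all_entry_points, "select"):
        # Newer API: Python 3.10+ (and recent importlib_metadata releases).
        return list(all_entry_points.select(group=group))
    # Older API: Python 3.8 and 3.9 return a dict keyed by group name.
    return list(all_entry_points.get(group, []))


# A group nobody registers simply yields an empty list on either API.
missing = entry_points_for_group("hypothetical.group.with.no.plugins")
```

Having to branch like this in every call site (and in every mock) is the extra complexity that sticking with a single `importlib_metadata` dependency avoids.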

This is how it looks if a plugin's registration fails. For example, I explicitly downgraded plotly so that kedro-viz is not registered successfully.
image

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@noklam noklam changed the title from "ContextualVersionConflict Error stop kedro to load" to "Replacing pkg_resources with importlib_metadata for entrypoints discovery" May 18, 2022
@noklam noklam changed the title from "Replacing pkg_resources with importlib_metadata for entrypoints discovery" to "Handle plugin registration failure with gentle logging instead of raising error" May 18, 2022
@noklam noklam changed the title from "Handle plugin registration failure with gentle logging instead of raising error" to "Handle plugin registration failure ContextualVersionConflict with log instead of raising error" May 19, 2022
@noklam noklam marked this pull request as ready for review May 19, 2022 12:56
@noklam noklam requested a review from idanov as a code owner May 19, 2022 12:56
@noklam noklam requested review from merelcht and antonymilne May 19, 2022 12:56
Comment on lines 336 to 345
-    except Exception as exc:
-        raise KedroCliError(f"Loading {name} commands from {entry_point}") from exc
+    except Exception as exc:  # pylint: disable=broad-except
+        logger.warning(
+            KedroCliError(f"Fail to load {name} commands from {entry_point}")
+        )
+        logger.warning(exc)
Contributor Author

This is the most important part. It's hard to determine from the framework side whether a kedro run is still valid if a plugin fails. For kedro-viz it is definitely ok to just ignore the failure, but for a kedro-mlflow type of plugin that actually modifies the run behavior it may be problematic, so I am using the warning level.

Contributor

Similar suggestion here to the other occurrence in cli.py.

pyproject.toml Outdated
Comment on lines 1 to 5
# PEP-518 https://peps.python.org/pep-0518/
[build-system]
# Minimum requirements for the build system to execute.
requires = ["setuptools>=38.0", "wheel"] # PEP 508 specifications.

Contributor Author

This is something I just came across while working on this PR; it is not particularly related to the PR.

Contributor

We've got "setuptools>=38.0" and wheel requirements in various other places in the repo (just do a Ctrl+F to see them). Does this mean we can remove them from there?

If yes then that would be nice but a bit of a bigger change, so probably worth doing in a separate PR.

Contributor Author

Good point, I created a new PR and will revert this soon.

@@ -299,37 +299,44 @@ def test_project_groups(self, entry_points, entry_point):
         entry_point.load.return_value = "groups"
         groups = load_entry_points("project")
         assert groups == ["groups"]
-        entry_points.assert_called_once_with(group="kedro.project_commands")
+        entry_points.return_value.select.assert_called_once_with(
Contributor Author

For the test, it's mainly just catching the log instead of catching the error. Due to the entry_points API, I have to mock 2 layers which makes the assertion a bit longer.
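
A rough sketch of the two-layer mock described above, using `unittest.mock` from the standard library. `load_entry_points_sketch` is a simplified, hypothetical stand-in for the function under test (it takes the `entry_points` callable as an argument instead of patching it), so the shape of the assertions is the point here, not the exact kedro code.

```python
import logging
from unittest import mock

logger = logging.getLogger("sketch.plugins")


def load_entry_points_sketch(entry_points, group):
    """Hypothetical stand-in: load each entry point, log the ones that fail."""
    loaded = []
    for entry_point in entry_points().select(group=group):
        try:
            loaded.append(entry_point.load())
        except Exception as exc:  # pylint: disable=broad-except
            logger.warning("Failed to load %s. Full exception: %s", entry_point, exc)
    return loaded


# Layer 1 is the entry_points() call, layer 2 is the .select(...) call on its result.
entry_point = mock.Mock()
entry_point.load.return_value = "groups"
entry_points = mock.Mock()
entry_points.return_value.select.return_value = [entry_point]

groups = load_entry_points_sketch(entry_points, "kedro.project_commands")

# Failure case: the test now checks for a log call instead of a raised error.
broken = mock.Mock()
broken.load.side_effect = RuntimeError("version conflict")
entry_points.return_value.select.return_value = [broken]
with mock.patch.object(logger, "warning") as warning:
    groups_on_failure = load_entry_points_sketch(entry_points, "kedro.project_commands")
    warning_calls = warning.call_count
```

The longer assertion target (`entry_points.return_value.select`) is a direct consequence of the chained `entry_points().select(group=...)` call.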

@noklam
Contributor Author

noklam commented May 19, 2022

I still have to make it work for py37... but feel free to drop comments.

noklam and others added 4 commits May 19, 2022 14:57
Signed-off-by: noklam <[email protected]>
Signed-off-by: noklam <[email protected]>
…ct-error-stop-kedro-running-when-dependency-clashes' into fix/1487-contextualversionconflict-error-stop-kedro-running-when-dependency-clashes

Signed-off-by: noklam <[email protected]>
@noklam noklam requested a review from SajidAlamQB May 19, 2022 15:05
Contributor

@antonymilne antonymilne left a comment

Looks good! Another ticket which turned out to be much harder than expected, great job tackling it ⭐

I think a more conventional way to deal with a backported package would be

  • try to import importlib.metadata and only if that fails import importlib_metadata
  • add python_version < '3.8' to importlib_metadata in requirements so that it's only installed if necessary

See, for example, https://github.com/pallets/click/pull/1890/files.

But I don't know if this is possible here because of the API change you mention 😬
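
For reference, the conventional backport pattern suggested here looks roughly like the sketch below. It assumes the backport would be declared with an environment marker such as `importlib_metadata; python_version < "3.8"` in the requirements; as the rest of the thread explains, the differing `entry_points()` API means this import alone would not be enough for this PR.

```python
try:
    # Standard library module, available on Python >= 3.8.
    from importlib import metadata as importlib_metadata
except ImportError:
    # Python 3.7: fall back to the backport installed from PyPI.
    import importlib_metadata

# Both modules expose entry_points(), but its return type differs by version.
has_entry_points = callable(importlib_metadata.entry_points)
```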

@@ -65,10 +68,9 @@ def info():
     plugin_versions = {}
     plugin_entry_points = defaultdict(set)
     for plugin_entry_point, group in ENTRY_POINT_GROUPS.items():
-        for entry_point in pkg_resources.iter_entry_points(group=group):
+        for entry_point in importlib_metadata.entry_points().select(group=group):
Contributor

Could this also raise some kind of exception? Or is it just the entry_point.load that we should wrap in the try?

Contributor

Though I guess this is just kedro info rather than being triggered on every single kedro command like the other instances so it doesn't matter if there's uncaught exceptions anyway.

Contributor Author

@noklam noklam May 19, 2022

This is indeed the solution I went with initially, but later I found out the API was not consistent. So technically only Python 3.10 can use the stdlib.

Originally I had this conditional import block in kedro.utils; otherwise I would have to do it conditionally everywhere. Do you think it is the right place to do so? 3c12bcc

image

Contributor Author

It could fail, but like you said it should only affect kedro info, unlike the other plugin cases where the program still runs without loading the plugin. I think it is ok to leave it.

Contributor

Urghh, what a pain. I think what you did in 3c12bcc was ok, although probably it's more usual to just repeat the conditional import in multiple files.

BUT if the standard library API is only right in Python 3.10 then let's just forget about it and go for importlib_metadata all the way like you're doing now 👍

Contributor Author

I have considered that, but it also means we have to copy the same block into multiple files, and also anywhere that uses mocker.patch, which is quite hard to read.

For context, I started this PR with importlib_metadata, so I wasn't aware of the inconsistent API between Python versions. In theory, if I avoid using the select API, it could be compatible with python3.8-3.10. But in both cases we still need the conditional import, so I would rather just stick with importlib_metadata, unless this extra dependency causes trouble.

Contributor

@antonymilne antonymilne May 19, 2022

Yeah, that totally makes sense. I'm definitely happy to go with importlib_metadata throughout. It's not worth the extra complexity doing it with standard library only for Python 3.10.

Comment on lines 108 to 109
logger.warning(KedroCliError(f"Fail to initialize {entry_point}"))
logger.warning(exc)
Contributor

Suggested change
- logger.warning(KedroCliError(f"Fail to initialize {entry_point}"))
- logger.warning(exc)
+ logger.warning("Failed to initialise %s. Full exception: %s", entry_point, exc)

Unless there's a good reason to use the KedroCliError still?

@@ -98,12 +100,13 @@ def docs():

 def _init_plugins():
     group = ENTRY_POINT_GROUPS["init"]
-    for entry_point in pkg_resources.iter_entry_points(group=group):
+    for entry_point in importlib_metadata.entry_points().select(group=group):
Contributor

Would it make sense to refactor this function a bit so it uses load_entry_points in utils.py? At a glance they look very similar.

Contributor Author

The original test for _init_plugins was actually testing the load exception, which is incorrect; it should test the error raised during hook initialisation instead.

init_hook = entry_point.load()  # previously the error was triggered here
init_hook()  # this is what we want to test instead

entry_point.module_name = "bob.fred"

result = CliRunner().invoke(cli, ["info"])
print(result.output)
Contributor

Suggested change
- print(result.output)

@noklam
Contributor Author

noklam commented May 19, 2022

A couple of notes from the reviews:

  • Refactor the entry_points loop shared by load_entry_points() and _init_plugins()
  • Revert the PEP-518 changes and do them in a separate PR instead.
  • Consider just logging the error instead of using KedroCliError

@noklam noklam requested a review from antonymilne May 20, 2022 11:31
@noklam noklam removed the request for review from idanov May 20, 2022 12:11
Member

@merelcht merelcht left a comment

Great work! ⭐ Thanks for the thorough description on the PR, that made it very easy to understand the changes made.
I'm happy to see the move to importlib_metadata now that pkg_resources doesn't seem to be the proper standard anymore.

Contributor

@antonymilne antonymilne left a comment

Really great work! Just one small comment. And also don't forget to add to RELEASE.md.

         init_hook()
-    except Exception as exc:
-        raise KedroCliError(f"Initializing {entry_point}") from exc
+    except Exception as exc:  # pylint: disable=broad-except
Contributor

Now you've made it clearer exactly what raises the exception here, I actually think we should remove the try/except altogether and just do a plain

for init_hook in init_hooks:
    init_hook()

Reasoning:

  • exceptions raised by entry_point.load() are now handled by load_entry_points
  • exceptions raised by init_hook() should still be raised, not hidden and logged, and there's not much point wrapping them in KedroCliError like we used to. This is analogous to what happens e.g. if a plugin with a hook raises an exception - it doesn't get caught and logged or converted into a kedro error message

All this is kind of minor because realistically no one uses init_hooks and it's probably redundant now we have before_command_run anyway... But if it's not too annoying to change I think this would be a small improvement.
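
The split of responsibilities argued for above can be sketched as follows: loading failures are handled (logged) elsewhere, while an exception raised by a hook itself simply propagates. `run_init_hooks`, `good_hook` and `failing_hook` are hypothetical names for illustration only.

```python
def run_init_hooks(init_hooks):
    """Call every already-loaded init hook; hook errors are not caught."""
    for init_hook in init_hooks:
        init_hook()


calls = []


def good_hook():
    calls.append("good")


def failing_hook():
    raise ValueError("hook failed")


# The second hook's exception reaches the caller unchanged, with no wrapping
# in a kedro-specific error and no logging in between.
try:
    run_init_hooks([good_hook, failing_hook])
except ValueError as exc:
    propagated = str(exc)
else:
    propagated = None
```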

Contributor Author

Good catch, I think it is actually wrong to catch the error; the error should be raised!

noklam added 3 commits May 20, 2022 17:11
…ct-error-stop-kedro-running-when-dependency-clashes' into fix/1487-contextualversionconflict-error-stop-kedro-running-when-dependency-clashes

Signed-off-by: noklam <[email protected]>
Signed-off-by: noklam <[email protected]>
@noklam noklam merged commit 8b357c9 into main May 20, 2022
@noklam noklam deleted the fix/1487-contextualversionconflict-error-stop-kedro-running-when-dependency-clashes branch May 20, 2022 16:39
Development

Successfully merging this pull request may close these issues.

ContextualVersionConflict Error Stop Kedro running when dependency clashes
3 participants