Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seems like the rename from class_obj to dataset_type wasn't complete #3369

Closed
wants to merge 4 commits into from

Conversation

merelcht
Copy link
Member

Description

While working with datasets and using an environment where I had the kedro develop version installed I ran into an issue: "UnboundLocalError: cannot access local variable 'class_obj' where it is not associated with a value"

I think this happened because the rename done from class_obj to dataset_type in #3272 wasn't completely finished.

Development notes

Renamed all occurrences of class_obj in parse_dataset_definition to dataset_type.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@merelcht merelcht self-assigned this Nov 30, 2023
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a case that I can reproduce the error? I am surprised no tests catch the error.

This may be a bug for an edge case, class_obj was refactor into two different variables class_obj and dataset_type, so it's not just a rename. At some point class_obj is just str but then get recast as an object, so the refactoring keep them separately to make this clear.

I am probably missing an if somewhere but I can't see it clearly. It should be an easy fix if I have an example.

@merelcht
Copy link
Member Author

merelcht commented Dec 4, 2023

Do you have a case that I can reproduce the error? I am surprised no tests catch the error.

This may be a bug for an edge case, class_obj was refactor into two different variables class_obj and dataset_type, so it's not just a rename. At some point class_obj is just str but then get recast as an object, so the refactoring keep them separately to make this clear.

I am probably missing an if somewhere but I can't see it clearly. It should be an easy fix if I have an example.

@noklam I found the issue when I was running the dataset tests against the develop branch.

@noklam
Copy link
Contributor

noklam commented Dec 7, 2023

For the record, I can reproduce the error on kedro-datasets test

pytest tests/partitions/test_partitioned_dataset.py

It appears that from the test we expect type support both str or Python object, which I am not so sure about (the docstring said type should be the name of the class).

I found this from PartitionDataSet , which is more explicit about what type can be.

dataset: Underlying dataset definition. This is used to instantiate
the dataset for each file located inside the path.
Accepted formats are:
a) object of a class that inherits from AbstractDataset
b) a string representing a fully qualified class name to such class
c) a dictionary with type key pointing to a string from b),

However, currently we actually testing d), which is:

a dictionary with ``type`` key pointing to a class from a)

Is this an accident that we allow non-string type? We can either:

  1. Make sure type must be str, we shouldn’t support type: PythonClass
  2. Support both (Current behavior), but then we should update the documentation and add tests explicitly for this.
    Which one do you prefer?

@noklam noklam mentioned this pull request Dec 7, 2023
7 tasks
@noklam
Copy link
Contributor

noklam commented Dec 8, 2023

Close in favor of #3400

@merelcht
Copy link
Member Author

Closing in favour of #3400

@merelcht merelcht closed this Dec 11, 2023
@merelcht merelcht deleted the fix-incomplete-rename branch January 4, 2024 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants