Processor options may have incorrect column options #363

Open
dale-wahl opened this issue May 18, 2023 · 2 comments

Comments

@dale-wahl
Member

Describe the bug
Discovered in a dataset whose map_item returned {} for the first item. This causes a number of issues in itself, but the first to surface was get_item_keys, which returned an empty list and caused various failures. Some very odd behavior was also noted with the processor options on this bad dataset: they offered different column options than expected. It seems probable that something fails when get_columns is called (which in this case returns []) and, instead of raising an error, the options from another dataset, or perhaps from an earlier call of get_options, are used. Likely something is not being updated properly because of the [].

Interestingly, I was able to modify get_item_keys to correctly return a list of keys (by iterating map_item until something other than {} was returned) and updated the view_datasets preview function to use this method. After that, the processor options were loaded properly from that dataset. In my opinion they should not have been, since get_columns was not updated and it collects keys directly from the first item in the dataset!
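The get_item_keys workaround described above could look roughly like the sketch below. This is not the project's actual code: the standalone map_item, its field names, and the signature of get_item_keys are hypothetical stand-ins chosen only to show the "skip empty mapped items" idea.

```python
def map_item(raw):
    """Hypothetical mapper: returns {} for items it cannot map."""
    if "id" not in raw:
        return {}
    return {"id": raw["id"], "body": raw.get("text", "")}


def get_item_keys(raw_items):
    """Return the keys of the first item that maps to a non-empty dict."""
    for raw in raw_items:
        mapped = map_item(raw)
        if mapped:  # skip items that mapped to {}
            return list(mapped.keys())
    return []  # nothing in the dataset could be mapped


# The first item cannot be mapped, the second one can.
items = [{"broken": True}, {"id": 1, "text": "hello"}]
keys = get_item_keys(items)
```

An empty list is still returned when no item maps at all, so callers would need to handle that case explicitly.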

@dale-wahl
Member Author

It's some dictionary sharing across classes.

When we call get_options, we usually set options = cls.options and then check if parent_dataset and parent_dataset.get_columns(): before updating options. That condition is falsy if get_columns returns an empty list ([]), so get_options just returns cls.options, which is whatever it was last set to.
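A minimal reproduction of that pattern, assuming a simplified get_options signature, is sketched below. FakeDataset, ExampleProcessor, and the shape of the options dictionary are hypothetical; only the `options = cls.options` assignment and the `if` check are taken from the description above.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset; only get_columns() matters here."""

    def __init__(self, columns):
        self._columns = columns

    def get_columns(self):
        return self._columns


class ExampleProcessor:
    # Class-level dict: every call to get_options() sees this same object.
    options = {"column": {"type": "choice", "options": {}}}

    @classmethod
    def get_options(cls, parent_dataset=None):
        options = cls.options  # a reference to the class dict, not a copy
        if parent_dataset and parent_dataset.get_columns():
            columns = parent_dataset.get_columns()
            # This mutates cls.options in place.
            options["column"]["options"] = {c: c for c in columns}
        return options


good = FakeDataset(["date", "item", "value"])
bad = FakeDataset([])  # get_columns() returns [], so the branch is skipped

first = list(ExampleProcessor.get_options(good)["column"]["options"])
stale = list(ExampleProcessor.get_options(bad)["column"]["options"])
```

Here `stale` holds the columns of `good` even though it was requested for `bad`, which matches the "whatever it was set to last" behavior.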

options = {} is actually defined in BasicProcessor, so it's possible that all of our subclasses can access the same options dictionary. We just happen to be setting and updating them before we use them. Not ideal, and possibly difficult to fix without breaking anything else. Unsure.
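The sharing through the base class can be shown in isolation. In this sketch BasicProcessor is reduced to the single attribute mentioned above, and ProcessorA and ProcessorB are hypothetical subclasses that do not define their own options.

```python
class BasicProcessor:
    options = {}  # defined once, on the base class


class ProcessorA(BasicProcessor):
    pass  # no own `options`, so lookups fall through to BasicProcessor


class ProcessorB(BasicProcessor):
    pass


# In-place mutation through one subclass...
ProcessorA.options["column"] = ["date", "item", "value"]

# ...is visible through the other, because both names resolve to one dict.
shared = ProcessorA.options is ProcessorB.options
seen_by_b = ProcessorB.options.get("column")
```

A subclass that assigns its own `options = {...}` in its class body does not share with its siblings, but that dictionary is still shared between every call made on that subclass.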

@dale-wahl
Member Author

Removing options = {} from BasicProcessor is not enough. Somehow options are still being shared. Interestingly, I ran a rank values processor and the column options afterwards were "date", "item", and "value", as if that dataset was being used (is get_options being called on that dataset somewhere? Perhaps to find available sub processors...).
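One possible explanation, not confirmed in this thread, is that each subclass's own options dictionary is still mutated in place by get_options, so values from an earlier call on another dataset persist. If that is the cause, copying the class dictionary on every call would isolate the calls from each other. The sketch below assumes a simplified signature; FakeDataset and ExampleProcessor are hypothetical.

```python
import copy


class FakeDataset:
    """Hypothetical stand-in for a dataset; only get_columns() matters here."""

    def __init__(self, columns):
        self._columns = columns

    def get_columns(self):
        return self._columns


class ExampleProcessor:
    options = {"column": {"type": "choice", "options": {}}}

    @classmethod
    def get_options(cls, parent_dataset=None):
        # Deep copy so nested dicts are not shared with the class attribute.
        options = copy.deepcopy(cls.options)
        columns = parent_dataset.get_columns() if parent_dataset else []
        if columns:
            options["column"]["options"] = {c: c for c in columns}
        return options


with_columns = ExampleProcessor.get_options(FakeDataset(["date", "item", "value"]))
without_columns = ExampleProcessor.get_options(FakeDataset([]))
```

A shallow copy would not be enough in this shape, because the nested "column" dictionary would still be the same object as the one on the class.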
