
Feature/csckan-311 - add custom flags for full and label imports #316

Closed
wants to merge 1 commit into from

Conversation


@D-GopalKrishna D-GopalKrishna commented Oct 10, 2024

Jira link - SCKAN-311

Task description:

  1. Add flags for full and label imports.
  2. Update the neurondm script so that ingest_statement uses log_error() to log errors in the terminal.
    a. The logic also falls back to default configurations if flags are not provided manually.
  3. Add a missing migration.
  4. Update the docs accordingly.
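A minimal sketch of points 1 and 2a, using stdlib `argparse` (which Django's `add_arguments` wraps under the hood). The flag names and the default lists are assumptions drawn from the README excerpt discussed in this PR, not the PR's exact code:

```python
import argparse

# Assumed defaults, based on the README excerpt; not necessarily the PR's exact values.
DEFAULT_FULL_IMPORTS = ["apinat-partial-orders", "apinat-pops-more", "sparc-nlp"]
DEFAULT_LABEL_IMPORTS = ["apinatomy-neuron-populations", "../../npo"]

def parse_import_flags(argv):
    """Parse the custom import flags, falling back to defaults when absent."""
    parser = argparse.ArgumentParser()
    # nargs="*" lets the caller pass zero or more import names after each flag
    parser.add_argument("--full_imports", nargs="*", default=None)
    parser.add_argument("--label_imports", nargs="*", default=None)
    opts = parser.parse_args(argv)
    # Default configurations are used when the flags are not provided manually
    full_imports = opts.full_imports if opts.full_imports is not None else DEFAULT_FULL_IMPORTS
    label_imports = opts.label_imports if opts.label_imports is not None else DEFAULT_LABEL_IMPORTS
    return full_imports, label_imports

full, label = parse_import_flags(["--full_imports", "apinat-partial-orders"])
print(full)   # ['apinat-partial-orders']
print(label)  # falls back to DEFAULT_LABEL_IMPORTS
```

In a Django management command the same `parser.add_argument` calls would live in `add_arguments`, with `handle` reading the values out of `options`.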

Confirmed that the tests in test_neurondm_processing.py are not affected by the new changes.

@ddelpiano - do we need to update the test?

@D-GopalKrishna
Contributor Author

Upon inspection, I found the following behavior. Please let me know @ddelpiano if I should look into this. thanks!

2024-10-10.20-38-31.mp4


```
def handle(self, *args, **options):
    update_upstream = options['update_upstream']
    update_anatomical_entities = options['update_anatomical_entities']
    should_overwrite = options['overwrite']
```
Member

issue: I assume this is a leftover from another task. We should probably remove it from this task since it's making the command fail.

Contributor Author

thanks @afonsobspinto !

README.md Outdated
Here, `apinat-partial-orders` and `apinat-pops-more sparc-nlp` are the full imports, and `apinatomy-neuron-populations` and `../../npo` are the label imports.

NOTE: full imports are done from `https://raw.githubusercontent.com/SciCrunch/NIF-Ontology/neurons/ttl/generated/neurons/*.ttl` and label imports are done from `https://raw.githubusercontent.com/SciCrunch/NIF-Ontology/neurons/ttl/generated/neurons/*.ttl`; if you want a different path, you can specify it relatively, e.g. `../../npo`.
Member

@D-GopalKrishna please remove this; it is not our responsibility how the content gets passed, and this will change soon once neurondm pulls data from Interlex. I understand we dug into how the tool works, but we are not maintaining it, and this can be misleading in the future. Thanks.

Member

@afonsobspinto afonsobspinto Oct 11, 2024

I don't fully agree with the above. We are not maintaining the data but we are the ones that define where we are fetching the data from. If in the future this changes, we will need to change our script.

Now it's true that the part I highlighted was originally provided by them, but at this moment it's code we deliver and maintain

Contributor Author

I agree with @afonsobspinto; the intent behind having this in the README was also to have a form of documentation for ourselves for the future.

Member

@afonsobspinto are we defining where we fetch the data from, or which labels we have to use for the imports? IMHO these are two different things. The fact that neurondm downloads the ttls internally is neurondm behaviour, and I don't understand why we need to be aware of it; if this code is refactored later on and data are imported and filtered by label in a different way, as long as our code works that is not our problem.

Manually downloading files and uploading them to the same folders (one of @D-GopalKrishna's previous proposals, if I understood correctly) is hijacking the internals of a library which we do not own, and we should not worry about what it does.

Member

@afonsobspinto afonsobspinto Oct 15, 2024

Hey @ddelpiano, I’ve linked the code that shows how we define the data source for neurondm to use:
> we are the ones that define where we are fetching the data from

Could you take a look and let me know if that clarifies things?

```diff
@@ -14,8 +14,8 @@
 logger_service = LoggerService()


-def ingest_statements(update_upstream=False, update_anatomical_entities=False):
-    statements_list = get_statements_from_neurondm(logger_service_param=logger_service)
+def ingest_statements(update_upstream=False, update_anatomical_entities=False, should_overwrite=True, full_imports=[], label_imports=[]):
```
Member

issue: We should not use mutable objects as default arguments.

It's better to do:

```
def ingest_statements(update_upstream=False, update_anatomical_entities=False, should_overwrite=True, full_imports=None, label_imports=None):
    if full_imports is None:
        full_imports = []
    if label_imports is None:
        label_imports = []
```

Here's why it's a problem:

```
def append_to_list(value, my_list=[]):
    my_list.append(value)
    return my_list

# First call - works as expected
print(append_to_list(1))  # Output: [1]

# Second call - unexpected behavior
print(append_to_list(2))  # Output: [1, 2]

# Third call - still using the same list!
print(append_to_list(3))  # Output: [1, 2, 3]
```

@afonsobspinto
Member

> Upon inspection, I found the following behavior. Please let me know @ddelpiano if I should look into this. thanks!
>
> 2024-10-10.20-38-31.mp4

I don't fully understand the very first problem. Is it only happening with a specific .ttl file?
As for the exception raised, it sounds correct to me: if the original content is not complete (doesn't have a partial order), we can't create our statements. It shouldn't crash the ingestion; it should log the problem and continue without creating that specific statement.
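The "log and skip" behaviour described here can be sketched as below. The names (population dicts with a `partial_order` key, a `log_error` callable) are illustrative assumptions, not the actual API of the ingestion script:

```python
def ingest_statements_safely(populations, log_error):
    """Ingest populations, logging and skipping any that lack a partial order."""
    ingested = []
    for pop in populations:
        if not pop.get("partial_order"):
            # Log the problem and continue instead of crashing the whole ingestion
            log_error(f"Skipping {pop['id']}: missing partial order")
            continue
        ingested.append(pop["id"])
    return ingested

errors = []
result = ingest_statements_safely(
    [{"id": "pop-1", "partial_order": ["A", "B"]}, {"id": "pop-2"}],
    errors.append,
)
print(result)  # ['pop-1']
print(errors)  # ['Skipping pop-2: missing partial order']
```

The key design point is that a malformed population affects only itself: the error is recorded and the loop moves on to the next population.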

@D-GopalKrishna
Contributor Author

D-GopalKrishna commented Oct 14, 2024

> Upon inspection, I found the following behavior. Please let me know @ddelpiano if I should look into this. thanks!
> 2024-10-10.20-38-31.mp4

> I don't fully understand the very first problem. Is it only happening with a specific .ttl file? For the exception raised it sounds correct to me, if the original content is not complete (doesn't have a partial order) we can't create our statements. It shouldn't crash the ingestion; it should log the problem and continue without creating that specific statement.

I am not fully aware of the concept of partial order. But it does behave that way: for invalid inputs or when things don't work, it logs the problem and continues without creating that specific statement, rather than crashing the ingestion.

As in the video, it doesn't ingest (or doesn't recognize) the statements when I use "apinat-partial-orders" alone, without any other full imports. But when I use it together with others like "apinat-simple-sheet", it does import (114 statements).

@afonsobspinto
Member

> I am not fully aware of the concept of partial order. But it does do - It shouldn't crash the ingestion; it should log the problem and continue without creating that specific statement for invalid inputs or when things don't work.

Partial order is a property that we expect/require the neurondm neuronal populations to have so that we can understand the path of the connectivity statements (what connects to what). If a neuronal population doesn't have partial order, we skip it. That's the expected behaviour.

> As in the video - it doesn't ingest (or doesn't recognizes) the statements for when I use - "apinat-partial-orders" alone without any other full-imports. But when I use it with others like - "apinat-simple-sheet", it does import (114 statements).

Sounds to me like a problem with their ttl files; it might be worth letting them know, but I don't think it's a blocker for this PR (i.e. the problem is likely on their side).

@ddelpiano
Member

> @ddelpiano - do we need to update the test?

@D-GopalKrishna yes please, thanks

@ddelpiano ddelpiano changed the base branch from develop to feature/SCKAN-312 October 23, 2024 09:52
@ddelpiano ddelpiano closed this Oct 23, 2024