Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/mx 1677 orcid extractor #361

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

vyvytranngoc
Copy link
Contributor

@vyvytranngoc vyvytranngoc commented Jan 9, 2025

PR Context

connector in mex-common that takes a string or orcid ID as argument
returns orcid person
convenience function: search for person, if one is found, transform to ExtractedPerson

Added

  • Connector class for retrieving ORCID data by ID or name
  • methods for extracting data from orcid
  • methods to transform from OcidPerson to mex person
  • model class for orcid data
  • unit tests

Changes

Deprecated

Removed

Fixed

Security

@vyvytranngoc
Copy link
Contributor Author

FAILED tests/primary_source/test_extract.py::test_extract_seed_primary_sources - AssertionError: assert 5 == 4
 +  where 5 = len([SeedPrimarySource(identifier='mex', title=[Text(value='Metadata Exchange', language=<TextLanguage.EN: 'en'>)]), SeedPrimarySource(identifier='organigram', title=[Text(value='Organizational Units', language=<TextLanguage.EN: 'en'>)]), SeedPrimarySource(identifier='ldap', title=[Text(value='Active Directory', language=<TextLanguage.EN: 'en'>)]), SeedPrimarySource(identifier='wikidata', title=[Text(value='Wikidata APIs', language=<TextLanguage.EN: 'en'>)]), SeedPrimarySource(identifier='orcid', title=[Text(value='Open Researcher Contributor Identification Initiative', language=<TextLanguage.EN: 'en'>)])])

I think this occurs because I have added orcid as a primary source?

@cutoffthetop
Copy link
Contributor

cutoffthetop commented Jan 10, 2025

tests/orcid/model.yaml seems to be empty, did you have plans for that? done ✔️

@cutoffthetop
Copy link
Contributor

FAILED tests/primary_source/test_extract.py::test_extract_seed_primary_sources - AssertionError: assert 5 == 4

I think this occurs because I have added orcid as a primary source?

yep, that's it. feel free to bump the expected length up to 5

Copy link
Contributor

@cutoffthetop cutoffthetop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work already! i found some unused bits and pieces, and i had one bigger issue with the way the orcid models are written. could you take a look at that first?

i didn't review the tests yet, but i think they might change a bit anyway once the models are updated.

mex/common/settings.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
mex/common/orcid/connector.py Outdated Show resolved Hide resolved
mex/common/orcid/connector.py Outdated Show resolved Hide resolved
mex/common/orcid/connector.py Outdated Show resolved Hide resolved
mex/common/orcid/transform.py Outdated Show resolved Hide resolved
mex/common/orcid/extract.py Outdated Show resolved Hide resolved
mex/common/orcid/transform.py Outdated Show resolved Hide resolved
mex/common/orcid/models/person.py Outdated Show resolved Hide resolved
mex/common/orcid/extract.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
mex/common/orcid/extract.py Outdated Show resolved Hide resolved
mex/common/orcid/connector.py Outdated Show resolved Hide resolved
mex/common/orcid/extract.py Outdated Show resolved Hide resolved
mex/common/orcid/transform.py Outdated Show resolved Hide resolved
mex/common/orcid/models/person.py Outdated Show resolved Hide resolved
Copy link
Contributor

@cutoffthetop cutoffthetop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's coming along! we just need to ensure proper unit test isolation

mex/common/orcid/connector.py Outdated Show resolved Hide resolved
mex/common/settings.py Outdated Show resolved Hide resolved
tests/orcid/test_connector.py Outdated Show resolved Hide resolved
tests/orcid/conftest.py Outdated Show resolved Hide resolved
@vyvytranngoc
Copy link
Contributor Author

vyvytranngoc commented Jan 30, 2025

I was running my unit tests after pulling all new changes from main. But now I encounter this error:

mex\common\types\vocabulary.py:71: in <listcomp>
    Concept.model_validate(raw_vocabulary)
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for Concept
E   altLabel
E     Input should be a valid list [type=list_type, input_value={'de': 'RESTful API', 'en': 'RESTful API'}, input_type=dict]
E       For further information visit https://errors.pydantic.dev/2.9/v/list_type

@vyvytranngoc
Copy link
Contributor Author

But I fixed the requested changes and made sure I'm running unit tests and no integration tests

Copy link
Contributor

@cutoffthetop cutoffthetop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few simplifications and were good to go

Comment on lines +38 to +51
@staticmethod
def get_data_by_id(orcid_id: str) -> dict[str, Any]:
"""Retrieve data by UNIQUE ORCID ID.

Args:
orcid_id: Unique identifier in ORCID system.

Returns:
Personal data of the single matching id.
"""
orcidapi = OrcidConnector.get()
# or endpoint = f"{orcid_id}/person"
endpoint = f"{orcid_id}/record"
return dict(orcidapi.request(method="GET", endpoint=endpoint))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this have to be static?

Suggested change
@staticmethod
def get_data_by_id(orcid_id: str) -> dict[str, Any]:
"""Retrieve data by UNIQUE ORCID ID.
Args:
orcid_id: Unique identifier in ORCID system.
Returns:
Personal data of the single matching id.
"""
orcidapi = OrcidConnector.get()
# or endpoint = f"{orcid_id}/person"
endpoint = f"{orcid_id}/record"
return dict(orcidapi.request(method="GET", endpoint=endpoint))
def get_data_by_id(self, orcid_id: str) -> dict[str, Any]:
"""Retrieve data by UNIQUE ORCID ID.
Args:
orcid_id: Unique identifier in ORCID system.
Returns:
Personal data of the single matching id.
"""
endpoint = f"{orcid_id}/record"
return dict(self.request(method="GET", endpoint=endpoint))

response = self._send_request("HEAD", url=url, params={})
response.raise_for_status()

def check_orcid_id_exists(self, orcid_id: str) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is unused in mex-common, do you need it for mex-backend?


def fetch(self, filters: dict[str, Any]) -> dict[str, Any]:
"""Perform a search query against the ORCID API."""
query = OrcidConnector.build_query(filters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
query = OrcidConnector.build_query(filters)
query = self.build_query(filters)

return dict(orcidapi.request(method="GET", endpoint=endpoint))

@staticmethod
def get_data_by_name(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, why make this static?

instead of orcidapi = OrcidConnector.get() you could just use self

Comment on lines +252 to +259

def build_query(filters: dict[str, Any]) -> str:
"""Construct the ORCID API query string."""
if "givennames" in filters:
return "givennames:Josiah AND familyname:Carberry"
return "given-names:Josiah AND family-name:Carberry"

monkeypatch.setattr(OrcidConnector, "build_query", staticmethod(build_query))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def build_query(filters: dict[str, Any]) -> str:
"""Construct the ORCID API query string."""
if "givennames" in filters:
return "givennames:Josiah AND familyname:Carberry"
return "given-names:Josiah AND family-name:Carberry"
monkeypatch.setattr(OrcidConnector, "build_query", staticmethod(build_query))

you don't need to mock this method. since it does not do any network-io, it is "unit-test-safe"

Returns:
Orcidrecord of the matching person by name.
"""
orcid_data = OrcidConnector.get_data_by_name(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you make get_data_by_name non-static, this would become:

Suggested change
orcid_data = OrcidConnector.get_data_by_name(
orcid_data = OrcidConnector.get().get_data_by_name(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants