DOI: Better handling of pages with many DOIs #3311

Open
dstillman opened this issue Jun 4, 2024 · 8 comments

@dstillman
Member

dstillman commented Jun 4, 2024

https://forums.zotero.org/discussion/114892/zotero-connector-not-saving

https://journals.ametsoc.org/view/journals/phoc/53/1/JPO-D-22-0001.1.xml

The DOI translator finds 84 DOIs on this page and downloads data for all of them in rapid succession. Then one request returns a 404, and that seems to fail the whole process.

We should 1) possibly put a more reasonable limit on the number of DOIs we try to get data for, 2) check Crossref's rate-limiting guidance and see if we're within it (though it may be hard to find guidance on content negotiation specifically, as opposed to the API), 3) check whether the connector/translate is incorrectly failing the whole save when lookup for a single DOI on the page fails (@adomasven), and 4) merge #3009 if it adds a translator for this page.
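A minimal sketch of point 1, assuming a hypothetical cap of 20 (the issue does not settle on a number). It also de-duplicates case-insensitively, since DOI names are case-insensitive; whether repeats actually inflate the count on this page is an assumption. `MAX_DOIS`, `normalizeDOI`, and `limitDOIs` are illustrative names, not part of the DOI translator, and the DOIs in the usage are made-up placeholders.

```javascript
// Hypothetical cap; the issue does not settle on a number.
const MAX_DOIS = 20;

// DOI names are case-insensitive, so compare them lowercased.
function normalizeDOI(doi) {
  return doi.trim().toLowerCase();
}

// Return at most `max` unique DOIs, preserving first-seen order.
function limitDOIs(dois, max = MAX_DOIS) {
  const seen = new Set();
  const result = [];
  for (const doi of dois) {
    const key = normalizeDOI(doi);
    if (seen.has(key)) continue; // skip case-insensitive duplicates
    seen.add(key);
    result.push(doi);
    if (result.length >= max) break; // stop once the cap is reached
  }
  return result;
}

// Usage: 84 placeholder DOIs are trimmed to the first 20 unique ones.
const found = Array.from({ length: 84 }, (_, i) => `10.9999/example.${i}`);
console.log(limitDOIs(found).length); // 20
```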

@adomasven
Member

In my estimation, this await (7c4cc22#diff-df0ac0c2a1d09b7146a41c727101b54ea5acf3f7f1dec1775adc9675bb05e361L217) broke the DOI translator: a failed retrieval now causes a throw there, so a single failure breaks DOI translation completely.

@dstillman
Member Author

Yep, looks like it. So for starters, we probably want to add a try/catch there that just logs the error and continues, which seems to have been the intended behavior previously (translate.setHandler("error", function () {});).
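A sketch of the log-and-continue behavior described above, assuming a hypothetical `lookupOne(doi)` that returns a Promise rejecting on failure (e.g. a 404). This is not the code from the PR; it only illustrates the pattern: awaiting inside the loop keeps requests sequential, and the per-DOI try/catch stops one bad DOI from aborting the rest.

```javascript
// Look up each DOI in turn; log and skip failures instead of throwing.
// `lookupOne` is a hypothetical stand-in for the per-DOI retrieval.
async function lookupAll(dois, lookupOne, log = console.error) {
  const items = [];
  for (const doi of dois) {
    try {
      // Awaiting inside the loop keeps requests sequential.
      items.push(await lookupOne(doi));
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      log(`DOI lookup failed for ${doi}: ${reason}`);
    }
  }
  return items;
}

// Usage with a fake lookup in which one (made-up) DOI returns a 404.
const fakeLookup = async (doi) => {
  if (doi === "10.9999/missing") throw new Error("HTTP 404");
  return { DOI: doi };
};

lookupAll(["10.9999/a", "10.9999/missing", "10.9999/b"], fakeLookup)
  .then((items) => console.log(items.map((item) => item.DOI)));
// logs the 404 failure, then prints [ '10.9999/a', '10.9999/b' ]
```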

@AbeJellinek
Member

PR opened for the try-catch.

The translator only makes 5-10 requests per second for me. I'm not sure about content negotiation, but Crossref's current API rate limit (the X-Rate-Limit-Limit header on api.crossref.org) is 50 requests per second, which we realistically aren't going to hit if we keep our requests sequential. Crossref says it sends 429 responses when you exceed the rate limit. It looks like we do handle those in the Connector's Zotero.HTTP, but not in the client's Zotero.HTTP, where we only retry on 5xx errors.
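For scale: 50 requests per second is one every 20 ms, so sequential requests at the observed 5-10 per second (roughly 100-200 ms each) stay well under it. The sketch below shows one way a client could treat 429 like a 5xx and retry with exponential backoff. It is not Zotero.HTTP's actual API; `requestWithRetry`, `doRequest` (assumed to resolve to an object with a numeric `status`), and the retry/delay defaults are all hypothetical, and `sleep` is injectable so the backoff can be tested without real delays.

```javascript
// Real delay used by default; tests can inject a no-op instead.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry on "Too Many Requests" and on any 5xx server error.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status < 600);
}

// Hypothetical helper: retry `doRequest` with exponential backoff.
async function requestWithRetry(doRequest, {
  maxRetries = 3,
  baseDelayMs = 500,
  sleep = defaultSleep,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    if (!isRetryable(response.status) || attempt >= maxRetries) {
      return response; // success, non-retryable error, or out of retries
    }
    // Wait 500 ms, 1 s, 2 s, ... before the next attempt.
    await sleep(baseDelayMs * 2 ** attempt);
  }
}

// Usage: a fake endpoint that rate-limits twice, then succeeds.
const statuses = [429, 429, 200];
const delays = [];
requestWithRetry(async () => ({ status: statuses.shift() }), {
  sleep: async (ms) => { delays.push(ms); },
}).then((res) => console.log(res.status, delays)); // 200 [ 500, 1000 ]
```

A 429 response may also carry a Retry-After header; honoring it when present, instead of a fixed backoff, would be a natural refinement.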

@AbeJellinek
Member

I'll finish up the PubFactory translator and get that merged.

@AbeJellinek
Member

Merged #3009. (EM already handled PubFactory article pages fine, though; we barely make any adjustments in the PubFactory translator. I'm not sure why the person who reported this on the forums was only getting DOI detection.)

@dstillman
Member Author

For https://journals.ametsoc.org/view/journals/phoc/53/1/JPO-D-22-0001.1.xml ? I'm not getting EM there.

Works now with PubFactory, though.

@dstillman
Member Author

Oh wait, maybe this is MV3 breakage. I'm not getting EM in Chrome MV3, but it shows up in Firefox and Edge.

@dstillman
Member Author

Firefox:

(4)(+0000309): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2024-03-27 20:15:00)
(3)(+0000000): Translate: Prefix 'og' => 'http://ogp.me/ns#'
(3)(+0000001): Translate: Prefix 'fb' => 'http://ogp.me/ns/fb#'
(3)(+0000000): Translate: Prefix 'article' => 'http://ogp.me/ns/article#'
(3)(+0000000): Translate: Embedded Metadata: found 121 meta tags.

Chrome MV3:

(4)(+0000006): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2024-03-27 20:15:00)
(3)(+0000000): Translate: Prefix 'og' => 'http://ogp.me/ns#'
(3)(+0000001): Translate: Prefix 'fb' => 'http://ogp.me/ns/fb#'
(3)(+0000000): Translate: Prefix 'article' => 'http://ogp.me/ns/article#'
(3)(+0000000): Translate: Embedded Metadata: found 0 meta tags.

3 participants