Bringing Crossref, Semantic Scholar, Open Citations and Open Alex lookup + auto-import to Cita for Zotero 7 #300

thebluepotato · 2024-09-22T21:22:04Z

Hi! I adapted the (now stale) PR #139 to the new Zotero 7 branch so it has a chance to be swept up in the new release. The general logic is unchanged from the other PR, but I've made quite a few updates for efficiency, code clarity and type safety as well as fixed a few failing Promises here and there. I've tested quite a bit already, but it could definitely use more in-depth testing.

I've also added a button on each citation that imports the reference into Zotero with one click and then links it. It's similar to what https://github.com/MuiseDestiny/zotero-reference does, but I find that addon confusing at best and it doesn't help that all the info is in Mandarin Chinese...

All in all, probably still a WIP, but happy to receive code reviews and have some people test this!

Dominic-DallOsto · 2024-09-23T21:32:12Z

Thanks a lot for this!

It'll take me a little while to review this in detail sorry, but this is great!

thebluepotato · 2024-09-24T16:47:59Z

One thing to consider: while adding references for which Crossref has a DOI or ISBN is quite robust, adding items as a book or journal article based merely on their title is unsatisfactory. For instance, with DOI:10.1145/2786451.2786465, some of the references are sections from the same book (a different author per section), yet they all appear in Crossref as Author + Book title (instead of section title). Maybe it should be up to the user to choose what is actually imported.

To avoid type errors and to avoid overusing `any`, I copied the TypeScript definitions from zotero/translators and slightly tweaked them.

thebluepotato · 2024-09-26T13:35:46Z

The latest commit adds a new IndexerBase abstract class that abstracts the common logic between various "indexers" (couldn't think of a better name). This allows us to more simply add various such "indexers", which now includes Semantic Scholar and Open Alex as well. They all have their pros and cons, but this should give the user a lot of options to automatically fetch these citations.
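To illustrate the idea for readers of this thread, here is a minimal sketch of what such an abstraction could look like. Only the name `IndexerBase` comes from the PR; every other name (`PID`, `ParsedReference`, `fetchReferences`, `parseReference`, `StubIndexer`) is hypothetical, and the stub returns canned data rather than calling a real service.

```typescript
// Hypothetical sketch: only the name IndexerBase comes from the PR.
type PID = { type: string; value: string };

interface ParsedReference {
	identifiers: PID[];
	title?: string;
}

abstract class IndexerBase<Raw> {
	// Human-readable name, e.g. for menu labels.
	abstract readonly indexerName: string;
	// Identifier types this indexer can search by.
	abstract readonly supportedPIDs: readonly string[];

	// Source-specific: fetch the raw references for one identifier.
	protected abstract fetchReferences(pid: PID): Promise<Raw[]>;
	// Source-specific: map one raw reference to the common shape.
	protected abstract parseReference(raw: Raw): ParsedReference | null;

	// Shared logic: fetch, parse, and drop references that could not be parsed.
	async getReferences(pid: PID): Promise<ParsedReference[]> {
		if (!this.supportedPIDs.includes(pid.type)) return [];
		const raw = await this.fetchReferences(pid);
		return raw
			.map((r) => this.parseReference(r))
			.filter((r): r is ParsedReference => r !== null);
	}
}

// Stub subclass with canned data instead of a network call.
class StubIndexer extends IndexerBase<{ doi?: string }> {
	readonly indexerName = "Stub";
	readonly supportedPIDs = ["DOI"];

	protected async fetchReferences(): Promise<{ doi?: string }[]> {
		return [{ doi: "10.1000/a" }, {}];
	}

	protected parseReference(raw: { doi?: string }): ParsedReference | null {
		return raw.doi
			? { identifiers: [{ type: "DOI", value: raw.doi }] }
			: null;
	}
}
```

With this shape, adding another source would mostly mean writing the two source-specific methods, while the shared filtering stays in one place.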

Based on initial (limited) experimentation:

- Crossref: citations seem more "official" than those from the other sources, but not all items with DOIs have references
- Semantic Scholar: because it analyses the indexed papers, it includes many references, but also some random entries that are not actually cited
- Open Alex: usually has fewer citations than the others

One issue that this "abstraction" brings is that the context menu when clicking on an item shows the translation keys instead of the corresponding strings.

Dominic-DallOsto · 2024-09-30T22:17:37Z

Hi, I just had a chance to quickly test this and so far things look nice, thanks so much! I haven't been able to fully review the code yet but here are some observations from testing.

Openalex build error

I get the following build error at the moment because of the openalex-sdk. Did you encounter this on your end?

    node_modules/openalex-sdk/dist/src/utils/works.js:7:37:
      7 │ const fs_1 = __importDefault(require("fs"));
        ╵                                      ~~~~

  The package "fs" wasn't found on the file system but is built into node. Are you trying to bundle
  for node? You can use "platform: 'node'" to do that, which will remove this error.

I removed the openalex SDK to test a bit further.

Auto import citations

Firstly, the auto import by identifier button is really nice! It would solve #40. One thing that might also be nice is, if the citation already has a QID attached, that this should be applied to the newly created item when it's imported?

Getting Crossref citations

Testing the auto import of citations from crossref I found some bugs, but they're mostly related to crossref's data so it was just unlucky I happened to pick a bad item haha

Add this item by DOI - 10.1007/BF01700692
Get citations from crossref
- newlines in text aren't rendered properly
- it says I will get 64 citations

Press OK
- actually I only get 2 citations, and they're both the same
  - Checking the API response, this is actually a crossref problem:
  - we get a response with 64 citations, but 62 are unstructured - maybe the message could be edited to exclude unstructured citations if we don't attempt to parse them, or a message after importing could say "imported 2/64 citations from crossref"
  - here crossref is just a bit strange in that 2 of the references have the same DOI. Could we check for duplicates within the crossref response and remove them?
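The two suggestions above (skip unstructured references, drop duplicate DOIs, and report "imported N/M") could be sketched roughly as follows. This is not the PR's code: the `CrossrefReference` shape is an assumed subset of a Crossref reference entry, the function names are hypothetical, and for brevity the sketch only looks at DOIs.

```typescript
// Assumed subset of a Crossref reference entry; only the fields used here.
interface CrossrefReference {
	key: string;
	DOI?: string;
	unstructured?: string;
}

interface FilterResult {
	usable: CrossrefReference[];
	total: number;
}

// Keep references that carry a DOI, dropping repeats of the same DOI.
function filterReferences(refs: CrossrefReference[]): FilterResult {
	const seen = new Set<string>();
	const usable: CrossrefReference[] = [];
	for (const ref of refs) {
		if (!ref.DOI) continue; // unstructured only: nothing to look up
		const normalized = ref.DOI.trim().toLowerCase(); // DOIs are case-insensitive
		if (seen.has(normalized)) continue; // duplicate within the same response
		seen.add(normalized);
		usable.push(ref);
	}
	return { usable, total: refs.length };
}

// Message for the alert shown after importing.
function importSummary(result: FilterResult): string {
	return `Imported ${result.usable.length}/${result.total} citations from Crossref`;
}
```

For the item discussed above (64 references, 62 of them unstructured, and the remaining 2 sharing one DOI), this would report 1 usable reference out of 64.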

Getting Semantic Scholar citations

I tested with using the item with DOI - 10.1109/ITW.2015.7133169. It got 11/14 citations because 3 had no identifiers in semantic scholar. The request was very slow though compared to getting citations from crossref. Here is an overview of the timing.

The slowdown is because the requests to arxiv are really slow. I tested the same request in the browser and it also took ~10 seconds to complete, so it doesn't seem that this is a problem with Cita. Does arxiv have an alternative (faster) API? Maybe a workaround would be to update the progress message with the number of citations already downloaded, so users can see that things are progressing?
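The progress workaround could be sketched like this. `lookupWithProgress` and `ProgressCallback` are hypothetical names, and the `lookup` argument stands in for whatever actually resolves an identifier (in the PR, Zotero's translators); in Cita the callback would presumably update the progress window text.

```typescript
// Hypothetical helper: not part of Cita or Zotero.
type ProgressCallback = (done: number, total: number) => void;

// Look identifiers up one at a time and report progress after each one.
async function lookupWithProgress<T>(
	identifiers: string[],
	lookup: (identifier: string) => Promise<T>,
	onProgress: ProgressCallback,
): Promise<(T | null)[]> {
	const results: (T | null)[] = [];
	for (const identifier of identifiers) {
		try {
			results.push(await lookup(identifier));
		} catch {
			// A failed lookup should not abort the remaining ones.
			results.push(null);
		}
		onProgress(results.length, identifiers.length);
	}
	return results;
}
```

Running the lookups sequentially keeps the progress count simple and is gentle on slow endpoints; running them in parallel would be faster but makes rate limits more likely.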

thebluepotato · 2024-09-30T23:16:53Z

Openalex build error

> I get the following build error at the moment because of the openalex-sdk. Did you encounter this on your end?

Yes sorry, I'm actually entirely new to npm so I forgot to commit the patch to openalex-sdk, fixed in latest commit.

Auto import citations

> Firstly, the auto import by identifier button is really nice! It would solve #40. One thing that might also be nice is, if the citation already has a QID attached, that this should be applied to the newly created item when it's imported?

I didn't really look into the Wikidata side of things, but will definitely look into ensuring the QID is imported as well. Is it usually stored in the Extra field?

- Import QID

Getting Crossref citations

> Testing the auto import of citations from crossref I found some bugs, but they're mostly related to crossref's data so it was just unlucky I happened to pick a bad item haha

- Get newlines to show in the alert
- Rephrase alert to clarify (parsed does not mean the citations will be added in the end, rephrase)
- Apply duplicate filter to the citations to be added as well

Getting Semantic Scholar citations

> I tested with using the item with DOI - 10.1109/ITW.2015.7133169. It got 11/14 citations because 3 had no identifiers in semantic scholar. The request was very slow though compared to getting citations from crossref. Here is an overview of the timing.

As it currently stands, the PR relies heavily on Zotero's own existing translators to avoid doing too much heavy lifting and to avoid code duplication. Therefore, if it's slow to import with Cita, it's also slow to import when using the "magic wand" tool that imports items based on their identifiers. Will look into alternatives, but it seems likely that Zotero's own translator is already quite optimized as it is.

thebluepotato · 2024-10-01T14:40:55Z

Regarding arXiv, I updated the translator locally (see: zotero/translators#3366) to use another endpoint which, based on limited testing, should be faster than the one the translator currently uses. However, when testing within Cita, it's just as slow...

EDIT: rather, depending on luck I guess, it can be as "fast" as 1s per request, but still can sometimes be as slow as the other endpoint.

Dominic-DallOsto · 2024-10-03T22:13:25Z

That's great, thanks a lot! And thanks for addressing the issues with the arXiv translator, doing it upstream in Zotero is definitely the right way.

A couple of little things I noticed:

- If I right click an item, in the Cita menu it says "Get citations from Semantic" instead of "Get citations from Semantic Scholar" like it says in the More... menu
- If I have an item that only has an ISBN, in the right click menu all the options for getting citations are still enabled, whereas in the More... menu they're all rightfully disabled

Otherwise this all looks good

thebluepotato · 2024-10-04T00:39:38Z

Got a little crazy and added OpenCitations capabilities again. However, within all the confusion, I need your input on whether we could/should expand the definition of PIDType to include all "IDs" we're now using and that the various indexers support searching for, or at least OpenAlex identifier and Semantic Scholar Corpus ID. In particular, it would streamline the code by using getPID everywhere

thebluepotato · 2024-10-04T00:41:11Z

> If I have an item that only has an ISBN, in the right click menu all the options for getting citations are still enabled, whereas in the More... menu they're all rightfully disabled

For this, I'd like to improve the logic so it is only disabled when no supported identifiers are present. While CrossRef requires a DOI, the other indexers often can search with more identifiers.

Dominic-DallOsto · 2024-10-04T12:25:22Z

> Got a little crazy and added OpenCitations capabilities again. However, within all the confusion, I need your input on whether we could/should expand the definition of PIDType to include all "IDs" we're now using and that the various indexers support searching for, or at least OpenAlex identifier and Semantic Scholar Corpus ID. In particular, it would streamline the code by using getPID everywhere

Yeah, I think that's great to abstract this out like you have.

> For this, I'd like to improve the logic so it is only disabled when no supported identifiers are present. While CrossRef requires a DOI, the other indexers often can search with more identifiers.

Yeah, that makes sense. I guess how you've set it up you could just check whether IndexerBase.extractSupportedUID returns null? Maybe it'd be nice to have a specific function that does this check.
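A dedicated check along those lines might look like the sketch below. Only the name `extractSupportedUID` appears in this thread; its real signature isn't shown here, so the signature, the `ItemLike` interface, the `hasSupportedUID` helper and the subset of `PIDType` values are all assumptions for illustration.

```typescript
// Illustrative subset of identifier types mentioned in this thread.
type PIDType = "DOI" | "arXiv" | "OMID" | "OpenAlex";

interface PID {
	type: PIDType;
	value: string;
}

// Hypothetical minimal view of an item wrapper.
interface ItemLike {
	getPID(type: PIDType): string | undefined;
}

// Return the first identifier the indexer can search by, or null if none.
function extractSupportedUID(
	item: ItemLike,
	supported: readonly PIDType[],
): PID | null {
	for (const type of supported) {
		const value = item.getPID(type);
		if (value) return { type, value };
	}
	return null;
}

// Dedicated check, e.g. for enabling or disabling a menu entry.
function hasSupportedUID(item: ItemLike, supported: readonly PIDType[]): boolean {
	return extractSupportedUID(item, supported) !== null;
}
```

Because the order of `supported` decides which identifier wins, each indexer could also express its preferences this way (for example, listing "arXiv" before "DOI").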


Dominic-DallOsto · 2024-10-05T12:51:53Z

Do you get the same styling problem as me with the PID rows now?

If I make it so all the identifiers are visible, it looks a bit weird but I guess OK

Also, do you think it makes sense to grey out the fetch icons for identifiers that can't be fetched? I think it is more intuitive than clicking the button and then finding out that it isn't supported?

Additionally, could fetching the OMID and OPENALEX ids give a progress popup similar to fetching QIDs? I found that this took a few seconds to run so I wasn't sure whether anything was happening until the identifier finally appeared.

thebluepotato · 2024-10-05T13:04:33Z

> Do you get the same styling problem as me with the PID rows now?

Yes I do, I guess we should also no longer uppercase them all. Should we hide PMID and PMCID from this view? They're supported for searching and all, but I don't think you can get citations from them, so there's no need to highlight them as much.

> Also, do you think it makes sense to grey out the fetch icons for identifiers that can't be fetched? I think it is more intuitive than clicking the button and then finding out that it isn't supported?

That'll probably encourage us to further abstract the checking logic, good idea!

> Additionally, could fetching the OMID and OPENALEX ids give a progress popup similar to fetching QIDs? I found that this took a few seconds to run so I wasn't sure whether anything was happening until the identifier finally appeared.

In my testing it was nearly instant, but we sure can have a progress indicator.

I'll be away for the week so I won't be able to look at this PR much, feel free to tweak it to your liking if you want!

Dominic-DallOsto · 2024-10-05T13:20:59Z

I played around with things quickly so now they look like this

> I'll be away for the week so I won't be able to look at this PR much, feel free to tweak it to your liking if you want!

No worries - thanks a lot for your hard work! I'll try to fully review the code by then and make a roadmap for what we need before merging

Edit: hiding the PMID and PMCID rows makes sense I think, yeah

And the progress messages work great, thanks

localise PID row fetch button "Fetch" text

thebluepotato · 2024-10-05T13:21:38Z

TODOs:

- Implement DOI fetcher inspired by https://github.com/bwiernik/zotero-shortdoi
- Fetch DOIs from Datacite?
- ~~Auto-generate DOIs from arXiv?~~ EDIT: Semantic Scholar does not seem to find documents by arXiv-DOI. Therefore, we should just be happy with the arXiv ID and instead give the "arXiv" PIDType priority over DOI for Semantic Scholar
- Adapt rate-limiting to the specific indexer and provide a meaningful error message when a request was rate-limited
- Implement Semantic Scholar's Corpus ID as PIDType
- When fetching PIDs such as OMID, also add other PIDs if contained in the response (such as DOI)
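For the rate-limiting item, one possible approach is a per-indexer retry with exponential backoff on HTTP 429, ending in a descriptive error. The sketch below is only an illustration: `requestWithBackoff`, `RateLimitError`, `Requester` and the option names are hypothetical, the request function is injected rather than tied to any Zotero API, and the default limits are arbitrary.

```typescript
// Hypothetical shapes: the PR's real request code is not shown in this thread.
interface SimpleResponse {
	status: number;
	body: string;
}
type Requester = (url: string) => Promise<SimpleResponse>;

class RateLimitError extends Error {
	constructor(indexerName: string, attempts: number) {
		super(
			`${indexerName} is rate-limiting requests (gave up after ${attempts} attempts). Please try again later.`,
		);
		this.name = "RateLimitError";
	}
}

interface BackoffOptions {
	maxAttempts?: number; // per-indexer limit
	baseDelayMs?: number; // first wait; doubles on each retry
	sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
	new Promise((resolve) => setTimeout(resolve, ms));

async function requestWithBackoff(
	indexerName: string,
	url: string,
	request: Requester,
	options: BackoffOptions = {},
): Promise<SimpleResponse> {
	const { maxAttempts = 4, baseDelayMs = 500, sleep = defaultSleep } = options;
	for (let attempt = 1; attempt <= maxAttempts; attempt++) {
		const response = await request(url);
		// HTTP 429 "Too Many Requests" is the standard rate-limit status.
		if (response.status !== 429) return response;
		if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
	}
	throw new RateLimitError(indexerName, maxAttempts);
}
```

Each indexer could pass its own `maxAttempts` and `baseDelayMs`, and the caller can show the error message to the user instead of failing silently.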


Dominic-DallOsto · 2024-10-06T16:26:25Z

src/oci/index.ts

+	name: string;
+	id: "qid" | "doi" | "omid";
+}[] = [
+	// https://opencitations.net/oci


Hmm, where did the 030 and 050 come from? https://opencitations.net/oci only has 010, 020, 040, and 06[1-9]0? Did the specification change at some point?

In saying that, I don't know if the parseOci function below works if omid's can be arbitrary length? https://registry.identifiers.org/registry/oci

thebluepotato · 2024-10-06

From my understanding, those prefixes come from old code, from a time when the OpenCitations Corpus was still a separate thing. It seems they should no longer be in use, though.

Dominic-DallOsto · 2024-10-08

Ok cool, that makes sense.
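On the arbitrary-length question, a parser that keys on the supplier prefix rather than on a fixed identifier length could look like the sketch below. This is not the `parseOci` function from the PR. It assumes an OCI has the form `oci:<citing>-<cited>` with each part being a supplier prefix of the form `0[1-9]+0` followed by digits; the mapping of prefixes to `"qid"`, `"doi"` and `"omid"` is likewise an assumption based on the prefixes discussed in this thread.

```typescript
interface OciPart {
	prefix: string; // supplier prefix, e.g. "010"
	localId: string; // remaining digits, any length
}

// Assumption: a supplier prefix has the form 0[1-9]+0, so it ends at the first
// zero after the leading one and the identifier length does not matter.
const OCI_PART = /^(0[1-9]+0)(\d+)$/;

function parseOciPart(part: string): OciPart | null {
	const match = OCI_PART.exec(part);
	return match ? { prefix: match[1], localId: match[2] } : null;
}

function parseOci(oci: string): { citing: OciPart; cited: OciPart } | null {
	const [citingRaw, citedRaw, ...rest] = oci.replace(/^oci:/i, "").split("-");
	if (!citingRaw || !citedRaw || rest.length > 0) return null;
	const citing = parseOciPart(citingRaw);
	const cited = parseOciPart(citedRaw);
	return citing && cited ? { citing, cited } : null;
}

// Assumed mapping from supplier prefix to the id types used in this PR.
function idTypeForPrefix(prefix: string): "qid" | "doi" | "omid" | null {
	if (prefix === "010") return "qid";
	if (prefix === "020") return "doi";
	if (/^06[1-9]*0$/.test(prefix)) return "omid";
	return null;
}
```

Since the character class `[1-9]` cannot cross a zero, the prefix boundary is unambiguous regardless of how long the rest of the identifier is.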

Dominic-DallOsto · 2024-10-06T16:29:29Z

src/oci/index.ts

+const suppliers: {
+	prefix: string;
+	name: string;
+	id: "qid" | "doi" | "omid";


Maybe these could be capitalised to match PID types? What do you think?

Dominic-DallOsto · 2024-10-06T16:41:24Z

src/cita/citation.ts

@@ -15,7 +15,7 @@ class Citation {
 	ocis: {
 		citingId: string;
 		citedId: string;
-		idType: "qid" | "doi" | "occ";
+		idType: "qid" | "doi" | "omid";


These could maybe be capitalised too?

Dominic-DallOsto · 2024-10-06T16:58:59Z

src/cita/itemWrapper.ts

+			case "arXiv": {
+				const field = this.item.getField("archiveID");
+				if (field && field.startsWith("arXiv:")) {
+					pid = field;


Because we explicitly call this an arXiv type, maybe we can strip out the arXiv: prefix?

thebluepotato · 2024-10-08

Not really with that field. Zotero (and the arXiv translator) store the arXiv ID in both the "archiveID" field and the Extra field. Based on the scarce documentation, the "archiveID" field is meant to hold IDs of other resources as well.
In short, in the "archiveID" field the prefix is required, whereas in the Extra field, "arXiv:" is the name of the field (and therefore not part of the value).
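One way to reconcile both points is to keep the prefix in the stored field but return a bare ID to callers that search by it. The sketch below takes plain strings instead of a Zotero item; `getArxivId` and `toArchiveIdField` are hypothetical helpers, and the exact layout of the Extra line is an assumption.

```typescript
// In "archiveID", the value itself starts with "arXiv:".
const ARCHIVE_ID_PREFIX = /^arXiv:\s*/i;
// In Extra, "arXiv:" is the field name at the start of a line.
const EXTRA_LINE = /^arXiv:\s*(\S+)/im;

// Return the bare arXiv ID (no prefix), preferring archiveID over Extra.
function getArxivId(archiveID: string, extra: string): string | null {
	if (ARCHIVE_ID_PREFIX.test(archiveID)) {
		const id = archiveID.replace(ARCHIVE_ID_PREFIX, "").trim();
		if (id) return id;
	}
	const match = EXTRA_LINE.exec(extra);
	return match ? match[1] : null;
}

// When writing back to "archiveID", the prefix has to be restored.
function toArchiveIdField(bareId: string): string {
	return `arXiv:${bareId}`;
}
```

Values in "archiveID" that belong to another resource (no "arXiv:" prefix) are ignored, which matches the point that the field is shared.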

Dominic-DallOsto · 2024-10-06T17:07:36Z

src/cita/itemWrapper.ts

-						type.toUpperCase(),
-					),
-				);
+			case "OMID": {


All of these switch statements start to make me think it'd be easier to have a PID class with fetching/getting/setting/... methods, similar to how you did for the indexers. Do you think that would be clearer?

Dominic-DallOsto · 2024-10-06T17:10:50Z

src/cita/zoteroOverlay.tsx

-		Crossref.getCitations();
+		const items = await this.getSelectedItems(menuName, true);
+		if (items.length) {
+			new Crossref().addCitationsToItems(items);


Could this be a static method so we don't need to recreate the indexer every time?

thebluepotato · 2024-10-06

That was my first try but from my (very limited and recent) TypeScript understanding, you can't enforce static functions in abstract classes. So if we want static, we lose the abstraction. I might be wrong though!

Dominic-DallOsto · 2024-10-08

Yeah, looks like you're right: microsoft/TypeScript#34516

Maybe this would work with an interface instead?
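As a rough illustration of the interface idea: TypeScript can describe the class object itself (constructor plus static members) with an interface, and instances can be cached so the indexer isn't recreated on every menu click. All names below (`Indexer`, `IndexerClass`, `CrossrefIndexer`, `getIndexer`) are hypothetical and the method body is a placeholder.

```typescript
// Instance side: what every indexer object can do.
interface Indexer {
	addCitationsToItems(items: string[]): number;
}

// Static side: describes the class itself, i.e. its constructor
// plus the static members every indexer class must provide.
interface IndexerClass {
	new (): Indexer;
	readonly indexerName: string;
}

class CrossrefIndexer implements Indexer {
	static readonly indexerName = "Crossref";

	addCitationsToItems(items: string[]): number {
		return items.length; // placeholder for the real work
	}
}

// Assigning a class to IndexerClass fails to compile if a static is missing.
const indexerClasses: IndexerClass[] = [CrossrefIndexer];

// Create each indexer once and reuse it afterwards.
const instances = new Map<IndexerClass, Indexer>();
function getIndexer(cls: IndexerClass): Indexer {
	let instance = instances.get(cls);
	if (!instance) {
		instance = new cls();
		instances.set(cls, instance);
	}
	return instance;
}
```

The static requirement is only checked where a class is used as an `IndexerClass` (such as the array above), not at the class declaration, which is the main trade-off compared to true abstract statics.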

Dominic-DallOsto · 2024-10-06T17:24:43Z

src/cita/sourceItemWrapper.ts

+
+		return citations;
+
+		// const citations = await Promise.all(


Can we remove this?

Dominic-DallOsto · 2024-10-06T21:28:19Z

src/cita/indexer.ts

+			Services.prompt.alert(
+				window as mozIDOMWindowProxy,
+				Wikicite.formatString(
+					"wikicite.indexer.get-citations.no-doi-title",


Shouldn't these messages say "No items with a supported identifier provided" instead of "No items with a DOI provided"?

This message could then also include the list of supported identifiers?

Bringing Crossref lookup and auto-import to Cita for Zotero 7

806682a

thebluepotato marked this pull request as draft September 22, 2024 21:22

thebluepotato marked this pull request as ready for review September 22, 2024 21:22

thebluepotato added 3 commits September 24, 2024 22:40

Updated duplicate detection logic

bceb456

General cleanup and fixing type errors

185caa8

To avoid type errors and to avoid overusing `any`, I copied the TypeScript definitions from zotero/translators and slightly tweaked them.

Added support for Semantic Scholar and Open Alex as well

92dde74

thebluepotato changed the title ~~Bringing Crossref lookup and auto-import to Cita for Zotero 7~~ Bringing Crossref, Semantic Scholar and Open Alex lookup + auto-import to Cita for Zotero 7 Sep 26, 2024

thebluepotato added 3 commits September 26, 2024 15:48

Fixes to translations

a35bf29

Expand Crossref types

0e621a1

Fixed item submenu labels

2aba06d

Commit patch to openalex-sdk (avoiding fs dependency)

043b859

Various fixes

6c1c361

thebluepotato added 3 commits October 1, 2024 16:56

Merge branch 'zotero7' into zotero7

24b6c48

Update package-lock.json

cfa211c

Improve prompts with citation counts

ee5212a

Added support for OpenCitations and further refactored

f9bdb16

thebluepotato changed the title ~~Bringing Crossref, Semantic Scholar and Open Alex lookup + auto-import to Cita for Zotero 7~~ Bringing Crossref, Semantic Scholar, Open Citations and Open Alex lookup + auto-import to Cita for Zotero 7 Oct 4, 2024

thebluepotato added 2 commits October 4, 2024 02:24

Fix menu naming

70d20aa

Merge branch 'zotero7' into zotero7

3fca65f

Preliminary work to expand PIDType

7bc48c8

thebluepotato and others added 5 commits October 4, 2024 14:36

Expand PIDType, simplify Indexer logic and cleanup

11016e2

Fix submenu enabled/disabled

f97bffa

Add support for fetching OMID

1c420b5

Add ability to fetch OpenAlex work ID

612f737

Remove declare const Services because we have it defined in zotero-types

9de4d25

Dominic-DallOsto and others added 3 commits October 5, 2024 15:16

Make PID rows fit new labels

b4d3294

Disable fetch PID button if fetching isn't implemented for that PID

dfabd20

Preliminary support for showing fetch progress

5882332

Dominic-DallOsto and others added 6 commits October 5, 2024 15:23

Slightly reduce width of PID row labels now that they're not uppercase

62d390a

Don't show PMID or PMCID PID rows because we can't fetch citations from them

9ccce0f

Implemented DOI fetching (Crossref only)

450610f

Localise fetch button tooltip in pid rows

692e6e7

Make a DOI type

7866b30

Update DOI type in sourceItemWrapper

a61595e

Dominic-DallOsto reviewed Oct 6, 2024
