-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bringing Crossref, Semantic Scholar, Open Citations and Open Alex lookup + auto-import to Cita for Zotero 7 #300
base: zotero7
Are you sure you want to change the base?
Conversation
Thanks a lot for this! It'll take me a little while to review this in detail sorry, but this is great! |
One thing that could/should be considered, is that while adding the references for which Crossref has a DOI or ISBN is quite robust, adding items as book or journal merely on the title that they have is unsatisfactory. For instance, with DOI:10.1145/2786451.2786465, some of the references are sections from the same book (a different author per section), yet they all appear in Crossref as Author + Book title (instead of section title). Maybe it should be up to the user to enable what is actually imported. |
To avoid type errors and to avoid overusing `any`, I copied the TypeScript definitions from zotero/translators and slightly tweaked them.
The latest commit adds a new Based on initial (limited) experimentation:
One issue that this "abstraction" brings is that the context menu when clicking on an item shows the translation keys instead of the corresponding strings. |
Hi, I just had a chance to quickly test this and so far things look nice, thanks so much! I haven't been able to fully review the code yet but here are some observations from testing. Openalex build errorI get the following build error at the moment because of the openalex-sdk. Did you encounter this on your end? node_modules/openalex-sdk/dist/src/utils/works.js:7:37:
7 │ const fs_1 = __importDefault(require("fs"));
╵ ~~~~
The package "fs" wasn't found on the file system but is built into node. Are you trying to bundle
for node? You can use "platform: 'node'" to do that, which will remove this error. I removed the openalex SDK to test a bit further. Auto import citationsFirstly, the auto import by identifier button is really nice! It would solve #40. One thing that might also be nice is, if the citation already has a QID attached, that this should be applied to the newly created item when it's imported? Getting Crossref citationsTesting the auto import of citations from crossref I found some bugs, but they're mostly related to crossref's data so it was just unlucky I happened to pick a bad item haha
Getting Semantic Scholar citationsI tested with using the item with DOI - 10.1109/ITW.2015.7133169. It got 11/14 citations because 3 had no identifiers in semantic scholar. The request was very slow though compared to getting citations from crossref. Here is an overview of the timing. The slowdown is because the requests to arxiv are really slow. I tested the same request in the browser and it also took ~10 seconds to complete, so it doesn't seem that this is problem with Cita. Does arxiv have an alternative (faster) API? Maybe a workaround would be to update the progress message with the number of citations already downloaded, so users can see that things are progressing? |
Yes sorry, I'm actually entirely new to
I didn't really look into the Wikidata side of things, but will definitely look into ensuring the QID is imported as well. Is it usually stored in the Extra field?
As it currently stands, the PR relies heavily on Zotero's own existing translators to avoid doing too much heavy lifting and to avoid code duplication. Therefore, if it's slow to import with Cita, it's also slow to import when using the "magic wand" tool that imports items based on their identifiers. Will look into alternatives, but it seems likely that Zotero's own translator is already quite optimized as it is. |
Regarding arXiv, I updated the translator locally (see: zotero/translators#3366) to use another endpoint which, based on limited testing, should be faster than the one the translator currently uses. However, when testing within Cita, it's just as slow... EDIT: rather, depending on luck I guess, it can be as "fast" as 1s per request, but still can sometimes be as slow as the other endpoint. |
That's great, thanks a lot! And thanks for addressing the issues with the arXiv translator, doing it upstream in Zotero is definitely the right way. A couple of little things I noticed:
Otherwise this all looks good |
Got a little crazy and added OpenCitations capabilities again. However, within all the confusion, I need your input on whether we could/should expand the definition of |
For this, I'd like to improve the logic so it is only disabled when no supported identifiers are present. While CrossRef requires a DOI, the other indexers often can search with more identifiers. |
Yeah, I think that's great to abstract this out like you have.
Yeah, that makes sense. I guess how you've set it up you could just check whether |
Do you get the same styling problem as me with the PID rows now? If I make it so all the identifiers are visible, it looks a bit weird but I guess OK Also, do you think it makes sense to grey out the fetch icons for identifiers that can't be fetched? I think it is more intuitive than clicking the button and then finding out that it isn't supported? Additionally, could fetching the OMID and OPENALEX ids give a progress popup similar to fetching QIDs? I found that this took a few seconds to run so I wasn't sure whether anything until the identifier finally appeared. |
Yes I do, I guess we should also no longer uppercase them all. Should we hide PMID and PMCID from this view? They're supported for searching and all, but I don't think you can get citations from them, so there's no need to highlight them as much.
That'll probably encourage us to further abstract the checking logic, good idea!
In my testing it was nearly instant, but we sure can have a progress indicator. I'll be away for the week so I won't be able to look at this PR much, feel free to tweak it to your liking if you want! |
I played around with things quickly so now they look like this
No worries - thanks a lot for your hard work! I'll try to fully review the code by then and make a roadmap for what we need before merging Edit: hiding the PMID and PMCID rows makes sense I think, yeah And the progress messages work great, thanks
|
TODOs:
|
name: string; | ||
id: "qid" | "doi" | "omid"; | ||
}[] = [ | ||
// https://opencitations.net/oci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, where did the 030 and 050 come from? https://opencitations.net/oci only has 010, 020, 040, and 06[1-9]0? Did the specification change at some point?
In saying that, I don't know if the parseOci
function below works if omid's can be arbitrary length? https://registry.identifiers.org/registry/oci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, those prefixes come from old code and from a time when the OpenCitations Corpus was still a separate thing. Seems they should no longer be in use though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool, that makes sense.
const suppliers: { | ||
prefix: string; | ||
name: string; | ||
id: "qid" | "doi" | "omid"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these could be capitalised to match PID types? What do you think?
@@ -15,7 +15,7 @@ class Citation { | |||
ocis: { | |||
citingId: string; | |||
citedId: string; | |||
idType: "qid" | "doi" | "occ"; | |||
idType: "qid" | "doi" | "omid"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These could maybe be capitalised too?
case "arXiv": { | ||
const field = this.item.getField("archiveID"); | ||
if (field && field.startsWith("arXiv:")) { | ||
pid = field; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we explicitly call this an arXiv type, maybe we can strip out the arXiv:
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really with that field. Zotero (and the arXiv translator) store the arXiv ID in the "archiveID" field and in the Extra field. The "archiveID" field is meant to hold IDs of other resources as well based on the scarce documentation.
In short, in this field, the prefix is required, whereas in the Extra field, "arXiv:" is the name of the field (and therefore not part of the value).
type.toUpperCase(), | ||
), | ||
); | ||
case "OMID": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these switch statements start to make me think it'd be easier to have a PID
class with fetching/getting/setting/... methods, similar to how you did for the indexers. Do you think that would be clearer?
Crossref.getCitations(); | ||
const items = await this.getSelectedItems(menuName, true); | ||
if (items.length) { | ||
new Crossref().addCitationsToItems(items); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a static method so we don't need to recreate the indexer every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first try but from my (very limited and recent) TypeScript understanding, you can't enforce static functions in abstract classes. So if we want static, we lose the abstraction. I might be wrong though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looks like you're right: microsoft/TypeScript#34516
Maybe this would work with an interface instead?
|
||
return citations; | ||
|
||
// const citations = await Promise.all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
Services.prompt.alert( | ||
window as mozIDOMWindowProxy, | ||
Wikicite.formatString( | ||
"wikicite.indexer.get-citations.no-doi-title", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these messages say: "No items with a supported identifer provided" found instead of "No items with a DOI provided"?
This message could also then also have the list of supported identifiers?
Hi! I adapted the (now stale) PR #139 to the new Zotero 7 branch so it has a chance to be swept up in the new release. The general logic is unchanged from the other PR, but I've made quite a few updates for efficiency, code clarity and type safety as well as fixed a few failing Promises here and there. I've tested quite a bit already, but it could definitely use more in-depth testing.
And I've also added a button to citations to auto-import that reference into Zotero with one click and then link it. It's similar to what https://github.com/MuiseDestiny/zotero-reference does, but I find that addon confusing at best and it doesn't help that all the info is in Mandarin Chinese...
All in all, probably still a WIP, but happy to receive code reviews and have some people test this!