-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CISG-Online #3169
base: master
Are you sure you want to change the base?
Conversation
Instead of filling the abstract, info is now saved as formatted notes. Now supports downloading multiple PDF attachments. Parsing logic cleaned up by avoiding trailing array functions.
57c52e4
to
91c2648
Compare
CISG-Online.js
Outdated
|
||
|
||
function detectWeb(doc, url) { | ||
// TODO: adjust the logic here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove TODOs
if (url.includes('caseId=')) { | ||
return 'case'; | ||
} | ||
else if (url.includes('search-for-cases')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always true due to target regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's better? A target regex that matches the entire website with a detectWeb that may return false, or a specific target regex where detectWeb always returns an itemType?
CISG-Online.js
Outdated
function getSearchResults(doc, checkOnly) { | ||
var items = {}; | ||
var found = false; | ||
let selector = 'div.search-result div.col-md-10 a.search-result-link'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove div
specifiers (not important) and .col-md-10
for (let block of blocks.values()) { | ||
result += `<h2>${label}</h2><ul>`; | ||
// TODO: check fragility | ||
let rows = block.querySelectorAll("div.row > div > div.row"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does look pretty fragile! Anything else we can match on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tragically no... Example:
<div class="is-content">
<div class="row">
<div class="col-xs-10">
<div class="row">
<div class="col-xs-12 col-lg-4">
<b>Name</b>
</div>
<div class="col-xs-12 col-lg-8">
Brands International Corporation
</div>
</div>
<div class="row">
<div class="col-xs-12 col-lg-4">
<b>Place of business</b>
</div>
<div class="col-xs-12 col-lg-8">
Canada
</div>
</div>
<div class="row">
<div class="col-xs-12 col-lg-4">
<b>Role in transaction</b>
</div>
<div class="col-xs-12 col-lg-8">
Seller
</div>
</div>
</div>
<div class="col-xs-2 text-right">
<img src="/css/flags/blank.gif" class="flag flag-ca align-top" />
</div>
</div>
</div>
Let's just say the website does not really adhere to semantic web...
CISG-Online.js
Outdated
result += `<h2>${label}</h2><ul>`; | ||
// TODO: check fragility | ||
let rows = block.querySelectorAll("div.row > div > div.row"); | ||
for (let row of rows.values()) result += `<li>${labellize(row)}</li>`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be no need for .values()
(here and elsewhere)
CISG-Online.js
Outdated
for (let row of rows.values()) result += `<li>${labellize(row)}</li>`; | ||
result += "</ul>\n"; | ||
} | ||
Z.debug(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debugs before merge
CISG-Online.js
Outdated
|
||
let result = "<h2>Case History</h2><ul>"; | ||
for (let cisgCase of cases.values()) { | ||
//result += `<li>${labellize(row)}</li>`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out lines
"url": "https://cisg-online.org/search-for-cases?caseId=14425", | ||
"attachments": [ | ||
{ | ||
"title": "Full text of decision Original language: English", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full Text PDF
(can match on common title patterns like "Full text of decision"). Language should go in language
field on item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This database provides both the original language PDF and translations (most of the time). The name of the PDF files comes from the database itself. How should the files be "canonically" named in Zotero when there are multiple?
CISG-Online is (chiefly) a database of cases related to the UN Convention on United Nations Convention on Contracts for the International Sale of Goods. This translator allows Zotero to save cases found on there.