Processor input (JSON)

Bruce D'Arcus edited this page Jul 12, 2020 · 8 revisions

Naming Conventions

In order to come up with definitive naming conventions for CSL Processor input, here is a (work in progress) collection of inconsistencies found in the current JSON format. Since the majority of attributes use a minus to separate terms, this list includes strictly camel-cased names.

Citation Data

  • citationItems
  • noteIndex (properties)
  • citationID
  • sortedItems
  • sortkeys (not sort-keys)
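To get a feel for what hyphenated versions of these names would look like, here is a minimal sketch. The helper name `toHyphenated` is made up for illustration and is not part of any CSL processor. It only splits at camel-case boundaries, so a name like sortkeys stays as it is and would still have to be renamed by hand.

```javascript
// Hypothetical helper (not part of any CSL processor): convert a
// camel-cased key to the hyphen-separated form used by most attributes.
function toHyphenated(name) {
  return name
    // Insert a hyphen between a lowercase letter or digit and the
    // uppercase letter that follows it ("noteIndex" -> "note-Index").
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2")
    .toLowerCase();
}

const camelCased = ["citationItems", "noteIndex", "citationID", "sortedItems"];
const hyphenated = camelCased.map(toHyphenated);
console.log(hyphenated);
```

Runs of capitals such as the ID in citationID are kept together as one term, which seems to be the least surprising result.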

Items

The valid item types seem to distinguish between minus and underscore: the minus separates subtypes, while the underscore connects terms. For example, it is 'motion_picture' but 'article-journal'. This approach is consistent in itself, but since most other attributes use a minus as a connector I find it confusing; is the distinction really necessary, or could we use a minus everywhere? An alternative would be to use '/' for sub-typing.

  • shortTitle
  • journalAbbreviations
  • archive_location / archive-place

Aside from Bruce: keep in mind the origin of the data schema, which is effectively a direct mapping of CSL terms. The intention behind the terms is indeed close to the assumption above: hyphens were meant to indicate subtyping (article-magazine), but also relations (container-title). By contrast, multiple words are separated by an underscore (motion_picture). I think that still makes sense for the original scope (CSL proper), but it may need to be reconsidered for the data input format.

Names

non-dropping-particle is really long; wouldn't it be easier to have particle and dropping-particle and demote-particle?

  • isInstitution

Bibliography Input

To filter items to be included in the bibliography, a list of conditions can be passed in an array. As far as I can tell, the order of the individual elements in the array is of no concern; each element has exactly two properties: 'field' and 'value'. Instead of an array of hashes, it seems to me, a single hash would suffice. For example

var myarg = {
  "select" : [
    {
      "field" : "type",
      "value" : "book"
    },
    {
      "field" : "categories",
      "value" : "1990s"
    }
  ]
}

This could then be written as:

var myarg = {
  "select" : {
    "type" : "book",
    "categories" : "1990s"
  }
}
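The conversion between the two forms is mechanical. Here is a minimal sketch, assuming every condition has exactly the two properties 'field' and 'value'; the name `conditionsToHash` is made up for illustration.

```javascript
// Hypothetical helper: collapse [{field, value}, ...] into {field: value, ...}.
function conditionsToHash(conditions) {
  const hash = {};
  for (const { field, value } of conditions) {
    // A later condition on the same field overwrites an earlier one.
    hash[field] = value;
  }
  return hash;
}

const legacy = {
  select: [
    { field: "type", value: "book" },
    { field: "categories", value: "1990s" },
  ],
};

const compact = { select: conditionsToHash(legacy.select) };
console.log(compact);
```

One thing the sketch makes visible: a single hash cannot hold two conditions on the same field, so if that case matters the value would have to become an array.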

Also, in the Ruby implementation we're mapping select, include, exclude, quash to all, any, none, and skip, respectively, as the first three directly correspond to list filters in Ruby. Since the citeproc-js manual describes the meaning of select, include, and exclude using exactly these terms (all, any, none), I wonder if they would not be the more intuitive choice in the first place.
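To illustrate why all, any, and none read naturally, here is a rough sketch of such a filter using the single-hash form. It is not the citeproc-js or Ruby implementation; the function names, the sample items, and the rule that an array-valued field matches when it contains the value are all assumptions. The sketch also lets the three filters be combined, and it leaves out quash/skip because its meaning is not described here.

```javascript
// Assumption: a field matches if it equals the value, or, for
// array-valued fields such as "categories", if it contains the value.
function matches(item, field, value) {
  const actual = item[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

// Hypothetical filter: keep items that satisfy all conditions in `all`,
// at least one in `any` (if given), and none of those in `none`.
function filterItems(items, { all = {}, any = {}, none = {} } = {}) {
  return items.filter((item) => {
    const holds = ([field, value]) => matches(item, field, value);
    const anyEntries = Object.entries(any);
    return (
      Object.entries(all).every(holds) &&
      (anyEntries.length === 0 || anyEntries.some(holds)) &&
      !Object.entries(none).some(holds)
    );
  });
}

// Made-up sample data.
const items = [
  { id: "a", type: "book", categories: ["1990s"] },
  { id: "b", type: "book", categories: ["1980s"] },
  { id: "c", type: "article-journal", categories: ["1990s"] },
];

const ids = (list) => list.map((item) => item.id).join(",");
console.log(ids(filterItems(items, { all: { type: "book", categories: "1990s" } })));
console.log(ids(filterItems(items, { any: { type: "book", categories: "1990s" } })));
console.log(ids(filterItems(items, { none: { type: "book" } })));
```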

Bibliography Output

  • bibliography_errors (the contents too, but perhaps these can remain implementation dependent?)

These are just suggestions:

  • line-spacing instead of linespacing
  • entry-spacing instead of entryspacing
  • indent or hanging-indent instead of hangingindent
  • offset or max-offset instead of maxoffset
  • preamble or before instead of bibstart
  • postamble or after instead of bibend (I know postamble is probably not proper English outside of Computer Science, but GNU Make uses it, for example)
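If names along these lines were adopted, existing output could be translated with a simple rename table. This is only a sketch: where two alternatives are suggested above I picked one arbitrarily, and `renameKeys` as well as the sample values are made up.

```javascript
// Current name -> suggested name (one alternative chosen per entry).
const RENAMES = new Map([
  ["linespacing", "line-spacing"],
  ["entryspacing", "entry-spacing"],
  ["hangingindent", "hanging-indent"],
  ["maxoffset", "max-offset"],
  ["bibstart", "preamble"],
  ["bibend", "postamble"],
]);

// Hypothetical helper: return a copy of `params` with the keys renamed;
// keys without an entry in the table are passed through unchanged.
function renameKeys(params) {
  return Object.fromEntries(
    Object.entries(params).map(([key, value]) => [RENAMES.get(key) ?? key, value])
  );
}

// Made-up sample values, for illustration only.
const renamed = renameKeys({
  linespacing: 1,
  entryspacing: 0,
  bibstart: "<div>",
  bibend: "</div>",
});
console.log(renamed);
```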

Citation Output

  • citation_errors (same as above)

Dates

Date variables, as currently defined, can be either single dates or date ranges; this leads to unnecessarily complex implementations. Wouldn't it be cleaner to distinguish between dates and ranges in the first place? In the same way, open ranges could be defined more explicitly: right now an open range is defined by adding date parts containing zeroes (which are otherwise invalid date values) to the date parts array.

Instead of specifying individual date and date range types, an easy solution would be to pick a subset of EDTF as date input. This way, date variables would be written as strings, simplifying the processor input (but requiring the processor to parse the EDTF strings).
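To give an idea of the parsing effort, here is a minimal sketch for a very small subset: plain dates (YYYY, YYYY-MM, YYYY-MM-DD) and intervals written with a slash, assuming that '..' marks an open end as in the EDTF specification. The function names and the shape of the result (`kind`, `date`, `start`, `end`) are made up, and the sketch does not check that months and days are in range.

```javascript
// Parse "YYYY", "YYYY-MM" or "YYYY-MM-DD" into an array of numbers.
// No range checks: "2007-13" would be accepted by this sketch.
function parseDate(text) {
  const match = /^(-?\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(text);
  if (!match) throw new SyntaxError(`unsupported date: ${text}`);
  return match
    .slice(1)
    .filter((part) => part !== undefined)
    .map(Number);
}

// Hypothetical entry point: distinguish single dates from ranges up
// front, and represent an open end as null rather than as zeroes.
function parseEdtf(text) {
  if (!text.includes("/")) {
    return { kind: "date", date: parseDate(text) };
  }
  const [start, end, ...rest] = text.split("/");
  if (rest.length > 0) throw new SyntaxError(`unsupported interval: ${text}`);
  const side = (part) => (part === ".." ? null : parseDate(part));
  return { kind: "range", start: side(start), end: side(end) };
}

console.log(parseEdtf("2007-03-15"));
console.log(parseEdtf("2007/2008-05"));
console.log(parseEdtf("2007/.."));
```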

Mendeley CSL-JSON

Mendeley provided the following documentation on their use of CSL-JSON:

Support for the CSL Embedded Citation Object format is available in Mendeley Desktop 1.0 and later.

The CSL citation data object consists of:

  • a required "schema" element of type "string", set to the URI of the schema
  • a required "citationID" element of type "string" or "number", set to ???
  • a "citationItems" element of type "array", containing "objects" with the data of the individual cites. The individual cite objects are structured as:
    • a required "id" element of type "string" or "number", set to a unique cite ID
    • an "itemData" element of type "object", described in csl-data.json/#/items , containing the metadata of a single bibliographic item (this object is returned in citeproc-js by sys.retrieveItem() )
    • an "uris" element of type "array", which can contain any number of URIs (of type "string") to the bibliographic item
    • a "prefix" element of type "string"
    • a "suffix" element of type "string"
    • a "locator" element of type "string"
    • a "label" element of type "string", set to one of the CSL locator types (see https://docs.citationstyles.org/en/1.0.1/specification.html#locators)
    • a "suppress-author" element of type "string", "boolean" or "number"
    • an "author-only" element of type "string", "boolean" or "number"
  • a "properties" element of type "object" containing:
    • a "noteIndex" element of type "number", set to the index of the footnote or endnote
  • (Mendeley-specific) a "mendeley" element of type "object" containing:
    • a "previouslyFormattedCitation" element of type "string", set to the rendered output of the cite of the previous rendering round (this can be used to determine if the user manually altered the output)
    • a "manualFormatting" element of type "string", set to the user-customized output of the cite (this output will be used in favor of the generated output)
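The required elements in the list above can be checked with a few lines of code. This is a rough sketch rather than a replacement for validating against the schema itself; the function name `checkCitation` and the wording of the messages are made up.

```javascript
// Hypothetical checker: return a list of problems found in a citation
// data object, based only on the requirements listed above.
function checkCitation(citation) {
  const problems = [];
  const isStringOrNumber = (v) => typeof v === "string" || typeof v === "number";

  if (typeof citation.schema !== "string") {
    problems.push('"schema" must be a string');
  }
  if (!isStringOrNumber(citation.citationID)) {
    problems.push('"citationID" must be a string or a number');
  }

  // "citationItems" is not marked as required, so a missing value is accepted.
  const items = citation.citationItems ?? [];
  if (!Array.isArray(items)) {
    problems.push('"citationItems" must be an array');
  } else {
    items.forEach((item, index) => {
      if (!isStringOrNumber(item?.id)) {
        problems.push(`citationItems[${index}].id must be a string or a number`);
      }
    });
  }

  const noteIndex = citation.properties?.noteIndex;
  if (noteIndex !== undefined && typeof noteIndex !== "number") {
    problems.push('"properties.noteIndex" must be a number');
  }
  return problems;
}

console.log(checkCitation({ citationItems: [{}] }));
```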

The method to embed metadata for citations and bibliographies typically varies between word processors. Currently Mendeley uses:

  • For Word for Windows: in-text citations and bibliographies are represented by a field code of type wdFieldAddin or temporarily represented as a bookmark if exporting to OpenOffice.

  • For OpenOffice: in-text citations are of type com.sun.star.text.ReferenceMark. Bibliographies are of type com.sun.star.text.TextSection or temporarily represented as a bookmark if exporting to Word.

An example of an embedded citation object from Mendeley:

{
    "schema": "https://resource.citationstyles.org/schema/latest/input/json/csl-citation.json",
    "citationID": "12rsus7rlj",
    "citationItems": [
        {
            "id": "ITEM-1",
            "itemData": {
                "id": "ITEM-1",
                "issued": {
                    "date-parts": [
                        [
                            "2007"
                        ]
                    ]
                },
                "title": "My paper",
                "type": "article-journal"
            },
            },
            "locator": "21",
            "label": "page",
            "uris": [
                "http://www.mendeley.com/documents/?uuid=970e7ce0-8a21-482e-b7d6-e77794a2d37d",
                "http://www.zotero.org/uniqueDocumentId"
            ]
        }
    ],
    "mendeley": {
        "previouslyFormattedCitation": "(2007)",
        "manualFormatting": "2007b"
    },
    "properties": {
        "noteIndex": 1
    }
}
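The two Mendeley-specific elements could be used roughly as follows. This is a sketch of one possible reading of the description above, not Mendeley's actual code: the function names are made up, and `currentText` is assumed to be the text currently found in the document field.

```javascript
// Has the user changed the field text since the previous rendering round?
// Assumption: a mismatch with "previouslyFormattedCitation" means a manual edit.
function wasManuallyEdited(citation, currentText) {
  const previous = citation.mendeley?.previouslyFormattedCitation;
  return previous !== undefined && currentText !== previous;
}

// Prefer the user-customized output over the generated output.
function displayText(citation, generatedText) {
  return citation.mendeley?.manualFormatting ?? generatedText;
}

// The relevant part of the example object above.
const citation = {
  citationID: "12rsus7rlj",
  mendeley: {
    previouslyFormattedCitation: "(2007)",
    manualFormatting: "2007b",
  },
};

console.log(wasManuallyEdited(citation, "2007b"));
console.log(displayText(citation, "(2007)"));
```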