todo.txt

manu
---

WAIT When there is a <ptr target="bib:xxxxxx"/> anywhere in the file and that is not wrapped by <bib>, display the siglum as clickable. XXX todo

dans le physical display de <expan>, ajouter l'expansion de(s) abbrs en infobulle (et supprimer les infobulles internes expan/abbr expan/ex)

WAIT <citedRange unit="page|pages">, use explicit singular and plural.

====


# crited

on va avoir une représentation intermédiaire d'une édition, que je suis en train de définir. l'idée est que ça puisse représenter tout type d'édition. il faut que ce soit fait proprement. we can get something working by doing a transform to the intermediate representation.

distinction entre:
* les éditions en ligne intégrées dans la base de données
* éditions latex pour produire un PDF. je pense qu'il ne vaut pas la peine de faire ça proprement. ça ne peut être fait que de manière ad hoc, parce qu'il faut toujours des interventions manuelles pour produire une publication de qualité.


compatibilité: modifier le modèle des éditions de manuscrits de telle sorte que les éditions d'inscriptions en soient un sous-ensemble. faire en sorte qu'il soit possible de traiter les deux types sans avoir besoin de savoir si l'on a affaire à une édition de manuscrit ou d'inscription.

inversement: simplifier l'encodage des éditions de manuscripts lorsqu'il est plus cimplexe que celui des inscriptions. plus petit dénominateur.


<sourceDesc>
	<listWit>
		...

change <editor>...</editor> to <respStmt><resp>author of digital edition</resp>...</respStmt>

we have <rdg source="bib:Foo"> in EGD, but <rdg wit="#foo"> in EGC. why this discrepancy? apparently both @source and wit can be used at the same time, see the tei doc
likewise, EGD has sigla in the biblio, but they are in <witness> in EGC

For the <titleStmt>. Only allow a single <title> element, like in the EGD?

For the apparatus. We currently allow both <variantEncoding method="location-referenced"> and <variantEncoding method="parallel-segmentation">. But a single file uses <variantEncoding method="location-referenced"> (DHARMA_CritEdKakavinParthayajna), so we might want to abandon this method. The apparatus would be assumed to be inlined in the edition _unless_ there is a <div type="apparatus"> (sibling of <div type="edition">).

weirdness with <div type="dyad|metrical|liminal">; we don't really care about the hierarchy  of other types, it only matters do the display.


-----

see https://github.com/erc-dharma/project-documentation/issues/334#issuecomment-2607595057

# arie

# metadata

commencer par métadonnées du texte

dans le menu de gauche: metadata + visual documentation

# validation

need something to ensure that the order of milestone-like elements is what we expect. best would be to write a parser for it. though maybe could be done with relaxng. also note that there is a definite order of succession for <p>, <div>, etc. and milestone-like elements, defined in the EGD.


=================================

https://mail.google.com/mail/u/0/#label/Work/FMfcgzGxTFdwWWMKhmGhVzhnHBXcBwQV
https://mail.google.com/mail/u/0/#label/Work/FMfcgzQXJtDPgHCpTSWgzdvRZrVzdPdJ
https://mail.google.com/mail/u/0/#label/Work/FMfcgzGxTPFzPVKbpkWPhGsWTkBBKSnB
https://mail.google.com/mail/u/0/#label/Work/FMfcgzQXJsztsXjqQVmFQZcttVBzmXGd
https://mail.google.com/mail/u/0/#label/Work/FMfcgzQXJsxlqtxSJzqxPHbcFtfjSWQZ

---


work on sii, then arie

mettre les SII, un fichier ppour chaque inscription dans exchange_aurorachana; regarder Michaël_memo pour détail.

-----

crited et dipled, fichiers prioritaires

DiplEd
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaGriaCau.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaPerpusnasL241.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaPerpusnasL298.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaPerpusnasL631.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdKutaraManavaLeidenOr2215.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdKutaraManavaManchesterB2.xml

https://github.com/erc-dharma/tfd-sanskrit-philology/blob/master/texts/xml/DHARMA_DiplEdSarvavajrodayaCodex.xml

CritEd (ces fichiers peuvent avoir des fichier _transEng01.xml liés)
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdBhuvanakosa.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdCandrakirana.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdKakavinParthayajna.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSarasamuccayaVararuci.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSasanaMahaguru.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSiksaGuru.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSiksaKandangKaresian.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSvayambhu.xml

https://github.com/erc-dharma/tfd-sanskrit-philology/blob/master/texts/xml/DHARMA_CritEdSarvavajrodaya.xml


------

images: nakala (humanum)
manu emploie didomena (EHESS)
3d: conservatoire national des données 3d (humanum)

store several URLs for each inscription; add a menu in left margin that enumerates these URLs.

------------

react:
https://github.com/erc-dharma/project-documentation/issues/310#issuecomment-2444531530


# Manu copy-paste

manu a win10 pro; test conserver mise en forme quand copier/coller; Word; marche pas à cause de Windows visiblement; essayer de se limiter aux tags basiques <i>, <b> ou (peut-être) ne pas employer de classes
suis sur que les tags suivants sont OK: gras = strong / italique = em / couleur = ?
il faut aussi conserver des niveaux de titres cohérents, ça se copie automatiquement sous word.


# Manu reste

demander amandine si ses metadata complètes, puis faire un export dans dépôt
perso, puis parser

suivre ariane puis faire demande télétravail


===


find a way to merge the parallels db with the main one; it should be updated
every week or so


in biblio, move sharedocs links to notes; requires to have
* a mechanism for updating the biblio
* a mechanism for adding notes and linking them to an existing entry

Tamil: word split vs. metrical split

attr to <p> for marking up blessings/curses? @ana? find something generic for
all custom stuff (additions to the egd).

prosodic patterns; be careful about placemenbt of guillemets and footnote nums.


===

in xpath maybe support yielding strings (only need this for the last component,
for use in the command-line tool)

allow selection of attrs in xpath, useful for doing searches from the cmd line

need to fix the datatype mess in rng schema; distinguish between attrs that accept
a single value and the others.

shorthand in bib

when apparatus hidden, remove its headings from submenu; and add ellipsis after
apparatus heading

deal with divs grantha; for this need to use a tree, not bytecode


=================

elements that can be ignored and should be removed eventually:

prefixDef, listPrefixDef, schemaRef
but first verify that nothing depends on them in xslt files
and also make sure they do not appear in templates

//TEI/teiHeader/fileDesc/publicationStmt
//TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier
//TEI/teiHeader/encodingDesc
//TEI/teiHeader/revisionDesc

revisionDesc can always be ignored

=================

pour languages dans display, tous ceux de div type=edition + écrfiture
écritures à gérer
"Tamil in Tamil Script; Sanskrit in Grnatha script":

================ end manu

.pagelike: <pb> + <milestone type="page-like">
  Use a single instruction for representing them, taking into account <fw>
.linelike: <lb>
.gridlike  <milestone type="grid-like">


rm dipled schema, use ins schema for this.


## Refactoring


we must do NFC normalization at some point; when? not before storing the
file in the db (might need the original later on for e.g. hashing); simplest
would be to do that just before parsing the file, but this will mess up
columns numbers. still should do it, because we don't refer to columns for
now and because all string comparisons will be messed up. do the final
normalization step (new lines, etc.) when outputting documents.

shouldn't store separately the app from all the data files, because we need the
data files to be present to do anything with the db. the app code is useless
on its own. add projdoc as a submodule? can we still do a git pull in the app
repo without having git complain that the repo has been modified?
everything should be in the same repo. maybe use a reload command that reloads
data files _but not the code_?

in fact we have 2 build levels: fetch files from the fs, and parse the
documents. should do the minimum whenever possible.


## Validation

Script maturity is for use only with the class "Brahmi and derivatives" (and
its subcategories); for any other script classes it is not optional but
"forbidden". For Brahmi, it is mandatory. Amend rules accordingly.

deal with uniqueness of phys elements:
* pb and pagelike milestones must have unique @n in the whole div/edition

Check for multiple uses of the same bib entry as in https://dharman.in/display/DHARMA_INSBengalCharters00050#bibliography (disallow?)

actually use dharma.rng, only in /texts for now, afterwards distribute it;


## Display

for invalid inscriptions, show the xml, but without formatting tags, etc.
need to convert the error (line,column) to an offset
use xmlparser.ErrorByteIndex instead of donig a manual conversion
https://docs.python.org/3/library/pyexpat.html

deal with rendition and xml:lang, which must cover the whole text in div type edition.
must be dealt with in tree.py

manu: In physical display, do not display editorial hyphens, but do show them
in logical display. For this to work, need to tag languages. XXX
hyphens between words? or at the end fo a line?

manu: grantha translit with button several states translit methods

* fix incorrect verse numbers that should be in Roman in DHARMA_INSCIC00066
  https://dharmalekha.info/texts/DHARMA_INSCIC00066#translation-1

* Sort out languages tagging; assign language categories (lang of
  the ed. or of the rest, main or secondary lang; probably not
  useful to keep track of <foreign>)

add tooltip for expan in <abbr><expan> in phys disp; but need to know how to do
that

div rendition="class:38768 maturity:83213" (grantha) à mettre en gras pas seulement hi rend=grantha ; pour SII0501358
idem pour <lg rendition=...> dans Tiruvavatuturai01

don't think we are formarring abbr/expan as supposed

should use the lang attribute in html to tag appropriately xml elements
with an @xml:lang.


## Problems with @n.

The repetitive scheme is not clear and unpredictable. Should have a clearer
convention.


## XML display

Put the tab button in the sidebar, call it "view source". Should allow resizing
the sidebar, too. when it is completemy closed, what to display?

* when displaying the sidebar, add toc headings for navigating the xml: header,
  edition, translation, etc.
* Need to have a pretty-print func that preserves space and
  doesn't add unnecessary space.
* Also add line numbers
* Style the thing with a color for comments and tags, maybe different colors
  for milestones and logical elements.
* Add error messages with popups in the XML.


## Website

for https://github.com/erc-dharma/project-documentation/issues/266#issue-2207593274
don't use href in in-page links, it's confusing; use data-href instead; and this
would allow us to distinguish page-internal links from the others.

when a file is completely invalid, show the raw xml in the displqay (not pretty-printed).

do a redirect /foo/ -> /foo in nginx _but_ watch out with the /zotero-proxy
stuff.

cumulative timeout for flashing https://developer.mozilla.org/fr/docs/Web/API/setTimeout

Make the top menu sticky on pc? no. Add a button to show/hide the left sidebar (on
pc); where? left of the top menu downward-pointing > thing. The left sidebar
shouldn't pop when we arrive to the page footer, how? The left sidebard should
be resizable, but then dimensions need to be saved as a cookie because reloading
the page will mess up the size.

Generate a site map (wget?).

Use the w3c validator API https://validator.w3.org/docs/api.html with random
urls to detect issues; submit URLs like so:

  https://validator.w3.org/nu/?out=json&doc=$URL

Add a "status" search field to catalog to filter by error status.

add global table of gaiji symbols actually found in inscriptions.
we will add links to inscriptions within this table so that we can find which
inscriptions, etc. contain which symbols.


--------------------- Enhancements

# Parallels

Allow quoting part of the input with "..." to force an exact substring match.
Still keep using the same similarity measure. When there are several quoted
passages, allow overlaps viz. "foo"f"fo" match "foo". Or not? require the
matched strings to occur in the same order? In fact having a second field for
filtering seems better.

This should be linked to the catalog search features, but we must first
integrate with the main db.


## Duplicate file idents and zotero idents

would be convenient to have position-independent files, viz. assume file
basenames are unique AND also extension-independent files to allow people to
move files around.

find some way to report non unique files ; could use an intermediate table
that stores duplicates, like for zotero; for duplicate files in the same repo,
we are sure there is a problem and we can complain early on while processing the
repo itself (to whom, however?); but if the files are in distinct repos, we
cannot tell whether the file is being moved or anything, because there is no
global commit across all repos and the order of operations is not guaranteed.

in any case, we must preserve the fact that a given ident always corresponds to
exactly one elem; so if we have a duplicate ident, do not use this
duplicate ident, instead generate new ones and delete these when appropriate.

== EGD

<ref target="C00007.xml">C. 7</ref>
<ref n="tfa-pallava-epigraphy" target="Pallava00001.xml">Pallava 1</ref>
  new format: <ref target="DHARMA_INSPallava00001.xml"

====
questions: titlecase or not?
q and quote; treated identically
affiliation in people file
repo description?