todo.txt
manu
---
WAIT When there is a <ptr target="bib:xxxxxx"/> anywhere in the file and that is not wrapped by <bib>, display the siglum as clickable. XXX todo
in the physical display of <expan>, add the expansion of the abbr(s) as a tooltip (and remove the inner expan/abbr and expan/ex tooltips)
WAIT <citedRange unit="page|pages">, use explicit singular and plural.
====
# crited
we are going to have an intermediate representation of an edition, which I am currently defining. the idea is that it should be able to represent any type of edition. it has to be done properly. we can get something working by doing a transform to the intermediate representation.
distinction between:
* online editions integrated into the database
* LaTeX editions for producing a PDF. I don't think it is worth doing this properly. it can only be done in an ad hoc way, because manual intervention is always needed to produce a publication-quality result.
compatibility: modify the model of manuscript editions so that inscription editions are a subset of it. make it possible to process both types without needing to know whether we are dealing with a manuscript edition or an inscription edition.
conversely: simplify the encoding of manuscript editions where it is more complex than that of inscriptions. lowest common denominator.
<sourceDesc>
<listWit>
...
change <editor>...</editor> to <respStmt><resp>author of digital edition</resp>...</respStmt>
we have <rdg source="bib:Foo"> in EGD, but <rdg wit="#foo"> in EGC. why this discrepancy? apparently both @source and @wit can be used at the same time, see the tei doc
likewise, EGD has sigla in the biblio, but they are in <witness> in EGC
For the <titleStmt>. Only allow a single <title> element, like in the EGD?
For the apparatus. We currently allow both <variantEncoding method="location-referenced"> and <variantEncoding method="parallel-segmentation">. But a single file uses <variantEncoding method="location-referenced"> (DHARMA_CritEdKakavinParthayajna), so we might want to abandon this method. The apparatus would be assumed to be inlined in the edition _unless_ there is a <div type="apparatus"> (sibling of <div type="edition">).
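A minimal Python sketch of that rule, assuming TEI-namespaced files and that the edition and apparatus divisions are direct children of the same parent. The function name and the sample document are hypothetical, not taken from the existing code.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def apparatus_is_inlined(parent: ET.Element) -> bool:
    """True unless `parent` has a <div type="apparatus"> child.

    `parent` is assumed to be the element that holds <div type="edition">.
    """
    for child in parent:
        if child.tag == TEI + "div" and child.get("type") == "apparatus":
            return False
    return True


# Hypothetical sample: no separate apparatus division, so it is inlined.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="edition"><p>text <app><lem>a</lem><rdg>b</rdg></app></p></div>
</body>"""
body = ET.fromstring(SAMPLE)
print(apparatus_is_inlined(body))
```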
weirdness with <div type="dyad|metrical|liminal">; we don't really care about the hierarchy of other types, it only matters to the display.
-----
see https://github.com/erc-dharma/project-documentation/issues/334#issuecomment-2607595057
# arie
# metadata
start with the text's metadata
in the left menu: metadata + visual documentation
# validation
need something to ensure that the order of milestone-like elements is what we expect. best would be to write a parser for it. though maybe could be done with relaxng. also note that there is a definite order of succession for <p>, <div>, etc. and milestone-like elements, defined in the EGD.
=================================
https://mail.google.com/mail/u/0/#label/Work/FMfcgzGxTFdwWWMKhmGhVzhnHBXcBwQV
https://mail.google.com/mail/u/0/#label/Work/FMfcgzQXJtDPgHCpTSWgzdvRZrVzdPdJ
https://mail.google.com/mail/u/0/#label/Work/FMfcgzGxTPFzPVKbpkWPhGsWTkBBKSnB
https://mail.google.com/mail/u/0/#label/Work/FMfcgzQXJsztsXjqQVmFQZcttVBzmXGd
https://mail.google.com/mail/u/0/#label/Work/FMfcgzQXJsxlqtxSJzqxPHbcFtfjSWQZ
---
work on sii, then arie
put the SII, one file per inscription, in exchange_aurorachana; see Michaël_memo for details.
-----
crited and dipled, priority files
DiplEd
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaGriaCau.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaPerpusnasL241.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaPerpusnasL298.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdCandrakiranaPerpusnasL631.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdKutaraManavaLeidenOr2215.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_DiplEdKutaraManavaManchesterB2.xml
https://github.com/erc-dharma/tfd-sanskrit-philology/blob/master/texts/xml/DHARMA_DiplEdSarvavajrodayaCodex.xml
CritEd (these files may have linked _transEng01.xml files)
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdBhuvanakosa.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdCandrakirana.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdKakavinParthayajna.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSarasamuccayaVararuci.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSasanaMahaguru.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSiksaGuru.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSiksaKandangKaresian.xml
https://github.com/erc-dharma/tfd-nusantara-philology/blob/master/editions/DHARMA_CritEdSvayambhu.xml
https://github.com/erc-dharma/tfd-sanskrit-philology/blob/master/texts/xml/DHARMA_CritEdSarvavajrodaya.xml
------
images: nakala (humanum)
manu uses didomena (EHESS)
3d: conservatoire national des données 3d (humanum)
store several URLs for each inscription; add a menu in left margin that enumerates these URLs.
------------
react:
https://github.com/erc-dharma/project-documentation/issues/310#issuecomment-2444531530
# Manu copy-paste
manu has win10 pro; test preserving formatting when copy-pasting; Word; apparently doesn't work because of Windows; try to stick to basic tags <i>, <b> or (maybe) avoid using classes
am sure the following tags are OK: bold = strong / italic = em / color = ?
also need to keep heading levels consistent; they get copied automatically in word.
# Manu reste
ask amandine whether her metadata are complete, then do an export into a personal
repo, then parse
follow up with ariane, then submit a remote-work request
===
find a way to merge the parallels db with the main one; it should be updated
every week or so
in biblio, move sharedocs links to notes; requires to have
* a mechanism for updating the biblio
* a mechanism for adding notes and linking them to an existing entry
Tamil: word split vs. metrical split
attr to <p> for marking up blessings/curses? @ana? find something generic for
all custom stuff (additions to the egd).
prosodic patterns; be careful about placement of guillemets and footnote nums.
===
in xpath maybe support yielding strings (only need this for the last component,
for use in the command-line tool)
allow selection of attrs in xpath, useful for doing searches from the cmd line
need to fix the datatype mess in rng schema; distinguish between attrs that accept
a single value and the others.
shorthand in bib
when apparatus hidden, remove its headings from submenu; and add ellipsis after
apparatus heading
deal with divs grantha; for this need to use a tree, not bytecode
=================
elements that can be ignored and should be removed eventually:
prefixDef, listPrefixDef, schemaRef
but first verify that nothing depends on them in xslt files
and also make sure they do not appear in templates
//TEI/teiHeader/fileDesc/publicationStmt
//TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier
//TEI/teiHeader/encodingDesc
//TEI/teiHeader/revisionDesc
revisionDesc can always be ignored
=================
for languages in the display, show all those of div type=edition, plus the script
scripts to handle
"Tamil in Tamil Script; Sanskrit in Grantha script":
================ end manu
.pagelike: <pb> + <milestone type="page-like">
Use a single instruction for representing them, taking into account <fw>
.linelike: <lb>
.gridlike <milestone type="grid-like">
rm dipled schema, use ins schema for this.
## Refactoring
we must do NFC normalization at some point; when? not before storing the
file in the db (might need the original later on for e.g. hashing); simplest
would be to do that just before parsing the file, but this will mess up
column numbers. still should do it, because we don't refer to columns for
now and because all string comparisons will be messed up. do the final
normalization step (new lines, etc.) when outputting documents.
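A minimal sketch of the ordering described above: hash the bytes as stored, then normalize to NFC just before parsing. The function name is hypothetical, and UTF-8 input is an assumption.

```python
import hashlib
import unicodedata


def load_for_parsing(raw: bytes) -> tuple[str, str]:
    """Return (sha256 of the original bytes, NFC-normalized text)."""
    # Hash the bytes exactly as stored, before any normalization.
    digest = hashlib.sha256(raw).hexdigest()
    # Normalize just before parsing; column numbers may shift as a result.
    text = unicodedata.normalize("NFC", raw.decode("utf-8"))
    return digest, text


decomposed = "e\u0301".encode("utf-8")  # "e" followed by COMBINING ACUTE ACCENT
digest, text = load_for_parsing(decomposed)
print(len(decomposed.decode("utf-8")), len(text))  # prints "2 1"
```

Keeping the original bytes around means the hash stays stable even if the normalization policy changes later.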
shouldn't store separately the app from all the data files, because we need the
data files to be present to do anything with the db. the app code is useless
on its own. add projdoc as a submodule? can we still do a git pull in the app
repo without having git complain that the repo has been modified?
everything should be in the same repo. maybe use a reload command that reloads
data files _but not the code_?
in fact we have 2 build levels: fetch files from the fs, and parse the
documents. should do the minimum whenever possible.
## Validation
Script maturity is for use only with the class "Brahmi and derivatives" (and
its subcategories); for any other script classes it is not optional but
"forbidden". For Brahmi, it is mandatory. Amend rules accordingly.
deal with uniqueness of phys elements:
* pb and pagelike milestones must have unique @n in the whole div/edition
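A possible shape for this check, as a sketch only. It assumes TEI-namespaced input, that page-like milestones are written <milestone type="page-like">, and that <pb> and page-like milestones share one numbering space. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"


def duplicate_page_numbers(edition: ET.Element) -> list[str]:
    """Return the @n values used more than once by page-like elements."""
    counts = Counter()
    for elem in edition.iter():
        is_pb = elem.tag == TEI + "pb"
        # Assumption: page-like milestones carry type="page-like".
        is_pagelike = elem.tag == TEI + "milestone" and elem.get("type") == "page-like"
        if (is_pb or is_pagelike) and elem.get("n") is not None:
            counts[elem.get("n")] += 1
    return sorted(n for n, count in counts.items() if count > 1)


# Hypothetical sample with a repeated page number.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="edition">
  <pb n="1r"/><p>...</p>
  <milestone type="page-like" n="1r"/><p>...</p>
  <pb n="1v"/>
</div>"""
print(duplicate_page_numbers(ET.fromstring(SAMPLE)))  # prints "['1r']"
```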
Check for multiple uses of the same bib entry as in https://dharman.in/display/DHARMA_INSBengalCharters00050#bibliography (disallow?)
actually use dharma.rng, only in /texts for now, afterwards distribute it;
## Display
for invalid inscriptions, show the xml, but without formatting tags, etc.
need to convert the error (line,column) to an offset
use xmlparser.ErrorByteIndex instead of doing a manual conversion
https://docs.python.org/3/library/pyexpat.html
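A short sketch of reading the offset straight from the parser, using only the standard pyexpat interface. The function name is hypothetical, and only the first well-formedness error is reported.

```python
import xml.parsers.expat


def first_error_offset(data: bytes):
    """Return (byte offset, message) of the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # ErrorByteIndex is the byte index at which the error occurred,
        # so no manual (line, column) -> offset conversion is needed.
        return parser.ErrorByteIndex, xml.parsers.expat.ErrorString(err.code)
    return None


print(first_error_offset(b"<a><b></a>"))
print(first_error_offset(b"<a/>"))  # prints "None"
```

The offset is in bytes, so it indexes the raw file contents, not the decoded string.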
deal with rendition and xml:lang, which must cover the whole text in div type edition.
must be dealt with in tree.py
manu: In physical display, do not display editorial hyphens, but do show them
in logical display. For this to work, need to tag languages. XXX
hyphens between words? or at the end of a line?
manu: grantha translit with button several states translit methods
* fix incorrect verse numbers that should be in Roman in DHARMA_INSCIC00066
https://dharmalekha.info/texts/DHARMA_INSCIC00066#translation-1
* Sort out languages tagging; assign language categories (lang of
the ed. or of the rest, main or secondary lang; probably not
useful to keep track of <foreign>)
add tooltip for expan in <abbr><expan> in phys disp; but need to know how to do
that
div rendition="class:38768 maturity:83213" (grantha) should be rendered in bold, not only hi rend=grantha; for SII0501358
same for <lg rendition=...> in Tiruvavatuturai01
don't think we are formatting abbr/expan as supposed
should use the lang attribute in html to tag appropriately xml elements
with an @xml:lang.
## Problems with @n.
The repetitive scheme is not clear and unpredictable. Should have a clearer
convention.
## XML display
Put the tab button in the sidebar, call it "view source". Should allow resizing
the sidebar, too. when it is completely closed, what to display?
* when displaying the sidebar, add toc headings for navigating the xml: header,
edition, translation, etc.
* Need to have a pretty-print func that preserves space and
doesn't add unnecessary space.
* Also add line numbers
* Style the thing with a color for comments and tags, maybe different colors
for milestones and logical elements.
* Add error messages with popups in the XML.
## Website
for https://github.com/erc-dharma/project-documentation/issues/266#issue-2207593274
don't use href in in-page links, it's confusing; use data-href instead; and this
would allow us to distinguish page-internal links from the others.
when a file is completely invalid, show the raw xml in the display (not pretty-printed).
do a redirect /foo/ -> /foo in nginx _but_ watch out with the /zotero-proxy
stuff.
cumulative timeout for flashing https://developer.mozilla.org/fr/docs/Web/API/setTimeout
Make the top menu sticky on pc? no. Add a button to show/hide the left sidebar (on
pc); where? left of the top menu downward-pointing > thing. The left sidebar
shouldn't pop when we reach the page footer, how? The left sidebar should
be resizable, but then dimensions need to be saved as a cookie because reloading
the page will mess up the size.
Generate a site map (wget?).
Use the w3c validator API https://validator.w3.org/docs/api.html with random
urls to detect issues; submit URLs like so:
https://validator.w3.org/nu/?out=json&doc=$URL
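A sketch of the two pure pieces of such a check: building the request URL and picking the errors out of a JSON report. Both function names are hypothetical. The report layout (a "messages" list whose items have "type" and "message" keys) is an assumption about the validator's JSON output. Actually fetching the URL is left out.

```python
import json
from urllib.parse import urlencode

VALIDATOR = "https://validator.w3.org/nu/"


def validator_url(page_url: str) -> str:
    """Build the validator request URL for one page of the site."""
    return VALIDATOR + "?" + urlencode({"out": "json", "doc": page_url})


def error_messages(response_body: str) -> list[str]:
    """Extract error texts from a JSON report (assumed layout, see above)."""
    report = json.loads(response_body)
    return [
        m.get("message", "")
        for m in report.get("messages", [])
        if m.get("type") == "error"
    ]


print(validator_url("https://dharmalekha.info/texts/DHARMA_INSCIC00066"))
```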
Add a "status" search field to catalog to filter by error status.
add global table of gaiji symbols actually found in inscriptions.
we will add links to inscriptions within this table so that we can find which
inscriptions, etc. contain which symbols.
--------------------- Enhancements
# Parallels
Allow quoting part of the input with "..." to force an exact substring match.
Still keep using the same similarity measure. When there are several quoted
passages, allow overlaps viz. "foo"f"fo" match "foo". Or not? require the
matched strings to occur in the same order? In fact having a second field for
filtering seems better.
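One way the quoting idea could look, as a sketch only. It takes the order-independent reading, where each quoted passage is checked on its own and overlaps are therefore allowed. The function names are hypothetical, and the similarity measure itself is not shown.

```python
import re

QUOTED = re.compile(r'"([^"]*)"')


def split_query(query: str) -> tuple[str, list[str]]:
    """Split a query into (text for the similarity search, exact substrings)."""
    exact = [m for m in QUOTED.findall(query) if m]
    # The quotes are dropped but the quoted text stays in the similarity input.
    plain = QUOTED.sub(r"\1", query)
    return plain, exact


def passes_filter(candidate: str, exact: list[str]) -> bool:
    # Order-independent: every quoted passage must occur somewhere.
    return all(s in candidate for s in exact)


plain, exact = split_query('svasti "śrī" jaya')
print(plain, exact)
```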
This should be linked to the catalog search features, but we must first
integrate with the main db.
## Duplicate file idents and zotero idents
would be convenient to have position-independent files, viz. assume file
basenames are unique AND also extension-independent files to allow people to
move files around.
find some way to report non unique files ; could use an intermediate table
that stores duplicates, like for zotero; for duplicate files in the same repo,
we are sure there is a problem and we can complain early on while processing the
repo itself (to whom, however?); but if the files are in distinct repos, we
cannot tell whether the file is being moved or anything, because there is no
global commit across all repos and the order of operations is not guaranteed.
in any case, we must preserve the fact that a given ident always corresponds to
exactly one elem; so if we have a duplicate ident, do not use this
duplicate ident, instead generate new ones and delete these when appropriate.
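A sketch of the duplicate report, assuming the ident is the file basename without its extension, as suggested above. The function name, repo names and paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_idents(paths_by_repo: dict[str, list[str]]) -> dict:
    """Map each duplicated ident to the (repo, path) pairs where it occurs."""
    seen = defaultdict(list)
    for repo, paths in paths_by_repo.items():
        for path in paths:
            # Position- and extension-independent ident: the basename stem.
            seen[PurePosixPath(path).stem].append((repo, path))
    return {ident: locs for ident, locs in seen.items() if len(locs) > 1}


# Hypothetical repository contents.
REPOS = {
    "repo-a": ["texts/DHARMA_INSPallava00001.xml", "old/DHARMA_INSPallava00001.xml"],
    "repo-b": ["editions/DHARMA_CritEdSvayambhu.xml"],
}
print(find_duplicate_idents(REPOS))
```

Duplicates whose locations all share one repo can be reported while that repo is being processed. Cross-repo duplicates would go to the intermediate table instead.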
== EGD
<ref target="C00007.xml">C. 7</ref>
<ref n="tfa-pallava-epigraphy" target="Pallava00001.xml">Pallava 1</ref>
new format: <ref target="DHARMA_INSPallava00001.xml"
====
questions: titlecase or not?
q and quote; treated identically
affiliation in people file
repo description?