-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathfields_def.txt
423 lines (422 loc) · 15.9 KB
/
fields_def.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
#
# Search and output field tag definition
#
# Version 1.7 - pam, 13.02.2024
#
# Fields must be declared in the following order:
#
# TG: Once. Tag. Field identifier to be used in an API request in the value of parameter field or query
# DE: Once or more. Description. Describes the content of the field, Used in Help page for API users.
# SH: None or once. Short tag. Alternate value to be used in an API request in the value of parameter field or query
# PR: Once or more. Prefix. Prefix in cellosaurus line of txt version selected by short or long tag
# XP: Once or more. Xpath expr. XML path in cellosaurus XML version selected by short or long tag
# //: Once. Terminator
#
TG id
DE Recommended name. Most frequently the name of the cell line as provided in the original publication.
PR ID
XP /cell-line/name-list/name[@type="identifier"]
CC I know its an exception: that I call it name when it is called ID/identifier
//
TG sy
DE List of synonyms. We try to list all the different synonyms for the cell line, including alternative use
DE of lower and upper cases characters. Misspellings are not included in synonyms (see the "misspelling" tag).
PR SY
XP /cell-line/name-list/name[@type="synonym"]
//
TG idsy
DE Recommended name with all its synonyms. Concatenation of ID and SY.
PR ID
PR SY
XP /cell-line/name-list/name
//
TG ac
DE Primary accession. It is the unique identifier of the cell line.
DE It is normally stable across Cellosaurus versions but when two entries are merged, one of the two
DE accessions stays primary while the second one becomes secondary (see ACAS)
PR AC
XP /cell-line/accession-list/accession[@type="primary"]
//
TG acas
DE Primary and secondary accession. Secondary accession are former primary accession kept here to ensure the access to a cell
DE line via old identifiers.
PR AC
PR AS
XP /cell-line/accession-list/accession
CC No need to allow separate query/retrieval of primary and secondary AC
CC Pam: ok but useful for solr and technical reasons
CC Pam note: we cannot use 'AS' as a solr filed name
//
TG dr
DE Cross-references to external resources: cell lines catalogs, databases,
DE resources listing cell lines as samples or to ontologies. A cross-reference has two parts:
DE the short name of the resource (i.e. CCLE) and an identifier used to locate a particular
DE entry of the resource related to the cell line.
DE For a formal description of all the resources referred to in the Cellosaurus,
DE see <a target="_blank" href="https://ftp.expasy.org/databases/cellosaurus/cellosaurus_xrefs.txt">here</href> .
PR DR
XP /cell-line/xref-list/xref
CC No need to have distinct fields for the different resources:
CC there are too many and there are too many changes from release to release
//
TG ref
DE Publication references. Mostly publications describing the establishment of a cell line or its characterization.
DE Can be journal articles, book chapters, patents and theses. Contains the cross-reference of the publication,
DE its title, authors (or group/consortium) and citation elements.
PR RX
PR RA
PR RG
PR RT
PR RL
XP /cell-line/reference-list/reference
XP /publication
CC Pam: everything about publications and cell-line internal-id referencing them
//
TG rx
DE Publication cross-reference. A unique identifier allowing to access the publication online.
DE The cross-reference has two parts: the shortname of the online resource (i.e. PubMed, DOI,
DE PMCID, CLPUB or Patent) and an identifier used to locate the particular publication related
DE to the cell line. For a formal description of all the resources referred to in the Cellosaurus,
DE see <a target="_blank" href="https://ftp.expasy.org/databases/cellosaurus/cellosaurus_xrefs.txt">here</href> .
PR RX
XP /cell-line/reference-list/reference
XP /publication/attribute::internal-id
XP /publication/xref-list/xref
CC Pam: publication xrefs with cell-line internal-id referencing them
//
TG ra
DE Publication authors. List of authors of a publication referenced in a cell line entry.
PR RA
PR RG
XP /cell-line/reference-list/reference
XP /publication/attribute::internal-id
XP /publication/author-list/person
XP /publication/author-list/consortium
//
TG rt
DE Publication title. Title of a publication referenced in a cell line entry.
PR RT
XP /cell-line/reference-list/reference
XP /publication/attribute::internal-id
XP /publication/title
CC Note: only a few references with a RG line instead of RA (8 so far).
CC Alain has put the name of the consortium inside the author-list (ok) as
CC a "<person name" (example: <person name="The HD iPSC consortium"/>).
CC Not very elegant but no need to change. Only question do we change the short
CC tag from "RA" to "RARG" which is not very elegant but would be more correct
CC in term of content
//
TG rl
DE Publication citation elements. Citation elements of a publication referenced in a cell line entry.
PR RL
XP /cell-line/reference-list/reference
XP /publication/attribute::type
XP /publication/attribute::date
XP /publication/attribute::journal-name
XP /publication/attribute::book-title
XP /publication/attribute::conference-title
XP /publication/attribute::document-title
XP /publication/attribute::document-serie-title
XP /publication/attribute::issn-13
XP /publication/attribute::volume
XP /publication/attribute::last-page
XP /publication/attribute::first-page
XP /publication/attribute::publisher
XP /publication/attribute::city
XP /publication/attribute::institution
XP /publication/attribute::country
XP /publication/attribute::internal-id
//
TG ww
DE Web page related to the cell line
PR WW
XP /cell-line/web-page-list/web-page
//
TG genome-ancestry
DE Estimated ethnic ancestry of the donor of a human cell line based on the analysis of its genome.
SH anc
PR CC Genome ancestry
XP /cell-line/genome-ancestry
//
TG hla
DE HLA typing information. Alleles identified on the MHC type I and type II genes of the donor of a human cell line.
PR CC HLA typing
XP /cell-line/hla-typing-list/hla-typing
CC For both the API and the full text search: no need to go deep in the structure of the HLA information:
CC if one really want to get cell lines with a specific HLA allele value for a particular gene lets do that in SPARQL
//
TG registration
DE Official list, or register in which the cell line is registered.
SH reg
PR CC Registration
XP /cell-line/registration-list/registration
//
TG sequence-variation
DE Important sequence variations of the cell line compared to the reference genome of the species.
SH var
PR CC Sequence variation
XP /cell-line/sequence-variation-list/sequence-variation
//
TG anecdotal
DE Anecdotal details regarding the cell line (its origin, its name or any other particularity).
SH anec
PR CC Anecdotal
XP /cell-line/comment-list/comment[@category="Anecdotal"]
//
TG biotechnology
DE Type of use of the cell line in a biotechnological context.
SH biot
PR CC Biotechnology
XP /cell-line/comment-list/comment[@category="Biotechnology"]
//
TG breed
DE Breed or subspecies an animal cell line is derived from with breed identifiers from FlyBase_Strain, RS and VBO.
PR CC Breed/subspecies
XP /cell-line/breed
CC Pam: problem with slash in attribute name "Breed/subspecies", workaround by using starts-with() in XP
//
TG caution
DE Errors, inconsistencies, ambiguities regarding the origin or other aspects of the cell line.
PR CC Caution
XP /cell-line/comment-list/comment[@category="Caution"]
//
TG cell-type
DE Cell type from which the cell line is derived.
SH cell
PR CC Cell type
XP /cell-line/cell-type
//
TG characteristics
DE Production process or specific biological properties of the cell line.
SH char
PR CC Characteristics
XP /cell-line/comment-list/comment[@category="Characteristics"]
//
TG donor-info
DE Miscellaneous information relevant to the donor of the cell line.
SH donor
PR CC Donor information
XP /cell-line/comment-list/comment[@category="Donor information"]
CC Added october 2022
//
TG derived-from-site
DE Body part (tissue or organ) the cell line is derived from.
SH site
PR CC Derived from site
XP /cell-line/derived-from-site-list/derived-from-site
//
TG discontinued
DE Discontinuation status of the cell line in a cell line catalog.
SH disc
PR CC Discontinued
XP /cell-line/comment-list/comment[@category="Discontinued"]
XP /cell-line/xref-list/xref/discontinued/..
//
TG doubling-time
DE Population doubling-time of the cell line.
SH time
PR CC Doubling time
XP /cell-line/doubling-time-list/doubling-time
CC For the short tag: I hesitated: if I put CC-DT, it looks like if its concern the "DT" (Date) line. And If put CC-PDT (for population doubling time), a often used abbreviation, it seems weird not to use it inside the Cellosaurus
//
TG from
DE Laboratory, research institute, university having established the cell line.
PR CC From
XP /cell-line/comment-list/comment[@category="From"]
//
TG group
DE Specific group the cell line belongs to (example: fish cell lines, vaccine production cell lines).
PR CC Group
XP /cell-line/comment-list/comment[@category="Group"]
//
TG karyotype
DE Information relevant to the chromosomes of a cell line (often to describe chromosomal abnormalities).
SH kar
PR CC Karyotypic information
XP /cell-line/comment-list/comment[@category="Karyotypic information"]
//
TG knockout
DE Gene(s) knocked-out in the cell line and method to obtain the KO.
SH ko
PR CC Knockout cell
XP /cell-line/knockout-cell-list/knockout-cell
//
TG msi
DE Microsatellite instability degree.
PR CC Microsatellite instability
XP /cell-line/microsatellite-instability-list/microsatellite-instability
//
TG miscellaneous
DE Miscellaneous remarks about the cell line.
SH misc
PR CC Miscellaneous
XP /cell-line/comment-list/comment[@category="Miscellaneous"]
//
TG misspelling
DE Identified misspelling(s) of the cell line name with in some case the specific publication or external resource entry where it appears.
SH miss
PR CC Misspelling
XP /cell-line/misspelling-list/misspelling
//
TG mab-isotype
DE Monoclonal antibody isotype. Examples: IgG2a, kappa; IgM, lambda.
SH mabi
PR CC Monoclonal antibody isotype
XP /cell-line/monoclonal-antibody-isotype-list/monoclonal-antibody-isotype
//
TG mab-target
DE Monoclonal antibody target molecule. Generally a specific protein or chemical compound.
SH mabt
PR CC Monoclonal antibody target
XP /cell-line/monoclonal-antibody-target-list/monoclonal-antibody-target
//
TG omics
DE "Omics" study(ies) carried out on the cell line.
PR CC Omics
XP /cell-line/comment-list/comment[@category="Omics"]
//
TG part-of
DE The cell line belongs to a specific panel or collection of cell lines.
SH part
PR CC Part of
XP /cell-line/comment-list/comment[@category="Part of"]
//
TG population
DE Ethnic group, nationality of the individual from which the cell line was sampled.
SH pop
PR CC Population
XP /cell-line/comment-list/comment[@category="Population"]
//
TG problematic
DE Known problem(s) related to the cell line: contaminated, misidentified, misclassified cell line or appearing in a retracted paper.
SH prob
PR CC Problematic cell line
XP /cell-line/comment-list/comment[@category="Problematic cell line"]
//
TG resistance
DE Selected to be resistant to some chemical compound (like a drug used in chemotherapy) or toxin. with a cross-reference to either ChEBI,
DE DrugBank, NCIt or UniProtKB.
SH res
PR CC Selected for resistance to
XP /cell-line/resistance-list/resistance
//
TG senescence
DE When a finite cell line will senesce.
SH sen
PR CC Senescence
XP /cell-line/comment-list/comment[@category="Senescence"]
//
TG integrated
DE Genetic element(s) integrated in the cell line: gene name and identifier in CGNC, FlyBase, FPbase, HGNC, MGI, RGD, UniProtKB, and VGNC.
SH int
PR CC Genetic integration
XP /cell-line/genetic-integration-list/genetic-integration
//
TG transformant
DE What caused the cell line to be transformed: generally a virus (with a cross-reference to NCBI taxon identifier), a chemical compound
DE (with a cross-reference to ChEBI) or a form of irradiation (with a cross-reference to NCIt).
SH tfor
PR CC Transformant
XP /cell-line/transformant-list/transformant
//
TG virology
DE Susceptibility of the cell line to viral infection, presence of integrated viruses or any other virology-related information.
SH vir
PR CC Virology
XP /cell-line/comment-list/comment[@category="Virology"]
//
TG cc
DE Comment(s). Any content described in fields
DE genome-ancestry, hla, registration, sequence-variation, anecdotal, biotechnology, breed, caution, characteristics,
DE discontinued, donor-info, doubling-time, from, group, karyotype, knockout, miscellaneous,
DE misspelling, mab-isotype, mab-target, msi, omics, population, problematic, resistance, senescence, transfected, transformant, virology.
PR CC
XP /cell-line/comment-list/comment
//
TG str
DE Short tandem repeat profile.
PR ST
XP /cell-line/str-list
CC Pam: for solr index we miss the marker id attribute value
CC For both the API and the full text search: no need to go deep in the structure
CC of the STR profile: CLASTR is used for search in STR profiles and if one really wants
CC to get cell lines with a specific STR value for a marker lets do that in SPARQL
//
TG di
DE Disease(s) suffered by the individual from which the cell line originated with its NCI Thesaurus or ORDO identifier.
PR DI
XP /cell-line/disease-list
//
TG din
DE Disease(s) suffered by the individual from which the cell line originated, restricted to diseases having a NCI Thesaurus identifier.
PR DI NCIt
XP /cell-line/disease-list/xref[@database="NCIt"]
//
TG dio
DE Disease(s) suffered by the individual from which the cell line originated, restricted to diseases having an ORDO identifier.
PR DI ORDO
XP /cell-line/disease-list/xref[@database="ORDO"]
//
TG ox
DE Species of the individual from which the cell line originates with its NCBI taxon identifier.
PR OX
XP /cell-line/species-list
CC Pam: we miss the attribute values in solr indexes
//
TG sx
DE Sex of the individual from which the cell line originates.
PR SX
XP /cell-line/attribute::sex
//
TG ag
DE Age at sampling time of the individual from which the cell line was established.
PR AG
XP /cell-line/attribute::age
CC Will stop being an attribute and become an XML element
CC when links to developmental stage ontology are added
//
TG oi
DE Cell line(s) originating from same individual (sister cell lines).
PR OI
XP /cell-line/same-origin-as
//
TG hi
DE Parent cell line from which the cell line originates.
PR HI
XP /cell-line/derived-from
//
TG ch
DE Cell line(s) originated from the cell line (child cell lines).
PR CH
XP /cell-line/child-list/child
CC Maybe create a new XML element <children-list>
CC The only virtual field
CC Pam: I chose <child-list> to be more consstent with other <...-list> elements
//
TG ca
DE Category to which a cell line belongs, one of 14 defined terms. Example: cancer cell line, hybridoma, transformed cell line.
PR CA
XP /cell-line/attribute::category
//
TG dt
DE Creation date, last modification date and version number of the cell line Cellosaurus entry.
PR DT
XP /cell-line/attribute::created
XP /cell-line/attribute::last-updated
XP /cell-line/attribute::entry-version
//
TG dtc
DE Creation date of the cell line Cellosaurus entry.
PR DT
XP /cell-line/attribute::created
//
TG dtu
DE Last modification date of the cell line Cellosaurus entry.
PR DT
XP /cell-line/attribute::last-updated
//
TG dtv
DE Version number of the cell line Cellosaurus entry.
PR DT
XP /cell-line/attribute::entry-version
//