<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY mdash '—' >
<!ENTITY rfc0854 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.0854.xml'>
<!ENTITY rfc1034 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1034.xml'>
<!ENTITY rfc1321 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1321.xml'>
<!ENTITY rfc2141 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2141.xml'>
<!ENTITY rfc2288 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2288.xml'>
<!ENTITY rfc2611 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2611.xml'>
<!ENTITY rfc2616 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2616.xml'>
<!ENTITY rfc2822 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2822.xml'>
<!ENTITY rfc2915 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2915.xml'>
<!ENTITY rfc3986 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml'>
<!ENTITY rfc5013 PUBLIC '' 'http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5013.xml'>
]>
<!-- Extra statement used by XSLT processors to control the output style. -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<rfc category="info"
docName="draft-kunze-ark-latest"
ipr="trust200902">
<!-- Processing Instructions (PIs). For a complete list and description,
see the file http://xml.resource.org/authoring/README.html and below. -->
<!-- Try to enforce the ID-nits conventions and DTD validity. -->
<?rfc strict="yes"?>
<!-- Items used when reviewing the document -->
<?rfc comments="no"?> <!-- Controls display of <cref> elements. -->
<?rfc inline="no"?> <!-- When no, put comments in end section. -->
<!-- When yes, insert editing marks: editing marks consist of a string
such as <29> printed in the blank line at the beginning of each
paragraph of text. -->
<?rfc editing="no"?>
<!-- Create Table of Contents (ToC) and set some options for it. -->
<?rfc toc="yes"?>
<!-- If "yes" eliminate blank lines before main section entries. -->
<?rfc tocompact="yes"?>
<!-- Set the number of levels of sections/subsections... in ToC. -->
<?rfc tocdepth="3"?>
<!-- Options for the references. -->
<!-- Some like symbolic tags in the references (and citations) and others
prefer numbers. The RFC Editor always uses symbolic tags.
The tags used are the anchor attributes of the references. -->
<?rfc symrefs="yes"?>
<!-- If "yes", causes the references to be sorted in order of tags.
This doesn't have any effect unless symrefs is "yes" also. -->
<?rfc sortrefs="yes"?>
<!-- These two save paper: Just setting compact to "yes" makes savings by
not starting each main section on a new page but does not omit the
blank lines between list items. If subcompact is also "yes" the blank
lines between list items are also omitted. -->
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<front>
<!-- The abbreviated title is used in the page header - it is only necessary
if the full title is longer than 42 characters -->
<title abbrev="ARK">
The ARK Identifier Scheme
</title>
<author initials="J." surname="Kunze"
fullname="John A. Kunze">
<organization>
California Digital Library
</organization>
<address>
<postal>
<street>1111 Franklin Street</street>
<city>Oakland</city> <region>CA</region>
<code>94607</code>
<country>USA</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<author initials="E." surname="Bermès"
fullname="Emmanuelle Bermès">
<organization>
Bibliothèque nationale de France
</organization>
<address>
<postal>
<street>Quai François Mauriac</street>
<city>Paris</city>
<code>75706</code>
<country>France</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<date year="2022" />
<!--
NB: all elements of date are required for IETF submissions
Submission form at https://datatracker.ietf.org/idst/upload.cgi
creates for example https://tools.ietf.org/html/draft-kunze-ark-18
Differencing tool example
https://www.ietf.org/rfcdiff?url1=draft-kunze-ark-33.txt&url2=draft-kunze-ark-34.txt
-->
<abstract>
<t>
The ARK (Archival Resource Key) naming scheme is designed to facilitate
the high-quality and persistent identification of information objects.
The label "ark:" marks the start of a core ARK identifier that can be made
actionable by prepending the beginning of a URL. Meant to be usable after
today's networking technologies become obsolete, that core should be
recognizable in the future as a globally unique ARK independent of the URL
hostname, HTTP, etc. A founding principle of ARKs is that persistence is purely
a matter of service and neither inherent in an object nor conferred on it by
a particular naming syntax. The best any identifier can do is lead users to
services that support robust reference. A fully functioning ARK leads the user
to the identified object and, with the "?info" inflection appended, returns
a metadata record and a commitment statement that is both human- and
machine-readable. Tools exist for minting, binding, and resolving ARKs.
</t>
</abstract>
<!--
The term ARK itself refers both
to the scheme and to any single identifier that conforms to it.
An ARK has five components:
</t>
<t>
[https://NMA/]ark:[/]NAAN/Name[Qualifiers]
</t>
<t>
an optional and mutable Name Mapping Authority (usually a
hostname), the "ark:" label, the Name Assigning Authority Number (NAAN),
the assigned Name, and an optional and possibly mutable Qualifier
supported by the NMA. The NAAN and Name together form the immutable
persistent identifier for the object independent of the URL hostname.
An ARK is a special kind of URL that connects users to three things:
the named object, its metadata, and the provider's promise about its
persistence. When entered into the location field of a Web browser,
the ARK leads the user to the named object. That same ARK, inflected by
appending "?info", returns a metadata record that
is both human- and machine-readable. The returned record contains core
metadata and a commitment statement from the current provider.
Tools exist for minting, binding, and resolving ARKs.
</t>
-->
<note title="Responsibility for this Document">
<t>
The ARK Alliance Technical Working Group <xref target="ARKAtech"/>
is responsible for the content of this Internet Draft. The group homepage lists
monthly meeting notes and agendas starting from March 2019. Revisions of the
spec are maintained on GitHub at <xref target="ARKdrafts"/>.
</t>
</note>
</front>
<middle>
<section title="Introduction">
<t>
This document describes a scheme for the high-quality naming of
information resources. The scheme, called the Archival Resource
Key (ARK), is well suited to long-term access and identification
of any information resources that accommodate reasonably regular
electronic description. This includes digital documents, databases,
software, and websites, as well as physical objects (books, bones,
statues, etc.) and intangible objects (chemicals, diseases,
vocabulary terms, performances). Hereafter the term "object" refers
to an information resource. The term ARK itself refers both to the
scheme and to any single identifier that conforms to it. A reasonably
concise and accessible overview and rationale for the scheme is
available at <xref target="ARK"/>.
</t>
<t>
Schemes for persistent identification of network-accessible objects
are not new. In the early 1990s, the design of the Uniform Resource
Name <xref target="RFC2141"/> responded to the observed failure rate of
URLs by articulating an indirect, non-hostname-based naming scheme and
the need for responsible name management. Meanwhile, promoters of the
Digital Object Identifier <xref target="DOI"/> succeeded in building
a community of providers around a mature software system <xref
target="Handle"/> that supports name management. The Persistent Uniform
Resource Locator <xref target="PURL"/> was another scheme that had the
advantage of working with unmodified web browsers. ARKs represent an
approach that attempts to build on the strengths and to avoid the
weaknesses of these schemes. For example, like URNs, ARKs have an internal
label ("ark:") to help them be recognizable as globally unique identifiers
in a post-HTTP Internet. Unlike DOIs and Handles, ARKs can be created without
centralized fee-based infrastructures. ARK resolvers can take advantage of
advanced resolution features such as content negotiation (like DOIs) and suffix
passthrough <xref target="SPT"/> (similar to PURL partial redirects).
Like PURLs, ARKs openly embrace URLs as the best current choice for
actionability.
</t>
<t>
A founding principle of the ARK is that persistence is purely a matter
of service. Persistence is neither inherent in an object nor conferred
on it by a particular naming syntax. Nor is the technique of name
indirection — upon which URNs, Handles, DOIs, and PURLs are
founded —
of central importance. Name indirection is an ancient and well-understood
practice; new mechanisms for it keep appearing and distracting practitioner
attention, with the Domain Name System (DNS) <xref target="RFC1034"/>
being a particularly dazzling
and elegant example. What is often forgotten is that maintenance of an
indirection table is an unavoidable cost to the
organization providing persistence, and that cost is equivalent across
naming schemes. The fact that indirection has always been a native part of the
web, yet is so lightly used for the persistence of web-based objects,
indicates how unsuited most organizations will probably be to the task of
table maintenance and to the much more fundamental challenge of keeping
the objects themselves viable.
</t>
<t>
Persistence is achieved
through a provider's successful stewardship of objects and their
identifiers. The highest level of persistence will be reinforced by a
provider's robust contingency, redundancy, and succession strategies.
It is further safeguarded to the extent that a provider's mission is
shielded from funding and political instabilities. These are by far
the major challenges confronting persistence providers, and no
identifier scheme has any direct impact on them. In fact, some
schemes may actually be liabilities for persistence because they create
short- and long-term dependencies for every object access on complex,
special-purpose infrastructures, parts of which are
proprietary and all of which increase the carry-forward burden for
the preservation community. It is for this reason that the ARK scheme
relies only on educated name assignment and light use of general-purpose
infrastructures that are maintained mostly by the Internet community at
large (the DNS, web servers, and web browsers).
</t>
<t>
As purely a matter of service, persistence is difficult, not known to
be commercially attractive, and likely to be undertaken by only a small
fraction of content providers that have preservation in their mission.
This vision runs counter to some early predictions that technology-backed
persistent identifiers would somehow become ubiquitous. On the plus side,
persistent identifier solutions should not need to be "internet scale".
</t>
<section title="Reasons to Use ARKs">
<t>
If no persistent identifier scheme contributes directly to persistence,
why not just use URLs? A particular URL may be as durable an identifier
as it is possible to have, but nothing distinguishes it from an ordinary
URL to the recipient who is wondering if it is suitable for long-term
reference. An ARK embedded in a URL provides some of the necessary
conditions for credible persistence, inviting access to not one, but to
three things: to the object, to its metadata, and to a nuanced statement
of commitment from the provider in question (the NMA, described below)
regarding the object. Existence of the extra service can be probed
automatically by appending "?info" to the ARK.
</t>
<t>
The form of the ARK also supports the natural separation of naming
authorities into the original name assigning authority and the diverse
multiple name mapping (or servicing) authorities that in succession and
in parallel will take over custodial responsibilities from the original
assigner (assuming the assigner ever held that responsibility) for the
large majority of a long-term object's archival lifetime. The name
mapping authority, indicated by the hostname part of the URL that
contains the ARK, serves to launch the ARK into cyberspace. Should it
ever fail (and there is no reason why a well-chosen hostname for a
100-year-old cultural memory institution shouldn't last as long as the
DNS), that hostname is considered disposable and replaceable. Again,
the form of the ARK helps because it defines exactly how to recover the
core immutable object identity, and simple algorithms (one based on the
URN model) or even by-hand Internet query can be used for locating
another mapping authority.
</t>
<t>
There are tools to assist in generating ARKs and other identifiers, such
as <xref target="NOID"/> and "uuidgen", both of which rely for uniqueness
on human-maintained registries. This document also contains some
guidelines and considerations for managing namespaces and choosing
hostnames with persistence in mind.
</t>
</section>
<section title="Three Requirements of ARKs">
<t>
The first requirement of an ARK is to give users a link from an object to
a promise of stewardship for it. That promise is a multi-faceted covenant
that binds the word of an identified service provider to a specific set of
responsibilities. It is critical for the promise to come from a current
provider and almost irrelevant, over a long period of time, what the
original assigner's intentions were. No one can tell if successful
stewardship will take place because no one can predict the future.
Reasonable conjecture, however, may be based on past performance. There
must be a way to tie a promise of persistence to a provider's
demonstrated or perceived ability — its reputation — in that
arena. Provider reputations would then rise and fall as promises are
observed variously to be kept and broken. This is perhaps the best way
we have for gauging the strength of any persistence promise.
</t>
<t>
The second requirement of an ARK is to give users a link from an object
to a description of it. The problem with a naked identifier is that
without a description real identification is incomplete. Identifiers
common today are relatively opaque, though some contain ad hoc clues
reflecting assertions that were briefly true, such as where in a
filesystem hierarchy an object lived during a short stay. Possession of
both an identifier and an object is some improvement, but positive
identification may still be uncertain since the object itself might not
include a matching identifier or might not carry evidence obvious enough
to reveal its identity without significant research. In either case,
what is called for is a record bearing witness to the identifier's
association with the object, as supported by a recorded set of object
characteristics. This descriptive record is partly an identification
"receipt" with which users and archivists can verify an object's identity
after brief inspection and a plausible match with recorded
characteristics such as title and size.
</t>
<t>
The final requirement of an ARK is to give users a link to the object
itself (or to a copy) if at all possible. Persistent identification plays
a vital supporting role but, strictly speaking, it can be construed as no more
than a record attesting to the original assignment of a never-reassigned
identifier. Object access may not be feasible for various reasons, such
as a transient service outage, a catastrophic loss, a licensing agreement
that keeps an archive "dark" for a period of years, or when an object's
own lack of tangible existence confuses normal concepts of access (e.g.,
a vocabulary term might be "accessed" through its definition). In such
cases the ARK's identification role assumes a much higher profile. But
attempts to simplify the persistence problem by decoupling access from
identification and concentrating exclusively on the latter are of
questionable utility. A perfect system for assigning forever unique
identifiers might be created, but if it did so without reducing access
failure rates, no one would be interested. The central issue — which
may be crudely summed up as the "HTTP 404 Not Found" problem — would not
have been addressed.
</t>
<t>
The central duty of an ARK is a high-quality experience of access and
identification. This means supporting reliable access during the period
described in its stewardship promise and, failing that, supporting reliable
access to a record describing the thing the ARK is associated with.
</t>
<t>
ARK resolvers must support the "?info" inflection for requesting metadata.
Older versions of this specification distinguished between two minimal
inflections: '?' (brief metadata) and '??' (more metadata).
These older inflections are still reserved, but because they have proven
hard to recognize in some environments, supporting them is optional.
</t>
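<t>
As a minimal, non-normative sketch of the requirement above, the following
Python fragment appends the "?info" inflection to an ARK. The function name
with_info is hypothetical, and rewriting the legacy '?' and '??' inflections
to "?info" is an assumption of this sketch, not a rule of this specification.
</t>
<figure>
<artwork><![CDATA[
```python
def with_info(ark: str) -> str:
    """Return the ARK carrying the "?info" inflection."""
    if ark.endswith("?info"):
        return ark  # already inflected
    # Assumption: a trailing run of '?' is a legacy inflection
    # ('?' or '??') that may be upgraded to "?info".
    return ark.rstrip("?") + "?info"


print(with_info("https://example.org/ark:12345/x6np1wh8k"))
```
]]></artwork>
</figure>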
</section>
<section title="Organizing Support for ARKs: Our Stuff vs. Their Stuff">
<t>
An organization and the user community it serves can often be seen to
struggle with two different areas of persistent identification: the Our
Stuff problem and the Their Stuff problem. In the Our Stuff problem,
we in the organization want our own objects to acquire persistent names.
Since we possess or control these objects, our organization tackles the
Our Stuff problem directly. Whether or not the objects are named by ARKs,
our organization is the responsible party, so it can plan for, maintain,
and make commitments about the objects.
</t>
<t>
In the Their Stuff problem, we in the organization want others' objects
to acquire persistent names. These are objects that we do not own or
control, but some of which are critically important to us. But because
they are beyond our influence as far as support is concerned, creating
and maintaining persistent identifiers for Their Stuff is not especially
purposeful or feasible for us to engage in. There is little that we can
do about someone else's stuff except encourage their uptake or adoption
of persistence services.
</t>
<t>
Co-location of persistent access and identification services is natural.
Any organization that undertakes ongoing support of true persistent
identification (which includes description) is well-served if it controls,
owns, or otherwise has clear internal access to the identified objects,
and this gives it an advantage if it wishes also to support persistent
access to outsiders. Conversely, persistent access to outsiders requires
orderly internal collection management procedures that include monitoring,
acquisition, verification, and change control over objects, which in turn
requires object identifiers persistent enough to support auditable
record keeping practices.
</t>
<t>
Although organizing ARK support under one roof thus tends to make sense,
object hosting can successfully be separated from name mapping. An
example is when a name mapping authority centrally provides uniform
resolution services via a protocol gateway on behalf of organizations that
host objects behind a variety of access protocols. It is also reasonable
to build value-added description services that rely on the underlying
services of a set of mapping authorities.
</t>
<t>
Supporting ARKs is not for every organization. By requiring specific,
revealed commitments to preservation, to object access, and to description,
the bar for providing ARK services is higher than for some other identifier
schemes. On the other hand, it would be hard to grant credence to a
persistence promise from an organization that could not muster the minimum
ARK services. This does not rule out
a business model for an ARK-like, description-only service built on top
of another organization's full complement of ARK services. For example,
there might be competition at the description level for abstracting and
indexing a body of scientific literature archived in a combination of
open and fee-based repositories. The description-only service would
have no direct commitment to the objects, but would act as an intermediary,
forwarding commitment statements from object hosting services to requestors.
</t>
</section>
<section title="Definition of Identifier">
<t>
An identifier is not a string of character data — an identifier is an
association between a string of data and an object. This abstraction
is necessary because without it a string is just data. It's
nonsense to talk about a string's breaking, or about its being strong,
maintained, and authentic. But as a representative of an association,
a string can do, metaphorically, the things that we expect of it.
</t>
<t>
Without regard to whether an object is physical, digital, or conceptual,
to identify it is to claim an association between it and a representative
string, such as "Jane" or "ISBN 0596000278". What gives a claim
credibility is a set of verifiable assertions, or metadata, about the
object, such as age, height, title, or number of pages. In other words,
the association is made manifest by a record (e.g., a cataloging or other
metadata record) that vouches for it.
</t>
<t>
In the complete absence of any testimony (metadata) regarding an
association, a would-be identifier string is a meaningless sequence of
characters. To keep an externally visible but otherwise internal string
from being perceived as an identifier by outsiders, for example, it suffices
for an organization not to disclose the nature of its association. For
our immediate purpose, actual existence of an association record is more
important than its authenticity or verifiability, which are outside the
scope of this specification.
</t>
<t>
It is a gift to the identification process if an object carries its own
name as an inseparable part of itself, such as an identifier imprinted
on the first page of a document or embedded in a data structure element
of a digital document header. In cases where the object is large, unwieldy,
or unavailable (such as when licensing restrictions are in effect), a
metadata record that includes the identifier string will usually suffice.
That record becomes a conveniently manipulable object surrogate, acting
as both an association "receipt" and "declaration".
</t>
<t>
Note that our definition of identifier extends the one in use for Uniform
Resource Identifiers <xref target="RFC3986"/>. The present document still sometimes
(ab)uses the terms "ARK" and "identifier" as shorthand for the string
part of an identifier, but the context should make the meaning clear.
</t>
</section>
</section>
<section title="ARK Anatomy">
<t>
An ARK is represented by a sequence of characters (a string) that
contains the Label, "ark:", optionally preceded by the beginning
part of a URL. Here is a diagrammed example.
</t>
<figure>
<artwork>
                     ANATOMY OVERVIEW
                     ================
     Resolver Service            Compact ARK
    __________________  ______________________________
   /                  \/                              \
   https://example.org/ark:12345/x6np1wh8k/c3/s5.v7.xsl
   \___________________________/\________/\___________/
             Prefixes           Base Name   Suffixes
   \__________________________________________________/
                       Mapping ARK </artwork>
</figure>
<t>
When embedded in a URL, an ARK consists of a Compact ARK preceded by a Resolver
Service. The larger URL-based ARK is known as a Mapping ARK because it is ready
to be mapped (resolved) to an information response (e.g., a PDF or metadata).
A Mapping ARK is also known as a "fully qualified ARK".
The Resolver Service, which need not be limited to URLs in the future, maps the
URL according to rules and abilities of an NMA (Name Mapping Authority).
The same URL string minus the Resolver Service component is known as a
Compact ARK. The Compact ARK is globally unique and may be resolvable
via different Resolver Services over time (e.g., when one archive succeeds
another) or at the same time (e.g., when one archive backs up another).
</t>
<t>
At a high level, after the Label comes the NAAN (Name Assigning Authority
Number) followed by the Name that it assigns to the identified thing.
The Base Name has Prefixes (NAAN, Label, possibly a Resolver Service) and
optional Suffixes to identify Parts and Variant forms. During resolution,
a Resolver Service such as n2t.net may be able to deal with inflections, query
strings, and content negotiation.
</t>
<figure>
<artwork>
                     ANATOMY DETAILS
                     ===============
                        Base Compact Name  Qualifiers
                        _________________  ___________
                       /                 \/           \
   https://example.org/ark:12345/x6np1wh8k/c3/s5.v7.xsl
           \_________/ \__/\___/\_/\_____/\____/\_____/
               NMA    Label NAAN |  Blade Parts Variants
                             Shoulder
                           \_____________/
                             Check Zone </artwork>
</figure>
<t>
In a closer view, the Compact ARK consists of a Base Compact Name followed
potentially by Qualifiers. The Base Name often, but not necessarily,
consists of a Shoulder (for subdividing a NAAN namespace) followed by a
Blade. If a check character is present in an ARK, by convention it is the
right-most character of the Base Name, and will have been computed over
the string of characters preceding it back to the beginning of the NAAN.
This string, including the check character itself, is the Check Zone.
</t>
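<t>
The Check Zone can be located mechanically. The Python sketch below is
illustrative only: the function name check_zone is hypothetical, and it
assumes that the Base Name ends at the first '/', '.', or '?' that follows
it, which matches the diagram above but is not stated as a rule here. It
neither computes nor verifies a check character, and it cannot tell from the
string alone whether one is present.
</t>
<figure>
<artwork><![CDATA[
```python
import re

# Assumption: the Base Name runs up to the first '/', '.', or '?'.
_COMPACT = re.compile(
    r"(?:^|/)ark:/?(?P<naan>[^/]+)/(?P<base>[^/.?]+)"
)


def check_zone(ark: str) -> str:
    """Return the text from the start of the NAAN through the
    right-most character of the Base Name."""
    m = _COMPACT.search(ark)
    if m is None:
        raise ValueError("no ARK found in %r" % ark)
    return m.group("naan") + "/" + m.group("base")


zone = check_zone(
    "https://example.org/ark:12345/x6np1wh8k/c3/s5.v7.xsl"
)
print(zone)      # the Check Zone
print(zone[-1])  # the check character, if the ARK has one
```
]]></artwork>
</figure>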
<texttable title="Example base, compact, and fully qualified form components."
align="center">
<preamble>
Like the ARK itself, the NAAN "12345" and Shoulder "x6" have compact
and fully qualified forms.
</preamble>
<ttcol align="center">Component</ttcol>
<ttcol align="center">Base</ttcol>
<ttcol align="center">Compact Form</ttcol>
<ttcol align="center">Fully Qualified Form</ttcol>
<c>NAAN</c>
<c>12345</c>
<c>ark:12345</c>
<c>https://example.org/ark:12345</c>
<c>Shoulder</c>
<c>x6</c>
<c>ark:12345/x6</c>
<c>https://example.org/ark:12345/x6</c>
</texttable>
<t>
The ARK syntax can be summarized as follows:
</t>
<figure>
<artwork>
[https://NMA/]ark:[/]NAAN/Name[Qualifiers] </artwork>
</figure>
<t>
where the NMA, '/', and Qualifier parts are in brackets to indicate that
they are optional.
The Base Compact Name is the substring comprising the "ark:" label,
the NAAN and the assigned Name. The Resolver Service is replaceable and
makes the ARK actionable for a period of time. Without the Resolver Service
part, what remains is the Core Immutable Identity (the "persistible")
part of the ARK.
</t>
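<t>
The syntax summary above can be turned into a small parser. The following
Python sketch is not normative: the names Ark and parse_ark are hypothetical,
and the regular expression assumes that the Name ends at the first '/' or '.'
and that anything from '?' onward is an inflection or query string rather
than part of the ARK.
</t>
<figure>
<artwork><![CDATA[
```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    resolver: Optional[str]  # "https://NMA/" part, or None
    naan: str
    name: str
    qualifiers: str  # empty string when absent

    @property
    def compact(self) -> str:
        """The ARK without its Resolver Service."""
        return f"ark:{self.naan}/{self.name}{self.qualifiers}"


# Assumptions: the Name ends at the first '/' or '.', and text
# from '?' onward is an inflection or query string.
_ARK = re.compile(
    r"^(?P<resolver>[A-Za-z][A-Za-z0-9+.-]*://.*?/)?"
    r"ark:/?"
    r"(?P<naan>[^/?]+)/"
    r"(?P<name>[^/.?]+)"
    r"(?P<qualifiers>[^?]*)"
    r"(?:\?.*)?$"
)


def parse_ark(text: str) -> Ark:
    m = _ARK.match(text)
    if m is None:
        raise ValueError("not an ARK: %r" % text)
    return Ark(m["resolver"], m["naan"], m["name"],
               m["qualifiers"])


ark = parse_ark(
    "https://example.org/ark:12345/x6np1wh8k/c3/s5.v7.xsl"
)
print(ark.resolver, ark.compact)
```
]]></artwork>
</figure>
<t>
Because the optional prefix is matched lazily, the Resolver Service ends at
the first occurrence of "/ark:", so a path such as "rslvr/" stays with the
resolver and not with the ARK.
</t>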
<section title="The Name Mapping Authority (NMA)">
<t>
Before the "ark:" label may appear an optional Name Mapping Authority (NMA)
that is a temporary address where ARK service requests may be sent.
Preceded by a URI-type protocol designation such as "https://",
it specifies a Resolver Service. The NMA itself is an Internet hostname
or host/port combination, optionally followed by URI-type path components,
all ending in a '/'. The hostname has the same format and semantics as the
host/port part of a URL. In any optional path that follows it, the path is
considered to end with the '/' in the first occurrence of "/ark:".
<!--
this permits NMAs that cannot easily control the top-level of their
URL-space (eg, hard to make some software support recognize Host + "/ark:"
but they can recognize Host + path + "/ark:"
-->
</t>
<t>
The most important thing about the NMA is that it is "identity inert"
from the point of view of object identification. In other words, ARKs
that differ only in the optional NMA part identify the same object.
Thus, for example, the following three ARKs are synonyms for just
one information object:
</t>
<figure>
<artwork>
http://example.org/rslvr/ark:12345/x6np1wh8k
https://example.com/ark:12345/x6np1wh8k
ark:12345/x6np1wh8k </artwork>
</figure>
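<t>
The "identity inert" property can be illustrated with a short, non-normative
Python sketch that reduces each ARK to its compact form before comparing.
The function names compact_ark and same_object are hypothetical, and treating
the "ark:/" and "ark:" label forms as equivalent is an assumption drawn from
the optional '/' in the syntax summary.
</t>
<figure>
<artwork><![CDATA[
```python
def compact_ark(ark: str) -> str:
    """Strip the optional NMA part, keeping "ark:" onward."""
    if ark.startswith("ark:"):
        compact = ark
    else:
        cut = ark.find("/ark:")  # first occurrence ends the NMA
        if cut < 0:
            raise ValueError("no ARK label in %r" % ark)
        compact = ark[cut + 1:]
    # Assumption: "ark:/" and "ark:" label forms are equivalent.
    if compact.startswith("ark:/"):
        compact = "ark:" + compact[len("ark:/"):]
    return compact


def same_object(a: str, b: str) -> bool:
    """True when two ARKs differ only in their NMA part."""
    return compact_ark(a) == compact_ark(b)


synonyms = [
    "http://example.org/rslvr/ark:12345/x6np1wh8k",
    "https://example.com/ark:12345/x6np1wh8k",
    "ark:12345/x6np1wh8k",
]
print(all(same_object(synonyms[0], s) for s in synonyms))
```
]]></artwork>
</figure>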
<t>
Strictly speaking, in the realm of digital objects, these ARKs may lead
over time to somewhat different or diverging instances of the originally
named object. It can be argued that divergence of persistent objects is
not desirable, but it is widely believed that digital preservation efforts
will inevitably lead to alterations in some original objects (e.g., a format
migration in order to preserve the ability to display a document). If any
of those objects are held redundantly in more than one organization
(a common preservation strategy), chances are small that all holding
organizations will perform the same precise transformations and all
maintain the same object metadata. More significant divergence would
be expected when the holding organizations serve different audiences or
compete with each other.
</t>
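<t>
The identity-inert property of the NMA can be illustrated with a short,
non-normative Python sketch. The function name strip_nma is hypothetical
and not defined by this specification; the sketch only assumes that the
NMA and any path after it end at the '/' in the first occurrence of "/ark:".
</t>
<figure>
<artwork><![CDATA[
```python
def strip_nma(ark: str) -> str:
    """Return the ARK without any leading resolver (NMA) part.

    Hypothetical helper for illustration only.
    """
    # A compact ARK already starts with the label; nothing to remove.
    if ark.startswith("ark:"):
        return ark
    # The NMA (and optional path) ends at the '/' of the first "/ark:".
    idx = ark.find("/ark:")
    if idx < 0:
        raise ValueError(f"not an ARK: {ark!r}")
    return ark[idx + 1:]


synonyms = [
    "http://example.org/rslvr/ark:12345/x6np1wh8k",
    "https://example.com/ark:12345/x6np1wh8k",
    "ark:12345/x6np1wh8k",
]

# All three strings reduce to one core identity.
cores = {strip_nma(a) for a in synonyms}
```
]]></artwork>
</figure>
<t>
In this sketch the three example ARKs collapse to a single string, which
is what makes it possible to compare ARKs received from different resolvers.
</t>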
<t>
The NMA part makes an ARK into an actionable URL. As with many Internet
parameters, it is helpful to approach the NMA by being liberal in what you
accept and conservative in what you propose. From the recipient's point
of view, the NMA part should be treated as temporary, disposable, and
replaceable. From the NMA's point of view, it should be chosen with the
greatest concern for longevity. A carefully chosen NMA should be at
least as permanent as the providing organization's own hostname.
In the case of a national or university library, for example, there is
no reason why the NMA could not be considerably more permanent than
soft-funded proxy hostnames such as hdl.handle.net, dx.doi.org, and
purl.org. In general and over time, however, it is not unexpected for
an NMA eventually to stop working and require replacement with the NMA
of a currently active service provider.
</t>
<t>
This replacement relies on a mapping authority "resolver" discovery
process, of which two alternate methods are outlined in a later section.
The ARK, URN, Handle, and DOI schemes all use a resolver discovery model
that sooner or later requires matching the original assigning authority
with a current provider servicing that authority's named objects; once
found, the resolver at that provider performs what amounts to a redirect
to a place where the object is currently held. All the schemes rely on
the ongoing functionality of currently mainstream technologies such as the
Domain Name System <xref target="RFC1034"/> and web browsers.
The Handle and DOI schemes in addition require that the Handle protocol
layer and global server grid be available at all times.
</t>
<t>
The practice of prepending "https://" and an NMA to an ARK is a way of
creating an actionable identifier by a method that is itself temporary.
Assuming that infrastructure supporting <xref target="RFC2616"/> information retrieval will
no longer be available one day, ARKs will then have to be converted into
new kinds of actionable identifiers. By that time, if ARKs see widespread
use, web browsers would presumably evolve to perform this (currently
simple) transformation automatically.
</t>
</section>
<section title="The ARK Label Part (ark:)">
<t>
The label part distinguishes an ARK from an ordinary identifier.
There is a new form of the label, "ark:", and an old form, "ark:/",
both of which must be recognized in perpetuity. Implementations should
generate new ARKs in the new form (without the "/") and resolvers must
always treat received ARKs as equivalent if they differ only in regard
to new form versus old form labels. Thus these two ARKs are equivalent:
</t>
<figure>
<artwork>
ark:/12345/x6np1wh8k
ark:12345/x6np1wh8k </artwork>
</figure>
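<t>
One possible way to implement this equivalence is to rewrite the old label
to the new one before comparing. The following Python sketch is
non-normative; the names normalize_label and same_ark are hypothetical, and
the sketch assumes compact ARKs with no NMA part in front of the label.
</t>
<figure>
<artwork><![CDATA[
```python
OLD_LABEL = "ark:/"
NEW_LABEL = "ark:"


def normalize_label(ark: str) -> str:
    """Rewrite the old "ark:/" label to the new "ark:" form."""
    if ark.startswith(OLD_LABEL):
        return NEW_LABEL + ark[len(OLD_LABEL):]
    return ark


def same_ark(a: str, b: str) -> bool:
    """Compare two compact ARKs, ignoring old versus new label form."""
    return normalize_label(a) == normalize_label(b)
```
]]></artwork>
</figure>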
<t>
In a URL found in the wild, the label indicates that the URL stands a
reasonable chance of being an ARK.
If the context warrants, verification that it actually is an ARK
can be done by testing it for existence of the three ARK services.
</t>
<t>
Since nothing about an identifier syntax directly affects persistence,
the "ark:" label (like "urn:", "doi:", and "hdl:") cannot tell you
whether the identifier is persistent or whether the object is available.
It does tell you that the original Name Assigning Authority (NAA) had
some sort of hopes for it, but it doesn't tell you whether that NAA is
still in existence, or whether a decade ago it ceased to have any
responsibility for providing persistence, or whether it ever had any
responsibility beyond naming.
</t>
<t>
Only a current provider can say for certain what sort of commitment it
intends, and the ARK label suggests that you can query the NMA directly
to find out exactly what kind of persistence is promised. Even if what
is promised is impersistence (i.e., a short-term identifier), saying so
is valuable information to the recipient. Thus an ARK is a high-functioning
identifier in the sense that it provides access to the object, the
metadata, and a commitment statement, even if the commitment is
explicitly very weak.
</t>
</section>
<section title="The Name Assigning Authority Number (NAAN)">
<t>
Recalling that the general form of the ARK is,
</t>
<figure>
<artwork>
[https://NMA/]ark:[/]NAAN/Name[Qualifiers] </artwork>
</figure>
<t>
the part of the ARK directly following the "ark:" (or older "ark:/") label
is the Name Assigning Authority Number (NAAN), up to but not including
the next '/' (slash) character. This part is always required, as it
identifies the organization that originally assigned the Name of the object.
Typically the organization is an institution, a department, a laboratory, or
any group that conducts a stable, policy-driven name assigning effort.
An organization may request a NAAN from the ARK Maintenance Agency <xref
target="ARKagency"/> (described in <xref target="agency"/>) by filling out
the form <xref target="NAANrequest"/>.
</t>
<t>
For received ARKs, implementations must support a minimum NAAN length of
16 octets.
NAANs are opaque strings of one or more "betanumeric" characters, specifically,
</t>
<figure>
<artwork>
0123456789bcdfghjkmnpqrstvwxz </artwork>
</figure>
<t>
which consists of digits and consonants, minus the letter 'l'. Restricting
NAANs to betanumerics (alphanumerics without vowels or 'l') serves two goals.
It reduces the chances that words -- past, present, and future -- will appear
in NAANs and carry unintended semantics. It also helps usability by not mixing
commonly confused characters ('0' and 'O', '1' and 'l') and by being compatible
with strong transcription error detection (e.g., the <xref target="NOID"/> check
digit algorithm). Since 2001, every assigned NAAN has consisted of
exactly five digits.
</t>
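<t>
A check that a received NAAN uses only betanumeric characters might look
like the following non-normative Python sketch. The function name
is_valid_naan is hypothetical. The sketch assumes a case-sensitive
comparison against the lowercase repertoire shown above and enforces no
upper length limit.
</t>
<figure>
<artwork><![CDATA[
```python
# The betanumeric repertoire: digits plus consonants, without 'l'.
BETANUMERIC = frozenset("0123456789bcdfghjkmnpqrstvwxz")


def is_valid_naan(naan: str) -> bool:
    """Return True if naan is one or more betanumeric characters."""
    return len(naan) > 0 and all(ch in BETANUMERIC for ch in naan)
```
]]></artwork>
</figure>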
<t>
The NAAN designates a top-level ARK namespace. Once registered for a
namespace, a NAAN is never re-registered. It is possible, however,
for there to be a succession of organizations that manage an ARK
namespace.
</t>
<texttable title="Four NAANs shared across all ARK-assigning organizations."
align="center" anchor="sharedNAANs">
<preamble>
There are currently four NAANs available to all organizations.
An ARK bearing one of these NAANs carries a specific, immutable meaning
that recipients can rely on for long term pragmatic benefit as described
below.
</preamble>
<ttcol align="center">Shared NAAN meaning</ttcol>
<ttcol align="center">
The immutable purpose, meaning, or connotation of ARKs bearing this NAAN.
</ttcol>
<ttcol align="center">Expect to resolve?</ttcol>
<ttcol align="center">OK for long term reference?</ttcol>
<c>12345 examples</c>
<c>Example ARKs appearing in documentation. They might resolve, but link checkers usually need not be concerned if they don’t. They should not be considered viable for long term reference.</c>
<c>maybe</c>
<c>no</c>
<c>99152 terms</c>
<c>ARKs for controlled vocabulary and ontology terms, such as metadata element names and pick-list values. They should resolve to term definitions and are suitable for long term reference.</c>
<c>yes</c>
<c>yes</c>
<c>99166 agents</c>
<c>ARKs for people, groups, and institutions as “agents” (actors, such as creators, contributors, publishers, performers, etc). They should resolve to agent definitions and are suitable for long term reference.</c>
<c>yes</c>
<c>yes</c>
<c>99999 test ids</c>
<c>ARKs for test, development, or experimental purposes, often at scale. They might resolve, but link checkers usually need not be concerned if they don’t. They should not be considered viable for long term reference.</c>
<c>maybe</c>
<c>no</c>
<postamble>
To make use of a shared NAAN, an organization has several options
described in <xref target="shoulders" />.
</postamble>
</texttable>
</section>
<section title="The Name Part">
<t>
The part of the ARK just after the NAAN is the Name assigned by the NAA,
and it is also required. Semantic opaqueness in the Name part is
strongly encouraged in order to reduce an ARK's vulnerability to era- and
language-specific change. Identifier strings containing linguistic
fragments can create support difficulties down the road. No matter how
appropriate or even meaningless they are today, such fragments may one day
create confusion, give offense, or infringe on a trademark as the semantic
environment around us and our communities evolves.
</t>
<t>
Names that look more or less like numbers avoid common problems that
defeat persistence and international acceptance. The use of digits is
highly recommended. Mixing in non-vowel alphabetic characters (e.g.,
betanumerics) a couple at a time is a relatively safe and easy way to
achieve a denser namespace (more possible names for a given length of
the name string). Such names have a chance of aging and traveling well.
The absence of recognizable words makes typos harder to detect in opaque
strings, so a common mitigation is to add a check character. Tools exist that
mint, bind, and resolve opaque identifiers, with or without check characters
<xref target="NOID"/>. More on naming considerations is given in a subsequent
section.
</t>
<section title="Optional: Shoulders" anchor="shoulders">
<t>
Just as an ARK namespace is subdivided by NAANs reserved for NAAs, it is
generally advantageous for an NAA to subdivide its own NAAN namespace into
"shoulders", where each shoulder is reserved for an internal department or
unit.
Like the NAAN, which is a string of characters that follows the "ark:"
label, a shoulder is a string of characters (starting with a "/")
that extends the NAAN. The base compact name assigned by the NAA consists
of the NAAN, the shoulder, and a final string known as the "blade". (The shoulder
plus blade terminology mirrors locksmith jargon describing the
information-bearing parts of a key.)
</t>
<t>
The blade string is chosen by the NAA such that the string created by
concatenating the NAAN plus shoulder plus blade becomes the unique base object
name. Otherwise the blade may come from any source; for example, it might
come from a counter, a timestamp, a <xref target="NOID"/> minter,
a legacy 100-year-old accession number, etc. If there is a check digit,
it is expected to appear at the end of the blade and to be computed over
the base compact name minus the label part (see Check Zone),
which is generally the most important part of an
ARK to make opaque. In particular, check digits are not expected to
cover qualifiers, which often name subobjects of a persistent object
that are less stable and less opaquely named than the parent object
(for example, ten years hence, the object's thumbnail image will be of
a higher resolution and the OCR text file will be re-derived with improved
algorithms).
</t>
<t>
It is important not to use any delimiter between the shoulder string
and blade string, especially not a "/" since it declares an object
boundary (see the section on ARKs that reveal object hierarchy).
</t>
<figure>
<artwork>
ark:12345/x6np1wh8k/c2/s4.pdf    # correct primordinal shoulder
ark:12345/x6/np1wh8k/c2/s4.pdf   # INCORRECT
            ^ WRONG </artwork>
</figure>
<t>
This little bit of discretion shields organizations from end users making
inferences about expected levels of support based on recognizable
shoulders. To help in-house ARK administrators reliably know where the
shoulder ends, it is recommended to use the "first-digit convention"
so that shoulders are "primordinal". A primordinal shoulder is a
sequence of one or more betanumeric characters ending in a digit,
as shown above. This means
that the shoulder is all consonant letters (often just one) after the NAAN
and "/" up to and including the first digit encountered after the NAAN.
One property of primordinal shoulders is that there is an infinite number
of them possible under any NAAN.
</t>
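<t>
Under the first-digit convention, an in-house tool could locate the end of
a primordinal shoulder with a simple pattern match. The Python sketch below
is non-normative, and split_shoulder is a hypothetical name. It assumes the
input is the Name part only, that is, the string after the NAAN and "/" and
before any Qualifier.
</t>
<figure>
<artwork><![CDATA[
```python
import re

# Zero or more betanumeric consonants, up to and including the
# first digit; everything after that digit is the blade.
_PRIMORDINAL = re.compile(r"([bcdfghjkmnpqrstvwxz]*[0-9])(.*)")


def split_shoulder(name: str):
    """Split a Name into (shoulder, blade), or return None.

    None means the Name does not begin with a primordinal shoulder.
    """
    m = _PRIMORDINAL.fullmatch(name)
    if m is None:
        return None
    return m.group(1), m.group(2)
```
]]></artwork>
</figure>
<t>
Applied to the Name "x6np1wh8k" from the example above, this sketch yields
the shoulder "x6" and the blade "np1wh8k".
</t>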
<t>
To help manage each namespace into the future, NAAs are encouraged to
create shoulders, even if there is only one to start with.
If an organization wishes to create a shoulder under one of the shared NAANs
(99999, 12345, 99152, or 99166, described in <xref target="sharedNAANs"/>),
it should fill out the Shoulder Request Form <xref target="shoulderrequest"/>.
</t>
</section>
</section>
<section title="The Qualifier Part">
<t>
The part of the ARK following the NAA-assigned Name is an optional
Qualifier. It is a string that extends the Base Name in order to create
a kind of service entry point into the object named by the NAA. At the
discretion of the providing NMA, such a service entry point permits an
ARK to support access to individual hierarchical components and
subcomponents of an object, and to variants (versions, languages, formats)
of components. A Qualifier may be invented by the NAA or by any NMA
servicing the object.
</t>
<t>
In form, the Qualifier is a ComponentPath, or a VariantPath, or a
ComponentPath followed by a VariantPath. A VariantPath is introduced
and subdivided by the reserved character '.', and a ComponentPath is
introduced and subdivided by the reserved character '/'. In this
example,
</t>
<figure>
<artwork>
https://example.org/ark:12345/x6np1wh8k/c3/s5.v7.xsl </artwork>
</figure>
<t>
the string "/c3/s5" is a ComponentPath and the string ".v7.xsl" is a
VariantPath. The ARK Qualifier is a
formalization of some currently mainstream URL syntax conventions.
This formalization specifically reserves meanings that permit
recipients to make strong inferences about logical sub-object containment
and equivalence based only on the form of the received identifiers;
there is great efficiency in not having to inspect metadata records
to discover such relationships. NMAs are free not to disclose any of
these relationships merely by avoiding the reserved characters above.
Hierarchical components and variants are discussed further in the
next two sections.
</t>
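<t>
The reserved characters make it possible to split a Qualifier mechanically.
The following Python sketch is non-normative, and split_qualifier is a
hypothetical name. It assumes that the Name itself contains neither '/' nor
'.', so that the first of those characters after the Name starts the
Qualifier, and that the first '.' in the Qualifier starts the VariantPath.
</t>
<figure>
<artwork><![CDATA[
```python
def split_qualifier(ark: str):
    """Return (base_name, component_path, variant_path)."""
    # Drop any NMA part, then the "ark:" (or old "ark:/") label.
    if ark.startswith("ark:"):
        rest = ark[len("ark:"):]
    else:
        idx = ark.find("/ark:")
        if idx < 0:
            raise ValueError(f"not an ARK: {ark!r}")
        rest = ark[idx + len("/ark:"):]
    if rest.startswith("/"):
        rest = rest[1:]

    # The NAAN runs up to, but not including, the next '/'.
    naan, sep, tail = rest.partition("/")
    if not naan or not sep or not tail:
        raise ValueError(f"missing NAAN or Name: {ark!r}")

    # Assumption: the Name ends at the first reserved '/' or '.'.
    hits = [i for i in (tail.find("/"), tail.find(".")) if i >= 0]
    cut = min(hits) if hits else len(tail)
    name, qualifier = tail[:cut], tail[cut:]

    # The first '.' in the Qualifier introduces the VariantPath.
    dot = qualifier.find(".")
    if dot < 0:
        component_path, variant_path = qualifier, ""
    else:
        component_path = qualifier[:dot]
        variant_path = qualifier[dot:]
    return f"ark:{naan}/{name}", component_path, variant_path


parts = split_qualifier(
    "https://example.org/ark:12345/x6np1wh8k/c3/s5.v7.xsl"
)
```
]]></artwork>
</figure>
<t>
For the example ARK shown in the figure above, the sketch separates the base
name "ark:12345/x6np1wh8k" from the component and variant parts of the
Qualifier.
</t>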
<t>
The Qualifier, if present, differs from the Name in several
important respects. First, a Qualifier may have been assigned either by
the NAA or later by the NMA. The assignment of a Qualifier by an NMA
effectively amounts to an act of publishing a service entry point within
the conceptual object originally named by the NAA. For our purposes,
an ARK extended with a Qualifier assigned by an NMA will be called an
NMA-qualified ARK.
</t>
<t>
Second, a Qualifier assignment on the part of an NMA is made in fulfillment
of its service obligations and may reflect changing service expectations
and technology requirements. NMA-qualified ARKs could therefore be
transient, even if the base, unqualified ARK is persistent. For example,
it would be reasonable for an NMA to support access to an image object
through an actionable ARK that is considered persistent even if the
experience of that access changes as linking, labeling, and presentation
conventions evolve and as format and security standards are updated.
For an image "thumbnail", that NMA could also support an NMA-qualified
ARK that is considered impersistent because the thumbnail will be
replaced with higher resolution images as network bandwidth and CPU
speeds increase. At the same time, for an originally scanned,
high-resolution master, the NMA could publish an
NMA-qualified ARK that is itself considered persistent. Of course, the
NMA must be able to return its separate commitments to unqualified,
NAA-assigned ARKs, to NMA-qualified ARKs, and to any NAA-qualified ARKs
that it supports.
</t>
<t>
A third difference between a Qualifier and a Name concerns the semantic
opaqueness constraint. When an NMA-qualified ARK is to be used as a
transient service entry point into a persistent object, the priority
given to semantic opaqueness observed by the NAA in the Name part may be
relaxed by the NMA in the Qualifier part. If service priorities in the