Table of Contents
- Authentication
- File operations
- Directory operations
- Metalink
- Properties
- Extended Attributes
- Requesting macaroons
- Third-party transfers
From the corresponding English language Wikipedia entry,
Web Distributed Authoring and Versioning (WebDAV) is an extension of the Hypertext Transfer Protocol (HTTP) that allows clients to perform remote Web content authoring operations.
In simple terms, HTTP allows a client to upload, download and delete files, while WebDAV allows filesystem-like operations, such as to rename files and list directory contents.
Due to its overwhelming popularity, there are many HTTP clients.
Although WebDAV is less popular, there are still many clients from which you can choose. In this chapter, we will use curl to illustrate most HTTP operations, and rclone as a specific WebDAV client. Other clients should also work and you should not read these choices as an endorsement of those clients over others.
Authentication is the process where the client proves the identity of the user. Perhaps the most common is password-based authentication, where the client proves to the server that it knows some secret code supplied by the user.
HTTP is very flexible in how it handles authentication, with many different ways a client can prove its identity. Several of these options are available to dCache clients.
Broadly speaking there are two ways of authenticating: the Authorization HTTP request header (which often uses some bearer token) and through SSL/TLS.
A bearer token is a token that requires no interaction to authenticate: supplying the token as part of the request is sufficient. This is simpler than the alternatives, but comes at a cost: any agent able to observe the HTTP request has the token and can subsequently impersonate the valid client. Encryption is mandatory when using bearer tokens; however, even with transport encryption (such as SSL/TLS), bearer tokens are inherently risky, and often use restrictions to reduce the impact should they be stolen.
SSL/TLS authentication, in contrast to Authorization header authentication, happens before the HTTP requests. After establishing the TCP connection, a TLS handshake takes place to ensure the connection is encrypted. During this TLS handshake, the client can authenticate. Unlike bearer tokens, this process is iterative. This allows the client to authenticate without revealing all information, allowing the authentication to take place before the encrypted connection is established. When using TLS-based authentication, the client makes requests without any Authorization HTTP request headers.
This section describes the different authentication options that dCache supports.
Please note that the actual authentication supported by any specific dCache instance is controlled by the server's configuration, so you may not have access to all these authentication options.
Basic authentication is the simplest scheme. It involves the client sending the username and complete password to dCache.
The following example shows curl using Basic authentication, prompting the user to enter their password.
curl -u paul https://dcache.example.org/users/paul/private-file
|Enter host password for user 'paul':
Although this approach is very simple and widely supported by clients, it relies on the network connection to encrypt the content. If basic authentication is used with an unencrypted request (a URL starting with http://) then the password will be sent unencrypted over the network. Anyone who is able to capture the network traffic will learn the username and password, and the user's account is compromised.
To counter this problem, by default dCache will reject all basic authentication if the connection is unencrypted, with only URLs starting https:// being accepted.
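To make the risk concrete, the following Python sketch shows how a Basic Authorization header value is formed and how easily it is reversed. It is an illustration only: the password `secret` is a made-up value and the helper names are not part of dCache or curl.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth_header(value):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, encoded = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# 'secret' is a hypothetical password used only for this illustration.
header = basic_auth_header("paul", "secret")
print("Authorization:", header)
print(decode_basic_auth_header(header))
```

The header is only BASE64 encoded, not encrypted, so anyone who observes it can recover the password. This is why the encrypted (https://) connection matters.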
X.509 is a technology that uses asymmetric encryption for authentication. Asymmetric encryption means each identity has two keys: the public key and the private key. After some identity vetting process, an organisation (known as the certification authority) will issue a certificate, which contains the public key and some identity information. This certificate (along with the private key) is then used to prove the identity over a network connection.
TLS is the protocol used to establish encrypted network connections between web-browsers and servers. This protocol supports X.509 for authentication.
This X.509 authentication is heavily used in the world-wide web. When establishing encrypted network connections, web-browsers will check the identity of the web server. This TLS authentication involves the web-server sending its certificate and subsequently responding to the browser's challenge, so proving the server has the corresponding private key.
The TLS protocol also supports client authentication with X.509: the client sends a certificate and responds to the server's subsequent challenge, so proving the client has the corresponding private key. This is much less common, but most popular web-browsers provide at least some support for this.
By default, dCache allows clients to authenticate using X.509, although this can only work for encrypted connections.
The following example shows curl authenticating with X.509:
curl -E ~/.globus/usercert.pem --key ~/.globus/userkey.pem https://dcache.example.org/users/paul/private-file
Enter PEM pass phrase:
It is possible to create a proxy credential from an existing X.509 credential. A proxy credential is a credential that identifies the same person, but with a much shorter lifetime (e.g., 12 hours). Such credentials may be stored on the filesystem without a password (trusting the filesystem permissions) and transferred to remote agents, so they can operate on behalf of the user.
The following example shows curl authenticating with a proxy X.509 credential.
curl -E /tmp/x509up_u1000 https://dcache.example.org/users/paul/private-file
The file /tmp/x509up_u1000 contains the user's certificate, the proxy certificate and the proxy private key.
Earlier versions of curl require both the -E and the --key options:
curl -E /tmp/x509up_u1000 --key /tmp/x509up_u1000 https://dcache.example.org/users/paul/private-file
Yet earlier versions of curl require a hack to ensure it sends both the user and proxy certificates. The --cacert option must also be specified:
curl --cacert /tmp/x509up_u1000 -E /tmp/x509up_u1000 --key /tmp/x509up_u1000 https://dcache.example.org/users/paul/private-file
HTTP supports an authentication scheme called Negotiate, which is built on a mechanism called SPNEGO. SPNEGO allows the client and server to negotiate which authentication mechanism is used. The most common reason to use SPNEGO is to support Kerberos-based authentication.
Here is an example where curl uses Kerberos (via SPNEGO) to authenticate:
curl --negotiate -u : https://dcache.example.org/users/paul/private-file
Macaroons are bearer tokens that have caveats embedded within the token. In general, these caveats restrict who can use the macaroon, for how long the token may be used, which operations are allowed, or which files or directories may be targeted. A typical use-case is to create a macaroon that allows the bearer to download a specific file for a limited period.
A client can use a macaroon in two ways: in the Authorization HTTP request header or in the URL query part.
The Authorization request header allows the HTTP client to provide dCache with information about the client's identity. To support macaroon-based request authorisation, the client may include the bearer MACAROON value in the Authorization request header, where MACAROON is the actual macaroon.
The following example shows curl making a request authorised using a macaroon:
curl -H "Authorization: bearer $MACAROON" https://dcache.example.org/users/paul/private-file
This assumes that the macaroon is stored in the environment variable MACAROON.
Not all clients support adding a custom Authorization request header. Web-browsers are common examples of such clients. For such clients, dCache supports an alternative approach: including the token in the URL.
The query part of the URL contains key-value pairs. dCache will recognise the authz key and accept the corresponding value as a bearer token.
The following example shows curl making a request authorised using a macaroon embedded within the URL:
curl https://dcache.example.org/users/paul/private-file?authz=MACAROON
Where MACAROON is replaced by the actual macaroon.
Embedding the macaroon within the URL has the advantage of providing a URL that will "just work" for most clients. For example, if a macaroon is created that targets a specific file and is valid for 10 minutes then a macaroon-embedded URL could be shared with any client, allowing that client to fetch the content from dCache without further authentication.
Such embedded macaroons may be used to support many advanced work-flows.
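As an illustration of building such a URL programmatically, the following Python sketch appends the authz key to an existing URL while keeping any other query values. The function name `with_bearer_token` is an invention for this example, and the token shown is a placeholder rather than a real macaroon.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_bearer_token(url, token):
    """Return url with the bearer token embedded as the 'authz' query value."""
    parts = urlsplit(url)
    # Keep existing query values, but drop any earlier authz value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "authz"]
    query.append(("authz", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


# 'MACAROON' stands in for the actual macaroon.
print(with_bearer_token(
    "https://dcache.example.org/users/paul/private-file", "MACAROON"))
```

Using `urlencode` ensures that any characters in the token that are not safe within a URL are percent-encoded.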
A SciToken server is an OAuth2 server that issues its clients with a token that describes what that client is allowed to do.
If configured to do so, dCache will accept SciTokens and allow operations that are compatible with the list of operations contained within the SciToken token.
The client supplies the SciToken token as a bearer token: either using the HTTP Authorization request header or embedded within the URL.
OpenID-Connect is a protocol, based on OAuth2, that allows the client to obtain a token that may be used to identify the user.
If configured to do so, dCache will accept OpenID-Connect access tokens and authenticate the user based on those tokens.
The client supplies the access token as a bearer token: either using the HTTP Authorization request header or embedded within the URL.
This section describes operations that target a file, such as uploading, downloading, discovering a file's metadata and deleting a file.
Redirection is an important feature of dCache's HTTP support. With redirection, dCache will respond to a file transfer operation (an upload or a download) with an HTTP response that tells the client to transfer the data directly with the server that has the file's data (for downloads) or that will accept the data (for uploads).
By default, when the client makes a request for data, dCache will respond with a 307 status code, with the Location response header containing the URL that describes the data server from which the client can download the data directly.
This redirection is a standard feature of the HTTP protocol. Therefore most HTTP clients support redirection on download; for example, curl supports following redirection if the -L command-line option is supplied.
Although WebDAV is based on HTTP, WebDAV servers typically operate with a restricted set of responses. Therefore, many WebDAV clients do not support the full set of valid HTTP responses. As a specific example of this, most WebDAV servers never issue a redirection status code. Therefore, support in WebDAV clients for redirection on download is much poorer than for simple HTTP clients.
Redirection for upload is a non-standard HTTP extension, first introduced by the Amazon S3 service. Through the popularity of the S3 protocol, this extension is now widely supported by HTTP clients.
The redirection is an extension of the standard Expect-100 interaction.
With Expect-100, the client initiates the upload by sending just the request headers, without yet sending any of the file's data. One of the HTTP request headers requests that dCache make an intermediate (non-final) reply, confirming it will accept the pending data. This allows dCache to check the user is authorised to upload the data and that it has sufficient capacity to store the file before the client sends the file's data.
If dCache is not redirecting uploads and will accept the upload then it replies with the expected 100 intermediate status code; on receiving this status code, the client sends the file's data. However, if dCache knows that the upload cannot succeed (e.g., because the user is not authorised to upload) then it replies with the appropriate error status code. This process allows the client to avoid sending large files if dCache knows from the request headers that the upload will be rejected.
To support redirection, the S3 extension uses Expect-100 as above. However, instead of replying with a 100-Continue response, dCache replies with a 307 (temporary redirect) status code, with the Location response header indicating the URL to which the client should send the data.
Clients that do not support redirection-on-upload will likely consider any non-100 status code as an error, and fail the upload.
dCache has a fall-back behaviour: if it replies with a redirection response, but the client sends the data (to the WebDAV door) anyway then dCache will proxy the data.
As with redirection on download, support within WebDAV for redirection on upload is poor.
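The decision a redirect-aware client makes after sending the header-only request can be sketched as follows. This is a simplified model for illustration, not the logic of curl or any other client: the function name, the returned labels and the pool URL in the usage example are all assumptions.

```python
def next_upload_step(status, headers, request_url):
    """Decide what a redirect-aware client does after the header-only request.

    Returns a (action, url) pair, where action is 'send-data' or 'fail'.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    if status == 100:
        # No redirection: send the file's data to the original URL.
        return ("send-data", request_url)
    if status == 307:
        # Redirection on upload: send the data to the URL in Location.
        return ("send-data", normalised["location"])
    # Any other status code means the upload was rejected.
    return ("fail", None)


door_url = "https://dcache.example.org/public/file"
# The pool URL below is hypothetical.
pool_url = "https://pool1.example.org:20000/upload/1234"

print(next_upload_step(100, {}, door_url))
print(next_upload_step(307, {"Location": pool_url}, door_url))
print(next_upload_step(403, {}, door_url))
```

A client without support for redirection on upload would, in effect, treat the 307 case the same as the final case.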
The Expect-100 behaviour was introduced with HTTP v1.1. For HTTP v1.0 servers, uploading data using Expect-100 would result in a deadlock: the server is waiting for the data and the client is waiting for the 100 status code.
To avoid this deadlock, curl contains a timeout for Expect-100 responses: if dCache does not reply to the initial (header-only) request with a 100-Continue response quickly enough then curl believes dCache to be an HTTP v1.0 server and will send the file's content.
Note that, under these circumstances, the WebDAV door will accept the file's data and send it to the pool (data server) on behalf of the client. However, this places a higher load on the WebDAV door and will likely result in poorer performance.
dCache replies with a redirection only once it knows which data server will accept the data, and that this server is ready to accept the data. If all the storage nodes are busy, this may take some time to establish. While this is happening, dCache has not replied to the client's initial header-only request. If this process takes too long, curl treats the server as implementing HTTP v1.0 and will send the data.
By default, curl will wait one second after sending the expect-100 request. Since curl v7.47.0, this timeout is controlled by the --expect100-timeout command-line option.
Therefore, it is recommended to use curl v7.47.0 or later and to specify the --expect100-timeout option with a large timeout value.
Checksums are often used to verify a file's integrity.
There are three ways of discovering or otherwise influencing checksums for files stored in dCache: RFC 3230 headers, WebDAV properties, and Content-MD5 headers.
The supported checksums in dCache are the following: ADLER32, MD5, MD4, SHA-1, SHA-256, SHA-512.
RFC 3230 allows the client to request that a file's checksum is sent as part of the server's response. This is done by specifying the Want-Digest request header with a value describing which checksum algorithms are acceptable to the client; for example, the request header Want-Digest: ADLER32 requests that the server provides the ADLER32 checksum in the response, while Want-Digest: SHA-256,SHA;q=0.5 indicates the client would prefer the SHA-256 checksum, but that the server should supply the SHA-1 checksum if the SHA-256 checksum is unavailable.
dCache supports RFC 3230 requests on HEAD, GET, PUT and some COPY requests.
The HEAD request is typically used to fetch a file's metadata without fetching the content of the file. The following example shows curl fetching a file's checksum information through a HEAD request:
curl -H 'Want-Digest: adler32' -I http://dcache.example.org/public/my-data
|HTTP/1.1 200 OK
|Date: Mon, 23 Sep 2019 10:06:24 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Accept-Ranges: bytes
|ETag: "000002BE4A36C1C84AD08BC5693EABB040E7_1548461363"
|Last-Modified: Mon, 23 Sep 2019 04:05:24 GMT
|Content-Type: application/octet-stream
|Digest: adler32=5ae07809
|Content-Length: 63296477
The Digest response header contains the requested adler32 checksum value.
If the file's content is also required, both the file's data and its checksum value may be obtained in a single request, by specifying the Want-Digest request header in the GET request.
The following example shows curl obtaining the file's data along with its checksum.
curl -L -D my-data.headers -H 'Want-Digest: adler32' -o my-data http://dcache.example.org/public/my-data
| 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
|100 60.3M 100 60.3M 0 0 820k 0 0:01:15 0:01:15 --:--:-- 731k
grep Digest: my-data.headers
|Digest: adler32=5ae07809
When uploading data, it may be useful to know the checksum of the uploaded data. This is possible by issuing a HEAD request after a successful PUT request. As an optimisation, it is also possible to obtain the checksum as part of the PUT request's response headers.
The following example shows curl obtaining the checksum of a freshly uploaded file. Unlike the previous examples, this upload request is authenticated (-E /tmp/x509up_u1000).
curl -D- -L -T /bin/bash -E /tmp/x509up_u1000 -H 'Want-Digest: adler32' https://dcache.example.org/public/file
|HTTP/1.1 100 Continue
|
|HTTP/1.1 201 Created
|Date: Mon, 23 Sep 2019 10:17:05 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Accept-Ranges: bytes
|ETag: "00009E56B0A0DAC5481C9BC339FAE6F7D196_1570761966"
|Digest: adler32=af543afc
|Transfer-Encoding: chunked
There is an important benefit to including the RFC-3230 Want-Digest header in the PUT request. Typically, dCache will only calculate checksum values for the configured algorithms when it receives a new file. By default, dCache calculates the ADLER32 checksum and not the MD5 checksum. Therefore, by default, subsequent HEAD (or GET) requests for the MD5 checksum will not provide this information:
curl -I -H 'Want-Digest: md5' https://dcache.example.org/public/file
|HTTP/1.1 200 OK
|Date: Mon, 23 Sep 2019 10:26:27 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Accept-Ranges: bytes
|ETag: "00009E56B0A0DAC5481C9BC339FAE6F7D196_1570761966"
|Last-Modified: Mon, 23 Sep 2019 10:17:05 GMT
|Content-Length: 1099016
However, by specifying a desire for the MD5 checksum using the Want-Digest header in the PUT request, dCache can ensure this value is calculated as it receives the file's data:
curl -D- -L -T /bin/bash -E /tmp/x509up_u1000 -H 'Want-Digest: md5' https://dcache.example.org/public/file-with-MD5
|HTTP/1.1 100 Continue
|
|HTTP/1.1 201 Created
|Date: Mon, 23 Sep 2019 10:28:27 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Accept-Ranges: bytes
|ETag: "00006BC3FF06CC1047A291006C36DCDC252A_1571445045"
|Digest: md5=rFb0uPrFc5zNtFd30xO+zw==
|Transfer-Encoding: chunked
Even if the upload Digest response header (containing the desired MD5 checksum value) is ignored, a subsequent HEAD request will yield the desired checksum value:
curl -I -H 'Want-Digest: md5' https://dcache.example.org/public/file-with-MD5
|HTTP/1.1 200 OK
|Date: Mon, 23 Sep 2019 10:31:14 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Accept-Ranges: bytes
|ETag: "00006BC3FF06CC1047A291006C36DCDC252A_1571445045"
|Last-Modified: Mon, 23 Sep 2019 10:28:27 GMT
|Digest: md5=rFb0uPrFc5zNtFd30xO+zw==
|Content-Length: 1099016
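A client that wishes to compare the Digest value against a locally calculated checksum first needs to parse the header. The following Python sketch does this for the header formats seen in the examples above; the function names are inventions for this example. Note that the MD5 value is BASE64 encoded, while tools such as md5sum print hexadecimal.

```python
import base64


def parse_digest_header(value):
    """Parse a Digest header value into an {algorithm: checksum} mapping."""
    digests = {}
    for item in value.split(","):
        # Split on the first '=' only: BASE64 values may end with '='.
        algorithm, _, checksum = item.strip().partition("=")
        digests[algorithm.lower()] = checksum
    return digests


def base64_to_hex(value):
    """Convert a BASE64-encoded checksum into hexadecimal."""
    return base64.b64decode(value).hex()


digests = parse_digest_header("md5=rFb0uPrFc5zNtFd30xO+zw==")
print(digests)
print(base64_to_hex(digests["md5"]))
```

The hexadecimal form may then be compared directly with the output of md5sum for the local file.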
Checksums may also be requested as part of HTTP third-party copy requests. This is discussed in the following section on third-party copies.
The list of all known checksums of a specific file is available as a (read-only) WebDAV property.
The property's namespace is http://www.dcache.org/2013/webdav and the property name is Checksums. The property value is a comma-separated list of checksum values, following the format described in RFC 3230 for Digest header values.
Note that the Checksums property is not returned by default. Therefore a simple PROPFIND request without specifying any desired properties will not include the checksum details.
The following example shows curl requesting the Checksums property for the file file-with-MD5 created in an earlier example. Note that the output from dCache is passed through the xmllint command. This is not essential and is used only to make the output easier to read.
echo '<?xml version="1.0"?><propfind xmlns="DAV:"><prop><d:Checksums xmlns:d="http://www.dcache.org/2013/webdav"/></prop></propfind>' | curl -s -T - -X PROPFIND https://dcache.example.org/public/file-with-MD5 | xmllint -format -
|<?xml version="1.0" encoding="utf-8"?>
|<d:multistatus xmlns:cal="urn:ietf:params:xml:ns:caldav" xmlns:cs="http://calendarserver.org/ns/" xmlns:card="urn:ietf:params:xml:ns:carddav" xmlns:ns1="http://www.dcache.org/2013/webdav" xmlns:d="DAV:">
| <d:response>
| <d:href>/public/file-with-MD5</d:href>
| <d:propstat>
| <d:prop>
| <ns1:Checksums>md5=rFb0uPrFc5zNtFd30xO+zw==,adler32=af543afc</ns1:Checksums>
| </d:prop>
| <d:status>HTTP/1.1 200 OK</d:status>
| </d:propstat>
| </d:response>
|</d:multistatus>
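The response may also be processed programmatically. The following Python sketch extracts the Checksums property from a multistatus response using only the standard library. The function name is an invention for this example, and the sample document is a trimmed copy of the response shown above.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
DCACHE_NS = "http://www.dcache.org/2013/webdav"


def checksums_from_multistatus(xml_text):
    """Map each href in a multistatus response to its {algorithm: value} checksums."""
    root = ET.fromstring(xml_text)
    result = {}
    for response in root.iter(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        value = response.findtext(f".//{{{DCACHE_NS}}}Checksums")
        if value is not None:
            # Split each item on the first '=' only: BASE64 may end with '='.
            result[href] = dict(item.strip().split("=", 1)
                                for item in value.split(","))
    return result


SAMPLE = """
<d:multistatus xmlns:ns1="http://www.dcache.org/2013/webdav" xmlns:d="DAV:">
  <d:response>
    <d:href>/public/file-with-MD5</d:href>
    <d:propstat>
      <d:prop>
        <ns1:Checksums>md5=rFb0uPrFc5zNtFd30xO+zw==,adler32=af543afc</ns1:Checksums>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>
"""

print(checksums_from_multistatus(SAMPLE))
```

The lookup uses the namespace URI rather than the ns1 prefix, since the prefix chosen by the server may differ between responses.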
A common objective is to upload a file and to be certain that the stored file has identical data to the locally stored file; i.e., the file's data was not corrupted while being delivered to the server.
It is possible to achieve this by uploading the file and subsequently requesting the file's checksum. With RFC 3230 and placing the Want-Digest request header within the PUT request, it is possible to discover the uploaded file's checksum without making a subsequent request.
However, if the file is discovered to be corrupt, the client is then responsible for either removing the corrupt file or attempting another upload. Until either is done, the file exists in dCache with corrupt data.
Placing this responsibility on the client may be problematic: the client could halt (or be interrupted) before the recovery procedure completes, or may be authorised only to upload data and not overwrite existing data nor delete existing data.
An alternative approach is to supply a known checksum value when uploading the data. dCache then verifies this known checksum value matches that of the data it receives. If the two checksums do not match then the upload fails.
RFC 1864 describes a standard approach for sending a known checksum: the Content-MD5 header.
A Content-MD5 header is similar to the RFC-3230 Digest header. One important difference is that a server that does not support the Digest header will accept the request, while a server that does not support the Content-MD5 header will fail the request. Therefore, a successful upload with Content-MD5 can only happen if the data is not corrupt.
The following example shows a file being uploaded using the Content-MD5 request header to ensure the file has not been corrupted. Note that this request is authenticated (-E /tmp/x509up_u1000).
curl -D- -L -T /bin/bash -H "Content-MD5: $(md5sum /bin/bash | cut -d' ' -f1 | xxd -r -p | base64)" -E /tmp/x509up_u1000 https://dcache.example.org/Users/paul/file-content-md5
|HTTP/1.1 100 Continue
|
|HTTP/1.1 201 Created
|Date: Mon, 23 Sep 2019 11:19:59 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Accept-Ranges: bytes
|ETag: "00004F27D12A7220433DA294D01E8F5785C3_1574536970"
|Transfer-Encoding: chunked
The header value contains the MD5 checksum value using BASE64 encoding, rather than the more common hexadecimal encoding. The command md5sum /bin/bash | cut -d' ' -f1 | xxd -r -p | base64 calculates the MD5 checksum of the file /bin/bash in this BASE64 encoding.
Therefore, the value "Content-MD5: $(md5sum /bin/bash | cut -d' ' -f1 | xxd -r -p | base64)" is the Content-MD5 header along with the correct value for the target file.
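For clients not written as shell scripts, the same value can be calculated directly. The following Python sketch is one way of doing this with the standard library; the function name `content_md5` is an invention for this example.

```python
import base64
import hashlib
import io


def content_md5(stream, chunk_size=1024 * 1024):
    """Return the Content-MD5 header value for a binary file-like object."""
    md5 = hashlib.md5()
    # Read in chunks so that large files are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    # The header carries the raw 16-byte digest in BASE64, not hexadecimal.
    return base64.b64encode(md5.digest()).decode("ascii")


# The Content-MD5 value of empty content.
print(content_md5(io.BytesIO(b"")))  # prints 1B2M2Y8AsgTpgAmY7PhCfg==
```

For a file on disk, pass a file opened in binary mode; for example, `with open("/bin/bash", "rb") as f: content_md5(f)`.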
If the upload is corrupted then dCache replies with a 400 status code. The status line provides additional information.
In the following example, the checksum is calculated for an unrelated file (/bin/echo). Including this other file's checksum when uploading the file /bin/bash is used to simulate data corruption.
curl -D- -L -T /bin/bash -H "Content-MD5: $(md5sum /bin/echo | cut -d' ' -f1 | xxd -r -p | base64)" -E /tmp/x509up_u1000 https://dcache.example.org/Users/paul/file-content-md5
|HTTP/1.1 100 Continue
|
|HTTP/1.1 400 Checksum mismatch (expected=[2:29f4bf55fe826e5b167340f91aeb0f49], actual=[1:af543afc, 2:ac56f4b8fac5739ccdb45777d313becf])
|Date: Mon, 23 Sep 2019 11:20:56 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Transfer-Encoding: chunked
The status message includes both the expected checksums and the actual checksums. The 1: prefix indicates an ADLER32 checksum value, while the 2: prefix indicates an MD5 checksum value.
A GET request that targets a directory returns a web page that describes that directory. This provides a very simple read-only web interface for accessing files in dCache, through which you can view the contents of a directory and view or download files stored in dCache.
If the URI contains query values (e.g., https://dcache.example.org/?foo=bar) then those values are included in the directory web-page's navigation and file download links. For example, if the directory request was authorised using the bearer token TOKEN embedded within the URL (?authz=TOKEN) then the user may navigate dCache's directory structure and request file downloads that are also authorised from this bearer token. This is particularly useful when used with macaroons, as it provides an interactive view of dCache powered by macaroons.
Metalink is a standard XML-based file format, documented in RFC 5854, that describes how to download one or more files. dCache provides limited support for providing metalink information: it describes how to download all the files in a directory, but there is no support for downloading files located within sub-directories: there's no recursion.
The metalink description may be obtained in two ways: through HTTP content negotiation and through Metalink/HTTP.
Content negotiation is where the HTTP client describes which file format(s) it understands, weighting them by preference. It uses the Accept request header with a list of media types. The media type for metalink is application/metalink4+xml. To obtain a metalink description of a directory, the client issues an HTTP GET request against a directory, using content-negotiation to select a metalink response.
curl -s -H "Accept: application/metalink4+xml" https://dcache.example.org/Users/paul/ | xmllint -format -
|<?xml version="1.0"?>
|<metalink xmlns="urn:ietf:params:xml:ns:metalink">
| <file name="public-file">
| <size>174</size>
| <hash type="sha-1">b95d5d20afb9a49d1d779ad3a6a246bd03bfef34</hash>
| <hash type="md5">7128e02d3779f8ff5141b9f5ac003be4</hash>
| <url>https://dcache.example.org/Users/paul/public%2Dfile</url>
| <updated>2023-10-05T04:05:00.682Z</updated>
| </file>
| <file name="private-file">
| <size>145</size>
| <hash type="sha-1">cfb51c36cbb348ead6b10588b84f5f9923737649</hash>
| <hash type="md5">32f9a46c0b40d63222db11b8a46f0584</hash>
| <url>https://dcache.example.org/Users/paul/private%2Dfile</url>
| <updated>2023-10-05T04:05:01.438Z</updated>
| </file>
|</metalink>
In this example, the xmllint command is used only to make the resulting XML "pretty". Without this command, you will see the more compact XML representation that dCache returns. This representation requires fewer characters but is harder to understand.
The same information is also available without content negotiation by appending ?type=metalink to the URL (e.g., https://dcache.example.org/Users/paul/?type=metalink). A GET request that targets this URL will always provide a metalink description of the directory's contents.
curl -s https://dcache.example.org/Users/paul/?type=metalink | xmllint -format -
|<?xml version="1.0"?>
|<metalink xmlns="urn:ietf:params:xml:ns:metalink">
| <file name="public-file">
| <size>174</size>
| <hash type="sha-1">b95d5d20afb9a49d1d779ad3a6a246bd03bfef34</hash>
| <hash type="md5">7128e02d3779f8ff5141b9f5ac003be4</hash>
| <url>https://dcache.example.org/Users/paul/public%2Dfile</url>
| <updated>2023-10-05T04:05:00.682Z</updated>
| </file>
| <file name="private-file">
| <size>145</size>
| <hash type="sha-1">cfb51c36cbb348ead6b10588b84f5f9923737649</hash>
| <hash type="md5">32f9a46c0b40d63222db11b8a46f0584</hash>
| <url>https://dcache.example.org/Users/paul/private%2Dfile</url>
| <updated>2023-10-05T04:05:01.438Z</updated>
| </file>
|</metalink>
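If no metalink-aware client is available, the description is simple enough to process directly. The following Python sketch parses a metalink document into a list of files with their sizes, URLs and hashes. The function name is an invention for this example, and the sample is a trimmed copy of the response shown above.

```python
import xml.etree.ElementTree as ET

# The prefix 'ml' is local to this script; only the namespace URI matters.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def parse_metalink(xml_text):
    """Return a list of dicts describing each file in a metalink document."""
    root = ET.fromstring(xml_text)
    files = []
    for entry in root.findall("ml:file", NS):
        files.append({
            "name": entry.get("name"),
            "size": int(entry.findtext("ml:size", namespaces=NS)),
            "url": entry.findtext("ml:url", namespaces=NS),
            "hashes": {h.get("type"): h.text
                       for h in entry.findall("ml:hash", NS)},
        })
    return files


SAMPLE = """<?xml version="1.0"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="public-file">
    <size>174</size>
    <hash type="sha-1">b95d5d20afb9a49d1d779ad3a6a246bd03bfef34</hash>
    <hash type="md5">7128e02d3779f8ff5141b9f5ac003be4</hash>
    <url>https://dcache.example.org/Users/paul/public%2Dfile</url>
    <updated>2023-10-05T04:05:00.682Z</updated>
  </file>
</metalink>
"""

for item in parse_metalink(SAMPLE):
    print(item["name"], item["size"], item["url"], item["hashes"]["md5"])
```

Unlike the BASE64 values in the Digest header, the hashes in the metalink document are hexadecimal, so they may be compared directly with the output of tools such as md5sum.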
Metalink/HTTP is described by RFC 6249. This is a standard way to discover a URL of a corresponding metalink description. Following this RFC, dCache includes an HTTP Link response header in GET or HEAD requests that target a directory. Following RFC 6249, the link response header has the relationship (rel) attribute value of describedby and the type attribute value of application/metalink4+xml.
curl -s -I https://dcache.example.org/Users/paul/ | grep ^Link
|Link: <https://dcache.example.org/Users/paul/?type=metalink>; rel=describedby; type="application/metalink4+xml"
In the above example, curl issues an HTTP HEAD request that targets a directory. The response includes the Link header that identifies the URL containing the metalink description.
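A client could discover the metalink URL from this header with a few lines of code. The following Python sketch is deliberately simplified: it handles Link values shaped like the one above and is not a complete parser for the Link header syntax. The function name is an invention for this example.

```python
import re

METALINK_TYPE = "application/metalink4+xml"


def metalink_url_from_link(header_value):
    """Return the metalink URL from a Link header value, or None if absent."""
    # Each link is '<URL>' followed by ';'-separated parameters.
    for match in re.finditer(r"<([^>]*)>([^<]*)", header_value):
        url, params_text = match.groups()
        params = {}
        for param in params_text.split(";"):
            name, sep, value = param.strip().rstrip(",").partition("=")
            if sep:
                params[name.strip().lower()] = value.strip().strip('"')
        if params.get("rel") == "describedby" and params.get("type") == METALINK_TYPE:
            return url
    return None


link = ('<https://dcache.example.org/Users/paul/?type=metalink>; '
        'rel=describedby; type="application/metalink4+xml"')
print(metalink_url_from_link(link))
```

The function takes the header's value, which is the text after `Link: ` in the curl output.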
In general, metalink is useful because it is supported by different applications. The Metalink Wikipedia page contains a list of clients that support the format. Here are some example clients along with some notes on their use:
- aria2 supports metalink, both content-negotiation and metalink/http. The -V and --follow-metalink=mem options may be of interest.
- wget version 1 (wget) has limited support for metalink, while version 2 (wget2) has broader support.
Properties are a standard feature of the WebDAV protocol. They are arbitrary key-value pairs that describe files and directories in the WebDAV server.
Property names are XML fully qualified names, which are also known as qnames. Conceptually, a qname has two logical parts: a namespace and a local name. The namespace groups together related properties and is identified by a URI. The local name identifies the specific WebDAV property within the namespace; the name must be a valid XML element name. When writing a qname, a prefix is used to link the XML elements with their corresponding namespace. For example, a WebDAV property from the http://example.org/webdav-properties namespace with the local name of property-1 may be written:
<wd:property-1 xmlns:wd="http://example.org/webdav-properties"/>
This is using the prefix wd as a short-hand representation of the namespace. The specific choice of prefix (wd in this example) is arbitrary: any valid value is allowed, provided it does not clash with some other namespace prefix in the document.
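The following Python sketch illustrates that the prefix carries no meaning of its own: two elements written with different prefixes resolve to the same namespace and local name. The helper `split_qname` and the prefix `other` are inventions for this example.

```python
import xml.etree.ElementTree as ET


def split_qname(element):
    """Return the (namespace, local name) pair of a parsed XML element."""
    tag = element.tag
    if not tag.startswith("{"):
        return (None, tag)  # the element is not in any namespace
    # ElementTree stores qualified names as '{namespace}local-name'.
    namespace, _, local_name = tag[1:].partition("}")
    return (namespace, local_name)


first = ET.fromstring(
    '<wd:property-1 xmlns:wd="http://example.org/webdav-properties"/>')
second = ET.fromstring(
    '<other:property-1 xmlns:other="http://example.org/webdav-properties"/>')

print(split_qname(first))
print(split_qname(first) == split_qname(second))
```

For this reason, code that processes WebDAV responses should match on the namespace URI and local name, never on the prefix.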
There are two kinds of WebDAV properties: dead and live. Dead properties are ones where the values come exclusively from the client; the WebDAV server stores dead properties but does not act on them. Live properties are properties that have some significance to the WebDAV server; for example, coming from the target file's data, the target directory's content, or from some internal metadata of the target.
dCache does not support dead properties. Attempts to create such properties will fail.
dCache supports live WebDAV properties that provide information about the files stored in dCache and the directories within which those files are organised. The available read-only live properties include the file's checksum and SRM locality. dCache also supports mutable live properties, such as file and directory extended attributes.
In this section, examples are given to show WebDAV properties in action. These examples are low-level, using the curl command and exposing details of the WebDAV protocol. We recommend you use a dedicated WebDAV client that has built-in support for properties as this should make working with properties easier.
A WebDAV client makes a PROPFIND request to discover the current value of one or more properties. The client sends an XML entity when making this request, listing which properties the server should provide.
The following request entity requests two properties: the SRM FileLocality property and the DAV standard creationdate property.
<?xml version="1.0"?>
<propfind xmlns="DAV:"
          xmlns:srm="http://srm.lbl.gov/StorageResourceManager">
  <prop>
    <srm:FileLocality/>
    <creationdate/>
  </prop>
</propfind>
Here is an example curl command that requests these two property values, along with dCache's response with the current value of these two properties.
echo '<?xml version="1.0"?><propfind xmlns="DAV:"><prop><srm:FileLocality xmlns:srm="http://srm.lbl.gov/StorageResourceManager"/><creationdate/></prop></propfind>' | curl -s -T- -X PROPFIND https://dcache.example.org/public/test-1 | xmllint -format -
|<?xml version="1.0" encoding="utf-8"?>
|<d:multistatus xmlns:cal="urn:ietf:params:xml:ns:caldav" xmlns:cs="http://calendarserver.org/ns/" xmlns:card="urn:ietf:params:xml:ns:carddav" xmlns:ns1="http://srm.lbl.gov/StorageResourceManager" xmlns:d="DAV:">
| <d:response>
| <d:href>/public/test-1</d:href>
| <d:propstat>
| <d:prop>
| <ns1:FileLocality>NEARLINE</ns1:FileLocality>
| <d:creationdate>2020-05-11T12:13:17Z</d:creationdate>
| </d:prop>
| <d:status>HTTP/1.1 200 OK</d:status>
| </d:propstat>
| </d:response>
|</d:multistatus>
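The multistatus response can be processed with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, assuming the response body is already available as a string. The function name parse_multistatus and the SAMPLE constant are illustrative only; the sample is a shortened copy of the response shown above.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree writes namespaced names as {namespace}local-name


def parse_multistatus(body):
    """Return {href: {status line: {qualified property name: text}}}."""
    result = {}
    root = ET.fromstring(body)
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href")
        by_status = result.setdefault(href, {})
        for propstat in response.findall(DAV + "propstat"):
            status = propstat.findtext(DAV + "status")
            props = by_status.setdefault(status, {})
            prop = propstat.find(DAV + "prop")
            if prop is None:
                continue
            for element in prop:
                props[element.tag] = element.text
    return result


# Shortened copy of the example response (hypothetical variable name).
SAMPLE = """<d:multistatus xmlns:ns1="http://srm.lbl.gov/StorageResourceManager" xmlns:d="DAV:">
  <d:response>
    <d:href>/public/test-1</d:href>
    <d:propstat>
      <d:prop>
        <ns1:FileLocality>NEARLINE</ns1:FileLocality>
        <d:creationdate>2020-05-11T12:13:17Z</d:creationdate>
      </d:prop>
      <d:status>HTTP/1.1 200 OK</d:status>
    </d:propstat>
  </d:response>
</d:multistatus>"""

found = parse_multistatus(SAMPLE)
```

Grouping by status line matters because a single response may report some properties as found (200) and others as missing (404).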
The special term <allprop/>
tells the WebDAV server to return all
properties for a specific file or directory. An <allprop/>
request
entity looks like:
<?xml version="1.0"?>
<propfind xmlns="DAV:">
<allprop/>
</propfind>
The following example shows curl making an <allprop/> request against the file /public/test-1, along with the response from dCache.
echo '<?xml version="1.0"?><propfind xmlns="DAV:"><allprop/></propfind>' | curl -s -T- -X PROPFIND https://dcache.example.org/public/test-1 | xmllint -format -
|<?xml version="1.0" encoding="utf-8"?>
|<d:multistatus xmlns:cal="urn:ietf:params:xml:ns:caldav" xmlns:cs="http://calendarserver.org/ns/" xmlns:card="urn:ietf:params:xml:ns:carddav" xmlns:ns2="http://www.dcache.org/2013/webdav" xmlns:ns1="http://srm.lbl.gov/StorageResourceManager" xmlns:d="DAV:">
| <d:response>
| <d:href>/public/test-1</d:href>
| <d:propstat>
| <d:prop>
| <ns1:AccessLatency>ONLINE</ns1:AccessLatency>
| <ns1:RetentionPolicy>REPLICA</ns1:RetentionPolicy>
| <ns2:Checksums>adler32=af543afc</ns2:Checksums>
| <ns1:FileLocality>NEARLINE</ns1:FileLocality>
| <d:getcreated>2020-05-11T12:13:17Z</d:getcreated>
| <d:creationdate>2020-05-11T12:13:17Z</d:creationdate>
| <d:getlastmodified>Mon, 11 May 2020 12:13:17 GMT</d:getlastmodified>
| <d:getetag>"0000FC2FCCBC9B2B43D08C672D1FE83AA0E8_61297885"</d:getetag>
| <d:iscollection>FALSE</d:iscollection>
| <d:displayname>test-1</d:displayname>
| <d:isreadonly>TRUE</d:isreadonly>
| <d:name>test-1</d:name>
| <d:supported-report-set/>
| <d:getcontentlength>1099016</d:getcontentlength>
| </d:prop>
| <d:status>HTTP/1.1 200 OK</d:status>
| </d:propstat>
| <d:propstat>
| <d:prop>
| <d:getcontenttype/>
| <d:resourcetype/>
| </d:prop>
| <d:status>HTTP/1.1 404 Not Found</d:status>
| </d:propstat>
| </d:response>
|</d:multistatus>
Although the <allprop/>
request is convenient, providing all
information about the targeted file or directory can also include
properties that are computationally expensive to calculate. Such
calculations are a waste of resources if these property values are not
of interest. Therefore it is strongly recommended that clients should
avoid <allprop/>
requests whenever possible.
The PROPPATCH
verb may be used to update an existing WebDAV
property. The property is automatically created if it does not
already exist.
Creating and updating properties is only supported in dCache in connection with extended attributes, which are described in more detail later.
The following XML shows an example entity sent with a PROPPATCH request to create or modify two extended attributes: attribute-1 and attribute-2.
<?xml version="1.0"?>
<propertyupdate xmlns="DAV:" xmlns:xa="http://www.dcache.org/2020/xattr">
<set>
<prop>
<xa:attribute-1>new value 1</xa:attribute-1>
<xa:attribute-2>new value 2</xa:attribute-2>
</prop>
</set>
</propertyupdate>
The following command shows a complete example where a curl
command
updates these two properties.
echo '<?xml version="1.0"?><propertyupdate xmlns="DAV:" xmlns:xa="http://www.dcache.org/2020/xattr"><set><prop><xa:attribute-1>new value 1</xa:attribute-1><xa:attribute-2>new value 2</xa:attribute-2></prop></set></propertyupdate>' | curl -s -T- -X PROPPATCH https://dcache.example.org/public/test-1 | xmllint -format -
|<?xml version="1.0" encoding="utf-8"?>
|<d:multistatus xmlns:cal="urn:ietf:params:xml:ns:caldav" xmlns:cs="http://calendarserver.org/ns/" xmlns:card="urn:ietf:params:xml:ns:carddav" xmlns:d="DAV:">
| <d:response>
| <d:href>/public/test-1</d:href>
| <d:propstat>
| <d:prop>
| <attribute-1/>
| <attribute-2/>
| </d:prop>
| <d:status>HTTP/1.1 404 Not Found</d:status>
| </d:propstat>
| </d:response>
|</d:multistatus>
The PROPPATCH
verb is also used to remove a property from a target
file or directory. It is not necessary to verify whether the property
exists; removing a non-existing property does not result in an error.
As with updating properties, removing properties is only supported in dCache in connection with extended attributes.
The following example shows the request entity when removing two properties.
<?xml version="1.0"?>
<propertyupdate xmlns="DAV:" xmlns:xa="http://www.dcache.org/2020/xattr">
<remove>
<prop>
<xa:attribute-1/>
<xa:attribute-2/>
</prop>
</remove>
</propertyupdate>
Here is a complete example showing the removal of these two properties.
echo '<?xml version="1.0"?><propertyupdate xmlns="DAV:" xmlns:xa="http://www.dcache.org/2020/xattr"><remove><prop><xa:attribute-1/><xa:attribute-2/></prop></remove></propertyupdate>' | curl -s -T- -X PROPPATCH https://dcache.example.org/public/test-1 | xmllint -format -
|<?xml version="1.0" encoding="utf-8"?>
|<d:multistatus xmlns:cal="urn:ietf:params:xml:ns:caldav" xmlns:cs="http://calendarserver.org/ns/" xmlns:card="urn:ietf:params:xml:ns:carddav" xmlns:d="DAV:">
| <d:response>
| <d:href>/public/test-1</d:href>
| <d:propstat>
| <d:prop>
| <attribute-1/>
| <attribute-2/>
| </d:prop>
| <d:status>HTTP/1.1 404 Not Found</d:status>
| </d:propstat>
| </d:response>
|</d:multistatus>
Extended attributes are arbitrary key-value pairs that may be assigned to files and directories in dCache. Read the chapter on extended attributes to learn more about them.
It is possible to assign extended attributes to a file as part of the upload process. To do this, simply add the attribute definitions to the URL's query-part using a simple encoding.
The query part of a URL is what appears after the question mark character. It often contains key-value pairs using the equals symbol (=). Multiple key-value pairs are usually separated by the ampersand character (&). Here is an arbitrary example that shows a query-part: http://example.org/path/to/file?key1=value1&key2=value2. In this example, the query-part is key1=value1&key2=value2.
To define an extended attribute when uploading a file, prefix the attribute name with xattr. and append = and the attribute value; for example, the extended attribute "foo" with value "bar" would be written xattr.foo=bar. This definition is included in the upload URL query-part.
As with other parts of a URL, if the extended attribute value contains spaces then they must be URL-encoded (percent-escaped); for example, the extended attribute "foo" with value "bar baz" would be defined as xattr.foo=bar%20baz in the URL's query-part. Depending on your client, you may find spaces are encoded automatically.
Multiple extended attributes may be defined when uploading a file. The corresponding definitions are concatenated within the URL query-part using the & separator; for example, to define two attributes "name1" and "name2" that have values "value 1" and "value 2" (respectively), the query-part of the upload URL may be written as xattr.name1=value%201&xattr.name2=value%202.
The following is a complete example URL that may be used when uploading data. The new file is located at /Users/paul/new-data.md5 and will have the two extended attributes ("name1" and "name2") described above.
https://dcache.example.org/Users/paul/new-data.md5?xattr.name1=value%201&xattr.name2=value%202
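Such a URL can also be built programmatically, which avoids percent-escaping mistakes. The following is a minimal sketch using only Python's standard library; the helper name upload_url is hypothetical and not part of any dCache client.

```python
from urllib.parse import quote, urlencode


def upload_url(base_url, attributes):
    """Append xattr.NAME=VALUE definitions to base_url as its query-part."""
    query = urlencode(
        {"xattr." + name: value for name, value in attributes.items()},
        quote_via=quote,  # percent-escape spaces as %20 rather than '+'
    )
    return base_url + "?" + query if query else base_url


url = upload_url(
    "https://dcache.example.org/Users/paul/new-data.md5",
    {"name1": "value 1", "name2": "value 2"},
)
print(url)
# https://dcache.example.org/Users/paul/new-data.md5?xattr.name1=value%201&xattr.name2=value%202
```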
The following example shows a curl command that uses the above URL to simultaneously upload a file and assign it two extended attributes.
curl -E /tmp/x509up_u1000 -L -T new-data.md5 'https://dcache.example.org/Users/paul/new-data.md5?xattr.name1=value%201&xattr.name2=value%202'
Note that the URL is placed within quote marks. This is to prevent
the shell interpreting the &
character as a control operator, which
would result in the shell executing the command in the background.
The subsequent characters would then be interpreted as additional
shell commands.
An extended attribute has an equivalent WebDAV property, through which
the extended attribute may be created, queried, modified and removed.
This property has the extended attribute's name as the property's
local part with http://www.dcache.org/2020/xattr
as the namespace;
for example, the extended attribute attribute-1
is the WebDAV
property <xa:attribute-1 xmlns:xa="http://www.dcache.org/2020/xattr"/>
To query specific extended attributes of a file or directory a PROPFIND request is made with a request entity like:
<?xml version="1.0"?>
<propfind xmlns="DAV:"
xmlns:xa="http://www.dcache.org/2020/xattr">
<prop>
<xa:attribute-1/>
<xa:attribute-2/>
</prop>
</propfind>
The <allprop/>
query will yield all extended attributes assigned to
the target file or directory, along with all other WebDAV properties.
Extended attributes may be created or modified using a PROPPATCH
request with a request entity like:
<?xml version="1.0"?>
<propertyupdate xmlns="DAV:" xmlns:xa="http://www.dcache.org/2020/xattr">
<set>
<prop>
<xa:attribute-1>new value 1</xa:attribute-1>
<xa:attribute-2>new value 2</xa:attribute-2>
</prop>
</set>
</propertyupdate>
Finally, extended attributes may be removed using a PROPPATCH
request with a request entity like:
<?xml version="1.0"?>
<propertyupdate xmlns="DAV:" xmlns:xa="http://www.dcache.org/2020/xattr">
<remove>
<prop>
<xa:attribute-1/>
<xa:attribute-2/>
</prop>
</remove>
</propertyupdate>
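Request entities like the ones above can be generated rather than written by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name propertyupdate is hypothetical, and the sketch assumes every attribute name is also a valid XML element name. The generated document uses explicit prefixes instead of a default namespace, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
XATTR_NS = "http://www.dcache.org/2020/xattr"


def propertyupdate(set_attrs=None, remove_attrs=()):
    """Build a PROPPATCH request entity that sets and/or removes xattrs."""
    # Choose readable prefixes; any non-clashing prefix would do.
    ET.register_namespace("d", DAV_NS)
    ET.register_namespace("xa", XATTR_NS)
    root = ET.Element("{%s}propertyupdate" % DAV_NS)
    if set_attrs:
        set_el = ET.SubElement(root, "{%s}set" % DAV_NS)
        prop = ET.SubElement(set_el, "{%s}prop" % DAV_NS)
        for name, value in set_attrs.items():
            ET.SubElement(prop, "{%s}%s" % (XATTR_NS, name)).text = value
    if remove_attrs:
        remove_el = ET.SubElement(root, "{%s}remove" % DAV_NS)
        prop = ET.SubElement(remove_el, "{%s}prop" % DAV_NS)
        for name in remove_attrs:
            ET.SubElement(prop, "{%s}%s" % (XATTR_NS, name))
    return ET.tostring(root, encoding="unicode")


body = propertyupdate({"attribute-1": "new value 1"}, ["attribute-2"])
print(body)
```

The resulting string would be sent as the body of the PROPPATCH request, in the same way the curl examples pipe the entity on standard input.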
A client may request that dCache issue a macaroon by making a specific request to the WebDAV door. This section describes this process; the earlier section describes how macaroons may be used.
To request a macaroon, the client makes a POST request, with the Content-Type of this request set to application/macaroon-request.
Note that this is only possible if the request is authenticated.
The following example shows the simplest request to obtain a macaroon.
curl -E /tmp/x509up_u1000 -X POST -H 'Content-Type: application/macaroon-request' https://dcache.example.org/
|{
| "macaroon": "MDAxY2[...]l8K",
| "uri": {
| "targetWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]l8K",
| "baseWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]l8K",
| "target": "https://dcache.example.org/",
| "base": "https://dcache.example.org/"
| }
|}
IMPORTANT
In this example, the full macaroon would be a 284-character string. To improve readability, this macaroon is replaced by the much shorter string MDAxY2[...]l8K. This convention is followed for all macaroons in this chapter.
The macaroon request must be authenticated. In the above example, the request is authenticated using X.509-based authentication (the option -E /tmp/x509up_u1000). Macaroon requests may be authenticated with any supported WebDAV authentication scheme, with the exception of SciTokens with path restrictions.
The response is a JSON object that includes various related values.
The first is the actual macaroon, which is the corresponding value of
the macaroon
JSON object key.
The macaroon may be used when making requests, as described in the macaroon authentication section.
The uri
JSON object value is another JSON object, providing useful
URIs related to this query. These items may all be derived from the
macaroon value, so are included as a helpful short-cut. The base
value is the URI of the root path, containing the scheme, hostname,
port number (if non-standard). The target
resolves the POST
request's path against the base
URI; if the POST request targeted
the root directory then base
and target
URIs have the same value.
A macaroon appears as an opaque string, but actually contains information on what a user is allowed to do in the form of various caveats.
While it is not essential to understand the caveats of a macaroon, showing a macaroon's caveats should make it easier to understand how to request more restrictive macaroons.
This section will use the Python macaroon library pymacaroons. This is available pre-packaged; e.g., apt-get install python-pymacaroons or pip install pymacaroons.
The following example shows how to list the caveats contained within a macaroon:
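from pymacaroons import Macaroon
import sys

# Read one serialised macaroon per line from standard input.
for line in sys.stdin:
    # Strip the trailing newline before de-serialising.
    m = Macaroon.deserialize(line.strip())
    print(m.inspect())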
Here is a typical response:
echo MDAxY2[...]l8K | python inspect-macaroon.py
|location Optional.empty
|identifier aktmMDje
|cid iid:GaVltWFP
|cid id:2002;2002,0;paul
|cid before:2019-09-25T08:12:11.080Z
|cid home:/Users/paul
|signature 0a9dcf9ede9d747fdbf365a88c4de7a65a60a709e9054f3e6f5533b06716365f
In this example, the macaroon has four caveats, each identified by the cid prefix. These four caveat values are iid:GaVltWFP, id:2002;2002,0;paul, before:2019-09-25T08:12:11.080Z and home:/Users/paul.
One benefit of macaroons is that it is possible to add additional caveats to a macaroon independently of dCache. This allows some powerful work-flows where an external agent requests a powerful macaroon and generates more restricted macaroons "on demand".
The portal use-case provides an example of such a workflow. A web-portal should allow its users to download specific files if they are authorised to view that data, even though these users are unknown to dCache. The portal requests a macaroon that is authorised to download any file in dCache. When a user requests a file she is authorised to read, the portal autonomously generates a macaroon that is authorised to download only that file and redirects the user's request to the URL with the embedded macaroon. The result is an architecture that allows users to download data at an almost arbitrary throughput.
The Python library above may be used to add caveats to an existing macaroon. The macaroon is de-serialised, the additional caveats are added and the resulting macaroon is serialised.
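The three steps just described might look like the following sketch. It uses the pymacaroons calls Macaroon.deserialize, add_first_party_caveat and serialize. The helper names download_only_caveats and attenuate are hypothetical, and the particular choice of caveats is only an illustration of the portal use-case. The library is imported inside the function so that the caveat-building helper can be used without it.

```python
def download_only_caveats(path, before):
    """Caveat strings limiting a macaroon to downloading one path until 'before'."""
    return ["activity:DOWNLOAD", "path:" + path, "before:" + before]


def attenuate(serialized, caveats):
    """De-serialise a macaroon, add each caveat, and serialise the result."""
    from pymacaroons import Macaroon  # third-party library, imported lazily

    macaroon = Macaroon.deserialize(serialized)
    for caveat in caveats:
        macaroon.add_first_party_caveat(caveat)
    return macaroon.serialize()


caveats = download_only_caveats(
    "/users/paul/data-2019/top-secret.dat", "2019-09-24T10:05:00Z"
)
# restricted = attenuate(powerful_macaroon, caveats)  # powerful_macaroon: issued by dCache
```

Because caveats can only restrict a macaroon, the portal never needs to contact dCache to produce the weaker token.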
Whenever dCache issues a macaroon there are always some caveats. When asking dCache for a macaroon, the request may ask that additional caveats be included. Although extra caveats may be added to a macaroon directly, it is somewhat safer to have dCache add the caveats as this avoids dCache returning an unnecessarily powerful macaroon.
In general, dCache caveat values have the format KEY:VALUE; e.g., the caveat before:2019-09-25T08:12:11.080Z has a key of before and value 2019-09-25T08:12:11.080Z.
The following table lists the available caveat keys along with the value format and what it means.
Activity | Description |
---|---|
LIST | Obtain directory lists |
UPLOAD | Create new files |
DOWNLOAD | Obtain files' data |
DELETE | Delete a file or directory or overwrite an existing file |
MANAGE | Rename and move files or directories |
READ_METADATA | Obtain file metadata |
UPDATE_METADATA | Modify file metadata, stage a file or change its QoS |
The READ_METADATA activity is implied if any other activity is specified. For example, the caveat activity:LIST,DOWNLOAD restricts the macaroon to read-only operations. Multiple activity caveats are supported. A request must satisfy all activity caveats to be accepted.
The POST request may include a JSON object providing information about
the desired macaroon. The caveats
key may be supplied. The value
is a JSON list of JSON strings. Each string is a caveat to be
included in the macaroon.
For example, the following JSON requests the generated macaroon contain the caveat activity:LIST,DOWNLOAD, which limits the macaroon to read-only operations.
{
"caveats": [
"activity:LIST,DOWNLOAD"
]
}
The corresponding curl command is:
curl -E /tmp/x509up_u1000 -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:LIST,DOWNLOAD"]}' https://dcache.example.org/
|{
| "macaroon": "MDAxY2[...]XmCg",
| "uri": {
| "targetWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]XmCg",
| "baseWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]XmCg",
| "target": "https://dcache.example.org/",
| "base": "https://dcache.example.org/"
| }
|}
Often, it is useful to generate a macaroon that expires some fixed period in the future; for example, to generate a macaroon that expires in five minutes.
One way to achieve this is to calculate the instant five minutes in the future, convert this into ISO 8601 format and include the caveat in the macaroon request as one of the desired caveats. For example, if the current time is 12:00:00 CEST on Tuesday 24th September 2019 then the caveat would be before:2019-09-24T10:05:00Z.
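The calculation can be sketched in a few lines of Python using only the standard library. The function name before_caveat is hypothetical; the fixed clock value simply reproduces the example above (CEST is two hours ahead of UTC).

```python
from datetime import datetime, timedelta, timezone


def before_caveat(now, validity):
    """Return a 'before' caveat expiring 'validity' after the instant 'now'."""
    expiry = (now + validity).astimezone(timezone.utc)
    return "before:" + expiry.strftime("%Y-%m-%dT%H:%M:%SZ")


cest = timezone(timedelta(hours=2))
now = datetime(2019, 9, 24, 12, 0, 0, tzinfo=cest)
print(before_caveat(now, timedelta(minutes=5)))
# before:2019-09-24T10:05:00Z
```

In real use, datetime.now(timezone.utc) would replace the fixed value of now.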
To request a read-only macaroon that is valid for five minutes, the request JSON object would look like:
{
"caveats": [
"activity:LIST,DOWNLOAD",
"before:2019-09-24T10:05:00Z"
]
}
The corresponding curl command is:
curl -E /tmp/x509up_u1000 -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:LIST,DOWNLOAD", "before:2019-09-24T10:05:00Z"]}' https://dcache.example.org/
|{
| "macaroon": "MDAxY2[...]yRgK",
| "uri": {
| "targetWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]yRgK",
| "baseWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]yRgK",
| "target": "https://dcache.example.org/",
| "base": "https://dcache.example.org/"
| }
|}
As a short-cut, you can request a macaroon with a specific validity,
such as five minutes. dCache will calculate the corresponding period
and add the corresponding before
caveat. This is added as the
validity
key in the request JSON object. The value is an ISO 8601
value describing the validity period; for example, five minutes is
expressed as PT5M
in ISO 8601.
The following object requests a read-only macaroon that is only valid for the next five minutes:
{
"caveats": [
"activity:LIST,DOWNLOAD"
],
"validity": "PT5M"
}
Here is the corresponding curl command:
curl -E /tmp/x509up_u1000 -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:LIST,DOWNLOAD"], "validity": "PT5M"}' https://dcache.example.org/
|{
| "macaroon": "MDAxY2[...]5bCg",
| "uri": {
| "targetWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]5bCg",
| "baseWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]5bCg",
| "target": "https://dcache.example.org/",
| "base": "https://dcache.example.org/"
| }
|}
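The same request can be prepared from Python. The following sketch only builds the request object using the standard urllib.request module; it does not send it. The function name macaroon_request is hypothetical, and authentication is deliberately left out, since how it is supplied depends on the scheme in use.

```python
import json
import urllib.request


def macaroon_request(url, caveats=(), validity=None):
    """Build (but do not send) a POST request asking dCache for a macaroon."""
    body = {}
    if caveats:
        body["caveats"] = list(caveats)
    if validity is not None:
        body["validity"] = validity  # ISO 8601 duration, e.g. PT5M
    data = json.dumps(body).encode("utf-8") if body else None
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/macaroon-request"},
        method="POST",
    )


req = macaroon_request(
    "https://dcache.example.org/", ["activity:LIST,DOWNLOAD"], "PT5M"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The request would then be sent with urllib.request.urlopen, supplying whatever credentials the door requires.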
The final short-cut is for providing a path caveat. Suppose you wish to allow a user to download a specific file, or files within a specific directory. This may be achieved by including the path caveat in the macaroon request object.
For example, to allow users to download the specific file /users/paul/data-2019/top-secret.dat, the following JSON object may be supplied to the macaroon request:
{
"caveats": [
"activity:LIST,DOWNLOAD",
"path:/users/paul/data-2019/top-secret.dat"
]
}
The corresponding curl request would look like:
curl -E /tmp/x509up_u1000 -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:LIST,DOWNLOAD", "path:/users/paul/data-2019/top-secret.dat"]}' https://dcache.example.org/
|{
| "macaroon": "MDAxY2[...]tmIK",
| "uri": {
| "targetWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]tmIK",
| "baseWithMacaroon": "https://dcache.example.org/?authz=MDAxY2[...]tmIK",
| "target": "https://dcache.example.org/",
| "base": "https://dcache.example.org/"
| }
|}
As a convenient short-cut, the desired path may be included as the path of the macaroon request URL.
For example, the same request may be achieved by sending the POST request to https://dcache.example.org/users/paul/data-2019/top-secret.dat instead of https://dcache.example.org/.
curl -s -E /tmp/x509up_u1000 -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:LIST,DOWNLOAD"]}' https://dcache.example.org/users/paul/data-2019/top-secret.dat
|{
| "macaroon": "MDAzY2[...]JeQo",
| "uri": {
| "targetWithMacaroon": "https://dcache.example.org/users/paul/data-2019/top-secret.dat?authz=MDAzY2[...]JeQo",
| "baseWithMacaroon": "https://dcache.example.org/?authz=MDAzY2[...]JeQo",
| "target": "https://dcache.example.org/users/paul/data-2019/top-secret.dat",
| "base": "https://dcache.example.org/"
| }
|}
Note that the target
and targetWithMacaroon
URLs in the response
JSON object have changed. In particular, the targetWithMacaroon
URL
may be used directly to download the desired file.
Third-party transfers are requests to dCache, asking it to transfer a file to or from another HTTP server. This differs from normal HTTP interactions as a third-party transfer involves data transferred directly between dCache and some other HTTP server (which might or might not be running dCache).
Third-party transfers are useful because the data travels over the network between the source and destination storage systems rather than through the client. This network is often well provisioned, with significant available bandwidth. As an example, a laptop connected via a coffee shop's free wifi can easily orchestrate the transfer of petabytes of data using third-party transfers.
Third-party transfer requests are distinguished into two groups: pull requests where dCache is downloading data from the remote server, and push requests where dCache is uploading data to the remote server.
To initiate a third-party transfer, the client issues a COPY
request
with the remote URL as either the Source
request header (for a pull
request) or the Destination
request header (for a push request). In
either case, the header value is a URL, which describes how data is
transferred, the name of the remote server, the port number (if
non-standard) and the path of the file.
The COPY request may optionally include other headers that affect the transfer. These other headers are described in subsequent sections.
All authentication schemes that dCache supports for direct WebDAV operations are also supported for third-party transfers.
IMPORTANT
This section discusses how dCache handles third-party COPY requests: requests to initiate a third-party copy. There is a separate issue where dCache must somehow authorise the data-bearing transfer. This is discussed in a subsequent section.
Third-party transfers are only allowed for authenticated users: anonymous third-party COPY requests are always rejected.
In general, third-party transfer authorisation may be understood by considering what would happen if the client were to relay the data. In principle, a client can achieve the same result as a third-party transfer by downloading the data from the source and uploading that data to the destination.
A third-party transfer request that pulls data from the remote server is authorised in a similar manner to the user uploading the data directly. To initiate a third-party pull request, the user must be authorised to write into the target directory. If the target file already exists then the user must be authorised to overwrite that existing data.
A third-party transfer request that pushes data to the remote server is authorised in a similar manner to the client downloading the data: it requires that the client is able to read the source file.
If the user is not authorised to make a specific third-party transfer request then dCache returns immediately with an error status code.
In the following example, the client is attempting to initiate a pull request without authenticating:
curl -D- -X COPY -H 'Source: http://www.dcache.org/images/dcache-banner.png' https://dcache.example.org/test.png
|HTTP/1.1 401 Permission denied
|Date: Wed, 25 Sep 2019 08:30:04 GMT
|Server: dCache/6.0.0-SNAPSHOT
|WWW-Authenticate: Basic realm=""
|Content-Length: 0
|
If the third-party transfer request is authorised and it passes some basic checks (e.g., exactly one of the Source and Destination request headers is present) then dCache will respond immediately with a 202 status code. This indicates that work has started to process the request, but dCache will continue working on this request in the background.
A transfer may take some time to complete. While working on the request, dCache will return periodic reports (called performance markers) describing the current status, as part of the HTTP response. The HTTP response is sent using chunked encoding. This allows the client to receive these reports in a timely fashion, to get feedback as the transfer is processed.
Each performance marker has a strict format. Each report contains multiple lines: the first line is Perf Marker, followed by multiple metadata lines, ending with the line containing only End. Each metadata line represents a key-value pair, printed as the key, followed by a colon-space (: ), followed by the current value.
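Because the format is strict, a client can parse the stream with very little code. The following Python sketch turns an iterable of response lines into one dictionary per performance marker. The function name parse_markers is hypothetical; values are kept as strings, and lines outside a marker are ignored.

```python
def parse_markers(lines):
    """Yield one dict per 'Perf Marker' ... 'End' block found in lines."""
    current = None
    for raw in lines:
        line = raw.strip()
        if line == "Perf Marker":
            current = {}
        elif line == "End":
            if current is not None:
                yield current
            current = None
        elif current is not None and ": " in line:
            # Split on the first colon-space only; values may contain colons.
            key, value = line.split(": ", 1)
            current[key] = value


SAMPLE = """Perf Marker
Timestamp: 1569399772
State: 3
State description: querying created file metadata
Stripe Index: 0
Total Stripe Count: 1
End
"""

markers = list(parse_markers(SAMPLE.splitlines()))
print(markers[0]["State description"])
# querying created file metadata
```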
There are two phases to any transfer: the pre-transfer phase and the transfer phase.
During the pre-transfer phase, dCache is readying itself for the transfer.
For pull-requests, this involves creating the namespace entry and deciding which pool will accept the new data. There are also potential internal failures, which trigger internal retries.
For push-requests, the pool that will deliver the file's contents is selected. Depending on dCache configuration, it is possible that the selected pool does not currently contain the file's data. In this case, dCache will initiate an internal copy of the file's data, which may take some time.
During the pre-transfer phase, dCache returns progress markers that look like this:
Perf Marker
Timestamp: 1569399772
State: 3
State description: querying created file metadata
Stripe Index: 0
Total Stripe Count: 1
End
These metadata items have the following meaning:
Key | Value | Meaning |
---|---|---|
Timestamp | UNIX time | When the transfer was accepted |
State | Integer | A machine-readable description of the current status |
State description | String | A human-readable description of the current status |
Stripe Index | 0 | Included for compatibility with other software |
Total Stripe Count | 1 | Included for compatibility with other software |
Once all the preparation steps of the pre-transfer phase have completed, the pool will attempt to make an HTTP transfer. During this transfer phase, the transfer is in state 10 (described as transfer has started).
While the transfer is underway, some additional metadata is included in the performance markers. Here is a typical performance marker:
Perf Marker
Timestamp: 1569403183
State: 10
State description: transfer has started
Stripe Index: 0
Stripe Start Time: 1569403178
Stripe Last Transferred: 1569403183
Stripe Transfer Time: 4
Stripe Bytes Transferred: 503928392
Stripe Status: RUNNING
Total Stripe Count: 1
End
The fields Timestamp, State, State description, Stripe Index and Total Stripe Count are still present and have the same meaning as for the pre-transfer phase progress markers.
The additional metadata items have the following meaning:
Key | Value | Meaning |
---|---|---|
Stripe Start Time | UNIX time | When the transfer was started |
Stripe Last Transferred | UNIX time | When data was last sent or received |
Stripe Transfer Time | Seconds | How long the transfer has been running |
Stripe Bytes Transferred | Bytes | How many bytes have been transferred |
Stripe Status | enumeration | Current status of the transfer |
A large discrepancy between Timestamp
and Stripe Last Transferred
indicates that the remote server has stopped accepting data (for push
requests) or stopped sending data (for pull requests).
The client can establish the average transfer bandwidth between two
performance markers by comparing the two Stripe Bytes Transferred
values. Advanced clients may use this to detect stalled transfers.
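Both checks can be sketched in Python, assuming each marker has already been parsed into a dictionary of strings keyed by the metadata names. The function names are hypothetical, as is the choice of using the difference between the two Timestamp values as the elapsed time. The second marker in the example uses made-up values.

```python
def average_bandwidth(earlier, later):
    """Average bytes per second between two transfer-phase markers, or None."""
    elapsed = int(later["Timestamp"]) - int(earlier["Timestamp"])
    if elapsed <= 0:
        return None
    moved = (int(later["Stripe Bytes Transferred"])
             - int(earlier["Stripe Bytes Transferred"]))
    return moved / elapsed


def seems_stalled(marker, threshold_seconds):
    """True if no data has moved for longer than threshold_seconds."""
    idle = int(marker["Timestamp"]) - int(marker["Stripe Last Transferred"])
    return idle > threshold_seconds


first = {"Timestamp": "1569403183", "Stripe Last Transferred": "1569403183",
         "Stripe Bytes Transferred": "503928392"}
# Hypothetical follow-up marker, five seconds later.
second = {"Timestamp": "1569403188", "Stripe Last Transferred": "1569403188",
          "Stripe Bytes Transferred": "1003928392"}

print(average_bandwidth(first, second))  # 100000000.0
print(seems_stalled(second, 60))         # False
```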
The Stripe Status value is one of NEW, QUEUED, RUNNING, DONE or CANCELED. As the transfer only remains in state NEW, DONE and CANCELED for a very short period, you should only see transfers in state QUEUED or in state RUNNING. QUEUED indicates the transfer is queued on the pool, while RUNNING indicates the transfer is now being processed.
The final line of the response describes whether or not the transfer was successful. If the transfer was successful then the final line is success: Created. If the transfer was unsuccessful then the final line starts with failure: followed by a description of why the transfer failed.
For example, the final line failure: rejected GET: 404 Not Found
indicates a pull request attempted to copy a file that does not exist.
The final line failure: rejected GET: 401 Unauthorized
indicates
dCache was not authorised to read the remote file.
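A client might classify the final line along these lines. This is a minimal Python sketch; the function name transfer_outcome is hypothetical.

```python
def transfer_outcome(final_line):
    """Return (succeeded, detail) for the last line of a COPY response body."""
    line = final_line.strip()
    for prefix, succeeded in (("success:", True), ("failure:", False)):
        if line.startswith(prefix):
            return succeeded, line[len(prefix):].strip()
    raise ValueError("not a final status line: %r" % final_line)


print(transfer_outcome("success: Created"))
# (True, 'Created')
print(transfer_outcome("failure: rejected GET: 404 Not Found"))
# (False, 'rejected GET: 404 Not Found')
```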
The following provides a complete example of the response when making a successful pull request. It includes both the HTTP response headers and the HTTP response body. The body contains two progress markers (one in the pre-transfer phase, the other in the transfer phase) and the final line indicating the transfer was successful.
HTTP/1.1 202 Accepted
Date: Wed, 25 Sep 2019 09:43:22 GMT
Server: dCache/5.2.0-SNAPSHOT
Content-Type: text/perf-marker-stream
Transfer-Encoding: chunked
Perf Marker
Timestamp: 1569404602
State: 3
State description: querying created file metadata
Stripe Index: 0
Total Stripe Count: 1
End
Perf Marker
Timestamp: 1569404607
State: 10
State description: transfer has started
Stripe Index: 0
Stripe Start Time: 1569404602
Stripe Last Transferred: 1569404607
Stripe Transfer Time: 4
Stripe Bytes Transferred: 531501280
Stripe Status: RUNNING
Total Stripe Count: 1
End
success: Created
dCache takes data integrity very seriously. This includes
transferring data with remote servers. Under normal circumstances,
dCache will only consider a third-party transfer successful if the
data-bearing HTTP request (either a GET
for pull, or PUT
for push)
indicates a success and the integrity of the new copy is verified.
There are two complementary ways to verify the new copy has not become corrupted: checking the file size matches and the file checksum matches.
For pull requests, the file's size is normally included in the
response headers of the GET
request; however, for push requests the
response headers to the PUT
request usually do not contain the
file's size. Therefore, a subsequent HEAD
request is often needed
after a successful PUT
request.
The checksum of the remote file is obtained using RFC 3230
Want-Digest
headers. For pull requests, this header is included in
the GET
request; however, for the PUT
request the Want-Digest
header is included in the subsequent HEAD
request.
Although dCache supports RFC 3230, the standard is not widely
supported by other HTTP servers. Therefore, when transferring data
with a non-dCache remote server, it is likely that the server does not
support RFC 3230. By default, dCache will fail the transfer if the
remote server does not support RFC 3230; however, if it is desirable
to accept transfers without checksum verification then the
RequireChecksumVerification
COPY request header may be set to
false
. When setting this flag to false
, dCache will still attempt
to verify the remote file's checksum and the transfer will fail if
that remote checksum indicates data corruption.
HTTP requests may contain different request headers. When dCache is processing a third-party transfer, it makes one or more requests with the remote server. When making these requests, dCache will include a standard set of transfer headers. However, you can modify these transfer headers.
The general rule is that any header in the third-party (COPY) request that starts with the TransferHeader prefix is sent, with this prefix removed, when dCache makes requests to the remote server. For example, if the COPY request contains the header TransferHeaderClientContext: foo then dCache will include the header ClientContext: foo when making requests to the remote server.
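The rule can be illustrated with a small Python sketch. This models only the prefix-stripping described above, not dCache's actual implementation or its standard set of transfer headers; the function name is hypothetical, and the case-sensitive prefix match is a simplification.

```python
PREFIX = "TransferHeader"


def remote_request_headers(copy_headers):
    """Headers the remote server would see, derived from COPY request headers."""
    return {
        name[len(PREFIX):]: value
        for name, value in copy_headers.items()
        # Require at least one character after the prefix.
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }


copy_headers = {
    "Source": "http://www.dcache.org/images/dcache-banner.png",
    "TransferHeaderClientContext": "foo",
}
print(remote_request_headers(copy_headers))
# {'ClientContext': 'foo'}
```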
Perhaps the main use for this feature is to include authorisation
information for the data-bearing request. For example, to include the
basic authentication (Authorization: Basic cGF1bDpUb29NYW55U2VjcmV0YQ==
) in the data-bearing request, the COPY
request should include the TransferHeaderAuthorization
header
(TransferHeaderAuthorization: Basic cGF1bDpUb29NYW55U2VjcmV0YQ==
).
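The prefix rule can be sketched as a small shell function. This is only an illustration of the mapping, not dCache's implementation: the function name transfer_header is hypothetical, and the sketch matches the prefix with exact case, which is an assumption.

```shell
# Map one COPY request header to the header dCache would send to the
# remote server. Prints nothing for headers without the prefix.
transfer_header() {
    case "$1" in
        TransferHeader?*) printf '%s\n' "${1#TransferHeader}" ;;
    esac
}

transfer_header 'TransferHeaderClientContext: foo'   # prints: ClientContext: foo
transfer_header 'Credential: none'                   # prints nothing
```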
Often a server will require some form of authentication before accepting an upload (PUT) request. Similarly, some data is not public and requires authentication before a download (GET) request is accepted.
As a direct consequence, when processing a third-party transfer, dCache may need to authenticate or provide some kind of authorisation in order to obtain the data (for pull requests) or upload data (for push requests).
If needed, this authorisation must come from the client. Delegation is the general term for handing over credentials that allow dCache to operate on behalf of the user.
There are several ways a client can delegate a credential to dCache.
The Credential
third-party COPY request header controls where dCache
will fetch the credential. A value of none
indicates dCache should
not initiate any delegation, gridsite
indicates dCache should use a
delegated X.509 credential, and oidc
indicates dCache should use
OIDC delegation.
The default value for Credential
depends on how the client
authenticated for the third-party COPY request. If OIDC was used then
oidc
is the default; if X.509 was used then the default is
gridsite
, otherwise the default is none
.
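The default selection just described amounts to a three-way choice. The sketch below restates it as code; the function name default_credential and the labels oidc and x509 for the authentication method are hypothetical, chosen only for this illustration.

```shell
# Print the Credential value assumed when the COPY request carries no
# Credential header. $1 is a hypothetical label for how the client
# authenticated: "oidc", "x509", or anything else.
default_credential() {
    case "$1" in
        oidc) echo oidc ;;
        x509) echo gridsite ;;
        *)    echo none ;;
    esac
}

default_credential x509   # prints: gridsite
```

A client that wants a different behaviour sets the Credential header explicitly rather than relying on this default.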
If the Credential header of the third-party COPY request has the value none then
either the request does not require any credential (e.g., downloading
public data) or the credential is supplied through direct header
delegation.
Direct header delegation is where the client supplies the credential
directly to dCache, as a transfer header. As described above, this is
achieved by specifying the corresponding TransferHeader
header in
the COPY request; for example, most requests are authorised by
specifying the Authorization
request header value. To supply a
suitable Authorization
value, specify the
TransferHeaderAuthorization
header in the COPY request.
Basic (username + password) authentication may be used to authorise
the transfer. For example, to include the basic authentication
(Authorization: Basic cGF1bDpUb29NYW55U2VjcmV0YQ==
) in the
data-bearing request, the COPY request should include the
TransferHeaderAuthorization
header (TransferHeaderAuthorization: Basic cGF1bDpUb29NYW55U2VjcmV0YQ==
).
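For illustration, the header value can be built from a username and password as follows. This is a minimal sketch: the function name basic_transfer_header and the example credentials are hypothetical, and it assumes a base64 command is available on the client.

```shell
# Build the COPY request header that passes HTTP Basic credentials
# through to the remote server. $1 = username, $2 = password.
basic_transfer_header() {
    # Basic authentication is base64("username:password"); tr removes any
    # line wrapping that some base64 implementations add.
    token=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
    printf 'TransferHeaderAuthorization: Basic %s\n' "$token"
}

# Hypothetical credentials, for illustration only.
basic_transfer_header alice 'not-a-real-password'
```

The printed line can be passed to curl with the -H option when issuing the COPY request.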
Basic authentication is NOT recommended as it requires the user to send their username and password to dCache.
An alternative to basic authentication is to use some kind of a bearer
token to authorise the transfer; for example, if the data-bearing
transfer may be authorised with the Authorization: Bearer TOKEN
header, then the third-party COPY request should include the
TransferHeaderAuthorization: Bearer TOKEN
request header.
As a specific example, a macaroon may be used to authorise the
transfer. The client requests a macaroon from the third-party server
that targets the specific file. Once obtained, this macaroon may be
passed into the COPY request via the TransferHeaderAuthorization: Bearer MACAROON
header (with MACAROON
replaced with the actual macaroon).
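A COPY request using direct header delegation with a bearer token might be wrapped as shown below. This is a sketch under stated assumptions: the function name copy_with_bearer and the CURL override variable are hypothetical, the request is a pull (it uses the Source header), and whatever options the client needs to authenticate itself to dCache are omitted.

```shell
# Issue a third-party COPY that pulls source_url into dest_url, passing
# the token to the remote server as a bearer token.
# Set CURL to another command (for example echo) for a dry run.
copy_with_bearer() {
    source_url=$1 dest_url=$2 token=$3
    "${CURL:-curl}" -D- -X COPY \
        -H 'Credential: none' \
        -H "Source: $source_url" \
        -H "TransferHeaderAuthorization: Bearer $token" \
        "$dest_url"
}
```

Credential: none is set explicitly so that the default chosen from the client's own authentication method does not trigger another form of delegation.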
dCache supports the GridSite delegation protocol. This allows clients to delegate their X.509 credential to dCache so dCache can operate on their behalf.
If GridSite delegation is selected, dCache will check if the user has already delegated a credential that will still be valid in two minutes. If so, it will use that credential and the transfer will proceed directly.
If dCache has no credential for this user (or the credential has
already expired or will expire soon) then dCache asks the client to
delegate one. It does this by responding to the COPY request with a
redirection status code (with a target URL in the Location response
header) and an X-Delegate-To response header. The header value is a
space-separated list of GridSite delegation URLs.
The client is expected to delegate its X.509 credential to one of the
listed delegation URLs and then re-issue the COPY request against the
URL in the Location response header.
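A client has to pick these two values out of the redirection response. The following sketch shows one way to do that with standard tools; the function names header_value and delegation_urls are hypothetical, and the delegation step itself (which needs a GridSite-capable client) is not shown.

```shell
# Print the value of response header $1 (matched case-insensitively)
# from the HTTP response headers supplied on stdin.
header_value() {
    tr -d '\r' | awk -v name="$1" '
        BEGIN { prefix = tolower(name) ":" }
        tolower(substr($0, 1, length(prefix))) == prefix {
            value = substr($0, length(prefix) + 1)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }'
}

# Print each GridSite delegation URL from X-Delegate-To on its own line.
delegation_urls() {
    header_value X-Delegate-To | tr -s ' ' '\n'
}
```

With the response headers saved to a file, delegation_urls lists the candidate delegation endpoints and header_value Location gives the URL for the repeated request.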
An OpenID-Connect access token may include a claim that restricts it to a specific server, or the access token could expire while the transfer is queued within dCache.
To counter these two problems, the oidc delegation process works by requesting a fresh access (and refresh) token from the OP that issued the OpenID-Connect access token.
This delegation process is not uniformly supported.
In the following example, the client instructs dCache to create a new
file /Users/paul/test-1
, taking the file's data from
http://www.dcache.org/images/dcache-banner.png
. Note that the COPY
request is authorised using an X.509 credential (-E /tmp/x509up_u1000
).
The source file (http://www.dcache.org/images/dcache-banner.png
) is
public: dCache does not require any special permission to obtain this
file's data. Therefore this third-party request requires no
delegation.
As the third-party COPY request is authenticated with X.509, the
Credential: none
request header is needed to avoid triggering
gridsite delegation.
Finally, as the server supplying the file's data is a standard Apache
web server, it does not support RFC 3230. Therefore, the client must
tell dCache not to fail the transfer if it cannot obtain a checksum to
verify the file's integrity. The RequireChecksumVerification: false
request header is used to convey this.
curl -D- -E /tmp/x509up_u1000 -X COPY -H 'Credential: none' -H 'RequireChecksumVerification: false' -H 'Source: http://www.dcache.org/images/dcache-banner.png' https://dcache.example.org/Users/paul/test-1
|HTTP/1.1 202 Accepted
|Date: Wed, 25 Sep 2019 08:22:26 GMT
|Server: dCache/6.0.0-SNAPSHOT
|Content-Type: text/perf-marker-stream
|Transfer-Encoding: chunked
|
|Perf Marker
| Timestamp: 1569399772
| State: 3
| State description: querying created file metadata
| Stripe Index: 0
| Total Stripe Count: 1
|End
|success: Created
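In the output above, the 202 Accepted status line arrives before the transfer has finished, and the outcome only appears as the final line of the body. A script can therefore check that last line instead of the status code. The sketch below does this; the function name transfer_succeeded is hypothetical, and it treats any final line that does not start with success: as a failure, which is an assumption.

```shell
# Read the body of a COPY response on stdin and report the outcome.
# Returns 0 if the last non-empty line starts with "success:"; otherwise
# prints that line to stderr and returns 1.
transfer_succeeded() {
    last=$(tr -d '\r' | sed '/^[[:space:]]*$/d' | tail -n 1)
    case "$last" in
        success:*) return 0 ;;
        *) printf '%s\n' "$last" >&2; return 1 ;;
    esac
}

# Possible use (left commented out as it needs a live server):
#   curl -s -E /tmp/x509up_u1000 -X COPY -H 'Credential: none' \
#       -H 'RequireChecksumVerification: false' \
#       -H 'Source: http://www.dcache.org/images/dcache-banner.png' \
#       https://dcache.example.org/Users/paul/test-1 | transfer_succeeded
```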