WEBDAV-related special remote error #739

Open
adswa opened this issue Jul 5, 2024 · 8 comments

Comments

@adswa
Member

adswa commented Jul 5, 2024

In a recent office hour, the following way of interacting with a webdav sibling (using a public link generated in the web interface) yielded special remote errors:

### create test ds and push to sciebo
datalad create -c text2git test_publink_webdav
cd test_publink_webdav
datalad download-url http://www.neuromorphometrics.com/1103_3.tgz
# USE OWN SCIEBO
datalad create-sibling-webdav -s sciebo --credential sciebo --mode filetree "https://fz-juelich.sciebo.de/remote.php/dav/files/<USERNAME>%40fz-juelich.de/dataladstore/test_publink_webdav"
datalad push --to sciebo

# CREATE PUBLIC LINK WITH PASSWORD IN BROWSER
# USE LAST PART OF public link as USER:
export WEBDAV_USERNAME='<LAST-PART-OF-PUBLIC-LINK>'
export WEBDAV_PASSWORD='<YOUR-CHOSEN-PASSWORD-HERE>'

datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" test_publink_webdav


# ENABLE ANNEX SIBLING FAILS
datalad siblings -d "/home/fhoffstaedter/DATA_TMP/TMP/test_publink_webdav2" enable -s sciebo-storage

The observed error looked like this:

❯ datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" PPMI_publink_cat12.8.1 --shared
[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore                  
[INFO   ] Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>             
| BrokenPipeError: [Errno 32] Broken pipe 
[INFO   ] Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'> 
[INFO   ] BrokenPipeError: [Errno 32] Broken pipe 
[INFO   ] access to 1 dataset sibling sciebo-storage not auto-enabled, enable with:
|               datalad siblings -d "/data/project/deleted_every_sunday/PPMI_publink_cat12.8.1" enable -s sciebo-storage 
install(ok): /data/project/deleted_every_sunday/PPMI_publink_cat12.8.1 (dataset)

❯ cd PPMI_publink_cat12.8.1
❯ datalad siblings -d "/data/project/deleted_every_sunday/PPMI_publink_cat12.8.1" enable -s sciebo-storage

CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false annex enableremote sciebo-storage -c annex.dotfiles=true' failed with exitcode 1 under /data/project/deleted_every_sunday/PPMI_publink_cat12.8.1
enableremote sciebo-storage (testing WebDAV server...) 
failed
git-annex: WebDAV test failed: HttpExceptionRequest Request {
  host                 = "fz-juelich.sciebo.de"
  port                 = 443
  secure               = True
  requestHeaders       = [("Authorization","<REDACTED>"),("User-Agent","hDav-using application")]
  path                 = "/remote.php/dav/files/<USERNAME>%40fz-juelich.de/dataladstore/PPMI_inm7_cat12.8.1/git-annex-webdav-tmp-test"
  queryString          = ""
  method               = "PUT"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutNone
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
 (StatusCodeException (Response {responseStatus = Status {statusCode = 401, statusMessage = "Unauthorized"}, responseVersion = HTTP/1.1, responseHeaders = [("Server","nginx/1.19.1"),("Date","Tue, 02 Jul 2024 22:06:46 GMT"),("Content-Type","application/xml; charset=utf-8"),("Content-Length","415"),("Connection","keep-alive"),("Set-Cookie","route=1719958007.834.93.674421; Expires=Tue, 02-Jul-24 23:06:46 GMT; Max-Age=3600; Path=/; Secure; HttpOnly"),("X-Content-Type-Options","nosniff"),("X-XSS-Protection","0"),("X-Robots-Tag","none"),("X-Frame-Options","SAMEORIGIN"),("X-Download-Options","noopen"),("X-Permitted-Cross-Domain-Policies","none"),("Set-Cookie","oc08853b0384=1qr9fn3ci5hce4jds744hhh7ed; path=/; secure; HttpOnly; SameSite=Strict"),("Expires","Thu, 19 Nov 1981 08:52:00 GMT"),("Cache-Control","no-store, no-cache, must-revalidate"),("Pragma","no-cache"),("Set-Cookie","oc_sessionPassphrase=%2BSL5svcq28zpljAZ5EnVyCtFm0l%2BwW6A7%2Bk4oJ%2Fa%2BiWxw9G%2BlG436qimhA3A9REurmOrnCHbppbxgzQ%2FNxH90VPgeThbqsVQhlJCBKBv%2Bjgm2XCZMYdBydew9St99o26; expires=Tue, 02-Jul-2024 22:26:46 GMT; Max-Age=1200; path=/; secure; HttpOnly; SameSite=Strict"),("Content-Security-Policy","default-src 'none';"),("Set-Cookie","oc08853b0384=3v95mkavg0jdi9qe1qofnkv3pp; path=/; secure; HttpOnly; SameSite=Strict"),("WWW-Authenticate","Bearer realm=\"sciebo\""),("WWW-Authenticate","Basic realm=\"sciebo\", charset=\"UTF-8\""),("Strict-Transport-Security","max-age=15724800; includeSubDomains")], responseBody = (), responseCookieJar = CJ {expose = [Cookie {cookie_name = "oc08853b0384", cookie_value = "3v95mkavg0jdi9qe1qofnkv3pp", cookie_expiry_time = 3023-11-03 00:00:00 UTC, cookie_domain = "fz-juelich.sciebo.de", cookie_path = "/", cookie_creation_time = 2024-07-02 22:06:46.997523916 UTC, cookie_last_access_time = 2024-07-02 22:06:46.997523916 UTC, cookie_persistent = False, cookie_host_only = True, cookie_secure_only = True, cookie_http_only = True},Cookie {cookie_name = "oc_sessionPassphrase", 
cookie_value = "%2BSL5svcq28zpljAZ5EnVyCtFm0l%2BwW6A7%2Bk4oJ%2Fa%2BiWxw9G%2BlG436qimhA3A9REurmOrnCHbppbxgzQ%2FNxH90VPgeThbqsVQhlJCBKBv%2Bjgm2XCZMYdBydew9St99o26", cookie_expiry_time = 2024-07-02 22:26:46.997523916 UTC, cookie_domain = "fz-juelich.sciebo.de", cookie_path = "/", cookie_creation_time = 2024-07-02 22:06:46.997523916 UTC, cookie_last_access_time = 2024-07-02 22:06:46.997523916 UTC, cookie_persistent = True, cookie_host_only = True, cookie_secure_only = True, cookie_http_only = True},Cookie {cookie_name = "route", cookie_value = "1719958007.834.93.674421", cookie_expiry_time = 2024-07-02 23:06:46.997523916 UTC, cookie_domain = "fz-juelich.sciebo.de", cookie_path = "/", cookie_creation_time = 2024-07-02 22:06:46.997523916 UTC, cookie_last_access_time = 2024-07-02 22:06:46.997523916 UTC, cookie_persistent = True, cookie_host_only = True, cookie_secure_only = True, cookie_http_only = True}]}, responseClose' = ResponseClose, responseOriginalRequest = Request {
  host                 = "fz-juelich.sciebo.de"
  port                 = 443
  secure               = True
  requestHeaders       = [("Authorization","<REDACTED>"),("User-Agent","hDav-using application")]
  path                 = "/remote.php/dav/files/<USERNAME>%40fz-juelich.de/dataladstore/PPMI_inm7_cat12.8.1/git-annex-webdav-tmp-test"
  queryString          = ""
  method               = "PUT"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutNone
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}
}) "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<d:error xmlns:d=\"DAV:\" xmlns:s=\"http://sabredav.org/ns\">\n  <s:exception>Sabre\\DAV\\Exception\\NotAuthenticated</s:exception>\n  <s:message>No public access to this resource., Username or password was incorrect, No 'Authorization: Bearer' header found. Either the client didn't send one, or the server is mis-configured, Username or password was incorrect</s:message>\n</d:error>\n"): user error
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
enableremote: 1 failed

❯ datalad siblings
.: here(+) [git]
.: cat12.8.1_in-storage(+) [ora]
.: origin(-) [datalad-annex::https://fz-juelich.sciebo.de/public.php/webdav?type=webdav&encryption=none&exporttree=yes&url={noquery} (git)]
.: cat12.8.1_out-storage(+) [ora]


❯ git annex info sciebo-storage
uuid: 7e9b6965-8a34-4ad5-b9df-755046011f1d
description: sciebo-storage
trust: semitrusted
remote annex keys: 112995
remote annex size: 89.28 gigabytes

I have tried to reproduce this, but I observe the failure already during the clone call rather than the siblings enable call. As I have missed previous office hours where this came up already, I'm unsure about the background of this problem. I also didn't find any documentation on this approach using public links. If this method is supposed to work, I think we should also add documentation about it.

@mslw
Collaborator

mslw commented Jul 5, 2024

The documentation is probably in KBI0028.

To me this is about how Nextcloud exposes the shared folders through webdav:

  • The clone from .../public.php/webdav URL relies on the share token (part of the share link generated by nextcloud) and password being provided as user & password credentials (see the KBI)
  • The -storage remote still has the /remote.php/dav/files/<USERNAME> URL recorded (see the "unauthorized" response reported by git-annex), which has no chance to succeed with the above credentials
  • A probable solution would be to enableremote with the .../public.php/webdav URL.
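The credential mapping in the first bullet can be sketched as a small helper. This is only an illustration under assumptions: the function name `share_token` and the example link are hypothetical, and it assumes the share token is the final path segment of the link (the original report calls it the "last part of the public link"). `WEBDAV_USERNAME` and `WEBDAV_PASSWORD` are the variables used in the report.

```shell
#!/bin/sh
# Hypothetical helper: print the last path component of a share link.
# Assumption: that component is the share token expected as the user name.
share_token() {
    link=${1%%\?*}     # drop a query string, if any
    link=${link%/}     # drop one trailing slash, if any
    printf '%s\n' "${link##*/}"
}

# Usage (placeholders, not real values):
#   export WEBDAV_USERNAME="$(share_token 'https://cloud.example.org/s/<TOKEN>')"
#   export WEBDAV_PASSWORD='<SHARE-PASSWORD>'
```

With these two variables exported, the consumer would then address the `.../public.php/webdav` URL rather than the user-specific one.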

There are two caveats regarding passwords which do not directly apply here, but could apply in general:

  • Had the share been public (no share password), we would need a kind of anonymous login, which is not currently supported; a workaround is to provide a random string as the password (see "Support credential with no secret", #438)
  • We have the datalad realm-based credentials system, and we have the special remote credentials cached by git-annex in user-rw-only .git/annex/creds (see annex.cachecreds in git-annex manpage). The file gets written after successful enableremote. So usually things should just work, but there is some room for play here.

As a side note, I guess the user surprise is mostly due to the fact that the command suggested explicitly by DataLad ("enable with...") does not work. Also, there is no URL reconfiguration done by DataLad (like it does for RIA stores), all is left for the user. And neither happens because these are fairly unusual circumstances, in terms of setup. So apart from the fact that we can probably explain this situation (to be seen really), we can wonder whether this should be documented better or whether DataLad behavior needs to be changed.

@mslw
Collaborator

mslw commented Jul 5, 2024

> I observe the failure already during the clone call

It seems that the share link must allow write access: when using only "Download / view" permissions, I also observed a failure on clone. For reasons we might want to explore, there is a PUT call happening when testing the WebDAV server (note: although this step clones a git repo, the datalad-annex special remote uses git-annex for intermediate steps):

❱ datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" test_publink_webdav_clone
...
fatal: CommandError(CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false annex initremote origin type=webdav encryption=none exporttree=yes url=https://fz-juelich.sciebo.de/public.php/webdav -c annex.dotfiles=true' failed with exitcode 1 under /tmp/test_publink_webdav_clone/.git/dl-repoannex/origin/repoannex [out: 'initremote origin (testing WebDAV server...)
failed'] [err: 'git-annex: WebDAV test failed: HttpExceptionRequest Request {
  host                 = "fz-juelich.sciebo.de"
  port                 = 443
  secure               = True
  requestHeaders       = [("Authorization","<REDACTED>"),("User-Agent","hDav-using application")]
  path                 = "/public.php/webdav/git-annex-webdav-tmp-test"
  queryString          = ""
  method               = "PUT"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutNone
  requestVersion       = HTTP/1.1
  proxySecureMode      = ProxySecureWithConnect
}

When the public link allows writing (I used "Download / View / Upload / Edit" to be sure), I am able to complete the reproducer.

Setup as in the first code block in OP until datalad clone followed by:

❱ datalad clone "webdavs://fz-juelich.sciebo.de/public.php/webdav" test_publink_webdav_clone
[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
[INFO   ] access to 1 dataset sibling sciebo-storage not auto-enabled, enable with:
|               datalad siblings -d "/tmp/test_publink_webdav_clone" enable -s sciebo-storage
install(ok): /tmp/test_publink_webdav_clone (dataset)

❱ cd /tmp/test_publink_webdav_clone

❱ git annex initremote sciebo-storage-public --sameas sciebo-storage type=webdav exporttree=yes encryption=none url=https://fz-juelich.sciebo.de/public.php/webdav
initremote sciebo-storage-public (testing WebDAV server...) ok
(recording state in git...)

❱ datalad get -s sciebo-storage-public 1103_3.tgz
get(ok): 1103_3.tgz (file) [from sciebo-storage-public...]

Note: in the above, I am defining a new remote with a public url, sameas the original remote. This can be done by the consumer (possibly with --private option) but probably this could also be done by the producer, leaving the consumer only to enable it.
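The producer/consumer split mentioned in the note might look roughly like the sketch below. The `initremote --sameas` call mirrors the one in the session above; the consumer-side `git annex enableremote` step is an assumption that has not been verified here, and the function names (`run`, `add_public_sameas`, `enable_public`) are hypothetical. By default the helpers only print the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print a command instead of running it unless DRY_RUN=0 (default: dry run).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Producer side: register a second remote for the public endpoint that is
# declared to be the same store as the existing storage remote.
# Arguments: <new-remote-name> <existing-remote-name> <public-webdav-url>
add_public_sameas() {
    run git annex initremote "$1" --sameas "$2" \
        type=webdav exporttree=yes encryption=none "url=$3"
}

# Consumer side (assumption): enable the remote recorded by the producer.
# WEBDAV_USERNAME / WEBDAV_PASSWORD must hold the share token and password.
enable_public() {
    run git annex enableremote "$1"
}
```

For the consumer to see the new remote, the producer would also have to publish the updated git-annex branch, for example with another `datalad push --to sciebo`.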

@adswa
Member Author

adswa commented Jul 16, 2024

Thanks much for the analysis. I believe at least some confusion can be addressed with documentation changes. I'll update the KBI with some bits and pieces.

@adswa adswa self-assigned this Jul 16, 2024
@adswa
Member Author

adswa commented Jul 16, 2024

Ah shoot, I believe I still have a knot in my brain here. I wanted to test the different levels of write access that sciebo allows, but I again couldn't get past the cloning. I suspect there is something credential-related going wrong for me. Sorry, I thought I could add a write-up real quick, but it seems I have to dig around a bit longer. It also reminded me of how badly datalad-handbook/book#760 is needed.

@adswa
Member Author

adswa commented Jul 16, 2024

Ok, another attempt after lunch resolved my confusion. I messed up between the webdav credentials needed for the producer to push dataset updates, and the public link credentials used by the consumer.

I checked which public option works; it is only the one with upload and edit:

Image

I can confirm that either the consumer or the producer can run the initremote call.

@adswa
Member Author

adswa commented Jul 16, 2024

> For reasons we might want to explore, there is a PUT call happening when testing the WebDAV server

I believe this originates in git-annex webdav protocol, specifically this function: https://git.joeyh.name/index.cgi/git-annex.git/tree/Remote/WebDAV.hs#n307

The file in question does indeed (temporarily) get created:
Image

@mih
Member

mih commented Jul 16, 2024

I filed https://git-annex.branchable.com/todo/Read-only_support_for_webdav/

@mslw
Collaborator

mslw commented Jul 16, 2024

After today's office hour, here's my overview of the situation, trying to isolate friction points and see if there are action items on DataLad side.

The use case involves a WebDAV sibling pair, created using datalad create-sibling-webdav --mode filetree. The sibling pair is set up with a user-specific /remote.php/dav/files/USERNAME/ WebDAV URL. The Sciebo folder is later shared as a password-protected public link. Subsequently, it is cloned by another user via a /public.php/webdav URL (the URL patterns and required authentication are a result of Nextcloud implementation).

Problem 1: enabling the storage sibling after clone. DataLad shows the user who clones via the public URL the standard message "access to 1 dataset sibling sciebo-storage not auto-enabled, enable with: datalad siblings ...". Running the printed command fails, because the remote is configured with the /remote.php/dav/files/USERNAME/ URL, which is inaccessible to anyone other than the original user.

  • workaround: a special remote needs to be initialized / enabled for the /public.php/webdav URL instead; a component of the share link becomes the username and the share password is the password
  • potential TODO: document the workaround -- done in "Amend WEBDav KBI with caveats on sharing via public links" (psychoinformatics-de/knowledge-base#127)
  • potential TODO: change the admonition shown to the user
    • this may be hard to do because we would have to compare URLs and assume things which are Sciebo/Nextcloud specific
  • potential alternative TODO: reconfigure the sibling based on the clone URL, like RIA
    • this may be even harder: same concerns as above, plus the risk of not always doing the right thing (like RIA)
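To make the concern about URL comparison concrete, here is a hypothetical heuristic of the kind such a change would need. It is not existing DataLad behavior, and it hard-codes the Nextcloud/Sciebo URL patterns seen in this thread, which is exactly the assumption that makes this approach fragile. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the host part of a URL (no handling of user info or ports).
host_of() {
    rest=${1#*://}               # strip the scheme
    printf '%s\n' "${rest%%/*}"  # keep everything up to the first slash
}

# Return 0 when the clone came from a public-share endpoint while the
# storage remote still points at a user-specific endpoint on the same host.
# Arguments: <clone-url> <storage-remote-url>
public_link_mismatch() {
    clone_url=$1
    remote_url=$2
    [ "$(host_of "$clone_url")" = "$(host_of "$remote_url")" ] || return 1
    case $clone_url in
        */public.php/webdav*) ;;
        *) return 1 ;;
    esac
    case $remote_url in
        */remote.php/dav/files/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A positive result could at most justify a different hint to the user; it says nothing about whether the share actually covers the storage location.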

Problem 2: the need for write access. This wasn't part of the original description, but we quickly discovered that both cloning from webdav (which implicitly includes enabling a webdav remote as part of the datalad-annex protocol) and enabling the storage sibling explicitly require write access. This is because the git-annex WebDAV special remote tests write access when it is enabled.

  • workaround: document the need for write access -- done in "Amend WEBDav KBI with caveats on sharing via public links" (psychoinformatics-de/knowledge-base#127)
  • potential TODO: file a TODO for git-annex to support read-only WebDAV access -- done in https://git-annex.branchable.com/todo/Read-only_support_for_webdav/
    • note: this comment on the special remote page suggests that httpalso special remote can be used for read-only webdav access. However, it currently does not support credentials. If it supported credentials, it could be used in the workaround for problem 1 (enabling the storage remote) but it would not affect clone
    • note: the same comment suggested that it should be possible to add readonly support to the webdav special remote, just that there is no use case; however we might just have one. This should affect both clone and enableremote.
  • potential TODO: if the above is implemented in git-annex, adjust DataLad functionality or documentation, if needed
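Until read-only support exists, a share link can be checked for write access before attempting a clone. The sketch below is one way to do that with curl and is not what git-annex itself runs. The status-code mapping assumes generic HTTP/WebDAV semantics; only the 401 case is confirmed by the log in this thread. The function names and the file name `write-probe.txt` are hypothetical.

```shell
#!/bin/sh
# Map the HTTP status of a test PUT to a coarse verdict.
classify_put_status() {
    case $1 in
        200|201|204) echo writable ;;
        401)         echo unauthorized ;;
        403|405)     echo read-only-or-forbidden ;;
        *)           echo unknown ;;
    esac
}

# Send a small test file with PUT and print the resulting status code.
# Argument: <base-url>; credentials come from WEBDAV_USERNAME/WEBDAV_PASSWORD.
# The probe file is left on the server and should be deleted afterwards.
probe_put() {
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --user "$WEBDAV_USERNAME:$WEBDAV_PASSWORD" \
        --request PUT --data-binary 'probe' \
        "${1%/}/write-probe.txt"
}

# Usage (placeholder host):
#   classify_put_status "$(probe_put 'https://<HOST>/public.php/webdav')"
```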

@adswa adswa removed their assignment Jul 24, 2024