Adding `git_remote` fallback for `gitlab_remote` use without full API access (Resolves #604) #608

dgkf · 2021-05-01T00:10:24Z

As described in #604, the current gitlab_remote uses API endpoints that are not available to tokens generated for use within GitLab CI (stored in the $CI_JOB_TOKEN env var), so it throws errors when these tokens are used.

This PR adds code to first ping the API at a generic endpoint (querying for /version). If that request fails and isTRUE(getOption("remotes.gitlab_git_fallback", TRUE)), a git_remote is returned.
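
A minimal sketch of the decision described above, not the code in this PR. The function name `select_remote_type` and the injectable `probe` argument are hypothetical; `probe` stands in for the request to the /version endpoint so that the example runs without network access.

```r
# Hypothetical helper: decide which remote type to use.
# `probe` is an assumed stand-in for the API ping (e.g. a GET on /version);
# it should return TRUE on success and may throw an error on failure.
select_remote_type <- function(host, auth_token, probe,
                               git_fallback = getOption("remotes.gitlab_git_fallback", TRUE)) {
  # Treat both an error and a non-TRUE result as "API not accessible".
  api_ok <- tryCatch(
    isTRUE(probe(host, auth_token)),
    error = function(e) FALSE
  )

  # Fall back to a git remote only when the ping failed AND fallback is enabled.
  if (!api_ok && isTRUE(git_fallback)) "git_remote" else "gitlab_remote"
}

# Stand-in probes for illustration only.
failing_probe <- function(host, auth_token) stop("403 Forbidden")
passing_probe <- function(host, auth_token) TRUE

fallback_result <- select_remote_type("https://host.com", "token", failing_probe, git_fallback = TRUE)
no_fallback_result <- select_remote_type("https://host.com", "token", failing_probe, git_fallback = FALSE)
api_result <- select_remote_type("https://host.com", "token", passing_probe, git_fallback = TRUE)
```

With fallback disabled, the sketch always returns "gitlab_remote", which mirrors the "always try gitlab_remote when !git_fallback" commit.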

If git2r is available, a credentials object is created from the auth_token. Otherwise, the token is embedded in the url in the form of http://gitlab-ci-token:[email protected]/namespace/project.git.
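
As a rough illustration of the two authentication paths, here is a sketch under stated assumptions. `git_auth_sketch` and its `use_git2r` argument are hypothetical names, the host is a placeholder, and percent-encoding the token with `utils::URLencode()` is my own assumption rather than something this PR is known to do.

```r
# Hypothetical helper: build authentication for a git clone over http(s).
# Returns a list with the url to clone and, when git2r is used, a credentials object.
git_auth_sketch <- function(url, token, username = "gitlab-ci-token",
                            use_git2r = requireNamespace("git2r", quietly = TRUE)) {
  stopifnot(grepl("^https?://", url))

  if (use_git2r) {
    # git2r path: keep the url clean and pass credentials separately.
    return(list(url = url, credentials = git2r::cred_user_pass(username, token)))
  }

  # Fallback path: embed <username>:<token>@ right after the scheme.
  scheme <- regmatches(url, regexpr("^https?://", url))
  rest <- sub("^https?://", "", url)
  # Assumption: percent-encode the token so reserved characters cannot break the url.
  encoded_token <- utils::URLencode(token, reserved = TRUE)
  list(
    url = paste0(scheme, username, ":", encoded_token, "@", rest),
    credentials = NULL
  )
}

# Force the url-embedding path so the example does not depend on git2r.
embedded <- git_auth_sketch("https://host.com/namespace/project.git", "abc123",
                            use_git2r = FALSE)
```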

This allows install_gitlab to be used within CI jobs on non-public deployments of GitLab without the creation and embedding of personal tokens. Pipeline engineers need only to run export GITLAB_PAT=$CI_JOB_TOKEN prior to installing remotes.

Changelog

- install_gitlab will defer to install_git when authentication doesn't provide adequate API access to download a source archive
  - will create a git2r::cred_user_pass if git2r is available
    - Username is set to gitlab-ci-token. When providing a PAT, GitLab ignores the username; when using a CI_JOB_TOKEN within a CI job, the username must be gitlab-ci-token. Passing gitlab-ci-token therefore covers both scenarios. Unfortunately, I wasn't able to find any documentation to reference for this behavior; it was only narrowed down through testing.
  - otherwise, will embed authentication in the git url (http://<username>:<password>@host.com/repo.git)
- Because urls may contain access tokens, I wrote some handlers to strip these from messages and from the "Remote*" fields in DESCRIPTION (Design Feedback Request: is it preferred to keep the full url for updates, or to exclude the password so that it isn't leaked through things like renv?)
  - git() updated to take an optional display_args argument, which provides output using a censored git url so that access tokens are not displayed in console output. This is used in remote_download.xgit_remote to display git commands without printing passwords to console.
  - parse_git_url() updated to also extract a username and password, though it might be worth taking on a dependency to handle url parsing since this regex is getting pretty involved
  - git_anon_url() introduced to strip out username and password components from a url
  - git_censored_url() introduced to replace the password component with asterisks
- Added a bunch of tests for url parsing, anonymization and censoring
- Added some tests for falling back to a git_remote when the GitLab host API requests fail
- Added git_fallback = getOption("remotes.gitlab_git_fallback", TRUE) parameter to install_gitlab
- Updated NEWS to describe new behavior
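
To make the difference between anonymizing and censoring concrete, here is a small sketch. These are not the git_anon_url() and git_censored_url() implementations from this PR; the `_sketch` function names and the regular expressions are my own assumptions, and the token is made up.

```r
# Hypothetical stand-in for anonymizing: drop "<username>[:<password>]@" entirely.
anon_url_sketch <- function(url) {
  sub("^([a-zA-Z][a-zA-Z0-9+.-]*://)[^/:@]+(:[^/@]*)?@", "\\1", url)
}

# Hypothetical stand-in for censoring: keep the username, mask the password.
censored_url_sketch <- function(url) {
  sub("^([a-zA-Z][a-zA-Z0-9+.-]*://[^/:@]+):[^/@]*@", "\\1:****@", url)
}

with_token <- "http://gitlab-ci-token:abc123@host.com/repo.git"
without_credentials <- "https://host.com:8080/repo.git"

anon <- anon_url_sketch(with_token)          # suitable for DESCRIPTION "Remote*" fields
censored <- censored_url_sketch(with_token)  # suitable for console messages

# Urls without credentials (including ones with a port) pass through unchanged.
untouched <- c(anon_url_sketch(without_credentials),
               censored_url_sketch(without_credentials))
```

Anonymizing is the safer choice for anything written to disk, while censoring keeps enough context in console output to see which account was used.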

This is an initial pass just to experiment with implementation. Please let me know if this looks like a reasonable approach, and then I can polish this PR with

Documenting new behaviors
Add tests
Get tests working in CI

jimhester · 2021-06-02T16:45:09Z

My main worry with making the GitLab remote more complex is our team doesn't use GitLab, so it is possible this will break in the future without us realizing it.

We would definitely need some tests to avoid this.

dgkf · 2021-06-03T19:55:51Z

Thanks @jimhester - I'm happy to add in tests as much as possible. If you think the implementation looks sound, then I can get to work on tests and updating docs. I was just hesitant to invest more time fleshing out the peripheral bits until getting some impressions on the approach.

dgkf · 2021-07-30T20:10:32Z

@jimhester - this PR is ready whenever you have an opportunity to take a look. The only CI errors are ones that also exist on master. Overall, the design feels a bit clunky, but I'm struggling to come up with anything better.

To trace through the changes, it is easiest to start with install_gitlab's call out to gitlab_to_git_remote, and then look at the uses of $url in install_git.R as the url may contain a url-embedded username and token.

Just to highlight a critical design choice:

Design Feedback Request: is it preferred to keep the full url including username and password (https://dgkf:[email protected]/...) when storing remote_metadata or printing to console?

the good: this would allow remotes::update_packages to update a package which requires authentication
the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file.

For now, I chose to scrub the username and password from the url before this is added to a DESCRIPTION file to prioritize safety of access tokens over the update experience.

The self-hosted GitLab issues are currently a big pain point at my org, so some help in moving this forward would be greatly appreciated.

statnmap · 2021-09-23T08:58:56Z

Thank you for this PR. This is a must when working on private GitLab instances.
I approve its improvements.

I tried it in a CI instance with the following typical use cases. The PR solves the problems encountered with the current version of {remotes}. This can be accepted as is.

Use `CI_JOB_TOKEN` set up with {git2r}

Clone and `install_local()`

current {remotes} OK
PR OK

tempclone <- tempfile(pattern = "conjdown")
dir.create(tempclone)

git2r::clone(url = "https://git.lab.sspcloud.fr/propre-conj/conjdown",
             local_path = tempclone,
             credentials = git2r::cred_user_pass(username = "gitlab-ci-token",
                                                 password = Sys.getenv("CI_JOB_TOKEN"))
)

remotes::install_local(tempclone)

install_git()

current {remotes} FAIL
PR OK

options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_git("https://myprivategitlab.com/user/repos")

install_gitlab()

current {remotes} FAIL
PR OK with message

  remotes::install_gitlab(host = "https://myprivategitlab.com",
                          repo = "user/repos",
                          auth_token = Sys.getenv("CI_JOB_TOKEN"))

message:

auth_token does not have scopes 'read-repository' and 'api' for host
  'https://myprivategitlab.com" required to install using
  gitlab_remote.
Attempting git_remote

install from another package DESCRIPTION file with git2r creds and `git::`

current {remotes} FAIL
PR OK

DESCRIPTION file

Imports: 
    repos
Remotes:
    git::https://myprivategitlab.com/user/repos

options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_deps(dependencies = TRUE)

Set `GITLAB_PAT`

install_gitlab()

current {remotes} FAIL
PR OK with message

Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
  remotes::install_gitlab(host = "https://myprivategitlab.com",
                          repo = "user/repos")

message:

Using GitLab PAT from envvar GITLAB_PAT
auth_token does not have scopes 'read-repository' and 'api' for host
  'https://myprivategitlab.com" required to install using
  gitlab_remote.
Attempting git_remote

install from another package DESCRIPTION file with GITLAB_PAT and `gitlab::`

This one is just an experiment. I know it is not the aim of this PR, but it could be a future enhancement.

current {remotes} FAIL
PR FAIL.

DESCRIPTION file

Imports: 
    repos
Remotes:
    gitlab::https://myprivategitlab.com/user/repos

Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_deps(dependencies = TRUE)

Error

Error: Unknown remote type: gitlab
  Invalid git repo specification: 'https://myprivategitlab.com/user/repos'
Execution halted

statnmap · 2021-09-23T09:00:34Z

Do you think it would be a good idea to allow gitlab_pat() to also look for the CI_JOB_TOKEN environment variable if GITLAB_PAT is empty? This may solve a lot of pain when using CI.
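
A minimal sketch of that lookup order, assuming the first non-empty variable wins. `gitlab_pat_sketch` is a hypothetical name, and this is not how gitlab_pat() is implemented in {remotes}.

```r
# Hypothetical lookup: prefer GITLAB_PAT, then fall back to CI_JOB_TOKEN.
gitlab_pat_sketch <- function() {
  for (var in c("GITLAB_PAT", "CI_JOB_TOKEN")) {
    pat <- Sys.getenv(var, unset = "")
    if (nzchar(pat)) {
      return(pat)
    }
  }
  # Neither variable is set (or both are empty).
  NULL
}

# Inside a CI job where only CI_JOB_TOKEN is defined:
Sys.unsetenv("GITLAB_PAT")
Sys.setenv(CI_JOB_TOKEN = "job-token")
from_ci <- gitlab_pat_sketch()

# An explicit GITLAB_PAT still takes precedence:
Sys.setenv(GITLAB_PAT = "personal-token")
from_pat <- gitlab_pat_sketch()

# Clean up the variables set for this example.
Sys.unsetenv(c("GITLAB_PAT", "CI_JOB_TOKEN"))
```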

dgkf · 2021-09-27T20:57:05Z

Thanks for considering this PR, @jimhester.

Just wanted to highlight this bit in the PR thread for your consideration. I tried my best to dig into how remotes/renv use the Remotes* fields in the DESCRIPTION file, but I wasn't totally sure what the preferred solution would be for access tokens in urls, and I want to make sure it is brought to your attention in case there are any security concerns with how it's handled.

Design Feedback Request: is it preferred to keep the full url including username and password (https://dgkf:[email protected]/...) when storing remote_metadata or printing to console?

the good: this would allow remotes::update_packages to update a package which requires authentication

the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file

Currently user-specific url components are stripped to minimize any printing/saving of tokens.

jimhester · 2021-09-28T14:40:52Z

I guess we should strip them, though it would then break updating packages later.

However, if you still set GITLAB_PAT when you run update_packages(), would the update work?

dgkf added 2 commits April 30, 2021 16:50

adding check for api access and git fallback

8b18758

always try gitlab_remote when !git_fallback

5985ecd

dgkf and others added 20 commits June 15, 2021 16:51

adding error handling for http requests during remote_package_name.gi…

8bed1e4

…t2r_remote

update NEWS; bump dev version

8be0803

update NEWS

0160e2c

breaking out dev note

a914a74

Merge branch 'master' into 625-git-remote-http

17c49bc

updating inst/ content; minor whitespace fix

e79a8b5

Merge branch 'master' of github.com:r-lib/remotes into dev/gitlab-git…

3fdd93f

…-api-fallback

Merge branch '625-git-remote-http' of github.com:dgkf/remotes into de…

37d359b

…v/gitlab-git-api-fallback-2

simplifying git fallback

b0d69b8

adding more helpful fallback messages

1b8e961

adding more helpful fallback messages

795b46a

adding wrap helper function

00fc529

breaking out git fallback into separate remote constructor

ab20096

fixing undef var

915c08c

handling sha/ref reconcilliation

d9ca73a

adding parsing for username:password in git url

8b9911f

censoring password in git command message

409ea84

actually censoring messages in git output

f6af232

enabling censored clone output

4f49ce0

adding tests

e90a52c

dgkf changed the title ~~Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604)~~ WIP: Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) Jun 25, 2021

dgkf added 4 commits June 25, 2021 12:34

improving documentation of new behavios

b477334

adding tests; docs

4021939

Merge branch 'master' into dev/gitlab-git-api-fallback-2

8b68875

updating NEWS

d7d8709

dgkf changed the title ~~WIP: Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604)~~ Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) Jul 9, 2021

dgkf added 4 commits July 9, 2021 18:31

updating DESCRIPTION; tests

37e177b

rerunning make

a6d1ae0

using atomic return instead of function for mocks

68980b2

fixing stubbing at depth into fn; yo dawg i heard you like mocks

1917f02

jimhester added 2 commits September 24, 2021 10:26

Merge branch 'master' into dev/gitlab-git-api-fallback

18e0b37

Merge branch 'master' into dev/gitlab-git-api-fallback

aa6f2a2
