Get closest commit #389

Open
wants to merge 51 commits into base: main
51 commits
d44a6c9
add function to get the date of a GitHub commit
mihem Jan 18, 2025
7b2967b
add function to download all commits from a repo
mihem Jan 18, 2025
f98f2ef
add function to get the closest commit to a date
mihem Jan 18, 2025
9c1f983
get date of git_pkg in fetchgit
mihem Jan 18, 2025
ce15077
better name for date
mihem Jan 18, 2025
f6e7dbb
add function to resolve package commits
mihem Jan 18, 2025
389dc9c
try to get commit hash for each package
mihem Jan 18, 2025
5633fe0
fix variable name
mihem Jan 18, 2025
8facc65
trycatch commit, fallback HEAD
mihem Jan 18, 2025
a6dadd8
fix missing variable
mihem Jan 18, 2025
9f23c5e
better name for get_closest_commit
mihem Jan 18, 2025
9090077
limit download_all_commits to 300 most recent commits
mihem Jan 18, 2025
ad52892
refactor API requests to use authenticated GitHub API calls
mihem Jan 18, 2025
32173fa
Revert "refactor API requests to use authenticated GitHub API calls"
mihem Jan 18, 2025
857c020
add tryCatch for get_commit_date
mihem Jan 19, 2025
be8160c
fix problem with @ or #
mihem Jan 19, 2025
aa5be83
try fix issue with @ in remote pkgs
mihem Jan 19, 2025
20443e5
remove unused code
mihem Jan 19, 2025
156fceb
fix fallback to today not NULL
mihem Jan 19, 2025
bd5ce0e
fix commit date default
mihem Jan 19, 2025
0be4495
try to use gh
mihem Jan 19, 2025
09bd894
update NAMESPACE and DESCRIPTION
mihem Jan 19, 2025
9231f88
replace gh with curl for get_commit_date
mihem Jan 22, 2025
93aaad2
replace gh with curl in donwload_all_commits
mihem Jan 22, 2025
c83001e
remove gh from DESCRIPTION, update NAMESPACE
mihem Jan 22, 2025
efe8418
improve get_commit_date function
mihem Jan 22, 2025
0e5259a
improve download_all_commits function
mihem Jan 22, 2025
f9d53bd
accidently removed remote_pkgs_names_and_refs
mihem Jan 22, 2025
c8157fb
try to fix download_all_commits
mihem Jan 22, 2025
9e46d24
added check for get_commit_date
mihem Jan 22, 2025
a0cff2c
try to fix download_all_commits
mihem Jan 22, 2025
0acb80b
replace curl_download with curl_fetch_memory
mihem Jan 22, 2025
f348b91
update NAMESPACE
mihem Jan 22, 2025
98163d8
use curl_fetch_memory
mihem Jan 22, 2025
78022de
replace warning with message
mihem Jan 22, 2025
dc13751
replace warning with message get_commit_date
mihem Jan 25, 2025
0380e48
only use get_commit_date for github
mihem Jan 25, 2025
e9e5b04
write test for get_commit_date
mihem Jan 26, 2025
3b0320e
add test for get_commit_date for no Github token
mihem Jan 26, 2025
dd21bb7
try to use gh token in workflows
mihem Jan 26, 2025
51407ba
try to fix github_pat for workflows
mihem Jan 26, 2025
8b92131
add test for download_all_commits
mihem Jan 26, 2025
716cd44
fix bug and improve performance in download_all_commits
mihem Jan 26, 2025
89eef89
add test for resolve_package_commit
mihem Jan 26, 2025
2642687
try fix failed test because of new rev hash
mihem Jan 26, 2025
da30de9
fix test using new detault_datathin.nix
mihem Jan 26, 2025
51a34c2
change date back in default_datathin.nix
mihem Jan 26, 2025
45a9d0f
improve download_all_commits
mihem Jan 26, 2025
94fd7bf
fix download_all_commits description
mihem Jan 26, 2025
01744de
fix test of download_all_commits
mihem Jan 26, 2025
45b84eb
fix tests
mihem Jan 26, 2025
5 changes: 5 additions & 0 deletions .github/workflows/tests-r-via-nix.yaml
@@ -23,6 +23,11 @@ jobs:
steps:
- uses: actions/checkout@v4

- name: Create .Renviron
run: |
echo "GITHUB_PAT=${{ secrets.GITHUB_TOKEN }}" >> ~/.Renviron
shell: bash

- uses: cachix/install-nix-action@v25
with:
nix_path: nixpkgs=https://github.com/rstats-on-nix/nixpkgs/archive/refs/heads/r-daily.tar.gz
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -13,8 +13,10 @@ export(tar_nix_ga)
export(with_nix)
importFrom(codetools,checkUsage)
importFrom(codetools,findGlobals)
importFrom(curl,curl_fetch_disk)
importFrom(curl,curl_fetch_memory)
importFrom(curl,handle_reset)
importFrom(curl,handle_setheaders)
importFrom(curl,has_internet)
importFrom(curl,new_handle)
importFrom(jsonlite,fromJSON)
179 changes: 167 additions & 12 deletions R/fetchers.R
@@ -28,7 +28,7 @@
remotes
)

if (is.list(remotes) & length(remotes) == 0) {

[GitHub Actions / style_pkg] R/fetchers.R line 31, col 24: [vector_logic_linter] Conditional expressions require scalar logical operators (&& and ||).
# if no remote dependencies

output <- main_package_expression
@@ -61,7 +61,7 @@
imports,
remotes = NULL) {
# If there are remote dependencies, pass this string
flag_remote_deps <- if (is.list(remotes) & length(remotes) == 0) {

[GitHub Actions / style_pkg] R/fetchers.R line 64, col 44: [vector_logic_linter] Conditional expressions require scalar logical operators (&& and ||).
""
} else {
# Extract package names
@@ -166,10 +166,11 @@

#' Finds dependencies of a package from the DESCRIPTION file
#' @param path path to package
#' @param commit_date date of commit
#' @importFrom utils untar
#' @return Atomic vector of packages
#' @noRd
get_imports <- function(path) {
get_imports <- function(path, commit_date) {
tmpdir <- tempdir()
on.exit(unlink(tmpdir, recursive = TRUE, force = TRUE), add = TRUE)

@@ -224,17 +225,10 @@

remote_pkgs_names <- remote_pkgs_names_and_refs |>
sapply(function(x) x[[1]])

# Check if we have a list of lists of two elements: a package name
# and a ref. If not, add "HEAD" to it.
remote_pkgs_refs <- lapply(remote_pkgs_names_and_refs, function(sublist) {
if (length(sublist) == 1) {
c(sublist, "HEAD")
} else {
sublist
}
}) |>
sapply(function(x) x[[2]])
# try to get commit hash for each package if not already provided
remote_pkgs_refs <- lapply(remote_pkgs_names_and_refs, function(x) {
resolve_package_commit(x, commit_date, remotes)
})

urls <- paste0(
"https://github.com/",
@@ -431,3 +425,164 @@
remote_package_names <- sapply(remotes, `[[`, "package_name")
return(remote_package_names)
}

#' get_commit_date Retrieves the date of a commit from a Git repository
#' @param repo The GitHub repository (e.g. "r-lib/usethis")
#' @param commit_sha The commit hash of interest
#' @return A character. The date of the commit.
#' @importFrom curl new_handle handle_setheaders curl_fetch_memory
#' @importFrom jsonlite fromJSON
#' @noRd
get_commit_date <- function(repo, commit_sha) {
url <- paste0("https://api.github.com/repos/", repo, "/commits/", commit_sha)
h <- new_handle()

token <- Sys.getenv("GITHUB_PAT")
token_pattern <- "^(gh[ps]_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9]{22}_[a-zA-Z0-9]{59})$"

if (grepl(token_pattern, token)) {
handle_setheaders(h, Authorization = paste("token", token))
} else {
message("No GitHub Personal Access Token found. Please set GITHUB_PAT in your environment. Falling back to unauthenticated API request.")

[GitHub Actions / style_pkg] R/fetchers.R line 446, col 101: [line_length_linter] Lines should not be more than 100 characters. This line is 141 characters.
}

tryCatch({
response <- curl_fetch_memory(url, handle = h)
if (response$status_code != 200) {
stop("API request failed with status code: ", response$status_code)
}
commit_data <- fromJSON(rawToChar(response$content))
if (is.null(commit_data$commit$committer$date)) {
stop("Invalid response format: missing commit date")
}
commit_data$commit$committer$date
}, error = function(e) {
stop("Failed to get commit date for ", commit_sha, ": ", e$message)
})
}
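
As a quick illustration of the new helper, a hedged usage sketch; the repository and ref below are arbitrary examples, and a valid GITHUB_PAT is assumed to be set in the environment (otherwise the function falls back to an unauthenticated request):

# Hypothetical call: the GitHub commits endpoint accepts a SHA, branch, or tag as ref
commit_date <- get_commit_date(repo = "r-lib/usethis", commit_sha = "main")
# Returns an ISO 8601 string such as "2025-01-18T12:34:56Z", the same format
# that get_closest_commit() later parses with as.POSIXct()
print(commit_date)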

#' download_all_commits Downloads commits (maximum 1000) from a GitHub repository
#' @param repo The GitHub repository (e.g. "r-lib/usethis")
#' @param date The target date to find the closest commit
#' @return A data frame with commit SHAs and dates
#' @importFrom curl new_header handle_setheaders curl_fetch_memory
#' @importFrom jsonlite fromJSON
#' @noRd
download_all_commits <- function(repo, date) {
base_url <- paste0("https://api.github.com/repos/", repo, "/commits")
h <- new_handle()

token <- Sys.getenv("GITHUB_PAT")
token_pattern <- "^(gh[ps]_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9]{22}_[a-zA-Z0-9]{59})$"

if (grepl(token_pattern, token)) {
handle_setheaders(h, Authorization = paste("token", token))
} else {
message("No GitHub Personal Access Token found. Please set GITHUB_PAT in your environment. Falling back to unauthenticated API request.")
}

# Limit to 10 pages of 100 commits each, so 1000 commits in total

[GitHub Actions / style_pkg] R/fetchers.R line 484, col 101: [line_length_linter] Lines should not be more than 100 characters. This line is 141 characters.
per_page <- 100
max_pages <- 10
max_commits <- per_page * max_pages

# Pre-allocate results data frame
all_commits <- data.frame(
sha = character(max_commits),
date = as.POSIXct(rep(NA, max_commits))
)
commit_count <- 0

for (page in 1:max_pages) {
url <- paste0(base_url, "?per_page=", per_page, "&page=", page)

tryCatch(
{
response <- curl_fetch_memory(url, handle = h)
if (response$status_code != 200) {
stop("API request failed with status code: ", response$status_code)
}

commits <- fromJSON(rawToChar(response$content))
if (!is.list(commits) || length(commits) == 0) break

# if no commits are found, break the loop
n_commits <- length(commits$sha)
if (n_commits == 0) break


idx <- (commit_count + 1):(commit_count + n_commits)
all_commits$sha[idx] <- commits$sha
all_commits$date[idx] <- as.POSIXct(
commits$commit$committer$date,
format = "%Y-%m-%dT%H:%M:%OSZ"
)

commit_count <- commit_count + n_commits

# if the date of the last commit is before the target date, break the loop
if (min(all_commits$date, na.rm = TRUE) < date) break

},
error = function(e) {
stop("Failed to download commit data: ", e$message)
}
)
}

# Return only the rows with actual data
all_commits[1:commit_count, ]
}
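
A hedged usage sketch of the pagination helper, assuming network access and an optional GITHUB_PAT; the repository and target date are made-up examples:

# Hypothetical call: fetch up to 1000 recent commits and stop paging early
# once a commit older than the target date has been seen
commits <- download_all_commits(
  repo = "anna-neufeld/datathin",
  date = as.POSIXct("2025-01-18", tz = "UTC")
)
str(commits)
# 'data.frame': two columns, sha (character) and date (POSIXct),
# with one row per downloaded commit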

#' get_closest_commit Finds the closest commit to a specific date
#' @param commits_df A data frame with commit SHAs and dates
#' @param target_date The target date to find the closest commit
#' @return A data frame with the closest commit SHA and date
#' @noRd
get_closest_commit <- function(commits_df, target_date) {
# Convert target_date to POSIXct format
target_date <- as.POSIXct(target_date, format = "%Y-%m-%dT%H:%M:%OSZ")

# Filter commits before or on the target date
filtered_commits <- commits_df[commits_df$date <= target_date, ]

# If no commits found, raise an error
if (nrow(filtered_commits) == 0) {
stop("No commits found before or on the target date.")
}

# Find the closest commit by selecting the maximum date
closest_commit <- filtered_commits[which.max(filtered_commits$date), ]
return(closest_commit)
}
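
Because get_closest_commit() only needs a data frame of SHAs and dates, it can be exercised without any API call; the values below are invented for illustration:

# Toy input mimicking the output of download_all_commits()
toy_commits <- data.frame(
  sha  = c("aaa111", "bbb222", "ccc333"),
  date = as.POSIXct(c("2025-01-10", "2025-01-15", "2025-01-20"), tz = "UTC")
)
# Selects the row for "bbb222": the latest commit at or before the target date
get_closest_commit(toy_commits, target_date = "2025-01-18T00:00:00Z")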

#' resolve_package_commit Resolves the commit SHA for a package based on a date
#' @param remote_pkg_name_and_ref A list containing the package name and optionally a ref
#' @param date The target date to find the closest commit
#' @param remotes A character vector of remotes
#' @return A character. The commit SHA of the closest commit to the target date or "HEAD" if API fails
#' @noRd
resolve_package_commit <- function(remote_pkg_name_and_ref, date, remotes) {

[GitHub Actions / style_pkg] R/fetchers.R line 565, col 101: [line_length_linter] Lines should not be more than 100 characters. This line is 102 characters.
# Check if remote is a list with a package name and a ref
if (length(remote_pkg_name_and_ref) == 2) {
# Keep existing ref if present
return(remote_pkg_name_and_ref[[2]])
} else if (length(remote_pkg_name_and_ref) == 1) {
# For packages without ref, try to find closest one by date
# fallback to HEAD if API fails
result <- tryCatch({
remotes_fetch <- remotes[grepl(remote_pkg_name_and_ref, remotes)]
all_commits <- download_all_commits(remotes_fetch, date)
closest_commit <- get_closest_commit(all_commits, date)
closest_commit$sha
},
error = function(e) {
message(paste0("Failed to get commit for ", remote_pkg_name_and_ref,
": ", e$message, "\nFalling back to HEAD"))
return("HEAD")
})
return(result)
} else {
stop("remote_pkg_name_and_ref must be a list of length 1 or 2")
}
}
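
To make the two branches concrete, a hypothetical call for each case; the package names and the remotes vector are invented for illustration:

remotes <- c("anna-neufeld/datathin", "r-lib/usethis@v2.2.0")

# A ref is already pinned in Remotes: it is kept as-is, no API call is made
resolve_package_commit(list("usethis", "v2.2.0"), date = Sys.Date(), remotes = remotes)
# returns "v2.2.0"

# No ref given: look up the commit closest to the date, falling back to "HEAD"
# if the GitHub API cannot be reached
resolve_package_commit(
  list("datathin"),
  date = as.POSIXct("2025-01-18", tz = "UTC"),
  remotes = remotes
)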
21 changes: 19 additions & 2 deletions R/nix_hash.R
@@ -21,7 +21,6 @@ nix_hash <- function(repo_url, commit) {
}
}


#' Return the SRI hash of an URL with .tar.gz
#' @param url String with URL ending with `.tar.gz`
#' @return list with following elements:
@@ -94,7 +93,25 @@ hash_url <- function(url) {
paths <- list.files(path_to_src, full.names = TRUE, recursive = TRUE)
desc_path <- grep(file.path(list.files(path_to_src), "DESCRIPTION"), paths, value = TRUE)

deps <- get_imports(desc_path)
if (grepl("github", url)) {
repo_url_short <- paste(unlist(strsplit(url, "/"))[4:5], collapse = "/")
commit <- gsub(x = basename(url), pattern = ".tar.gz", replacement = "")
commit_date <- tryCatch(
{
get_commit_date(repo_url_short, commit)
},
error = function(e) {
message(paste0(
"Failed to get commit date for ", commit, ": ", e$message,
"\nFalling back to today"
))
return(Sys.Date())
}
)
}

deps <- get_imports(desc_path, commit_date)


return(
list(
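
The repo/commit extraction added above is plain string manipulation on the archive URL; a small standalone sketch with a constructed URL shows the two intermediate values:

# Example archive URL of the kind hash_url() receives for a GitHub package
url <- "https://github.com/anna-neufeld/datathin/archive/58eb154609365fa7301ea0fa397fbf04dd8c28ed.tar.gz"

# Elements 4 and 5 of the split URL are the owner and repository name
repo_url_short <- paste(unlist(strsplit(url, "/"))[4:5], collapse = "/")
# Stripping the .tar.gz suffix from the file name leaves the commit ref
commit <- gsub(x = basename(url), pattern = ".tar.gz", replacement = "")

repo_url_short  # "anna-neufeld/datathin"
commit          # "58eb154609365fa7301ea0fa397fbf04dd8c28ed"

get_commit_date(repo_url_short, commit) would then return the date used for resolving any unpinned remote dependencies.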
2 changes: 1 addition & 1 deletion tests/testthat/_snaps/renv_helpers/default_datathin.nix
@@ -347,7 +347,7 @@ let
name = "datathin";
src = pkgs.fetchgit {
url = "https://github.com/anna-neufeld/datathin";
rev = "HEAD";
rev = "58eb154609365fa7301ea0fa397fbf04dd8c28ed";
sha256 = "sha256-rtRpwFI+JggX8SwnfH4SPDaMPK2yLhJFTgzvWT+Zll4=";
};
propagatedBuildInputs = builtins.attrValues {