Skip to content

Commit

Permalink
new version, terra compatible
Browse files Browse the repository at this point in the history
  • Loading branch information
hzambran committed Jul 26, 2023
1 parent 70a30d0 commit 7f2b5f9
Show file tree
Hide file tree
Showing 6 changed files with 61 additions and 50 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
Package: RFmerge
Type: Package
Title: Merging of Satellite Datasets with Ground Observations using Random Forests
Version: 0.2-0
Version: 0.2-2
Author: Mauricio Zambrano-Bigiarini [aut, cre, cph] (<https://orcid.org/0000-0002-9536-643X>), Oscar M. Baez-Villanueva [aut, cph], Juan Giraldo-Osorio [ctb]
Authors@R: c(person("Mauricio Zambrano-Bigiarini", email = "[email protected]", role=c("aut", "cre", "cph"), comment=c(ORCID = "0000-0002-9536-643X")), person("Oscar M. Baez-Villanueva", email = "[email protected]", role=c("aut", "cph")), person("Juan Giraldo-Osorio", email = "[email protected]", role=c("ctb")) )
Maintainer: Mauricio Zambrano-Bigiarini <[email protected]>
Description: S3 implementation of the Random Forest MErging Procedure (RF-MEP), which combines two or more satellite-based datasets (e.g., precipitation products, topography) with ground observations to produce a new dataset with improved spatio-temporal distribution of the target field. In particular, this package was developed to merge different Satellite-based Rainfall Estimates (SREs) with measurements from rain gauges, in order to obtain a new precipitation dataset where the time series in the rain gauges are used to correct different types of errors present in the SREs. However, this package might be used to merge other hydrological/environmental satellite fields with point observations. For details, see Baez-Villanueva et al. (2020) <doi:10.1016/j.rse.2019.111606>. Bugs / comments / questions / collaboration of any kind are very welcomed.
Description: S3 implementation of the Random Forest MErging Procedure (RF-MEP), which combines two or more satellite-based datasets (e.g., precipitation products, topography) with ground observations to produce a new dataset with improved spatio-temporal distribution of the target field. In particular, this package was developed to merge different Satellite-based Rainfall Estimates (SREs) with measurements from rain gauges, in order to obtain a new precipitation dataset where the time series in the rain gauges are used to correct different types of errors present in the SREs. However, this package might be used to merge other hydrological/environmental gridded datasets with point observations. For details, see Baez-Villanueva et al. (2020) <doi:10.1016/j.rse.2019.111606>. Bugs / comments / questions / collaboration of any kind are very welcomed.
License: GPL (>=3)
Depends: R (>= 3.5.0)
Imports: terra, randomForest, zoo, parallel, methods, stats, utils, pbapply
Expand All @@ -17,4 +17,4 @@ BugReports: https://github.com/hzambran/RFmerge/issues
LazyLoad: yes
NeedsCompilation: no
Repository: CRAN
Packaged: Wed 19 Jul 2023 07:21:40 PM -04; hzambran
Packaged: Sun Jul 23 09:25:40 -04 2023; hzambran
6 changes: 1 addition & 5 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,8 +1,4 @@
#importFrom("raster", brick, stack, mask, extract, nlayers, raster, distance, rasterize, writeRaster, compareRaster, compareCRS, extent, crs)
importFrom("terra", rast+º, mask, extract, nlayers, raster, distance, rasterize, writeRaster, compareRaster, compareCRS, extent, crs)
#importFrom("velox", velox)
#importFrom("sp", "coordinates<-", areaSpatialGrid)
#importFrom("sf", st_is, st_crs, st_as_sf, st_combine)
importFrom("terra", mask, compareGeom, crs, same.crs, ext, nlyr, rast, vect, distance, extract, predict, writeRaster)
importFrom("zoo", is.zoo, index, write.zoo)
importFrom("utils", write.table, installed.packages, ls.str)
importFrom("parallel", makeCluster, stopCluster, clusterApply)
Expand Down
43 changes: 28 additions & 15 deletions R/RFmerge.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
# Updates: 16-Nov-2019 ; 10-Dec-2019 ; 11-Dec-2019 ; 12-Dec-2019 ; 13-Dec-2019 #
# 14-Dec-2019 ; 17-Dec-2019 ; 20-Dec-2019 ; 23-Dec-2019 #
# 30-Jan-2020 ; 27-Apr-2020 ; 12-May-2020 #
# 10-Jun-2023 #
# 10-Jun-2023 ; 20-Jul-2023 ; 23-Jul-2023 # #
################################################################################

# 'x' : zoo object with ground-based values that will be used as the dependent variable to train the RF model.
Expand Down Expand Up @@ -142,7 +142,7 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,
# stop("Invalid argument: All the elements in 'cov' must have the same spatial extent, CRS, rotation and geometry !!")

cov1 <- cov[[1]]
n <- length(covariates.utm)
n <- length(cov)
if ( !(all(sapply(2:n, FUN=function(i) terra::compareGeom(cov1, cov[[i]]) ))) ) {
stop("Invalid argument: All the elements in 'cov' must have the same spatial extent, CRS, rotation and geometry !!")
} else cov.crs <- terra::crs(cov1)
Expand Down Expand Up @@ -173,6 +173,8 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,
}

cov[index] <- sapply(index, set.covariates, cov=cov, cov.layers=cov.layers)

lsample <- cov[[1]][[1]]

# Dividing the Ground-based observations in 'training' and 'validation' samples
if (verbose) message("[ Creating the training (", round(training*100,2), "%) and evaluation (", round((1-training)*100,2), "%) datasets ... ]" )
Expand Down Expand Up @@ -207,24 +209,29 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,
points <- train.metadata
points <- terra::vect(points, geom=c("lon", "lat"))
npoints <- length(points)
#crs(points) <- crs(lsample)
terra::crs(points) <- terra::crs(lsample)


# If required, computing the euclidean distances
if (verbose) message("[ Computing the Euclidean distances to each observation of the training set ...]")

# lsample <- cov[[1]][[1]]
# if (ED) {
# buff.dist <- as(lsample, "SpatialPixelsDataFrame")
# buff.dist <- .buffer_dist(points, buff.dist, as.factor(1:nrow(data.frame(points))))
# buff.dist <- as(buff.dist, "SpatRast")
# } # IF end

buff.dist <- vector("list", npoints)
for(i in 1:npoints)
buff.dist[[i]] <- terra::distance(lsample, points[i], rasterize=FALSE)
# Computation of Eucliden distances, if required by the user
if (ED) {
terra::rast( replicate( max(cov.layers), cov[[index]] ) )
buff.dist <- vector("list", npoints)
for(i in 1:npoints)
buff.dist[[i]] <- terra::distance(lsample, points[i], rasterize=FALSE)


# list -> SpatRaster
buff.dist <- terra::rast(buff.dist)
names(buff.dist ) <- paste0("dist", 1:npoints)
} # IF end
########################################################################
## parallel: start (ini) #
########################################################################
Expand Down Expand Up @@ -295,12 +302,12 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,
if (use.pb) {
out <- pbapply::pbsapply(X=1:ntsteps, FUN=.lrf, ldates, metadata, id, points, cov, mask, merged.drty, na.action, ntree, ED, train.ts, train.metadata, buff.dist, lsample, write2disk)
} else out <- sapply(X=1:ntsteps, FUN=.lrf, ldates, metadata, id, points, cov, mask, merged.drty, na.action, ntree, ED, train.ts, train.metadata, buff.dist, lsample, write2disk)
out <- stack(out)
out <- terra::rast(out)
} else {
if (use.pb) {
out <- pbapply::pbsapply(X=1:ntsteps, FUN=.lrf, ldates, metadata, id, points, cov, mask, merged.drty, na.action, ntree, ED, train.ts, train.metadata, buff.dist, lsample, write2disk, cl=cl, ...)
} else out <- parallel::clusterApply(cl= cl, 1:ntsteps, fun=.lrf, ldates, metadata, id, points, cov, mask, merged.drty, na.action, ntree, ED, train.ts, train.metadata, buff.dist, lsample, write2disk, ...)
out <- stack(out)
out <- terra::rast(out)
parallel::stopCluster(cl)
if (verbose) message("[ Parallelisation finished ! ]")
} # ELSE end
Expand Down Expand Up @@ -334,7 +341,7 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,


.lrf <- function(day, ldates, metadata, id, points, cov, mask, merged.drty, na.action, ntree, ED, train.ts, train.metadata, buff.dist, lsample, write2disk, ...) {

# Obtaining the ground-based measurements for each day
nr <- nrow(train.metadata)
codes <- vector("numeric", length= nr)
Expand All @@ -355,11 +362,11 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,

cov.day <- terra::rast(cov.day)

if (ED) cov.day <- terra::rast(cov.day, buff.dist)
#cov.day <- velox::velox(cov.day)
# Adding the Euclidean distances, if required by the user
if (ED) cov.day <- c(cov.day, buff.dist)

# Extraction of covariates at the ground-station locations
extraction <- data.frame(terra::extract(cov.day, points))
extraction <- data.frame(terra::extract(cov.day, points, ID=FALSE))
#extraction <- data.frame(cov.day$extract(points))
names(cov.day) <- names(extraction)

Expand All @@ -374,7 +381,13 @@ RFmerge.zoo <- function(x, metadata, cov, mask, training,
if ( sum(table$obs.values, na.rm = TRUE) > 0) {

# Training of the RF model and generation of the spatial prediction
rf.model <- randomForest::randomForest(obs.values ~ ., data = table.cov, na.action = na.action, ntree = ntree, ... )
suppressWarnings(
rf.model <- randomForest::randomForest(obs.values ~ ., data = table.cov, na.action = na.action, ntree = ntree, ... )
)
#suppressWarnings is used to avoid the following message in some very local scale cases:
#In randomForest.default(m, y, ...) :
# The response has five or fewer unique values.
# Are you sure you want to do regression?

# rf.model <- randomForest::randomForest(obs.values ~ ., data = table.cov, ntree = 2000, na.action = na.omit)
result <- terra::predict(cov.day, rf.model)
Expand Down
15 changes: 8 additions & 7 deletions man/RFmerge-package.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -40,24 +40,25 @@ Maintainer: Mauricio Zambrano-Bigiarini <mzb.devel@gmail>
\keyword{ package }
\seealso{
\url{https://cran.r-project.org/package=raster}. \cr
\url{https://cran.r-project.org/package=terra}. \cr
\url{https://cran.r-project.org/package=hydroGOF}.
}
\examples{
library(rgdal)
library(raster)
library(terra)
data(ValparaisoPPts)
data(ValparaisoPPgis)
data(ValparaisoSHP)
ValparaisoSHP.fname <- system.file("extdata/ValparaisoSHP.shp",package="RFmerge")
ValparaisoSHP <- terra::vect(ValparaisoSHP.fname)
chirps.fname <- system.file("extdata/CHIRPS5km.tif",package="RFmerge")
prsnncdr.fname <- system.file("extdata/PERSIANNcdr5km.tif",package="RFmerge")
dem.fname <- system.file("extdata/ValparaisoDEM5km.tif",package="RFmerge")
CHIRPS5km <- brick(chirps.fname)
PERSIANNcdr5km <- brick(prsnncdr.fname)
ValparaisoDEM5km <- raster(dem.fname)
CHIRPS5km <- rast(chirps.fname)
PERSIANNcdr5km <- rast(prsnncdr.fname)
ValparaisoDEM5km <- rast(dem.fname)
covariates <- list(chirps=CHIRPS5km, persianncdr=PERSIANNcdr5km,
dem=ValparaisoDEM5km)
Expand Down
33 changes: 17 additions & 16 deletions man/RFmerge.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -26,37 +26,37 @@ RFmerge(x, ...)
seed = NULL, ntree = 2000, na.action = stats::na.omit,
parallel=c("none", "parallel", "parallelWin"),
par.nnodes=parallel::detectCores()-1,
par.pkgs= c("raster", "randomForest", "zoo"), write2disk=FALSE,
par.pkgs= c("terra", "randomForest", "zoo"), write2disk=FALSE,
drty.out, use.pb=TRUE, verbose=TRUE,...)

\method{RFmerge}{zoo}(x, metadata, cov, mask, training,
id="id", lat = "lat", lon = "lon", ED = TRUE,
seed = NULL, ntree = 2000, na.action = stats::na.omit,
parallel=c("none", "parallel", "parallelWin"),
par.nnodes=parallel::detectCores()-1,
par.pkgs= c("raster", "randomForest", "zoo"), write2disk=FALSE,
par.pkgs= c("terra", "randomForest", "zoo"), write2disk=FALSE,
drty.out, use.pb=TRUE, verbose=TRUE, ...)


}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{x}{
data.frame witht the ground-based values that will be used as the dependent variable to train the RF model. \cr
data.frame with the ground-based values that will be used as the dependent variable to train the RF model. \cr
Every column must represent one ground-based station and the codes of the stations must be provided as colnames. class(data) must be zoo.
}
\item{metadata}{
data.frame with the metadata of the ground-based stations. At least, it MUST have the following 3 columns: \cr
-) \kbd{id}: This column stores the unique identifier (ID) of each ground-based observation. Default value is \kbd{"id"}. \cr
-) \kbd{lat}: This column stores the latitude of ech ground observation. Default value is \kbd{"lat"}. \cr
-) \kbd{lon}: This column stores the longitude of ech ground observation. Default value is \kbd{"lon"}.
-) \kbd{lat}: This column stores the latitude of each ground observation. Default value is \kbd{"lat"}. \cr
-) \kbd{lon}: This column stores the longitude of each ground observation. Default value is \kbd{"lon"}.
}
\item{cov}{
List with all the covariates used as independent variables in the Random Forest model. The individual covariates can be a \kbd{RasterStack} or \kbd{RasterBrick} object when they vary in time, or they can be a single \kbd{RasterLayer} object when they do not change in time (e.g., a digital elevation model). \cr
All time-varying covariates in \code{cov} MUST have the same number of layers (bands). Covariates that do not change in time (e.g., a DEM) are internally trasnformed into \kbd{RasterStack} or \kbd{RasterBrick} objects with the same number of layers as the other time-varying elements in \code{cov}
All time-varying covariates in \code{cov} MUST have the same number of layers (bands). Covariates that do not change in time (e.g., a DEM) are internally transformed into \kbd{RasterStack} or \kbd{RasterBrick} objects with the same number of layers as the other time-varying elements in \code{cov}
}
\item{mask}{
OPTIONAL. If provided, the final merged product maks out all values in \code{cov} outside \code{mask}. \cr
OPTIONAL. If provided, the final merged product mask out all values in \code{cov} outside \code{mask}. \cr
Spatial object (vectorial) with the spatial borders of the study area (e.g., catchment, administrative borders). \code{class(mask)} must be a \kbd{sf} object with "POLYGON" or "MULTIPOLYGON" geometry.
}
\item{training}{
Expand All @@ -73,7 +73,7 @@ Character, with the name of the column in \code{metadata} where the latitude of
Character, with the name of the column in \code{metadata} where the longitude of the stations is stored.
}
\item{ED}{
logical, should the Euclidean distances be computed an used as covariates in the random fores model?. The default value is \code{TRUE}.
logical, should the Euclidean distances be computed an used as covariates in the random forest model?. The default value is \code{TRUE}.
}
\item{seed}{
Numeric, indicating a single value, interpreted as an integer, or null.
Expand All @@ -100,7 +100,7 @@ Numeric indicating the maximum number trees to grow in the Random Forest algorit
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.)
}
\item{write2disk}{
logical, indicates if the output merged raster layers and the training and avaluation datasets (two files each, one with time series and other with metadata) will be written to the disk. By default \code{write2disk=FALSE}
logical, indicates if the output merged raster layers and the training and evaluation datasets (two files each, one with time series and other with metadata) will be written to the disk. By default \code{write2disk=FALSE}
}
\item{drty.out}{
Character with the full path to the directory where the final merged product will be exported as well as the training and evaluation datasets. Only used when \code{write2disk=TRUE}
Expand Down Expand Up @@ -139,23 +139,24 @@ Juan D. Giraldo-Osorio, \email{[email protected]}
%% ~Make other sections like Warning with \section{Warning }{....} ~
\seealso{
\code{\link[raster]{raster}}, \code{\link[raster]{stack}}, \code{\link[raster]{brick}}, \code{\link[raster]{resample}}, \code{\link[raster]{rotate}}, \code{\link[raster]{crop}}.
\code{\link[terra]{terra}}, \code{\link[terra]{rast}}, \code{\link[terra]{resample}}, \code{\link[terra]{rotate}}, \code{\link[raster]{crop}}.
}
\examples{
library(rgdal)
library(raster)
library(terra)
data(ValparaisoPPts)
data(ValparaisoPPgis)
data(ValparaisoSHP)
ValparaisoSHP.fname <- system.file("extdata/ValparaisoSHP.shp",package="RFmerge")
ValparaisoSHP <- terra::vect(ValparaisoSHP.fname)
chirps.fname <- system.file("extdata/CHIRPS5km.tif",package="RFmerge")
prsnncdr.fname <- system.file("extdata/PERSIANNcdr5km.tif",package="RFmerge")
dem.fname <- system.file("extdata/ValparaisoDEM5km.tif",package="RFmerge")
CHIRPS5km <- brick(chirps.fname)
PERSIANNcdr5km <- brick(prsnncdr.fname)
ValparaisoDEM5km <- raster(dem.fname)
CHIRPS5km <- rast(chirps.fname)
PERSIANNcdr5km <- rast(prsnncdr.fname)
ValparaisoDEM5km <- rast(dem.fname)
covariates <- list(chirps=CHIRPS5km, persianncdr=PERSIANNcdr5km,
dem=ValparaisoDEM5km)
Expand Down
8 changes: 4 additions & 4 deletions man/ValparaisoSHP.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,8 @@ The fields stored in the \code{@data} slot of this object are: \cr

-) \var{NOM_REG}: Name of the administrative region \cr
-) \var{NOM_PROV}: Name of the administrative province \cr
-) \var{COD_COM}: ID of the administrative comune \cr
-) \var{NOM_COM}: Name of the administrative comune \cr
-) \var{COD_COM}: ID of the administrative commune \cr
-) \var{NOM_COM}: Name of the administrative commune \cr
-) \var{COD_REGI}: Numeric code of the administrative region \cr
-) \var{SUPERFICIE}: Spatial area within the administrative borders of the region, [km2] \cr
-) \var{POBLAC02}: Probably, it corresponds to the number of inhabitants of the region according to the 2002 census. \cr
Expand All @@ -39,12 +39,12 @@ The fields stored in the \code{@data} slot of this object are: \cr

}
\source{
Originally downloaded from the Biblioteca del Congreso Nacional de Chile and then reprojected into geographic coordiantes (EPSG:4326). Last accessed [March 2016]). \cr
Originally downloaded from the Biblioteca del Congreso Nacional de Chile and then reprojected into geographic coordinates (EPSG:4326). Last accessed [March 2016]). \cr
These data are intended to be used for research purposes only, being distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.
}

\note{
The orginal file is not longer available at he Biblioteca del Congreso Nacional de Chile
The original file is not longer available at he Biblioteca del Congreso Nacional de Chile
}

%%\references{
Expand Down

0 comments on commit 7f2b5f9

Please sign in to comment.