-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy path108_rpackages.Rmd
217 lines (146 loc) · 9.98 KB
/
108_rpackages.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
# (PART) R Packages {-}
## R Packages
We are going to create a minimal package with RStudio. The package structure will be uploaded to Github in order to track all changes made to the package. Other users will be able to install your package from Github.
### Create a repository in your Github account
The repository will contain your package. Example: hellopackage, basicpackage2, etc. Tick the cell "Add README.md": This will create the `main` branch in Github. Accordingly, we will use the **main** branch in our project for the first commits (and not the **master** branch)
### Create your Package-Project in Rstudio, allowing Git to track changes
git remote add origin <https://github.com/yourrepo......./yourpackage----.git>
- use **main** (and not master) in order to conform to the github standard
git pull origin main
git push -u origin main
Now, we should see our changes in the github repository.
### Continue creating the package with RStudio
Switch to the Build tab - Check. Usually we get the warnings about the License, the documentation and the NAMESPACE file.
- Go to Tools \> Project Options \> Build Tools --\> Tick the cell "Generate documentation with Roxygen" and do not change the defaults.
- in the Build panel More \> Clean and Rebuild
- Check
- delete the NAMESPACE file because it is automatically created when building the package.
- go to the DESCRIPTION file and change the License to GPL-3 (or whatever). Save the file.
- Documentation warnings. We may include comments and other information for our functions using "#' " in the .R files. We should include the @export for exporting the functions and the @param for describing the parameters
Example: Code your own function or use the file `funhist2df.R` that is in the folder toAddLater
- go to More \> Document (or hit Ctrl+Shift+D)
- More \> Clean and Rebuild
- Check
- Most probably, there is an error in the file hello.R
We must document and export that file, too:
include the following lines that begin with the characters "#' "
"#'
#' @export
#' "
- We should see 0 errors \| 0 warnings \| 0 notes
- Save all changes done.
### Commit and push all your changes to Github
In order to install your new package and to see your changes, close the Project without saving the data and restart RStudio with a clean environment.
#### To install the new package from github
Load the library devtools - library(devtools)\
- devtools::install_github("yourusername/the_name_of_the_repo_containing_your_package")
### Adding a vignette with data analysis
A **vignette** is a standard form of writing long and detailed documentation for a package. That includes any type of report.
Type in the console
(change names as you wish)
usethis::use_vignette(name = “vignette1”, title = “My analysis of the data”)
You will see that a file vignette1.Rmd has been created. You can place any set of R chunks there.
For each package that you need to use in the vignette you need to declare the package in the description. It can be done automatically with
usethis::use_package("...whatever..package...")
The content of the "vignette1.Rmd" will usually contain several chunks of R code:
```{r setup}
library(thepackagethatyouarecreating)
```
```{r}
# load the data that you have created
data(thedatasetthatyouhavecreatedinthispackage)
# other data manipulation as examples
```
```{r}
# do whatever with the functions and datasets that you have created
summary(thedatasetthatyou....)
```
You may now Install and Restart. It will create a package that can be shared.
Those files must be tracked on Github.
If you do not want to include vignettes you may include any number of R Markdown documents. See next paragraph.
### Adding RMarkdown documents
We can add any number R Markdown files to our package. Usually we will put them in a new **rmd/** folder in the **inst/** folder. This folder must be tracked on Github.
Now you can use your functions by typing yourpackage::yourfunction1()
## A good use of a package: to export and to share data
We may add a dataset to a package so that it can be used when the package is installed. Or we can create a package that contains only data to be shared. An example of the second use is the well-known R package `gapminder`. Another recent example of a data package is the `hagr` (remotes::install_github("datawookie/hagr")).
We focus now on the second aspect and we will create a package that contains only data (but we may add also some reports in the form of documentation).
- RStudio: create a New project \> New directory \> newname . For the sake of example we create the new project with name "datapackaplusb" (we intend to combine two simple datasets into one single file)
- Create a repository in Github with the same name as the project, for the sake of clarity. Add some comments to the README.md. You will You will see the "main" branch created for the repo. -- -- Button Code: copy the <https://github>...
- Go to RStudio \> More \> Configure Build tools \> Git/SVN \> Select Git and Say yes to create a git repository and restart RStudio. Now you have your local repository created (most probably in the "master" branch)
- Open the Shell Git -\> More -\> Shell and paste the text that you copied from Github in the command
`git remote add origin <https://github.com-----------.git>`
- Type `git pull origin main`. With this first pull your local directory contains all changes done in "main" in Github. You should see now the "main" branch in RStudio.
- Important: Go to the Git tab and switch from the "master" branch to the "main" branch so that both local and Github are now in the same "main" branch.
- In the Git tab select the files and directories that you want to commit and push to Github (First commit and then push or "git push -u origin main"). Now, you should see your changes in the github repository.
### Create folder for the original files to be processed
1. Usually external files are placed in the dir ins/extdata. We may place the data there. Create those folders.
We copy and paste the files "albrecht.csv" and bailey.csv (available in datasets/efforEstimation) to ins/extdata
2. Perform a first check of the package (click the button or devtools::check)
-- Warning about the license --> rewrite to, for instance, GPL-3
3. Usually, the external files are not uploaded to Github, specially if their size is too big.
Additionally, when building the package We may ignore files located in some directories by adding those files to .Rbuildignore
^data-raw$
^ins/extdata$
4. Clean and rebuild the package. This will install the package we are creating in our environment, so that everything is available for use.
5. For the sake of example we copy and paste the files albretch.csv and bailey.csv in the folder ins/extdata
6. We can retrieve the actual path to those files extdata files with
system.file( "extdata", "albretch.csv", package = "datapackaplusb")
or using the read.csv or read.table
7. Processing those external files into a data frame that is usable
We will create a script in a new data_raw folder with `usethis::use_data_raw(name = "dfaandb")` (Give it the name that you like)
8. Do whatever you wish with the data. In this case we simple create a new file.
Copy and paste the content of the file dfaandb.R available in the folder "toAddLater"
The final command is
# save the dfaandb dataframe as an .rda file in datapackaplusb/data/
usethis::use_data(dfaandb, overwrite = TRUE)
creates the data/ folder with the data frame stored as .rda
9. Clean and Rebuild the package
10. The data can be accessed in the environment with
data("dfaandb", package = "datapackaplusb")
11. Document the data
Go to Build > More > Configure Build Tools
and check the tick in the "Document with ROxygen"
12. Create the empty file data.R in the R/ folder
More > Document or devtools::document()
13. Important. Delete the NAMESPACE file and repeat devtools::document() (NAMESPACE is overwritten)
14. Add the the following content (change as appropriate) to the file data.R in the R folder
#' Data of effort and size for several projects
#'
#' No missing values
#' A dataset containing ----- whatever you put here.
#'
#' @title DATASET OF ALBRETCH AND BAILEY
#' @format A data frame with 42 rows and three variables:
#' \describe{
#' \item{effort}{Effort measured in -------.}
#' \item{size}{Size measured in ....}
#' \item{source}{A or B indicating one source or another.}
#' }
#' @source \url{https://....domain.com... }
"dfaandb"
15. Roxygen transforms the code above into a dfaandb.Rd file and adds it to the man/ folder. We can view this documentation in the help pane by typing ?dfaandb in the R console.
16. Check. (you may delete the file hello.R or add the following lines if you want to have that function)
#' The most used program :-) Greeting.
#' @description Hello to the world
#' @param No parameters
#' @export
#'
#' @examples
#' hello()
#'
17. Commit and push all your changes to Github. You may ignore /data-raw and ins/extdata
18. close the project. Restart and install the package from github. devtools::install_github("yourrepo/datapackaplusb")
library(datapackaplusb)
19. Type data("dfaandb", package="datapackaplusb")
Additional steps
-- Vignette. If you wish you can create a vignette in an .Rmd with a report obtained from the data do this
usethis::use_vignette(name = "effort_eda", title = "Basic EDA of the Effort data")
The directory vignettes is created and you may complete the .Rmd
You may have to install the package qpdf to avoid the Warning about the size of the documents
(sudo apt-get install -y qpdf in Ubuntu)
-- Working with R Markdown. We may add RMarkdown to our package. We create a sub folder rmd/ in the inst/ folder.
## References
- [Version Control with Git and SVN]<https://support.rstudio.com/hc/en-us/articles/200532077?version=1.4.1103&mode=desktop>
- <https://r-pkgs.org/intro.html>
- <https://www.zdnet.com/article/github-to-replace-master-with-main-starting-next-month/>
- <https://rtask.thinkr.fr/fusen-create-a-package-from-a-single-rmarkdown-file/>