Gede Primahadi Wijaya Rajeg, Karlina Denistia, Simon Musgrave
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Please cite this repository as follows (in Unified Style Sheet for Linguistics):
Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019. R Markdown Notebook for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. https://doi.org/10.6084/m9.figshare.9970205. https://figshare.com/articles/R_Markdown_Notebook_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/9970205.
This repository contains the source R Markdown Notebook (i.e. nusa_r_notebook.Rmd) (Rajeg, Denistia & Musgrave 2019a) for the quantitative analyses accompanying our paper (Rajeg, Denistia & Musgrave 2019b) on vector space models and Indonesian denominal verbs, published open-access in NUSA’s special issue titled Linguistic studies using large annotated corpora, edited by Hiroki Nomoto and David Moeljadi (Nomoto & Moeljadi 2019). The R Notebook, which is deployed as a GitHub webpage, does not, however, provide detailed exposition and discussion of each point. In addition, some of the narrative text in the Notebook may differ from the published manuscript as a result of revision. Readers are referred to our published paper for details.
Our computational analyses in the R Notebook used the following R packages, which must be installed in R to run all the code in the Notebook:
- cluster (version 2.0.7-1) (Maechler et al. 2018)
- tidyverse (version 1.2.1) (Wickham & Grolemund 2017), the core of which includes:
  - dplyr (version 0.7.8) (Wickham et al. 2018)
  - ggplot2 (version 3.1.0) (Wickham 2016)
  - purrr (version 0.3.0) (Henry & Wickham 2019)
  - readr (version 1.3.1) (Wickham, Hester & Francois 2018)
  - stringr (version 1.3.1) (Wickham 2018)
  - tidyr (version 0.8.2) (Wickham & Henry 2018)
  - tibble (version 2.0.1) (Müller & Wickham 2019)
- dendextend (version 1.8.0) (Galili 2015)
- wordVectors (version 2.0) (Schmidt & Li 2017)
- Rling (version 1.0) (Levshina 2015)
The analyses in the paper were conducted using R version 3.6.0 (2019-04-26) and RStudio version 1.2.1335 for macOS.
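Before opening the Notebook, it can help to compare the package versions listed above with what is installed locally. The following is a minimal sketch and is not part of the original Notebook: the helper name `compare_versions` is hypothetical, and the version strings are copied from the list above. A newer installed version does not necessarily break the analyses, but it may explain differences in output.

```r
# Versions of the top-level packages as listed above
listed_versions <- c(
  cluster     = "2.0.7-1",
  tidyverse   = "1.2.1",
  dendextend  = "1.8.0",
  wordVectors = "2.0",
  Rling       = "1.0"
)

# Hypothetical helper: put listed and locally installed versions side by side.
# `installed` is a named character vector (package name -> version string).
compare_versions <- function(listed,
                             installed = installed.packages()[, "Version"]) {
  pkgs <- names(listed)
  data.frame(
    package   = pkgs,
    listed    = unname(listed),
    installed = unname(installed[pkgs]),  # NA when the package is not installed
    stringsAsFactors = FALSE
  )
}

version_check <- compare_versions(listed_versions)
print(version_check)

# Packages that still need to be installed
missing_pkgs <- version_check$package[is.na(version_check$installed)]

# Possible installation commands (run manually; these fetch current versions,
# not the versions listed above):
# install.packages(c("cluster", "tidyverse", "dendextend"))
# remotes::install_github("bmschmidt/wordVectors")  # GitHub location from the reference list
# Rling: assumed here not to be on CRAN; see the materials accompanying Levshina (2015)
```

The installation commands are left as comments because they need network access and, in the case of Rling, depend on an assumption about where the package is distributed.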
1. Go to the GitHub repository: https://github.com/gederajeg/vector_space_model_indonesian.
2. Find and click the green button labelled "Clone or download", then choose the "Download ZIP" option.
3. The second step downloads the repository as a folder, by default called vector_space_model_indonesian-master. We suggest keeping this folder name. The folder contains, among others, README files, a .bib file, and the R Notebook containing the R code for producing the analyses in the paper (incl. figures and tables).
4. Download the dataset (Rajeg, Denistia & Musgrave 2019c) from figshare (we store the data on figshare because the files, especially the vector space model, are too large for version control). Please read the information page before clicking the white button labelled "Download all" (next to the dark pink "Cite" button) to download all the data. Please cite the data as:
   Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019. Dataset for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. https://doi.org/10.6084/m9.figshare.8187155. https://figshare.com/articles/Dataset_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/8187155.
5. Rename the downloaded data folder to data and move this data folder inside the vector_space_model_indonesian-master folder, so that data sits directly inside the project folder alongside the R Notebook.
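Once the repository and the data are in place, the resulting directory layout can be checked from R. This is a minimal sketch under stated assumptions: the helper `check_project_layout` is hypothetical, the working directory is assumed to be the folder that contains vector_space_model_indonesian-master, and only entries named in this README (the Notebook, the .Rproj file, and the data folder) are checked.

```r
# Hypothetical helper: report which expected entries exist in the project folder.
# file.exists() returns TRUE for directories as well as files.
check_project_layout <- function(project_dir,
                                 expected = c("nusa_r_notebook.Rmd",
                                              "MeNasal.Rproj",
                                              "data")) {
  present <- file.exists(file.path(project_dir, expected))
  stats::setNames(present, expected)
}

# Assumes the current working directory contains the downloaded repo folder
layout <- check_project_layout("vector_space_model_indonesian-master")
print(layout)

if (!all(layout)) {
  message("Missing: ", paste(names(layout)[!layout], collapse = ", "))
}
```

If `data` is reported as missing, the most likely cause is that the figshare download was not renamed or was left outside the project folder.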
1. Make sure all the required R packages mentioned above are installed in R and that you have the latest version of RStudio.
2. Go to the vector_space_model_indonesian-master folder and double-click the MeNasal.Rproj file. This opens an RStudio session associated with the data and code in this project.
3. Open the R Notebook file called nusa_r_notebook.Rmd by going to File > Open File ... (or use ⌘+O on macOS or Ctrl+O on Windows), then select the .Rmd file.
4. The code can be run all at once (i) using the keyboard shortcut ⌥+⌘+R on macOS (i.e., Option+Cmd+R) or Alt+Ctrl+R on Windows, or (ii) by opening the Run drop-down button and selecting Run All. After running all the code, readers may preview the notebook in HTML format by clicking the Preview button or by using the keyboard shortcut ⌘+⇧+K (i.e., Cmd/Ctrl+Shift+K).
5. As an alternative to the run-all option in step 4 above, readers may wish to run the code chunk by chunk. Each code chunk is indicated by a grey-shaded area in the Notebook. Place the cursor in a chunk and use the keyboard shortcut ⌘+⇧+Enter (i.e., Cmd/Ctrl+Shift+Enter) to run the code in that chunk. Another way is to click the green arrow button of that chunk.
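As a console-based counterpart to the run-all step, the whole Notebook can in principle be executed with `rmarkdown::render()`. This is a sketch only: the wrapper `render_notebook` is hypothetical, the working directory is assumed to be the project folder (as it is after opening MeNasal.Rproj), and whether the Notebook renders without errors this way has not been verified here. The output format is whatever the Notebook's own header specifies.

```r
# Hypothetical wrapper: run every chunk and build the output document in one call.
render_notebook <- function(rmd = "nusa_r_notebook.Rmd") {
  if (!file.exists(rmd)) {
    stop("Cannot find ", rmd, "; is the working directory the project folder?")
  }
  # Executes all code chunks in order and writes the rendered output
  rmarkdown::render(rmd)
}

# Usage (run from the project folder once packages and data are in place):
# render_notebook()
```

The explicit file check is a design choice: it fails early with a message that points at the working directory, which is the most common reason a relative path to the Notebook or to the data folder cannot be resolved.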
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.3 (2020-02-29)
#> os macOS Catalina 10.15.3
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Asia/Makassar
#> date 2020-04-02
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
#> callr 3.2.0 2019-03-15 [1] CRAN (R 3.6.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> glue 1.3.2 2020-03-12 [1] CRAN (R 3.6.0)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> processx 3.3.1 2019-05-08 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /Users/Primahadi/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Galili, Tal. 2015. dendextend: An R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics. doi:10.1093/bioinformatics/btv428.
Henry, Lionel & Hadley Wickham. 2019. purrr: Functional programming tools. https://CRAN.R-project.org/package=purrr.
Levshina, Natalia. 2015. How to do Linguistics with R: Data exploration and statistical analysis. John Benjamins Publishing Company.
Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert & Kurt Hornik. 2018. cluster: Cluster Analysis Basics and Extensions.
Müller, Kirill & Hadley Wickham. 2019. tibble: Simple data frames. https://CRAN.R-project.org/package=tibble.
Nomoto, Hiroki & David Moeljadi. 2019. Linguistic studies using large annotated corpora: Introduction. (Ed.) Hiroki Nomoto & David Moeljadi. NUSA 67. (Linguistic Studies Using Large Annotated Corpora). 1–6. http://repository.tufs.ac.jp/handle/10108/94450 (1 April, 2020).
Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019a. R Markdown Notebook for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. doi:10.6084/m9.figshare.9970205. https://figshare.com/articles/R_Markdown_Notebook_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/9970205.
Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019b. Vector space models and the usage patterns of Indonesian denominal verbs: A case study of verbs with meN-, meN-/-kan, and meN-/-i affixes. (Ed.) Hiroki Nomoto & David Moeljadi. NUSA 67. (Linguistic Studies Using Large Annotated Corpora). 35–76. http://repository.tufs.ac.jp/handle/10108/94452 (1 April, 2020).
Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019c. Dataset for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. doi:10.6084/m9.figshare.8187155. https://figshare.com/articles/Dataset_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/8187155.
Schmidt, Ben & Jian Li. 2017. wordVectors: Tools for creating and analyzing vector-space models of texts. http://github.com/bmschmidt/wordVectors.
Wickham, Hadley. 2016. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. http://ggplot2.org.
Wickham, Hadley. 2018. stringr: Simple, consistent wrappers for common string operations. https://CRAN.R-project.org/package=stringr.
Wickham, Hadley, Romain François, Lionel Henry & Kirill Müller. 2018. dplyr: A grammar of data manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley & Garrett Grolemund. 2017. R for Data Science. Canada: O’Reilly. http://r4ds.had.co.nz/ (7 March, 2017).
Wickham, Hadley & Lionel Henry. 2018. tidyr: Easily tidy data with 'spread()' and 'gather()' functions. https://CRAN.R-project.org/package=tidyr.
Wickham, Hadley, Jim Hester & Romain Francois. 2018. readr: Read rectangular text data. https://CRAN.R-project.org/package=readr.