Repository for R Markdown Notebook containing codes for the paper "vector space models and the usage patterns of Indonesian denominal verbs" (published in NUSA)

Vector space models and the usage patterns of Indonesian denominal verbs

Gede Primahadi Wijaya Rajeg, Karlina Denistia, Simon Musgrave

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

How to cite this repository

Please cite this repository as follows (in Unified Style Sheet for Linguistics):

Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019. R Markdown Notebook for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. https://doi.org/10.6084/m9.figshare.9970205. https://figshare.com/articles/R_Markdown_Notebook_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/9970205.

Preface

This is a repository containing the source R Markdown Notebook (i.e. nusa_r_notebook.Rmd) (Rajeg, Denistia & Musgrave 2019a) for the quantitative analyses accompanying our paper (Rajeg, Denistia & Musgrave 2019b) on vector space models and Indonesian denominal verbs (published open-access in NUSA’s special issue titled Linguistic studies using large annotated corpora, edited by Hiroki Nomoto and David Moeljadi) (Nomoto & Moeljadi 2019). The R Notebook, which is deployed as a GitHub webpage, does not, however, provide detailed exposition and discussion of each point. Also, some of the text narratives in the Notebook may differ from the published manuscript after revision. Readers are referred to our published paper for details. Our computational analyses in the R Notebook used the R packages cited under References below, which have to be installed in R to run all codes in the Notebook.

The analyses in the paper were conducted using R version 3.6.0 (2019-04-26) and RStudio version 1.2.1335 for macOS.
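As a minimal sketch of the installation step, the snippet below checks which packages are still missing before installing anything. The package list is an assumption taken from the packages cited under References, so the Notebook itself may load more or fewer; the helper name `missing_packages` is hypothetical, and installing wordVectors from GitHub via `remotes` is likewise an assumption based on the URL in its reference entry.

```r
# Packages named in the References section (an assumption: the notebook
# itself may load more or fewer than these).
cran_pkgs <- c("dendextend", "purrr", "cluster", "tibble", "ggplot2",
               "stringr", "dplyr", "tidyr", "readr")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
# `installed` defaults to the names of all locally installed packages.
missing_packages <- function(pkgs,
                             installed = rownames(utils::installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(cran_pkgs)

# Install only in an interactive session, so that sourcing this script
# non-interactively never triggers downloads.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# wordVectors is referenced via GitHub rather than CRAN; installing it
# through `remotes` is an assumption about how it can be obtained.
if (interactive() && length(missing_packages("wordVectors")) > 0) {
  remotes::install_github("bmschmidt/wordVectors")
}

# The paper reports R 3.6.0; warn when running an older R.
if (getRversion() < "3.6.0") {
  warning("R is older than the version used in the paper (3.6.0).")
}
```

Guarding the install calls with `interactive()` is a design choice for this sketch: the check can be sourced safely from a script, while the actual installation only happens when someone is at the console.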

How to download/clone the repository and the data

  1. Go to the GitHub repo(sitory): https://github.com/gederajeg/vector_space_model_indonesian.

  2. Then, find and click the green button saying "Clone or download" and then the "Download ZIP" option (see the picture below).

    Downloading the repository from GitHub

  3. The second step above will download the repo as a folder, by default called vector_space_model_indonesian-master. We suggest keeping this folder’s name. The folder consists of, among others, README files, a .bib file, and the R Notebook containing the R codes for producing the analyses in the paper (incl. figures and tables).

  4. Download the dataset (Rajeg, Denistia & Musgrave 2019c) from figshare (we store the data on figshare because the files, especially the vector space model, are too large for version control). Please read the information page before clicking the white button saying "Download all" (next to the dark pink "Cite" button) to download all the data. Please cite the data as:

Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019. Dataset for Vector space model and the usage patterns of Indonesian denominal verbs. figshare. https://doi.org/10.6084/m9.figshare.8187155. https://figshare.com/articles/Dataset_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/8187155.

  5. Please rename the downloaded data folder to data and move it inside the vector_space_model_indonesian-master folder so that the directory structure looks like the one below:

    Project directory
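The renaming and moving in the last step can also be done from R. The following is a sketch under stated assumptions: the function names `place_data_folder` and `check_project` are hypothetical, the paths passed to them are placeholders, and the three expected items (the data folder, nusa_r_notebook.Rmd and MeNasal.Rproj) are simply those mentioned in this README rather than a complete listing of the project.

```r
# Hypothetical helper: move a downloaded data folder into the project root
# under the name "data". file.rename() can fail across file systems, in
# which case the folder has to be copied instead.
place_data_folder <- function(downloaded_dir, project_root) {
  target <- file.path(project_root, "data")
  if (dir.exists(target)) {
    return(invisible(TRUE))  # already in place, nothing to do
  }
  invisible(file.rename(downloaded_dir, target))
}

# Hypothetical helper: report which of the items this README mentions
# exist in the project root (TRUE = found, FALSE = missing).
check_project <- function(project_root) {
  expected <- c("data", "nusa_r_notebook.Rmd", "MeNasal.Rproj")
  found <- file.exists(file.path(project_root, expected))
  names(found) <- expected
  found
}

# Example call (both paths are placeholders for wherever the folders are):
# place_data_folder("~/Downloads/downloaded_data_folder",
#                   "vector_space_model_indonesian-master")
# check_project("vector_space_model_indonesian-master")
```

If `check_project()` reports `FALSE` for `data`, the Notebook is unlikely to find its input files, so it is worth running this check before executing any chunk.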

How to run the codes in the R Notebook

  1. Make sure all the required R packages mentioned above are installed in R and you have the latest version of RStudio (download from here).

  2. Next, go to the vector_space_model_indonesian-master folder and double-click the MeNasal.Rproj file. It will open up an RStudio session associated with data and codes in this project.

  3. Then, open the R Notebook file called nusa_r_notebook.Rmd by going to File > Open File ... (or use Cmd+O on macOS or Ctrl+O on Windows), then select the given .Rmd file.

  4. The codes can be run/executed all at once (i) using the keyboard shortcut Option+Cmd+R on macOS or Alt+Ctrl+R on Windows, or (ii) by navigating to the drop-down Run button and selecting Run All as shown below.

    Running the notebook

    After running all the codes, readers may preview the notebook in HTML format by clicking on the Preview button or by using the keyboard shortcut Cmd/Ctrl+Shift+K.

  5. As an alternative to the run-all option in (4) above, readers may wish to run the code chunk by chunk. A code chunk is indicated by a grey-shaded area in the Notebook (see the picture below).

    Running the code chunk

    Place the cursor in each chunk and then use the keyboard shortcut Cmd/Ctrl+Shift+Enter to run the codes in the given chunk. Another way is to click the green arrow button (see the picture above).
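For readers who prefer the R console over the RStudio buttons, the steps above can be approximated non-interactively with `rmarkdown::render()`. This is a sketch, not the workflow used for the paper: the wrapper `render_notebook` is hypothetical, and it assumes that the working directory is the project root and that the data folder is already in place.

```r
# Hypothetical wrapper: render the notebook from the console.
render_notebook <- function(path = "nusa_r_notebook.Rmd") {
  if (!file.exists(path)) {
    stop("Notebook not found: ", path,
         " (is the working directory the project root?)")
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The 'rmarkdown' package is required to render the notebook.")
  }
  # render() executes every chunk from top to bottom, writes the output
  # file next to the .Rmd by default, and returns the output file path.
  rmarkdown::render(path)
}

# Example call from the project root:
# render_notebook()
```

Unlike the Preview button, which shows the results of chunks that have already been run, rendering re-executes the whole Notebook, so it can take a while with the full vector space model.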

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       macOS Catalina 10.15.3      
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Asia/Makassar               
#>  date     2020-04-02                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.0)
#>  callr         3.2.0   2019-03-15 [1] CRAN (R 3.6.0)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.0)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.0)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.0)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
#>  glue          1.3.2   2020-03-12 [1] CRAN (R 3.6.0)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.6.0)
#>  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.3.1   2019-05-08 [1] CRAN (R 3.6.0)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.4   2020-03-17 [1] CRAN (R 3.6.0)
#>  remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)
#>  rlang         0.4.5   2020-03-01 [1] CRAN (R 3.6.0)
#>  rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.0)
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.0)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.12    2020-01-13 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)
#> 
#> [1] /Users/Primahadi/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
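To see how a local setup differs from the session listed above, package versions can be compared programmatically. The sketch below is illustrative only: `compare_versions` is a hypothetical helper, and the three versions in `listed` are copied from the listing above as examples, not as minimum requirements stated by the authors.

```r
# A few versions copied from the session listing above (examples only).
listed <- c(knitr = "1.28", rmarkdown = "2.1", stringr = "1.4.0")

# Hypothetical helper: compare listed versions with local ones.
# `local` is a named character vector; by default it is looked up from
# the installed packages (NA when a package is not installed).
compare_versions <- function(listed, local = NULL) {
  if (is.null(local)) {
    local <- vapply(names(listed), function(p) {
      if (nzchar(system.file(package = p))) {
        as.character(utils::packageVersion(p))
      } else {
        NA_character_
      }
    }, character(1))
  }
  status <- vapply(names(listed), function(p) {
    if (is.na(local[[p]])) return("missing")
    # compareVersion() returns -1, 0 or 1; shift to index 1, 2 or 3.
    cmp <- utils::compareVersion(local[[p]], listed[[p]])
    c("older", "same", "newer")[cmp + 2]
  }, character(1))
  data.frame(package = names(listed),
             listed = unname(listed),
             local = unname(local[names(listed)]),
             status = unname(status),
             stringsAsFactors = FALSE)
}

# Example call:
# compare_versions(listed)
```

A status of "older" or "missing" is a hint about where to look if a chunk fails; a "newer" version is usually fine but can still change output slightly.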

References

Galili, Tal. 2015. Dendextend: An R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics. doi:10.1093/bioinformatics/btv428.

Henry, Lionel & Hadley Wickham. 2019. Purrr: Functional programming tools. https://CRAN.R-project.org/package=purrr.

Levshina, Natalia. 2015. How to do Linguistics with R: Data exploration and statistical analysis. John Benjamins Publishing Company.

Maechler, Martin, Peter Rousseeuw, Anja Struyf, Mia Hubert & Kurt Hornik. 2018. Cluster: Cluster Analysis Basics and Extensions.

Müller, Kirill & Hadley Wickham. 2019. Tibble: Simple data frames. https://CRAN.R-project.org/package=tibble.

Nomoto, Hiroki & David Moeljadi. 2019. Linguistic studies using large annotated corpora: Introduction. (Ed.) Hiroki Nomoto & David Moeljadi. NUSA 67. (Linguistic Studies Using Large Annotated Corpora). 1–6. http://repository.tufs.ac.jp/handle/10108/94450 (1 April, 2020).

Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019a. R Markdown Notebook for vector space model and the usage patterns of Indonesian denominal verbs. figshare. doi:10.6084/m9.figshare.9970205. https://figshare.com/articles/R_Markdown_Notebook_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/9970205.

Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019b. Vector space models and the usage patterns of Indonesian denominal verbs: A case study of verbs with meN-, meN-/-kan, and meN-/-i affixes. (Ed.) Hiroki Nomoto & David Moeljadi. NUSA 67. (Linguistic Studies Using Large Annotated Corpora). 35–76. http://repository.tufs.ac.jp/handle/10108/94452 (1 April, 2020).

Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019c. Dataset for vector space model and the usage patterns of Indonesian denominal verbs. figshare. doi:10.6084/m9.figshare.8187155. https://figshare.com/articles/Dataset_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/8187155.

Schmidt, Ben & Jian Li. 2017. wordVectors: Tools for creating and analyzing vector-space models of texts. http://github.com/bmschmidt/wordVectors.

Wickham, Hadley. 2016. Ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley. 2018. Stringr: Simple, consistent wrappers for common string operations. https://CRAN.R-project.org/package=stringr.

Wickham, Hadley, Romain François, Lionel Henry & Kirill Müller. 2018. Dplyr: A grammar of data manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley & Garrett Grolemund. 2017. R for Data Science. Canada: O’Reilly. http://r4ds.had.co.nz/ (7 March, 2017).

Wickham, Hadley & Lionel Henry. 2018. Tidyr: Easily tidy data with ’spread()’ and ’gather()’ functions. https://CRAN.R-project.org/package=tidyr.

Wickham, Hadley, Jim Hester & Romain Francois. 2018. Readr: Read rectangular text data. https://CRAN.R-project.org/package=readr.