-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
95 lines (58 loc) · 12.2 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
---
output: github_document
title: "Vector space models and the usage patterns of Indonesian denominal verbs"
author: '[Gede Primahadi Wijaya Rajeg](https://figshare.com/authors/Gede_Primahadi_Wijaya_Rajeg/1234749) <a itemprop="sameAs" content="https://orcid.org/0000-0002-2047-8621" href="https://orcid.org/0000-0002-2047-8621" target="orcid.widget" rel="noopener noreferrer" style="vertical-align:top;"><img src="https://orcid.org/sites/default/files/images/orcid_16x16.png" style="width:1em;margin-right:.5em;" alt="ORCID iD icon"></a>, [Karlina Denistia](http://uni-tuebingen.academia.edu/karlinadenistia) <a itemprop="sameAs" content="http://orcid.org/0000-0002-1060-3548" href="http://orcid.org/0000-0002-1060-3548" target="orcid.widget" rel="noopener noreferrer" style="vertical-align:top;"><img src="https://orcid.org/sites/default/files/images/orcid_16x16.png" style="width:1em;margin-right:.5em;" alt="ORCID iD icon"></a>, [Simon Musgrave](http://profiles.arts.monash.edu.au/simon-musgrave/) <a itemprop="sameAs" content="https://orcid.org/0000-0003-3237-9943" href="https://orcid.org/0000-0003-3237-9943" target="orcid.widget" rel="noopener noreferrer" style="vertical-align:top;"><img src="https://orcid.org/sites/default/files/images/orcid_16x16.png" style="width:1em;margin-right:.5em;" alt="ORCID iD icon"></a>'
link-citations: yes
csl: unified_stylesheet_linguistics.csl
bibliography: index_ref.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
[![DOI](https://img.shields.io/badge/doi-10.6084/m9.figshare.9970205-blue.svg?style=flat&labelColor=whitesmoke&logo=data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAAB8AAAAfCAYAAAAfrhY5AAAJsklEQVR42qWXd1DTaRrHf%2BiB2Hdt5zhrAUKz4IKEYu9IGiGFFJJQ0gkJCAKiWFDWBRdFhCQUF3UVdeVcRQEBxUI3yY9iEnQHb3bdW1fPubnyz%2F11M7lvEHfOQee2ZOYzPyDv%2B3yf9%2Fk95YX4fx%2BltfUt08GcFEuPR4U9hDDZ%2FVngIlhb%2FSiI6InkTgLzgDcgfvtnovhH4BzoVlrbwr55QnhCtBW4QHXnFrZbPBaQoBh4%2FSYH2EnpBEtqcDMVzB93wA%2F8AFwa23XFGcc8CkT3mxz%2BfXWtq9T9IQlLIXYEuHojudb%2BCM7Hgdq8ydi%2FAHiBXyY%2BLjwFlAEnS6Jnar%2FvnQVhvdzasad0eKvWZKe8hvDB2ofLZ%2FZEcWsh%2BhyIuyO5Bxs2iZIE4nRv7NWAb0EO8AC%2FWPxjYAWuOEX2MSXZVgPxzmRL3xKz3ScGpx6p6QnOx4mDIFqO0w6Q4fEhO5IzwxlSwyD2FYHzwAW%2BAZ4fEsf74gCumykwNHskLM7taQxLYjjIyy8MUtraGhTWdkfhkFJqtvuVl%2F9l2ZquDfEyrH8B0W06nnpH3JtIyRGpH1iJ6SfxDIHjRXHJmdQjLpfHeN54gnfFx4W9QRnovx%2FN20aXZeTD2J84hn3%2BqoF2Tqr14VqTPUCIcP%2B5%2Fly4qC%2BUL3sYxSvNj1NwsVYPsWdMUfomsdkYm3Tj0nbV0N1wRKwFe1MgKACDIBdMAhPE%2FwicwNWxll8Ag40w%2BFfhibJkGHmutjYeQ8gVlaN%2BjO51nDysa9TwNUFMqaGbKdRJZFfOJSp6mkRKsv0rRIpEVWjAvyFkxNOEpwvcAVPfEe%2Bl8ojeNTx3nXLBcWRrYGxSRjDEk0VlpxYrbe1ZmaQ5xuT0u3r%2B2qe5j0J5uytiZPGsRL2Jm32AldpxPUNJ3jmmsN4x62z1cXrbedXBQf2yvIFCeZrtyicZZG2U2nrrBJzYorI2EXLrvTfCSB43s41PKEvbZDEfQby6L4JTj%2FfIwam%2B4%2BwucBu%2BDgNK05Nle1rSt9HvR%2FKPC4U6LTfvUIaip1mjIa8fPzykii23h2eanT57zQ7fsyYH5QjywwlooAUcAdOh5QumgTHx6aAO7%2FL52eaQNEShrxfhL6albEDmfhGflrsT4tps8gTHNOJbeDeBlt0WJWDHSgxs6cW6lQqyg1FpD5ZVDfhn1HYFF1y4Eiaqa18pQf3zzYMBhcanlBjYfgWNayAf%2FASOgklu8bmgD7hADrk4cRlOL7NSOewEcbqSmaivT33QuFdHXj5sdvjlN5yMDrAECmdgDWG2L8P%2BAKLs9ZLZ7dJda%2BB4Xl84t7QvnKfvpXJv9obz2KgK8dXyqISyV0sXGZ0U47hOA%2FAiigbEMECJxC9aoKp86re5O5prxOlHkcksutSQJzxZRlPZmrOKhsQBF5zEZKybUC0vVjG8PqOnhOq46qyDTDnj5gZBriWCk4DvXrudQnXQmnXblebhAC2cCB6zIbM4PYgGl0elPSgIf3iFEA21aLdHYLHUQuVkpgi02SxFdrG862Y8ymYGMvXDzUmiX8DS5vKZyZlGmsSgQqfLub5RyLNS4zfDiZc9Edzh%2FtCE%2BX8j9k%2FqWB071rcZyMImne1SLkL4GRw4UPHMV3jjwEYpPG5uW5fAEot0aTSJnsGAwHJi2nvF1Y5OIqWziVCQd5NT7t6Q8guOSpgS%2Fa1dSRn8JGGaCD3BPXDyQRG4Bqhu8XrgAp0yy8DMSvvyVXDgJcJTcr1wQ2BvFKf65jqhvmxXUuDpGBlRvV36XvGjQzLi8KAKT2lYOnmxQPGorURSV0NhyTIuIyqOmKTMhQ%2BieEsgOgpc4KBbfDM4B3SIgFljvfHF6cef7qpyLBXAiQcXvg5l3Iunp%2FWv4dH6qFziO%2BL9PbrimQ9RY6MQphEfGUpOmma7KkGzuS8sPUFnCtIYcKCaI9EXo4HlQLgGrBjbiK5EqMj2AKWt9QWcIFMtnVvQVDQV9lXJJqdPVtUQpbh6gCI2Ov1nvZts7yYdsnvRgxiWFOtNJcOMVLn1vgptVi6qrNiFOfEjHCDB3J%2BHDLqUB77YgQGwX%2Fb1eYna3hGKdlqJKIyiE4nSbV8VFgxmxR4b5mVkkeUhMgs5YTi4ja2XZ009xJRHdkfwMi%2BfocaancuO7h%2FMlcLOa0V%2FSw6Dq47CumRQAKhgbOP8t%2BMTjuxjJGhXCY6XpmDDFqWlVYbQ1aDJ5Cptdw4oLbf3Ck%2BdWkVP0LpH7s9XLPXI%2FQX8ws%2Bj2In63IcRvOOo%2BTTjiN%2BlssfRsanW%2B3REVKoavBOAPTXABW4AL7e4NygHdpAKBscmlDh9Jysp4wxbnUNna3L3xBvyE1jyrGIkUHaqQMuxhHElV6oj1picvgL1QEuS5PyZTEaivqh5vUCKJqOuIgPFGESns8kyFk7%2FDxyima3cYxi%2FYOQCj%2F%2B9Ms2Ll%2Bhn4FmKnl7JkGXQGDKDAz9rUGL1TIlBpuJr9Be2JjK6qPzyDg495UxXYF7JY1qKimw9jWjF0iV6DRIqE%2B%2FeWG0J2ofmZTk0mLYVd4GLiFCOoKR0Cg727tWq981InYynvCuKW43aXgEjofVbxIqrm0VL76zlH3gQzWP3R3Bv9oXxclrlO7VVtgBRpSP4hMFWJ8BrUSBCJXC07l40X4jWuvtc42ofNCxtlX2JH6bdeojXgTh5TxOBKEyY5wvBE%2BACh8BtOPNPkApjoxi5h%2B%2FFMQQNpWvZaMH7MKFu5Ax8HoCQdmGkJrtnOiLHwD3uS5y8%2F2xTSDrE%2F4PT1yqtt6vGe8ldMBVMEPd6KwqiYECHDlfbvzphcWP%2BJiZuL5swoWQYlS%2Br7Yu5mNUiGD2retxBi9fl6RDGn4Ti9B1oyYy%2BMP5G87D%2FCpRlvdnuy0PY6RC8BzTA40NXqckQ9TaOUDywkYsudxJzPgyDoAWn%2BB6nEFbaVxxC6UXjJiuDkW9TWq7uRBOJocky9iMfUhGpv%2FdQuVVIuGjYqACbXf8aa%2BPeYNIHZsM7l4s5gAQuUAzRUoT51hnH3EWofXf2vkD5HJJ33vwE%2FaEWp36GHr6GpMaH4AAPuqM5eabH%2FhfG9zcCz4nN6cPinuAw6IHwtvyB%2FdO1toZciBaPh25U0ducR2PI3Zl7mokyLWKkSnEDOg1x5fCsJE9EKhH7HwFNhWMGMS7%2BqxyYsbHHRUDUH4I%2FAheQY7wujJNnFUH4KdCju83riuQeHU9WEqNzjsJFuF%2FdTDAZ%2FK7%2F1WaAU%2BAWymT59pVMT4g2AxcwNa0XEBDdBDpAPvgDIH73R25teeuAF5ime2Ul0OUIiG4GpSAEJeYW9wDTf43wfwHgHLKJoPznkwAAAABJRU5ErkJggg%3D%3D)](http://dx.doi.org/10.6084/m9.figshare.9970205)
# How to cite this repository
Please cite this repository as follows (in Unified Style Sheet for Linguistics):
> Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019. R Markdown Notebook for *Vector space model and the usage patterns of Indonesian denominal verbs*. figshare. https://doi.org10.6084/m9.figshare.9970205. https://figshare.com/articles/R_Markdown_Notebook_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/9970205.
# Preface
This is a repository containing the source [R Markdown](http://rmarkdown.rstudio.com) [Notebook](https://bookdown.org/yihui/rmarkdown/notebook.html) (i.e. `nusa_r_notebook.Rmd`) [@rajeg_r_2019] for the quantitative analyses accompanying our [paper](http://repository.tufs.ac.jp/handle/10108/94452) [@rajeg_vector_2019] on vector space models and Indonesian denominal verbs (published open-access in [*NUSA*](http://www.aa.tufs.ac.jp/en/publications/nusa)'s special issue titled [*Linguistic studies using large annotated corpora*](http://repository.tufs.ac.jp/handle/10108/94450), edited by [Hiroki Nomoto](http://www.tufs.ac.jp/ts/personal/nomoto/) and [David Moeljadi](http://compling.hss.ntu.edu.sg/who/david/)) [@nomoto_linguistic_2019]. The R Notebook, which is deployed as a GitHub [webpage](https://gederajeg.github.io/vector_space_model_indonesian/), however, does not provide detailed exposition and discussion for each points. Also, there can be differences in some of the text-narratives in the Notebook compared to the published manuscript after revision. Readers are referred to our published [paper](http://repository.tufs.ac.jp/handle/10108/94452) for details. Our computational analyses in the R Notebook used the following [R](https://www.r-project.org) packages, which have to be installed in R to run all codes in the Notebook:
- [cluster](https://cran.r-project.org/web/packages/cluster/index.html) (version 2.0.7-1) [@maechler_cluster_2018]
- [tidyverse](https://www.tidyverse.org) (version 1.2.1) [@wickham_r_2017], the core of which includes:
- [dplyr](https://dplyr.tidyverse.org) (version 0.7.8) [@wickham_dplyr_2018]
- [ggplot2](https://ggplot2.tidyverse.org) (version 3.1.0) [@wickham_ggplot2_2016]
- [purrr](https://purrr.tidyverse.org) (version 0.3.0) [@henry_purrr_2019]
- [readr](https://readr.tidyverse.org) (version 1.3.1) [@wickham_readr_2018]
- [stringr](https://stringr.tidyverse.org) (version 1.3.1) [@wickham_stringr_2018]
- [tidyr](https://tidyr.tidyverse.org) (version 0.8.2) [@wickham_tidyr_2018]
- [tibble](https://tibble.tidyverse.org) (version 2.0.1) [@muller_tibble_2019]
- [dendextend](https://cran.r-project.org/web/packages/dendextend/index.html) (version 1.8.0) [@galili_dendextend_2015]
- [wordVectors](https://github.com/bmschmidt/wordVectors) (version 2.0) [@schmidt_wordvectors_2017]
- [Rling](https://benjamins.com/sites/z.195/content/package.html) (version 1.0) [@levshina_how_2015]
The analyses in the paper were conducted using R version 3.6.0 (2019-04-26) and [RStudio](https://www.rstudio.com) version 1.2.1335 for macOS.
# How to download/clone the repository and the data
1. Go to the GitHub repo(sitory): https://github.com/gederajeg/vector_space_model_indonesian.
2. Then, find and click the green button saying `"Clone or download"` and then the `"Download ZIP"` option (see the picture below).
![Downloading the repository from GitHub](gh_tuts_1_clone.png)
3. The second step above will download the repo as a folder, by default called `vector_space_model_indonesian-master`. We suggest keep this folder's name. The folder consists of, among others, README files, .bib file, and the R Notebook containing the R codes for producing the analyses in the paper (incl. figures and tables).
4. Download the dataset [@rajeg_dataset_2019] from [figshare](https://doi.org/10.6084/m9.figshare.8187155) (we store them on figshare due to their large size for version control, especially for the vector space model). Please read the information page before clicking the white button saying `"Download all"` (next to the dark pink `"Cite"` button) to download all the data. Please cite the data as:
> Rajeg, Gede Primahadi Wijaya, Karlina Denistia & Simon Musgrave. 2019. Dataset for *Vector space model and the usage patterns of Indonesian denominal verbs*. figshare. https://doi.org10.6084/m9.figshare.8187155. https://figshare.com/articles/Dataset_for_i_Vector_space_model_and_the_usage_patterns_of_Indonesian_denominal_verbs_i_/8187155.
5. Please rename the downloaded data folder into `data` and move this `data` folder inside the `vector_space_model_indonesian-master` folder so that the structure of the directory has to look like below:
![Project directory](gh_tuts_4_project_directory.png)
# How to run the codes in the R Notebook
1. Make sure all the required R packages mentioned above are installed in R and you have the latest version of [RStudio](https://www.rstudio.com) (download from [here](https://www.rstudio.com/products/rstudio/download/)).
2. Next, go to the `vector_space_model_indonesian-master` folder and double-click the `MeNasal.Rproj` file. It will open up an RStudio session associated with data and codes in this project.
3. Then, open the R Notebook file called `nusa_r_notebook.Rmd` by going to `File` > `Open File ...` (or use <kbd>⌘</kbd>+<kbd>O</kbd> on macOS or <kbd>Ctrl</kbd>+<kbd>O</kbd> on Windows), then select the given `.Rmd` file.
4. The codes can be run/executed all at once (i) using keyboard shortcut <kbd>⌥</kbd>+<kbd>⌘</kbd>+<kbd>R</kbd> on macOS (i.e., <kbd>Option</kbd>+<kbd>Cmd</kbd>+<kbd>R</kbd>) or <kbd>Alt</kbd>+<kbd>Ctrl</kbd>+<kbd>R</kbd> on Windows, (ii) or by navigating to the drop-down `Run` button and select `Run All` as shown below.
![Running the notebook](gh_tuts_2_run_notebook.png)
After running all the codes, reader may preview the notebook in HTML format by clicking on the `Preview` button or by using keyboard shortcut <kbd>⌘</kbd>+<kbd>⇧</kbd>+<kbd>K</kbd> (i.e., <kbd>Cmd/Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>K</kbd>).
5. Alternative to the run-all option in (4) above, reader may wish to run the code chunk-by-chunk. The code-chunk is indicated by grey-shaded area in the Notebook (see the picture below).
![Running the code chunk](gh_tuts_3_run_nbook_chunk.png)
Place the cursor in each chunk and then use keyboard shortcut <kbd>⌘</kbd>+<kbd>⇧</kbd>+<kbd>Enter</kbd> (i.e., <kbd>Cmd/Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>Enter</kbd>) to run the codes in the given chunk. Another way is to click the green arrow button (see the picture above).
```{r sessinfo}
devtools::session_info()
```
# References