Commit

Merge branch 'DMR_final' of github.com:rkrug/IPBES---TFC-Ch-2 into DMR_final
rkrug committed Nov 19, 2024
2 parents 9dd4c71 + 7d25815 commit 2db44b5
Showing 6 changed files with 77 additions and 77 deletions.
12 changes: 6 additions & 6 deletions IPBES_TCA_Ch1_evidence_causes.qmd
@@ -222,11 +222,11 @@ The `data` folder is also available in a separate deposit at [10.5281/zenodo.113

To guarantee reproducibility, it will be downloaded and extracted when the folder `ch1_evidence_causes/data` does not exist

-All code to re-generate the data is included but might take a long time to run and produce different numbers as OpenAlex is updated continously.
+All code to re-generate the data is included, but it might take a long time to run and produce different numbers as OpenAlex is updated continously.

Disable this block and delete all content in the folder `ch1_evidence_causes/data` to re-generate the data. The folder `ch1_evidence_causes/data` has to exist.

-This code will only work after the approval of the assessment by the plenary as the repository will remain confidential before.
+This code will only work after the approval of the assessment by the Plenary as the repository will remain confidential before.

```{r}
#| label: download_data
@@ -261,9 +261,9 @@ The BuidNo is automatically increased by one each time the report is rendered. I

![Overview of the analysis and results.](ch1_evidence_causes/figures/Ch1_evidence_causes.svg){width=100% height=100%}

-- To download png, [click here](ch1_evidence_causes/figures/Ch1_evidence_causes.png){target="_blank"}.
-- To download high resolution pdf, [click here](ch1_evidence_causes/figures/Ch1_evidence_causes.pdf){target="_blank"}.
-- To download high resolution svg, [click here](ch1_evidence_causes/figures/Ch1_evidence_causes.svg){target="_blank"}.
+- To download a png, [click here](ch1_evidence_causes/figures/Ch1_evidence_causes.png){target="_blank"}.
+- To download a high resolution pdf, [click here](ch1_evidence_causes/figures/Ch1_evidence_causes.pdf){target="_blank"}.
+- To download a high resolution svg, [click here](ch1_evidence_causes/figures/Ch1_evidence_causes.svg){target="_blank"}.

All searches are done on OpenAlex directly.
The downloaded TCA Corpus is not used directly due to methodological and technical limitations.
@@ -333,7 +333,7 @@ count$ethical_tca <- openalexR::oa_fetch(
The search terms is [concepts_1](ch1_evidence_causes/input/concepts_1.txt){target=_blank}
Open Alex search.

-The [concepts_1](ch1_evidence_causes/input/concepts_1.txt){target=_blank} search returns a subset of the [nature corpus](tca_corpus/input/search terms/nature.txt){target=_blank} as `biodiversity` is a subset of the nature corpus. Therefore, it is not nbecessary to subset the nature corpus and the search can be done on the complete OpenAlex corpus
+The [concepts_1](ch1_evidence_causes/input/concepts_1.txt){target=_blank} search returns a subset of the [nature corpus](tca_corpus/input/search terms/nature.txt){target=_blank} as `biodiversity` is a subset of the nature corpus. Therefore, it is not necessary to subset the nature corpus and the search can be done on the complete OpenAlex corpus.
.
```{r}
#| label: get_concepts_1_count
56 changes: 28 additions & 28 deletions IPBES_TCA_Ch2_technology.qmd
@@ -202,11 +202,11 @@ IPBES_TCA_Ch2_technology

# Build No: `r build`

-The BuidNo is automatically increased by one each time the report is rendered. It is used to indicate different renderings when the version stays the same.
+The BuildNo is automatically increased by one each time the report is rendered. It is used to indicate different renderings when the version stays the same.

## Introduction

-All searches are done on all works in OpenAlex. The search in the TCA Corpus is not possibly at the moment, but we are working on it.
+All searches are done on all works in OpenAlex. The search in the TCA Corpus is not possible at the moment, but we are working on it.


# Methods
@@ -216,7 +216,7 @@ All searches are done on all works in OpenAlex. The search in the TCA Corpus is

The `data` folder is also available in a separate deposit at [10.5281/zenodo.11389148](https://doi.org/10.5281/zenodo.11389207).

-To guarantee reproducibility, it will be downloaded and extracted when the folder `ch2_technology/data` does not exist
+To guarantee reproducibility, it will be downloaded and extracted when the folder `ch2_technology/data` does not exist.

All code to re-generate the data is included but might take a long time to run and produce different numbers as OpenAlex is updated continously.

@@ -600,11 +600,11 @@ if (get_count) {

The corpus download will be stored in `ch2_technology/pages` and the parquet database in `ch2_technology/data/corpus_complete`. This one will be filtered with the TCA / Global Corpus and get the final name `ch2_technology/data/corpus`.

-This is not on github!
+This is not on GitHub!

The corpus can be read by running `corpus_read("ch2_technology/data/corpus")` which opens the database so that then it can be fed into a `dplyr` pipeline. After most `dplyr` functions, the actual data needs to be collected via `collect()`.

-Only then is the actual data read!
+Only then the actual data is read!

Needs to be enabled by setting `eval: true` in the code block below.

@@ -693,14 +693,14 @@ toc()



-Check the number of dulicates before running this next block, and then verify the new corpus afterwards. RUN ONLY MANUALY!
+Check the number of dulicates before running this next block, and then verify the new corpus afterwards. RUN ONLY MANUALLY!

```{r}
#| label: fix_duplicate_ids_TEMPORARY
#| eval: false
#|
-ONLY RUN MANUALLY!!!!!!!!!!!!!!!!!!!!!!!
+ONLY RUN MANUALLY!
(read_corpus(params$corpus_dir) |> group_by(id) |> summarize(n = n()) |> filter(n > 1) |> collect() |> nrow()) / (corpus_read(params$corpus_dir) |> nrow())
@@ -759,7 +759,7 @@ NOW IF EVERYTHING IS OK, DELETE THE OLD CORPUS AND RENAME THE NEW ONE

The Sentiment Analysis has been implemented by [Maral Dadvar](mailto://dadvar.maral@gmail.com)

-The sentiment analys is based on the abstracts of the works. As not all works has abstracts, the number of datapoints in the sentiment analysis is smaller then the number of works. The sentiment analysis was implemented using the [Python NLTK package](https://www.nltk.org/), and [VADER](https://www.nltk.org/_modules/nltk/sentiment/vader.html) which is an NLTK module that provides sentiment scores based on the words used. Vader is a pre-trained, rule-based sentiment analysis model in which the terms are generally labeled as per their semantic orientation as either positive or negative. The main advantage/reason for using this model was that it doesn't require a labelled training dataset.
+The sentiment analys is based on the abstracts of the works. As not all works have abstracts, the number of datapoints in the sentiment analysis is smaller then the number of works. The sentiment analysis was implemented using the [Python NLTK package](https://www.nltk.org/), and [VADER](https://www.nltk.org/_modules/nltk/sentiment/vader.html) which is an NLTK module that provides sentiment scores based on the words used. Vader is a pre-trained, rule-based sentiment analysis model in which the terms are generally labeled as per their semantic orientation as either positive or negative. The main advantage/reason for using this model was that it doesn't require a labelled training dataset.

It returns a positive, negative, neutral and compound score. The compound score is a composite score that summarizes the overall sentiment of the text, where scores close to 1 indicate a positive sentiment, scores close to -1 indicate a negative sentiment, and scores close to 0 indicate a neutral sentiment. The other three scores show the percentage of each of the sentiments in the text.
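
As a rough illustration of how a compound score in the range -1 to 1 can arise from word-level scores, here is a minimal Python sketch. It is not the code used for this analysis. It assumes the normalisation found in the reference VADER implementation (a raw valence sum `x` mapped to `x / sqrt(x^2 + alpha)` with `alpha = 15`) and the commonly used cut-offs of 0.05 and -0.05 for labelling a text. The function names `normalize_compound` and `label_sentiment` are hypothetical and do not appear in this repository.

```python
import math


def normalize_compound(raw_sum: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of word valences into the range [-1, 1].

    Assumption: same form as the normalisation in the reference VADER code.
    """
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clip defensively so rounding can never push the score out of range.
    return max(-1.0, min(1.0, score))


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Label a compound score; the 0.05 cut-off is an assumed convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Illustrative raw valence sums, not values from the corpus.
    for raw in (-4.0, 0.0, 4.0):
        compound = normalize_compound(raw)
        print(raw, round(compound, 2), label_sentiment(compound))
```

Under these assumptions, a raw valence sum of 4.0 gives a compound score of about 0.72, which falls in the positive range, while a sum of 0.0 stays at 0 and is labelled neutral.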

@@ -2121,7 +2121,7 @@ if (length(list.files(path = file.path("ch2_technology", "figures"), pattern = "
```


-## Extract `marine papers with sentiment scores
+## Extract marine papers with sentiment scores

```{r}
#| label: extract_marine_papers
@@ -2291,7 +2291,7 @@ readRDS(params$fn_sentiment_results) |>
IPBES.R::table_dt(fn = "sentiment_scores", fixedColumns = list(leftColumns = 2))
```

-Here is the per country table
+Here is the sentiments per country table

```{r}
#| label: sentiments_countries_table
@@ -2309,14 +2309,14 @@ This graphs shows the sentiment scores of the sentiment analysis over time.

![](`r file.path("ch2_technology", "figures", "sentiments_over_time.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_over_time.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_over_time.pdf")`){target="_blank"}


For clarity, here only the positive and egative sentiments.

![](`r file.path("ch2_technology", "figures", "sentiments_over_time_neg_pos.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_over_time_neg_pos.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_over_time_neg_pos.pdf")`){target="_blank"}


### Negative Sentiment
@@ -2328,18 +2328,18 @@ This graphs shows the **negative score** of the sentiment analysis over time. It

![](`r file.path("ch2_technology", "figures", "sentiments_neg_over_time.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_neg_over_time.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_neg_over_time.pdf")`){target="_blank"}


#### Per country

![](`r file.path("ch2_technology", "maps", "sentiment_neg_per_countries_all.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neg_per_countries_all.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neg_per_countries_all.pdf")`){target="_blank"}

![](`r file.path("ch2_technology", "maps", "sentiment_neg_per_countries_10.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neg_per_countries_10.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neg_per_countries_10.pdf")`){target="_blank"}


### Neutral Sentiment
@@ -2352,19 +2352,19 @@ This graphs shows the **compound score** of the sentiment analysis over time. It

![](`r file.path("ch2_technology", "figures", "sentiments_neu_over_time.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_neu_over_time.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_neu_over_time.pdf")`){target="_blank"}


#### Per country


![](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_all.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_all.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_all.pdf")`){target="_blank"}

![](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_10.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_10.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_10.pdf")`){target="_blank"}



@@ -2377,17 +2377,17 @@ This graphs shows the **compound score** of the sentiment analysis over time. It

![](`r file.path("ch2_technology", "figures", "sentiments_pos_over_time.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_pos_over_time.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_pos_over_time.pdf")`){target="_blank"}

#### Per country

![](`r file.path("ch2_technology", "maps", "sentiment_pos_per_countries_all.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_pos_per_countries_all.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_pos_per_countries_all.pdf")`){target="_blank"}

![](`r file.path("ch2_technology", "maps", "sentiment_pos_per_countries_10.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_10.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_neu_per_countries_10.pdf")`){target="_blank"}



@@ -2400,18 +2400,18 @@ This graphs shows the **compound score** of the sentiment analysis over time. It

![](`r file.path("ch2_technology", "figures", "sentiments_comp_over_time.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_comp_over_time.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_comp_over_time.pdf")`){target="_blank"}

#### Per country


![](`r file.path("ch2_technology", "maps", "sentiment_comp_per_countries_all.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_comp_per_countries_all.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_comp_per_countries_all.pdf")`){target="_blank"}

![](`r file.path("ch2_technology", "maps", "sentiment_comp_per_countries_10.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_comp_per_countries_10.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "maps", "sentiment_comp_per_countries_10.pdf")`){target="_blank"}

### Marine Vision and Technology Sentiments

@@ -2421,14 +2421,14 @@ This graphs shows the **positive and negative score** of the sentiment analysis

![](`r file.path("ch2_technology", "figures", "sentiments_marine_over_time_neg_pos.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_marine_over_time_neg_pos.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_marine_over_time_neg_pos.pdf")`){target="_blank"}

![](`r file.path("ch2_technology", "figures", "sentiments_marine_over_time_neg_pos_ridge.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_marine_over_time_neg_pos_ridge.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_marine_over_time_neg_pos_ridge.pdf")`){target="_blank"}

#### Download of Corpus
-An Excel file conataining the works from the Narine subset of the Technology Corpus (`technology` AND `vision` AND `nature` AND `transformativechange` AND `marine`) with the fields `id`, `doi`, `author_abbr` and `abstract` as well as the sentiment scores of the papers.
+An Excel file containing the works from the Marine subset of the Technology Corpus (`technology` AND `vision` AND `nature` AND `transformativechange` AND `marine`) with the fields `id`, `doi`, `author_abbr` and `abstract` as well as the sentiment scores of the papers.

There are technical issues with this at the moment.

@@ -2442,4 +2442,4 @@ The Excel file with all papers can be downloaded from [here](`r params$fn_marin

![](`r file.path("ch2_technology", "figures", "sentiments_over_time_neg_pos_ridge.png")`)

-To download high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_over_time_neg_pos_ridge.pdf")`){target="_blank"}
+To download in high resolution, [click here](`r file.path("ch2_technology", "figures", "sentiments_over_time_neg_pos_ridge.pdf")`){target="_blank"}