Add functionality to read and merge gender data to author data #216

Merged
merged 29 commits into from
Dec 21, 2021
29 commits
8868ff4
Add function 'read.gender'
Oct 26, 2021
bfbe4de
Add functions to merge gender data to author data
Oct 26, 2021
0a23862
Add gender related configuration attributes
Oct 26, 2021
a7744b5
Add a test and sample data for the 'read.gender'
Oct 26, 2021
1eeca29
Add new changes regarding gender data to NEWS.md
Oct 26, 2021
6a50fd1
Add a folder 'test_empty_gender' in the test data
Nov 5, 2021
53ef8cd
Place gender section below author section
Nov 24, 2021
d795dac
Remove unnecessary comment regarding gender file
Nov 24, 2021
faa5b34
Reorder functions in util-read.R
Nov 24, 2021
15edca6
Reorder attributes in the util-conf.R
Nov 24, 2021
2f8480b
Reorder data paths in util-conf.R
Nov 24, 2021
c332c91
Refactor read.gender function to reduce complexity
Nov 24, 2021
5c50742
Refactor functions related to gender in util-data
Nov 24, 2021
cf2cce8
Replace is.null with empty for gender data
Dec 7, 2021
cd99e5c
Add ".list" ending to the gender file
Dec 7, 2021
413e24c
Add "cleanup.gender.data" function
Dec 7, 2021
cbcd552
Reorder tests in "test-read.R" and fix typo
Dec 8, 2021
39db315
Add gender data to ProjectData comparison tests
Dec 9, 2021
85c3056
Add info about fixed errors to NEWS.md
Dec 9, 2021
bc30c40
Add gender data to the necessary section of README
Dec 9, 2021
25fb862
Fix failing test due to updated igraph calculation
hechtlC Nov 30, 2021
1b4072c
Fix filtering of the deleted user
hechtlC Nov 30, 2021
c3ada92
Add gender to necessary additional resource lists
Dec 13, 2021
1e4026d
Restrict gender labels by predefined lables
Dec 14, 2021
be56183
Update gender test because of predefined lables
Dec 14, 2021
4d631bd
Reorder functions in util-data.r
Dec 15, 2021
8769ccf
Remove rownames while reading gender data
Dec 19, 2021
50292cb
Edit information about gender data in README.md
Dec 19, 2021
17811be
Update broken commit hashes in NEWS.md
Dec 19, 2021
67 changes: 67 additions & 0 deletions util-read.R
@@ -21,6 +21,7 @@
## Copyright 2018-2019 by Anselm Fehnker <fehnker@fim.uni-passau.de>
## Copyright 2020-2021 by Niklas Schneider <s8nlschn@stud.uni-saarland.de>
## Copyright 2021 by Johannes Hostert <s8johost@stud.uni-saarland.de>
## Copyright 2021 by Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
## All Rights Reserved.

## Note:
@@ -387,6 +388,64 @@ read.issues = function(data.path, issues.sources = c("jira", "github")) {
}


## * Gender data ------------------------------------------------------------

## column names of a dataframe containing gender data (see function \code{read.gender})
GENDER.LIST.COLUMNS = c(
    "author.name", "gender"
)

## declare the datatype for each column in the constant 'GENDER.LIST.COLUMNS'
GENDER.LIST.DATA.TYPES = c(
    "character", "character"
)

#' Read and parse the gender data from the 'gender' file.
#' Each line in the file has the form: author.name;gender
#' The parsed form is a data frame with author.name as key, gender as value.
#'
#' @param data.path the path to the gender data
#'
#' @return the read and parsed gender data
read.gender = function(data.path) {
    # constant for separating key and value
    SEPERATOR = ";"

    ## get file name of gender data
    filepath = file.path(data.path, "gender")

    ## read data from disk [can be empty]
    lines = suppressWarnings(try(readLines(filepath), silent = TRUE))

    ## handle the case if the list of items is empty
    if (inherits(lines, "try-error")) {
        logging::logwarn("There are no gender data available for the current environment.")
        logging::logwarn("Datapath: %s", data.path)
        return(create.empty.gender.list())
    }

    result.list = parallel::mcmapply(lines, seq_along(lines), SIMPLIFY = FALSE, FUN = function(line, line.id) {
        if ( nchar(line) == 0 ) {
            return(NULL)
        }

        # 1) split key
        # 2) split value
        line.split = unlist(strsplit(line, SEPERATOR))
        key = line.split[1]
        value = line.split[2]

        # Transform data to data.frame
        df = merge(key, value)
        colnames(df) = c("author.name", "gender")
        return(df)
    })

    result.df = plyr::rbind.fill(result.list)
    logging::logdebug("read.gender: finished.")
    return(result.df)
}
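
For readers who want to try the parsing step in isolation, here is a minimal base-R sketch of what `read.gender` does with the file contents: drop empty lines, split each remaining line at `;`, and collect the pieces into a two-column data frame. The helper name `parse.gender.lines`, the sample author names, and the gender labels are made up for illustration and are not part of this pull request; the sketch also leaves out the `parallel`, `plyr`, and `logging` calls used in the diff.

```r
## Hypothetical helper (not part of the pull request): parse lines of the
## form "author.name;gender" into a data frame with two character columns.
parse.gender.lines = function(lines, separator = ";") {
    ## drop empty lines, mirroring the 'nchar(line) == 0' check in the diff
    lines = lines[nchar(lines) > 0]

    ## nothing left: return an empty data frame with the expected columns
    if (length(lines) == 0) {
        return(data.frame(author.name = character(0), gender = character(0),
                          stringsAsFactors = FALSE))
    }

    ## split every line at the separator; the first piece is the key,
    ## the second piece is the value (NA if the line has no second piece)
    parts = strsplit(lines, separator, fixed = TRUE)
    data.frame(
        author.name = vapply(parts, function(p) p[1], character(1)),
        gender = vapply(parts, function(p) p[2], character(1)),
        stringsAsFactors = FALSE
    )
}

## sample file content, invented for this example
gender.file = tempfile()
writeLines(c("Alice Example;female", "", "Bob Example;male"), gender.file)

gender.data = parse.gender.lines(readLines(gender.file))
print(gender.data)
```

Building the columns with `vapply` avoids creating one small data frame per line, which is a design choice of this sketch and not a claim about how the merged code works.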

#' Create an empty dataframe which has the same shape as a dataframe containing issues. The dataframe has the column
#' names and column datatypes defined in \code{ISSUES.LIST.COLUMNS} and \code{ISSUES.LIST.DATA.TYPES}, respectively.
#'
@@ -395,6 +454,14 @@ create.empty.issues.list = function() {
    return (create.empty.data.frame(ISSUES.LIST.COLUMNS, ISSUES.LIST.DATA.TYPES))
}

#' Create an empty dataframe which has the same shape as a dataframe containing gender data.
#' The dataframe has the column names and column datatypes defined in \code{GENDER.LIST.COLUMNS}
#' and \code{GENDER.LIST.DATA.TYPES}, respectively.
#'
#' @return the empty dataframe
create.empty.gender.list = function() {
    return (create.empty.data.frame(GENDER.LIST.COLUMNS, GENDER.LIST.DATA.TYPES))
}
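
The helper `create.empty.data.frame` is defined elsewhere in the repository and is not shown in this diff. As a rough idea of what such a helper could look like, the following sketch builds a zero-row data frame from a vector of column names and a matching vector of type names. The function name `make.empty.data.frame` is a hypothetical stand-in, and the real implementation may differ.

```r
## Hypothetical stand-in for the repository's 'create.empty.data.frame';
## the actual implementation is not part of this diff and may differ.
make.empty.data.frame = function(columns, data.types) {
    stopifnot(length(columns) == length(data.types))

    ## one zero-length vector per column, each with the requested type
    cols = lapply(data.types, function(type) vector(mode = type, length = 0))
    names(cols) = columns

    as.data.frame(cols, stringsAsFactors = FALSE)
}

## same column names and types as GENDER.LIST.COLUMNS / GENDER.LIST.DATA.TYPES
empty.gender = make.empty.data.frame(
    c("author.name", "gender"),
    c("character", "character")
)
str(empty.gender)
```

Returning a typed, zero-row data frame lets callers treat the "no gender file" case like any other result, without checking for `NULL` first.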

## / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
## Additional data sources -------------------------------------------------