Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to read and merge gender data to author data #216

Merged
merged 29 commits into from
Dec 21, 2021

Conversation

miriyusifli
Copy link

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This includes also, e.g., referencing to relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • I have checked whether I need to adjust the showcase file showcase.R with respect to my changes.
  • The pull request is opened against the branch dev.

Description

The pull request adds functionality to read and merge gender data to the author data

Changelog

  • Add gender related configurations to util-conf.R file (ebe4899)
  • Add function read.gender to read the gender data (27bf762)
  • Add the base functions get.data.path.gender, get.gender, set.gender, and update.gender.data to manage the gender data (20a4e90)
  • Add test, test data, and sample data to test the read.gender function (9c9d261)

Mirabdulla Yusifli added 3 commits October 27, 2021 08:21
Add the function 'read.gender' to read gender data that contains
2 columns: name;gender

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add the base functions 'get.data.path.gender', 'get.gender', 'set.gender',
and 'update.gender.data' to manage gender data

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add gender related configurations to 'util-conf.R' file

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
@miriyusifli miriyusifli deleted the dev branch October 27, 2021 06:33
@miriyusifli miriyusifli restored the dev branch October 27, 2021 06:34
@miriyusifli miriyusifli reopened this Oct 27, 2021
Add test, test data, and sample data to test the 'read.gender' function.

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your nice additions to coronet @miriyusifli! I have had a quick look at your changes and they look good in general. However, I have identified a couple of inconsistencies and minor issues which have to be fixed. Please have a look at my detailed comments below.

In addition, the documentation of your changes in several README files is missing. So, please consider the following comments:

  1. coronet/tests/README.md: You have added gender data to one of these projects, so please mention this here (directly below bots).
  2. README.md: Please update the readme regarding the new data source. That is, update section Data sources and enhance the list of additional (orthogonal) data sources by adding a new list item between "commit messages" and "PaStA", describing the gender data. In addition, there is also a section "Configuration Classes" containing a subsection "ProjectConf", containing a subsection "(Configurable) Data-retrieval-related parameters". Please add the gender parameter in this sub-sub-section (at the same place, i.e., the same order, as in the code, see also the comment below.)

In case of any questions (i.e., if something is unclear to you), please don't hesitate to ask!

util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-conf.R Outdated Show resolved Hide resolved
util-conf.R Show resolved Hide resolved
tests/codeface-data/results/testing/test_gender/gender Outdated Show resolved Hide resolved
tests/test-read.R Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
Copy link
Contributor

@hechtlC hechtlC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thanks for the nice changed @miriyusifli.
I have not found any new issues here as of now (except for the failing test which has to be fixed). But I totally agree with @bockthom on the points he raised. So please change this.

util-data.R Outdated Show resolved Hide resolved
Mirabdulla Yusifli added 13 commits December 9, 2021 12:13
Add description of new functionalities related to gender data to the
NEWS.md

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Include a folder 'test_empty_gender' with an empty 'gender' file in it in the test data folder.

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Move functions related to gender data to the 'additional data sources'
section to keep a meaningful order of the sections.

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Remove the sentence regarding the form of the gender file

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder functions in util-read.R file to keep meaningful order of the
functions.

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder attributes in the util-conf.R to keep the meaningfull order

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder data paths in util-conf.R to keep meaningfull order

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Refactor read.gender function to reduce complexity removing separators
and etc.

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Improvements:
 - Reordering functions
 - Reordering configuration parameters
 - Fixing typos in comments

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Use empty function to check if the gender data frame is empty.

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add "cleanup.geder.data" function to remove all lines from the gender
data that contain author names that do not appear in the author data

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder tests in "test-read.R" file

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miriyusifli Thanks a lot for your changes, this PR looks quite good now.
I had a look at your changes and spotted a few newly introduced inconsistencies which should be easy to fix, please find my comments below. In case of questions, don't hesitate to ask.
We are really close to finishing this PR, not much work left 😄

(Regarding the failing CI test regarding eigenvector centrality: Don't worry about that, @hechtlC is currently figuring out why this test fails. We'll let you know as soon as we have identified what's going wrong there. This should be independent from your changes. Hence, you can continue working on the remaining issues of your PR.)

util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
util-data.R Show resolved Hide resolved
util-data.R Show resolved Hide resolved
tests/test-data.R Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
tests/test-read.R Outdated Show resolved Hide resolved
Mirabdulla Yusifli added 2 commits December 15, 2021 07:59
- Add gender data to ProjectData comparison tests in test-data.R
- Rename gender files from 'gender' to 'gender.list'

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Added explanation about fixed errors to NEWS.md

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Mirabdulla Yusifli and others added 7 commits December 15, 2021 08:05
- Add gender data to the tests/README.md
- Add gender data to the README.md

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Since igraph has changed the calculation of eigenvector centrality with
version '1.2.7', the test for this had to be changed to the correct
values. Also add a warning message in the install.R script that we
recommend igraph version 1.2.7 or higher. Also document this in the
README.

Signed-off-by: Christian Hechtl <hechtl@cs.uni-saarland.de>
Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>

Applied-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
As documented in se-sic#214 we have to make the deleted user to lowercase as
this user can appear in all variations of upper and lowercase letters.
So by making this lowercase, we fix this.

Signed-off-by: Christian Hechtl <hechtl@cs.uni-saarland.de>
Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de>

Applied-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add gender data to necessary additional resource lists to pass
the tests successfully

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
 - Allow only 'male', 'female', 'unknown' lables for the gender column.
 - Replace all undefined lables with 'unknown'
 - Replace all 'unknown' lables with NA

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Update gender test and its data because of predefined labels

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder functions and fix small formatting issues in util-data.r

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miriyusifli Great, I like your new changes. There are just three minor issues left (one I have overlooked yesterday, and two newly introduced ones), and the two comments you replied to. Please see my answers below.

As soon as the six remaining issues are addressed, we are ready to merge. Thanks for your patience.

README.md Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
util-read.R Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
Mirabdulla Yusifli added 2 commits December 19, 2021 07:05
- Remove rownames while reading gender data
- Fix some formatting issues

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
There are no edge attributes for gender. Therefore,
edit information about gender data in README.md

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
@miriyusifli
Copy link
Author

miriyusifli commented Dec 19, 2021

@bockthom and @hechtlC all requested changes have been applied. I think PR is ready to be merged.

NEWS.md Outdated Show resolved Hide resolved
Update broken commit hashes related gender data in NEWS.md

Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
@miriyusifli
Copy link
Author

@bockthom Done :)

Copy link
Collaborator

@bockthom bockthom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @miriyusifli. Thanks for your endurance and thanks for contributing to coronet. Good work, this is a nice and beneficial enhancement for coronet.

We (@hechtlC and me) will merge this PR soon (and deal with the CI failure regarding eigenvector-centrality afterwards, as this failure is not related to your changes but affects coronet in general). So, now you're really done with this PR 🥳

We wish you all the best for the future.

Copy link
Contributor

@hechtlC hechtlC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @miriyusifli for the endurance while working on this PR! Ans also thanks for the nice additions to coronet.

@bockthom you can merge this once you are ready.

@bockthom bockthom merged commit 4eec43c into se-sic:dev Dec 21, 2021
@bockthom bockthom mentioned this pull request Apr 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants