-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to read and merge gender data to author data #216
Conversation
Add the function 'read.gender' to read gender data that contains 2 columns: name;gender Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add the base functions 'get.data.path.gender', 'get.gender', 'set.gender', and 'update.gender.data' to manage gender data Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add gender related configurations to 'util-conf.R' file Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add test, test data, and sample data to test the 'read.gender' function. Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your nice additions to coronet
@miriyusifli! I have had a quick look at your changes and they look good in general. However, I have identified a couple of inconsistencies and minor issues which have to be fixed. Please have a look at my detailed comments below.
In addition, the documentation of your changes in several README files is missing. So, please consider the following comments:
coronet/tests/README.md
: You have added gender data to one of these projects, so please mention this here (directly below bots).README.md
: Please update the readme regarding the new data source. That is, update sectionData sources
and enhance the list ofadditional (orthogonal) data sources
by adding a new list item between "commit messages" and "PaStA", describing the gender data. In addition, there is also a section "Configuration Classes" containing a subsection "ProjectConf", containing a subsection "(Configurable) Data-retrieval-related parameters". Please add the gender parameter in this sub-sub-section (at the same place, i.e., the same order, as in the code, see also the comment below.)
In case of any questions (i.e., if something is unclear to you), please don't hesitate to ask!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all thanks for the nice changed @miriyusifli.
I have not found any new issues here as of now (except for the failing test which has to be fixed). But I totally agree with @bockthom on the points he raised. So please change this.
Add description of new functionalities related to gender data to the NEWS.md Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Include a folder 'test_empty_gender' with an empty 'gender' file in it in the test data folder. Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Move functions related to gender data to the 'additional data sources' section to keep a meaningful order of the sections. Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Remove the sentence regarding the form of the gender file Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder functions in util-read.R file to keep meaningful order of the functions. Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder attributes in the util-conf.R to keep the meaningfull order Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder data paths in util-conf.R to keep meaningfull order Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Refactor read.gender function to reduce complexity removing separators and etc. Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Improvements: - Reordering functions - Reordering configuration parameters - Fixing typos in comments Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Use empty function to check if the gender data frame is empty. Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add "cleanup.geder.data" function to remove all lines from the gender data that contain author names that do not appear in the author data Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder tests in "test-read.R" file Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
464ad0f
to
1efb8fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miriyusifli Thanks a lot for your changes, this PR looks quite good now.
I had a look at your changes and spotted a few newly introduced inconsistencies which should be easy to fix, please find my comments below. In case of questions, don't hesitate to ask.
We are really close to finishing this PR, not much work left 😄
(Regarding the failing CI test regarding eigenvector centrality: Don't worry about that, @hechtlC is currently figuring out why this test fails. We'll let you know as soon as we have identified what's going wrong there. This should be independent from your changes. Hence, you can continue working on the remaining issues of your PR.)
c582e2d
to
caa08e3
Compare
- Add gender data to ProjectData comparison tests in test-data.R - Rename gender files from 'gender' to 'gender.list' Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Added explanation about fixed errors to NEWS.md Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
- Add gender data to the tests/README.md - Add gender data to the README.md Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Since igraph has changed the calculation of eigenvector centrality with version '1.2.7', the test for this had to be changed to the correct values. Also add a warning message in the install.R script that we recommend igraph version 1.2.7 or higher. Also document this in the README. Signed-off-by: Christian Hechtl <hechtl@cs.uni-saarland.de> Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de> Applied-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
As documented in se-sic#214 we have to make the deleted user to lowercase as this user can appear in all variations of upper and lowercase letters. So by making this lowercase, we fix this. Signed-off-by: Christian Hechtl <hechtl@cs.uni-saarland.de> Signed-off-by: Thomas Bock <bockthom@cs.uni-saarland.de> Applied-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Add gender data to necessary additional resource lists to pass the tests successfully Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
- Allow only 'male', 'female', 'unknown' lables for the gender column. - Replace all undefined lables with 'unknown' - Replace all 'unknown' lables with NA Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Update gender test and its data because of predefined labels Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Reorder functions and fix small formatting issues in util-data.r Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miriyusifli Great, I like your new changes. There are just three minor issues left (one I have overlooked yesterday, and two newly introduced ones), and the two comments you replied to. Please see my answers below.
As soon as the six remaining issues are addressed, we are ready to merge. Thanks for your patience.
- Remove rownames while reading gender data - Fix some formatting issues Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
There are no edge attributes for gender. Therefore, edit information about gender data in README.md Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
Update broken commit hashes related gender data in NEWS.md Signed-off-by: Mirabdulla Yusifli <s8miyusi@stud.uni-saarland.de>
@bockthom Done :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @miriyusifli. Thanks for your endurance and thanks for contributing to coronet
. Good work, this is a nice and beneficial enhancement for coronet
.
We (@hechtlC and me) will merge this PR soon (and deal with the CI failure regarding eigenvector-centrality afterwards, as this failure is not related to your changes but affects coronet
in general). So, now you're really done with this PR 🥳
We wish you all the best for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @miriyusifli for the endurance while working on this PR! Ans also thanks for the nice additions to coronet.
@bockthom you can merge this once you are ready.
Prerequisites
showcase.R
with respect to my changes.dev
.Description
The pull request adds functionality to read and merge gender data to the author data
Changelog
util-conf.R
file (ebe4899)read.gender
to read the gender data (27bf762)get.data.path.gender
,get.gender
,set.gender
, andupdate.gender.data
to manage the gender data (20a4e90)read.gender
function (9c9d261)