Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making network_plot more robust #107

Merged
merged 11 commits into from
Jul 26, 2020
Merged

Making network_plot more robust #107

merged 11 commits into from
Jul 26, 2020

Conversation

thisisdaryn
Copy link
Collaborator

Addresses #106

If the call to stats::cmdscale internal to network_plot does not return a matrix of 2 columns, uniform shifts are added to the off-diagonal entries of the distance matrix and another attempt is made. The search for the shift value iterates through a fixed list of possible values, 10^(-6:-1), before terminating.

The error in the #106 arises at the point of trying to rename the first 2 columns of a matrix that has only one column.

Warnings from stats::cmdscale are not suppressed but they could be. As is, the end-user of network plot will see warnings for each call that does not return a matrix of 2 columns.

Test running with the data of #106 (The variables in the example data are highly-correlated):

> camas_desempleo %>%
+   select(-CCAA) %>%
+   correlate() %>%
+   network_plot()

Correlation method: 'pearson'
Missing treated using: 'pairwise.complete.obs'

Warning messages:
1: In stats::cmdscale(distance) :
  only 1 of the first 2 eigenvalues are > 0
2: In stats::cmdscale(shifted_distance) :
  only 1 of the first 2 eigenvalues are > 0
3: In stats::cmdscale(shifted_distance) :
  only 1 of the first 2 eigenvalues are > 0
4: In stats::cmdscale(shifted_distance) :
  only 1 of the first 2 eigenvalues are > 0
5: In stats::cmdscale(shifted_distance) :
  only 1 of the first 2 eigenvalues are > 0
6: In stats::cmdscale(shifted_distance) :
  only 1 of the first 2 eigenvalues are > 0

plot106

@juliasilge
Copy link
Member

This looks really good, but I think we probably should give a more user-friendly warning about what is happening and why. Can you check out the tidyverse style guide on error messages and add a different warning for these shifts? We should probably use rlang::warn().

Copy link
Collaborator Author

@thisisdaryn thisisdaryn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning messages modified.

> camas_desempleo %>% 
+   select(-CCAA) %>%
+   correlate() %>%
+   network_plot()

Correlation method: 'pearson'
Missing treated using: 'pairwise.complete.obs'

Warning messages:
1: Matrix of coordinates has dimension < 2 
2: Matrix of coordinates has dimension < 2 
3: Matrix of coordinates has dimension < 2 
4: Matrix of coordinates has dimension < 2 
5: Matrix of coordinates has dimension < 2 
6: Matrix of coordinates has dimension < 2 

updated_network_plot

@juliasilge
Copy link
Member

This is looking great! Let's do just one more little round of edits on the warning message itself. I suspect a typical user may think, "Wait, what has gone wrong?! I didn't give this function any matrix of coordinates???" 🤔

What do you think could be a good option for a slightly revamped message?

Coordinates created for plotting have dimension < 2.
Correlations have been shifted numerically to facilitate plotting.
Plot coordinates derived from correlation matrix have dimension < 2.
`correlate()` output has been adjusted to facilitate plotting.

Other ideas?

@thisisdaryn
Copy link
Collaborator Author

Idea: to tweak the latter option

Plot coordinates derived from correlation matrix have dimension < 2.
Pairwise distances have been adjusted to facilitate plotting.

This is a tricky question because the quantity being changed was an approximation anyway.

I don't think it is correct to say either that the correlations have been shifted or that correlate() output has been adjusted. The correlations are being represented in the plot in 2 ways:

  1. The correlations are mapped to the colors of the edges. This mapping is precise and specified in the plot's legend and is not affected by applying the shifts.
  2. The distance between the nodes is based on a function of the correlation. These distances would almost certainly have been approximations even in cases where we don't have to apply these shifts. The shifts increase these distances.

What we are really trying to convey is that the distances between the nodes have knowingly been increased in this instance.

@juliasilge
Copy link
Member

I think that is fantastic. 👍 Let's go with that option.

@thisisdaryn
Copy link
Collaborator Author

Question: how important is it that we pass along/translate warnings from the base R function call internal to the function?

The latest iteration of the PR suppresses a warning from a base R function and paraphrases it in the hope of being more helpful to the user who didn't call that function. So the number of warnings that get to the end-user is still the same as the total number of warnings i.e. one for each time a shift is applied (6 in this case).

Is it an option to provide a single warning to indicate that one or more attempts were made to find a shift distance that works? I think this may be possible to articulate without detracting from the user's ability to locate the error.

Copy link
Collaborator Author

@thisisdaryn thisisdaryn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning messages changed.

@juliasilge
Copy link
Member

Excellent! 🙌

I see what you're saying about the repetition of the messages. My feeling is that it is not that helpful for the user to see the same warning a bunch of times in this situation. What do you think about only showing the user the warning once, when one or more shifts has been changed?

thisisdaryn and others added 2 commits July 26, 2020 10:22
Relaying a single warning message to indicate that the distances have been distorted further for the purpose of placing the points in the network plot. Previous iteration of the code raised a warning each time a shift distance was tried.
@juliasilge
Copy link
Member

Excellent! 🙌

Thank you so much for your work on this.

@juliasilge
Copy link
Member

macOS R devel GH action builds are all broken right now, FWIW.

@juliasilge juliasilge merged commit 65872a6 into tidymodels:master Jul 26, 2020
@github-actions
Copy link

github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants