
Question: is there / will there be an implementation of Relative Weights Analysis? #113

Open
martinctc opened this issue Dec 27, 2020 · 2 comments


@martinctc

Hi there,

Thank you for developing this wonderful package.

There is a method of estimating variable importance called relative weights analysis (RWA), implemented by LeBreton and Tonidandel (2014), that yields results very similar to Shapley values. I wanted to ask whether this method is currently available in the package and, if not, whether there is a roadmap to implement it.

I've previously written a stand-alone package called rwa, which implements this method (a wrapper around the RWA Web code) and is also available on CRAN. I would love any opportunity to collaborate or combine efforts on this.

Here are a few other articles that compare the results of this RWA method with Shapley, mainly in terms of performance.

Thanks,
Martin

@bgreenwell
Member

Hi @martinctc, thanks for reaching out! A while back I started an issue about related methodologies for linear regression that I plan on adding in the next release (see the paper linked to in that issue). However, I'm a bit skeptical about RWA (see, for example, this paper). That said, all variable importance measures are flawed in some way, so that doesn't necessarily mean it wouldn't be a valuable addition. Always happy to collaborate!

I need to do more digging, but I'm curious if there's a connection between the LMG method in package relaimpo and the RWA approach?!
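For reference, LMG (as described for relaimpo) averages, over all orderings of the predictors, the increase in R² when each predictor enters the model. This is the Shapley value with R² as the payoff. The sketch below is a brute-force NumPy version written only for illustration: `lmg` and `r_squared` are hypothetical names, not relaimpo's API or code. It evaluates the R² of every subset of predictors, which is exactly where the cost explodes as predictors are added. Like RWA's raw weights, the resulting shares sum to the full-model R², which is one natural point of comparison between the two methods.

```python
import numpy as np
from itertools import combinations
from math import factorial


def r_squared(X, y, cols):
    """OLS R-squared (with intercept) using only the predictor columns in `cols`."""
    if not cols:
        return 0.0  # the empty model explains nothing
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def lmg(X, y):
    """Hypothetical brute-force LMG / Shapley decomposition of R-squared."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    cache = {}

    def value(subset):
        # Cache R-squared per subset; each subset is fitted only once.
        key = tuple(sorted(subset))
        if key not in cache:
            cache[key] = r_squared(X, y, key)
        return cache[key]

    shares = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight = fraction of orderings in which exactly `size`
            # predictors enter before predictor j.
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            for s in combinations(others, size):
                shares[j] += w * (value(s + (j,)) - value(s))
    return shares
```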

@martinctc
Author

martinctc commented Jan 3, 2021

Thanks @bgreenwell!

I've used relaimpo previously, but it does not scale well as the number of predictors grows, which was the reason Tonidandel and LeBreton argued for RWA as a superior method. I've also used a proprietary analysis tool called Q; I believe that in some version of it, the Shapley method (which I understand to be the same as LMG in relaimpo) automatically switches to RWA once a certain number of variables is added. Here's a long (sorry) paragraph from the documentation, but I found it a useful summary of the difference in performance:

Where there are more independent variables the maths is the same, but we need to compute the average across more orderings (e.g., if we have 10 independent variables then we need to compute the R-squares across 1024 regressions). For this reason, each additional variable that is included slows down the computation of the Shapley value. For cases where there are more than 15 independent variables, it is suggested to use Relative Importance Analysis as it runs in a reasonable length of time, in contrast to Shapley, which could take a few minutes to a few hours. Furthermore, the computed Shapley Importance and Relative Importance Analysis yield highly similar results. The user will be prompted if they wish to conduct a Relative Importance Analysis instead in these cases except if more than 27 independent variables are requested. It is not possible to compute a Shapley Importance Analysis when more than 27 independent variables are provided due to limitations of the algorithm. In such scenarios, the Shapley Importance will automatically be converted into a Relative Importance Analysis since Shapley is not possible to compute.
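The growth described in the quoted passage is exponential: an exact Shapley/LMG decomposition needs the R² of every subset of the p predictors, i.e. 2^p regressions (counting the empty model). The snippet below just tabulates that count for the predictor numbers mentioned in the quote; `regressions_needed` is a hypothetical helper name.

```python
def regressions_needed(p: int) -> int:
    """Number of subset regressions for an exact Shapley/LMG decomposition."""
    # Every subset of the p predictors, including the empty model.
    return 2 ** p


for p in (10, 15, 27):
    print(p, regressions_needed(p))
# 10 1024
# 15 32768
# 27 134217728
```

By contrast, RWA needs a single eigendecomposition of the p × p predictor correlation matrix, so its cost grows polynomially rather than exponentially in p.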

I also love how your README lays out the four types of variable importance measures, i.e. model-based, permutation-based, Shapley-based, and variance-based. I suspect RWA might fit into the model-based group, because it effectively creates orthogonal variables that are maximally related to the original predictors and then applies regression.
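To make that description concrete: RWA replaces the correlated predictors with their closest orthogonal counterparts Z = X·Rxx^(-1/2), regresses y on Z, and maps each squared coefficient back to the original predictors through the squared loadings Rxx^(1/2). The sketch below is a minimal NumPy version that works from the correlation matrix and assumes it is non-singular. `relative_weights` is a hypothetical name, and this is not the code used in rwa or RWA Web. The raw weights sum to the model R², and the rescaled weights express each share as a percentage of it.

```python
import numpy as np


def relative_weights(X, y):
    """Hypothetical sketch of relative weights analysis from correlations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Correlation matrix of [X, y]; the last row/column belongs to y.
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    rxx = R[:-1, :-1]
    rxy = R[:-1, -1]
    # Symmetric square root of Rxx: the correlations between X and the
    # orthogonal variables Z that are closest to X.
    evals, evecs = np.linalg.eigh(rxx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    # Standardized coefficients of y regressed on Z (Z is uncorrelated,
    # so they equal corr(Z, y) = Lambda^{-1} rxy).
    beta = np.linalg.solve(lam, rxy)
    # Raw weight of predictor j: sum over k of lam[j, k]^2 * beta[k]^2.
    raw = (lam ** 2) @ (beta ** 2)
    rescaled = 100.0 * raw / raw.sum()
    return raw, rescaled
```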

Let me have a read of the paper! I would argue that it is helpful to provide choice to the user (so outputs can be compared across methods), unless the intention is for the package to be opinionated and opt for a 'less is more' design.

I'd love to collaborate either way. In your opinion, what would be the best way to implement this method in your package? Would it be to create a wrapper around rwa?
