reproducibility and estimated measurements #24
Hi @parmentelat,

Sure! Some of the experiments with leave-one-out cross-validation (LOOCV) would take extremely long (up to several decades in one instance) to run to completion, especially the ones using the scikit-learn implementation. For this reason, I opted to let the experiments run for a while and then use the ratio of (time passed / cross-validation splits completed) to forecast how much time would pass if I had let the entire experiment run to completion. This is a sensible thing to do as the splits are balanced; thus, each split is expected to take the same amount of time to complete. I waited until some multiple of

Automation of this approach may be feasible for the regular CPU implementations by emptying some of cv=KFold() (currently defined on line 288 in timings.py).

Automation using the fast cross-validation implementation is likely not necessary as it is extremely fast, even for LOOCV with 1e6 samples. However, it can likely be supported by simply removing some multiple of

Automation using the JAX implementations can probably be achieved in the same manner as for the fast cross-validation implementation, by removing some of the indices in

These are my three ideas for automation of the estimation process. However, I have not actually tried them, and it may be the case that I have missed some detail that makes automation more challenging. Please let me know if you will accept the manual approach. Otherwise, I will try to implement my ideas as stated above.
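The forecasting idea described above can be sketched as follows. This is a hypothetical illustration, not code from timings.py: the function name, the manual fold slicing, and the `fit_one_split` callback are all my own placeholders.

```python
import time

import numpy as np


def estimate_total_time(fit_one_split, X, y, n_splits, n_timed_splits=3):
    """Run only the first few balanced splits and linearly forecast the rest."""
    n = len(X)
    fold_size = n // n_splits
    start = time.perf_counter()
    for i in range(n_timed_splits):
        # Hold out one balanced fold, train on the remaining samples.
        held_out = np.arange(i * fold_size, (i + 1) * fold_size)
        train_idx = np.setdiff1d(np.arange(n), held_out)
        fit_one_split(X[train_idx], y[train_idx])
    elapsed = time.perf_counter() - start
    # Splits are balanced, so each split should take roughly the same time:
    # total ~ (time passed / splits completed) * total number of splits.
    return elapsed / n_timed_splits * n_splits
```

Because each fold has the same size, the per-split cost is expected to be constant, which is what makes the linear extrapolation defensible for the slow, uniform-cost implementations.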
Hi @Sm00thix, I must admit it would be very nice to be able to script the process of re-obtaining all the numbers depicted in the paper's main figure. Regardless, once you are done with the improvements, if any, I would recommend that you enrich
…so modified paper/README.md to clarify the estimation process. Also updated the notebook to reflect the newly added --estimate flag in time_pls.py. This is related to #24
Hi @basileMarchand and @parmentelat,

To accommodate your request to automate the benchmark estimation process, I have attempted to implement the ideas I described in my previous comment in this thread.

I was successful for the regular CPU implementations. These are scikit-learn's NIPALS and my own NumPy-based IKPLS implementations, i.e., the ones that can be benchmarked with time_pls.py with the flags

For the JAX implementations and the fast cross-validation implementation, I was unable to implement automation of the estimation process. Instead I updated the

These changes are visible in 841dc4a. If you agree with these changes, I will merge them into main. Please let me know which option you prefer. I will then proceed as you wish and close the issue.

Below are my explanations for why I was unable to automate the benchmark estimation of the JAX implementations and the fast cross-validation implementation:

I was not successful for the JAX implementations, the reason being the way I implemented the cross-validation, where simply removing elements from

For the fast cross-validation algorithm, timing the execution of a small number of cross-validation splits, computing the time/split ratio, and then linearly forecasting a time estimate for computing the whole cross-validation fails.
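One mechanism that would make the time/split extrapolation fail for the fast cross-validation algorithm is a large one-time cost amortized over all splits. This is my own assumed explanation, not one spelled out in the thread, and the numbers below are made up purely for illustration:

```python
def linear_forecast(setup_cost, per_split_cost, n_timed, n_splits):
    """Compare naive time/split extrapolation with the true total runtime."""
    # What we observe after timing the first n_timed splits: the one-time
    # setup is unavoidably included in the measurement.
    measured = setup_cost + per_split_cost * n_timed
    forecast = measured / n_timed * n_splits  # naive linear extrapolation
    true_total = setup_cost + per_split_cost * n_splits
    return forecast, true_total


# With a dominant setup cost, the forecast is off by orders of magnitude,
# because the setup is effectively counted once per *timed* split.
forecast, true_total = linear_forecast(100.0, 0.01, n_timed=5, n_splits=10_000)
```

Under these toy numbers the true total is about 200 seconds, while the linear forecast lands near 200,100 seconds; the cheaper each individual split is relative to the setup, the worse the extrapolation gets.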
Hi @Sm00thix, thanks for your work in this area. I really believe this kind of apparently unrewarding task contributes to a much better material for others to toy with :) My advice would be to
thanks again for bearing with me on this one :)
* Added support for estimation of benchmarking for sk, np1, and np2. Also modified paper/README.md to clarify the estimation process. Also updated the notebook to reflect the newly added --estimate flag in time_pls.py. This is related to #24
* reproducibility improvement (#30)
* clarify the estimation methods
* refactor the notebook for reproducing results: more options available from the command line; more consistent way to implement the various filtering stages
* reproducing notebook: replace -s (bool) with -s <n>; this way we can go for more and more complex scenarios; also the shortcut for --dry-run becomes the more traditional -n
* Clarified instructions for manual benchmarking in paper/README.md. Related to #24

Co-authored-by: parmentelat <thierry.parmentelat@inria.fr>
Hi @parmentelat,

Thanks for your assistance on this one. I agree that we have made it easier for others to toy around! I have merged your pull request to the benchmark branch and, in turn, merged the benchmark branch with main. I have also tried to improve the very last paragraph of paper/README.md as per your instructions. All the changes are merged to main in ee3de3e.

If you agree with these changes, I think we can close this issue :-)
Hey @Sm00thix, actually I am currently trying to recompute all the data from the figure; in the process I have run into 2 separate issues.
I'll get this repro script to work as far as I can, and will file a PR once I'm done; please keep this open until then
On a side track, I am seeing this during my reproducibility attempts with jax-related runs:
Hi @parmentelat,

In regards to your benchmarks: sounds good. Depending on your machine, running all the benchmarks may take a few weeks, if I recall correctly. That is, if you estimate the same ones I did and run to completion the same ones I did.

I'm sorry if I broke your notebook with some of my own changes. Be aware that a NaN in the 'inferred' column in `paper/timings/timings.csv` is to be regarded as False. That is, I (admittedly somewhat foolishly) did not write anything in that column if I did not estimate the value. Doing boolean logic with NaNs is error prone, and I tried to account for that when I modified your script. Please let me know if I can be of any help in this regard. I will keep this issue open until that is resolved.

In relation to the FutureWarning that you mention in your comment, I found the culprit and fixed it. I commented on the details in 9cf6cc7. I also ran a few of the JAX benchmarks and noticed no difference when compared to my original benchmarks. I initially forgot to bump the

All of these changes are currently in the dev branch, which I will merge to main once all tests pass.

Alright. I had made a couple of errors which caused the computation of gradients using reverse mode differentiation to fail. I fixed those in 97c9024.

Update: I merged the dev branch to main.
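The NaN-as-False convention for the 'inferred' column can be made explicit when loading the data. A minimal sketch: the column name comes from the comment above, but the DataFrame contents and the use of pandas as the reader are my own assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("paper/timings/timings.csv"): a NaN in the
# 'inferred' column means the value was measured rather than estimated.
df = pd.DataFrame({"inferred": [True, np.nan, False]})

# NaN is truthy in plain Python and poisons boolean comparisons, so
# normalize it to False explicitly before any boolean filtering.
inferred = df["inferred"].fillna(False).astype(bool)
```

Converting the column to a clean boolean once, up front, avoids scattering NaN-aware checks through every downstream filter in the notebook.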
the paper, at least

No worries about the notebook; the first version was very rough and needed ironing anyways. Plus, your code was actually working, so...

Also, for clarity: none of the warnings that I reported are to be deemed showstoppers; it's just FYI in case you'd have missed them.
Hi @parmentelat, I merged your PR in b27201b. Thanks for your work on this one! Do you want to wait until your benchmarks have completed before we close this issue? Otherwise, I suggest we close it now :-)
ok for closing now
A final word about the paper's reproducibility: some values in the figure appear with a circle; about those, the legend states that:

Can you please comment further about the practical means, if any, to achieve that, and about possible means to automate that process nevertheless?