Porting compare.R to JavaScript? #16762

Closed
Trott opened this issue Nov 4, 2017 · 6 comments
Labels
  • benchmark: Issues and PRs related to the benchmark subsystem.
  • discuss: Issues opened for discussions and feedbacks.
  • python: PRs and issues that require attention from people who are familiar with Python.

Comments

@Trott
Member

Trott commented Nov 4, 2017

  • Version: 10.0.0-pre
  • Platform: all
  • Subsystem: benchmarks

Any chance the benefits of having benchmark/compare.R functionality in JavaScript would outweigh any downsides?

No opinion from me on it. Just pondering. I know R and Python tend to be the languages of choice for this sort of thing but maybe there's something to be said for a stdlib-js approach to it? /cc @kgryte

(For that matter, would a Python 2 port make sense? At least we'd have one fewer external tool to rely on, as we already need Python 2 for our build toolchain.)

Trott added the benchmark, discuss, and python labels Nov 4, 2017
@kenany
Contributor

kenany commented Nov 4, 2017

#12585 has some discussion on doing the stats tests in JS/Python.

@refack
Contributor

refack commented Nov 4, 2017

The main benefit of R is the ggplot library, but ATM we don't use it in our main benchmarking flows.

I asked an R ninja to do this porting, but he lost interest.
IMHO, if we port this as-is there should be no difference; we just need an implementation of the T-table.
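
To make "port this as-is" concrete, here is a minimal sketch of the part that needs no statistics library. It assumes the comparison is an unequal-variance (Welch) two-sample t-test, which is the default behavior of R's `t.test`. The function names and the sample numbers are hypothetical, chosen only for illustration.

```javascript
// Sample mean of an array of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic and the Welch-Satterthwaite degrees of freedom.
// Each sample needs at least two observations.
function welch(a, b) {
  const va = variance(a) / a.length;
  const vb = variance(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

// Hypothetical ops/sec samples, made up for illustration only.
const oldRuns = [100, 102, 98, 101, 99];
const newRuns = [110, 108, 112, 109, 111];
console.log(welch(newRuns, oldRuns)); // → { t: 10, df: 8 }
```

Turning `t` and `df` into a significance level is the step that needs either a T-table or the CDF of the t-distribution.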

@joyeecheung
Member

joyeecheung commented Nov 5, 2017

Should we mark this as good-first-issue? One doesn't really need to know much about core to do this, and a stats ninja might be interested.

@kgryte

kgryte commented Nov 5, 2017

@Trott Thanks for the ping. In stdlib, we have implemented T-test functionality, which seems to be the main feature of the compare.R script not readily achievable in JavaScript. For an implementation, see here. While we would like to be able to say that you can just npm install the package directly, this is not possible at the moment, as we have yet to flip the switch and publish separate packages to npm.

While not available at the moment to use out-of-the-box, the code should provide some insight into what would be required to "roll your own" implementation. Most importantly, a proper T-test implementation won't rely on a T-table. Instead, it will rely on computing the CDF of a Student's t-distribution, which can be found here. And computing the CDF requires computing the incomplete beta function, which is not straightforward.
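
As a rough illustration of that dependency chain (not the stdlib implementation), the sketch below computes the Student's t CDF from the regularized incomplete beta function, using the textbook continued-fraction expansion evaluated with the modified Lentz method and a Lanczos approximation of log-gamma. All function names here are hypothetical, and the accuracy is only what these classic approximations provide; a production version needs far more care with edge cases.

```javascript
// Lanczos approximation of ln(Gamma(x)) for x > 0.
function logGamma(x) {
  const cof = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let y = x;
  let tmp = x + 5.5;
  tmp -= (x + 0.5) * Math.log(tmp);
  let ser = 1.000000000190015;
  for (const c of cof) {
    y += 1;
    ser += c / y;
  }
  return -tmp + Math.log(2.5066282746310005 * ser / x);
}

// Keep a value away from zero so the Lentz recurrences never divide by 0.
function clampTiny(v) {
  return Math.abs(v) < 1e-300 ? 1e-300 : v;
}

// Continued fraction for the incomplete beta function (modified Lentz).
function betaContinuedFraction(a, b, x) {
  const qab = a + b;
  const qap = a + 1;
  const qam = a - 1;
  let c = 1;
  let d = 1 / clampTiny(1 - qab * x / qap);
  let h = d;
  for (let m = 1; m <= 200; m++) {
    const m2 = 2 * m;
    // Even step of the recurrence.
    let aa = m * (b - m) * x / ((qam + m2) * (a + m2));
    d = 1 / clampTiny(1 + aa * d);
    c = clampTiny(1 + aa / c);
    h *= d * c;
    // Odd step of the recurrence.
    aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
    d = 1 / clampTiny(1 + aa * d);
    c = clampTiny(1 + aa / c);
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < 3e-14) break; // converged
  }
  return h;
}

// Regularized incomplete beta function I_x(a, b).
function incompleteBeta(a, b, x) {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  const front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b) +
    a * Math.log(x) + b * Math.log(1 - x));
  // Use the symmetry relation where the fraction converges faster.
  if (x < (a + 1) / (a + b + 2)) {
    return front * betaContinuedFraction(a, b, x) / a;
  }
  return 1 - front * betaContinuedFraction(b, a, 1 - x) / b;
}

// CDF of Student's t-distribution with df degrees of freedom.
function studentTCdf(t, df) {
  const tail = 0.5 * incompleteBeta(df / 2, 0.5, df / (df + t * t));
  return t > 0 ? 1 - tail : tail;
}

// Two-sided p-value for an observed t statistic.
function twoSidedP(t, df) {
  return incompleteBeta(df / 2, 0.5, df / (df + t * t));
}

console.log(twoSidedP(2.228, 10)); // close to 0.05
```

Even this stripped-down version needs three numerical building blocks, which supports the point that this is not a small task.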

So, my assessment is that this is not a good first issue. You would need to put in considerable time to actually implement something comparable to R/Python, as we have.

As a stop gap, if you are wanting to rid yourselves of the R dependency, then use SciPy. The talk about using Pandas in the PR thread mentioned above is misguided. The SciPy functionality should work for Python 2.7 and above. You can achieve the box plot functionality using Matplotlib.

Once we have decomposed stdlib into individual packages, you'll be able to do everything in JS. But until then, I would opt for Python.

@refack
Contributor

refack commented Nov 6, 2017

@kgryte thank you so much for the input.

> As a stop gap, if you are wanting to rid yourselves of the R dependency, then use SciPy.

Getting SciPy installed on Windows, while doable, is IMHO just a tad more cumbersome than installing the R runtime. So I'd say that's not a big win.

💡
🤓 What we could do is write a t-test-as-a-service, that way we replace the dependency on R with internet access.

But just to put things in perspective: we are not looking for scientific-paper-grade statistics, we just need a way to measure significance. So IMHO a precalculated T-table, as a rough approximation of the CDF of a Student's t-distribution, seems good enough.
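
A minimal sketch of what such a precalculated table could look like, assuming two-sided thresholds at 0.05, 0.01, and 0.001 reported as significance stars. The critical values are the usual three-decimal textbook ones and should be checked against a reference before use; `T_TABLE` and `significance` are hypothetical names.

```javascript
// Two-sided critical values of Student's t, rounded to three decimals.
// Columns: df, then the thresholds for p < 0.05, p < 0.01, p < 0.001.
const T_TABLE = [
  [1, 12.706, 63.657, 636.619],
  [2, 4.303, 9.925, 31.599],
  [3, 3.182, 5.841, 12.924],
  [4, 2.776, 4.604, 8.610],
  [5, 2.571, 4.032, 6.869],
  [6, 2.447, 3.707, 5.959],
  [7, 2.365, 3.499, 5.408],
  [8, 2.306, 3.355, 5.041],
  [9, 2.262, 3.250, 4.781],
  [10, 2.228, 3.169, 4.587],
  [15, 2.131, 2.947, 4.073],
  [20, 2.086, 2.845, 3.850],
  [30, 2.042, 2.750, 3.646],
  [60, 2.000, 2.660, 3.460],
  [120, 1.980, 2.617, 3.373],
  [Infinity, 1.960, 2.576, 3.291],
];

// Map a t statistic to significance stars using the table.
// The row with the largest tabulated df not exceeding the actual df is
// used; critical values shrink as df grows, so rounding df down can only
// make the test more conservative.
function significance(t, df) {
  if (!(df >= 1)) throw new RangeError('df must be at least 1');
  let row = T_TABLE[0];
  for (const candidate of T_TABLE) {
    if (candidate[0] <= df) row = candidate;
  }
  const abs = Math.abs(t);
  if (abs >= row[3]) return '***';
  if (abs >= row[2]) return '**';
  if (abs >= row[1]) return '*';
  return '';
}

console.log(significance(10, 8)); // prints "***"
```

The trade-off is that a table only gives threshold crossings, not an actual p-value, and the result is slightly conservative for df values that fall between rows.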

@Planeshifter

Planeshifter commented Nov 6, 2017

As a slight addendum to @kgryte's post, I would just like to note that the reason we haven't published to npm is that the final project structure is not cast in stone yet. Almost all of the existing packages are fully functional and thoroughly tested. Also, while we make no guarantees at this point, it's not very likely that the API of the t-test will change in the future.

As of now, our recommended approach to using stdlib-js is to create a bundle of the required functions. We provide a bundling tool for this purpose. Alternatively, we provide UMD bundles for the entire library (https://github.com/stdlib-js/stdlib/tree/develop/dist).

Trott closed this as completed Jan 5, 2018
6 participants