Google Summer of Code 2014 Ideas
Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.
We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea for what features are needed.
You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.
In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) and we can help you.
Our number-one priority right now as an organization is NMatrix. Our number-two priority is most likely Minimization. We need both of these libraries for some more complicated statistical methods, some of which were actually written in pure Ruby in last year's Summer of Code; we want to speed up those methods, and to do so, we need a framework that can make use of pure C and Java facilities (which Minimization can't currently do).
NMatrix is SciRuby's numerical matrix core, implementing dense matrices as well as two types of sparse (linked-list-based and Yale/CSR). NMatrix is a fairly new but well-established project which has received Summer-of-Code-like grants from both Brighter Planet and the Ruby Association (in other words, from Matz, who created Ruby). Those who contribute to NMatrix will likely eventually become authors of a jointly-published peer-reviewed science article on the library. Additionally, NMatrix is a good place to gain practical C and C++ experience, while also working to improve Ruby.
- Mentors: John Woods (@mohawkjohn), Anna Belak, Colin Fuller (@cjfuller)
NMatrix currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations. In some cases, native versions of the functions are implemented, so that the libraries are not required. There are quite a number of areas for growth in terms of the capabilities of NMatrix here.
Currently, the ATLAS/CBLAS/CLAPACK interface is built into NMatrix, which makes NMatrix trickier to compile and link than it needs to be. Plus, there are use-cases of NMatrix that should not require ATLAS at all. It'd be good to see the ATLAS functionality, including ATLAS functions which NMatrix re-implements for rational numbers, abstracted into a separate gem (nmatrix-atlas). In addition, NMatrix should have the ability to leverage other libraries that might be installed, such as eigen3, or maybe even boost (nmatrix-eigen3, nmatrix-boost, and perhaps nmatrix-gsl). NMatrix should be able to switch seamlessly between them. One important design question to think about when applying: How does NMatrix choose which library to use if more than one of them implements a given function? For example, if eigen3 and atlas both have matrix multiplication, which one should be used?
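One conceivable answer to the design question above is a priority-ordered registry. The following is only a sketch: `BackendRegistry`, its methods, and the priorities are hypothetical names chosen for illustration, not part of NMatrix, and a naive pure-Ruby product on nested arrays stands in for every backend.

```ruby
# Hypothetical sketch: none of these names exist in NMatrix.
module BackendRegistry
  Backend = Struct.new(:name, :priority, :functions)

  @backends = []

  class << self
    # Register a backend with a priority (higher wins) and the operations
    # it implements, given as a Hash of operation name => callable.
    def register(name, priority:, functions:)
      @backends << Backend.new(name, priority, functions)
    end

    # Return the highest-priority backend that implements +op+.
    def backend_for(op)
      candidates = @backends.select { |b| b.functions.key?(op) }
      raise NotImplementedError, "no backend implements #{op}" if candidates.empty?
      candidates.max_by(&:priority)
    end

    # Dispatch +op+ to whichever backend was selected.
    def call(op, *args)
      backend_for(op).functions.fetch(op).call(*args)
    end
  end
end

# Naive matrix product on nested arrays; a stand-in for real library calls.
naive_matmul = lambda do |a, b|
  columns = b.transpose
  a.map { |row| columns.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

# Sum of the diagonal; only the (pretend) eigen3 backend offers it here.
naive_trace = ->(m) { m.each_with_index.sum { |row, i| row[i] } }

BackendRegistry.register(:native, priority: 0,  functions: { matmul: naive_matmul })
BackendRegistry.register(:eigen3, priority: 5,  functions: { matmul: naive_matmul, trace: naive_trace })
BackendRegistry.register(:atlas,  priority: 10, functions: { matmul: naive_matmul })

puts BackendRegistry.backend_for(:matmul).name
puts BackendRegistry.backend_for(:trace).name
```

With these made-up priorities, matrix multiplication goes to the atlas entry while the trace falls through to eigen3. A fixed priority is only one policy; a proposal might instead let users pin a backend per call or per matrix.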
A related project is the writing of eigen3 and boost interfaces for NMatrix, though these are lower priority than adapting NMatrix to ATLAS. Another option is the Intel Math Kernel Library. Work in these areas would likely depend upon the F2RB project discussed further down the page, or perhaps FFI.
As mentioned, NMatrix depends upon other libraries for many of its math functions. It makes sense to implement a lot of this functionality within NMatrix, if it can be done relatively efficiently. In particular, we would like to improve support for rational numbers in NMatrix, as well as standard Ruby objects (NMatrix can store things as integers, floating points, rationals, and Ruby objects, but how it treats those things is undefined in several cases).
A key subtask involves improving NMatrix's rational types. It doesn't make sense for us to have rational32 or rational64 types; rational128 is going to be the only one with sufficient accuracy for most purposes, and even that may not be enough for — say — astrodynamics. But some research needs to be done on how to manage overflow in this type. How does Ruby handle rational numbers with large numerators or denominators? How do other libraries handle such numbers? While eliminating rational32 and rational64, can we add a new, more graceful rational type?
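As a starting point for that research, here is a small sketch of how Ruby's own Rational behaves. Its numerator and denominator are arbitrary-precision Integers, so it does not overflow; it just grows. The `RationalLimits` module and the demote-to-Float policy are hypothetical, and the sketch assumes rational128 means a pair of signed 64-bit integers.

```ruby
# Ruby's Rational stays exact, however large the parts become.
a = Rational(1, 2**70)
b = Rational(1, 3**50)
sum = a + b
puts sum.denominator == 2**70 * 3**50

# Hypothetical helpers, assuming rational128 = two signed 64-bit integers.
module RationalLimits
  INT64_MAX = 2**63 - 1
  INT64_MIN = -(2**63)

  # Would this value fit in a fixed-width rational128?
  def self.fits_rational128?(r)
    [r.numerator, r.denominator].all? { |n| n.between?(INT64_MIN, INT64_MAX) }
  end

  # One conceivable "graceful" policy: stay exact while the value fits,
  # otherwise fall back to a Float approximation.
  def self.demote_if_needed(r)
    fits_rational128?(r) ? r : r.to_f
  end
end

puts RationalLimits.fits_rational128?(Rational(1, 3))
puts RationalLimits.fits_rational128?(sum)
puts RationalLimits.demote_if_needed(sum).class
```

Demoting to Float silently loses exactness, so a real design might prefer raising an error or promoting to a Ruby object instead; the sketch only shows where such a decision would sit.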
It is also conceivable that an exceptionally bright student could envision a project which extended beyond rational to semi-symbolic computations. For example, many matrix calculations involve fractions like PI/2 or 3*sqrt(2)/2. Neither of these is rational at all! But they're so common that it should be easy to represent them. We call this semi-symbolic because there are no actual variables in the expressions, only real factors. How would computations occur on these matrices? This is a very difficult project, but would likely result in some serious credibility for an ambitious student.
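To make the idea concrete, here is a minimal sketch of one possible semi-symbolic element: a rational coefficient times the square root of a positive integer. The `Surd` class is hypothetical and covers only multiplication of single terms; it does not handle PI, sums of unlike terms, or anything NMatrix-specific.

```ruby
# Hypothetical type representing coeff * sqrt(radicand), where coeff is a
# Rational and radicand is kept as a square-free positive Integer.
class Surd
  attr_reader :coeff, :radicand

  def initialize(coeff, radicand = 1)
    unless radicand.is_a?(Integer) && radicand > 0
      raise ArgumentError, "radicand must be a positive Integer"
    end
    coeff = Rational(coeff)
    # Pull square factors out of the radicand: sqrt(k*k*m) = k*sqrt(m).
    k = 2
    while k * k <= radicand
      while (radicand % (k * k)).zero?
        radicand /= k * k
        coeff *= k
      end
      k += 1
    end
    @coeff = coeff
    @radicand = radicand
  end

  # (a*sqrt(m)) * (b*sqrt(n)) = a*b*sqrt(m*n), renormalized by the constructor.
  def *(other)
    other = Surd.new(other) unless other.is_a?(Surd)
    Surd.new(coeff * other.coeff, radicand * other.radicand)
  end

  def ==(other)
    other.is_a?(Surd) && coeff == other.coeff && radicand == other.radicand
  end

  # Floating-point approximation, for comparison only.
  def to_f
    coeff * Math.sqrt(radicand)
  end
end

x = Surd.new(Rational(3, 2), 2) # 3*sqrt(2)/2
y = x * x                       # exactly 9/2, with no rounding
puts y.coeff
puts y.radicand
```

Squaring 3*sqrt(2)/2 yields the exact rational 9/2, which a floating-point matrix could only approximate. Addition is where the design gets hard, since sqrt(2) + sqrt(3) needs a multi-term representation.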
Specifically, the functions to implement include exponentials and square roots, matrix decomposition/factorization, calculation of norms, tensor products, principal component analysis (PCA), and many others. These functions are all enormously important, and would substantially improve the usability of NMatrix. Successful implementation would likely lead to co-authorship on a peer-reviewed article, and at the very least would look outstanding on a curriculum vitae.
- Mentors: John Woods (@mohawkjohn), Pjotr Prins (@pjotrp)
This is a new project idea which was identified as a major need for scientists working in Ruby.
Many critical science and math algorithms are written in Fortran, and it would be lovely to be able to quickly generate Ruby interfaces for these Fortran libraries — much like we can generate interfaces for C libraries in Ruby using FFI. Such a program already exists for Python (f2py). This would dramatically improve Ruby's usability for scientists and engineers.
This project requires some design and research (what solutions have already been tried, and what doesn't work?), and relates directly to the NMatrix ATLAS interface project idea.
It may be that generating Ruby-Fortran foreign function interfaces (FFIs, not to be confused with the FFI above) is hard and takes the whole summer. If so, so be it; but if in proposing the project you find that it's relatively easy, the following is suggested: create an autogenerator. Does this require use of f2c, which translates Fortran code to C? Or can it be as simple as require_fortran "source.f"? If you apply for GSoC with this project idea, please expect to write some sample code in your application and explain in detail how you plan to make it work.
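As an illustration of the first step an autogenerator might take, the sketch below extracts a routine name and argument list from a Fortran SUBROUTINE statement and guesses the linker symbol. Everything here is hypothetical (`FortranRoutine`, `parse_subroutine`); it assumes the lowercase-plus-trailing-underscore naming that gfortran uses by default for external procedures, ignores continuation lines, and does not compile or call anything.

```ruby
# Hypothetical record describing one parsed Fortran routine.
FortranRoutine = Struct.new(:name, :args) do
  # Assumed naming convention: lowercase name plus a trailing underscore
  # (gfortran's default for external, non-module procedures).
  def symbol
    "#{name.downcase}_"
  end
end

# Find the first SUBROUTINE statement and return its name and dummy arguments.
# Continuation lines and type declarations are ignored in this sketch.
def parse_subroutine(source)
  match = source.match(/^\s*subroutine\s+(\w+)\s*\(([^)]*)\)/i)
  raise ArgumentError, "no SUBROUTINE statement found" unless match
  args = match[2].split(",").map { |arg| arg.strip.downcase }
  FortranRoutine.new(match[1], args)
end

source = <<~FORTRAN
  SUBROUTINE DAXPY(N, DA, DX, INCX, DY, INCY)
  DOUBLE PRECISION DA, DX(*), DY(*)
  INTEGER N, INCX, INCY
  END
FORTRAN

routine = parse_subroutine(source)
puts routine.symbol
puts routine.args.join(", ")
```

A real tool would also need the declared types, since Fortran passes arguments by reference and the Ruby side must allocate matching buffers. That is where most of the design work in a proposal would go.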
- Mentors: Carlos Agarie (@agarie), Claudio Bustos (@clbustos), John Prince (@jtprince)
- SciRuby::Dataframe will implement a concept similar to Pandas (http://pandas.pydata.org/pandas-docs/dev/): data structures, such as data frames (similar to R's), that more powerful data analysis packages can build on.
- Some requirements:
- Have some simple statistics built-in (statsample as a dependency): averages, quartiles, standard deviation, median, etc.
- Be really easy to plot. For example, a user should be able to plot a histogram from a Series object with only one method call, maybe two, without much hassle. Integration with Plotrb (see below) would be great.
- Easily receive and interpret data from a CSV file (or any delimiter-separated value file), transforming it into a Dataframe with something as simple as SciRuby::Dataframe.csv("data.csv"). This would be a wrapper around Ruby's CSV class. Also, chunk processing of CSV files will be necessary. The faster_csv gem implements this.
- Be able to add/remove columns and do operations on rows or columns. For simple operations, this should be very easy by using NMatrix's referenced slices.
- Have labeled columns and indexed rows. This means that the underlying data structure (NMatrix wrapper) will need to store some metadata.
- Use NMatrix for data storage. This also implies that we can use the NMatrix::IO module.
- As some of the requirements of this project depend on others (visualization, statistics, etc.), the most important part is to design and develop it so that its API is easy to use for new users (e.g. scientists without much programming background) but extensible enough for other projects to use it.
- Inspired by Pandas and Statsample::Dataset.
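A few of the requirements above can be sketched in a handful of lines. This is not a proposed design: `DataframeSketch` and its methods are hypothetical, plain Ruby arrays stand in for NMatrix storage, and the CSV is parsed from a string so the example is self-contained.

```ruby
require "csv"

# Hypothetical sketch; plain arrays stand in for an NMatrix-backed store.
class DataframeSketch
  attr_reader :labels

  def initialize(labels, rows)
    @labels = labels
    @rows = rows
  end

  # Build a frame from delimiter-separated text whose first line is a header.
  def self.from_csv_string(text, col_sep: ",")
    table = CSV.parse(text, headers: true, col_sep: col_sep, converters: :numeric)
    new(table.headers, table.map(&:fields))
  end

  # Values of one labeled column, in row order.
  def column(label)
    index = @labels.index(label)
    raise KeyError, "unknown column: #{label}" if index.nil?
    @rows.map { |row| row[index] }
  end

  # Return a new frame with an extra labeled column appended.
  def add_column(label, values)
    raise ArgumentError, "length mismatch" unless values.length == @rows.length
    new_rows = @rows.zip(values).map { |row, value| row + [value] }
    DataframeSketch.new(@labels + [label], new_rows)
  end

  # One built-in statistic: the arithmetic mean of a column.
  def mean(label)
    values = column(label)
    values.sum.to_f / values.length
  end
end

df = DataframeSketch.from_csv_string("mass,height\n1.5,10\n2.5,20\n")
puts df.mean("mass")
bmi_like = df.add_column("ratio", [0.15, 0.125])
puts bmi_like.labels.inspect
```

For large files, `CSV.foreach` yields one row at a time, which is one way to approach the chunk-processing requirement without loading the whole file.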
- Mentors: Pjotr Prins, Rob Syme, John Woods
- D3 is an interactive plotting system for the browser. GSoC 2013 delivered static plotting through VEGA, a JSON-style description of a figure. Having learnt from that project, we want to create a more advanced system where Ruby generates JavaScript and JavaScript can be mixed with Ruby. The main challenge is to get from SciRuby data structures to D3 plots in the nicest possible way and still allow for the flexibility and interactivity that D3 allows.
- Requirements: This is an advanced project that requires use of powerful Ruby paradigms and a deep understanding of D3.
- See also plotrb
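The simplest piece of that challenge is getting data out of Ruby in a shape that browser code can bind to. The helper below is a hypothetical sketch (`to_d3_records` is not an existing API) that turns labeled columns into a JSON array of records, a shape D3 code commonly consumes; it generates no JavaScript.

```ruby
require "json"

# Hypothetical helper: convert a Hash of label => values into a JSON array
# with one object per row.
def to_d3_records(columns)
  lengths = columns.values.map(&:length).uniq
  raise ArgumentError, "columns must have equal length" if lengths.size > 1

  records = (0...(lengths.first || 0)).map do |i|
    columns.keys.to_h { |label| [label, columns[label][i]] }
  end
  JSON.generate(records)
end

puts to_d3_records("x" => [1, 2], "y" => [3.5, 4.5])
```

The interesting design work starts after this step: deciding how Ruby objects describe scales, axes, and interactions rather than just data.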
- Mentors: John Woods (@mohawkjohn), Wan Zuhao, Carlos Agarie (@agarie)
Ruby needs a science notebook and instant plotting solution. Two examples of such pieces of software are IPython notebook and Mathematica. Ruby already has useful plotting tools, including Plotrb and Rubyvis, the first of which was a GSoC 2013 project by Wan Zuhao. Both Plotrb and Rubyvis produce publication-quality graphics. However, sometimes a researcher doesn't need publication quality; she needs a quick-and-dirty prototype. This is where interactive notebooks come in handy. Work on a Ruby notebook might also necessitate development of some additional plotting software, but is primarily a user interface and user experience project. This project is code named "Redbook."
See GSOC 2014: Proposal for a Ruby Notebook by @minad for a more detailed description of this idea.
- Mentors: Claudio Bustos (@clbustos), John Woods (@mohawkjohn), Carlos Agarie (@agarie)
Minimization and Integration are two SciRuby modules which are used by Claudio Bustos' statsample gem. For Minimization, students would research and suggest additional minimization methods, develop tests, and improve documentation. For Integration, students would implement additional numerical integration methods and add support for solving various types of (ordinary and/or partial) differential equations. We need to be explicit about the precision and performance of each method, so benchmarks will be necessary. As always, the student is expected to write tests and document code. There has been some talk of removing support for Ruby versions earlier than 1.9.3 for both Integration and Minimization.
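To show what being explicit about precision can look like, here is a small sketch comparing two textbook quadrature rules against a known integral. The `trapezoid` and `simpson` methods are written from scratch for illustration and are not the Integration gem's API.

```ruby
# Composite trapezoid rule on [a, b] with n subintervals.
def trapezoid(a, b, n)
  h = (b - a).to_f / n
  interior = (1...n).sum { |i| yield(a + i * h) }
  h * (0.5 * (yield(a) + yield(b)) + interior)
end

# Composite Simpson rule on [a, b]; n must be even.
def simpson(a, b, n)
  raise ArgumentError, "n must be even" unless n.even?
  h = (b - a).to_f / n
  odd  = (1...n).step(2).sum { |i| yield(a + i * h) }
  even = (2...n).step(2).sum { |i| yield(a + i * h) }
  h / 3 * (yield(a) + yield(b) + 4 * odd + 2 * even)
end

exact = 2.0 # integral of sin(x) over [0, pi]
[4, 16, 64].each do |n|
  t_err = (trapezoid(0, Math::PI, n) { |x| Math.sin(x) } - exact).abs
  s_err = (simpson(0, Math::PI, n) { |x| Math.sin(x) } - exact).abs
  puts format("n=%-3d trapezoid error=%.2e  simpson error=%.2e", n, t_err, s_err)
end
```

A benchmark suite for the project could report tables like this one per method, alongside timings from Ruby's Benchmark module.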
- Standardized minimization framework. Right now, Minimization is pure Ruby. There are additional minimization algorithms in GSL and probably in Java which can be used in Ruby. It'd be great to have a standardized framework so that pure Ruby functions can be used, or C functions if using MRI/YARV, or Java functions if using JRuby. Such a thing is already done for the Distribution gem, so that can be used as a model.
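A rough sketch of such a framework follows. It assumes the interpreter can be identified through `RUBY_ENGINE`; the `MinimizerSketch` module, its backend names, and its availability check are hypothetical, and only the pure-Ruby path (a golden-section search) is actually implemented.

```ruby
# Hypothetical sketch; not the Minimization gem's API.
module MinimizerSketch
  INV_PHI = (Math.sqrt(5) - 1) / 2

  # Backends to try, most preferred first, for the running interpreter.
  def self.preferred_backends
    native = { "ruby" => :c, "jruby" => :java }[RUBY_ENGINE]
    [native, :ruby].compact
  end

  # In this sketch only the pure-Ruby implementation exists.
  def self.available?(backend)
    backend == :ruby
  end

  def self.selected_backend
    preferred_backends.find { |backend| available?(backend) }
  end

  # Pure-Ruby golden-section search for a unimodal function on [lo, hi].
  # Both probes are re-evaluated each iteration for clarity, not speed.
  def self.golden_section(lo, hi, tol: 1e-8)
    while (hi - lo).abs > tol
      c = hi - INV_PHI * (hi - lo)
      d = lo + INV_PHI * (hi - lo)
      if yield(c) < yield(d)
        hi = d
      else
        lo = c
      end
    end
    (lo + hi) / 2.0
  end

  # Entry point: a fuller framework would dispatch on selected_backend;
  # here that is always :ruby.
  def self.minimize(lo, hi, &f)
    golden_section(lo, hi, &f)
  end
end

puts MinimizerSketch.selected_backend
puts MinimizerSketch.minimize(-5, 5) { |x| (x - 2)**2 }
```

The point of the shape is that callers only ever see `minimize`; whether a C or Java routine runs underneath becomes an installation detail.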
- Mentors: Pjotr Prins, Toshiaki Katayama, Will Strinz, Mark Wilkinson, Jerven Bolleman
- Goal: Develop a flexible input interface allowing key SciRuby tools to consume RDF data.
- SciRuby encompasses a number of useful projects for analysing and presenting statistics and other scientific data. However, there are at present few interfaces for extracting input from large datasets. The Semantic Web provides a W3C standardized format known as RDF for representing data and metadata in a distributed, machine-understandable way. RDF is very flexible, but can be given a more restricted structure using schema-like vocabularies and ontologies, allowing datasets and individual statements to be fully contextualized within their domain. This smooths the path to extracting and analyzing useful information with the SPARQL query language. While there are a number of available datasets in the format, and a variety of tools for converting other formats into it, some level of in-depth knowledge is still required to use these data sources with SciRuby gems such as NMatrix and StatSample. Building an extensible and reusable interface that is accessible to Rubyists who are less familiar with the Semantic Web would open up a range of new applications for the tools provided by SciRuby.
- Tasks:
- Start with building a generalized container interface to NMatrix
- Provide an abstracted interface and semantic annotations, as well as a reference to the model object
- If a student is working on the SciRuby::Dataframe project, coordinate your efforts and build off of their work. If not, implement the basic functionality of the Dataframe class and add semantic annotations to it.
- Allow your container to read from dimensional RDF sources such as Datacube
- Requires both SPARQL queries to extract data and semantic annotations for the container
- Must preserve information to allow transforms in both directions
- Implement import for more ontologies and/or extend the container interface to more SciRuby libraries
- Difficulty and needed skills: Average difficulty
- The student will need an affinity for the Semantic Web and the SPARQL query language, and will need to reach a decent level of Ruby programming, probably including metaprogramming.
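The container tasks above can be pictured with a small sketch. It assumes observations have already been fetched, for example by a SPARQL SELECT over a dimensional dataset, and arrive as one Hash per observation. The function names and the keys "area", "year", and "value" are hypothetical; no RDF library is used and nested arrays stand in for NMatrix.

```ruby
# Hypothetical sketch: pivot observation rows into a labeled 2-D container,
# keeping the annotations needed to reverse the transform.
def pivot_observations(rows, row_dim:, col_dim:, measure:)
  row_labels = rows.map { |r| r.fetch(row_dim) }.uniq
  col_labels = rows.map { |r| r.fetch(col_dim) }.uniq
  values = Array.new(row_labels.size) { Array.new(col_labels.size) }

  rows.each do |r|
    i = row_labels.index(r.fetch(row_dim))
    j = col_labels.index(r.fetch(col_dim))
    values[i][j] = r.fetch(measure)
  end

  { row_labels: row_labels, col_labels: col_labels, values: values,
    annotations: { row_dim: row_dim, col_dim: col_dim, measure: measure } }
end

# Reverse direction: rebuild one Hash per non-missing cell.
def unpivot_observations(container)
  ann = container[:annotations]
  container[:row_labels].each_with_index.flat_map do |row_label, i|
    container[:col_labels].each_with_index.filter_map do |col_label, j|
      value = container[:values][i][j]
      next if value.nil?
      { ann[:row_dim] => row_label, ann[:col_dim] => col_label, ann[:measure] => value }
    end
  end
end

observations = [
  { "area" => "A", "year" => 2012, "value" => 1.0 },
  { "area" => "A", "year" => 2013, "value" => 2.0 },
  { "area" => "B", "year" => 2012, "value" => 3.0 },
  { "area" => "B", "year" => 2013, "value" => 4.0 }
]

cube = pivot_observations(observations, row_dim: "area", col_dim: "year", measure: "value")
puts cube[:values].inspect
puts unpivot_observations(cube) == observations
```

In a real implementation the annotations would carry the vocabulary terms from the source dataset rather than bare strings, which is what would make the round trip back to RDF meaningful.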