Google Summer of Code 2013 Ideas
Hey! We're pleased to announce that the Ruby Science Foundation has been accepted as a mentoring organization for Google Summer of Code 2013.
Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.
You don't need to know a lot about Ruby before proposing a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you'll need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.
In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) and we can help you.
Our number-one priority right now as an organization is NMatrix. Other priorities come close, but so far we haven't seen a lot of students expressing interest in NMatrix. In contrast, tons of folks have talked about how to accomplish the Ruby D3 idea, so there will be a lot of competition for that spot. Take this as a hint.
NMatrix is SciRuby's numerical matrix core, implementing dense matrices as well as two types of sparse matrices (linked-list-based and Yale/CSR). NMatrix is a fairly new but well-established project which has received Summer-of-Code-like grants from both Brighter Planet and the Ruby Association (in other words, from Matz, who created Ruby). Those who contribute to NMatrix will likely eventually become authors of a jointly-published peer-reviewed science article on the library. Additionally, NMatrix is a good place to gain practical C and C++ experience, while also working to improve Ruby.
- Mentors: John Woods (@mohawkjohn)
- ATLAS Functionality. NMatrix exposes many, but not all, ATLAS (cBLAS) and LAPACK functions. We would like to see a consistent interface which makes sense in Ruby. We also want to design and implement several NMatrix methods which depend upon ATLAS, cBLAS, and cLAPACK functions.
- Rational Functionality. NMatrix includes some rational number capability, but support is lacking in areas where ATLAS functions are required, since ATLAS does not have a rational type. Rational-specific equivalents of ATLAS functions are needed. Along the way it may be possible to also implement some integer-specific ATLAS function equivalents.
- List matrix element-wise operations. Element-wise operations work for Yale and dense matrices, but not yet for list matrices.
- Slicing support. @flipback began implementing slicing support, but it is incomplete. Slicing needs to be integrated into a major portion of NMatrix methods.
- Basic matrix math functionality. Specifically: exponentials and square roots, matrix decomposition/factorization, calculation of norms, tensor products, and principal component analysis (PCA).
- Statistical functions for matrices and vectors. Statsample needs to support NMatrix, accepting/returning matrices and vectors as well as single values.
- Sparse improvements. The "new" Yale matrices used by NMatrix, which store diagonals (zero and non-zero) separately from non-diagonal non-zeros, are inefficient for matrices that are taller than they are wide. One way to address the problem would be to introduce an alternate "old" Yale storage. Another would be to allow matrices to be stored and operated on transposed. The goal, overall, is to be able to produce efficient Yale/sparse vectors regardless of the vectors' orientation.
- extconf improvements. See Ruby-core improvements below. NMatrix uses `mkmf` for compiling its C and C++ code, as well as for linking ATLAS, LAPACK, and BLAS. But `mkmf` is difficult to use and leads to compilation and linking problems, not just in NMatrix but elsewhere as well, particularly when working on multiple platforms (Linux, Mac, Windows, etc.). It'd be better to have a custom `extconf.rb`-related library for NMatrix to use for linking highly specialized C libraries like ATLAS. A successful implementation of this project would significantly reduce barriers to NMatrix adoption (e.g., by eliminating compiling and linking difficulties).
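One way to see why orientation matters for the sparse-improvements item: any row-compressed (Yale/CSR-style) layout keeps one row pointer per row. The plain-Ruby sketch below mimics the "new" Yale idea of storing diagonal entries (zero or not) in their own array, with off-diagonal non-zeros in CSR form. It is an illustration under assumptions, not NMatrix's actual C++ storage, and `YaleSketch` is a hypothetical name. A tall n x 1 vector pays for n + 1 row pointers even with a single non-zero, while its 1 x n transpose needs only two, which is the motivation for storing and operating on matrices transposed.

```ruby
# Illustrative sketch only: a "new Yale"-style layout in plain Ruby.
# NOT NMatrix's real storage; class and method names are hypothetical.
class YaleSketch
  attr_reader :rows, :cols, :diag, :row_ptr, :col_idx, :values

  # dense: an Array of Arrays with rows x cols entries
  def initialize(dense)
    @rows = dense.size
    @cols = dense.first.size
    # Diagonal entries are stored unconditionally, zero or not.
    @diag = (0...[@rows, @cols].min).map { |i| dense[i][i] }
    # Off-diagonal non-zeros in CSR form: row_ptr has rows + 1 entries.
    @row_ptr = [0]
    @col_idx = []
    @values = []
    dense.each_with_index do |row, i|
      row.each_with_index do |v, j|
        next if i == j || v.zero?
        @col_idx << j
        @values << v
      end
      @row_ptr << @col_idx.size
    end
  end

  # Element lookup: diagonal array first, then scan row i's off-diagonals.
  def [](i, j)
    return @diag[i] if i == j
    (@row_ptr[i]...@row_ptr[i + 1]).each do |k|
      return @values[k] if @col_idx[k] == j
    end
    0
  end

  # Total number of stored slots (diagonal + row pointers + index/value pairs).
  def storage_size
    @diag.size + @row_ptr.size + @col_idx.size + @values.size
  end
end
```

For example, a 1000 x 1 column with one non-zero needs over a thousand slots in this layout, while the same data stored as a 1 x 1000 row needs five.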
- Mentors: John Woods (@mohawkjohn)
- Ruby-core projects, particularly `mkmf`, require that the student develop a good understanding of C as well as Ruby. Some prior familiarity with C and C++ would be beneficial.
- `mkmf` is the library Ruby uses, typically in `extconf.rb` in gems or other libraries (including NMatrix), for linking C and C++ extensions. It lacks documentation; most people currently figure it out by trial and error. A successful `mkmf`-related project would accomplish one or both of the following goals:
  - Provide complete documentation and examples for `mkmf`, drawing from current Ruby extensions as well as hypothetical ones.
  - Propose and implement an update to `mkmf` which improves Ruby extension compilation and linking. Such a project would be extremely popular in the broader Ruby community.
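For readers new to `mkmf`, a typical `extconf.rb` looks roughly like the sketch below: probe for headers and libraries, then write a Makefile. `dir_config`, `have_header`, `have_library` and `create_makefile` are standard `mkmf` methods; the configuration name `atlas`, the library name `cblas` and the extension name `nmatrix_sketch` are assumptions. Library names and locations differ across platforms, which is where the trial and error described above tends to begin.

```ruby
# extconf.rb: minimal sketch of an mkmf configuration script.
# The names "atlas", "cblas" and "nmatrix_sketch" are assumptions.
require 'mkmf'

# Adds --with-atlas-dir, --with-atlas-include and --with-atlas-lib options,
# so users can point the build at a non-standard ATLAS installation.
dir_config('atlas')

# Each check compiles and links a tiny test program; details go to mkmf.log.
have_cblas_header = have_header('cblas.h')
have_cblas_lib    = have_library('cblas', 'cblas_dgemm')

unless have_cblas_header && have_cblas_lib
  # A real extension might abort here or fall back to non-BLAS code paths.
  warn 'CBLAS not found; BLAS-backed methods would be unavailable.'
end

# Writes a Makefile for an extension whose C entry point is Init_nmatrix_sketch.
create_makefile('nmatrix_sketch')
```

It would be run as `ruby extconf.rb --with-atlas-dir=/path/to/atlas` followed by `make`.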
- Mentors: Carlos Agarie (@agarie), Claudio Bustos (@clbustos), Max Makarochkin (@mac-r)
- It needs to be re-designed
- Must use NMatrix internally for speed and to use its IO module for numeric and integer data.
- Based on Pandas: http://pandas.pydata.org/pandas-docs/dev/
- Talk to BioRuby folks about what's necessary in this format.
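A minimal sketch of the Pandas-style interface such a redesign might aim for: columns stored column-wise under names, column access by name, row access by position, row filtering and a column statistic. `DataFrameSketch` and its methods are hypothetical, and plain Ruby Arrays stand in for the NMatrix vectors the real design would use internally.

```ruby
# Hypothetical sketch: a column-oriented, Pandas-style data frame.
# Plain Ruby Arrays stand in for the NMatrix vectors a real design would use.
class DataFrameSketch
  attr_reader :columns

  def initialize(columns)
    lengths = columns.values.map(&:size).uniq
    raise ArgumentError, 'columns must have equal length' unless lengths.size <= 1
    @columns = columns.transform_keys(&:to_sym)
  end

  def nrows
    @columns.empty? ? 0 : @columns.values.first.size
  end

  # Column access by name, like df['x'] in Pandas.
  def [](name)
    @columns.fetch(name.to_sym)
  end

  # Row access by position, returned as a Hash (compare df.iloc[i]).
  def row(i)
    @columns.transform_values { |col| col[i] }
  end

  # New frame containing only the rows for which the block returns true.
  def select_rows
    keep = (0...nrows).select { |i| yield row(i) }
    DataFrameSketch.new(@columns.transform_values { |col| keep.map { |i| col[i] } })
  end

  def mean(name)
    col = self[name]
    col.sum.to_f / col.size
  end
end
```

For example, `DataFrameSketch.new('gene' => %w[a b c], 'expr' => [1.0, 4.0, 7.0]).select_rows { |r| r[:expr] > 2 }` keeps the rows for genes `b` and `c`.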
- Mentors: John Woods (@mohawkjohn), Raoul J.P. Bonnal, Rob Syme, Pjotr Prins
- D3 is an incredible interactive data visualisation library written in JavaScript that runs in a browser.
- Ruby D3. Rubyvis, our current visualization tool, is a Ruby port of Protovis. Protovis was recently supplanted by D3. We would like to produce a Ruby port of D3. In addition to general plots for statistics and scientific analysis, we have ideas for special visualisations for bioinformatics related to genome displays, phylogeny, QTL mapping, etc. Based on existing work in the Ruby bio-graphics and RubyD3 gems, R/qtl, and work done for genometools and the JBrowser, we would like to create a graphics generator that allows for embedding interactive JavaScript hooks. The immediate task is to create zoomable interactive figures for a genetic map, pairwise recombination fractions, an image of genotype data, LOD curves, 2D scans and QTL effects.
- Rubyvis/D3 JavaScript helpers for Rails. Rubyvis is pure Ruby code, but Protovis and D3 are JavaScript libraries. It would be nice to be able to write Rubyvis code which can either render SVGs directly or produce JavaScript code that renders SVGs in a web browser. The goal is to provide interactive scientific tools for Ruby on Rails. This project is too large for one GSoC student, but we would like someone to lay the foundations for a good interactive library that can be used from SciRuby.
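The dual-output idea in the Rubyvis/D3 helpers item can be sketched in a few lines: one chart description in Ruby, rendered either as static SVG or as D3 JavaScript for the browser. `BarChart` and its methods are hypothetical and are not Rubyvis API; the emitted JavaScript uses only standard D3 selection calls (`d3.select`, `selectAll`, `data`, `enter`, `append`, `attr`).

```ruby
require 'json'

# Hypothetical sketch: one bar-chart description, two rendering backends.
BarChart = Struct.new(:data, :width, :height) do
  def bar_width
    width.to_f / data.size
  end

  # Pixels per data unit, so the largest value fills the full height.
  def scale
    height.to_f / data.max
  end

  # Backend 1: render static SVG markup directly in Ruby.
  def to_svg
    bars = data.each_with_index.map do |v, i|
      h = v * scale
      format('<rect x="%.1f" y="%.1f" width="%.1f" height="%.1f"/>',
             i * bar_width, height - h, bar_width - 1, h)
    end
    %(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">#{bars.join}</svg>)
  end

  # Backend 2: emit D3 JavaScript that draws the same chart in the browser.
  def to_d3_js(selector = 'body')
    <<~JS
      var data = #{data.to_json};
      var svg = d3.select(#{selector.to_json}).append("svg")
          .attr("width", #{width}).attr("height", #{height});
      svg.selectAll("rect").data(data).enter().append("rect")
          .attr("x", function(d, i) { return i * #{bar_width}; })
          .attr("y", function(d) { return #{height} - d * #{scale}; })
          .attr("width", #{bar_width - 1})
          .attr("height", function(d) { return d * #{scale}; });
    JS
  end
end
```

`BarChart.new([1, 2, 4], 300, 100).to_svg` yields markup that can be embedded as-is, while `to_d3_js('#chart')` yields a script for a page that already loads D3. Keeping the chart description separate from the backends is what would let a Rails helper choose either output.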
- Mentors: Claudio Bustos (@clbustos)
- Remove support for Ruby version < 1.9.3
- Create more tests
- Improve the documentation based on RDoc and the current style used in NMatrix
- Create modules for Generalized Linear Models (GLM) and Time Series Analysis
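For the GLM module, the core fitting routine is commonly iteratively reweighted least squares (IRLS). The sketch below fits the simplest case, logistic regression (binomial family, logit link) with an intercept and one predictor, by solving the 2x2 weighted normal equations at each step. `logistic_irls` and its keyword arguments are hypothetical, not statsample API; a real module would support several families and links and use NMatrix for the linear algebra.

```ruby
# Hypothetical sketch: logistic regression (one GLM) fitted by IRLS.
# Model: logit(P(y = 1)) = b0 + b1 * x. Not a statsample API.
def logistic_irls(x, y, iterations: 25, tol: 1e-10)
  b0 = 0.0
  b1 = 0.0
  iterations.times do
    # Accumulate the weighted normal equations X^T W X b = X^T W z.
    s00 = s01 = s11 = t0 = t1 = 0.0
    x.zip(y).each do |xi, yi|
      eta = b0 + b1 * xi                   # linear predictor
      mu  = 1.0 / (1.0 + Math.exp(-eta))  # inverse logit link
      w   = mu * (1.0 - mu)                # IRLS weight
      z   = eta + (yi - mu) / w            # working response
      s00 += w
      s01 += w * xi
      s11 += w * xi * xi
      t0  += w * z
      t1  += w * xi * z
    end
    # Solve the symmetric 2x2 system by its explicit inverse.
    det = s00 * s11 - s01 * s01
    new_b0 = (s11 * t0 - s01 * t1) / det
    new_b1 = (s00 * t1 - s01 * t0) / det
    converged = (new_b0 - b0).abs < tol && (new_b1 - b1).abs < tol
    b0, b1 = new_b0, new_b1
    break if converged
  end
  [b0, b1]
end
```

At convergence the score equations hold: the residuals y - mu sum to zero, and so do their products with x. That gives a convenient check for the tests the project calls for.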
- Mentors: Claudio Bustos (@clbustos)
- Remove support for Ruby version < 1.9.3
- Add more minimization methods
- Create more tests
- Possibly broaden the scope (and name) to 'optimization', which would allow more general algorithms.
- Improve the documentation based on RDoc and the current style used in NMatrix
- Mentors: Claudio Bustos (@clbustos)
- The objective of this project is to implement more integration methods and add support for solving various types of (ordinary and/or partial) differential equations. We need to be explicit about the numerical error and performance of each method, so benchmarks will be necessary. Of course, the student is expected to write tests.
- Remove support for Ruby version < 1.9.3
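A sketch of one ODE method, classical fourth-order Runge-Kutta (RK4) for a scalar y' = f(t, y), together with the kind of error report the benchmarks could produce. `rk4` is a hypothetical helper. RK4's global error scales as h^4, so halving the step size should cut the error by roughly a factor of 16; the test problem y' = y, y(0) = 1 has the exact solution y(1) = e.

```ruby
# Minimal sketch: classical RK4 for a scalar ODE y' = f(t, y), where f is
# given as a block. Takes a fixed number of equal steps from t0 to t_end.
def rk4(t0, y0, t_end, steps)
  h = (t_end - t0).to_f / steps
  t = t0.to_f
  y = y0.to_f
  steps.times do
    k1 = yield(t, y)
    k2 = yield(t + h / 2, y + h / 2 * k1)
    k3 = yield(t + h / 2, y + h / 2 * k2)
    k4 = yield(t + h, y + h * k3)
    y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    t += h
  end
  y
end

# Error report for y' = y, y(0) = 1, whose exact value at t = 1 is e.
[10, 20, 40].each do |n|
  approx = rk4(0.0, 1.0, 1.0, n) { |_t, y| y }
  puts format('steps=%3d  error=%.3e', n, (approx - Math::E).abs)
end
```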
- Mentors: Raoul J.P. Bonnal, Francesco Strozzi
- Machine learning and data mining algorithms are widely employed for analyses of complex datasets, especially in bioinformatics. Many Java libraries currently exist that implement the most commonly used algorithms in bioinformatics (such as clustering methods and simple classifiers), but the usability of these tools is restricted by the limited supply of APIs and user-friendly implementations for languages other than Java.
- Approach: The goal of this project would be to implement a system to easily access this set of tools using JRuby and to develop a basic framework that integrates the different sources. The Java libraries that could be primarily used would be taken from Weka (http://www.cs.waikato.ac.nz/ml/weka/) and RapidMiner (http://rapid-i.com/content/view/181/190/). This approach could be subsequently extended to develop a visualization scheme based on D3.
- Another idea is to integrate Waffles: "Waffles seeks to be the world's most comprehensive collection of command-line tools for machine learning and data mining. Our native tools have minimal dependencies (no interpreter, VM, or runtime environment is necessary), and build cross-platform. If you have a useful data mining tool that meets these criteria, we want it in Waffles." We would want to wrap the command-line interface (much like mini_magick does for ImageMagick) and/or create native bindings that link in with NMatrix or Sciruby::Dataset.
- Difficulty and needed skills: Medium/Hard depending on the topic selected and the scope of the project. Basic statistical knowledge is required as well as programming in Ruby, JRuby and Java.
- The project requires basic statistical knowledge; Ruby, JRuby, Java and possibly C/C++; experience wrapping external libraries; and machine learning.
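For the Waffles idea, wrapping command-line tools the way mini_magick wraps ImageMagick mostly comes down to building an argument vector and running it safely. Below is a minimal sketch using Ruby's standard `Open3` library. `CliTool` is hypothetical, and the Waffles executable name and arguments in the trailing comment are assumptions to be checked against the Waffles documentation.

```ruby
require 'open3'

# Hypothetical sketch of a mini_magick-style wrapper around a command-line tool.
class CliTool
  Error = Class.new(StandardError)

  def initialize(executable)
    @executable = executable
  end

  # Runs the tool with the given arguments and returns its standard output.
  # Passing an argv array (not one string) avoids the shell, so no quoting issues.
  def run(*args)
    argv = [@executable, *args.map(&:to_s)]
    out, err, status = Open3.capture3(*argv)
    raise Error, "#{argv.join(' ')} failed: #{err}" unless status.success?
    out
  end
end

# Hypothetical usage (executable name and arguments are assumptions):
#   learner = CliTool.new('waffles_learn')
#   learner.run('train', 'data.arff', 'decisiontree')
```

Raising a dedicated error class on a non-zero exit status keeps failures of the external tool visible to Ruby callers instead of silently returning empty output.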
- Mentors: Pjotr Prins, Toshiaki Katayama, Mark Wilkinson, Jerven Bolleman
- Interactive scientific tools tend to manage complex state in RAM and allow persistence after an analysis session. One example for statistics is R and its environment. The downside of this approach is that the amount of data handled is limited by the memory size of the machine. We propose to use a triple store as a generic backend for interactive data analysis and session persistence in interactive Ruby. A flexible data store that allows complex objects and querying is rather appealing. Applications in bioinformatics would be especially gratifying: the bioinformatics community is doing a lot of work integrating different data repositories through RDF, for example Bio2RDF and SADI. A list of activities can be found here. BioRuby and biogems contain a wide range of parsers and formatters which could be extended to support reading and writing RDF. Having such functionality would make it easy for bioinformaticians to incorporate and expose RDF for flexible data queries.
- Approach: We will visit all existing classes, parsers and formatters and decide which ones are most useful for RDF import/export. The student will tackle one transformer at a time, writing tests and adding a SPARQL end point for others to use. The student will also add SADI service discovery.
- Difficulty and needed skills: Average difficulty
- The student will need an affinity with the semantic web and to reach a decent level of Ruby programming, probably including metaprogramming.
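To make "one transformer at a time" concrete, here is a sketch of the export half of a transformer: a parsed record (a Hash) written out as N-Triples, the line-based RDF serialization. `to_ntriples`, `escape_literal` and the `example.org` IRIs are hypothetical placeholders, not a real vocabulary or an existing BioRuby API; a real transformer would map fields to established ontologies and likely build on an RDF library.

```ruby
# Hypothetical sketch of one RDF export transformer (record Hash -> N-Triples).
# The example.org IRIs are placeholders, not a real vocabulary.
EXAMPLE_BASE = 'http://example.org/'.freeze

# Escapes backslashes, double quotes and newlines inside an N-Triples literal.
def escape_literal(value)
  value.to_s.gsub(/[\\"\n]/) { |c| c == "\n" ? '\\n' : "\\#{c}" }
end

# One triple per field: <entry> <vocab/field> "value" .
# Keys are assumed to be IRI-safe (no spaces or angle brackets).
def to_ntriples(id, record)
  subject = "<#{EXAMPLE_BASE}entry/#{id}>"
  lines = record.map do |key, value|
    %(#{subject} <#{EXAMPLE_BASE}vocab/#{key}> "#{escape_literal(value)}" .)
  end
  lines.join("\n") + "\n"
end

puts to_ntriples('gene1', 'name' => 'HBB', 'organism' => 'Homo sapiens')
```

The output can be loaded into a triple store, where it becomes queryable through a SPARQL endpoint.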
- Mentors:
- From their site: "LEMON stands for Library for Efficient Modeling and Optimization in Networks. It is a C++ template library providing efficient implementations of common data structures and algorithms with focus on combinatorial optimization tasks connected mainly with graphs and networks."
- http://lemon.cs.elte.hu/pub/doc/1.2.3/annotated.html
- This would be a great chance to learn more about Ruby's C API.
- This library would allow us to create probabilistic graphical models with SciRuby (using statsample and distribution).
- Ruby, C API, wrapping external libraries, C/C++, graph theory
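A binding project also needs a way to check that results coming back through the C API are correct. LEMON provides shortest-path algorithms such as Dijkstra, so a small pure-Ruby reference implementation like the sketch below could serve as a test oracle for the wrapped version on small graphs. The `dijkstra` helper and its adjacency-list input format are hypothetical; this is not a LEMON binding.

```ruby
# Pure-Ruby reference sketch (not a LEMON binding): Dijkstra's shortest paths
# on a directed graph with non-negative arc lengths, given as
# { node => [[neighbor, length], ...] }. Returns a Hash of distances.
def dijkstra(graph, source)
  nodes = (graph.keys + graph.values.flatten(1).map(&:first)).uniq
  dist = nodes.to_h { |n| [n, Float::INFINITY] }
  dist[source] = 0
  unvisited = nodes.dup
  until unvisited.empty?
    # O(V^2) selection of the closest unvisited node; fine for test graphs.
    u = unvisited.min_by { |n| dist[n] }
    break if dist[u] == Float::INFINITY # remaining nodes are unreachable
    unvisited.delete(u)
    graph.fetch(u, []).each do |v, len|
      alt = dist[u] + len
      dist[v] = alt if alt < dist[v]
    end
  end
  dist
end
```

A test for a LEMON-backed method could then compare its distances against `dijkstra` on randomly generated small graphs.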