
Google Summer of Code 2013 Ideas

pjotrp edited this page Apr 15, 2013 · 43 revisions

Hey! We're pleased to announce that the Ruby Science Foundation has been accepted as a mentoring organization for Google Summer of Code 2013.

Contact

Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.

Instructions for students

You don't need to know a lot about Ruby before proposing a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you'll need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.

In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) and we can help you.

Project ideas

NMatrix projects

NMatrix is SciRuby's numerical matrix core, implementing dense matrices as well as two sparse storage types (linked-list-based and Yale/CSR). NMatrix is a fairly new but well-established project which has received Summer-of-Code-like grants from both Brighter Planet and the Ruby Association (in other words, from Matz, who created Ruby). Those who contribute to NMatrix will likely become authors of a jointly published, peer-reviewed science article on the library. Additionally, NMatrix is a good place to gain practical C and C++ experience while also working to improve Ruby.

  • Mentors: John Woods (@mohawkjohn)
  • ATLAS Functionality. NMatrix has many but not all ATLAS (cBLAS) and LAPACK functions exposed. We would like to see a consistent interface which makes sense in Ruby. We also want to be able to design and implement several NMatrix methods which depend upon ATLAS, cBLAS, and cLAPACK functions.
  • Rational Functionality. NMatrix includes some rational number capability, but support is lacking in areas where ATLAS functions are required, since ATLAS does not have a rational type. Rational-specific equivalents of ATLAS functions are needed. Along the way it may be possible to also implement some integer-specific ATLAS function equivalents.
  • List matrix element-wise operations. Element-wise operations work for Yale and dense matrices, but not yet for list matrices.
  • Slicing support. @flipback began implementing slicing support, but it is incomplete. Slicing needs to be integrated into a major portion of NMatrix methods.
  • Basic matrix math functionality. Specifically, exponentials and square roots, matrix decomposition/factorization, calculation of norms, tensor products, principal component analysis (PCA).
  • Statistical functions for matrices and vectors. Statsample needs to support NMatrix, accepting/returning matrices and vectors as well as single values.
  • Sparse improvements. The "new" Yale matrices used by NMatrix, which store diagonals (zero and non-zero) separately from non-diagonal non-zeros, are inefficient for matrices that are taller than they are wide. One way to address the problem would be to introduce an alternate "old" Yale storage. Another would be to allow matrices to be stored and operated on transposed. The goal, overall, is to be able to produce efficient Yale/sparse vectors regardless of the vectors' orientation.
  • extconf improvements. See Ruby-core improvements below. NMatrix uses mkmf for compilation of its C and C++ code, as well as linking ATLAS, LAPACK, and BLAS. But mkmf is difficult to use, and leads to compilation and linking problems -- not just in NMatrix but elsewhere as well, and particularly when working on multiple platforms (Linux, Mac, Windows, etc.). It'd be better to have a custom extconf.rb-related library for NMatrix to use for linking highly specialized C libraries like ATLAS. A successful implementation of this project would significantly reduce barriers to NMatrix adoption (e.g., by eliminating compiling and linking difficulties).
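
The trade-off described under "Sparse improvements" can be illustrated with a simplified sketch. The layout below is an assumption made for illustration (a full diagonal plus CSR-style arrays for the off-diagonal non-zeros) and is not NMatrix's internal representation; `to_new_yale` and `storage_cells` are hypothetical names.

```ruby
# Simplified "new Yale"-style storage: the diagonal is kept in full
# (zeros included) and only off-diagonal non-zeros go into CSR arrays.
# Illustrative only; this is not NMatrix's internal layout.
def to_new_yale(rows)
  n_rows = rows.size
  n_cols = rows.first.size
  diag = Array.new([n_rows, n_cols].min) { |i| rows[i][i] }
  row_ptr = [0] # one entry per row, plus one, regardless of sparsity
  col_idx = []
  vals = []
  rows.each_with_index do |row, i|
    row.each_with_index do |v, j|
      next if i == j || v == 0
      col_idx << j
      vals << v
    end
    row_ptr << vals.size
  end
  { diag: diag, row_ptr: row_ptr, col_idx: col_idx, vals: vals }
end

# Total number of stored cells across all arrays.
def storage_cells(yale)
  yale.each_value.sum(&:size)
end

row_vector = [[0, 0, 0, 5]]
col_vector = [[0], [0], [0], [5]]
puts storage_cells(to_new_yale(row_vector)) # prints 5
puts storage_cells(to_new_yale(col_vector)) # prints 8
```

Both vectors hold a single non-zero, yet the tall one costs more because the row-pointer array grows with the number of rows. Storing the matrix transposed, as suggested above, would remove that overhead.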

Ruby-core improvements: mkmf

  • Mentors: John Woods (@mohawkjohn)
  • Ruby-core projects, particularly mkmf, require that the student develop a good understanding of C as well as Ruby. Some prior familiarity with C and C++ would be beneficial.
  • mkmf is the library Ruby uses, typically in extconf.rb in gems or other libraries (including NMatrix), for compiling and linking C and C++ extensions. It lacks documentation, and most people currently figure it out by trial and error. A successful mkmf-related project would accomplish one or both of the following goals:
    1. Provide complete documentation and examples for mkmf, drawing from current Ruby extensions as well as supposing hypothetical extensions.
    2. Propose and implement an update to mkmf, which improves Ruby extension compilation and linking. Such a project would be extremely popular in the broader Ruby community.

SciRuby::Dataset or SciRuby::Dataframe

  • Mentors: Carlos Agarie (@agarie), Claudio Bustos (@clbustos)
  • The current dataset structure needs to be redesigned.
  • Must use NMatrix internally for speed and to use its IO module for numeric and integer data.
  • Based on Pandas: http://pandas.pydata.org/pandas-docs/dev/
  • Talk to BioRuby folks about what's necessary in this format.

Create the foundations of a visualization package based on D3

  • Mentors: John Woods (@mohawkjohn), Raoul J.P. Bonnal, Rob Syme, Pjotr Prins
  • D3 is an interactive data visualisation library written in JavaScript that runs in a browser.
  • Ruby D3. Rubyvis, our current visualization tool, is a Ruby port of Protovis. Protovis was recently supplanted by D3. We would like to produce a Ruby port of D3. In addition to general plots for statistics and scientific analysis, we have ideas for specialized bioinformatics visualisations such as genome displays, phylogeny, and QTL mapping. Based on existing work in the Ruby bio-graphics and RubyD3 gems, R/qtl, and work done for genometools and the JBrowser, we would like to create a graphics generator that allows for embedding interactive JavaScript hooks. The immediate task is to create zoomable interactive figures for a genetic map, pairwise recombination fractions, an image of genotype data, LOD curves, 2D scans, and QTL effects.
  • Rubyvis/D3 JavaScript helpers for Rails. Rubyvis is pure Ruby code, but Protovis and D3 are JavaScript. It would be nice to be able to write Rubyvis code which can either render SVGs directly or produce JavaScript code that can render SVGs in a web browser. The goal is to provide interactive scientific tools for Ruby on Rails. This project is too large for one GSoC student, but we would like someone to lay the foundations for a good interactive library that can be used from SciRuby.
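
As a rough idea of what "rendering SVGs directly" from Ruby means, here is a minimal sketch in plain Ruby. `bar_chart_svg` is a hypothetical helper written for this page; it is not part of the Rubyvis or D3 APIs.

```ruby
# Render a list of non-negative numbers as a minimal SVG bar chart.
# Hypothetical helper for illustration; not the Rubyvis or D3 API.
def bar_chart_svg(values, width: 200, height: 100)
  max = values.max.to_f
  bar_width = width.to_f / values.size
  bars = values.each_with_index.map do |v, i|
    # Scale each bar so the largest value fills the full height.
    bar_height = max.zero? ? 0.0 : v / max * height
    format('<rect x="%.1f" y="%.1f" width="%.1f" height="%.1f"/>',
           i * bar_width, height - bar_height, bar_width - 1, bar_height)
  end
  %(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">) +
    bars.join + "</svg>"
end

File.write("bars.svg", bar_chart_svg([1, 2, 4]))
```

The resulting file opens in any browser. A JavaScript-producing backend would emit the same data for a client-side library to draw instead of writing the `<rect>` elements itself.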

Ajaila

  • Mentors: Max Makarochkin (@mac-r), Carlos Agarie (@agarie)
  • It's a modular DSL (Domain Specific Language) for predictive analysis, i.e. you can use it to build diverse kinds of classifiers and systems based on machine learning. If you want to learn about Data Science, this is a very good project to work on.
  • https://github.com/ajaila/ajaila/
  • We must describe what tasks someone could undertake by choosing Ajaila as his/her project.
  • Ruby, data analysis, data science.

Statsample

  • Mentors: Claudio Bustos (@clbustos)
  • Remove support for Ruby version < 1.9.3
  • Create more specs (with current version of RSpec)
  • Improve the documentation based on RDoc and the current style used in NMatrix
  • Create modules for Generalized Linear Models (GLM) and Time Series Analysis

Minimization

  • Mentors: Claudio Bustos (@clbustos)
  • Remove support for Ruby version < 1.9.3
  • Add more minimization methods
  • Create more specs (with current version of RSpec)
  • Consider renaming the project to 'optimization', since that would allow more general algorithms
  • http://docs.scipy.org/doc/scipy/reference/optimize.html
  • Improve the documentation based on RDoc and the current style used in NMatrix
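
To give a feel for the kind of method this project deals with, here is a sketch of golden-section search for a one-dimensional unimodal function. It is chosen only as a familiar example; `golden_section_min` and its parameters are hypothetical and do not reflect the Minimization gem's API.

```ruby
# Golden-section search for the minimum of a unimodal function on [a, b].
# The function is passed as a block. Hypothetical sketch, not the gem's API.
def golden_section_min(a, b, tol: 1e-8, max_iter: 200)
  inv_phi = (Math.sqrt(5) - 1) / 2 # 1/phi, about 0.618
  c = b - inv_phi * (b - a)
  d = a + inv_phi * (b - a)
  fc = yield(c)
  fd = yield(d)
  max_iter.times do
    break if (b - a).abs < tol
    if fc < fd
      # Minimum lies in [a, d]; the old c becomes the new d.
      b, d, fd = d, c, fc
      c = b - inv_phi * (b - a)
      fc = yield(c)
    else
      # Minimum lies in [c, b]; the old d becomes the new c.
      a, c, fc = c, d, fd
      d = a + inv_phi * (b - a)
      fd = yield(d)
    end
  end
  (a + b) / 2.0
end

x_min = golden_section_min(0.0, 5.0) { |x| (x - 2)**2 }
puts x_min
```

Each iteration reuses one of the two previous function evaluations, so only one new evaluation is needed per step.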

Integration

  • Mentors: Claudio Bustos (@clbustos)
  • Improve its API
  • Remove support for Ruby version < 1.9.3
  • Create more specs (with current version of RSpec)
  • Add more integration methods: be more explicit about each method's imprecisions
  • Add support for ODEs
  • http://docs.scipy.org/doc/scipy/reference/integrate.html
  • Improve the documentation based on RDoc and the current style used in NMatrix
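
One way to be "more explicit about each method's imprecisions" is to return an error estimate alongside the result. The sketch below does this for composite Simpson's rule by comparing the result on n and 2n subintervals (Richardson extrapolation). `simpson` and `simpson_with_error` are hypothetical names, not the Integration gem's API.

```ruby
# Composite Simpson's rule on [a, b] with n (even) subintervals.
# The integrand is passed as a block. Hypothetical sketch, not the gem's API.
def simpson(a, b, n)
  raise ArgumentError, "n must be a positive even integer" unless n.positive? && n.even?

  h = (b - a).to_f / n
  sum = yield(a) + yield(b)
  (1...n).each { |i| sum += (i.odd? ? 4 : 2) * yield(a + i * h) }
  sum * h / 3
end

# Returns [value, error_estimate]. Simpson's error shrinks roughly 16-fold
# when n doubles, so |fine - coarse| / 15 estimates the error of `fine`.
def simpson_with_error(a, b, n, &f)
  coarse = simpson(a, b, n, &f)
  fine = simpson(a, b, 2 * n, &f)
  [fine, (fine - coarse).abs / 15]
end

value, error = simpson_with_error(0, Math::PI, 10) { |x| Math.sin(x) }
puts "integral = #{value}, estimated error = #{error}"
```

The estimate assumes a smooth integrand. For functions with kinks or singularities it can be badly optimistic, which is exactly the kind of caveat the documentation should spell out.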

Distribution

  • Mentors: Claudio Bustos (@clbustos)
  • Update to current versions of JRuby and MRI.
  • Remove support for Ruby version < 1.9.3
  • Create more specs (with current version of RSpec)
  • Use a more modular approach to each distribution, i.e. Strategy pattern
  • Improve the documentation based on RDoc and the current style used in NMatrix
  • Related: issue #5.
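
The Strategy pattern mentioned above could look roughly like the following sketch, in which each distribution is an interchangeable object with the same interface. The class names are hypothetical and are not taken from the Distribution gem.

```ruby
# Each distribution is a strategy object exposing the same interface.
# Hypothetical classes for illustration; not the Distribution gem's API.
class NormalStrategy
  def initialize(mean: 0.0, sd: 1.0)
    @mean = mean.to_f
    @sd = sd.to_f
  end

  def pdf(x)
    z = (x - @mean) / @sd
    Math.exp(-0.5 * z * z) / (@sd * Math.sqrt(2 * Math::PI))
  end

  def cdf(x)
    0.5 * (1 + Math.erf((x - @mean) / (@sd * Math.sqrt(2))))
  end
end

class UniformStrategy
  def initialize(min: 0.0, max: 1.0)
    @min = min.to_f
    @max = max.to_f
  end

  def pdf(x)
    x.between?(@min, @max) ? 1.0 / (@max - @min) : 0.0
  end

  def cdf(x)
    return 0.0 if x < @min
    return 1.0 if x > @max

    (x - @min) / (@max - @min)
  end
end

# Context object: delegates to whichever strategy it was given.
class RandomVariable
  def initialize(strategy)
    @strategy = strategy
  end

  def pdf(x)
    @strategy.pdf(x)
  end

  def cdf(x)
    @strategy.cdf(x)
  end

  def probability_between(low, high)
    cdf(high) - cdf(low)
  end
end

puts RandomVariable.new(UniformStrategy.new).probability_between(0.25, 0.75)
puts RandomVariable.new(NormalStrategy.new).probability_between(-1.0, 1.0)
```

Code such as `probability_between` is written once and works for any distribution, and a faster backend (for example a native or Java implementation) can be swapped in as another strategy.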

Machine Learning & Data Mining Algorithms for Ruby

  • Mentors: Raoul J.P. Bonnal, Francesco Strozzi
  • Machine learning and data mining algorithms are widely employed for analyses of complex datasets, especially in bioinformatics. Many Java libraries currently exist that implement the most commonly used algorithms in bioinformatics (such as clustering methods and simple classifiers), but the usability of these tools is restricted by the limited supply of APIs and user-friendly implementations for languages other than Java.
  • Approach: The goal of this project would be to implement a system to easily access this set of tools using JRuby and to develop a basic framework that integrates the different sources. The Java libraries that could be primarily used would be taken from Weka (http://www.cs.waikato.ac.nz/ml/weka/) and RapidMiner (http://rapid-i.com/content/view/181/190/). This approach could be subsequently extended to develop a visualization scheme based on D3.
  • Difficulty and needed skills: Medium/Hard depending on the topic selected and the scope of the project. Basic statistical knowledge is required as well as programming in Ruby, JRuby and Java.
  • Related technologies: statistics, Ruby, JRuby, Java

Integrate documentation on how to use SciRuby and its subprojects into our current website

  • Mentors: Carlos Agarie (@agarie)
  • Design and build a static website (that'll be located under guides.sciruby.com or sciruby.com/guides) for documentation on how to use SciRuby, NMatrix, Statsample and other projects. You'll need to choose good problems and solve them using SciRuby and its libraries. This isn't a pure documentation project, as you'll be asked to suggest improvements to each library's API as you go along, updating the guides on the way.
  • Must be generated with a single rake command (e.g. rake generate:guides)
  • Use RailsGuides as an inspiration https://github.com/rails/rails/tree/master/guides
  • Related technologies: Web development, Jekyll, API improvement
  • Related: issue #24.
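
The single-command requirement could be met with a namespaced Rake task along these lines. The directory names are assumptions, and the sketch presumes Jekyll is the generator that ends up being chosen; `jekyll build` with `--source` and `--destination` is Jekyll's standard build command.

```ruby
# Rakefile (sketch). The explicit require lets the file also be loaded
# as a plain Ruby script; Rake itself does not need it.
require "rake"

GUIDES_SOURCE = "guides"        # hypothetical location of the guide sources
GUIDES_OUTPUT = "_site/guides"  # hypothetical output directory

namespace :generate do
  desc "Build the static guides site"
  task :guides do
    # Passing arguments separately avoids shell quoting problems.
    sh "jekyll", "build", "--source", GUIDES_SOURCE, "--destination", GUIDES_OUTPUT
  end
end
```

With this in place, `rake generate:guides` builds the site, and `rake -T` lists the task together with its description.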

Create a new RDoc template optimized for our purposes

Create a Ruby wrapper for LEMON

  • Mentors:
  • From their site: "LEMON stands for Library for Efficient Modeling and Optimization in Networks. It is a C++ template library providing efficient implementations of common data structures and algorithms with focus on combinatorial optimization tasks connected mainly with graphs and networks."
  • http://lemon.cs.elte.hu/pub/doc/1.2.3/annotated.html
  • This would be a great chance to learn more about Ruby's C API.
  • This library would allow us to create probabilistic graphical models with SciRuby (using statsample and distribution).
  • Ruby, C API, wrapping external libraries, C/C++, graph theory

Create a Ruby wrapper for Waffles

  • Mentors:
  • From their site: "Waffles seeks to be the world's most comprehensive collection of command-line tools for machine learning and data mining. Our native tools have minimal dependencies (no interpreter, VM, or runtime environment is necessary), and build cross-platform. If you have a useful data mining tool that meets these criteria, we want it in Waffles."
  • There are various projects that can be created out of this:
    1. Wrap the command line interface (much like mini_magick does for ImageMagick).
    2. Create native bindings that link in with NMatrix or SciRuby::Dataset.
    3. Create an FFI interface.
  • Technologies/skills: Ruby, C/C++, wrapping external libraries, machine learning, statistics, command-line tools
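
The first option, wrapping the command line interface, can start from something as small as the sketch below. `CommandLineTool` is a hypothetical class; no Waffles executable names or flags are assumed, and the Ruby interpreter stands in for the wrapped tool so the example runs anywhere.

```ruby
require "open3"

# Thin wrapper around an external command-line executable.
# Hypothetical class for illustration only.
class CommandLineTool
  class Error < StandardError; end

  def initialize(executable)
    @executable = executable
  end

  # Runs the executable with the given arguments and returns its standard
  # output. Arguments are passed as a list, so no shell is involved.
  def run(*args)
    stdout, stderr, status = Open3.capture3(@executable, *args.map(&:to_s))
    raise Error, "#{@executable} failed: #{stderr.strip}" unless status.success?

    stdout
  end
end

# Stand-in usage: the current Ruby interpreter plays the external tool.
tool = CommandLineTool.new(RbConfig.ruby)
puts tool.run("-e", "print 6 * 7") # prints 42
```

A real wrapper would add one Ruby method per tool command and parse the output into Ruby objects, which is where integration with NMatrix or a dataset class would come in.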

JRuby

  • Mentors:
  • Verify the compatibility between JRuby and SciRuby's subprojects
  • It'd allow us to take advantage of JRuby's multithreaded environment!
  • Research whether pure Ruby libraries (D3, Rubyvis, minimization) can benefit from using Java "native" extensions and whether the extra complexity pays off.