- Overview
- Getting started with Clojure
- Getting started with Incanter
- Documentation and examples
- Building Incanter
- Dependencies
Incanter is a Clojure-based, R-like statistical computing and graphics environment for the JVM. At the core of Incanter are the Parallel Colt numerics library, a multithreaded version of Colt, and the JFreeChart charting library, as well as several other Java and Clojure libraries.
The motivation for creating Incanter is to provide a JVM-based statistical computing and graphics platform with R-like semantics and an interactive programming environment. Running on the JVM provides access to the large number of existing Java libraries for data access, data processing, and presentation. Clojure’s seamless integration with Java makes leveraging these libraries much simpler than is possible in R, and Incanter’s R-like semantics make statistical programming much simpler than is possible in pure Java.
Motivation for a Lisp-based R-like statistical environment can be found in the paper Back to the Future: Lisp as a Base for a Statistical Computing System by Ihaka and Lang (2008). Incanter is also inspired by the now dormant Lisp-Stat (see the special volume in the Journal of Statistical Software on Lisp-Stat: Past, Present, and Future from 2005).
Motivation for a JVM-based Lisp can be found at the Clojure website, and screencasts of several excellent Clojure talks by the language’s creator, Rich Hickey, can be found at clojure.blip.tv.
For a great introduction to programming in Clojure, read “Clojure – Functional Programming for the JVM” by R. Mark Volkmann. For an even more extensive introduction, get one of the books on Clojure: “Programming Clojure” by Stuart Halloway, “The Joy of Clojure” by Michael Fogus and Chris Houser, “Clojure in Action” by Amit Rathore, or “Practical Clojure” by Luke VanderHart and Stuart Sierra.
Other Clojure resources
- Clojure website
- Getting Started with Clojure
- Clojure Google group
- clojure.blip.tv
- Disclojure blog
- Full Disclojure screencasts
Start by visiting the Incanter website for an overview, then check out the documentation page for a listing of HOW-TOs and examples. Next, download either an Incanter executable or a pre-built version of the latest build of Incanter, which includes all the necessary dependencies, and unpack the file (if you would like to build it from source, read Building Incanter).
Start the Clojure REPL (aka the shell) by double-clicking on the downloaded executable or, if you downloaded the pre-built distribution, by running one of the scripts in the Incanter directory: script/repl (or script\repl.bat on Windows).
NOTE: The lein repl task uses Clojure 1.1, and Incanter 1.2.x requires Clojure 1.2, so use the repl script instead of lein.
From the Clojure REPL, load the Incanter libraries:
user=> (use '(incanter core stats charts))
Try an example: sample 1,000 values from a standard-normal distribution and view a histogram:
user=> (view (histogram (sample-normal 1000)))
Try another simple example, a plot of the sine function over the range -4 to 4:
user=> (view (function-plot sin -4 4))
The online documentation for most Incanter functions contains usage examples. The documentation can be viewed using Clojure’s doc function. For example, to view the documentation and usage examples for the linear-model function, call (doc linear-model) from the Clojure shell. Use (find-doc "search term") to search the online documentation from the Clojure shell. The API documentation can also be found at http://liebke.github.com/incanter.
More Incanter examples
- See the Data-Sorcery blog
- See the Documentation table of contents
The following documentation covers the Incanter and Clojure APIs and the APIs of the underlying Java libraries.
Incanter documentation
Related API documentation
To build and test Incanter, you will need to have Leiningen and git installed:
1. Clone the repository with git: git clone git://github.com/liebke/incanter.git
2. Install Leiningen
a. Download the lein script: wget https://github.com/technomancy/leiningen/raw/stable/bin/lein
(use lein.bat on Windows)
b. Place it on your path and chmod it to be executable: chmod +x lein
c. Run: lein self-install
3. From the incanter directory, download the necessary dependencies: lein deps
4. Start a REPL: script/repl (or script\repl.bat on Windows), or start a Swank server: script/swank (or script\swank.bat on Windows)
Other tasks:
- If you want to run the tests for each of Incanter’s modules, use script/test
- Each of Incanter’s modules is an independent Leiningen project. Just cd into modules/incanter-* and use Leiningen to build each one as a standalone library.
- script/install uses Leiningen to build all the modules and install them in your local ~/.m2 repository.
Dependencies
- Clojure
- Clojure-Contrib
- Parallel Colt
- Netlib-Java (included with Parallel Colt)
- JFreeChart
- OpenCSV
- iText
- Congomongo
- JLaTeXMath
- Apache POI
- JLine
- Integer overflow in distributions_tests.clj, function (large-integer-tests). The test uses (reduce - (repeat 100 2)) and (reduce * (repeat 100 3)). Workaround: change * to *' in the test. Open question: propagating number promotion looks to be a big pain. Should we add ticks to all the arithmetic operators, or should we force them to bigint by adding an “N” to the original literals?
- Promotion problems:
- Test failure in (dice-string) (stats_tests.clj:184), comparing a double (0.25) to a Ratio (1/4).
- Test failure in (chebyshev) (stats_tests.clj:236), comparing an integer to a double.
- Workaround: Change from equality operator (=) to equivalence operator (==). This should probably be done comprehensively.
- Bit ops no longer support doubles. Appears to only affect incanter.core/get-dummies, which was passing a double but didn’t need to. Have made that explicitly an integer.
- (matrix) promotes to double.
- Equality tests are now problematic. Either use = with double or bigint literals, or use == and don’t force precision on literals.
- However, matrix-to-list or lazy-seq-to-list comparisons don’t work with ==. Symmetry is broken, Clojure lists don’t look like numeric lists, and Clojure vectors don’t look like algebraic vectors.
- Matrix.java no longer accepted as seq. Previously, implementing ISeq
was sufficient, now it appears the marker interface
clojure.lang.Sequential is also needed. Workaround: added Sequential
to Matrix’s implements clause. Open question: Should this be
Sequential vs. Seqable?
- Compiler complains about *test-statistic-iterations* and *test-statistic-map* looking like dynamic vars, but not being declared as such. They aren’t rebound in incanter-core, but the docstring for test-statistic-distribution implies that an application can rebind them, so I’m adding ^:dynamic to the decls.
- $data was not declared as dynamic, so it could not be rebound. Workaround: added ^{:dynamic true} to its metadata. Open question: does rebinding this fit in the 1.3 model for vars? Check with Stu about threading.
- Complains that class clojure.set not found (called from
incanter.core as clojure.set/difference). This didn’t happen when
building under cake, for some reason. Workaround: change call from
clojure.set/difference to just difference, add :use in ns decl.
- From clojure.contrib.core, defvar and defvar- didn’t make it into
core.incubator. Scope: only used by distributions.clj. Workaround:
changed defvar- calls to just def.
- clojure.contrib.combinatorics: moved to math.combinatorics, but no
release made yet. Workaround: added pom.xml to local clone of
https://github.com/clojure/math.combinatorics.git. Built using “mvn
install” to put jar in local repository.
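Several of the notes above (number promotion, = versus ==, bit ops on doubles, and dynamic vars) can be reproduced at a plain Clojure REPL without Incanter. The following is only a minimal sketch, assuming Clojure 1.3 or later; the helper throws? and the var *iterations* are hypothetical names made up for illustration and are not part of Incanter or its test suite.

```clojure
;; Hypothetical helper: returns true if calling the thunk f throws.
(defn throws? [f]
  (try (f) false
       (catch Exception _ true)))

;; Integer overflow: plain * no longer auto-promotes longs.
(throws? #(reduce * (repeat 100 3)))   ; => true (ArithmeticException)
(reduce *' (repeat 100 3))             ; ticked operator promotes to BigInt
(reduce * (repeat 100 3N))             ; "N" literal forces BigInt arithmetic

;; = distinguishes numeric categories; == compares numbers by value.
(= 0.25 1/4)     ; => false
(== 0.25 1/4)    ; => true
(= 1 1.0)        ; => false
(== 1 1.0)       ; => true

;; == only accepts numbers, so it cannot compare sequences.
(throws? #(== [1 2] [1 2]))   ; => true
(= [1 2] '(1 2))              ; => true

;; Bit operations reject doubles; coerce to an integer first.
(throws? #(bit-and 1.0 1))    ; => true
(bit-and (long 1.0) 1)        ; => 1

;; A var must be declared ^:dynamic before it can be rebound.
(def ^:dynamic *iterations* 1000)
(binding [*iterations* 10] *iterations*)   ; => 10
```

Both workarounds for the overflow (ticked operators or “N” literals) give the same result here; which one scales better across a whole code base is the open question noted above.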
These are things I caught by running examples by hand that were not
picked up by any test cases.
- $data not used in -core
- Running the first “with-data” function on
http://data-sorcery.org/2010/01/03/datasets_mongodb/ revealed that
“sel” was not being exercised in -core.
- read-dataset with a URL that redirects gives a dataset with the redirect response as the rows, rather than following the redirect. Is this how it was under Clojure 1.2?
- $where has some examples on the web that use strings as the column keys. I couldn’t get that to work, and had to use keywords instead. Intended change or accidental?