ARROW-484: Revise README to include more detail about software components

Also closes #14.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #286 from wesm/ARROW-484 and squashes the following commits:

8c6acf6 [Wes McKinney] Tweak description of data containers
ec4b95e [Wes McKinney] Generalize note about binary formats
3d31644 [Wes McKinney] Typos
57b8bf5 [Wes McKinney] Revise README to include more detail about software components
wesm committed Jan 16, 2017
1 parent a098fd0 commit 850774e
Showing 1 changed file with 29 additions and 7 deletions.
36 changes: 29 additions & 7 deletions README.md
@@ -32,17 +32,39 @@ Arrow is a set of technologies that enable big-data systems to process and move
 Initial implementations include:

 - [The Arrow Format](https://github.com/apache/arrow/tree/master/format)
-- [Arrow Structures and APIs in C++](https://github.com/apache/arrow/tree/master/cpp)
-- [Arrow Structures and APIs in Java](https://github.com/apache/arrow/tree/master/java)
+- [Java implementation](https://github.com/apache/arrow/tree/master/java)
+- [C++ implementation](https://github.com/apache/arrow/tree/master/cpp)
+- [Python interface to C++ libraries](https://github.com/apache/arrow/tree/master/python)

-Arrow is an [Apache Software Foundation](www.apache.org) project. More info can be found at [arrow.apache.org](http://arrow.apache.org).
+Arrow is an [Apache Software Foundation](www.apache.org) project. Learn more at
+[arrow.apache.org](http://arrow.apache.org).

+#### What's in the Arrow libraries?
+
+The reference Arrow implementations contain a number of distinct software
+components:
+
+- Columnar vector and table-like containers (similar to data frames) supporting
+flat or nested types
+- Fast, language agnostic metadata messaging layer (using Google's Flatbuffers
+library)
+- Reference-counted off-heap buffer memory management, for zero-copy memory
+sharing and handling memory-mapped files
+- Low-overhead IO interfaces to files on disk, HDFS (C++ only)
+- Self-describing binary wire formats (streaming and batch/file-like) for
+remote procedure calls (RPC) and
+interprocess communication (IPC)
+- Integration tests for verifying binary compatibility between the
+implementations (e.g. sending data from Java to C++)
+- Conversions to and from other in-memory data structures (e.g. Python's pandas
+library)
+
 #### Getting involved

-Right now the primary audience for Apache Arrow are the designers and
-developers of data systems; most people will use Apache Arrow indirectly
-through systems that use it for internal data handling and interoperating with
-other Arrow-enabled systems.
+Right now the primary audience for Apache Arrow are the developers of data
+systems; most people will use Apache Arrow indirectly through systems that use
+it for internal data handling and interoperating with other Arrow-enabled
+systems.

 Even if you do not plan to contribute to Apache Arrow itself or Arrow
 integrations in other projects, we'd be happy to have you involved: