diff --git a/docs/code/IDataViewDesignPrinciples.md b/docs/code/IDataViewDesignPrinciples.md new file mode 100644 index 0000000000..c3f345bf68 --- /dev/null +++ b/docs/code/IDataViewDesignPrinciples.md @@ -0,0 +1,471 @@ +# IDataView Design Principles + +## Overview + +### Brief Introduction to IDataView + +The *IDataView system* is a set of interfaces and components that provide +efficient, compositional processing of schematized data for machine learning +and advanced analytics applications. It is designed to gracefully and +efficiently handle high dimensional data and large data sets. It does not +directly address distributed data and computation, but is suitable for single +node processing of data partitions belonging to larger distributed data sets. + +IDataView is the data pipeline machinery for ML.NET. Microsoft teams consuming +this library have implemented libraries of IDataView related components +(loaders, transforms, savers, trainers, predictors, etc.) and have validated +the performance, scalability and task flexibility benefits. + +The name IDataView was inspired from the database world, where the term table +typically indicates a mutable body of data, while a view is the result of a +query on one or more tables or views, and is generally immutable. Note that +both tables and views are schematized, being organized into typed columns and +rows conforming to the column types. Views differ from tables in several ways: + +* Views are *composable*. New views are formed by applying transformations + (queries) to other views. In contrast, forming a new table from an existing + table involves copying data, making the tables decoupled; the new table is + not linked to the original table in any way. + +* Views are *virtual*; tables are fully realized/persisted. In other words, a + table contains the values in the rows while a view computes values from + other views or tables, so does not contain or own the values. + +* Views are *immutable*; tables are mutable. Since a view does not contain + values, but merely computes values from its source views, there is no + mechanism for modifying the values. + +Note that immutability and compositionality are critical enablers of +technologies that require reasoning over transformation, like query +optimization and remoting. Immutability is also key for concurrency and thread +safety. Views being virtual minimizes I/O, memory allocation, and computation. +Information is accessed, memory is allocated, and computation is performed, +only when needed to satisfy a local request for information. + +### Design Requirements + +The IDataView design fulfills the following design requirements: + +* **General schema**: Each view carries schema information, which specifies + the names and types of the view's columns, together with metadata associated + with the columns. The system is optimized for a reasonably small number of + columns (hundreds). See [here](#basics). + +* **Open type system**: The column type system is open, in the sense that new + data types can be introduced at any time and in any assembly. There is a set + of standard types (which may grow over time), but there is no registry of + all supported types. See [here](#basics). + +* **High dimensional data support**: The type system for columns includes + homogeneous vector types, so a set of related primitive values can be + grouped into a single vector-valued column. See [here](#vector-types). 
+ +* **Compositional**: The IDataView design supports components of various + kinds, and supports composing multiple primitive components to achieve + higher-level semantics. See [here](#components). + +* **Open component system**: While the ML.NET code has a growing large library + of IDataView components, additional components that interoperate with these + may be implemented in other code bases. See [here](#components). + +* **Cursoring**: The rows of a view are accessed sequentially via a row + cursor. Multiple cursors can be active on the same view, both sequentially + and in parallel. In particular, views support multiple iterations through + the rows. Each cursor has a set of active columns, specified at cursor + construction time. Shuffling is supported via an optional random number + generator passed at cursor construction time. See [here](#cursoring). + +* **Lazy computation**: When only a subset of columns or a subset of rows is + requested, computation for other columns and rows can be, and generally is, + avoided. Certain transforms, loaders, and caching scenarios may be + speculative or eager in their computation, but the default is to perform + only computation needed for the requested columns and rows. See + [here](#lazy-computation-and-active-columns). + +* **Immutability and repeatability**: The data served by a view is immutable + and any computations performed are repeatable. In particular, multiple + cursors on the view produce the same row values in the same order (when + using the same shuffling). See [here](#immutability-and-repeatability). + +* **Memory efficiency**: The IDataView design includes cooperative buffer + sharing patterns that eliminate the need to allocate objects or buffers for + each row when cursoring through a view. See [here](#memory-efficiency). + +* **Batch-parallel computation**: The IDataView system includes the ability to + get a set of cursors that can be executed in parallel, with each individual + cursor serving up a subset of the rows. Splitting into multiple cursors can + be done either at the loader level or at an arbitrary point in a pipeline. + The component that performs splitting also provides the consolidation logic. + This enables computation heavy pipelines to leverage multiple cores without + complicating each individual transform implementation. See + [here](#batch-parallel-cursoring). + +* **Large data support**: Constructing views on data files and cursoring + through the rows of a view does not require the entire data to fit in + memory. Conversely, when the entire data fits, there is nothing preventing + it from being loaded entirely in memory. See [here](#data-size). + +### Design Non-requirements + +The IDataView system design does *not* include the following: + +* **Multi-view schema information**: There is no direct support for specifying + cross-view schema information, for example, that certain columns are primary + keys, and that there are foreign key relationships among tables. However, + the column metadata support, together with conventions, may be used to + represent such information. + +* **Standard ML schema**: The IDataView system does not define, nor prescribe, + standard ML schema representation. For example, it does not dictate + representation of nor distinction between different semantic interpretations + of columns, such as label, feature, score, weight, etc. However, the column + metadata support, together with conventions, may be used to represent such + interpretations. 
+ +* **Row count**: A view is not required to provide its row count. The + `IDataView` interface has a `GetRowCount` method with type `Nullable`. + When this returns `null`, the row count is not available directly from the + view. + +* **Efficient indexed row access**: There is no standard way in the IDataView + system to request the values for a specific row number. While the + `IRowCursor` interface has a `MoveMany(long count)` method, it only supports + moving forward `(count > 0)`, and is not necessarily more efficient than + calling `MoveNext()` repeatedly. See [here](#row-cursor). + +* **Data file formats**: The IDataView system does not dictate storage or + transport formats. It *does* include interfaces for loader and saver + components. The ML.NET code has implementations of loaders and savers for + some binary and text file formats. + +* **Multi-node computation over multiple data partitions**: The IDataView + design is focused on single node computation. We expect that in multi-node + applications, each node will be given its own data partition(s) to operate + on, with aggregation happening outside an IDataView pipeline. + +## Schema and Type System + +### Basics + +IDataView has general schema support, in that a view can have an arbitrary +number of columns, each having an associated name, index, data type, and +optional metadata. + +Column names are case sensitive. Multiple columns can share the same name, in +which case, one of the columns hides the others, in the sense that the name +will map to one of the column indices, the visible one. All user interaction +with columns should be via name, not index, so the hidden columns are +generally invisible to the user. However, hidden columns are often useful for +diagnostic purposes. + +The set of supported column data types forms an open type system, in the sense +that additional types can be added at any time and in any assembly. However, +there is a precisely defined set of standard types including: + +* Text +* Boolean +* Single and Double precision floating point +* Signed integer values using 1, 2, 4, or 8 bytes +* Unsigned integer values using 1, 2, 4, or 8 bytes +* Unsigned 16 byte values for ids and probabilistically unique hashes +* Date time, date time zone, and timespan +* Key types +* Vector types + +The set of standard types will likely be expanded over time. + +The IDataView type system is specified in a separate document, *IDataView Type +System Specification*. + +IDataView provides a general mechanism for associating semantic metadata with +columns, such as designating sets of score columns, names associated with the +individual slots of a vector-valued column, values associated with a key type +column, whether a column's data is normalized, etc. + +While IDataView schema supports an arbitrary number of columns, it, like most +schematized data systems, is designed for a modest number of columns, +typically, limited to a few hundred. When a large number of *features* are +required, the features should be gathered into one or more vector-valued +columns, as discussed in the next section. This is important for both user +experience and performance. + +### Vector Types + +Machine learning and advanced analytics applications often involve high- +dimensional data. For example, a common technique for learning from text, +known as [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model), +represents each word in the text as a numeric feature containing the number of +occurrences of that word. 
Another technique is indicator or one-hot encoding +of categorical values, where, for example, a text-valued column containing a +person's last name is expanded to a set of features, one for each possible +name (Tesla, Lincoln, Gandhi, Zhang, etc.), with a value of one for the +feature corresponding to the name, and the remaining features having value +zero. Variations of these techniques use hashing in place of dictionary +lookup. With hashing, it is common to use 20 bits or more for the hash value, +producing `2^^20` (about a million) features or more. + +These techniques typically generate an enormous number of features. +Representing each feature as an individual column is far from ideal, both from +the perspective of how the user interacts with the information and how the +information is managed in the schematized system. The solution is to represent +each set of features, whether indicator values, or bag-of-words counts, as a +single vector-valued column. + +A vector type specifies an item type and optional dimensionality information. +The item type must be a primitive, non-vector, type. The optional +dimensionality information specifies, at the basic level, the number of items +in the corresponding vector values. + +When the size is unspecified, the vector type is variable-length, and +corresponding vector values may have any length. A tokenization transform, +that maps a text value to the sequence of individual terms in that text, +naturally produces variable-length vectors of text. Then, a hashing ngram +transform may map the variable-length vectors of text to a bag-of-ngrams +representation, which naturally produces numeric vectors of length `2^^k`, +where `k` is the number of bits used in the hash function. + +### Key Types + +The IDataView system includes the concept of key types. Key types are used for +data that is represented numerically, but where the order and/or magnitude of +the values is not semantically meaningful. For example, hash values, social +security numbers, and the index of a term in a dictionary are all best modeled +with a key type. + +## Components + +The IDataView system includes several standard kinds of components and the +ability to compose them to produce efficient data pipelines. A loader +represents a data source as an `IDataView`. A transform is applied to an +`IDataView` to produce a derived `IDataView`. A saver serializes the data +produced by an `IDataView` to a stream, in some cases in a format that can be +read by a loader. There are other more specific kinds of components defined +and used by the ML.NET code base, for example, scorers, evaluators, joins, and +caches. While there are several standard kinds of components, the set of +component kinds is open. + +### Transforms + +Transforms are a foundational kind of IDataView component. Transforms take an +IDataView as input and produce an IDataView as output. Many transforms simply +"add" one or more computed columns to their input schema. More precisely, +their output schema includes all the columns of the input schema, plus some +additional columns, whose values are computed from some of the input column +values. It is common for an added column to have the same name as an input +column, in which case, the added column hides the input column. Both the +original column and new column are present in the output schema and available +for downstream components (in particular, savers and diagnostic tools) to +inspect. 
For example, a normalization transform may, for each slot of a +vector-valued column named Features, apply an offset and scale factor and +bundle the results in a new vector-valued column, also named Features. From +the user's perspective (which is entirely based on column names), the Features +column was "modified" by the transform, but the original values are available +downstream via the hidden column. + +Some transforms require training, meaning that their precise behavior is +determined automatically from some training data. For example, normalizers and +dictionary-based mappers, such as the TermTransform, build their state from +training data. Training occurs when the transform is instantiated from user- +provided parameters. Typically, the transform behavior is later serialized. +When deserialized, the transform is not retrained; its behavior is entirely +determined by the serialized information. + +### Composition Examples + +Multiple primitive transforms may be applied to achieve higher-level +semantics. For example, ML.NET's `CategoricalTransform` is the composition of +two more primitive transforms, `TermTransform`, which maps each term to a key +value via a dictionary, and `KeyToVectorTransform`, which maps from key value +to indicator vector. Similarly, `CategoricalHashTransform` is the composition +of `HashTransform`, which maps each term to a key value via hashing, and +`KeyToVectorTransform`. + +Similarly, `WordBagTransform` and `WordHashBagTransform` are each the +composition of three transforms. `WordBagTransform` consists of +`WordTokenizeTransform`, `TermTransform`, and `NgramTransform`, while +`WordHashBagTransform` consists of `WordTokenizeTransform`, `HashTransform`, +and `NgramHashTransform`. + +## Cursoring + +### Row Cursor + +To access the data in a view, one gets a row cursor from the view by calling +the `GetRowCursor` method. The row cursor is a movable window onto a single +row of the view, known as the current row. The row cursor provides the column +values of the current row. The `MoveNext()` method of the cursor advances to +the next row. There is also a `MoveMany(long count)` method, which is +semantically equivalent to calling `MoveNext()` repeatedly, `count` times. + +Note that a row cursor is not thread safe; it should be used in a single +execution thread. However, multiple cursors can be active simultaneously on +the same or different threads. + +### Lazy Computation and Active Columns + +It is common in a data pipeline for a down-stream component to only require a +small subset of the information produced by the pipeline. For example, code +that needs to build a dictionary of all terms used in a particular text column +does not need to iterate over any other columns. Similarly, code to display +the first 100 rows does not need to iterate through all rows. When up-stream +computations are lazy, meaning that they are only performed when needed, these +scenarios execute significantly faster than when the up-stream computation is +eager (always performing all computations). + +The IDataView system enables and encourages components to be lazy in both +column and row directions. + +A row cursor has a set of active columns, determined by arguments passed to +`GetRowCursor`. Generally, the cursor, and any upstream components, will only +perform computation or data movement necessary to provide values of the active +columns. 
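As a concrete sketch of this, consider a consumer that needs only a single column. The snippet below uses the interfaces described in this document; the column name and the exact shape of the `GetRowCursor` overload taking a column predicate are illustrative assumptions rather than a definitive API.

```csharp
// Sketch only: cursor over a view with just the "Label" column active.
// Names and signatures here are illustrative; adapt to the actual API surface.
static double SumLabels(IDataView data)
{
    int labelCol;
    if (!data.Schema.TryGetColumnIndex("Label", out labelCol))
        throw new ArgumentException("Expected a column named Label.");

    double sum = 0;
    // Only the Label column is active; upstream computation for other columns
    // can be skipped entirely.
    using (IRowCursor cursor = data.GetRowCursor(col => col == labelCol))
    {
        // Fetch the getter once; it is reused for every row.
        ValueGetter<float> labelGetter = cursor.GetGetter<float>(labelCol);
        float label = 0;
        while (cursor.MoveNext())
        {
            labelGetter(ref label);
            sum += label;
        }
    }
    return sum;
}
```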
For example, when `TermTransform` builds its term dictionary from its +input `IDataView`, it gets a row cursor from the input view with only the term +column active. Any data loading or computation not required to materialize the +term column is avoided. This is lazy computation in the column direction. + +Generally, creating a row cursor is a very cheap operation. The expense is in +the data movement and computation required to iterate over the rows. If a +cursor is used to iterate over a small subset of the input rows, then +generally, only computation and data movement needed to materialize the +requested rows is performed. This is lazy computation in the row direction. + +### Immutability and Repeatability + +Cursoring through data does not modify input data in any way. The root data is +immutable, and the operations performed to materialize derived data are +repeatable. In particular, the values produced by two cursors constructed from +the same view with the same arguments to `GetRowCursor` will be identical. + +Immutability and repeatability enable transparent caching. For example, when a +learning algorithm or other component requires multiple passes over an +IDataView pipeline that includes non-trivial computation, performance may be +enhanced by either caching to memory or caching to disk. Immutability and +repeatability ensure that inserting caching is transparent to the learning +algorithm. + +Immutability also ensures that execution of a composed data pipeline graph is +safe for parallelism. Without the guarantee of immutability, nodes in a data +flow graph can produce side effects that are visible to other non-dependent +nodes. A system where multiple transforms worked by mutating data would be +impossible to predict or reason about, short of the gross inefficiency of +cloning of the source data to ensure consistency. + +The IDataView system's immutability guarantees enable flexible scheduling +without the need to clone data. + +### Batch Parallel Cursoring + +The `GetRowCursor` method on `IDataView` includes options to allow or +encourage parallel execution. If the view is a transform that can benefit from +parallelism, it requests from its input view, not just a cursor, but a cursor +set. If that view is a transform, it typically requests from its input view a +cursor set, etc., on up the transformation chain. At some point in the chain +(perhaps at a loader), a component, called the splitter, determines how many +cursors should be active, creates those cursors, and returns them together +with a consolidator object. At the other end, the consolidator is invoked to +marshal the multiple cursors back into a single cursor. Intervening levels +simply create a cursor on each input cursor, return that set of cursors as +well as the consolidator. + +The ML.NET code base includes transform base classes that implement the +minimal amount of code required to support this batch parallel cursoring +design. Consequently, most transform implementations do not have any special +code to support batch parallel cursoring. + +### Memory Efficiency + +Cursoring is inherently efficient from a memory allocation perspective. +Executing `MoveNext()` requires no memory allocation. Retrieving primitive +column values from a cursor also requires no memory allocation. To retrieve +vector column values from a cursor, the caller can optionally provide buffers +into which the values should be copied. When the provided buffers are +sufficiently large, no additional memory allocation is required. 
When the +buffers are not provided or are too small, the cursor allocates buffers of +sufficient size to hold the values. This cooperative buffer sharing protocol +eliminates the need to allocate separate buffers for each row. To avoid any +allocation while iterating, client code only need allocate sufficiently large +buffers up front, outside the iteration loop. + +Note that IDataView allows algorithms that need to materialize data in memory +to do so. Nothing in the system prevents a component from cursoring through +the source data and building a complete in-memory representation of the +information needed, subject, of course, to available memory. + +### Data Size + +For large data scenarios, it is critical that the pipeline support efficient +multiple pass "streaming" from disk. IDataView naturally supports streaming +via cursoring through views. Typically, the root of a view is a loader that +pulls information from a file or other data source. We have implemented both +binary .idv and text-based loaders and savers. New loaders and savers can be +added at any time. + +Note that when the data is small, and repeated passes over the data are +needed, the operating system disk cache transparently enhances performance. +Further, when the data is known to fit in memory, caching, as described above, +provides even better performance. + +### Randomization + +Some training algorithms benefit from randomizing the order of rows produced +by a cursor. An `IDataView` indicates via a property whether it supports +shuffling. If it does, a random number generator passed to its `GetRowCursor` +method indicates shuffling should happen, with seed information pulled from +the random number generator. Serving rows from disk in a random order is quite +difficult to do efficiently (without seeking for each row). The binary .idv +loader has some shuffling support, favoring performance over attempting to +provide a uniform distribution over the permutation space. This level of +support has been validated to be sufficient for machine learning goals (e.g., +in recent work on SA-SDCA algorithm). When the data is all in memory, as it is +when cached, randomizing is trivial. + +## Appendix: Comparison with LINQ + +This section is intended for developers familiar with the .Net +`IEnumerable` interface and the LINQ technologies. + +The `IDataView` interface is, in some sense, similar to `IEnumerable`, and +the IDataView system is similar to the LINQ eco-system. The comparisons below +refer to the `IDataView` and `IEnumerable` interfaces as the core +interfaces of their respective worlds. + +In both worlds, there is a cursoring interface associated with the core +interface. In the IEnumerable world, the cursoring interface is +`IEnumerator`. In the IDataView world, the cursoring interface is +`IRowCursor`. + +Both cursoring interfaces have `MoveNext()` methods for forward-only iteration +through the elements. + +Both cursoring interfaces provide access to information about the current +item. For the IEnumerable world, the access is through the `Current` property +of the enumerator. Note that when `T` is a class type, this suggests that each +item served requires memory allocation. In the IDataView world, there is no +single object that represents the current row. Instead, the values of the +current row are directly accessible via methods on the cursor. This avoids +memory allocation for each row. + +In both worlds, the item type information is carried by both the core +interface and the cursoring interface. 
In the IEnumerable world, this type +information is part of the .Net type, while in the IDataView world, the type +information is much richer and contained in the schema, rather than in the +.Net type. + +In both worlds, many different classes implement the core interface. In the +IEnumerable world, developers explicitly write some of these classes, but many +more implementing classes are automatically generated by the C# compiler, and +returned from methods written using the C# iterator functionality (`yield +return`). In the IDataView world, developers explicitly write all of the +implementing classes, including all loaders and transforms. Unfortunately, +there is no equivalent `yield return` magic. + +In both worlds, multiple cursors can be created and used. + +In both worlds, computation is naturally lazy in the row direction. In the +IEnumerable world, laziness in the column direction would correspond to the +returned `Current` value of type `T` lazily computing some of its properties. + +In both worlds, streaming from disk is naturally supported. + +Neither world supports indexed item access, nor a guarantee that the number of +items is available without iterating and counting. diff --git a/docs/code/IDataViewImplementation.md b/docs/code/IDataViewImplementation.md new file mode 100644 index 0000000000..63fe48b64d --- /dev/null +++ b/docs/code/IDataViewImplementation.md @@ -0,0 +1,518 @@ +# `IDataView` Implementation + +This document is intended as an essay on the best practices for `IDataView` +implementations. As a prerequisite, we suppose that someone has read, and +mostly understood, the following documents: + +* [Design principles](IDataViewDesignPrinciples.md) and +* [Type system](IDataViewTypeSystem.md). + +and has also read and understood the code documentation for the `IDataView` +and its attendant interfaces. Given that background, we will expand on best +practices and common patterns that go into a successful implementation of +`IDataView`, and motivate them with real examples, and historical learnings. + +Put another way: There are now within the ML.NET codebase many implementations +of `IDataView` and many others in other related code bases that interface with +ML.NET. The corresponding PRs and discussions have resulted in the +accumulation of some information, stuff that is not and perhaps should not be +covered in the specification or XML code documentation, but that is +nonetheless quite valuable to know. That is, not the `IDataView` spec itself, +but many of the logical implications of that spec. + +We will here start with the idioms and practices for `IDataView` generally, +before launching into specific *types* of data views: right now there are two +types of data views that have risen to the dignity of being "general": loaders +and transforms. (There are many "specific" non-general data views: "array" +data views, cache data views, join data views, data views for taking other +abstractions for representing data and phrasing it in a way our code can +understand, but these do not follow any more general pattern as loaders and +transforms do.) + +# Urgency in Adhering to Invariants + +The point of `IDataView` is that it enables composable data pipelines. But +what does that composability, practically, entail? + +There are many implementations of `IDataView` and `IDataTransform` in the +ML.NET codebase. There are, further, many instances of `ITrainer` that consume +those data views. 
There are more implementations of these currently outside of this codebase, totaling some hundreds. Astonishingly, they all actually work well together. The reason why so many transforms can work well with so many different data views as potential inputs, chained in arbitrary and strange ways we can hardly imagine, and feed well into so many instances of `ITrainer`, is not, of course, because we wrote code to accommodate the Cartesian product of all possible inputs, but merely because we assume that any given implementation of `IDataView` obeys the invariants and principles it must.

This is a general principle of software engineering, or indeed any engineering: it is nearly impossible to build any complex system of multiple parts unless those subcomponents adhere to whatever specifications they're supposed to, and fulfill their requirements.

We can to some extent tolerate divergence from the invariants in *some* components, if they are isolated: we have some losses that behave strangely, even trainers that behave somewhat strangely. Yet `IDataView` is the center of our data pipeline, and divergences there are potentially far more harmful. For every requirement listed here, there is actually *something* relying on it.

The inverse is also true: not only must `IDataView` implementations conform to the invariants, but code that consumes `IDataView` should be robust to situations other than the "happy path." It needn't succeed, but it should at least be able to detect that data is not in the expected form and fail with an error message telling the user how they misused it.

To give the most common example of what I have seen in PRs: often one designs a transform or learner whose anticipated usage is that it will be used in conjunction with another transform "upstream" to prepare the data. (Again, this is very common: a `KeyToVector` transform, for example, assumes there's *something* upstream producing key values.) What happens sometimes is that people forget to check that the input data actually *does* conform to that, with the result that if a pipeline were composed in some other fashion, there would be some error.

The only thing you can really assume is that an `IDataView` behaves "sanely" according to the contracts of the `IDataView` interface, so that future ML.NET developers can form some reasonable expectations of how your code behaves, and also have a prayer of knowing how to maintain the code. It is hard enough to write software correctly even when the code you're working with actually does what it is supposed to, and impossible when it doesn't. Anyway, not to belabor the point: hidden, undocumented, implicit requirements on the usage of a component are a liability; make your requirements explicit, and check them.

# Design Decisions

Presumably you are motivated to read this document because you have some problem of how to get some data into ML.NET, or process data using ML.NET, or something along these lines. There is a decision to be made about how to even engineer a solution. Sometimes it's quite obvious: text featurization obviously belongs as a transform. But other cases are *less* obvious. We will talk here about how we think about these things.

One crucial question is whether something should be a data view at all: often there is ambiguity. To give some examples of previously contentious points: should clustering be a *transform* or a *trainer*? What about PCA? What about LDA? In the end, we decided clustering was a *trainer* and both PCA and LDA are *transforms*, but this decision was hardly unambiguous.
Indeed, what purpose is served by considering trainers and transforms fundamentally different things at all?

Even once we decide whether something *should* be an `IDataView` of some sort, the question remains what type of data view. We have some canonical types of data views:

If it involves taking data from a stream, like a file, or some sort of stream of data from a network, or other such thing, we might consider this a *loader*, that is, it should perhaps implement `IDataLoader`.

If it involves taking a *single* data view, and transmuting it in some fashion, **and** the intent is that this same transmutation might be applied to novel data, then it should perhaps implement `IDataTransform`, and be a transform.

Now then, consider that not everything should be a loader or a transform, even when data could be considered to be read from a stream, or when there is a data view based on another single data view. The essential purpose of loaders and transforms is that they can exist as part of the data model, that is, they should be serializable and applicable to new data. A nice rule of thumb is: if, when designing something, you can imagine a scenario where you want to apply the same logic to *both* a training set and a test set, then it might make sense to make it a loader or a transform. If not, it probably does not make sense. Some concrete examples:

1. Often data comes from some programmatic source, as a starting point for an ML.NET pipeline. Despite being at the head of the data pipe, it is *not* a loader, because the data source is not a stream (though it is stream*ing*): it is a `RowSetDataView`.

2. During training, data is sometimes cached. The structure that handles the data caching is a `CacheDataView`. It is absolutely not a transform, despite taking a single input and being itself an `IDataView`. There is no reason to make it a transform, because there is no plausible rationale to make it part of the data model: the decision of whether you want to cache data during *training* has nothing at all to do with whether you want to cache data during *scoring*, so there is no point in saving it to the data model.

3. The ML.NET API for prediction uses a scheme that phrases input data programmatically as coming from an enumerable of typed objects: the underlying programmatic `IDataView` that is constructed to wrap this is *not* a loader, because it is not part of the data model. It is merely the entry point to the data model, at least, in typical usage.

# Why `GetGetter`?

Let us address something fairly conspicuous, the question almost everyone asks when they first start using `IDataView`: what is up with these getters?

One does not fetch values directly from an `IRow` implementation (including `IRowCursor`). Rather, one retains a delegate that can be used to fetch values, through the `GetGetter<TValue>` method on `IRow`. This delegate is:

```csharp
public delegate void ValueGetter<TValue>(ref TValue value);
```

If you are unfamiliar with delegates, [read this](https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/delegates/). Anyway: you open a row cursor, you get the delegate through this `GetGetter<TValue>` method, and you use this delegate multiple times to fetch the actual column values as you `MoveNext` through the cursor.

Some history to motivate this: in the first version of `IDataView` the `IRowCursor` implementation did not actually have these "getters" but rather had a method, `GetColumnValue<TValue>(int col, ref TValue val)`.
However, this has the following problems:

* **Every** call had to verify that the column was active,
* **Every** call had to verify that `TValue` was of the right type,
* When these were part of, say, a transform in a chain (as they often are, considering how commonly transforms are used by ML.NET's users), each access would be accompanied by a virtual method call to the upstream cursor's `GetColumnValue<TValue>`.

In contrast, consider the situation with these getter delegates. The verification of whether the column is active happens *exactly* once. The verification of types happens *exactly* once. Rather than *every* access being passed up through a chain of dozens of transform cursors, you merely get a getter from whatever cursor is serving it up, and do every access directly without having to pass through umpteen virtual method calls (each, naturally, accompanied by its own checks!). With these preliminaries done, a getter, when called on each iteration, merely has to fill in the value: all the verification work has already been taken care of. The practical result of this is that, for some workloads where the getters merely amounted to assigning values, the "getter" method became an order of magnitude faster. So: we got rid of the `GetColumnValue` method, and now work with `GetGetter`.

# Repeatability

A single `IDataView` instance should be considered a consistent view onto data: if you open multiple cursors on the same `IDataView` instance and access values for the same columns, they should serve up consistent data. It is probably obvious what this means, but specifically:

The cursor as returned through `GetRowCursor` (with perhaps an identically constructed `IRandom` instance) in any iteration should return the same number of rows on all calls, and with the same values at each row.

Why is this important? Many machine learning algorithms require multiple passes over the dataset. Most stochastic methods wouldn't really care if the data changed, but others are *very* sensitive to changes in the data. For example, how could an L-BFGS or OWL-QN algorithm effectively compute its approximation to a Hessian, if the examples from which the per-pass history is computed were not consistent? How could a dual algorithm like SDCA function with any accuracy, if the examples associated with any given dual variable were to change? Consider even a relatively simple transform, like a forward-looking windowed averager, or anything relating to time series. The implementations of those `ICursor` interfaces often open *two* cursors on the underlying `IDataView`, one "look ahead" cursor used to gather and calculate necessary statistics, and another cursor for serving up the actual data: how could the column constructed out of that transform be meaningful if the look-ahead cursor were consuming different data from the contemporaneous cursor? There are many examples of this throughout the codebase.

Nevertheless: in very specific circumstances we have relaxed this. For example, some ML.NET API code serves up corrupt `IDataView` implementations that have their underlying data change, since reconstituting a data pipeline on fresh data is at the present moment too resource intensive. Nonetheless, this is wrong: for example, the `TrainingCursorBase` and related subclasses rely upon the data not changing. Since, however, those are used for *training* and the prediction engines of the API are used for *scoring*, we accept these.
+However this is not, strictly speaking, correct, and this sort of corruption +of `IDataView` should only be considered as a last resort, and only when some +great good can be accomplished through this. We certainly did not accept this +corruption lightly! + +# Norms for the Data Model + +In a similar vein for repeatability and consistency is the notion of the data +model. Unlike repeatability, this topic is a bit specialized: `IDataView` +specifically is not serializable, but both `IDataLoader` and `IDataTransform` +are serializable. Nonetheless those are the two most important types of data +views, so we will treat on them here. + +From a user's perspective, when they run ML.NET and specify a loader or set of +transforms, what they are doing is composing a data pipe. For example, perhaps +they specify a way to load data from, say, a text file, apply some +normalization, some categorical handling, some text, some this, some that, +some everything, and it all just works, and is consistent whether we're +applying that to the training data on which the transforms were defined, or +some other test set, whether we programmatically load the model in the API and +apply it to some production setting, whether we are running in a distributed +environment and want to make sure *all* worker nodes are featurizing data in +exactly the same way, etc. etc. + +The way in which this consistency is accomplished is by having certain +requirements on the essential parts of the data model: loaders and transforms. +The essential reason these things exist is so that they can be applied to new +data in a consistent way. + +Let us formalize this somewhat. We consider two data views to be functionally +identical if there is absolutely no way to distinguish them: they return the +same values, have the same types, same number of rows, they shuffle +identically given identically constructed `IRandom` when row cursors are +constructed, return the same ID for rows from the ID getter, etc. Obviously +this concept is transitive. (Of course, `Batch` in a cursor might be different +between the two, but that is the case even with two cursors constructed on the +same data view.) So some rules: + +1. If you have an `IDataLoader`, then saving/loading the associated data model + on the same data should result in a functionally identical `IDataLoader`. + +2. If you have an `IDataTransform`, then saving/loading the associated data + model for the transforms on functionally identical `IDataView`s, should + itself result in functionally identical `IDataView`s. + +## Versioning + +This requirement for consistency of a data model often has implications across +versions of ML.NET, and our requirements for data model backwards +compatibility. As time has passed, we often feel like it would make sense if a +transform behaved *differently*, that is, if it organized or calculated its +output in a different way than it currently does. For example, suppose we +wanted to switch the hash transform to something a bit more efficient than +murmur hashes, for example. If we did so, presumably the same input values +would map to different outputs. We are free to do so, of course, yet: when we +deserialize a hash transform from before we made this change, that hash +transform should continue to output values as it did, before we made that +change. (This, of course, assuming that the transform was released as part of +a "blessed" non-preview point release of ML.NET. 
We can, and have, broken +backwards compatibility for something that has not yet been incorporated in +any sort of blessed release, though we prefer to not.) + +## What is Not Functionally Identical + +Note that identically *constructed* data views are not necessarily +*functionally* identical. Consider this usage of the train and score transform +with `xf=trainScore{tr=ap}`, where we first train averaged perceptron, then +copy its score and probability columns out of the way, then construct the +same basic transform again. + +```maml +maml.exe showdata saver=md seed=1 data=breast-cancer.txt xf=trainScore{tr=ap} + xf=copy{col=ScoreA:Score col=ProbA:Probability} xf=trainScore{tr=ap} +``` + +The result is this. + +Label | Features | PredictedLabel | Score | Probability | ScoreA | ProbA +------|------------------------------|----------------|--------|--------------|--------|------- +0 | 5, 1, 1, 1, 2, 1, 3, 1, 1 | 0 | -62.07 | 0.0117 | -75.28 | 0.0107 +0 | 5, 4, 4, 5, 7, 10, 3, 2, 1 | 1 | 88.41 | 0.8173 | 92.04 | 0.8349 +0 | 3, 1, 1, 1, 2, 2, 3, 1, 1 | 0 | -40.53 | 0.0269 | -44.23 | 0.0329 +0 | 6, 8, 8, 1, 3, 4, 3, 7, 1 | 1 | 201.21 | 0.9973 | 208.07 | 0.9972 +0 | 4, 1, 1, 3, 2, 1, 3, 1, 1 | 0 | -43.11 | 0.0243 | -55.32 | 0.0221 +1 | 8, 10, 10, 8, 7, 10, 9, 7, 1 | 1 | 259.22 | 0.9997 | 257.43 | 0.9995 +0 | 1, 1, 1, 1, 2, 10, 3, 1, 1 | 1 | 71.10 | 0.6933 | 89.52 | 0.8218 +0 | 2, 1, 2, 1, 2, 1, 3, 1, 1 | 0 | -38.94 | 0.0286 | -39.59 | 0.0388 +0 | 2, 1, 1, 1, 2, 1, 1, 1, 5 | 0 | -32.87 | 0.0360 | -41.52 | 0.0362 +0 | 4, 2, 1, 1, 2, 1, 2, 1, 1 | 0 | -31.76 | 0.0376 | -41.68 | 0.0360 + +One could argue it's not *really* identically constructed, exactly, since both +of those transforms (including the underlying averaged perceptron learner!) +are initialized using the pseudo-random number generator in an `IHost` that +changes from one to another. But, that's a bit nit-picky. + +Note also: when we say functionally identical we include everything about it: +not just the data, but the schema, its metadata, the implementation of +shuffling, etc. For this reason, while serializing the data *model* has +guarantees of consistency, serializing the *data* has no such guarantee: if +you serialize data using the text saver, practically all metadata (except slot +names) will be completely lost, which can have implications on how some +transforms and downstream processes work. Or: if you serialize data using the +binary saver, suddenly it may become shufflable whereas it may not have been +before. + +The inevitable caveat to all this stuff about "consistency" is that it is +ultimately limited by hardware and other runtime environment factors: the +truth is, certain machines will, with identical programs with seemingly +identical flows of execution result, *sometimes*, in subtly different answers +where floating point values are concerned. Even on the same machine there are +runtime considerations, e.g., when .NET's RyuJIT was introduced in VS2015, we +had lots of test failures around our model consistency tests because the JIT +was compiling the CLI just *slightly* differently. But, this sort of thing +aside (which we can hardly help), we expect the models to be the same. + +# On Loaders, Data Models, and Empty `IMultiStreamSource`s + +When you create a loader you have the option of specifying not only *one* data +input, but any number of data input files, including zero. 
But there's also a more general principle at work here with zero files: when deserializing a data loader from a data model with an `IMultiStreamSource` with `Count == 0` (e.g., as would be constructed with `new MultiFileSource(null)`), the protocol is that *every* `IDataLoader` should still work in that circumstance, serving up a data view with no rows, but with the same schema as it had when it was serialized. The purpose of this is that we often have circumstances where we need to understand the schema of the data (what columns were produced, what the feature names are, etc.) when all we have is the data model. (E.g., the `savemodel` command, and other things.)

# Getters Must Fail for Invalid Types

For a given `IRow`, we must expect that `GetGetter<TValue>(col)` will throw if either `IsColumnActive(col)` is `false`, or `typeof(TValue) != Schema.GetColumnType(col).RawType`, as indicated in the code documentation. But why? It might seem reasonable to add seemingly "harmless" flexibility to this interface. So let's imagine your type should be `float`, because the corresponding column's type's `RawType` is `typeof(float)`. Now: if you *happen* to call `GetGetter<double>(col)` instead of `GetGetter<float>(col)`, it would actually be a fairly easy matter for `GetGetter<double>` to accommodate it, by doing the necessary conversions under the hood, and *not* fail. This type of thinking is actually insidiously and massively harmful to the codebase, as I will remark.

The danger of writing code is that there's a chance someone might find it useful. Imagine a consumer of your data view actually relies on your "tolerance." What that means, of course, is that this consuming code cannot function effectively on any *other* data view. The consuming code is by definition *buggy*: it is requesting data of a type we've explicitly claimed, through the schema, that we do not support. And the developer, through a well intentioned but misguided design decision, has allowed buggy code to pass a test it should have failed, thus making the codebase more fragile, whereas if we had simply maintained the requirement, the bug would have been detected.

Moreover: it is a solution to a problem that does not exist. `IDataView`s are fundamentally composable structures already, and one of the most fundamental operations you can do is transform columns into different types. So, there is no need for you to do the conversion yourself. Indeed, it is harmful for you to try: if we have the conversion capability in one place, including the logic of what can be converted and *how* these things are to be converted, is it reasonable to suppose we should have it in *every* implementation of `IDataView`? Certainly not. At best the situation would be needless complexity in the code; more realistically it would lead to inconsistency, and from inconsistency, surprises and bugs for users and developers.

# Thread Safety

Any `IDataView` implementation, as well as the `ISchema`, *must* be thread safe. There is a lot of code that depends on this. For example, cross validation works by operating over the same dataset (just, of course, filtered to different subsets of the data). That amounts to multiple cursors being opened, simultaneously, over the same data.

So: `IDataView` and `ISchema` must be thread safe. However, `IRowCursor`, being a stateful object, is assumed to be accessed from exactly one thread at a time.
The `IRowCursor`s returned through `GetRowCursorSet`, however, are a different matter: while each individual cursor must still be accessed by only a single thread at a time, the cursors in the set can be used from multiple threads simultaneously; that's why we have that method in the first place.

# Exceptions and Errors

There is one non-obvious implication of the lazy evaluation while cursoring over an `IDataView`: while cursoring, you should almost certainly not throw exceptions.

Imagine you have a `TextLoader`. You might expect that if you have a parse error, e.g., you have a column of floats, and one of the rows has a value like `"hi!"` or something otherwise uninterpretable, you would throw. Yet, consider the implications of lazy evaluation. If that column were not selected, the cursoring would *succeed*, because it would not look at that `"hi!"` token *at all*, much less detect that it was not parsable as a float.

If we were to throw, the effect is that *sometimes* the cursoring will succeed (if the column is not selected), and *sometimes* it will fail (if the column is selected). These failures are explainable, ultimately, of course, in the sense that anything is explainable, but a user knows nothing about lazy evaluation or anything like this: correspondingly this is enormously confusing.

The implication is that we should not throw an exception in this case. We instead consider this value "missing," and we *may* register a warning using `IChannel.Warning`, but we cannot fail.

So: if the error condition could be detected on *any* cursoring over your `IDataView`, regardless of which columns are active, you can throw. If, however, detecting the condition on which you could throw the exception requires that a certain column be made active, then you should not throw. Of course, there are extreme circumstances: for example, one cannot help but throw during a cursoring if, say, there is some weird system event, and if one somehow detects in a subsequent iteration that something is fundamentally broken then you can throw: e.g., the binary loader will throw if it detects that the file it is reading is corrupted, even if that corruption was not obvious immediately.

# `GetGetter` Returning the Same Delegate

On a single instance of `IRowCursor`, since each `IRowCursor` instance has no requirement to be thread safe, it is entirely legal for a call to `GetGetter` on a single column to simply return the same getter delegate every time. It has come to pass that the majority of implementations of `IRowCursor` actually do that, since it is in some ways easier to write the code that way.

This practice has inadvertently enabled a fairly attractive tool for analysis of data pipelines: by returning the same delegate each time, we can check what data is simply being passed through a pipeline by seeing whether the references to the getter delegates are passed through unchanged. Now this is imperfect, because some transforms that could use the same delegate each time do not, but the vast majority do.

# Class Structuring

The essential attendant classes of an `IDataView` are its schema, as returned through the `Schema` property, as well as the `IRowCursor` implementation(s), as returned through the `GetRowCursor` and `GetRowCursorSet` methods. The implementations of those two interfaces are typically nested within the `IDataView` implementation itself. The cursor implementation is almost always at the bottom of the data view class.

# `IRow` and `ICursor` vs. `IRowCursor`

We have `IRowCursor`, which descends from both `IRow` and `ICursor`.
Why do these other interfaces exist?

Firstly, there are implementations of `IRow` or `ICursor` that are not `IRowCursor`s. We have occasionally found it useful to have something resembling a key-value store, but one that is strongly, dynamically typed in some fashion. Why not simply represent this using the same idioms as `IDataView`? So we put them in an `IRow`. Similarly: we have several things that behave *like* cursors, but that are in no way *row* cursors.

However, more than that, there are a number of utility functions where we want to operate over something like an `IRowCursor`, but we want some indication that the function will not move the cursor (in which case `IRow` is helpful), or that it will not access any values (in which case `ICursor` is helpful).

# Schema

The schema contains information about the columns. As we see in [the design principles](IDataViewDesignPrinciples.md), each column has a name, an index, a data type, and optional metadata.

While *programmatic* accesses to an `IDataView` are by index, from a user's perspective columns are identified by name; most training algorithms conceptually train on the `Features` column (under default settings). For this reason, nearly all usages of an `IDataView` will be prefixed with a call to the schema's `TryGetColumnIndex`.

Regarding name hiding, the principles mention that when multiple columns have the same name, all but one of them are "hidden." The convention all implementations of `ISchema` obey is that the visible column, that is, the one whose index `TryGetColumnIndex` returns, is the one with the *largest* index. Note, however, that this is merely convention, not part of the definition of `ISchema`.

Implementations of `TryGetColumnIndex` should be O(1), that is, practically, this mapping ought to be backed with a dictionary in most cases. (There are obvious exceptions, like `LineLoader`, which produces exactly one column. There, a simple equality test suffices.)

It is best if `GetColumnType` returns the *same* object every time. That is, things like key types and vector types, when returned, should not be created in the function itself (thereby creating a new object every time), but rather stored somewhere and returned.

## Metadata

Since metadata is *optional*, one is not obligated to produce it, or to conform to any particular schemas for any particular kinds (beyond, say, the obvious things like making sure that the types and values are consistent). However, the flip side of that freedom given to *producers* is that *consumers* are obligated, when processing a data view input, to react gracefully when metadata of a certain kind is absent, or not in a form that one expects. One should *never* fail when input metadata is in a form one does not expect.

To give a practical example of this: many transforms, learners, or other components that process `IDataView`s will do something with the slot names, but when the `SlotNames` metadata kind for a given column is either absent, *or* not of the right type (a vector of text values), *or* not of the right size (a vector with the same length as the column's vector size), the behavior is not to throw or yield errors or do anything of the kind, but to simply say, "oh, I don't really have slot names," and proceed as if the slot names hadn't been present at all.
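As a rough sketch of that defensive pattern, the helper below only returns slot names when the metadata is present, of the expected type, and of the expected length. The metadata accessors (`GetMetadataTypeOrNull`, `GetMetadata`) and the `MetadataUtils.Kinds.SlotNames` constant are assumptions about the schema API, not a definitive surface; the point is the shape of the checks.

```csharp
// Sketch only: read SlotNames metadata defensively. If it is absent, of the
// wrong type, or of the wrong length, behave as though there are no slot names.
static bool TryGetSlotNames(ISchema schema, int col, ref VBuffer<ReadOnlyMemory<char>> slotNames)
{
    var colType = schema.GetColumnType(col);
    var metaType = schema.GetMetadataTypeOrNull(MetadataUtils.Kinds.SlotNames, col);
    if (metaType == null || !metaType.IsVector || !metaType.ItemType.IsText
        || !colType.IsVector || metaType.VectorSize != colType.VectorSize)
    {
        // Proceed as if the slot names had not been present at all.
        return false;
    }
    schema.GetMetadata(MetadataUtils.Kinds.SlotNames, col, ref slotNames);
    return true;
}
```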
\ No newline at end of file diff --git a/docs/code/IDataViewTypeSystem.md b/docs/code/IDataViewTypeSystem.md new file mode 100644 index 0000000000..c152a667cf --- /dev/null +++ b/docs/code/IDataViewTypeSystem.md @@ -0,0 +1,843 @@ +# `IDataView` Type System + +## Overview + +The *IDataView system* consists of a set of interfaces and classes that +provide efficient, compositional transformation of and cursoring through +schematized data, as required by many machine-learning and data analysis +applications. It is designed to gracefully and efficiently handle both +extremely high dimensional data and very large data sets. It does not directly +address distributed data, but is suitable for single node processing of data +partitions belonging to larger distributed data sets. + +While `IDataView` is one interface in this system, colloquially, the term +IDataView is frequently used to refer to the entire system. In this document, +the specific interface is written using fixed pitch font as `IDataView`. + +IDataView is the data pipeline machinery for ML.NET. The ML.NET codebase has +an extensive library of IDataView related components (loaders, transforms, +savers, trainers, predictors, etc.). More are being worked on. + +The name IDataView was inspired from the database world, where the term table +typically indicates a mutable body of data, while a view is the result of a +query on one or more tables or views, and is generally immutable. Note that +both tables and views are schematized, being organized into typed columns and +rows conforming to the column types. Views differ from tables in several ways: + +* Views are immutable; tables are mutable. + +* Views are composable -- new views can be formed by applying transformations + (queries) to other views. Forming a new table from an existing table + involves copying data, making them decoupled—the new table is not linked to + the original table in any way. + +* Views are virtual; tables are fully realized/persisted. + +Note that immutability and compositionality are critical enablers of +technologies that require reasoning over transformation, like query +optimization and remoting. Immutability is also key for concurrency and thread +safety. + +This document includes a very brief introduction to some of the basic concepts +of IDataView, but then focuses primarily on the IDataView type system. + +Why does IDataView need a special type system? The .NET type system is not +well suited to machine-learning and data analysis needs. For example, while +one could argue that `typeof(double[])` indicates a vector of double values, +it explicitly does not include the dimensionality of the vector/array. +Similarly, there is no good way to indicate a subset of an integer type, for +example integers from 1 to 100, as a .NET type. In short, there is no +reasonable way to encode complete range and dimensionality information in a +`System.Type`. + +In addition, a well-defined type system, including complete specification of +standard data types and conversions, enables separately authored components to +seamlessly work together without surprises. + +### Basic Concepts + +`IDataView`, in the narrow sense, is an interface implemented by many +components. At a high level, it is analogous to the .Net interface +`IEnumerable`, with some very significant differences. + +While `IEnumerable` is a sequence of objects of type `T`, `IDataView` is a +sequence of rows. 
An `IDataView` object has an associated `ISchema` object +that defines the `IDataView`'s columns, including their names, types, indices, +and associated metadata. Each row of the `IDataView` has a value for each +column defined by the schema. + +Just as `IEnumerable` has an associated enumerator interface, namely +`IEnumerator`, `IDataView` has an associated cursor interface, namely +`IRowCursor`. In the enumerable world, an enumerator object implements a +Current property that returns the current value of the iteration as an object +of type `T`. In the IDataView world, an `IRowCursor` object encapsulates the +current row of the iteration. There is no separate object that represents the +current row. Instead, the cursor implements methods that provide the values of +the current row, when requested. Additionally, the methods that serve up +values do not require memory allocation on each invocation, but use sharable +buffers. This scheme significantly reduces the memory allocations needed to +cursor through data. + +Both `IDataView` and `IEnumerable` present a read-only view on data, in the +sense that a sequence presented by each is not directly mutable. +"Modifications" to the sequence are accomplished by additional operators or +transforms applied to the sequence, so do not modify any underlying data. For +example, to normalize a numeric column in an `IDataView` object, a +normalization transform is applied to the sequence to form a new `IDataView` +object representing the composition. In the new view, the normalized values +are contained in a new column. Often, the new column has the same name as the +original source column and "replaces" the source column in the new view. +Columns that are not involved in the transformation are simply "passed +through" from the source `IDataView` to the new one. + +Detailed specifications of the `IDataView`, `ISchema`, and `IRowCursor` +interfaces are in other documents. + +### Column Types + +Each column in an `IDataView` has an associated column type. The collection of +column types is open, in the sense that new code can introduce new column +types without requiring modification of all `IDataView` related components. +While introducing new types is possible, we expect it will also be relatively +rare. + +All column type implementations derive from the abstract class `ColumnType`. +Primitive column types are those whose implementation derives from the +abstract class `PrimitiveType`, which derives from `ColumnType`. + +### Representation Type + +A column type has an associated .Net type, known as its representation type or +raw type. + +Note that a column type often contains much more information than the +associated .Net representation type. Moreover, many distinct column types can +use the same representation type. Consequently, code should not assume that a +particular .Net type implies a particular column type. + +### Standard Column Types + +There is a set of predefined standard column types, divided into standard +primitive types and vector types. Note that there can be types that are +neither primitive nor vector types. These types are not standard types and may +require extra care when handling them. For example, a `PictureType` value +might require disposing when it is no longer needed. + +Standard primitive types include the text type, the boolean type, numeric +types, and key types. Numeric types are further split into floating-point +types, signed integer types, and unsigned integer types. 
+ +A vector type has an associated item type that must be a primitive type, but +need not be a standard primitive type. Note that vector types are not +primitive types, so vectors of vectors are not supported. Note also that +vectors are homogeneous—all elements are of the same type. In addition to its +item type, a vector type contains dimensionality information. At the basic +level, this dimensionality information indicates the length of the vector +type. A length of zero means that the vector type is variable length, that is, +different values may have different lengths. Additional detail of vector types +is in a subsequent section. Vector types are instances of the sealed class +`VectorType`, which derives from `ColumnType`. + +This document uses convenient shorthand for standard types: + +* `TX`: text + +* `BL`: boolean + +* `R4`, `R8`: single and double precision floating-point + +* `I1`, `I2`, `I4`, `I8`: signed integer types with the indicated number of + bytes + +* `U1`, `U2`, `U4`, `U8`: unsigned integer types with the indicated number of + bytes + +* `UG`: unsigned type with 16-bytes, typically used as a unique ID + +* `TS`: timespan, a period of time + +* `DT`: datetime, a date and time but no timezone + +* `DZ`: datetime zone, a date and time with a timezone + +* `U4[100-199]`: A key type based on `U4` representing legal values from 100 + to 199, inclusive + +* `V`: A vector type with item type `R4` and dimensionality + information [3,2] + +See the sections on the specific types for more detail. + +The IDataView system includes many standard conversions between standard +primitive types. A later section contains a full specification of these +conversions. + +### Default Value + +Each column type has an associated default value corresponding to the default +value of its representation type, as defined by the .Net (C# and CLR) +specifications. + +The standard conversions map source default values to destination default +values. For example, the standard conversion from `TX` to `R8` maps the empty +text value to the value zero. Note that the empty text value is distinct from +the missing text value, as discussed next. + +### Missing Value + +Most of the standard primitive types support the notion of a missing value. In +particular, the text type, floating-point types, signed integer types, and key +types all have an internal representation of missing. We follow R's lead and +denote such values as `NA`. + +Unlike R, the standard primitive types do not distinguish between missing and +invalid. For example, in floating-point arithmetic, computing zero divided by +zero, or infinity minus infinity, produces an invalid value known as a `NaN` +(for Not-a-Number). R uses a specific `NaN` value to represent its `NA` value, +with all other `NaN` values indicating invalid. The IDataView standard +floating-point types do not distinguish between the various `NaN` values, +treating them all as missing/invalid. + +A standard conversion from a source type with `NA` to a destination type with +`NA` maps `NA` to `NA`. A standard conversion from a source type with `NA` to +a destination type without `NA` maps `NA` to the default value of the +destination type. For example, converting a text `NA` value to `R4` produces a +`NaN`, but converting a text `NA` to `U4` results in zero. Note that this +specification does not address diagnostic user messages, so, in certain +environments, the latter situation may generate a warning to the user. 
+ +Note that a vector type does not support a representation of missing, but may +contain `NA` values of its item type. Generally, there is no standard +mechanism faster than O(N) for determining whether a vector with N items +contains any missing values. + +For further details on missing value representations, see the sections +detailing the particular standard primitive types. + +### Vector Representations + +Values of a vector type may be represented either sparsely or densely. A +vector type does not mandate denseness or sparsity, nor does it imply that one +is favored over the other. A sparse representation is semantically equivalent +to a dense representation having the suppressed entries filled in with the +*default* value of the item type. Note that the values of the suppressed +entries are emphatically *not* the missing/`NA` value of the item type, unless +the missing and default values are identical, as they are for key types. + +### Metadata + +A column in an `ISchema` can have additional column-wide information, known as +metadata. For each string value, known as a metadata kind, a column may have a +value associated with that metadata kind. The value also has an associated +type, which is a compatible column type. + +For example: + +* A column may indicate that it is normalized, by providing a `BL` valued + piece of metadata named `IsNormalized`. + +* A column whose type is `V`, meaning a vector of length 17 whose items + are single-precision floating-point values, might have `SlotNames` metadata + of type `V`, meaning a vector of length 17 whose items are text. + +* A column produced by a scorer may have several pieces of associated + metadata, indicating the "scoring column group id" that it belongs to, what + kind of scorer produced the column (e.g., binary classification), and the + precise semantics of the column (e.g., predicted label, raw score, + probability). + +The `ISchema` interface, including the metadata API, is fully specified in +another document. + +## Text Type + +The text type, denoted by the shorthand `TX`, represents text values. The +`TextType` class derives from `PrimitiveType` and has a single instance, +exposed as `TextType.Instance`. The representation type of `TX` is an +immutable struct known as `DvText`. A `DvText` value represents a sequence of +characters whose length is contained in its `Length` field. The missing/`NA` +value has a `Length` of -1, while all other values have a non-negative +`Length`. The default value has a `Length` of zero and represents an empty +sequence of characters. + +In text processing transformations, it is very common to split text into +pieces. A key advantage of using `DvText` instead of `System.String` for text +values is that these splits require no memory allocation—the derived `DvText` +references the same underlying `System.String` as the original `DvText` does. +Another reason that `System.String` is not ideal for text is that we want the +default value to be empty and not `NA`. For `System.String`, the default value +is null, which would be a more natural representation for `NA` than for empty +text. By using a custom struct wrapper around a portion (or span) of a +`System.String`, we address both the memory efficiency and default value +problems. + +## Boolean Type + +The standard boolean type, denoted by the shorthand `BL`, represents +true/false values. The `BooleanType` class derives from `PrimitiveType` and +has a single instance, exposed as `BooleanType.Instance`. 
The representation +type of `BL` is the `DvBool` enumeration type, logically stored as `sbyte`: + +`DvBool` | `sbyte` Value +--------:|:------------- +`NA` | -128 +`False` | 0 +`True` | 1 + +The default value of `BL` is `DvBool.False` and the `NA` value of `BL` is +`DvBool.NA`. Note that the underlying type of the `DvBool` `enum` is signed +byte and the default and `NA` values of `BL` align with the default and `NA` +values of `I1`. + +There is a standard conversion from `TX` to `BL`. There are standard +conversions from `BL` to all signed integer and floating point numeric types, +with `DvBool.False` mapping to zero, `DvBool.True` mapping to one, and +`DvBool.NA` mapping to `NA`. + +## Number Types + +The standard number types are all instances of the sealed class NumberType, +which is derived from PrimitiveType. There are two standard floating-point +types, four standard signed integer types, and four standard unsigned integer +types. Each of these is represented by a single instance of NumberType and +there are static properties of NumberType to access each instance. For +example, to test whether a variable type represents `I4`, use the C# code +`type == NumberType.I4`. + +Floating-point arithmetic has a well-deserved reputation for being +troublesome. This is primarily because it is imprecise, in the sense that the +result of most operations must be rounded to the nearest representable value. +This rounding means, among other side effects, that floating-point addition +and multiplication are not associate, nor satisfy the distributive property. + +However, in many ways, floating-point arithmetic is the best-suited system for +arithmetic computation. For example, the IEEE 754 specification mandates +precise graceful overflow behavior—as results grow, they lose resolution in +the least significant digits, and eventually overflow to a special infinite +value. In contrast, when integer arithmetic overflows, the result is a non- +sense value. Trapping and handling integer overflow is expensive, both in +runtime and development costs. + +The IDataView system supports integer numeric types mostly for data +interchange convenience, but we strongly discourage performing arithmetic on +those values without first converting to floating-point. + +### Floating-point Types + +The floating-point types, `R4` and `R8`, have representation types +`System.Single` and `System.Double`. Their default values are zero. Any `NaN` +is considered an `NA` value, with the specific `Single.NaN` and `Double.NaN` +values being the canonical `NA` values. + +There are standard conversions from each floating-point type to the other +floating-point type. There are also standard conversions from text to each +floating-point type and from each integer type to each floating-point type. + +### Signed Integer Types + +The signed integer types, `I1`, `I2`, `I4`, and `I8`, have representation +types Sytem.SByte, `System.Int16`, `System.Int32`, and `System.Int64`. The +default value of each of these is zero. Each of these has a non-zero value +that is its own additive inverse, namely `(-2)^^{8n-1}`, where `n` is the +number of bytes in the representation type. This is the minimum value of each +of these types. We follow R's lead and use these values as the `NA` values. + +There are standard conversions from each signed integer type to every other +signed integer type. There are also standard conversions from text to each +signed integer type and from each signed integer type to each floating-point +type. 
+ +Note that we have not defined standard conversions from floating-point types +to signed integer types. + +### Unsigned Integer Types + +The unsigned integer types, `U1`, `U2`, `U4`, and `U8`, have representation +types Sytem.Byte, `System.UInt16`, `System.UInt32`, and `System.UInt64`, +respectively. The default value of each of these is zero. These types do not +have an `NA` value. + +There are standard conversions from each unsigned integer type to every other +unsigned integer type. There are also standard conversions from text to each +unsigned integer type and from each unsigned integer type to each floating- +point type. + +Note that we have not defined standard conversions from floating-point types +to unsigned integer types, or between signed integer types and unsigned +integer types. + +## Key Types + +Key types are used for data that is represented numerically, but where the +order and/or magnitude of the values is not semantically meaningful. For +example, hash values, social security numbers, and the index of a term in a +dictionary are all best modeled with a key type. + +The representation type of a key type, also called its underlying type, must +be one of the standard four .Net unsigned integer types. The `NA` and default +values of a key type are the same value, namely the representational value +zero. + +Key types are instances of the sealed class `KeyType`, which derives from +`PrimitiveType`. + +In addition to its underlying type, a key type specifies: + +* A count value, between `0` and `int.MaxValue`, inclusive + +* A "minimum" value, between `0` and `ulong.MaxValue`, inclusive + +* A Boolean value indicating whether the values of the key type are contiguous + +Regardless of the minimum and count values, the representational value zero +always means `NA` and the representational value one is always the first valid +value of the key type. + +Notes: + +* The `Count` property returns the count of the key type. This is of type + `int`, but is required to be non-negative. When `Count` is zero, the key + type has no known or useful maximum value. Otherwise, the legal + representation values are from one up to and including `Count`. The `Count` + is required to be representable in the underlying type, so, for example, the + `Count` value of a key type based on `System.Byte` must not exceed `255`. As + an example of the usefulness of the `Count` property, consider the + `KeyToVector` transform implemented as part of ML.NET. It maps from a key + type value to an indicator vector. The length of the vector is the `Count` + of the key type, which is required to be positive. For a key value of `k`, + with `1 ≤ k ≤ Count`, the resulting vector has a value of one in the + (`k-1`)th slot, and zero in all other slots. An `NA` value (with + representation zero) is mapped to the all- zero vector of length `Count`. + +* For a key type with positive `Count`, a representation value should be + between `0` and `Count`, inclusive, with `0` meaning `NA`. When processing + values from an untrusted source, it is best to guard against values bigger + than `Count` and treat such values as equivalent to `NA`. + +* The `Min` property returns the minimum semantic value of the key type. This + is used exclusively for transforming from a representation value, where the + valid values start at one, to user facing values, which might start at any + non-negative value. The most common values for `Min` are zero and one. 
+ +* The boolean `Contiguous` property indicates whether values of the key type + are generally contiguous in the sense that a complete sampling of + representation values of the key type would cover most, if not all, values + from one up to their max. A `true` value indicates that using an array to + implement a map from the key type values is a reasonable choice. When + `false`, it is likely more prudent to use a hash table. + +* A key type can be non-`Contiguous` only if `Count` is zero. The converse + however is not true. A key type that is contiguous but has `Count` equal to + zero is one where there is a reasonably small maximum, but that maximum is + unknown. In this case, an array might be a good choice for a map from the + key type. + +* The shorthand for a key type with representation type `U1`, and semantic + values from `1000` to `1099`, inclusive, is `U1[1000-1099]`. Note that the + `Min` value of this key type is outside the range of the underlying type, + `System.Byte`, but the `Count` value is only `100`, which is representable + in a `System.Byte`. Recall that the representation values always start at 1 + and extend up to `Count`, in this case `100`. + +* For a key type with representation type `System.UInt32` and semantic values + starting at `1000`, with no known maximum, the shorthand is `U4[1000-*]`. + +There are standard conversions from text to each key type. This conversion +parses the text as a standard non-negative integer value and honors the `Min` +and `Count` values of the key type. If a parsed numeric value falls outside +the range indicated by `Min` and `Count`, or if the text is not parsable as a +non-negative integer, the result is `NA`. + +There are standard conversions from one key type to another, provided: + +* The source and destination key types have the same `Min` and `Count` values. + +* Either the number of bytes in the destination's underlying type is greater + than the number of bytes in the source's underlying type, or the `Count` + value is positive. In the latter case, the `Count` is necessarily less than + 2k, where k is the number of bits in the destination type's underlying type. + For example, `U1[1-*]` can be converted to `U2[1-*]`, but `U2[1-*]` cannot + be converted to `U1[1-*]`. Also, `U1[1-100]` and `U2[1-100]` can be + converted in both directions. + +## Vector Types + +### Introduction + +Vector types are one of the key innovations of the IDataView system and are +critical for high dimensional machine-learning applications. + +For example, when processing text, it is common to hash all or parts of the +text and encode the resulting hash values, first as a key type, then as +indicator or bag vectors using the `KeyToVector` transform. Using a `k`-bit +hash produces a key type with `Count` equal to `2^^k`, and vectors of the same +length. It is common to use `20` or more hash bits, producing vectors of +length a million or more. The vectors are typically very sparse. In systems +that do not support vector-valued columns, each of these million or more +values is placed in a separate (sparse) column, leading to a massive explosion +of the column space. Most tabular systems are not designed to scale to +millions of columns, and the user experience also suffers when displaying such +data. Moreover, since the vectors are very sparse, placing each value in its +own column means that, when a row is being processed, each of those sparse +columns must be queried or scanned for its current value. 
Effectively the +sparse matrix of values has been needlessly transposed. This is very +inefficient when there are just a few (often one) non-zero entries among the +column values. Vector types solve these issues. + +A vector type is an instance of the sealed `VectorType` class, which derives +from `ColumnType`. The vector type contains its `ItemType`, which must be a +`PrimitiveType`, and its dimensionality information. The dimensionality +information consists of one or more non-negative integer values. The +`VectorSize` is the product of the dimensions. A dimension value of zero means +that the true value of that dimension can vary from value to value. + +For example, tokenizing a text by splitting it into multiple terms generates a +vector of text of varying/unknown length. The result type shorthand is +`V`. Hashing this using `6` bits then produces the vector type +`V`. Applying the `KeyToVector` transform then produces the vector +type `V`. Each of these vector types has a `VectorSize` of zero, +indicating that the total number of slots varies, but the latter still has +potentially useful dimensionality information: the vector slots are +partitioned into an unknown number of runs of consecutive slots each of length +`64`. + +As another example, consider an image data set. The data starts with a `TX` +column containing URLs for images. Applying an `ImageLoader` transform +generates a column of a custom (non-standard) type, `Picture<*,*,4>`, where +the asterisks indicate that the picture dimensions are unknown. The last +dimension of `4` indicates that there are four channels in each pixel: the +three color components, plus the alpha channel. Applying an `ImageResizer` +transform scales and crops the images to a specified size, for example, +`100x100`, producing a type of `Picture<100,100,4>`. Finally, applying a +`ImagePixelExtractor` transform (and specifying that the alpha channel should +be dropped), produces the vector type `V`. In this example, the +`ImagePixelExtractor` re-organized the color information into separate planes, +and divided each pixel value by 256 to get pixel values between zero and one. + +### Equivalence + +Note that two vector types are equivalent when they have equivalent item types +and have identical dimensionality information. To test for compatibility, +instead of equivalence, in the sense that the total `VectorSize` should be the +same, use the `SameSizeAndItem` method instead of the Equals method (see the +`ColumnType` code below). + +### Representation Type + +The representation type of a vector type is the struct `VBuffer`, where `T` +is the representation type of the item type. For example, the representation +type of `V` is `VBuffer`. When the vector type's `VectorSize` +is positive, each value of the type will have length equal to the +`VectorSize`. + +The struct `VBuffer`, sketched below, provides both dense and sparse +representations and encourages cooperative buffer sharing. A complete +discussion of `VBuffer` and associated coding idioms is in another +document. + +Notes: + +* `VBuffer` contains four public readonly fields: `Length`, `Count`, + `Values`, and `Indices`. + +* `Length` is the logical length of the vector, and must be non-negative. + +* `Count` is the number of items explicitly represented in the vector. `Count` + is non-negative and less than or equal to Length. + +* When `Count` is equal to Length, the vector is dense. Otherwise, the vector + is sparse. + +* The `Values` array contains the explicitly represented item values. 
The + length of the `Values` array is at least `Count`, but not necessarily equal + to `Count`. Only the first `Count` items in `Values` are part of the vector; + any remaining items are garbage and should be ignored. Note that when + `Count` is zero, `Values` may be null. + +* The `Indices` array is only relevant when the vector is sparse. In the + sparse case, `Indices` is parallel to `Values`, only the first `Count` items + are meaningful, the indices must be non-negative and less than `Length`, and + the indices must be strictly increasing. Note that when `Count` is zero, + `Indices` may be null. In the dense case, `Indices` is not meaningful and + may or may not be null. + +* It is very common for the arrays in a `VBuffer` to be larger than needed + for their current value. A special case of this is when a dense `VBuffer` + has a non-null `Indices` array. The extra items in the arrays are not + meaningful and should be ignored. Allowing these buffers to be larger than + currently needed reduces the need to reallocate buffers for different + values. For example, when cursoring through a vector valued column with + `VectorSize` of 100, client code could pre-allocate values and indices + arrays and seed a `VBuffer` with those arrays. When fetching values, the + client code passes the `VBuffer` by reference. The called code can re-use + those arrays, filling them with the current values. + +* Generally, vectors should use a sparse representation only when the number + of non-default items is at most half the value of Length. However, this + guideline is not a mandate. + +See the full `IDataView` technical specification for additional details on +`VBuffer`, including complete discussion of programming idioms, and +information on helper classes for building and manipulating vectors. + +## Standard Conversions + +The `IDataView` system includes the definition and implementation of many +standard conversions. Standard conversions are required to map source default +values to destination default values. When both the source type and +destination type have an `NA` value, the conversion must map `NA` to `NA`. +When the source type has an `NA` value, but the destination type does not, the +conversion must map `NA` to the default value of the destination type. + +Most standard conversions are implemented by the singleton class `Conversions` +in the namespace `Microsoft.MachineLearning.Data.Conversion`. The standard +conversions are exposed by the `ConvertTransform`. + +### From Text + +There are standard conversions from `TX` to the standard primitive types, +`R4`, `R8`, `I1`, `I2`, `I4`, `I8`, `U1`, `U2`, `U4`, `U8`, and `BL`. For non- +empty, non-missing `TX` values, these conversions use standard parsing of +floating-point and integer values. For `BL`, the mapping is case insensitive, +maps text values `{ true, yes, t, y, 1, +1, + }` to `DvBool.True`, and maps +the values `{ false, no, f, n, 0, -1, - }` to `DvBool.False`. + +If parsing fails, the result is the `NA` value for floating-point, signed +integer types, and boolean, and zero for unsigned integer types. Note that +overflow of an integer type is considered failure of parsing, so produces an +`NA` (or zero for unsigned). These conversions map missing/`NA` text to `NA`, +for floating-point and signed integer types, and to zero for unsigned integer +types. + +These conversions are required to map empty text (the default value of `TX`) +to the default value of the destination, which is zero for all numeric types +and DvBool.False for `BL`. 
This may seem unfortunate at first glance, but +leads to some nice invariants. For example, when loading a text file with +sparse row specifications, it's desirable for the result to be the same +whether the row is first processed entirely as `TX` values, then parsed, or +processed directly into numeric values, that is, parsing as the row is +processed. In the latter case, it is simple to map implicit items (suppressed +due to sparsity) to zero. In the former case, these items are first mapped to +the empty text value. To get the same result, we need empty text to map to +zero. + +### Floating Point + +There are standard conversions from `R4` to `R8` and from `R8` to `R4`. These +are the standard IEEE 754 conversions (using unbiased round-to-nearest in the +case of `R8` to `R4`). + +### Signed Integer + +There are standard conversions from each signed integer type to each other +signed integer type. These conversions map `NA` to `NA`, map any other numeric +value that fits in the destination type to the corresponding value, and maps +any numeric value that does not fit in the destination type to `NA`. For +example, when mapping from `I1` to `I2`, the source `NA` value, namely 0x80, +is mapped to the destination `NA` value, namely 0x8000, and all other numeric +values are mapped as expected. When mapping from `I2` to `I1`, any value that +is too large in magnitude to fit in `I1`, such as 312, is mapped to `NA`, +namely 0x80. + +### Signed Integer to Floating Point + +There are standard conversions from each signed integer type to each floating- +point type. These conversions map `NA` to `NA`, and map all other values +according to the IEEE 754 specification using unbiased round-to-nearest. + +### Unsigned Integer + +There are standard conversions from each unsigned integer type to each other +unsigned integer type. These conversions map any numeric value that fits in +the destination type to the corresponding value, and maps any numeric value +that does not fit in the destination type to zero. For example, when mapping +from `U2` to `U1`, any value that is too large in magnitude to fit in `U1`, +such as 312, is mapped to zero. + +### Unsigned Integer to Floating Point + +There are standard conversions from each unsigned integer type to each +floating-point type. These conversions map all values according to the IEEE +754 specification using unbiased round-to-nearest. + +### Key Types + +There are standard conversions from one key type to another, provided: + +* The source and destination key types have the same `Min` and `Count` values. + +* Either the number of bytes in the destination's underlying type is greater + than the number of bytes in the source's underlying type, or the `Count` + value is positive. In the latter case, the `Count` is necessarily less than + `2^^k`, where `k` is the number of bits in the destination type's underlying + type. For example, `U1[1-*]` can be converted to `U2[1-*]`, but `U2[1-*]` + cannot be converted to `U1[1-*]`. Also, `U1[1-100]` and `U2[1-100]` can be + converted in both directions. + +The conversion maps source representation values to the corresponding +destination representation values. There are no special cases, because of the +requirements above. + +### Boolean to Numeric + +There are standard conversions from `BL` to each of the signed integer and +floating point numeric. These map `DvBool.True` to one, `DvBool.False` to +zero, and `DvBool.NA` to the numeric type's `NA` value. 
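To make the text-parsing rules concrete, here is a small illustrative sketch
of the `TX` to `I4` conversion semantics described in the From Text section
above. It is not the actual `Conversions` implementation, and it uses `string`
(with `null` standing in for the missing text value) in place of `DvText`.

```csharp
// Illustrative sketch of the TX -> I4 conversion semantics; not the actual
// Conversions code. NA for I4 is the minimum value, int.MinValue.
static int ConvertTextToI4(string text)
{
    const int NA = int.MinValue;
    if (text == null)       // missing text maps to NA
        return NA;
    if (text.Length == 0)   // empty (default) text maps to the default value, zero
        return 0;
    long parsed;
    // Parse failure, or overflow of the destination type, produces NA. Note that
    // int.MinValue itself is excluded, since it is the NA representation.
    if (!long.TryParse(text, out parsed) || parsed > int.MaxValue || parsed < -int.MaxValue)
        return NA;
    return (int)parsed;
}
```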
+ +## Type Classes + +This chapter contains information on the C# classes used to represent column +types. Since the IDataView type system is extensible this list describes only +the core data types. + +### `ColumnType` Abstract Class + +The IDataView system includes the abstract class `ColumnType`. This is the +base class for all column types. `ColumnType` has several convenience +properties that simplify testing for common patterns. For example, the +`IsVector` property indicates whether the `ColumnType` is an instance of +`VectorType`. + +In the following notes, the symbol `type` is a variable of type `ColumnType`. + +* The `type.RawType` property indicates the representation type of the column + type. Its use should generally be restricted to constructing generic type + and method instantiations. In particular, testing whether `type.RawType == + typeof(int)` is not sufficient to test for the standard `U4` type. The + proper test is `type == NumberType.I4`, since there is a single universal + instance of the `I4` type. + +* Certain .Net types have a corresponding `DataKind` `enum` value. The value + of the `type.RawKind` property is consistent with `type.RawType`. For .Net + types that do not have a corresponding `DataKind` value, the `type.RawKind` + property returns zero. The `type.RawKind` property is particularly useful + when switching over raw type possibilities, but only after testing for the + broader kind of the type (key type, numeric type, etc.). + +* The `type.IsVector` property is equivalent to `type is VectorType`. + +* The `type.IsNumber` property is equivalent to `type is NumberType`. + +* The `type.IsText` property is equivalent to `type is TextType`. There is a + single instance of the `TextType`, so this is also equivalent to `type == + TextType.Instance`. + +* The `type.IsBool` property is equivalent to `type is BoolType`. There is a + single instance of the `BoolType`, so this is also equivalent to `type == + BoolType.Instance`. + +* Type `type.IsKey` property is equivalent to `type is KeyType`. + +* If `type` is a key type, then `type.KeyCount` is the same as + `((KeyType)type).Count`. If `type` is not a key type, then `type.KeyCount` + is zero. Note that a key type can have a `Count` value of zero, indicating + that the count is unknown, so `type.KeyCount` being zero does not imply that + `type` is not a key type. In summary, `type.KeyCount` is equivalent to: + `type is KeyType ? ((KeyType)type).Count : 0`. + +* The `type.ItemType` property is the item type of the vector type, if `type` + is a vector type, and is the same as `type` otherwise. For example, to test + for a type that is either `TX` or a vector of `TX`, one can use + `type.ItemType.IsText`. + +* The `type.IsKnownSizeVector` property is equivalent to `type.VectorSize > + 0`. + +* The `type.VectorSize` property is zero if either `type` is not a vector type + or if `type` is a vector type of unknown/variable length. Otherwise, it is + the length of vectors belonging to the type. + +* The `type.ValueCount` property is one if `type` is not a vector type and the + same as `type.VectorSize` if `type` is a vector type. + +* The `Equals` method returns whether the types are semantically equivalent. + Note that for vector types, this requires the dimensionality information to + be identical. + +* The `SameSizeAndItemType` method is the same as `Equals` for non-vector + types. For vector types, it returns true iff the two types have the same + item type and have the same `VectorSize` values. 
For example, for the two + vector types `V` and `V`, `Equals` returns false but + `SameSizeAndItemType` returns true. + +### `PrimitiveType` Abstract Class + +The `PrimitiveType` abstract class derives from `ColumnType` and is the base +class of all primitive type implementations. + +### `TextType` Sealed Class + +The `TextType` sealed class derives from `PrimitiveType` and is a singleton- +class for the standard text type. The instance is exposed by the static +`TextType.Instance` property. + +### `BooleanType` Sealed Class + +The `BooleanType` sealed class derives from `PrimitiveType` and is a +singleton-class for the standard boolean type. The instance is exposed by the +static `BooleanType.Instance` property. + +### `NumberType` Sealed Class + +The `NumberType` sealed class derives from `PrimitiveType` and exposes single +instances of each of the standard numeric types, `R4`, `R8`, `I1`, `I2`, `I4`, +`I8`, `U1`, `U2`, `U4`, `U8`, and `UG`. + +### `DateTimeType` Sealed Class + +The `DateTimeType` sealed class derives from `PrimitiveType` and is a +singleton-class for the standard datetime type. The instance is exposed by the +static `DateTimeType.Instance` property. + +### `DateTimeZoneType` Sealed Class + +The `DateTimeZoneType` sealed class derives from `PrimitiveType` and is a +singleton-class for the standard datetime timezone type. The instance is +exposed by the static `DateTimeType.Instance` property. + +### `TimeSpanType` Sealed Class + +The `TimeSpanType` sealed class derives from `PrimitiveType` and is a +singleton-class for the standard datetime timezone type. The instance is +exposed by the static `TimeSpanType.Instance` property. + +### `KeyType` Sealed Class + +The `KeyType` sealed class derives from `PrimitiveType` and instances +represent key types. + +Notes: + +* Two key types are considered equal iff their kind, min, count, and + contiguous values are the same. + +* The static `IsValidDataKind` method returns true iff kind is `U1`, `U2`, + `U4`, or `U8`. These are the only valid underlying data kinds for key types. + +* The inherited `KeyCount` property returns the same value as the `Count` + property. + +### `VectorType` Sealed Class + +The `VectorType` sealed class derives from `ColumnType` and instances +represent vector types. The item type is specified as the first parameter to +each constructor and the dimension information is inferred from the additional +parameters. + +* The `DimCount` property indicates the number of dimensions and the `GetDim` + method returns a particular dimension value. All dimension values are non- + negative integers. A dimension value of zero indicates unknown (or variable) + in that dimension. + +* The `VectorSize` property returns the product of the dimensions. + +* The `IsSubtypeOf(VectorType other)` method returns true if this is a subtype + of `other`, in the sense that they have the same item type, and either have + the same `VectorSize` or `other.VectorSize` is zero. + +* The inherited `Equals` method returns true if the two types have the same + item type and the same dimension information. + +* The inherited `SameSizeAndItemType(ColumnType other)` method returns true if + `other` is a vector type with the same item type and the same `VectorSize` + value. 
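Putting several of the properties described in this chapter together, code
that consumes a schema typically tests for the types it can handle along these
lines. This is a sketch only; `type` is assumed to be a `ColumnType` obtained
from some schema.

```csharp
// A sketch of common type tests using the ColumnType properties described above.
if (type == NumberType.R4)
{
    // A scalar single-precision floating-point column.
}
else if (type.IsVector && type.ItemType == NumberType.R4)
{
    // A vector of R4. VectorSize is positive for known-size vectors, zero otherwise.
    int size = type.VectorSize;
}
else if (type.IsKey)
{
    // KeyCount is the key type's Count, or zero if the count is unknown.
    int count = type.KeyCount;
}
else if (type.ItemType.IsText)
{
    // Either a TX column or a vector of TX.
}
```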
diff --git a/docs/code/IdvFileFormat.md b/docs/code/IdvFileFormat.md new file mode 100644 index 0000000000..4009eed726 --- /dev/null +++ b/docs/code/IdvFileFormat.md @@ -0,0 +1,191 @@ +# IDV File Format + +This document describes ML.NET's Binary dataview file format, version 1.1.1.5 +written by the `BinarySaver` and `BinaryLoader` classes, commonly known as the +`.idv` format. + +## Goal of the Format + +A dataview is a collection of columns, over some number of rows. (Do not +confuse column with features. Columns can be and often are vector valued, and +it is expected though not required that commonly all features will be together +in one vector valued column.) + +The actual values are stored in blocks. A block holds values for a single +column across multiple rows. Block format is dictated by a codec. There is a +table-of-contents and lookup table to facilitate quasi-random access to +particular blocks. (Quasi in the sense that you can only seek to a block, not +to a particular within a block.) + +## General Data Format + +Before we discuss the format itself we will establish some conventions on how +individual scalar values, strings, and other data is serialized. All basic +pieces of data (e.g., a single number, or a single string) are encoded in ways +reflecting the semantics of the .NET `BinaryWriter` class, those semantics +being: + +* All numbers are stored as little-endian, using their natural fix-length + binary encoding. + +* Strings are stored using an unsigned + [LEB128](https://en.wikipedia.org/wiki/LEB128) number describing the number + of bytes, followed by that many bytes containing the UTF-8 encoded string. + +A note about this: LEB128 is a simple encoding to encode arbitrarily large +integers. Each byte of 8-bits follows this convention. The most significant +bit is 0 if and only if this is the end of the LEB128 encoding. The remaining +7 bits are a part of the number being encoded. The bytes are stored +little-endian, that is, the first byte holds the 7 least significant bits, the +second byte (if applicable) holds the next 7 least significant bits, etc., and +the last byte holds the 7 most significant bits. LEB128 is used one or two +places in this format. (I might tend to prefer use of LEB128 in places where +we are writing values that, on balance, we expect to be relatively small, and +only in cases where there is no potential for benefit for random access to the +associated stream, since LEB128 is incompatible with random access. However, +this is not formulated into anything approaching a definite policy.) + +## Header + +Every binary instances stream has a header composed of 256 bytes, at the start +of the stream. Not all bytes are used. Those bytes that are not explicitly +used have undefined content, and can have anything in them. We strongly +encourage writers of this format to insert obscene messages in this dead +space. The content is defined as follows (the offsets being the start of that +column). + +Offsets | Type | Name and Description +--------|-------|--------------------- +0 | ulong | **Signature**: The magic number of this file. +8 | ulong | **Version**: Indicates the version of the data file. +16 | ulong | **CompatibleVersion**: Indicates the minimum reader version that can interpret this file, possibly with some data loss. +24 | long | **TableOfContentsOffset**: The offset to the column table of contents structure. +32 | long | **TailOffset**: The eight-byte tail signature starts at this offset. 
So, the entire dataset stream should be considered to have byte length of eight plus this value. +40 | long | **RowCount**: The number of rows in this data file. +48 | int | **ColumnCount**: The number of columns in this data file. + +Notes on these: + +* The signature of this file is `0x00425644004C4D43`, which is, when written + little-endian to a file, `CML DVB ` with null characters in the place of + spaces. These letters are intended to suggest "CloudML DataView Binary." + +* The tail signature is the byte-reversed version of this, that is, + `0x434D4C0044564200`. + +* Versions are encoded as four 16-bit unsigned numbers passed into a single + ulong, with higher order bits being a more major version. The first + supported version of the is 1.1.1.4, that is, `0x0001000100010004`. + (Versions prior to 1.1.1.4 did exist, but were not released, so we do not + support them, though we do describe them in this document for the sake of + completeness.) + +## Table of Contents Format + +The table of contents are packed entries, with there being as many entries as +there are columns. The version field here indicates the versions where that +entry is written. ≥ indicates the field occurred in versions after and +including that version, = indicates the field occurs only in that version. + +Description | Entry Type | Version +------------|------------|-------- +Column name | string | ≥1.1.1.1 +Codec loadname | string | ≥1.1.1.1 +Codec parameterization length | LEB128 integer | ≥1.1.1.1 +Codec parameterization, which must have precisely the length indicated above | arbitrary, but with specified length | ≥1.1.1.1 +Compression kind | CompressionKind (byte) | ≥1.1.1.1 +Rows per block in this column | LEB128 integer | ≥1.1.1.1 +Lookup table offset | long | ≥1.1.1.1 +Slot names offset, or 0 if this column has no slot names, if 1.1.1.2 behave as if there are no slot names, with this having value 0) | long | =1.1.1.3 +Slot names byte size (present only if slot names offset is greater than 0) | long | =1.1.1.3 +Slot names count (present only if slot names offset is greater than 0) | int | =1.1.1.3 +Metadata table of contents offset, or 0 if there is no metadata (1.1.1.4) | long | ≥1.1.1.4 + +For those working in the ML.NET codebase: The three `Codec` fields are handled +by the `CodecFactory.WriteCodec/TryReadCodec` methods, with the definition +stream being at the start of the codec loadname, and being at the end of the +codec parameterization, both in the case of success or failure. + +CompressionCodec enums are described below, and describe the compression +algorithm used to compress blocks. + +### Compression Kind + +The enum for compression kind is one byte, and follows this scheme: + +Compression Kind | Code +---------------------------------------------------------------|----- +None | 0 +DEFLATE (i.e., [RFC1951](http://www.ietf.org/rfc/rfc1951.txt)) | 1 +zlib (i.e., [RFC1950](http://www.ietf.org/rfc/rfc1950.txt)) | 2 + +None means no compression. DEFLATE is the default scheme. There is a tendency +to conflate zlib and DEFLATE, so to be clear: zlib can be (somewhat inexactly) +considered a wrapped version of DEFLATE, but it is still a distinct (but +closely related) format. However, both are implemented by the zlib library, +which is probably the source of the confusion. + +## Metadata Table of Contents Format + +The metadata table of contents begins with a LEB128 integer describing the +number of entries. 
(Should be a positive value, since if a column has no +metadata the expectation is that the offset for the metadata TOC will be +stored as 0.) What follows that are that many packed entries. Each entry is +somewhat akin to the column table of contents entry, with some simplifications +considering that there will be exactly one "block" with one item. + +Description | Entry Type +-------------------------------------------------------|------------ +Metadata kind | string +Codec loadname | string +Codec parameterization length | LEB128 integer +Codec parameterization, which must have precisely the length indicated above | arbitrary, but with specified length +Compression kind | CompressionKind(byte) +Offset of the block where the metadata item is written | long +Byte length of the block | LEB128 integer + +The "block" written is written in exactly same format as the main content +blocks. This will be very slightly inefficient as that scheme is sometimes +written to accommodate many entries, but I don't expect that to be much of a +burden. + +## Lookup Table Format + +Each table of contents entry is associated with a lookup table starting at the +indicated lookup table offset. It is written as packed binary, with each +lookup entry consisting of 16 bytes. So in all, the lookup table takes 16 +bytes, times the total number of blocks for this column. + +Description | Entry Type +----------------------------------------------------------|----------- +Block offset, position in the file where the block starts | long +Block length, its size in bytes in the file | int +Uncompressed block length, its size in bytes if the block bytes were decompressed according to the column's compression codec | int + +## Slot Names + +If slot names are stored, they are stored as pairs of integer index/string +pairs. As many pairs are stored as count of slot names were present in the +table of contents entry. Note that this only appeared in version 1.1.1.3. With +1.1.1.4 and later, slot names were just considered yet another piece of +metadata. + +Description | Entry Type +------------------|----------- +Index of the slot | int +The slot name | string + +## Block Format + +Columns are ordered into blocks, with each block holding the binary encoded +values for one particular columns across a range of rows. So for example, if +the column's table of contents describes it as having 1000 rows per block, the +first block will contain the values for the column for rows 0 through 999, +second block 1000 through 1999, etc., with all blocks containing the same +number of blocks, except the last block which will contain fewer items (unless +the number of rows just so happens to be a multiple of the block size). + +Each column is a possibly compressed sequence of bytes, compressed according +to the compression type field in the table of contents. It begins and ends at +the offsets indicated in the metadata entry stored in the directory. The +uncompressed bytes will be stored in the format as described by the codec. diff --git a/docs/code/KeyValues.md b/docs/code/KeyValues.md new file mode 100644 index 0000000000..ced135761d --- /dev/null +++ b/docs/code/KeyValues.md @@ -0,0 +1,149 @@ +# Key Values + +Most commonly, key-values are used to encode items where it is convenient or +efficient to represent values using numbers, but you want to maintain the +logical "idea" that these numbers are keys indexing some underlying, implicit +set of values, in a way more explicit than simply mapping to a number would +allow you to do. 
+ +A more formal description of key values and types is +[here](IDataViewTypeSystem.md#key-types). *This* document's motivation is less +to describe what key types and values are, and more to instead describe why +key types are necessary and helpful things to have. Necessarily, this document, +is more anecdotal in its descriptions to motivate its content. + +Let's take a few examples of transforms that produce keys: + +* The `TermTransform` forms a dictionary of unique observed values to a key. + The key type's count indicates the number of items in the set, and through + the `KeyValue` metadata "remembers" what each key is representing. + +* The `HashTransform` performs a hash of input values, and produces a key + value with count equal to the range of the hash function, which, if a b bit + hash was used, will produce a 2ᵇ hash. + +* The `CharTokenizeTransform` will take input strings and produce key values + representing the characters observed in the string. + +## Keys as Intermediate Values + +Explicitly invoking transforms that produce key values, and using those key +values, is sometimes helpful. However, given that most trainers expect the +feature vector to be a vector of floating point values and *not* keys, in +typical usage the majority of usages of keys is as some sort of intermediate +value on the way to that final feature vector. (Unless, say, doing something +like preparing labels for a multiclass learner.) + +So why not go directly to the feature vector, and forget this key stuff? +Actually, to take text as the canonical example, we used to. However, by +structuring the transforms from, say, text to key to vector, rather than text +to vector *directly*, we are able to simplify a lot of code on the +implementation side, which is both less for us to maintain, and also for users +gives consistency in behavior. + +So for example, the `CharTokenize` above might appear to be a strange choice: +*why* represent characters as keys? The reason is that the ngram transform is +written to ingest keys, not text, and so we can use the same transform for +both the n-gram featurization of words, as well as n-char grams. + +Now, much of this complexity is hidden from the user: most users will just use +the `text` transform, select some options for n-grams, and chargrams, and not +be aware of these internal invisible keys. Similarly, use the categorical or +categorical hash transforms, without knowing that internally it is just the +term or hash transform followed by a `KeyToVector` transform. But, keys are +still there, and it would be impossible to really understand ML.NET's +featurization pipeline without understanding keys. Any user that wants to +understand how, say, the text transform resulted in a particular featurization +will have to inspect the key values to get that understanding. + +## Keys are not Numbers + +As an actual CLR data type, key values are stored as some form of unsigned +integer (most commonly `uint`). The most common confusion that arises from +this is to ascribe too much importance to the fact that it is a `uint`, and +think these are somehow just numbers. This is incorrect. + +For keys, the concept of order and difference has no inherent, real meaning as +it does for numbers, or at least, the meaning is different and highly domain +dependent. Consider a numeric `U4` type, with values `0`, `1`, and `2`. The +difference between `0` and `1` is `1`, and the difference between `1` and `2` +is `1`, because they're numbers. 
Very well: now consider that you train a term transform over the input tokens
`apple`, `pear`, and `orange`: this will also map to the keys logically
represented as the numbers `0`, `1`, and `2` respectively. Yet for a key, is
the difference between keys `0` and `1`, `1`? No, the difference is `0` maps
to `apple` and `1` to `pear`. Also, order doesn't mean one key is somehow
"larger," it just means we saw one before another -- or something else, if
sorting by value happened to be selected.

Also: ML.NET's vectors can be sparse. Implicit entries in a sparse vector are
assumed to have the `default` value for that type -- that is, implicit values
for numeric types will be zero. But what would the implicit default value for
a key value be? Take the `apple`, `pear`, and `orange` example above -- it
would be inappropriate for the default value to be `0`, because that would
mean every implicit entry is `apple`, which is hardly appropriate. The only
really appropriate "default" choice is that the value is unknown, that is,
missing.

An implication of this is that there is a distinction between the logical
value of a key-value, and the actual physical value of the value in the
underlying type. This will be covered more later.

## As an Enumeration of a Set: `KeyValues` Metadata

While keys can be used for many purposes, they are often used to enumerate
items from some underlying set. In order to map keys back to this original
set, many transforms producing key values will also produce `KeyValues`
metadata associated with that output column.

Valid `KeyValues` metadata is a vector of length equal to the count of the
type of the column. This can be of varying types: it is often text, but does
not need to be. For example, a `term` transform applied to a column would have
`KeyValues` metadata of item type equal to the item type of the input data.

How this metadata is used downstream depends on the purposes of who is
consuming it, but common uses are: in multiclass classification, for
determining the human readable class names, or if used in featurization,
determining the names of the features.

Note that `KeyValues` data is optional, and sometimes is not even sensible.
For example, if we consider a clustering algorithm, the prediction of the
cluster of an example would be a key value. So for example, if there were five
clusters, then the prediction would indicate the cluster by `U4<0-4>`. Yet,
these clusters were found by the algorithm itself, and they have no natural
descriptions.

## Actual Implementation

This may be of use only to writers or extenders of ML.NET, or users of our
API. How key values are presented *logically* to users of ML.NET is distinct
from how they are actually stored *physically* in memory, both in ML.NET
source and through the API. For key values:

* All key values are stored in unsigned integers.
* The missing key value is always stored as `0`. See the note above about the
  default value, to see why this must be so.
* Valid non-missing key values are stored from `1` onwards, irrespective of
  what the key type claims the minimum value is.

So when, in the prior example, the term transform would map `apple`, `pear`,
and `orange` seemingly to `0`, `1`, and `2`, values of `U4<0-2>`, in reality,
if you were to fire up the debugger you would see that they were stored as
`1`, `2`, and `3`, with unrecognized values being mapped to the "default"
missing value of `0`.
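To make the physical encoding concrete, here is a purely hypothetical sketch
of how the `apple`/`pear`/`orange` example maps to stored `uint` values. It is
not how `TermTransform` is implemented; it only illustrates the convention
that `0` is reserved for missing and valid keys are stored starting at `1`.

```csharp
using System;

static class KeyValueIllustration
{
    // Hypothetical term dictionary built over { apple, pear, orange }.
    static readonly string[] Terms = { "apple", "pear", "orange" };

    // Physical (stored) representation: 0 is reserved for missing, valid keys start at 1.
    public static uint ToPhysicalKey(string value)
    {
        int index = Array.IndexOf(Terms, value);
        return index < 0 ? 0u : (uint)(index + 1);
    }

    public static void Main()
    {
        Console.WriteLine(ToPhysicalKey("apple"));   // 1 (logical value 0)
        Console.WriteLine(ToPhysicalKey("orange"));  // 3 (logical value 2)
        Console.WriteLine(ToPhysicalKey("banana"));  // 0, that is, missing
    }
}
```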
+ +Nevertheless, we almost never talk about this, no more than we would talk +about our "strings" really being implemented as string slices: this is purely +an implementation detail, relevant only to people working with key values at +the source level. To a regular non-API user of ML.NET, key values appear +*externally* to be simply values, just as strings appear to be simply strings, +and so forth. + +There is another implication: a hypothetical type `U1<4000-4002>` is actually +a sensible type in this scheme. The `U1` indicates that is stored in one byte, +which would on first glance seem to conflict with values like `4000`, but +remember that the first valid key-value is stored as `1`, and we've identified +the valid range as spanning the three values 4000 through 4002. That is, +`4000` would be represented physically as `1`. + +The reality cannot be seen by any conventional means I am aware of, save for +viewing ML.NET's workings in the debugger or using the API and inspecting +these raw values yourself: that `4000` you would see is really stored as the +`byte` `1`, `4001` as `2`, `4002` as `3`, and a missing value stored as `0`. \ No newline at end of file diff --git a/docs/code/VBufferCareFeeding.md b/docs/code/VBufferCareFeeding.md new file mode 100644 index 0000000000..1de7239dc6 --- /dev/null +++ b/docs/code/VBufferCareFeeding.md @@ -0,0 +1,270 @@ +# `VBuffer` Care and Feeding + +The `VBuffer` is ML.NET's central vector type, used throughout our data +pipeline and many other places to represent vectors of values. For example, +nearly all trainers accept feature vectors as `VBuffer`. + +## Technical `VBuffers` + +A `VBuffer` is a generic type that supports both dense and sparse vectors +over items of type `T`. This is the representation type for all +[`VectorType`](IDataViewTypeSystem.md#vector-representations) instances in the +`IDataView` ecosystem. When an instance of this is passed to a row cursor +getter, the callee is free to take ownership of and re-use the arrays +(`Values` and `Indices`). + +A `VBuffer` is a struct, and has the following `readonly` fields: + +* `int Length`: The logical length of the buffer. + +* `int Count`: The number of items explicitly represented. This equals `Length` +when the representation is dense and is less than `Length` when sparse. + +* `T[] Values`: The values. Only the first `Count` of these are valid. + +* `int[] Indices`: The indices. For a dense representation, this array is not + used, and may be `null`. For a sparse representation it is parallel to + values and specifies the logical indices for the corresponding values. Only + the first `Count` of these are valid. + +`Values` must have length equal to at least `Count`. If the representation is +sparse, that is, `Count < Length`, then `Indices` must have length also +greater than or equal to `Count`. If `Count == 0`, then it is entirely legal +for `Values` or `Indices` to be `null`, and if dense then `Indices` can always +be `null`. + +On the subject of `Count == 0`, note that having no valid values in `Indices` +and `Values` merely means that no values are explicitly defined, and the +vector should be treated, logically, as being filled with `default(T)`. + +For sparse vectors, `Indices` must have length equal to at least `Count`, and +the first `Count` indices must be increasing, with all indices between `0` +inclusive and `Length` exclusive. 
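These invariants are straightforward to state in code. The following checking
sketch is written directly against the fields described above; it is
illustrative only, not an actual helper in the codebase.

```csharp
// A sketch validating the VBuffer<T> invariants described above.
static bool IsValid<T>(VBuffer<T> buffer)
{
    if (buffer.Length < 0 || buffer.Count < 0 || buffer.Count > buffer.Length)
        return false;
    // Values must hold at least Count items (it may be null only when Count is zero).
    if (buffer.Count > 0 && (buffer.Values == null || buffer.Values.Length < buffer.Count))
        return false;
    if (buffer.Count < buffer.Length)
    {
        // Sparse: Indices is parallel to Values, in range, and strictly increasing.
        if (buffer.Count > 0 && (buffer.Indices == null || buffer.Indices.Length < buffer.Count))
            return false;
        for (int i = 0; i < buffer.Count; i++)
        {
            int index = buffer.Indices[i];
            if (index < 0 || index >= buffer.Length)
                return false;
            if (i > 0 && index <= buffer.Indices[i - 1])
                return false;
        }
    }
    return true;
}
```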
Regarding the generic type parameter `T`, the only real assumption made about this type is that assignment (that is, using `=`) is sufficient to create an *independent* copy of that item. All representation types of the [primitive types](IDataViewTypeSystem.md#standard-column-types) have this property (e.g., `DvText`, `DvInt4`, `Single`, `Double`, etc.), but, for example, `VBuffer<>` itself does not have this property. So, no `VBuffer` of `VBuffer`s for you.

## Sparse Values as `default(T)`

Any implicit value in a sparse `VBuffer` **must** logically be treated as though it has value `default(T)`. For example, suppose we have the following two declarations:

```csharp
var a = new VBuffer<float>(5, new float[] { 0, 1, 0, 0, 2 });
var b = new VBuffer<float>(5, 2, new float[] { 1, 2 }, new int[] { 1, 4 });
```

Here, `a` is dense, and `b` is sparse. However, any operations over either must treat the logical indices `0`, `2`, and `3` as if they have value `0.0f`. The two should be equivalent!

ML.NET throughout its codebase assumes in many places that sparse and dense representations are interchangeable: if it is more efficient to consider something sparse or dense, the code will have no qualms about making that conversion. This does mean, though, that we depend upon all code that deals with `VBuffer` responding in the same fashion, and respecting this convention.

As a corollary to the above note about equivalence of sparse and dense representations, since they are equivalent it follows that any code consuming `VBuffer`s must work equally well with *both*. That is, there must never be a condition where data is read and assumed to be either sparse or dense, since implementers of `IDataView` and related interfaces are perfectly free to produce either.

The only "exception" to this rule is a necessary acknowledgment of the reality of floating point mathematics: due to the way the JIT will optimize one code path or another, and due to the fact that floating point math is not associative, operations over sparse `VBuffer<float>` or `VBuffer<double>` vectors can sometimes yield modestly different results than the "same" operation over dense values.

## Why Buffer Reuse

The question is often asked by people new to this codebase: why bother with buffer reuse at all? Without going into too many details, we used to not do so, and we suffered for it. We had a far simpler system where examples were yielded through an [`IEnumerable<>`](https://msdn.microsoft.com/en-us/library/9eekhta0.aspx), and our vector type at the time had `Indices` and `Values` arrays as well, but their sizes were their actual sizes, and being returned through an `IEnumerable<>` there was no plausible way to "recycle" the buffers.

Also: who "owned" a fetched example (the caller, or the callee) was not clear. Because it was not clear, code was inevitably written and checked in that made *either* assumption, which meant, ultimately, that everything that touched these would try to duplicate everything by default, because doing anything else would fail in some case.

The reason this becomes important is that [garbage collection](https://msdn.microsoft.com/en-us/library/0xy59wtx.aspx) in the .NET framework is not free. Creating and destroying these arrays *can* be cheap, provided that they are sufficiently small, short lived, and only ever exist in a single thread.
But violate any of these, and there is a possibility these arrays could be allocated on the large object heap, or promoted to gen-2. The results could be disastrous: in one particularly memorable incident regarding neural net training, the move to `IDataView` and its `VBuffer`s resulted in a more than tenfold decrease in runtime, because under the old regime the garbage collection of the feature vectors was just taking so much time.

This is somewhat unfortunate: a joke-that's-not-really-a-joke on the team was that we were writing C# as though it were C code. Be that as it may, buffer reuse is essential to our performance, especially on larger problems.

This design requirement of buffer reuse has deeper implications for the ecosystem than merely this type. For example, it is one crucial reason why so many value accessors in the `IDataView` ecosystem fill in values passed in through a `ref` parameter, rather than, say, being a return value.

## Buffer Re-use as a User

Let's imagine we have an `IDataView` in a variable `dataview`, and we just so happen to know that the column with index 5 has representation type `VBuffer<float>`. (In real code, this would presumably be achieved through something more complicated involving an inspection of `dataview.Schema`, but we omit such details here.)

```csharp
using (IRowCursor cursor = dataview.GetRowCursor(col => col == 5))
{
    ValueGetter<VBuffer<float>> getter = cursor.GetGetter<VBuffer<float>>(5);
    var value = default(VBuffer<float>);
    while (cursor.MoveNext())
    {
        getter(ref value);
        // Presumably something else is done with value.
    }
}
```

In this example, we open a cursor (telling it to make only column 5 active), then get the "getter" over this column. What enables buffer re-use here is that, as we go row by row over the data with the `while` loop, we pass the same `value` variable in to the `getter` delegate, again and again. Presumably the first time, or first several times, memory is allocated. Initially `value = default(VBuffer<float>)`, that is, it has zero `Length` and `Count` and `null` `Indices` and `Values`. Presumably at some point, probably the first call, `value` is replaced with a `VBuffer<float>` that has actual values allocated. In subsequent calls, perhaps these are judged as insufficiently large, and new arrays are allocated, but we would expect at some point the arrays would become "large enough" to accommodate many values, so reallocations would become increasingly rare.

A common mistake made by first-time users is to do something like move the `var value` declaration inside the `while` loop, thus dooming `getter` to have to allocate the arrays every single time, completely defeating the purpose of buffer reuse.

## Buffer Re-use as a Developer

Nearly all methods in ML.NET that "return" a `VBuffer` do not really return a `VBuffer` *at all*, but instead have a parameter `ref VBuffer<T> dst`, where they are expected to put the result. See the above example, with the `getter`. A `ValueGetter<TValue>` is defined:

```csharp
public delegate void ValueGetter<TValue>(ref TValue value);
```

Let's describe the typical practice of "returning" a `VBuffer` in, say, a `ref` parameter named `dst`: if `dst.Indices` and `dst.Values` are sufficiently large to contain the result, they are used, and the value is calculated, or sometimes copied, into them. If either is insufficiently large, then a new array is allocated in its place. After all the calculation happens, a *new* `VBuffer` is constructed and assigned to `dst`.
(And possibly, if they were large enough, using the same `Indices` and `Values` arrays as were passed in, albeit with different values.)

`VBuffer`s can be either sparse or dense. However, even when returning a dense `VBuffer`, you would not discard the `Indices` array of the passed-in buffer, assuming there was one. The `Indices` array was merely larger than necessary to store *this* result: that you happened to not need it this call does not justify throwing it away. We don't care about buffer re-use just for a single call, after all! The dense constructor for the `VBuffer` accepts an `Indices` array for precisely this reason!

Also note: when you return a `VBuffer` in this fashion, the caller is assumed to *own* it at that point. This means they can do whatever they like to it, like pass the same variable into some other getter, or modify its values. Indeed, this is quite common: normalizers in ML.NET get values from their source, then immediately scale the contents of `Values` appropriately. This would hardly be possible if the callee was considered to have some stake in that result.

There is a corollary on this point: because the caller owns any `VBuffer`, you shouldn't do anything that irrevocably destroys its usefulness to the caller. For example, consider this method that takes a vector `src`, and stores the scaled result in `dst`.

```csharp
VectorUtils.ScaleBy(ref VBuffer<float> src, ref VBuffer<float> dst, float c)
```

What this does is copy the values from `src` to `dst`, scaling each value by `c` as it is copied.

One possible alternate (wrong) implementation of this would be to just say `dst = src`, then scale all the contents of `dst.Values` by `c`. But then `dst` and `src` would share references to their internal arrays, completely compromising the caller's ability to do anything useful with them: if the caller were to pass `dst` into some other method that modified it, this could easily (silently!) modify the contents of `src`. The point is: if you are writing code *anywhere* whose end result is that two distinct `VBuffer` structs share references to their internal arrays, you've almost certainly introduced a **nasty**, pernicious bug for your users.

## Utilities for Working with `VBuffer`s

ML.NET's runtime code has a number of utilities for operating over `VBuffer`s that we have written to be generally useful. We will not treat on these in detail here, but:

* `Microsoft.ML.Runtime.Data.VBuffer` itself contains a few methods for accessing and iterating over its values.

* `Microsoft.ML.Runtime.Internal.Utilities.VBufferUtils` contains utilities mainly for non-numeric manipulation of `VBuffer`s.

* `Microsoft.ML.Runtime.Numeric.VectorUtils` contains math operations over `VBuffer<float>` and `float[]`, like computing norms, dot-products, and whatnot.

* `Microsoft.ML.Runtime.Data.BufferBuilder` is an abstract class whose concrete implementations are used throughout ML.NET to build up `VBuffer` instances. Note that if you *can* easily build a `VBuffer` yourself and do not need the niceties provided by the buffer builder, you should probably just do it yourself.

* `Microsoft.MachineLearning.Internal.Utilities.EnsureSize` is often useful to ensure that the arrays are of the right size.

## Golden Rules

Here are some golden rules to remember:

Remember the conditions under which `Indices` and `Values` can be `null`!
A +developer forgetting that `null` values for these fields are legal is probably +the most common error in our code. (And unfortunately one that sometimes takes +a while to pop up: most users don't feed in empty inputs to our trainers.) + +In terms of accessing anything in `Values` or `Indices`, remember, treat +`Count` as the real length of these arrays, not the actual length of the +arrays. + +If you write code that results in two distinct `VBuffer`s sharing references +to their internal arrays, (e.g., there are two `VBuffer`s `a` and `b`, with +`a.Indices == b.Indices` with `a.Indices != null`, or `a.Values == b.Values` +with `a.Values != null`) then you've almost certainly done something wrong. + +Structure your code so that `VBuffer`s have their buffers re-used as much as +possible. If you have code called repeatedly where you are passing in some +`default(VBuffer)`, there's almost certainly an opportunity there. + +When re-using a `VBuffer` that's been passed to you, remember that even when +constructing a dense vector, you should still re-use the `Indices` array that +was passed in. \ No newline at end of file diff --git a/docs/release-notes/0.1/release-0.1.md b/docs/release-notes/0.1/release-0.1.md index def4723a31..a36055527a 100644 --- a/docs/release-notes/0.1/release-0.1.md +++ b/docs/release-notes/0.1/release-0.1.md @@ -13,7 +13,7 @@ dotnet add package Microsoft.ML From package manager: ``` -Install-Package Microsoft.ML +Install-Package Microsoft.ML ``` Or from within Visual Studio's NuGet package manager. diff --git a/src/Microsoft.ML.Core/Data/ICursor.md b/src/Microsoft.ML.Core/Data/ICursor.md new file mode 100644 index 0000000000..403107acc6 --- /dev/null +++ b/src/Microsoft.ML.Core/Data/ICursor.md @@ -0,0 +1,174 @@ +# `ICursor` Notes + +This document includes some more in depth notes on some expert topics for +`ICursor` implementations. + +## `Batch` + +Some cursorable implementations, like `IDataView`, can through +`GetRowCursorSet` return a set of parallel cursors that partition the sequence +of rows as would have normally been returned through a plain old +`GetRowCursor`, just sharded into multiple cursors. These cursors can be +accessed across multiple threads to enable parallel evaluation of a data +pipeline. This is key for the data pipeline performance. + +However, even though the data pipeline can perform this parallel evaluation, +at the end of this parallelization we usually ultimately want to recombine the +separate thread's streams back into a single stream. This is accomplished +through `Batch`. + +So, to review what actually happens in ML.NET code: multiple cursors are +returned through a method like `IDataView.GetRowCursorSet`. Operations can +happen on top of these cursors -- most commonly, transforms creating new +cursors on top of them -- and the `IRowCursorConsolidator` implementation will +utilize this `Batch` field to "reconcile" the multiple cursors back down into +one cursor. + +It may help to first understand this process intuitively, to understand +`Batch`'s requirements: when we reconcile the outputs of multiple cursors, the +consolidator will take the set of cursors. It will find the one with the +"lowest" `Batch` ID. (This must be uniquely determined: that is, no two +cursors should ever return the same `Batch` value.) It will iterate on that +cursor until the `Batch` ID changes. 
Whereupon, the consolidator will find the cursor with the next lowest batch ID (which should, of course, be greater than the `Batch` value we were just iterating on).

Put another way: suppose we called `GetRowCursor` (possibly with an `IRandom` instance), and stored all the values from the rows of that cursoring in some list, in order. Now, imagine we call `GetRowCursorSet` (with an identically constructed `IRandom` instance), and store the values from the rows of all of those cursorings in a different list, in order, accompanied by their `Batch` value. Then, if we were to perform a *stable* sort on the second list keyed by the stored `Batch` value, it should have content identical to the first list.

So: `Batch` is a `long` value associated with every `ICounted` implementation (including implementations of `ICursor`). This quantity must be non-decreasing as we call `MoveNext` or `MoveMany`. That is, it is fine for the same `Batch` value to repeat within the same cursor (though not across cursors from the same set), but any change in the value must be an increase.

The requirement of consistency applies to one cursor, or the cursors from a *single* call to `GetRowCursor` or `GetRowCursorSet`. It is not required that the `Batch` be consistent among multiple independent cursorings.

## `MoveNext` and `MoveMany`

Once `MoveNext` or `MoveMany` returns `false`, naturally all subsequent calls to either of these two methods should return `false`. It is important that they not throw, return `true`, or have any other behavior.

## `GetIdGetter`

This treats on the requirements of a proper `GetIdGetter` implementation.

It is common for objects to serve multiple `ICounted` instances to iterate over what is supposed to be the same data, e.g., in an `IDataView` a cursor set will produce the same data as a serial cursor, just partitioned, and a shuffled cursor will produce the same data as a serial cursor or any other shuffled cursor, only shuffled. The ID exists for applications that need to reconcile which entry is actually which. Ideally this ID should be unique, but for practical reasons, it suffices if collisions are simply extremely improbable.

To be specific, the original case motivating this functionality was SDCA, where it is simultaneously important both that we see data in a "random-enough" fashion (so, shuffled), and that each instance has an associated dual variable. The ID is used to associate each instance with the corresponding dual variable across multiple iterations of the data. (Note that in this specific application collisions merely being improbable is sufficient, since if there were hypothetically a collision it would probably not materially affect the results anyway, though I'm making that claim without justification.)

Note that this ID, while it must be consistent for multiple streams according to the semantics above, is not considered part of the data per se. So, to take the example of a data view specifically, a single data view must render consistent IDs across all cursorings, but there is no suggestion at all that if the "same" data were presented in a different data view (as by, say, being transformed, cached, saved, or whatever), the IDs between the two different data views would have any discernible relationship.
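To illustrate what this kind of reconciliation can look like, here is a minimal usage sketch in the spirit of the SDCA example above. It is hypothetical code rather than anything in the library: `dataview`, `rand`, and `numIterations` are assumed to exist, and the actual per-row update is elided.

```csharp
// Hypothetical sketch: associate per-row state (e.g., a dual variable) with each
// row, across multiple (possibly shuffled) cursorings of the same IDataView.
var stateById = new Dictionary<UInt128, float>();
for (int iteration = 0; iteration < numIterations; iteration++)
{
    // No columns need to be active just to fetch IDs.
    using (IRowCursor cursor = dataview.GetRowCursor(col => false, rand))
    {
        ValueGetter<UInt128> idGetter = cursor.GetIdGetter();
        UInt128 id = default(UInt128);
        while (cursor.MoveNext())
        {
            idGetter(ref id);
            float state;
            if (!stateById.TryGetValue(id, out state))
                state = 0; // first time this row has been seen
            // ... update state for this row ...
            stateById[id] = state;
        }
    }
}
```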
Since this ID is in practice often derived from the IDs of some other `ICounted` (e.g., for a transform, the IDs of the output are usually derived from the IDs of the input), it is not only necessary to claim that the ID generated here is probabilistically unique, but also to describe a procedure or set of guidelines that implementors of this method should attempt to follow, in order to ensure that downstream components have a fair shake at producing unique IDs themselves.

Duplicate IDs being improbable is practically accomplished with a hashing-derived mechanism. For this we have the `UInt128` methods `Fork`, `Next`, and `Combine`. See their documentation for specifics, but they all have in common that they treat the `UInt128` as some sort of intermediate hash state, then return a new hash state based on hashing a block of additional 'bits.' (Since the bits hashed may be fixed, depending on the operation, this can be very efficient.) The basic assumption underlying all of them is that hashing two different hash states over the same data, or the same hash state over different data, is unlikely to produce a collision. Note that this is also the reason why `UInt128` was introduced; collisions become likely when the number of elements is on the order of the square root of the hash space. The square root of `UInt64.MaxValue` is only several billion, a totally reasonable number of instances in a dataset, whereas a collision in a 128-bit space is far less likely.

Let's consider the IDs of a collection of entities, then, to ideally be an "acceptable set." An "acceptable set" is one that is not especially or perversely likely to contain collisions versus other sets, and also one unlikely to give rise to an especially or perversely collision-prone set of IDs, so long as derived IDs are produced according to the following operations over acceptable sets.

1. The simple enumeration of `UInt128` numeric values from any number is an acceptable set. (This covers how most loaders generate IDs. Typically, we start from 0, but other choices, like -1, are acceptable.)

2. The subset of any acceptable set is an acceptable set. (For example, all filter transforms that map any input row to 0 or 1 output rows can just pass through the input cursor's IDs.)

3. Applying `Fork` to every element of an acceptable set exactly once will result in an acceptable set.

4. As a generalization of the above, if for each element of an acceptable set you built the set comprising a single application of `Fork` on that ID, followed by any number of applications of `Next`, the union of all such sets would itself be an acceptable set. (This is useful, for example, for operations that produce multiple items per input item. So, if you produced two rows based on every single input row, and the input ID were _id_, then the ID of the first row could be `Fork` of _id_, and the second row could have the ID of `Fork` then `Next` of the same _id_.)

5. If you have potentially multiple acceptable sets, the union of them obviously might not be acceptable. However, if you were to form a mapping from each set to a different ID of some other acceptable set (each such ID being distinct), and then, for each such set/ID pairing, form the set resulting from `Combine` of the items of that set with that ID, then the union of those sets will be acceptable.
   (This is useful, for example, if you had something like a join, or a Cartesian product transform, or something like that.)

6. Moreover, similar to the note about the use of `Fork` and `Next`, if during the creation of one of those sets described above you were to form, for each item of that set, a set resulting from multiple applications of `Next`, the union of all those would also be an acceptable set.

This list is not exhaustive. Other operations I have not listed above might result in an acceptable set as well, but one should not attempt other operations without being absolutely certain of what one is doing. The general idea is that one should structure the construction of IDs so that it never arises that the same ID is hashed against the same data twice, with the two results then treated as if they were two separate IDs.

Of course, with a malicious actor upstream, collisions are possible and can be engineered quite trivially (e.g., just by returning a constant ID for all rows), but we're not supposing that the input `IDataView` is maliciously engineering hash states, or applying the operations above in any strange way to attempt to induce collisions. E.g., you could take operation 1, define it to be the enumeration of all `UInt128` values, then take operation 2 to select out specifically those hash states that will result in collisions. But I'm supposing this is not happening. If you are running an implementation of a dataview in memory that you're supposing is malicious, you probably have bigger problems than someone inducing collisions.
\ No newline at end of file
diff --git a/src/Microsoft.ML.Data/Transforms/TermTransform.md b/src/Microsoft.ML.Data/Transforms/TermTransform.md
new file mode 100644
index 0000000000..d245fda91b
--- /dev/null
+++ b/src/Microsoft.ML.Data/Transforms/TermTransform.md
@@ -0,0 +1,41 @@
# `TermTransform` Architecture

The term transform takes one or more input columns, and builds a map from observed values into a key type, with various options. This requires first that we build a map given observed data, and then later have a means of applying that map to new data. There are four helper classes of objects to perform this task. We describe them here.

* `Builder` instances can have different behavior depending on the item type of the input, and on whether we are sorting the input. Crucially, they work over only primitive types, and are not aware of whether the input data is vector or scalar. As their name implies, they are stateful, mutable objects.

* `Trainer` objects wrap a builder, and have different implementations depending on whether their input is vector or scalar. They are also responsible for making sure the number of values accumulated does not exceed the max terms limit. During the term transform's training, these objects are constructed given a row on a particular column, and a method is then called to process that row.

The above two classes of objects will be created and in existence only while the transform is being trained, that is, in the non-deserializing constructor, and will not be persisted beyond that point.

* `TermMap` objects are created from builder objects, and are the final term map. These are sort of the frozen, immutable cousins of builders. Like builders, they work over primitive types.
These objects are the ones + responsible for serialization and deserialization to the model stream and + other informational streams, construction of the per-item value mapper + delegates, and accessors for the term values used in constructing the + metadata (though they do not handle the actual metadata functions + themselves). Crucially, these objects can be shared among multiple term + transforms or multiple columns, and are not associated themselves with a + particular input dataview or column per se. + +* `BoundTermMap` objects are bound to a particular dataview, and a particular + column. They are responsible for the polymorphism depending on whether the + column they're mapping is vector or scalar, the creation of the metadata + accessors, and the creation of the actual getters (though, of course, they + rely on the term map to do this).