Glossary

bmendell edited this page Jun 30, 2015 · 3 revisions

AbstractDataAdapter

A class that generically supports most operations necessary to implement a Data Adapter. Can be easily extended to support specific data types.

Accumulo

A sorted, distributed key/value store that acts as a robust, scalable, high performance data storage and retrieval system.

Adapter

A structural design pattern that provides a simple way to realize relationships between entities. It converts the interface of a class into another interface that clients expect, letting classes work together that otherwise could not because of incompatible interfaces.
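
As a rough illustration of the pattern (written in Python for brevity, with hypothetical class names that are not part of GeoWave), an adapter wraps an existing class and exposes the interface the client expects:

```python
class LegacyTemperatureSensor:
    """Existing class whose interface the client cannot use directly."""

    def read_fahrenheit(self) -> float:
        return 212.0


class CelsiusSensorAdapter:
    """Adapter that exposes the read_celsius() interface clients expect."""

    def __init__(self, sensor: LegacyTemperatureSensor) -> None:
        self._sensor = sensor

    def read_celsius(self) -> float:
        # Translate the wrapped object's result into the expected unit.
        return (self._sensor.read_fahrenheit() - 32.0) * 5.0 / 9.0


def report(sensor) -> str:
    """Client code: it only knows about read_celsius()."""
    return f"{sensor.read_celsius():.1f} C"


message = report(CelsiusSensorAdapter(LegacyTemperatureSensor()))
```

The client never touches `LegacyTemperatureSensor` directly, which is the same role a data adapter plays between native objects and a store.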

Adapter Persistence

The adapter persistence table stores a pointer to the Java class (expected to be on the classpath). The class is loaded dynamically when the data is queried, and results are translated to the native data type.
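
The idea of storing a class name and loading it on demand can be sketched as follows. This is a Python analogy only: GeoWave does this with Java classes on the classpath, and `load_class` and the stored name below are assumptions made for illustration.

```python
import importlib


def load_class(qualified_name: str) -> type:
    """Resolve a 'module.ClassName' string to the class it names."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# The persisted "pointer" is just a string; a standard-library class
# stands in here for a real adapter class.
stored_pointer = "fractions.Fraction"
adapter_class = load_class(stored_pointer)
value = adapter_class(3, 4)
```

If the named class is not importable, the lookup fails at query time, which is why the class is expected to be available wherever the data is read.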

AdapterId

Used to uniquely identify data of a specific type that should be handled by a specific instance of an adapter.

attributes

A property of a SimpleFeature used to distinguish the feature or to provide additional information related to the SimpleFeature.

Binning

A way to group a number of more or less continuous values into a smaller number of "bins". In GeoWave, dimensions are represented by a consistent binning strategy through the use of a unit of time (currently day, month, or year) or choice of area size on a map (latitude/longitude degrees). This enables a dimension to define a methodology for applying bins to a full set of values which can be used by a general purpose space filling curve implementation or for filtering purposes.
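
A minimal sketch of temporal binning, assuming timezone-aware timestamps and bin labels of our own choosing (the function name and label format are hypothetical, not GeoWave's):

```python
from datetime import datetime, timezone


def bin_id(timestamp: datetime, unit: str) -> str:
    """Return the label of the bin that a timestamp falls into."""
    ts = timestamp.astimezone(timezone.utc)
    if unit == "year":
        return f"{ts.year:04d}"
    if unit == "month":
        return f"{ts.year:04d}-{ts.month:02d}"
    if unit == "day":
        return f"{ts.year:04d}-{ts.month:02d}-{ts.day:02d}"
    raise ValueError(f"unsupported unit: {unit}")


timestamps = [
    datetime(2011, 5, 1, tzinfo=timezone.utc),
    datetime(2012, 1, 15, tzinfo=timezone.utc),
    datetime(2013, 12, 31, tzinfo=timezone.utc),
]
year_bins = sorted({bin_id(t, "year") for t in timestamps})
```

With a yearly unit, data spread across 2011 to 2013 lands in three bins, and values inside each bin can then be treated as a bounded range.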

ByteArray<obj>

A class that is a wrapper around a byte array containing the <obj> to ensure equals and hashcode operations use the values of the bytes rather than explicit object identity.
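
The value-based equality described above can be sketched like this. It is a Python analogy with a hypothetical class name; the real class is Java and overrides `equals` and `hashCode`.

```python
class ByteArrayKey:
    """Wraps bytes so equality and hashing use content, not identity."""

    def __init__(self, data) -> None:
        self._data = bytes(data)  # defensive, immutable copy

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ByteArrayKey):
            return NotImplemented
        return self._data == other._data

    def __hash__(self) -> int:
        return hash(self._data)


a = ByteArrayKey(bytearray(b"\x01\x02"))
b = ByteArrayKey(bytearray(b"\x01\x02"))
lookup = {a: "row"}
found = lookup[b]  # works because b hashes and compares equal to a
```

Without the wrapper, two distinct array objects holding the same bytes could not be used interchangeably as map keys.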

Common Index

These are a collection of attributes. There can be any number of attributes, but they must conform to the DimensionField interface - the attribute type must have a FieldReader and a FieldWriter that is within the classpath of the tablet servers. GeoWave provides a basic implementation for these attribute types: Boolean, Byte, Short, Float, Double, BigDecimal, Integer, Long, BigInteger, String, Geometry, Date, Calendar.

Constraint

A range limitation on a dimension. Used to filter the dimensions of an index.

DataAdapter

Describes how to serialize a data type. A DataAdapter provides the communication between the data set and the data source, acting as a bridge between the data set and the store. The DataAdapter object is used to read data from the store and bind that data to an object. DataAdapter follows a disconnected-oriented architecture. GeoWave allows users to create their own data adapters, which determine how the data is actually stored (serialization/deserialization). A data adapter could theoretically take a dependency on ffmpeg, store the feature as metadata in a video stream, and persist that value to the database. A GeoWave data adapter is an implementation of the DataAdapter interface that handles the persistence serialization of whatever object is being stored.

DataStore

A repository of a set of data objects. These objects are modelled using classes in Java and serialized/deserialized according to the associated adapter. A data store is a general concept that includes not just repositories like databases, but also simpler store types such as flat files.

De-duplication

The process by which duplicates of objects returned from a query are removed in client-side software rather than by iterators on the tablet servers.
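
A minimal sketch of client-side de-duplication, assuming each result carries a unique data ID (the tuple layout and names are hypothetical):

```python
def deduplicate(results):
    """Yield each (data_id, value) pair once, keeping first-seen order."""
    seen = set()
    for data_id, value in results:
        if data_id in seen:
            continue  # duplicate of an entry already returned
        seen.add(data_id)
        yield data_id, value


raw_results = [
    ("f1", "road"),
    ("f2", "river"),
    ("f1", "road"),
    ("f3", "lake"),
    ("f2", "river"),
]
unique_results = list(deduplicate(raw_results))
```

Because the filter runs on the client, it only needs to remember the IDs seen during a single query.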

deserializer/decode

Converts a stream of bytes stored on disk, contained in a store, or sent over the network into an object in memory.

dimension

A topological measure of the size of an object's covering properties. Roughly speaking, it is the number of coordinates needed to specify a point on the object. For example, a rectangle is two-dimensional, while a cube is three-dimensional. Keep in mind that an n-dimensional coordinate space can represent any set of properties, not only what is typically considered spatial.

Dimension Types

Types of objects that define the attributes and methods of a class which forms the Space Filling Curve dimension. Example types:

  • BasicDimensionDefinition: class defines a Space Filling Curve dimension as a minimum and maximum range with values linearly interpolated within the range. Values outside of the range will be clamped within the range.

  • LatitudeDefinition: a convenience class used to define a dimension which is associated with the Y axis on a Cartesian plane. (Minimum bounds = -90 and maximum bounds = 90)

  • LongitudeDefinition: a convenience class used to define a dimension which is associated with the X axis on a Cartesian plane. (Minimum bounds = -180 and maximum bounds = 180)

  • NumericDimensionDefinition: defines the attributes and methods of a class which forms the Space Filling Curve dimension

  • SFCDimensionDefinition: class wraps a dimension definition with a cardinality (bits of precision) on a space filling curve

  • TemporalBinningStrategy: class is useful for establishing a consistent binning strategy using a unit of time (currently day, month, or year). Each bin will then be defined by the boundaries of that unit within the timezone given in the constructor. So if the unit is year and the data spreads across 2011-2013, the bins will be 2011, 2012, and 2013. The unit chosen should represent a much more significant range than the average query range (at least 20x larger) for efficiency purposes. So if the average query is for a 24 hour period, the unit should not be a day, but could be perhaps a month or a year (depending on the temporal extent of the dataset).

  • TimeDefinition: a convenience class used to define a dimension which is associated with a time dimension.

  • UnboundedDimensionDefinition: because space filling curves require an extent (minimum and maximum), the unbounded implementation relies on an external binning strategy to translate an unbounded variable into bounded bins.
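
The behavior described for BasicDimensionDefinition (linear interpolation with clamping) and SFCDimensionDefinition (bits of precision) can be sketched as below. The function names are hypothetical, and the cell arithmetic is an assumption about one reasonable way to apply a cardinality, not GeoWave's implementation.

```python
def normalize(value: float, minimum: float, maximum: float) -> float:
    """Linearly map a value into [0, 1], clamping out-of-range values."""
    clamped = min(max(value, minimum), maximum)
    return (clamped - minimum) / (maximum - minimum)


def to_cell(value: float, minimum: float, maximum: float, bits: int) -> int:
    """Map a value to one of 2**bits cells along the dimension."""
    cells = 1 << bits
    cell = int(normalize(value, minimum, maximum) * cells)
    return min(cell, cells - 1)  # the maximum itself falls in the last cell


equator = normalize(0.0, -90.0, 90.0)               # latitude bounds
clamped = normalize(120.0, -90.0, 90.0)             # outside the range
middle_cell = to_cell(0.0, -180.0, 180.0, bits=4)   # longitude bounds
```

More bits of precision give smaller cells, so a dimension's cardinality controls how finely the curve resolves values along it.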

DimensionalityType (enum)

Defines the most commonly used type of dimensions

  • SPATIAL (Longitude, Latitude)

  • SPATIAL_TEMPORAL (Longitude, Latitude, Time)

Extended Data

Encoded Native Objects

feature

Something that can be drawn on a map. It consists of coordinates and attributes, such as a location, a time or time range, or any other specific information associated with the feature.

FeatureDataAdapter

This data adapter handles all reading/writing concerns for storing and retrieving GeoTools SimpleFeature objects to and from a GeoWave persistent store in Accumulo. This adapter type is a subclass of the AbstractDataAdapter and supports a hierarchy of attribute types. Attributes are specific to features, i.e. specific to FeatureDataAdapter. The AbstractDataAdapter does have a concept of “Fields”, which map directly to attributes for the existing FeatureDataAdapter. The reason for this hierarchy has to do with extensibility vs. optimization. Storage of SimpleFeatures allows one to leverage the provided FeatureDataAdapter.

field

A data structure for a single piece of data. A set of fields comprises a record, which contains all the information within the table relevant to a specific entity.

field (custom)

A custom field is …

field (index)

An index field is an attribute field used as part of the index for accessing data in the store. An IndexFieldHandler is used by the AbstractDataAdapter to translate between native values and persistence encoded values. The basic implementation of this will perform type matching on the index field type - for explicitly defining the supported dimensions, use DimensionMatchingIndexFieldHandler…

field (native)

A native field is an attribute field that is not used as part of the index. A NativeFieldHandler is used by the AbstractDataAdapter to get individual field values from the row.

FieldType

The object type for a given field. These are some of the ones defined:

  • LatitudeField: This field can be used as an EPSG:4326 latitude dimension within GeoWave. It can utilize JTS geometry as the underlying spatial object for this dimension.

  • LongitudeField: This field can be used as an EPSG:4326 longitude dimension within GeoWave. It can utilize JTS geometry as the underlying spatial object for this dimension.

  • SpatialField: A base class for EPSG:4326 latitude/longitude fields that use JTS geometry.

  • TimeField: This field definition can be used for temporal data (either as a time range or a single instant in time).

FieldValue

value of the object for the given field

field id

unique identifier for the given field

fieldVisibility

Controls the visibility in the DataStore for attributes of the object specific to a DataAdapter. For example, each attribute in a SimpleFeature type can have selective visibility when using the Feature mechanism for storing in GeoWave. Keep in mind that field visibility isn’t specific to FeatureDataAdapters. Since data adapters should be DataStore agnostic, any store that supports the concept of field visibility can be used (HBase supports a concept of visibility). Ideally the field visibility will remain as part of the adapters, and data stores can opt in to actually support it, if it’s applicable.

Filter

Mechanism by which a subset of data is returned from GeoWave based on selected criteria. Coarse-grain filtering is done by decomposing the query geometry into a series of ranges and creating a primary index based on a compact Hilbert space filling curve.
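
Decomposing a query window into index ranges can be sketched as follows. For brevity this uses a Z-order (Morton) curve rather than the compact Hilbert curve named above, and it enumerates every cell in the window, which is only practical for tiny grids. All names are hypothetical.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Z-order code: x bits go to even positions, y bits to odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def decompose(x_min, x_max, y_min, y_max, bits):
    """Return sorted, merged (start, end) code ranges covering the window."""
    codes = sorted(
        interleave(x, y, bits)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    )
    ranges = []
    for code in codes:
        if ranges and code == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], code)  # extend the current range
        else:
            ranges.append((code, code))         # start a new range
    return ranges


aligned = decompose(0, 1, 0, 1, bits=2)    # window aligned to the curve
straddling = decompose(1, 2, 1, 2, bits=2)  # window crossing quadrants
```

An aligned window collapses into a single range, while a window that straddles quadrant boundaries breaks into several. A curve with better locality, such as Hilbert, tends to produce fewer ranges for the same window.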

GeoTools

Open source Java library that provides tools for geospatial data.

GeoServer

Open source server for sharing geospatial data. Designed for interoperability, it provides open standard interfaces for data from any major spatial data source.

GeoWave

A library for storage, indexing, and search of multi-dimensional data on top of a sorted key/value datastore. GeoWave includes specific tailored implementations that have advanced support for OGC spatial types (up to 3 dimensions), and both bounded and unbounded temporal values. Both single and ranged values are supported on all axes. GeoWave’s geospatial support is built on top of the GeoTools extensibility model, so it plugs in natively to GeoServer, uDig, and any other GeoTools compatible project, and can ingest GeoTools compatible data sources. GeoWave comes out of the box with an Accumulo implementation.

GeoWave Namespace

This is not an Accumulo namespace; rather, think of it as a prefix GeoWave will use on any tables it creates. The only current constraint is that only one index type is allowed per namespace.

GIS File Format

A GIS file format is a standard of encoding geographical information into a file. They are created mainly by government mapping agencies (such as the USGS or National Geospatial-Intelligence Agency) or by GIS software developers.

index

Method/class used to access data within GeoWave. Can consist of multiple dimensions including time. This class fully describes everything necessary to index data within GeoWave. The key components are the indexing strategy and the common index model. The index defines which attributes are indexed, and how that index is constructed.

IndexConstraints

The minimum and maximum allowed values specified for index dimensions used to support a space filling curve implementation

IndexField

The data that has been mapped from the native data to the field that is used in the index. Defined by the Common Index Model.

IndexFieldHandler

used by the AbstractDataAdapter to translate between native values and persistence encoded values.

IndexModel

A set of fields that is common to all entries in the index (table). Entries can have null values for these common fields, and by default the fields that define each dimension of the index strategy are in the index model.

IndexStrategy

An interface for resolving insertion IDs given a set of values or ranges in a pre-defined set of dimensions. Every entry in the index should have non-null values/ranges in all dimensions. Used to …

ingest

Process of obtaining, importing, and processing data for later use or storage in a database. For GeoWave, the ingest process requires an adapter to translate the native data into a format that can be persisted into the data store. Also, the ingest process requires an Index which is a definition of all the configured parameters that defines how data translates to row IDs (how it is indexed) and what common fields need to be maintained within the table to be used by fine-grained and secondary filters.

Native Data

(Composed of many native values.) Raw data to be stored in GeoWave. Typically a Java object that will be converted to a binary stream (Base64 encoded chunk of data).

persist

store

persistence

The characteristic of state that outlives the process that created it. Without this capability, state would only exist in RAM and could be lost. In GeoWave, persistence relates to the storage of the configuration information of the adapter used to convert data to/from a store, as well as the data itself. Additionally, the index configuration and the statistics associated with an adapter and index are persisted.

Persistence encoded value

the representation of a ‘native value’ that can be stored or streamed

Primary Index

These are sets of data which are also used to construct the primary index (space filling curve). They will typically be geometry coordinates and optionally time - but could be any set of numeric values (think decomposed feature vectors, etc.). They cannot be null. The values that are not part of the primary index can be used for distributed secondary filtering, and can be null. The values that are associated with the primary index will be used for fine-grained filtering within an iterator.

query

Primary mechanism for retrieving information from the store. A query in GeoWave currently consists of a set of ranges on the dimensions of the primary index. Up to 3 dimensions (plus temporal optionally) can take advantage of any complex OGC geometry for the query window. For dimensions of 4 or greater the query can only be a set of ranges on each dimension (i.e. hyper-rectangle, etc.).
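
For the higher-dimensional case, a query that is a set of ranges on each dimension amounts to a hyper-rectangle test. A minimal sketch, with hypothetical names and an arbitrary four-dimensional example:

```python
def in_hyper_rectangle(point, ranges) -> bool:
    """True if every coordinate lies within its dimension's (low, high)."""
    if len(point) != len(ranges):
        raise ValueError("point and query must have the same dimensions")
    return all(low <= value <= high
               for value, (low, high) in zip(point, ranges))


# One (low, high) range per dimension of a four-dimensional index.
query = [(-10.0, 10.0), (40.0, 50.0), (0.0, 100.0), (2011.0, 2013.0)]

inside = in_hyper_rectangle((0.0, 45.0, 50.0, 2012.0), query)
outside = in_hyper_rectangle((0.0, 45.0, 50.0, 2014.0), query)
```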

Raster graphic

A raster graphic is an image made of many tiny squares of color information, referred to as either pixels (on a monitor) or dots (on a printed page). The most common type of raster graphic is a photograph and typically comes in files named with extensions of jpg/jpeg, png, tiff and bmp. A Geo-centric raster graphic includes a raster image, gridded data and its associated metadata. These can be found in the form of Oracle’s GeoRaster and JPEG2000 formatting.

Self-Describing Data

GeoWave keeps data configuration, format, and other information needed to manipulate data in the database itself. This allows software to programmatically interrogate all the data stored in a single or set of GeoWave instances without needing bits of configuration from clients, application servers, or other external stores.

serializer/encode

Process of turning an object in memory into a stream of bytes so actions can be taken to store it on disk or send it over the network.

Serialization Provider

Set of classes that are specific to field types for converting the field values to/from a byte stream.
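
One way to picture this is a registry that maps each field type to a writer and a reader. The registry, function names, and big-endian encodings below are assumptions for illustration, not GeoWave's actual wire format.

```python
import struct

# Hypothetical registry: field type name -> (writer, reader).
FIELD_CODECS = {
    "Integer": (lambda v: struct.pack(">i", v),
                lambda b: struct.unpack(">i", b)[0]),
    "Double": (lambda v: struct.pack(">d", v),
               lambda b: struct.unpack(">d", b)[0]),
    "String": (lambda v: v.encode("utf-8"),
               lambda b: b.decode("utf-8")),
}


def write_field(field_type: str, value) -> bytes:
    """Serialize (encode) a field value to bytes."""
    writer, _ = FIELD_CODECS[field_type]
    return writer(value)


def read_field(field_type: str, data: bytes):
    """Deserialize (decode) bytes back into a field value."""
    _, reader = FIELD_CODECS[field_type]
    return reader(data)


encoded = write_field("Integer", 1)
decoded = read_field("Integer", encoded)
```

Supporting a new field type then means registering one more writer/reader pair, which mirrors the FieldReader/FieldWriter requirement mentioned under Common Index.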

SimpleFeature

An instance of SimpleFeatureType that is composed of a fixed list of values in a known order. The definition of a "simple feature" can be summed up as the following:

  • made up of only non-complex attributes, no associations

  • attributes are of multiplicity 1

  • attributes are ordered

  • attribute names are unqualified (namespaceURI == null)

SimpleFeatureType

An OGC specification for defining geospatial features. Defines a simple feature model of attributes in a prescribed order by defining the names, types, and other metadata (nullable, etc) of a feature. Think of it as a Map of Name:Values where the values are typed. This interface also defines several helper methods that only make sense given the above constraints. Name conflict is not permitted (in order to allow lookup by a simple String). Leveraging this standard is one of the easiest ways to get GIS data into GeoWave. For reference these are the limitations of a SimpleFeatureType:

  • Attributes - properties limited to attributes only!

  • Attributes - List collection - ie. order of attributes matters

  • Attribute lookup by index

  • Attribute lookup by name (ie String)

  • getSuper() is null, required for point 3
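
The constraints listed for SimpleFeature and SimpleFeatureType (ordered attributes, unique unqualified names, lookup by index or by name) can be sketched as below. These classes are hypothetical stand-ins, not the GeoTools interfaces.

```python
class FeatureTypeSketch:
    """Ordered, uniquely named, typed attribute descriptors."""

    def __init__(self, type_name, attributes):
        names = [name for name, _ in attributes]
        if len(set(names)) != len(names):
            raise ValueError("attribute names must be unique")
        self.type_name = type_name
        self.attributes = list(attributes)
        self._positions = {name: i for i, name in enumerate(names)}

    def position_of(self, name: str) -> int:
        return self._positions[name]


class FeatureSketch:
    """A fixed list of values in the order the type prescribes."""

    def __init__(self, feature_type, values):
        if len(values) != len(feature_type.attributes):
            raise ValueError("one value per attribute is required")
        for (name, expected), value in zip(feature_type.attributes, values):
            if value is not None and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        self.feature_type = feature_type
        self.values = list(values)

    def get(self, key):
        """Look up a value by position (int) or by attribute name (str)."""
        if isinstance(key, str):
            key = self.feature_type.position_of(key)
        return self.values[key]


city_type = FeatureTypeSketch(
    "city", [("name", str), ("population", int), ("location", tuple)])
city = FeatureSketch(city_type, ["Springfield", 30000, (-89.65, 39.8)])
```

Forbidding name conflicts is what makes lookup by a plain string unambiguous.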

Single Tier Index Strategy

(TBD)

Statistics

Adapters provide a set of statistics stored within a statistic store. The set of available statistics is specific to each adapter and the set of attributes for those data items managed by the adapter. Statistics include:

  • Ranges over an attribute, including time.

  • Enveloping bounding box over all geometries.

  • Cardinality of the number of stored items.

  • Histograms over the range of values for an attribute. (Optional)

  • Cardinality of discrete values of an attribute. (Optional)
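
A minimal sketch of the non-optional statistics, computed over a handful of point features. The dictionary layout and field names are assumptions for illustration.

```python
def compute_statistics(features):
    """Compute count, bounding box, and time range for point features."""
    if not features:
        raise ValueError("at least one feature is required")
    xs = [f["x"] for f in features]
    ys = [f["y"] for f in features]
    times = [f["time"] for f in features]
    return {
        "count": len(features),
        # (min_x, min_y, max_x, max_y)
        "bounding_box": (min(xs), min(ys), max(xs), max(ys)),
        "time_range": (min(times), max(times)),
    }


features = [
    {"x": -77.0, "y": 38.9, "time": 1_400_000_000_000},
    {"x": -122.4, "y": 37.8, "time": 1_420_000_000_000},
    {"x": 2.35, "y": 48.85, "time": 1_410_000_000_000},
]
stats = compute_statistics(features)
```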

Tiered Index Strategy

(TBD)

Time

The base interface for time values, could be either a time range (an interval) or a timestamp (an instant)

TimeField

Field definition that can be used for temporal data, either as a time range or a single instant in time

TimeRange

A range of time. A class that wraps a start instant and stop instant represented in milliseconds with a visibility tag for the field value.

TimeStamp

A class that wraps an instant of time represented in milliseconds and if desired, with a visibility tag for the field value.

Vector GIS File

In a GIS, geographical features are often expressed as vectors or objects representing features as geometrical shapes. There are different types of geometries to represent different geographical features: points to represent simple locations; one-dimensional lines to represent linear features; two-dimensional polygons that represent geographical features that cover part of the earth’s surface.

Vector Graphic

A vector graphic uses math, defined objects, and coordinates to draw shapes from points, lines, and curves, and needs much less information than a raster graphic. A raster image of a 1” x 1” square at 300 dpi has 300 x 300 = 90,000 individual pieces of information, whereas a vector image of the same square contains only four points, one for each corner. The computer uses math to “connect the dots” and fill in all of the missing information. The most common type of vector graphic is an icon. Vector graphics typically come in files with extensions such as svg, eps, or ai (png and gif are raster formats).

visibility

A boolean AND (&) and OR (|) combination of authorization tokens. Authorization tokens are arbitrary strings taken from a restricted ASCII character set. Parentheses are required to specify order of operations in visibilities. Authorizations are sets of authorization tokens. Each GeoWave user has authorizations and each scan of the data store has authorizations. Scan authorizations are only allowed to be a subset of the user’s authorizations. By default, a user’s authorizations set is empty. Examples:

  • A

  • A&B

  • apple&carrot|broccoli|spinach (wrong)

  • (apple&carrot)|broccoli|spinach
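
The rules above can be sketched as a small evaluator that checks a visibility expression against a set of authorizations and rejects expressions that mix & and | without parentheses. The function names and the token character set (letters, digits, underscore) are assumptions for illustration, not the data store's own parser.

```python
import re


def tokenize(expression: str):
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    if "".join(tokens) != expression:
        raise ValueError("unexpected character in visibility expression")
    return tokens


def is_visible(expression: str, authorizations) -> bool:
    """Evaluate a visibility expression for a set of authorization tokens."""
    tokens = tokenize(expression)
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token: {token}")
        return token in authorizations

    def parse_expression() -> bool:
        nonlocal pos
        value = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


visible = is_visible("(apple&carrot)|broccoli|spinach", {"spinach"})
hidden = is_visible("A&B", {"A"})
```

Passing the expression marked (wrong) above, `apple&carrot|broccoli|spinach`, raises a `ValueError` in this sketch because the order of operations is ambiguous.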