The Tabular Data Type

Within a dataset, there are typically columns that can best be described as reference labels. Reference label values are usually textual. They wouldn't be used as operands in a mathematical calculation. A "reference label" can be a name, color, type, size, shape, or any word or string that might be used to further describe and identify a particular dataset row. Quite often, the reference label values will repeat themselves over and over again within the dataset.

Consider some sample data generated by the Noodle demo. The Letter, Answer, Item, Category, and Color columns have repetitive, non-numeric values. These columns would be good candidates for being stored as tabular.

Seq Letter Answer Item Category Color

1 D Maybe Grass Vegetable Green

2 K True Grass Vegetable Brown

9 Z Yes Tree Vegetable Green

57 N True Algae Vegetable Green

123 A False Moss Vegetable Yellow

1234 R No Fern Vegetable Green

2345 K Yes Grass Vegetable Blue

2525 D So Flower Vegetable Red

4567 A True Tree Vegetable White

Data stored as tabular consists of integer reference numbers that point to the various reference label values. The reference strings are stored in a separate array of distinct values; i.e a string table. String table arrays are kept in ascending sort order. A dataset reference number of n corresponds to the nth element of its string table array.

A blank dataset value has a reference number of zero. Blank values are never stored in the string table array.

Suppose that the Answer column is converted to tabular. Our sample dataset would be stored as shown below.

Seq Letter Answer Item Category Color

1 D 2 Grass Vegetable Green

2 K 5 Grass Vegetable Brown

9 Z 6 Tree Vegetable Green

57 N 5 Algae Vegetable Green

123 A 1 Moss Vegetable Yellow

1234 R 3 Fern Vegetable Green

2345 K 6 Grass Vegetable Blue

2525 D 4 Flower Vegetable Red

4567 A 5 Tree Vegetable White

The Answer strings are stored in an "Answer" string table array

element
number reference
string

1 False

2 Maybe

3 No

4 So

5 True

6 Yes

If one stores all the text columns as tabular, the dataset becomes a rectangular array of numbers. The tabular data type allows one to reduce data redundancy while maintaining a denormalized dataset.

Seq Letter Answer Item Category Color

1 2 2 4 1 3

2 3 5 4 1 2

9 6 6 6 1 3

57 4 5 1 1 3

123 1 1 5 1 6

1234 5 3 2 1 3

2345 3 6 4 1 1

2525 2 4 3 1 4

4567 1 5 6 1 5

Usage

The application developer will typically perform the string table lookups as needed for display to the user. Since the dataset cell contains the element number of its string table value, the data retrieval can be performed quickly. From the user's point of view, they are working with the data in its original, denormalized form.

    // Get the reference number from the dataset
    num = ds[row][col];

    // Blank values have a reference number of zero 
    if (num === 0)
        return "";

    // Get the string
    return table[num-1];

During editing, look for the user's input value in the table. If the input isn't found, append a new value to the string table. Save the resulting reference number to the dataset.

    index = 0;

    // Remove trailing blanks
    str = val.replace(/\s+$/,'');

    // Non-blank
    if (str.length > 0) {

        // Find the input value in the table
        index = table.indexOf(str) + 1;

        // Not found: append a new value to the table
        if (index === 0)
            index = table.push(str);
    }

    // Save the index value to the dataset
    ds[row][col] = index;

Note that when new input strings are added to the string table, these items will sort last. When saving the dataset, stale string table entries are removed, the string table is re-sorted, and new reference numbers are assigned to the dataset rows.

When sorting an unedited dataset, one can use the reference numbers for comparisons rather than examining the string table values.

    aval = ds[arow][col];
    bval = ds[brow][col];
    return aval - bval;

Notes

The tabular data type is a feature of the NoodleDatabase object, Noodle's preferred file format.

JavaScript uses double-precision (64-bit) floating point numbers. For this reason, the NoodleDatabase object uses packed binary numbers when storing table reference numbers (see the Roots functions pacbin and bunpac). In the small example above, 1 byte suffices. This allows for 255 table entries. Two byte reference numbers allow for 65535 entries.

The combination of using the tabular data type along with packed reference numbers can result in a dataset that takes substantially less space to store.

Home

Noodle object

Roots library

List Processing Structures

the Tabular Data Type

Natural Merge Sort with Lists

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Tabular Data Type

Usage

Notes

Clone this wiki locally

Seq	Letter	Answer	Item	Category	Color
1	D	Maybe	Grass	Vegetable	Green
2	K	True	Grass	Vegetable	Brown
9	Z	Yes	Tree	Vegetable	Green
57	N	True	Algae	Vegetable	Green
123	A	False	Moss	Vegetable	Yellow
1234	R	No	Fern	Vegetable	Green
2345	K	Yes	Grass	Vegetable	Blue
2525	D	So	Flower	Vegetable	Red
4567	A	True	Tree	Vegetable	White