Skip to content

The Tabular Data Type

Dan Kranz edited this page Dec 21, 2022 · 52 revisions

Within a dataset, there are typically columns that can best be described as reference labels. Reference label values are usually textual. They wouldn't be used as operands in a mathematical calculation. A "reference label" can be a name, color, type, size, shape, or any word or string that might be used to further describe and identify a particular dataset row. Quite often, the reference label values will repeat themselves over and over again within the dataset.

Consider some sample data generated by the Noodle demo. The Letter, Answer, Item, Category, and Color columns have repetitive, non-numeric values. These columns would be good candidates for being stored as tabular.

Seq Letter Answer Item Category Color
1 D Maybe Grass Vegetable Green
2 K True Grass Vegetable Brown
9 Z Yes Tree Vegetable Green
57 N True Algae Vegetable Green
123 A False Moss Vegetable Yellow
1234 R No Fern Vegetable Green
2345 K Yes Grass Vegetable Blue
2525 D So Flower Vegetable Red
4567 A True Tree Vegetable White


Data stored as tabular consists of integer reference numbers that point to the various reference label values. The reference strings are stored in a separate array of distinct values; i.e a string table. String table arrays are kept in ascending sort order. A dataset reference number of n corresponds to the nth element of its string table array.

A blank dataset value has a reference number of zero. Blank values are never stored in the string table array.


Suppose that the Answer column is converted to tabular. Our sample dataset would be stored as shown below.

Seq Letter Answer Item Category Color
1 D 2 Grass Vegetable Green
2 K 5 Grass Vegetable Brown
9 Z 6 Tree Vegetable Green
57 N 5 Algae Vegetable Green
123 A 1 Moss Vegetable Yellow
1234 R 3 Fern Vegetable Green
2345 K 6 Grass Vegetable Blue
2525 D 4 Flower Vegetable Red
4567 A 5 Tree Vegetable White


The Answer strings are stored in an "Answer" string table array

element
number
reference
string
1 False
2 Maybe
3 No
4 So
5 True
6 Yes


If one stores all the text columns as tabular, the dataset becomes a rectangular array of numbers. The tabular data type allows one to reduce data redundancy while maintaining a denormalized dataset.

Seq Letter Answer Item Category Color
1 2 2 4 1 3
2 3 5 4 1 2
9 6 6 6 1 3
57 4 5 1 1 3
123 1 1 5 1 6
1234 5 3 2 1 3
2345 3 6 4 1 1
2525 2 4 3 1 4
4567 1 5 6 1 5


Usage

The application developer will typically perform the string table lookups as needed for display to the user. Since the dataset cell contains the element number of its string table value, the data retrieval can be performed quickly. From the user's point of view, they are working with the data in its original, denormalized form.

    // Get the reference number from the dataset
    num = ds[row][col];

    // Blank values have a reference number of zero 
    if (num === 0)
        return "";

    // Get the string
    return table[num-1];


During editing, look for the user's input value in the table. If the input isn't found, append a new value to the string table. Save the resulting reference number to the dataset.

    index = 0;

    // Remove trailing blanks
    str = val.replace(/\s+$/,'');

    // Non-blank
    if (str.length > 0) {

        // Find the input value in the table
        index = table.indexOf(str) + 1;

        // Not found: append a new value to the table
        if (index === 0)
            index = table.push(str);
    }

    // Save the index value to the dataset
    ds[row][col] = index;

Note that when new input strings are added to the string table, these items will sort last. When saving the dataset, stale string table entries are removed, the string table is re-sorted, and new reference numbers are assigned to the dataset rows.


When sorting an unedited dataset, one can use the reference numbers for comparisons rather than examining the string table values.

    aval = ds[arow][col];
    bval = ds[brow][col];
    return aval - bval;


Notes

The tabular data type is a feature of the NoodleDatabase object, Noodle's preferred file format.

JavaScript uses double-precision (64-bit) floating point numbers. For this reason, the NoodleDatabase object uses packed binary numbers when storing table reference numbers (see the Roots functions pacbin and bunpac). In the small example above, 1 byte suffices. This allows for 255 table entries. Two byte reference numbers allow for 65535 entries.

The combination of using the tabular data type along with packed reference numbers can result in a dataset that takes substantially less space to store.

Clone this wiki locally