-
Notifications
You must be signed in to change notification settings - Fork 0
The Tabular Data Type
Within a dataset, there are typically columns that can best be described as reference labels. Reference label values are usually textual. They wouldn't be used as operands in a mathematical calculation. A "reference label" can be a name, color, type, size, shape, or any word or string that might be used to further describe and identify a particular dataset row. Quite often, the reference label values will repeat themselves over and over again within the dataset.
Consider some sample data generated by the Noodle demo. The Letter, Answer, Item, Category, and Color columns have repetitive, non-numeric values. These columns would be good candidates for being stored as tabular.
Seq Letter Answer Item Category Color 1 D Maybe Grass Vegetable Green 2 K True Grass Vegetable Brown 9 Z Yes Tree Vegetable Green 57 N True Algae Vegetable Green 123 A False Moss Vegetable Yellow 1234 R No Fern Vegetable Green 2345 K Yes Grass Vegetable Blue 2525 D So Flower Vegetable Red 4567 A True Tree Vegetable White
Data stored as tabular consists of integer reference numbers that point to the various reference label values. The reference strings are stored in a separate array of distinct values; i.e a string table. String table arrays are kept in ascending sort order. A dataset reference number of n corresponds to the nth element of its string table array.
A blank dataset value has a reference number of zero. Blank values are never stored in the string table array.
Suppose that the Answer column is converted to tabular. Our sample dataset would be stored as shown below.
Seq Letter Answer Item Category Color 1 D 2 Grass Vegetable Green 2 K 5 Grass Vegetable Brown 9 Z 6 Tree Vegetable Green 57 N 5 Algae Vegetable Green 123 A 1 Moss Vegetable Yellow 1234 R 3 Fern Vegetable Green 2345 K 6 Grass Vegetable Blue 2525 D 4 Flower Vegetable Red 4567 A 5 Tree Vegetable White
The Answer strings are stored in an "Answer" string table array
element
numberreference
string1 False 2 Maybe 3 No 4 So 5 True 6 Yes
If one stores all the text columns as tabular, the dataset becomes a rectangular array of numbers. The tabular data type allows one to reduce data redundancy while maintaining a denormalized dataset.
Seq Letter Answer Item Category Color 1 2 2 4 1 3 2 3 5 4 1 2 9 6 6 6 1 3 57 4 5 1 1 3 123 1 1 5 1 6 1234 5 3 2 1 3 2345 3 6 4 1 1 2525 2 4 3 1 4 4567 1 5 6 1 5
The application developer will typically perform the string table lookups as needed for display to the user. Since the dataset cell contains the element number of its string table value, the data retrieval can be performed quickly. From the user's point of view, they are working with the data in its original, denormalized form.
// Get the reference number from the dataset
num = ds[row][col];
// Blank values have a reference number of zero
if (num === 0)
return "";
// Get the string
return table[num-1];
During editing, look for the user's input value in the table. If the input isn't found, append a new value to the string table. Save the resulting reference number to the dataset.
index = 0;
// Remove trailing blanks
str = val.replace(/\s+$/,'');
// Non-blank
if (str.length > 0) {
// Find the input value in the table
index = table.indexOf(str) + 1;
// Not found: append a new value to the table
if (index === 0)
index = table.push(str);
}
// Save the index value to the dataset
ds[row][col] = index;
Note that when new input strings are added to the string table, these items will sort last. When saving the dataset, stale string table entries are removed, the string table is re-sorted, and new reference numbers are assigned to the dataset rows.
When sorting an unedited dataset, one can use the reference numbers for comparisons rather than examining the string table values.
aval = ds[arow][col];
bval = ds[brow][col];
return aval - bval;
The tabular data type is a feature of the NoodleDatabase object, Noodle's preferred file format.
JavaScript uses double-precision (64-bit) floating point numbers. For this reason, the NoodleDatabase object uses packed binary numbers when storing table reference numbers (see the Roots functions pacbin and bunpac). In the small example above, 1 byte suffices. This allows for 255 table entries. Two byte reference numbers allow for 65535 entries.
The combination of using the tabular data type along with packed reference numbers can result in a dataset that takes substantially less space to store.