Understanding how spreadsheets are parsed

The main way in which an importer processes the contents of a spreadsheet is by mapping columns in it to attributes in the data model. Therefore, it's important to understand how to structure columns in the input file, and how the importer interprets these columns.

How to layout columns in the spreadsheet

Columns in the spreadsheet are expected to have headers with names, and these names are the cornerstone for active_importer to make sense of the structure of the data in it. Columns are referenced by their names (as given in the column header) and not by their relative position in the data sheet.

Let's make an example with the table below. Instead of instructing the importer to assume that the first column has the names, the second the department, and the third one the salary, we just tell the importers about the headers, and it'll do the rest.

Name	Department	Salary
John Doe	Accounting	$1800
Jane Doe	IT	$2100
Mary Jane	Sales	$1600

This feature permits more freedom in how the data is placed in the file. The order of the columns is not important, or their position in the tabular structure of the spreadsheet. As long as the correct data is placed under the correct header, you can have spreadsheets containing extra columns that will be ignored by the importer, or even empty columns in between. You don't even have to start placing significant data columns in the first column of the spreadsheet.

Let's make an importer for the table structure shown above:

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column 'Name', :full_name,
  column 'Department', :department_name,
  column 'Salary', :salary
end

Now, this importer will also work if given a spreadsheet with this other structure shown below. Note that there are empty columns, as well as extra data columns not used by the importer.

Salary	Name	Age	Department
$1800	John Doe	31	Accounting
$2100	Jane Doe	36	IT
$1600	Mary Jane	28	Sales

The important thing is to correctly name the column headers, not the order in which they are placed. A column's header will determine how the importer interprets it.

Where to place the header row

Also, referencing columns by header/name allows the importer to avoid any special requirements about where the first row of significant data should appear. Instead, it starts scanning the data sheet from the top, and as soon as a row is found that matches all the column headers declared, this row is assumed to be the header row, and all rows from then on are scanned for data to be imported.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understanding how spreadsheets are parsed

How to layout columns in the spreadsheet

Where to place the header row

Clone this wiki locally