-
Notifications
You must be signed in to change notification settings - Fork 19
Understanding how spreadsheets are parsed
The main way in which an importer processes the contents of a spreadsheet is by mapping columns in it to attributes in the data model. Therefore, it's important to understand how to structure columns in the input file, and how the importer interprets these columns.
Columns in the spreadsheet are expected to have headers with names, and these names are the cornerstone for active_importer
to make sense of the structure of the data in it. Columns are referenced by their names (as given in the column header) and not by their relative position in the data sheet.
Let's make an example with the table below. Instead of instructing the importer to assume that the first column has the names, the second the department, and the third one the salary, we just tell the importers about the headers, and it'll do the rest.
Name | Department | Salary |
---|---|---|
John Doe | Accounting | $1800 |
Jane Doe | IT | $2100 |
Mary Jane | Sales | $1600 |
This feature permits more freedom in how the data is placed in the file. The order of the columns is not important, or their position in the tabular structure of the spreadsheet. As long as the correct data is placed under the correct header, you can have spreadsheets containing extra columns that will be ignored by the importer, or even empty columns in between. You don't even have to start placing significant data columns in the first column of the spreadsheet.
Let's make an importer for the table structure shown above:
class EmployeeImporter < ActiveImporter::Base
imports Employee
column 'Name', :full_name,
column 'Department', :department_name,
column 'Salary', :salary
end
Now, this importer will also work if given a spreadsheet with this other structure shown below. Note that there are empty columns, as well as extra data columns not used by the importer.
Salary | Name | Age | Department | ||
---|---|---|---|---|---|
$1800 | John Doe | 31 | Accounting | ||
$2100 | Jane Doe | 36 | IT | ||
$1600 | Mary Jane | 28 | Sales |
The important thing is to correctly name the column headers, not the order in which they are placed. A column's header will determine how the importer interprets it.
Also, referencing columns by header/name allows the importer to avoid any special requirements about where the first row of significant data should appear. Instead, it starts scanning the data sheet from the top, and as soon as a row is found that matches all the column headers declared, this row is assumed to be the header row, and all rows from then on are scanned for data to be imported.
© 2014 Continuum