Skip to content
This repository has been archived by the owner on Nov 23, 2021. It is now read-only.

Understanding how spreadsheets are parsed

Ernesto García edited this page Feb 18, 2014 · 3 revisions

The main way in which an importer processes the contents of a spreadsheet is by mapping columns in it to attributes in the data model. Therefore, it's important to understand how to structure columns in the input file, and how the importer interprets these columns.

How to layout columns in the spreadsheet

Columns in the spreadsheet are expected to have headers with names, and these names are the cornerstone for active_importer to make sense of the structure of the data in it. Columns are referenced by their names (as given in the column header) and not by their relative position in the data sheet.

Let's make an example with the table below. Instead of instructing the importer to assume that the first column has the names, the second the department, and the third one the salary, we just tell the importers about the headers, and it'll do the rest.

Name Department Salary
John Doe Accounting $1800
Jane Doe IT $2100
Mary Jane Sales $1600

This feature permits more freedom in how the data is placed in the file. The order of the columns is not important, or their position in the tabular structure of the spreadsheet. As long as the correct data is placed under the correct header, you can have spreadsheets containing extra columns that will be ignored by the importer, or even empty columns in between. You don't even have to start placing significant data columns in the first column of the spreadsheet.

Let's make an importer for the table structure shown above:

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column 'Name', :full_name,
  column 'Department', :department_name,
  column 'Salary', :salary
end

Now, this importer will also work if given a spreadsheet with this other structure shown below. Note that there are empty columns, as well as extra data columns not used by the importer.

Salary Name Age Department
$1800 John Doe 31 Accounting
$2100 Jane Doe 36 IT
$1600 Mary Jane 28 Sales

The important thing is to correctly name the column headers, not the order in which they are placed. A column's header will determine how the importer interprets it.

Where to place the header row

Also, referencing columns by header/name allows the importer to avoid any special requirements about where the first row of significant data should appear. Instead, it starts scanning the data sheet from the top, and as soon as a row is found that matches all the column headers declared, this row is assumed to be the header row, and all rows from then on are scanned for data to be imported.