How to handle values starting with double quotes in papaparse #1057

Closed
4integration opened this issue Jun 11, 2024 · 1 comment

@4integration

I am using the latest Papa Parse to parse large CSV files.
It handles double quotes inside a value, but not when the value starts with a double quote.

Using this code:

const parsePromise = new Promise<void>((resolve, reject) => {
    Papa.parse<Equipment>(fileStream, {
        header: true,
        delimiter: "\t",
        dynamicTyping: true,
        skipEmptyLines: true,
        step: (result) => {
            const rowData = {
                vehicle_id: result.data.vehicle_id,
                schema_id: result.data.schema_id,
                option_id: result.data.option_id,
                record_id: result.data.record_id,
                location: result.data.location,
                data_value: result.data.data_value,
                condition: result.data.condition,
            };
            entities.push(rowData);
            console.log(rowData)
        },
        complete: () => resolve(),
        error: (error) => reject(error),
    });
});

If I have the following tab-delimited CSV data:

vehicle_id	schema_id	option_id	record_id	location	data_value	condition
425972620240523	15102	1266	7700	W	"Första hjälpen"- förbandslåda med varningstriangel, 2 varselvästar	
425972620240523	15104	1266	7700	W	W	
425972620240523	15101	1266	7800	INT	S	
425972620240523	15102	1266	7800	INT	medical kit, warning triangle, 2 safety vests	
425972620240523	15104	1266	7800	INT	INT	
425972620240523	15101	1267	7900	W	S	
425972620240523	15102	1267	7900	W	Papperskorg (borttagbar)	

It outputs

{
  vehicle_id: 425972620240523,
  schema_id: 15102,
  option_id: 1266,
  record_id: 7700,
  location: 'W',
  data_value: 'Första hjälpen"- förbandslåda med varningstriangel, 2 varselvästar\t\r\n' +
    '425972620240523\t15104\t1266\t7700\tW\tW\t\r\n' +
    '425972620240523\t15101\t1266\t7800\tINT\tS\t\r\n' +
    '425972620240523\t15102\t1266\t7800\tINT\tmedical kit, warning triangle, 2 safety vests\t\r\n' +
    '425972620240523\t15104\t1266\t7800\tINT\tINT\t\r\n' +
    '425972620240523\t15101\t1267\t7900\tW\tS\t\r\n' +
    '425972620240523\t15102\t1267\t7900\tW\tPapperskorg (borttagbar)\t\r\n',
  condition: undefined
}

If I move the first double quote as in:

vehicle_id	schema_id	option_id	record_id	location	data_value	condition
425972620240523	15102	1266	7700	W	Första "hjälpen"- förbandslåda med varningstriangel, 2 varselvästar	
425972620240523	15104	1266	7700	W	W	
425972620240523	15101	1266	7800	INT	S	
425972620240523	15102	1266	7800	INT	medical kit, warning triangle, 2 safety vests	
425972620240523	15104	1266	7800	INT	INT	
425972620240523	15101	1267	7900	W	S	
425972620240523	15102	1267	7900	W	Papperskorg (borttagbar)	

The result is correct:

{
  vehicle_id: 425972620240523,
  schema_id: 15102,
  option_id: 1266,
  record_id: 7700,
  location: 'W',
  data_value: 'Första "hjälpen"- förbandslåda med varningstriangel, 2 varselvästar',
  condition: null
}
{
  vehicle_id: 425972620240523,
  schema_id: 15104,
  option_id: 1266,
  record_id: 7700,
  location: 'W',
  data_value: 'W',
  condition: null
}
....

How can Papaparse handle values starting with a double quote?
@janisdd
Contributor

janisdd commented Aug 28, 2024

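There is a little-known setting, quoteChar, whose default value is ".
It is normally used to enclose fields that contain the delimiter (here, a tab).

Because your value begins with the quoteChar, Papa Parse treats it as a quoted field and takes "Första hjälpen" to be the whole field. The closing quote is then followed by more text instead of a delimiter, so the quoting is malformed and the rest of the input is pulled into that field, as your output shows.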
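
To make the effect of the quote character concrete, here is a minimal sketch of quote-aware field splitting. It is a simplified model written for illustration, not Papa Parse's actual implementation: `splitFields` is a hypothetical helper, it handles a single row only, and it assumes that a field with no valid closing quote swallows everything that follows.

```typescript
// Simplified model of quote-aware splitting for ONE row (assumption: not Papa Parse's real code).
function splitFields(input: string, delimiter = "\t", quoteChar = '"'): string[] {
  const fields: string[] = [];
  let pos = 0;
  while (pos <= input.length) {
    if (input.charAt(pos) === quoteChar) {
      // Quoted field: look for a closing quote followed by the delimiter or the end of input.
      let end = pos + 1;
      let closed = false;
      while (end < input.length) {
        end = input.indexOf(quoteChar, end);
        if (end === -1) break;
        const next = input.charAt(end + 1);
        if (next === quoteChar) { end += 2; continue; } // doubled quote = escaped quote
        if (end + 1 === input.length || next === delimiter) { closed = true; break; }
        end += 1; // stray quote in the middle of the field, keep searching
      }
      if (!closed) {
        // No valid closing quote: everything that is left ends up in this field.
        fields.push(input.slice(pos + 1));
        return fields;
      }
      fields.push(input.slice(pos + 1, end).split(quoteChar + quoteChar).join(quoteChar));
      pos = end + 2; // skip the closing quote and the delimiter
    } else {
      // Unquoted field: read up to the next delimiter.
      const next = input.indexOf(delimiter, pos);
      if (next === -1) { fields.push(input.slice(pos)); return fields; }
      fields.push(input.slice(pos, next));
      pos = next + 1;
    }
  }
  return fields;
}

const sample = 'W\t"Första hjälpen"- förbandslåda\tX';
console.log(splitFields(sample));            // 2 fields: the quoted field swallowed the tab and "X"
console.log(splitFields(sample, "\t", "#")); // 3 fields: the double quotes are ordinary characters
```

With the default quote character the second field absorbs the rest of the row, which mirrors the output in the question. With `#` as the quote character the double quotes are plain data and the row splits on tabs as intended.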

Let m be your data, and set quoteChar to a character that does not occur in it:

let m = `...`
Papa.parse(m, { delimiter: '\t', quoteChar: '#' })

Tested on https://www.papaparse.com/demo today
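
The workaround only holds if the chosen quote character never appears in the data. As a rough sketch, a helper along these lines could pick one from a sample of the file. `pickUnusedQuoteChar` and its candidate list are hypothetical and not part of Papa Parse; for a large streamed file it would only see whatever sample you pass in.

```typescript
// Hypothetical helper (not a Papa Parse API): returns the first candidate
// character that does not occur anywhere in the given sample of the data.
function pickUnusedQuoteChar(
  sample: string,
  candidates: string[] = ["#", "~", "^", "\u0001"],
): string | undefined {
  for (const candidate of candidates) {
    if (sample.indexOf(candidate) === -1) {
      return candidate;
    }
  }
  // Every candidate occurs in the sample; the caller has to choose another strategy.
  return undefined;
}

const header = "vehicle_id\tschema_id\toption_id\trecord_id\tlocation\tdata_value\tcondition";
console.log(pickUnusedQuoteChar(header)); // "#" does not occur in the header line
```

The returned character would then be passed as the quoteChar option alongside delimiter: '\t', as in the snippet above.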

pokoli closed this as not planned on Aug 29, 2024