Feature: Differentiate between empty string and null value #125

Open: wants to merge 3 commits into base: master
25 changes: 13 additions & 12 deletions README.md
@@ -163,18 +163,19 @@ val tsvReader = csvReader {
}
```

| Option | default value | description |
|--------------------------------|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| logger | _no-op_ | Logger instance for logging debug information at runtime. |
| charset | `UTF-8` | Charset encoding. The value must be supported by [java.nio.charset.Charset](https://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html). |
| quoteChar | `"` | Character used to quote fields. |
| delimiter | `,` | Character used as delimiter between each field.<br />Use `"\t"` if reading TSV file. |
| escapeChar | `"` | Character to escape quote inside field string.<br />Normally, you don't have to change this option.<br />See detail comment on [ICsvReaderContext](src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/dsl/context/CsvReaderContext.kt). |
| skipEmptyLine | `false` | Whether to skip or error out on empty lines. |
| autoRenameDuplicateHeaders | `false` | Whether to auto rename duplicate headers or throw an exception. |
| ~~skipMissMatchedRow~~ | `false` | Deprecated. Replace with appropriate values in `excessFieldsRowBehaviour` and `insufficientFieldsRowBehaviour`, e.g. both set to `IGNORE`. ~~Whether to skip an invalid row. If `ignoreExcessCols` is true, only rows with less than the expected number of columns will be skipped.~~ |
| excessFieldsRowBehaviour | `ERROR` | Behaviour to use when a row has more fields (columns) than expected. `ERROR` (default), `IGNORE` (skip the row) or `TRIM` (remove the excess fields at the end of the row to match the expected number of fields). |
| insufficientFieldsRowBehaviour | `ERROR` | Behaviour to use when a row has fewer fields (columns) than expected. `ERROR` (default), `IGNORE` (skip the row) or `EMPTY_STRING` (replace missing fields with an empty string). |
| Option | default value | description |
|--------------------------------|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| logger | _no-op_ | Logger instance for logging debug information at runtime. |
| charset | `UTF-8` | Charset encoding. The value must be supported by [java.nio.charset.Charset](https://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html). |
| quoteChar | `"` | Character used to quote fields. |
| delimiter | `,` | Character used as delimiter between each field.<br />Use `"\t"` if reading TSV file. |
| escapeChar | `"` | Character to escape quote inside field string.<br />Normally, you don't have to change this option.<br />See detail comment on [ICsvReaderContext](src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/dsl/context/CsvReaderContext.kt). |
| skipEmptyLine | `false` | Whether to skip or error out on empty lines. |
| autoRenameDuplicateHeaders | `false` | Whether to auto rename duplicate headers or throw an exception. |
| ~~skipMissMatchedRow~~ | `false` | Deprecated. Replace with appropriate values in `excessFieldsRowBehaviour` and `insufficientFieldsRowBehaviour`, e.g. both set to `IGNORE`. ~~Whether to skip an invalid row. If `ignoreExcessCols` is true, only rows with less than the expected number of columns will be skipped.~~ |
| excessFieldsRowBehaviour | `ERROR` | Behaviour to use when a row has more fields (columns) than expected. `ERROR` (default), `IGNORE` (skip the row) or `TRIM` (remove the excess fields at the end of the row to match the expected number of fields). |
| insufficientFieldsRowBehaviour | `ERROR` | Behaviour to use when a row has fewer fields (columns) than expected. `ERROR` (default), `IGNORE` (skip the row) or `EMPTY_STRING` (replace missing fields with an empty string). |
| withFieldAsNull | `NEITHER` | Which empty fields are read as `null` instead of an empty string. `NEITHER` (default; both cases yield an empty string), `EMPTY_SEPARATORS` (an empty unquoted field, e.g. between two consecutive delimiters, is `null`), `EMPTY_QUOTES` (an empty quoted field, i.e. two consecutive quote characters, is `null`) or `BOTH` (both cases are `null`). |
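
To make the four modes concrete, here is a minimal, self-contained sketch of the results they are meant to produce for the line `a,,"",b`. It is not the library's parser: `NullFieldMode` and `parseSimpleRow` are hypothetical names, and the naive `split(',')` ignores delimiters, escaped quotes and line breaks inside quoted fields.

```kotlin
// Hypothetical stand-in for the proposed option values; not the library's enum.
enum class NullFieldMode { EMPTY_SEPARATORS, EMPTY_QUOTES, BOTH, NEITHER }

// Simplified illustration only: splits on ',' and does not support
// delimiters, escaped quotes, or line breaks inside quoted fields.
fun parseSimpleRow(line: String, mode: NullFieldMode): List<String?> =
    line.split(',').map { raw ->
        when (raw) {
            // Unquoted empty field, e.g. between two consecutive delimiters.
            "" -> if (mode == NullFieldMode.EMPTY_SEPARATORS || mode == NullFieldMode.BOTH) null else ""
            // Quoted empty field: two consecutive quote characters.
            "\"\"" -> if (mode == NullFieldMode.EMPTY_QUOTES || mode == NullFieldMode.BOTH) null else ""
            // Any other field: drop one pair of surrounding quotes, if present.
            else -> raw.removeSurrounding("\"")
        }
    }

fun main() {
    val line = "a,,\"\",b"
    for (mode in NullFieldMode.values()) {
        println("$mode -> ${parseSimpleRow(line, mode)}")
    }
}
```

With `NEITHER`, both the second and third fields come back as empty strings, which matches the behaviour before this option existed.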

### CSV Write examples

@@ -1,13 +1,15 @@
package com.github.doyaaaaaken.kotlincsv.client

import com.github.doyaaaaaken.kotlincsv.dsl.context.CSVReaderNullFieldIndicator
import com.github.doyaaaaaken.kotlincsv.dsl.context.CsvReaderContext
import com.github.doyaaaaaken.kotlincsv.dsl.context.ExcessFieldsRowBehaviour
import com.github.doyaaaaaken.kotlincsv.dsl.context.InsufficientFieldsRowBehaviour
import com.github.doyaaaaaken.kotlincsv.parser.CsvParser
import com.github.doyaaaaaken.kotlincsv.parser.ParserNullFieldIndicator
import com.github.doyaaaaaken.kotlincsv.util.CSVAutoRenameFailedException
import com.github.doyaaaaaken.kotlincsv.util.CSVFieldNumDifferentException
import com.github.doyaaaaaken.kotlincsv.util.logger.Logger
import com.github.doyaaaaaken.kotlincsv.util.MalformedCSVException
import com.github.doyaaaaaken.kotlincsv.util.logger.Logger

/**
* CSV Reader class, which controls file I/O flow.
@@ -23,7 +25,7 @@ class CsvFileReader internal constructor(
private val reader = BufferedLineReader(reader)
private var rowNum = 0L

private val parser = CsvParser(ctx.quoteChar, ctx.delimiter, ctx.escapeChar)
private val parser = CsvParser(ctx.quoteChar, ctx.delimiter, ctx.escapeChar, ctx.withFieldAsNull.toParserNullFieldIndicator())

/**
* read next csv row
@@ -33,14 +35,14 @@ class CsvFileReader internal constructor(
* or return null if all lines have already been read.
*/
@Deprecated("We are considering making it a private method. If you have feedback, please comment on Issue #100.")
fun readNext(): List<String>? {
fun readNext(): List<String?>? {
return readUntilNextCsvRow("")
}

/**
* read all csv rows as Sequence
*/
fun readAllAsSequence(fieldsNum: Int? = null): Sequence<List<String>> {
fun readAllAsSequence(fieldsNum: Int? = null): Sequence<List<String?>> {
var expectedNumFieldsInRow: Int? = fieldsNum
return generateSequence {
@Suppress("DEPRECATION") readNext()
@@ -76,7 +78,7 @@ class CsvFileReader internal constructor(

private fun skipMismatchedRow(
idx: Int,
row: List<String>,
row: List<String?>,
numFieldsInRow: Int
): Nothing? {
logger.info("skip miss matched row. [csv row num = ${idx + 1}, fields num = ${row.size}, fields num of first row = $numFieldsInRow]")
@@ -86,9 +88,9 @@ class CsvFileReader internal constructor(
/**
* read all csv rows as Sequence with header information
*/
fun readAllWithHeaderAsSequence(): Sequence<Map<String, String>> {
fun readAllWithHeaderAsSequence(): Sequence<Map<String, String?>> {
@Suppress("DEPRECATION")
var headers = readNext() ?: return emptySequence()
var headers = readNext()?.map { it ?: "" } ?: return emptySequence()
if (ctx.autoRenameDuplicateHeaders) {
headers = deduplicateHeaders(headers)
} else {
@@ -108,7 +110,7 @@ class CsvFileReader internal constructor(
* @return fields in the row as List<String?>,
* or null if all lines have already been read.
*/
private tailrec fun readUntilNextCsvRow(leftOver: String = ""): List<String>? {
private tailrec fun readUntilNextCsvRow(leftOver: String = ""): List<String?>? {
val nextLine = reader.readLineWithTerminator()
rowNum++
return if (nextLine == null) {
@@ -160,4 +162,11 @@ class CsvFileReader internal constructor(
if (results.size != results.distinct().size) throw CSVAutoRenameFailedException()
}
}

private fun CSVReaderNullFieldIndicator.toParserNullFieldIndicator() = when(this) {
CSVReaderNullFieldIndicator.EMPTY_SEPARATORS -> ParserNullFieldIndicator.EMPTY_SEPARATORS
CSVReaderNullFieldIndicator.EMPTY_QUOTES -> ParserNullFieldIndicator.EMPTY_QUOTES
CSVReaderNullFieldIndicator.BOTH -> ParserNullFieldIndicator.BOTH
CSVReaderNullFieldIndicator.NEITHER -> ParserNullFieldIndicator.NEITHER
}
}
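
The header change above (`readNext()?.map { it ?: "" }`) keeps map keys non-null even though field values may now be null. A minimal sketch of that idea follows; `normalizeHeaders` and `toRowMap` are hypothetical helpers written for illustration, not functions from the library, and the real reader may build its maps differently.

```kotlin
// Header cells may be null once fields are nullable; keys stay non-null
// by falling back to an empty string, as `it ?: ""` does in the diff.
fun normalizeHeaders(rawHeaders: List<String?>): List<String> =
    rawHeaders.map { it ?: "" }

// Hypothetical helper: pair each header with its (possibly null) field value.
fun toRowMap(headers: List<String>, fields: List<String?>): Map<String, String?> =
    headers.zip(fields).toMap()

fun main() {
    val headers = normalizeHeaders(listOf("id", null, "name"))
    val row = toRowMap(headers, listOf("1", "x", null))
    println(headers)
    println(row)
}
```

One consequence worth noting: two null header cells both become `""`, so they count as duplicate headers.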
@@ -11,12 +11,12 @@ expect class CsvReader(
ctx: CsvReaderContext = CsvReaderContext()
) {
/**
* read csv data as String, and convert into List<List<String>>
* read csv data as String, and convert into List<List<String?>>
*/
fun readAll(data: String): List<List<String>>
fun readAll(data: String): List<List<String?>>

/**
* read csv data with header, and convert into List<Map<String, String>>
* read csv data with header, and convert into List<Map<String, String?>>
*/
fun readAllWithHeader(data: String): List<Map<String, String>>
fun readAllWithHeader(data: String): List<Map<String, String?>>
}
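
Because `readAll` would return `List<List<String?>>` under this change, calling code has to deal with nullable fields. The sketch below shows two ways a caller might do that, using hand-written sample rows rather than the library itself; `describeField` and `withoutNulls` are hypothetical helper names.

```kotlin
// Render a field so that null and "" stay distinguishable in output.
fun describeField(value: String?): String = when {
    value == null -> "<null>"
    value.isEmpty() -> "<empty>"
    else -> value
}

// For callers that want the previous non-null shape back.
fun withoutNulls(rows: List<List<String?>>): List<List<String>> =
    rows.map { row -> row.map { it ?: "" } }

fun main() {
    // Sample data shaped like the new readAll() return type.
    val rows: List<List<String?>> = listOf(listOf("a", null, ""), listOf(null, "b", "c"))
    rows.forEach { row -> println(row.joinToString(" | ") { describeField(it) }) }
    println(withoutNulls(rows))
}
```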
@@ -88,6 +88,11 @@ interface ICsvReaderContext {
* How, and whether, the reader should proceed if a row has more than the expected number of fields (columns)
*/
val excessFieldsRowBehaviour: ExcessFieldsRowBehaviour

/**
* Configures which field values should be handled as null value by the reader.
*/
val withFieldAsNull: CSVReaderNullFieldIndicator
}

enum class InsufficientFieldsRowBehaviour {
@@ -125,6 +130,29 @@ enum class ExcessFieldsRowBehaviour {
TRIM
}

enum class CSVReaderNullFieldIndicator {

/**
* Two sequential separators are null.
*/
EMPTY_SEPARATORS,

/**
* Two sequential quotes are null.
*/
EMPTY_QUOTES,

/**
* Two sequential separators and two sequential quotes are null.
*/
BOTH,

/**
* Default. Both are considered empty string.
*/
NEITHER
}

/**
* CSV Reader settings used in `csvReader` DSL method.
*
@@ -142,4 +170,5 @@ class CsvReaderContext : ICsvReaderContext {
override var autoRenameDuplicateHeaders: Boolean = false
override var insufficientFieldsRowBehaviour: InsufficientFieldsRowBehaviour = InsufficientFieldsRowBehaviour.ERROR
override var excessFieldsRowBehaviour: ExcessFieldsRowBehaviour = ExcessFieldsRowBehaviour.ERROR
override var withFieldAsNull: CSVReaderNullFieldIndicator = CSVReaderNullFieldIndicator.NEITHER
}
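
The context change adds a mutable property whose default, `NEITHER`, keeps existing behaviour unless a caller opts in. A self-contained sketch of that pattern is shown below. The enum is redeclared here as a stand-in, and `ReaderSettings` and `readerSettings` are hypothetical names in the style of a receiver-lambda DSL; this is not the library's `csvReader` function.

```kotlin
// Stand-in mirroring the enum name in this diff; the real one lives in the library.
enum class CSVReaderNullFieldIndicator { EMPTY_SEPARATORS, EMPTY_QUOTES, BOTH, NEITHER }

// Hypothetical settings holder, reduced to the one option discussed here.
class ReaderSettings {
    // Defaulting to NEITHER keeps the old behaviour unless the caller opts in.
    var withFieldAsNull: CSVReaderNullFieldIndicator = CSVReaderNullFieldIndicator.NEITHER
}

// Hypothetical builder: applies the caller's configuration block to fresh settings.
fun readerSettings(init: ReaderSettings.() -> Unit = {}): ReaderSettings =
    ReaderSettings().apply(init)

fun main() {
    val defaults = readerSettings()
    val optIn = readerSettings { withFieldAsNull = CSVReaderNullFieldIndicator.BOTH }
    println(defaults.withFieldAsNull)
    println(optIn.withFieldAsNull)
}
```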
@@ -8,7 +8,8 @@ package com.github.doyaaaaaken.kotlincsv.parser
internal class CsvParser(
private val quoteChar: Char,
private val delimiter: Char,
private val escapeChar: Char
private val escapeChar: Char,
private val withFieldAsNull: ParserNullFieldIndicator
) {

/**
@@ -18,8 +19,8 @@ internal class CsvParser(
* @return parsed row fields,
* or null if the passed line ends in the middle of a CSV row.
*/
fun parseRow(line: String, rowNum: Long = 1): List<String>? {
val stateMachine = ParseStateMachine(quoteChar, delimiter, escapeChar)
fun parseRow(line: String, rowNum: Long = 1): List<String?>? {
val stateMachine = ParseStateMachine(quoteChar, delimiter, escapeChar, withFieldAsNull)
var lastCh: Char? = line.firstOrNull()
var skipCount = 0L
line.zipWithNext { ch, nextCh ->
@@ -9,15 +9,18 @@ import com.github.doyaaaaaken.kotlincsv.util.Const
internal class ParseStateMachine(
private val quoteChar: Char,
private val delimiter: Char,
private val escapeChar: Char
private val escapeChar: Char,
private val withFieldAsNull: ParserNullFieldIndicator
) {

private var state = ParseState.START

private val fields = ArrayList<String>()
private val fields = ArrayList<String?>()

private var field = StringBuilder()

private var handleFieldAsNull = false

private var pos = 0L

/**
@@ -33,6 +36,7 @@ internal class ParseStateMachine(
Const.BOM -> Unit
quoteChar -> state = ParseState.QUOTE_START
delimiter -> {
handleEmptySeparators()
flushField()
state = ParseState.DELIMITER
}
@@ -89,6 +93,7 @@ internal class ParseStateMachine(
when (ch) {
quoteChar -> state = ParseState.QUOTE_START
delimiter -> {
handleEmptySeparators()
flushField()
state = ParseState.DELIMITER
}
@@ -126,6 +131,7 @@ internal class ParseStateMachine(
state = ParseState.QUOTED_FIELD
pos += 1
} else {
handleEmptyQuotes()
state = ParseState.QUOTE_END
}
} else {
@@ -167,10 +173,15 @@ internal class ParseStateMachine(
* @return parsed CSV fields,
* or null if the current position is in the middle of a CSV row.
*/
fun getResult(): List<String>? {
fun getResult(): List<String?>? {
return when (state) {
ParseState.DELIMITER -> {
fields.add("")
val value = when(withFieldAsNull) {
ParserNullFieldIndicator.EMPTY_SEPARATORS -> null
ParserNullFieldIndicator.BOTH -> null
else -> ""
}
fields.add(value)
fields.toList()
}
ParseState.QUOTED_FIELD -> null
@@ -183,8 +194,27 @@ internal class ParseStateMachine(
}

private fun flushField() {
fields.add(field.toString())
val value = if (handleFieldAsNull) null else field.toString()

fields.add(value)
field.clear()
handleFieldAsNull = false
}

private fun handleEmptySeparators() {
handleFieldAsNull = when(withFieldAsNull) {
ParserNullFieldIndicator.EMPTY_SEPARATORS -> true
ParserNullFieldIndicator.BOTH -> true
else -> false
}
}

private fun handleEmptyQuotes() {
handleFieldAsNull = when(withFieldAsNull) {
ParserNullFieldIndicator.EMPTY_QUOTES -> field.isEmpty()
ParserNullFieldIndicator.BOTH -> field.isEmpty()
else -> false
}
}
}

@@ -197,3 +227,10 @@ private enum class ParseState {
QUOTE_END,
QUOTED_FIELD
}

internal enum class ParserNullFieldIndicator {
EMPTY_SEPARATORS,
EMPTY_QUOTES,
BOTH,
NEITHER
}
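
The state machine change relies on a pending flag: `handleEmptySeparators()` or `handleEmptyQuotes()` sets `handleFieldAsNull`, and `flushField()` consumes and resets it. The sketch below isolates that mechanism in a hypothetical `FieldCollector` class so the reset step is easy to see; it is a reduced illustration and leaves out the parse states entirely.

```kotlin
// Hypothetical, reduced version of the field buffer plus "pending null" flag.
class FieldCollector {
    private val fields = ArrayList<String?>()
    private val field = StringBuilder()
    private var pendingNull = false

    fun append(ch: Char) {
        field.append(ch)
    }

    // Called when an empty field is recognised that the configured mode maps to null.
    fun markNull(asNull: Boolean) {
        pendingNull = asNull
    }

    fun flush() {
        fields.add(if (pendingNull) null else field.toString())
        field.clear()
        pendingNull = false // reset so the flag never leaks into the next field
    }

    fun result(): List<String?> = fields.toList()
}

fun main() {
    val collector = FieldCollector()
    collector.append('a')
    collector.flush()        // regular field
    collector.markNull(true)
    collector.flush()        // field flagged as null
    collector.flush()        // flag was reset, so this one is an empty string
    println(collector.result())
}
```

Resetting the flag inside `flush()` is the important design choice: without it, one null field would turn every following field into null as well.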