Releases: vincentlaucsb/csv-parser
CSV Parser 2.3.0: Race Condition Fix
What's Changed
- CSVField: new member function try_parse_decimal() to specify one or more decimal symbols by @wilfz in #226
- Replace the includes of Windows.h with windows.h (#204) by @ludovicdelfau in #235
- Use const CSVFormat& in calculate_score by @rajgoel in #236
- Fix memory issues in CSVFieldList by @vincentlaucsb in #237
Race Condition Notes
Background
The CSV Parser tries to perform as few allocations as possible. Instead of naively storing individual CSV fields as separate std::strings in a std::vector, the parser keeps references to the raw input. It uses lightweight RawCSVField objects to mark where each field starts and ends within that input, along with a flag indicating whether an escaped quote is present. This has the following benefits:
- Avoiding the cost of constructing many std::string instances
- Avoiding the cost of constant std::vector reallocations
- Preserving locality of reference
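The following is a minimal sketch of an offset-based field record. It is not the library's code. RawFieldSketch, index_fields(), and field_text() are hypothetical names, and quoting rules are ignored. A std::vector holds the records here only for brevity. The parser itself stores its RawCSVField records in contiguous blocks.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for RawCSVField: it records where a field lives in the
// raw input instead of owning a copy of the characters.
struct RawFieldSketch {
    std::size_t start;      // offset of the field's first character in the input
    std::size_t length;     // number of characters in the field
    bool has_double_quote;  // true if the field contains an escaped quote ("")
};

// Split one line on `delim`, recording offsets only (no per-field strings).
// Quoted fields are deliberately not handled in this sketch.
std::vector<RawFieldSketch> index_fields(std::string_view line, char delim) {
    std::vector<RawFieldSketch> fields;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= line.size(); ++i) {
        if (i == line.size() || line[i] == delim) {
            std::string_view text = line.substr(start, i - start);
            bool dq = text.find("\"\"") != std::string_view::npos;
            fields.push_back({start, i - start, dq});
            start = i + 1;
        }
    }
    return fields;
}

// Produce a view of the field's characters only when the caller asks for them.
std::string_view field_text(std::string_view line, const RawFieldSketch& f) {
    return line.substr(f.start, f.length);
}
```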
Furthermore, the CSV Parser uses separate threads for parsing CSV and for iterating over the data. As CSV rows are parsed, they are made available to the user, who may use them without interrupting the parsing of new rows.
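The two-thread arrangement can be pictured as a producer/consumer hand-off. The sketch below is a generic illustration, not the parser's actual synchronization. RowQueue and read_rows_concurrently() are hypothetical names. Unlike the parser, the sketch copies each row into a std::string for simplicity.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mutex-protected queue handing rows from a parsing thread to a reader.
class RowQueue {
public:
    void push(std::string row) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            rows_.push(std::move(row));
        }
        cv_.notify_one();
    }
    // Signals that no more rows will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a row is available; returns std::nullopt once closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !rows_.empty() || closed_; });
        if (rows_.empty()) return std::nullopt;
        std::string row = std::move(rows_.front());
        rows_.pop();
        return row;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> rows_;
    bool closed_ = false;
};

// A "parser" thread splits the input into lines while the calling thread
// consumes each line as soon as it becomes available.
std::vector<std::string> read_rows_concurrently(const std::string& input) {
    RowQueue queue;
    std::thread parser([&queue, &input] {
        std::istringstream stream(input);
        std::string line;
        while (std::getline(stream, line)) queue.push(line);
        queue.close();
    });
    std::vector<std::string> consumed;
    while (auto row = queue.pop()) consumed.push_back(*row);
    parser.join();
    return consumed;
}
```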
The Race Condition
The RawCSVField objects mentioned previously were stored in contiguous blocks, and a std::vector of pointers to these blocks was used to keep track of them.
However, as @ludovicdelfau accurately diagnosed, the reading thread could attempt to access a RawCSVField (e.g. by reading a CSVField) at the same moment the parsing thread pushed a new RawCSVField to an at-capacity std::vector. In that case, the push caused the contents of the std::vector to be reallocated, and the reading thread accessed deallocated memory.
This issue was first reported in #217.
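The underlying hazard can be shown without threads. Once a std::vector's size reaches its capacity, the next push_back must move the contents to a new buffer. Any pointer a reader obtained earlier then no longer refers to live storage. This single-threaded sketch demonstrates only the invalidation, not the race itself. push_relocates_storage() is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Returns true when pushing onto a full vector moved its elements to a new
// buffer. A reader still holding the old address would now be reading freed
// memory, which is the failure mode behind #217.
bool push_relocates_storage() {
    std::vector<int> v(4);                           // size() == 4, capacity() >= 4
    while (v.size() < v.capacity()) v.push_back(0);  // fill up to capacity exactly
    const auto old_capacity = v.capacity();
    // Record the address as an integer so no dangling pointer is used afterwards.
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());
    v.push_back(1);                                  // size() > old capacity: must reallocate
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    return v.capacity() > old_capacity && before != after;
}
```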
The Fix
The fix was simple: a std::deque was dropped in to replace the std::vector storing RawCSVField pointers, because std::deque does not relocate its existing elements when new ones are appended. The change even appears to improve the CSV Parser's performance, since the cost of constant reallocations is avoided. The loss of memory locality typical of std::deque was also avoided because, again, the CSV Parser stores pointers to RawCSVField[] blocks rather than the RawCSVField objects themselves.
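Here is a minimal sketch of the deque-of-block-pointers layout. It makes several assumptions: Field stands in for RawCSVField, FieldBlockList is a hypothetical name, and the block size of 170 is arbitrary. The sketch shows that a reference to an existing field keeps its address as more fields are appended. It adds no locking of its own, so it illustrates address stability rather than a complete thread-safe design.

```cpp
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical field record standing in for RawCSVField.
struct Field {
    std::size_t start;
    std::size_t length;
};

// Fields live in fixed-size heap blocks, and a std::deque holds the owning
// block pointers. Appending to the deque never moves existing blocks, and the
// blocks themselves are never resized, so a Field& handed out earlier stays valid.
class FieldBlockList {
public:
    static constexpr std::size_t kBlockSize = 170;  // arbitrary for this sketch

    Field& emplace_back(std::size_t start, std::size_t length) {
        if (blocks_.empty() || used_in_last_ == kBlockSize) {
            blocks_.push_back(std::make_unique<Field[]>(kBlockSize));
            used_in_last_ = 0;
        }
        Field& f = blocks_.back()[used_in_last_++];
        f = Field{start, length};
        ++size_;
        return f;
    }

    Field& operator[](std::size_t i) {
        return blocks_[i / kBlockSize][i % kBlockSize];
    }

    std::size_t size() const { return size_; }

private:
    std::deque<std::unique_ptr<Field[]>> blocks_;
    std::size_t used_in_last_ = 0;
    std::size_t size_ = 0;
};
```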
New Contributors
- @wilfz made their first contribution in #226
- @ludovicdelfau made their first contribution in #235
- @rajgoel made their first contribution in #236
Full Changelog: 2.2.3...2.3.0
CSV Parser 2.2.3
- Fix n_rows() being off-by-one when the CSVReader iterator was used (reported in #173)
  - Note: This was due to a simple counting error where the iterator did not increment the row counter for the first row. All rows were still correctly read.
- Implement ability to handle arbitrary combinations of \r and \n in line endings (#223)
- Fix CSV writers incorrectly converting decimal values between 0 and -1 to positive numbers
CSV Parser 2.2.2
What's Changed
- Allow parsing of numbers that begin with +, fixing #213
- Fix compiler warnings in g++ from using abs and in try_parse_hex() #227
- Fix invalid memory access issue in g++ builds #228
  - Issue was caused when using CSVField methods in conjunction with CSVRow reverse iterators
- CMake options to disable programs building by @BaptisteLemarcis in #148
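The standard std::from_chars integer overload accepts a leading '-' but not a leading '+'. A parser built on it therefore has to strip the plus sign explicitly. The following minimal sketch does that. parse_int_allow_plus() is a hypothetical name, and this is not necessarily how the library implements the #213 fix.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a whole field as a signed integer, allowing an optional leading '+'.
std::optional<long long> parse_int_allow_plus(std::string_view s) {
    if (!s.empty() && s.front() == '+') {
        s.remove_prefix(1);
        if (!s.empty() && s.front() == '-') return std::nullopt;  // reject "+-5"
    }
    long long value = 0;
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), last, value);
    // Reject empty input, overflow, and trailing characters such as "4a".
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return value;
}
```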
New Contributors
- @BaptisteLemarcis made their first contribution in #148
Full Changelog: 2.2.1...2.2.2
CSV Parser 2.2.1
This is a simple CMake change that makes it easier to #include "csv.hpp" in a CMake project that grabs csv-parser using FetchContent_Declare().
What's Changed
New Contributors
Full Changelog: 2.2.0...2.2.1
CSV Parser 2.2.0
- Fixed bug which caused inaccurate serialization of floating point values in CSV Writer as reported by #188
  - The bug affected numbers close to 10^n and was caused by the use of the imprecise std::log() function (see: https://stackoverflow.com/questions/1489830/efficient-way-to-determine-number-of-digits-in-an-integer)
- Fixed issue where strings consisting of numbers and dashes (e.g. phone numbers) were inaccurately identified as integers
- Silenced some compiler warnings
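The linked Stack Overflow thread discusses integer-only ways to count digits. The following minimal sketch shows one of them: repeated division by 10. count_digits() is a hypothetical name, and this is not necessarily the code the library adopted. A floating-point formula such as floor(log10(n)) + 1 can be off by one near powers of ten. For example, 999'999'999'999'999'999 rounds to exactly 1e18 when converted to double.

```cpp
#include <cstdint>

// Count the decimal digits of n using only integer arithmetic.
int count_digits(std::uint64_t n) {
    int digits = 1;
    while (n >= 10) {
        n /= 10;
        ++digits;
    }
    return digits;
}
```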
CSV Parser 2.1.3
- Fix various compatibility issues with g++ and clang
- Added hex value parsing
- Fixed a rare out-of-bounds condition
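Hex parsing can be expressed with the base argument of std::from_chars. The sketch below does that under one assumption: it strips an optional "0x"/"0X" prefix, which may or may not match what the library's try_parse_hex() accepts. parse_hex_field() is a hypothetical name.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse a whole field such as "1A2f" as a hexadecimal integer.
std::optional<long long> parse_hex_field(std::string_view s) {
    if (s.size() > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) s.remove_prefix(2);
    long long value = 0;
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), last, value, 16);
    if (ec != std::errc() || ptr != last) return std::nullopt;  // invalid or trailing characters
    return value;
}
```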
CSV Parser 2.1.2
- Fixed compilation issues with C++11 and 14.
  - CSV Parser should now be C++11 compatible once again with g++ 7.5 or later
- Allowed users to customize decimal place precision when writing CSVs
- Fixed floating point output
  - Arbitrarily large integers stored in doubles can now be output without limits
- Fixed newlines not being escaped by CSVWriter
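A writer must quote any field that contains the delimiter, a double quote, or a line break, and it must double any embedded quotes. This is the common RFC 4180 convention. A writer that leaves '\n' and '\r' out of this check splits one record across several lines. The following minimal sketch is not CSVWriter's code, and escape_field() is a hypothetical name.

```cpp
#include <string>
#include <string_view>

// Quote a field when it contains the delimiter, a quote, or a line break.
std::string escape_field(std::string_view field, char delim = ',') {
    const char specials[] = {delim, '"', '\n', '\r'};
    const bool needs_quotes =
        field.find_first_of(std::string_view(specials, sizeof specials)) != std::string_view::npos;
    if (!needs_quotes) return std::string(field);
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // a literal quote is written as ""
        out += c;
    }
    out += '"';
    return out;
}
```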
CSV Parser 2.1.1
- Fixed CSVStats only processing first 5000 rows thanks to @TobyEalden
- Fixed parsing """fields like this""" thanks to @rpadrela
- Fixed CSVReader move semantics thanks to @artpaul
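A field written as """fields like this""" should read back as "fields like this", with one pair of literal quotes. The following minimal sketch of the unquoting step assumes the field has already been isolated and is well-formed. unquote_field() is a hypothetical name, not the library's API.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Remove the enclosing quotes and collapse each doubled quote ("") to one.
std::string unquote_field(std::string_view raw) {
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"') return std::string(raw);
    std::string_view inner = raw.substr(1, raw.size() - 2);
    std::string out;
    for (std::size_t i = 0; i < inner.size(); ++i) {
        out += inner[i];
        // Skip the second quote of an escaped pair.
        if (inner[i] == '"' && i + 1 < inner.size() && inner[i + 1] == '"') ++i;
    }
    return out;
}
```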
Minor Patch
Better, faster, stronger
New Features
- CSVReader can now parse from memory-mapped files, std::stringstream, and std::ifstream
- DelimWriter now supports writing rows encoded as std::tuple
- DelimWriter automatically converts numbers and other data types stored in vectors, arrays, and tuples
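Writing a std::tuple as a row amounts to visiting each element in order and converting it to text. The following minimal sketch does this with std::apply and a fold expression. It is an illustrative stand-in for what DelimWriter does, not the library's implementation. It converts via operator<< and performs no quoting or escaping. tuple_to_row() is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <tuple>

// Join the elements of a tuple into one delimited row.
template <typename... Ts>
std::string tuple_to_row(const std::tuple<Ts...>& t, char delim = ',') {
    std::ostringstream out;
    const char sep[2] = {delim, '\0'};
    std::apply(
        [&](const auto&... elems) {
            bool first = true;
            // Comma fold: elements are streamed left to right, with a separator
            // written before every element except the first.
            ((out << (first ? "" : sep) << elems, first = false), ...);
        },
        t);
    return out.str();
}
```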
Improvements
- CSVReader is now a no-copy parser when memory-mapped IO is used
  - CSVRow and CSVField now refer to the original memory map
- Significant performance improvements for some files
Bug Fixes
- Fixed potential thread safety issues with internals::CSVFieldList
API Changes
- CSVReader::feed() and CSVReader::end_feed() have been removed. In-memory parsing should be performed through the std::stringstream interface.