Added CSVRow to_json() method (#48)
* Experimental CSVRow to_json() method

* Added to_json_array() method for CSVRow

* Update test_csv_row_json.cpp

* Added more tests

* Added assert_no_char_overlap()

* Updated documentation

* Bumped version number to 1.2.2 🔖

* Update README.md
vincentlaucsb authored Sep 3, 2019
1 parent d3f73b8 commit 7c7830f
Showing 12 changed files with 857 additions and 29 deletions.
33 changes: 31 additions & 2 deletions README.md
@@ -15,6 +15,7 @@
* [Specifying the CSV Format](#specifying-the-csv-format)
* [Trimming Whitespace](#trimming-whitespace)
* [Setting Column Names](#setting-column-names)
* [Converting to JSON](#converting-to-json)
* [Parsing an In-Memory String](#parsing-an-in-memory-string)
* [Writing CSV Files](#writing-csv-files)
* [Contributing](#contributing)
@@ -160,7 +161,7 @@ CSVReader reader("very_big_file.csv");
for (auto& row: reader) {
if (row["timestamp"].is_int()) {
// Can use get<>() with any integer type, but negative
// numbers cannot be converted to unsigned types
row["timestamp"].get<int>();
@@ -170,6 +171,35 @@ for (auto& row: reader) {
```

### Converting to JSON
You can serialize individual rows as JSON objects, where the keys are column names, or as
JSON arrays (which don't contain column names). The resulting JSON uses minimal whitespace,
escapes strings properly, and leaves numeric values unquoted. Assembling these JSON fragments
into a larger JSON document is left to the user.

```cpp
#include <sstream>
#include "csv.hpp"

using namespace csv;

...

CSVReader reader("very_big_file.csv");
std::stringstream my_json;

for (auto& row: reader) {
    my_json << row.to_json() << std::endl;
    my_json << row.to_json_array() << std::endl;

    // You can pass in a vector of column names to
    // slice or rearrange the outputted JSON
    my_json << row.to_json({ "A", "B", "C" }) << std::endl;
    my_json << row.to_json_array({ "C", "B", "A" }) << std::endl;
}

```
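
Since assembling the fragments is left to the user, here is one possible approach, offered as a minimal sketch rather than part of the library. It assumes each fragment is already a complete JSON value (for example, the string returned by `row.to_json()`) and wraps the fragments in a JSON array. The helper name `join_json_fragments` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of csv-parser): wraps already-serialized
// JSON values in a single JSON array. Each fragment is assumed to be valid
// JSON on its own, e.g. the string returned by row.to_json().
std::string join_json_fragments(const std::vector<std::string>& fragments) {
    std::string out = "[";

    for (size_t i = 0; i < fragments.size(); i++) {
        // Separate values with commas, but do not emit a trailing comma
        if (i > 0) {
            out += ",";
        }

        out += fragments[i];
    }

    out += "]";
    return out;
}
```

With the reader loop above, you could push each `row.to_json()` result into a `std::vector<std::string>` and pass that vector to the helper once the loop finishes. For very large files, streaming the brackets and commas directly to the output would avoid holding every fragment in memory.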
### Specifying the CSV Format
Although the CSV parser has a decent guessing mechanism, in some cases it is preferable to specify the exact parameters of a file.
@@ -214,7 +244,6 @@ CSVFormat format;
format.set_column_names(col_names);
```

### Parsing an In-Memory String

```cpp
2 changes: 2 additions & 0 deletions docs/source/Doxy.md
@@ -30,6 +30,8 @@ For quick examples, go to this project's [GitHub page](https://github.com/vincen
* csv::CSVRow::iterator
* csv::CSVRow::begin()
* csv::CSVRow::end()
* csv::CSVRow::to_json()
* csv::CSVRow::to_json_array()
* csv::CSVField
* csv::CSVField::get(): \copybrief csv::CSVField::get()
* csv::CSVField::operator==()
2 changes: 1 addition & 1 deletion include/csv.hpp
@@ -1,5 +1,5 @@
/*
CSV for C++, version 1.2.1
CSV for C++, version 1.2.2
https://github.com/vincentlaucsb/csv-parser
MIT License
1 change: 1 addition & 0 deletions include/internal/CMakeLists.txt
@@ -11,6 +11,7 @@ target_sources(csv
csv_reader_iterator.cpp
csv_row.hpp
csv_row.cpp
csv_row_json.cpp
csv_stat.cpp
csv_stat.hpp
csv_utility.cpp
50 changes: 50 additions & 0 deletions include/internal/csv_format.cpp
@@ -2,6 +2,9 @@
* Defines an object used to store CSV format settings
*/

#include <algorithm>
#include <set>

#include "csv_format.hpp"

namespace csv {
@@ -31,21 +34,25 @@ namespace csv {

    CSVFormat& CSVFormat::delimiter(char delim) {
        this->possible_delimiters = { delim };
        this->assert_no_char_overlap();
        return *this;
    }

    CSVFormat& CSVFormat::delimiter(const std::vector<char> & delim) {
        this->possible_delimiters = delim;
        this->assert_no_char_overlap();
        return *this;
    }

    CSVFormat& CSVFormat::quote(char quote) {
        this->quote_char = quote;
        this->assert_no_char_overlap();
        return *this;
    }

    CSVFormat& CSVFormat::trim(const std::vector<char> & chars) {
        this->trim_chars = chars;
        this->assert_no_char_overlap();
        return *this;
    }

Expand All @@ -70,4 +77,47 @@ namespace csv {
this->unicode_detect = detect;
return *this;
}

    void CSVFormat::assert_no_char_overlap()
    {
        auto delims = std::set<char>(
            this->possible_delimiters.begin(), this->possible_delimiters.end()),
            trims = std::set<char>(
                this->trim_chars.begin(), this->trim_chars.end());

        // Stores intersection of possible delimiters and trim characters
        std::vector<char> intersection = {};

        // Find which characters overlap, if any
        std::set_intersection(
            delims.begin(), delims.end(),
            trims.begin(), trims.end(),
            std::back_inserter(intersection));

        // Make sure quote character is not contained in possible delimiters
        // or whitespace characters
        if (delims.find(this->quote_char) != delims.end() ||
            trims.find(this->quote_char) != trims.end()) {
            intersection.push_back(this->quote_char);
        }

        if (!intersection.empty()) {
            std::string err_msg = "There should be no overlap between the quote character, "
                "the set of possible delimiters "
                "and the set of whitespace characters. Offending characters: ";

            // Create a pretty error message with the list of overlapping
            // characters
            for (size_t i = 0; i < intersection.size(); i++) {
                err_msg += "'";
                err_msg += intersection[i];
                err_msg += "'";

                if (i + 1 < intersection.size())
                    err_msg += ", ";
            }

            throw std::runtime_error(err_msg + '.');
        }
    }
}
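
To see which characters the overlap check would report for a given configuration, the same logic can be isolated from `CSVFormat`. The following is a minimal standalone sketch, not code from this commit: the free function `find_char_overlap` is a hypothetical name, and it returns the offending characters instead of throwing. Unlike the member function above, it skips the quote character if the delimiter/trim intersection already contains it, so no character is listed twice.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical standalone version of the overlap check (illustration only).
// Returns every character that plays more than one role among delimiters,
// trim characters, and the quote character.
std::vector<char> find_char_overlap(const std::vector<char>& delimiters,
                                    const std::vector<char>& trim_chars,
                                    char quote_char) {
    // std::set keeps elements sorted, which std::set_intersection requires
    std::set<char> delims(delimiters.begin(), delimiters.end());
    std::set<char> trims(trim_chars.begin(), trim_chars.end());

    // Characters that are both a delimiter and a trim character
    std::vector<char> overlap;
    std::set_intersection(
        delims.begin(), delims.end(),
        trims.begin(), trims.end(),
        std::back_inserter(overlap));

    // The quote character must not double as a delimiter or trim character
    bool quote_clashes = delims.count(quote_char) > 0 || trims.count(quote_char) > 0;
    bool already_listed =
        std::find(overlap.begin(), overlap.end(), quote_char) != overlap.end();

    if (quote_clashes && !already_listed) {
        overlap.push_back(quote_char);
    }

    return overlap;
}
```

An empty result corresponds to a configuration the setters would accept; a non-empty result corresponds to the case where `assert_no_char_overlap()` throws `std::runtime_error`.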
15 changes: 13 additions & 2 deletions include/internal/csv_format.hpp
@@ -24,22 +24,30 @@ namespace csv {
/** Settings for parsing a RFC 4180 CSV file */
CSVFormat() = default;

/** Sets the delimiter of the CSV file */
/** Sets the delimiter of the CSV file
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
*/
CSVFormat& delimiter(char delim);

/** Sets a list of potential delimiters
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
* @param[in] delim An array of possible delimiters to try parsing the CSV with
*/
CSVFormat& delimiter(const std::vector<char> & delim);

/** Sets the whitespace characters to be trimmed
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
* @param[in] ws An array of whitespace characters that should be trimmed
*/
CSVFormat& trim(const std::vector<char> & ws);

/** Sets the quote character */
/** Sets the quote character
*
* @throws `std::runtime_error` thrown if trim, quote, or possible delimiting characters overlap
*/
CSVFormat& quote(char quote);

/** Sets the column names.
@@ -89,6 +97,9 @@ namespace csv {
return this->possible_delimiters.size() > 1;
}

/**< Throws an error if the quote character, delimiters, and trim characters overlap */
void assert_no_char_overlap();

/**< Set of possible delimiters */
std::vector<char> possible_delimiters = { ',' };

4 changes: 4 additions & 0 deletions include/internal/csv_row.hpp
@@ -23,6 +23,8 @@ namespace csv {
static const std::string ERROR_FLOAT_TO_INT =
"Attempted to convert a floating point value to an integral type.";
static const std::string ERROR_NEG_TO_UNSIGNED = "Negative numbers cannot be converted to unsigned types.";

std::string json_escape_string(csv::string_view s) noexcept;
}
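
The body of `json_escape_string()` lives in `csv_row_json.cpp`, which is not shown in this diff. As a rough illustration of what JSON string escaping involves, here is a minimal sketch based on the escaping rules of RFC 8259. It is not the library's implementation: the name `json_escape_sketch` is hypothetical, and it takes a `std::string` rather than `csv::string_view` so that it compiles on its own.

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch of JSON string escaping (not the library's code).
// Escapes quotes, backslashes, and control characters below 0x20;
// all other bytes are copied through unchanged.
std::string json_escape_sketch(const std::string& s) {
    std::string out;
    out.reserve(s.size());

    for (unsigned char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters use the \uXXXX form
                char buf[7];
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            }
            else {
                out += static_cast<char>(c);
            }
        }
    }

    return out;
}
```

The sketch returns only the escaped contents; the caller is expected to add the surrounding double quotes when writing a JSON string value.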

/**
@@ -201,6 +203,8 @@ namespace csv {
CSVField operator[](size_t n) const;
CSVField operator[](const std::string&) const;
csv::string_view get_string_view(size_t n) const;
std::string to_json(const std::vector<std::string>& subset = {}) const;
std::string to_json_array(const std::vector<std::string>& subset = {}) const;

/** Convert this CSVRow into a vector of strings.
* **Note**: This is a less efficient method of