- namespace
Pyther.Parser.CSV;
- easy to use
- extreme fast (take a look at the end of this document)
- lightweight code
- can handle csv files of any size with minimal memory footprint
- can transform rows into associative records, dynamic objects or objects of any class
- lots of optional settings
One of many ways to read CSV files:
using Pyther.Parser.CSV;
var csv = new CSVReader();
foreach (var record in csv.ReadRecordFromPath(@"C:\orders.csv"))
{
Console.WriteLine(record["customer-lastname"].ToString());
}
One of many ways to write CSV files:
using Pyther.Parser.CSV;
var csv = new CSVWriter(@"C:\heroes.csv");
csv.Headers.Add("Name", "FirstName", "LastName", "Height", "Remarks");
csv.Write("Parker, Peter", "Peter", "Parker", 175.3, "Arguments example");
Lets take a csv like this as an example:
We will discover 4 different ways to parse this file.
- returns a
System.Collections.Generic.List
ofobject
per entry - access by index
var csv = new CSVReader();
foreach (List<object> row in csv.ReadRowFromPath(@"C:\orders.csv"))
{
string remark = row[5].ToString() ?? "";
string lastname = row[2].ToString() ?? "";
}
- returns a
Pyther.Parser.CSV.Record
per entry - access by index or column header name
var csv = new CSVReader();
foreach (Record row in csv.ReadRecordFromPath(@"C:\orders.csv"))
{
string remark = row[5].ToString() ?? "";
string lastname = row["customer-lastname"].ToString() ?? "";
}
- if you only need to access the record by header name or index, you can use one of these values as the second paramater:
RecordFlags.Indexed
... allow access by index onlyRecordFlags.Associative
... allow access by name onlyRecordFlags.Both
... allow access by index and name (default)
- returns a
dynamic
object - HeaderTransformMethod defines how the column name should be transformed
- TransformMethods.KebabCaseToTitleCase:
customer-lastname
=>CustomerName
- TransformMethods.KebabCaseToTitleCase:
- access by object property (named from colum header)
var csv = new CSVReader(new Settings()
{
HeaderTransformMethod = TransformMethods.Auto
});
foreach (dynamic obj in csv.ReadDynamicFromPath(@"C:\orders.csv"))
{
string remark = obj.Remark.ToString() ?? "";
string lastname = obj.CustomerLastname.ToString() ?? "";
}
- Hint: This is the slowest way
lets say we have an object like
public class Order
{
public string? OrderId { get; set; }
public string? CustomerFirstname { get; set; }
public string? CustomerLastname { get; set; }
public string? CustomerPhone { get; set; }
public DateTime DateOfPurchase { get; set; }
public string? Remark { get; set; }
}
we can read each line transformed to this object:
var csv = new CSVReader(new Settings()
{
HeaderTransformMethod = TransformMethods.Auto
});
foreach (var obj in csv.ReadObjectFromPath<Order>(@"C:\orders.csv"))
{
string remark = obj.Remark;
string lastname = obj.CustomerLastname;
}
- You can also populate an existing object by using it as second argument. This way you can also recycle an object to improve performance.
Order myOrder = new Order();
foreach (var _ in csv.ReadObjectFromPath(@"C:\orders.csv", myOrder))
{
Console.WriteLine($"{csv.RowId,3} | {myOrder.OrderId} / {myOrder.DateOfPurchase}");
}
- no matter what way you choose. After initialization you don't have to care about enclosures, delimters, escapes, aso.
var csv = new CSVWriter(@"C:\heroes.csv");
- by default headers are optional, but are required if you want to write (dynamic) objects
- let create the following headers:
Name
,FirstName
,LastName
,Height
andRemarks
:
csv.Headers.Add("Name", "FirstName", "LastName", "Height", "Remarks");
// or
csv.Headers.
Add("Name").
Add("FirstName").
Add("LastName").
Add("Height").
Add("Remarks");
// or
csv.Headers.
Add("Name", "FirstName").
Add("LastName").
Add("Height", "Remarks");
after that you can write the headers to the file. If you skip this, they will be written at the time the first record was written.
csv.WriteHeader();
- all parameters are
object
csv.Write("Parker, Peter", "Peter", "Parker", 175.3, "Arguments example");
var row = new List<object>{"Parker, Peter", "Peter", "Parker", 175.3, "List of objects example" };
csv.Write(row);
or
var row = new List<string>{"Parker, Peter", "Peter", "Parker", "175.3", "List of string example" };
csv.Write(row);
- Hint: on all cases, the ordering doesn't matter
a) from associative records
var rec = new Record();
rec["Name"] = "Parker, Peter";
rec["FirstName"] = "Peter";
rec["Height"] = 175.3;
rec["LastName"] = "Parker";
rec["Remarks"] = "List of mixed Record example";
csv.Write(rec);
b) from indexed records
var rec = new Record();
rec[0] = "Parker, Peter";
rec[1] = "Peter";
rec[2] = "Parker";
rec[3] = 175.3;
rec[4] = "List of indexed Record example";
csv.Write(rec);
c) or mixed
var rec = new Record();
rec["Name"] = "Parker, Peter";
rec["FirstName"] = "Peter";
rec[2] = "Parker";
rec["Height"] = 175.3;
rec["Remarks"] = "List of mixed Record example";
csv.Write(rec);
Performance Hint: if you know the amount of columns upfront (what is almost always the case), you should give this information as the first constructor argument:
var rec = new Record(5);
...
You can also write dynamic objects
dynamic obj = new ExpandoObject();
obj.Name = "Parker, Peter";
obj.FirstName = "Peter";
obj.LastName = "Parker";
obj.Height = 175.3;
obj.Remarks = "dynamic object example";
csv.WriteDynamic(obj);
Lets say we have the following Person
class
class Person
{
public string? FirstName { get; set; }
public string? LastName { get; set; }
public string? Name => LastName + ", " + FirstName;
public double Height { get; set; }
public string? Remarks { get; set; }
}
with the following example data
var person = new Person()
{
FirstName = "Peter",
LastName = "Parker",
Height = 175.3,
Remarks = "custom object example"
};
we can simply write it the following way
csv.Write(person);
Lets say we have two model class:
class Order
{
public string? Id { get; set; }
public Address? Billing { get; set; }
public Address? Shipping { get; set; }
}
class Address
{
public string? FirstName { get; set; }
public string? LastName { get; set; }
public string? Company { get; set; }
}
with the following example data
Order order = new()
{
Id = "123",
Shipping = new Address()
{
FirstName = "Peter",
LastName = "Parker",
Company = "Marvel"
}
};
and we have the following csv headers:
csv.Headers
.Add("Id")
.Add("Billing.FirstName", "Billing.LastName", "Billing.Company")
.Add("Shipping.FirstName", "Shipping.LastName", "Shipping.Company");
we can simply write it the following way
csv.WriteNested(order);
and we get the follow result (remember order.Billing
was not set)
Id,Billing.FirstName,Billing.LastName,Billing.Company,Shipping.FirstName,Shipping.LastName,Shipping.Company
123,,,,Peter,Parker,Marvel
You can affect the way how the csv file will be parsed using a Pyther.Parser.CSV.Settings
object as a constrructor parameter:
var csv = new CSVReader(new Settings()
{
...
});
- Defines the file encoding.
- type:
Encoding
- default:
Encoding.UTF8
- The buffer size used to read the file.
- type:
int
- default:
1MB
- Defines how the records are separated.
- type:
string
- default:
Environment.NewLine
- Defines how the columns/cells are separated.
- type:
string
- default:
,
- Defines the field enclosure.
- type:
string
- default:
"
- Should values always be enclosed?
- type:
bool
- default:
false
- An optional escape symbol.
- type:
string
- default:
\
- Enable/Disable escaping using the enclosure symbol twice.
- type:
bool
- default:
false
- Does the CSV contain headers?
- type:
bool
- default:
true
- Callback method to transform column header names.
- type:
Func<string, string>?
(string) -> string - default:
null
- Callback method to transform cell content.
- type:
Func<object, int, string?, object>?
(cell data, column index, column name) -> object - default:
null
- Auto trim cell values?
- type:
bool
- default:
true
- Ignore empty lines?
- type:
bool
- default:
true
- Format provider used when writing data (
null
means current culture) - type
IFormatProvider
- default:
CultureInfo.InvariantCulture
- How to handle the error, if there are more column headers than record cells.
- type:
ErrorHandling
- default:
ErrorHandling.TryToSolve
- values:
Ignore
,TryToSolve
orThrow
- How to handle the error, if there are less column headers than record cells.
- type:
ErrorHandling
- default:
ErrorHandling.TryToSolve
- values:
Ignore
,TryToSolve
orThrow
- How to handle the error, if a property doesn't exists in the object.
- type:
ErrorHandling
- default:
ErrorHandling.TryToSolve
- values:
Ignore
,TryToSolve
orThrow
Since the Read....()
method returns an IEnumerable
, you can use all methods they define. This include the Take()
, Skip()
, Where()
aso. :
// skip 2 and get 5 records
foreach (var obj in csv.ReadRecord(@"C:\orders.csv").Skip(2).Take(5))
{
...
}
First create a callback method, that is called for each cell:
- Arguments
- data ... raw cell data
- columnIndex ... the index of the column of the current cell
- columnName ... If headers are given, this argument will hold the column name of the current cell
- Return
- This method has to return final cell data
private static object MyCellTransform(object data, int columnIndex, string? columnName)
{
switch (columnName)
{
case "DateOfPurchase":
return DateTime.Parse((string)data).ToUniversalTime();
default:
return data;
}
}
Set the method in the settings
var csv = new CSVReader(new Settings()
{
...
CellTransformMethod = MyCellTransform
});```
- Test System
- Intel Core i5 13600KF 14x 5.1 GHz
- 32GB DDR4-RAM PC-3600
- NVME M.2 SSD 1TB Kingston KC3000
- Windows 11 Pro 64-Bit
- Test Scenario
- real world data (shop orders)
- 80 col x 100k rows = 8 Mio cells
- ~75 MB
- Average of 5 iterations
ReadRow | ReadRecord | ReadRecord (indexed) | ReadDynamic | ReadObject | |
---|---|---|---|---|---|
Time in Seconds | 0.7918 | 0.8926 | 0.7074 | 3.7348 | 1.598 |
Cells per Seconds | 10.10 Mio | 8.96 Mio | 11.31 Mio | 2..14 Mio | 5.01 Mio |
- Test Scenario
- same as above using amazon orders, with the following options:
- Enclosure = null
- Delimeter = '\t'
- Escape = null
- RecordSeparator = Environment.NewLine
- these options will use the fast path for parsing
- same as above using amazon orders, with the following options:
ReadRow | ReadRecord | ReadRecord (indexed) | ReadDynamic | ReadObject | |
---|---|---|---|---|---|
Time in Seconds | 0.312 | 0.472 | 0.320 | 3.011 | 0.968 |
Cells per Seconds | 25.64 Mio | 16.95 Mio | 25 Mio | 2.66 Mio | 8.26 Mio |
- CSVWriter: Allow Header alias or Transform
- CSVWriter: ErrorToManyColumns, ErrorToFewColumns, ErrorInvalidClassProperty