Feature/Change the default csv dialect #45

Merged: 4 commits, Feb 14, 2024
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -9,7 +9,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- Add options to JSONFile implementation (sort_keys, skip_keys, ensure_ascii, separators, strict).
- Add options to JSONFile implementation (`sort_keys`, `skip_keys`, `ensure_ascii`, `separators`, `strict`).
- Set the default CSV dialect to `'excel'` when writing (this reflects the default value from the Python library).
- Set the default CSV dialect to `'auto'` when reading (the dialect will be sniffed from the first few rows).
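
For context on the new write default, the sketch below compares what Python's standard `csv` module emits for the `'excel'` and `'unix'` dialects. It uses only the standard library and does not go through `CSVFile`; the `render` helper and the sample rows are hypothetical, introduced only for illustration.

```python
import csv
import io

# Hypothetical sample data, one field contains the delimiter.
rows = [["name", "note"], ["Ada", "likes, commas"]]


def render(dialect: str) -> str:
    """Serialize the sample rows with the given stdlib dialect name."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, dialect=dialect).writerows(rows)
    return buffer.getvalue()


# 'excel' quotes only when needed and ends lines with \r\n.
excel_text = render("excel")
# 'unix' quotes every field and ends lines with \n.
unix_text = render("unix")

print(repr(excel_text))
print(repr(unix_text))
```

Files written with the previous `'unix'` default therefore differ from files written with the new `'excel'` default in both quoting and line endings, which may matter to downstream consumers that compare output byte for byte.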

### Fixed

35 changes: 18 additions & 17 deletions docs/toolbox.files.csv_file.md
@@ -50,22 +50,23 @@ with file:
---------------
- **CSV_ENCODING**
- **CSV_DIALECT**
- **CSV_AUTO**
- **CSV_SAMPLE_SIZE**
- **CSV_READER_PARAMS**
- **CSV_WRITER_PARAMS**
- **FILE_OPEN_PARAMS**

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L475"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L479"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>function</kbd> `read_csv_file`

```python
read_csv_file(
filename: 'str',
encoding: 'str' = 'utf-8',
dialect: 'str' = 'unix',
dialect: 'str' = 'auto',
iterator: 'bool' = False,
**kwargs
) → Iterable[dict | list]
@@ -81,7 +82,7 @@ The returned value can be either a list (default) or an iterator (when the itera

- <b>`filename`</b> (str): The path to the file to read.
- <b>`encoding`</b> (str, optional): The file encoding. Defaults to CSV_ENCODING.
- <b>`dialect`</b> (str, optional): The CSV dialect to use. If 'auto' is given, the reader will try detecting the CSV dialect by reading a sample at the head of the file. Defaults to CSV_DIALECT.
- <b>`dialect`</b> (str, optional): The CSV dialect to use. If 'auto' is given, the reader will try detecting the CSV dialect by reading a sample at the head of the file. Defaults to CSV_AUTO.
- <b>`iterator`</b> (bool, optional): When True, the function will return an iterator instead of a list. Defaults to False.
- <b>`delimiter`</b> (str, optional): A one-character string used to separate fields. Defaults to ','.
- <b>`doublequote`</b> (bool, optional): Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. Defaults to True.
@@ -123,7 +124,7 @@ for row in read_csv_file('path/to/file', iterator=True):

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L547"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L551"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>function</kbd> `write_csv_file`

@@ -132,7 +133,7 @@ write_csv_file(
filename: 'str',
data: 'Iterable[dict | list]',
encoding: 'str' = 'utf-8',
dialect: 'str' = 'unix',
dialect: 'str' = 'excel',
**kwargs
) → int
```
@@ -189,7 +190,7 @@ write_csv_file('path/to/file', csv_data, encoding='UTF-8', dialect='excel')

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L620"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L624"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>function</kbd> `read_zip_csv`

@@ -199,7 +200,7 @@ read_zip_csv(
filename: 'str' = None,
encoding: 'str' = 'utf-8',
decoding_errors: 'str' = 'ignore',
dialect: 'str' = 'unix',
dialect: 'str' = 'auto',
iterator: 'bool' = False,
**kwargs
) → Iterable[dict | list]
@@ -217,7 +218,7 @@ The returned value can be either a list (default) or an iterator (when the itera
- <b>`filename`</b> (str, optional): The name of the file to extract from the zip If omitted, the first file having a '.csv' extension will be selected. Defaults to None.
- <b>`encoding`</b> (str, optional): The file encoding. Defaults to CSV_ENCODING.
- <b>`decoding_errors`</b> (str, optional): Controls how decoding errors are handled. If 'strict', a UnicodeError exception is raised. Other possible values are 'ignore', 'replace', and any other name registered via codecs.register_error(). See Error Handlers for details. Defaults to "ignore".
- <b>`dialect`</b> (str, optional): The CSV dialect to use. If 'auto' is given, the reader will try detecting the CSV dialect by reading a sample at the head of the file. Defaults to CSV_DIALECT.
- <b>`dialect`</b> (str, optional): The CSV dialect to use. If 'auto' is given, the reader will try detecting the CSV dialect by reading a sample at the head of the file. Defaults to CSV_AUTO.
- <b>`iterator`</b> (bool, optional): When True, the function will return an iterator instead of a list. Defaults to False.
- <b>`delimiter`</b> (str, optional): A one-character string used to separate fields. Defaults to ','.
- <b>`doublequote`</b> (bool, optional): Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. Defaults to True.
@@ -265,7 +266,7 @@ with open('path/to/file.zip', 'rb') as file:

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L93"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L97"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

## <kbd>class</kbd> `CSVFile`
Offers a simple API for reading and writing CSV files.
@@ -310,7 +311,7 @@ with file(create=True):
csv = file.read_file()
```

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L134"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L138"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `__init__`

@@ -322,7 +323,7 @@ __init__(
read: 'bool' = False,
write: 'bool' = False,
encoding: 'str' = 'utf-8',
dialect: 'str' = 'unix',
dialect: 'str' = 'auto',
**kwargs
)
```
@@ -339,7 +340,7 @@ Creates a file manager for CSV files.
- <b>`read`</b> (bool, optional): Expect to also read the file. Defaults to False.
- <b>`write`</b> (bool, optional): Expect to also write to the file. Defaults to False.
- <b>`encoding`</b> (str, optional): The file encoding. Defaults to CSV_ENCODING.
- <b>`dialect`</b> (str, optional): The CSV dialect to use. If 'auto' is given, the reader will try detecting the CSV dialect by reading a sample at the head of the file. Defaults to CSV_DIALECT.
- <b>`dialect`</b> (str, optional): The CSV dialect to use. If 'auto' is given, the reader will try detecting the CSV dialect by reading a sample at the head of the file. Defaults to CSV_AUTO for reading or to CSV_DIALECT for writing.
- <b>`delimiter`</b> (str, optional): A one-character string used to separate fields. Defaults to ",".
- <b>`doublequote`</b> (bool, optional): Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. Defaults to True.
- <b>`escapechar`</b> (str, optional): A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character. Defaults to None, which disables escaping.
@@ -566,7 +567,7 @@ size = file.size

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L263"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L267"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `close`

@@ -604,7 +605,7 @@ file.close()

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L363"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L367"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `read`

@@ -648,7 +649,7 @@ csv_data = [row for row in file]

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L294"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L298"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `read_file`

@@ -699,7 +700,7 @@ for row in file.read_file(iterator=True):

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L414"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L418"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `write`

@@ -746,7 +747,7 @@ with file(create=True):

---

<a href="../src/cerbernetix/toolbox/files/csv_file.py#L332"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>
<a href="../src/cerbernetix/toolbox/files/csv_file.py#L336"><img align="right" style="float:right;" src="https://img.shields.io/badge/-source-cccccc?style=flat-square"></a>

### <kbd>method</kbd> `write_file`

1 change: 1 addition & 0 deletions docs/toolbox.files.md
@@ -73,6 +73,7 @@ csv_data = file.read_zip_csv(data)

**Global Variables**
---------------
- **CSV_AUTO**
- **CSV_DIALECT**
- **CSV_ENCODING**
- **JSON_ENCODING**
1 change: 1 addition & 0 deletions src/cerbernetix/toolbox/files/__init__.py
@@ -66,6 +66,7 @@
"""

from cerbernetix.toolbox.files.csv_file import (
CSV_AUTO,
CSV_DIALECT,
CSV_ENCODING,
CSVFile,
24 changes: 14 additions & 10 deletions src/cerbernetix/toolbox/files/csv_file.py
@@ -39,6 +39,7 @@
first = file.read()
```
"""

from __future__ import annotations

import csv
@@ -52,7 +53,10 @@
CSV_ENCODING = "utf-8"

# The default CSV dialect
CSV_DIALECT = "unix"
CSV_DIALECT = "excel"

# The value for auto-detecting the CSV dialect
CSV_AUTO = "auto"

# The amount of bytes to read for auto-detecting the CSV dialect
CSV_SAMPLE_SIZE = 1024
@@ -139,7 +143,7 @@ def __init__(
read: bool = False,
write: bool = False,
encoding: str = CSV_ENCODING,
dialect: str = CSV_DIALECT,
dialect: str = CSV_AUTO,
**kwargs,
):
r"""Creates a file manager for CSV files.
@@ -157,7 +161,7 @@ def __init__(
encoding (str, optional): The file encoding. Defaults to CSV_ENCODING.
dialect (str, optional): The CSV dialect to use. If 'auto' is given, the reader will
try detecting the CSV dialect by reading a sample at the head of the file.
Defaults to CSV_DIALECT.
Defaults to CSV_AUTO for reading or to CSV_DIALECT for writing.
delimiter (str, optional): A one-character string used to separate fields.
Defaults to ",".
doublequote (bool, optional): Controls how instances of quotechar appearing inside a
@@ -400,7 +404,7 @@ def read(self) -> dict | list:
reader = csv.DictReader

dialect = self.dialect
if dialect == "auto":
if dialect == CSV_AUTO:
dialect = csv.Sniffer().sniff(self._file.read(CSV_SAMPLE_SIZE))
self._file.seek(0)
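
The sniff-then-rewind pattern in this hunk can be tried in isolation with the standard library alone. The sketch below is not the project's implementation: `read_rows` and `SAMPLE_SIZE` are hypothetical names that mirror `CSV_SAMPLE_SIZE`, and the fallback to `'excel'` when sniffing fails is an assumption added for illustration, not something this PR does.

```python
import csv
import io

# Hypothetical constant mirroring CSV_SAMPLE_SIZE from the module.
SAMPLE_SIZE = 1024


def read_rows(stream) -> list[list[str]]:
    """Detect the dialect from the head of the stream, then parse it all."""
    sample = stream.read(SAMPLE_SIZE)
    # Rewind so the sampled rows are parsed too, not skipped.
    stream.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample)
    except csv.Error:
        # Assumption for this sketch only: fall back to 'excel'
        # when the sample is too ambiguous to sniff.
        dialect = "excel"
    return list(csv.reader(stream, dialect=dialect))


# A semicolon-separated sample, which the 'excel' dialect would not split.
stream = io.StringIO("id;name;city\n1;Ada;London\n2;Bob;Paris\n")
rows = read_rows(stream)
print(rows)
```

`csv.Sniffer().sniff` raises `csv.Error` when it cannot determine a delimiter, so callers that switch to the new `'auto'` read default on short or single-column files may want to pass an explicit dialect instead.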

@@ -461,7 +465,7 @@ def write(self, data: dict | list) -> int:
writer = csv.writer

dialect = self.dialect
if dialect == "auto":
if dialect == CSV_AUTO:
dialect = CSV_DIALECT

self._writer = writer(self._file, dialect=dialect, **kwargs)
@@ -475,7 +479,7 @@ def write(self, data: dict | list) -> int:
def read_csv_file(
filename: str,
encoding: str = CSV_ENCODING,
dialect: str = CSV_DIALECT,
dialect: str = CSV_AUTO,
iterator: bool = False,
**kwargs,
) -> Iterable[dict | list]:
@@ -489,7 +493,7 @@ def read_csv_file(
encoding (str, optional): The file encoding. Defaults to CSV_ENCODING.
dialect (str, optional): The CSV dialect to use. If 'auto' is given, the reader will
try detecting the CSV dialect by reading a sample at the head of the file.
Defaults to CSV_DIALECT.
Defaults to CSV_AUTO.
iterator (bool, optional): When True, the function will return an iterator instead of a
list. Defaults to False.
delimiter (str, optional): A one-character string used to separate fields.
@@ -622,7 +626,7 @@ def read_zip_csv(
filename: str = None,
encoding: str = CSV_ENCODING,
decoding_errors: str = "ignore",
dialect: str = CSV_DIALECT,
dialect: str = CSV_AUTO,
iterator: bool = False,
**kwargs,
) -> Iterable[dict | list]:
@@ -642,7 +646,7 @@ def read_zip_csv(
Defaults to "ignore".
dialect (str, optional): The CSV dialect to use. If 'auto' is given, the reader will
try detecting the CSV dialect by reading a sample at the head of the file.
Defaults to CSV_DIALECT.
Defaults to CSV_AUTO.
iterator (bool, optional): When True, the function will return an iterator instead of a
list. Defaults to False.
delimiter (str, optional): A one-character string used to separate fields.
@@ -704,7 +708,7 @@ def read_zip_csv(
else:
reader_factory = csv.DictReader

if dialect == "auto":
if dialect == CSV_AUTO:
dialect = csv.Sniffer().sniff(text[:CSV_SAMPLE_SIZE])

lines = re.split(r"[\r\n]+", text.strip("\r\n"))