Added analyze command provide human-readable information
ivbeg committed Jan 31, 2022
1 parent d166540 commit e91821c
Showing 26 changed files with 15,456 additions and 329,082 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
.json
.xml
.bson
.jsonl
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
6 changes: 5 additions & 1 deletion HISTORY.rst
@@ -3,7 +3,11 @@
History
=======

1.0.10 (2022-01-30)
1.0.12 (2022-01-30)
-------------------
* Added the "analyze" command. It provides human-readable information about data files: CSV, JSON lines, JSON, XML. Detects encoding, delimiters, file type, and fields with objects for JSON and XML files. Doesn't support gzipped, zipped, or other compressed files yet.

1.0.11 (2022-01-30)
-------------------
* Updated setup.py and requirements.txt to require certain versions of libs and Python 3.8

40 changes: 40 additions & 0 deletions README.rst
@@ -314,6 +314,46 @@ Analysis of JSON lines file and verifies each field that it's date field, detect
    $ undatum stats --checkdates examples/ausgovdir.jsonl

Analyze command
---------------

Analyzes data format and provides human-readable information.


.. code-block:: bash

    $ undatum analyze examples/ausgovdir.jsonl

Returned values will include:

* Filename - name of the file
* File type - type of the file, one of: jsonl, xml, csv, json
* Encoding - file encoding
* Delimiter - field delimiter, for CSV files only
* File size - size of the file in bytes
* Objects count - number of objects in the file
* Fields - list of the file's fields
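
As a rough illustration of how several of the values above could be computed for a JSON lines file, here is a minimal sketch using only the Python standard library. It is not undatum's implementation: the function name ``summarize_jsonl`` and the keys of the returned dictionary are hypothetical, the encoding is passed in rather than detected, and every non-blank line is assumed to hold one JSON object.

```python
import json
import os


def summarize_jsonl(path, encoding="utf-8"):
    """Collect a few descriptive values for a JSON lines file.

    Hypothetical helper for illustration only; the encoding is an
    assumption supplied by the caller, not something detected here.
    """
    fields = []   # field names in first-seen order
    seen = set()  # fast membership check for field names
    objects = 0
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            record = json.loads(line)  # assumed to be a JSON object
            objects += 1
            for key in record:
                if key not in seen:
                    seen.add(key)
                    fields.append(key)
    return {
        "filename": os.path.basename(path),
        "filetype": "jsonl",
        "encoding": encoding,
        "filesize": os.path.getsize(path),  # size in bytes
        "objects": objects,
        "fields": fields,
    }
```

Only top-level keys are collected here; nested objects would need a recursive walk.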

Also for XML and JSON files:

* Multiple tables exist - True or False, whether there are multiple data tables in XML files
* Full data key - full path to the data key (the field holding the list of objects) in an XML file
* Short data key - final name of the field holding the objects in an XML file

For JSON files:

* JSON type - one of "objects list", "objects list with key" and "single object"

For XML, JSON lines and JSON files:

* Is flat table? - True if the table is flat and can be converted to CSV, False if it is not convertible
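
One plausible reading of "flat" is that no record contains a nested object or list, so every value fits in a single CSV cell. The sketch below checks exactly that. The function name ``is_flat`` is hypothetical and the criterion is an assumption; undatum's actual check may differ.

```python
def is_flat(records):
    """Return True if no record holds a dict or list value.

    Assumed criterion for illustration: records are dicts, and any
    nested dict or list makes the table non-flat.
    """
    return all(
        not isinstance(value, (dict, list))
        for record in records
        for value in record.values()
    )
```

For example, ``is_flat([{"id": 1, "name": "a"}])`` is true, while ``is_flat([{"id": 1, "tags": ["x"]}])`` is false because ``tags`` holds a list.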

For CSV and JSON lines:

* Number of lines - number of lines in file



Split command
-------------
3,025 changes: 3,025 additions & 0 deletions examples/2201300013917000096.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions examples/6cf9cead-4755-4ae4-85b9-c0742ccbafef.xml

Large diffs are not rendered by default.

22,807 changes: 0 additions & 22,807 deletions examples/ausgovdir.jsonl

This file was deleted.

281,434 changes: 0 additions & 281,434 deletions examples/ausgovdir.xml

This file was deleted.

3,424 changes: 0 additions & 3,424 deletions examples/budgetgovru-fbpgu.jsonl

This file was deleted.
