How to use
The following examples are run in the appendix/csv_samples/ directory using the sample data there.
Suppose the key columns are the columns at index 0 and index 2.
- sample_lhs.csv

head1, head2, head3, head4, head5
key1-2, value1-2, key2-2, value2-2, 20201224T035908
key1-3, value1-3, key2-3, value2-3, 20201224T180527
key1-4, value1-4, key2-4, value2-4, 20201225T104851
key1-5, value1-5, key2-5, value2-5, 20201225T142142

- sample_rhs.csv

head1, head2, head3, head4, head5
key1-1, value1-1, key2-1, value2-1, 20210108T142358
key1-2, value1-3, key2-2, value2-z, 20210108T174216
key1-4, value1-4, key2-4, value2-4, 20210109T090245
key1-5, value1-v, key2-5, value2-5, 20210109T111231
- The minimum required arguments are:
  - Two files to compare
  - Key column index (-k option (--matching-keys))
    - Index is 0 based
    - Multiple columns can be specified, separated by commas
    - In this example, index 0 and index 2 are specified
    - If the key consists of only the 0th column, you don't even need to specify the -k option
With only these arguments, the report shows just the number of lines in each category and their row numbers.
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2
============ Report ============
● Count & Row number
same lines : 0 # (1)
left side only (<): 1 :-- Row Numbers -->: [3] # (2)
right side only (>): 1 :-- Row Numbers -->: [2] # (3)
with differences (!): 3 :-- Row Number Pairs -->: [(2, 3), (4, 4), (5, 5)] # (4)
- Report description
No. | heading | description |
---|---|---|
(1) | same lines | Number of lines that exist in both files and have the same content |
(2) | left side only | Number of lines that exist only in the left-hand file, and their line numbers |
(3) | right side only | Number of lines that exist only in the right-hand file, and their line numbers |
(4) | with differences | Number of lines that exist in both files but have different contents, and their line number pairs |
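The four categories in the report can be reproduced with a small keyed comparison. The following is a minimal sketch for illustration only, not csvdiff's actual implementation: `load_rows` and `classify` are hypothetical helper names, and the sample data is inlined so the snippet runs on its own.

```python
import csv
import io

LHS_TEXT = """\
head1, head2, head3, head4, head5
key1-2, value1-2, key2-2, value2-2, 20201224T035908
key1-3, value1-3, key2-3, value2-3, 20201224T180527
key1-4, value1-4, key2-4, value2-4, 20201225T104851
key1-5, value1-5, key2-5, value2-5, 20201225T142142
"""

RHS_TEXT = """\
head1, head2, head3, head4, head5
key1-1, value1-1, key2-1, value2-1, 20210108T142358
key1-2, value1-3, key2-2, value2-z, 20210108T174216
key1-4, value1-4, key2-4, value2-4, 20210109T090245
key1-5, value1-v, key2-5, value2-5, 20210109T111231
"""


def load_rows(text, key_indices):
    """Map each key tuple to (1-based line number, row), skipping the header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # the header is line 1, so data starts at line 2
    return {
        tuple(row[i] for i in key_indices): (line_no, row)
        for line_no, row in enumerate(reader, start=2)
    }


def classify(lhs, rhs):
    """Split rows into the four report categories (hypothetical helper)."""
    result = {"same": [], "left_only": [], "right_only": [], "different": []}
    for key in sorted(lhs.keys() | rhs.keys()):
        if key not in rhs:
            result["left_only"].append(lhs[key][0])
        elif key not in lhs:
            result["right_only"].append(rhs[key][0])
        else:
            (l_no, l_row), (r_no, r_row) = lhs[key], rhs[key]
            category = "same" if l_row == r_row else "different"
            result[category].append((l_no, r_no))
    return result


report = classify(load_rows(LHS_TEXT, (0, 2)), load_rows(RHS_TEXT, (0, 2)))
print(report)
```

For the sample files with key indices 0 and 2, the sketch yields the same row numbers as the report above: row 3 only on the left, row 2 only on the right, and the pairs (2, 3), (4, 4), (5, 5) with differences.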
⚠️ Caution
If the key is a number without zero padding, you need to specify the number of digits after a colon (:) following the column index.
For example, if the column at index 0 is a number with up to 6 digits, specify as follows.
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0:6,2
                                                                  ^^
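One plausible reason a digit count is needed: when keys are compared as strings, unpadded numbers do not sort in numeric order. The sketch below illustrates that general effect; it is an assumption about the motivation, not a description of csvdiff's internals, and `pad_key` is a hypothetical helper.

```python
def pad_key(value: str, width: int) -> str:
    """Left-pad a numeric key with zeros so string order matches numeric order."""
    return value.zfill(width)


keys = ["9", "10", "100"]

# Plain string comparison puts "10" and "100" before "9".
plain_order = sorted(keys)

# Padding every key to 6 digits restores the numeric order.
padded_order = sorted(keys, key=lambda k: pad_key(k, 6))

print(plain_order)   # ['10', '100', '9']
print(padded_order)  # ['9', '10', '100']
```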
To view the contents of the lines that differ, use the -d option (--show-difference-only).
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -d
============ Report ============
● Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv sample_rhs.csv Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3, 4]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851'] ! 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245'] @ [4]
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1, 4]
- Differences are indicated by the following DIFF-MARKs
  - ! : There is a difference
  - < : Exists only on the left side
  - > : Exists only on the right side
- The number displayed before each CSV line data is the line number of the actual file
  - line number is 1 based
- For rows with differences, the column indices with differences will be displayed after @
  - column index is 0 based
If you also want to see the number of differences, specify the -c option (--show-count).
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -dc
============ Report ============
● Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv sample_rhs.csv Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3, 4]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851'] ! 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245'] @ [4]
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1, 4]
● Count & Row number
same lines : 0
left side only (<): 1 :-- Row Numbers -->: [3]
right side only (>): 1 :-- Row Numbers -->: [2]
with differences (!): 3 :-- Row Number Pairs -->: [(2, 3), (4, 4), (5, 5)]
To exclude columns from the comparison, use the -i option (--ignore-columns).
You can specify multiple columns separated by commas.
In this example, let's ignore the column at index 4.
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -dc -i 4
============ Report ============
● Differences
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv sample_rhs.csv Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1]
● Count & Row number
same lines : 1
left side only (<): 1 :-- Row Numbers -->: [3]
right side only (>): 1 :-- Row Numbers -->: [2]
with differences (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]
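The effect of ignoring a column can be sketched as skipping that index during the row comparison. This is an illustrative assumption rather than csvdiff's own code; `differing_columns` and its `ignore` parameter are hypothetical.

```python
def differing_columns(left_row, right_row, ignore=()):
    """Return 0-based indices that differ, skipping any index listed in `ignore`."""
    ignored = set(ignore)
    return [
        index
        for index, (left, right) in enumerate(zip(left_row, right_row))
        if index not in ignored and left != right
    ]


# Line 4 of both sample files: only the timestamp at index 4 differs.
left = ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']
right = ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']

without_ignore = differing_columns(left, right)
with_ignore = differing_columns(left, right, ignore=[4])

print(without_ignore)  # [4]
print(with_ignore)     # []
```

With index 4 ignored, the pair (4, 4) no longer has any differing column, which is consistent with it moving from "with differences" to "same lines" in the report above.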
To show all lines, including lines with no differences, use the -a option (--show-all-lines).
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -ac -i 4
============ Report ============
● All
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv sample_rhs.csv Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851'] 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1]
● Count & Row number
same lines : 1
left side only (<): 1 :-- Row Numbers -->: [3]
right side only (>): 1 :-- Row Numbers -->: [2]
with differences (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]
Lines with no differences ('same lines') do not have a DIFF-MARK.
⚠️ Caution
The -a option and the -d option cannot be specified at the same time.
To display the report vertically, use the -v option (--vertical-style).
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -ac -i 4 -v
============ Report ============
● All
--------------------------------------------------------------------------------
L sample_lhs.csv
R sample_rhs.csv
--------------------------------------------------------------------------------
> R 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
! @ [1, 3]
L 2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908']
R 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216']
< L 3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527']
=
L 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851']
R 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']
! @ [1]
L 5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142']
R 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231']
● Count & Row number
same lines : 1
left side only (<): 1 :-- Row Numbers -->: [3]
right side only (>): 1 :-- Row Numbers -->: [2]
with differences (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]
- Differences are indicated by the following DIFF-MARKs
  - = : No difference
  - ! : There is a difference
  - < : Exists only on the left side
  - > : Exists only on the right side
- Unlike the horizontal report, the DIFF-MARK (=) is also displayed on the lines where there is no difference
- The L mark represents the first specified file, and the R mark represents the second specified file
- For rows with differences, the column indices with differences are displayed after @
Use the -u option (--unique-key) if you want to detect errors in key columns that should be unique but are not.
Without this option, the comparison is performed as is, even if keys are duplicated.
If this option is specified, processing ends when duplicates are detected.
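A uniqueness check of this kind can be sketched by counting how often each key tuple occurs. This is a generic illustration, not csvdiff's implementation: `find_duplicate_keys` is a hypothetical helper, and the third row below is made-up data added only to produce a duplicate.

```python
from collections import Counter


def find_duplicate_keys(rows, key_indices):
    """Return the key tuples that appear in more than one row."""
    counts = Counter(tuple(row[i] for i in key_indices) for row in rows)
    return [key for key, count in counts.items() if count > 1]


rows = [
    ["key1-2", "value1-2", "key2-2"],
    ["key1-3", "value1-3", "key2-3"],
    ["key1-2", "value1-x", "key2-2"],  # made-up row that repeats the first key
]

duplicates = find_duplicate_keys(rows, (0, 2))
print(duplicates)  # [('key1-2', 'key2-2')]
```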
If the result is not displayed as expected, you can check how the arguments were interpreted with the -x option (--show-context-from-arguments).
$ ../../src/csvdiff3/csvdiff.py sample_lhs.csv sample_rhs.csv -k 0,2 -ac -i 4 -x
============ Report ============
● Context
File Path on the Left-Hand Side: /path/to/sample_lhs.csv
File Path on the Right-Hand Side : /path/to/sample_rhs.csv
Matching Key Indices: [MatchingKeyInfo(0, '<not specified>'), MatchingKeyInfo(2, '<not specified>')]
Matching Key Is Unique?: False
Column Indices to Ignore: [4]
with Header?: True
Report Style: Two facing (Horizontal)
Show Count?: True
Show Difference Only?: False
Show All?: True
Show Context?: True
CSV Sniffing Size: 4096
--- csv analysis conditions ---
Forces Individual Specified Conditions?: False
column_separator_for_lhs: ,
column_separator_for_rhs: ,
line_separator_for_lhs: 0d0a
line_separator_for_rhs: 0d0a
quote_char_for_lhs: "
quote_char_for_rhs: "
skips_space_after_column_separator_for_lhs: True
skips_space_after_column_separator_for_rhs: True
● All
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sample_lhs.csv sample_rhs.csv Column indices with difference
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2 ['key1-1', 'value1-1', 'key2-1', 'value2-1', '20210108T142358']
2 ['key1-2', 'value1-2', 'key2-2', 'value2-2', '20201224T035908'] ! 3 ['key1-2', 'value1-3', 'key2-2', 'value2-z', '20210108T174216'] @ [1, 3]
3 ['key1-3', 'value1-3', 'key2-3', 'value2-3', '20201224T180527'] <
4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20201225T104851'] 4 ['key1-4', 'value1-4', 'key2-4', 'value2-4', '20210109T090245']
5 ['key1-5', 'value1-5', 'key2-5', 'value2-5', '20201225T142142'] ! 5 ['key1-5', 'value1-v', 'key2-5', 'value2-5', '20210109T111231'] @ [1]
● Count & Row number
same lines : 1
left side only (<): 1 :-- Row Numbers -->: [3]
right side only (>): 1 :-- Row Numbers -->: [2]
with differences (!): 2 :-- Row Number Pairs -->: [(2, 3), (5, 5)]