
Catch littered status feedback in csv_parser #338

Merged
merged 9 commits into from
Feb 29, 2024

Conversation

swozniewski (Contributor)

I observed a case where the ssh-executor running the squeue command returned slurm_status.stdout containing

Switch to work directory on corresponding scratch directory.
Switch to work directory on corresponding scratch directory.
Switch to work directory on corresponding scratch directory.
5545112||PENDING

where only the last line is the expected output of squeue. The other lines are presumably system messages.
This PR adds a filter that keeps only lines containing "|" before feeding them into the csv_parser.

@giffels giffels requested review from a team, giffels and maxfischer2781 and removed request for a team February 28, 2024 14:08
@giffels giffels added the bug Something isn't working label Feb 28, 2024
maxfischer2781 (Member) left a comment

Please change the loop to a comprehension. For the other comment I don't have a strong opinion.

tardis/adapters/sites/slurm.py (outdated):
cleaned_stdout = []
for row in slurm_status.stdout.splitlines():
    if "|" in row:
        cleaned_stdout.append(row)
for row in csv_parser(
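For illustration, the comprehension form requested above might look like the following sketch (the sample stdout is made up, based on the littered output quoted in the PR description):

```python
# Hypothetical sample of slurm_status.stdout, littered with a system message
slurm_stdout = (
    "Switch to work directory on corresponding scratch directory.\n"
    "5545112||PENDING"
)

# Keep only rows that contain the delimiter before handing them to csv_parser
cleaned_stdout = [row for row in slurm_stdout.splitlines() if "|" in row]
print(cleaned_stdout)  # → ['5545112||PENDING']
```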
Member:

It might be better to directly use csv.DictReader here. This would avoid the useless roundtrip of creating an iterable of lines, joining that to newline separated lines, and then creating an iterable of lines from it again.
Not sure how straightforward that is nor whether I'm overlooking something, though.

I'm fine either way, please follow your preference.
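A minimal sketch of that idea: csv.DictReader accepts any iterable of lines, so the filtered rows could be passed to it directly, skipping the join-and-resplit roundtrip. The field names below are hypothetical, not the Slurm adapter's actual ones:

```python
import csv

# Hypothetical field names; the Slurm adapter's real ones may differ.
fieldnames = ("JobId", "Host", "State")

# Already-filtered lines; DictReader consumes the list directly,
# avoiding the join-to-string / split-again roundtrip.
lines = ["5545112||PENDING"]
rows = list(csv.DictReader(lines, fieldnames=fieldnames, delimiter="|"))
print(rows[0]["State"])  # → PENDING
```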

Member:

Actually it uses csv.DictReader.

Parses CSV formatted input
:param input_csv: CSV formatted input
:type input_csv: str
:param fieldnames: corresponding field names
:type fieldnames: [List, Tuple]
:param delimiter: delimiter between entries
:type delimiter: str
:param replacements: fields to be replaced
:type replacements: dict
:param skipinitialspace: ignore whitespace immediately following the delimiter
:type skipinitialspace: bool
:param skiptrailingspace: ignore whitespace at the end of each csv row
:type skiptrailingspace: bool
"""
if skiptrailingspace:
    input_csv = "\n".join((line.strip() for line in input_csv.splitlines()))
replacements = replacements or {}
with StringIO(input_csv) as csv_input:
    csv_reader = csv.DictReader(
        csv_input,
        fieldnames=fieldnames,
        delimiter=delimiter,
        skipinitialspace=skipinitialspace,
    )
    for row in csv_reader:
        yield {
            key: value if value not in replacements.keys() else replacements[value]
            for key, value in row.items()
        }

swozniewski (author):

I think I should not touch this without a clear request from your side, so I will leave it for now.

Member:

I think the changes should definitely go to the csv_parser instead. This could also affect other site adapters that use the csv_parser, such as Moab and HTCondor.

Member:

In addition, it would be nice to have a unittest for that issue as well. It should go to the following class

class TestCSVParser(TestCase):

swozniewski (author):

I thought as well that the csv_parser could be modified to skip lines which don't contain the delimiter. But I wasn't sure if this is always intended or expected. Should we add another parameter skiplineswithoutdelimiter with some default to make it adjustable?
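A hypothetical pre-filter sketching what such a skiplineswithoutdelimiter option could do inside csv_parser (the helper name is invented for illustration):

```python
def drop_undelimited(input_csv: str, delimiter: str) -> str:
    """Drop lines that lack the delimiter (hypothetical helper)."""
    return "\n".join(
        line for line in input_csv.splitlines() if delimiter in line
    )


littered = (
    "Switch to work directory on corresponding scratch directory.\n"
    "5545112||PENDING"
)
print(drop_undelimited(littered, "|"))  # → 5545112||PENDING
```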

swozniewski (author):

Should the unittest be a separate test, which would duplicate some of the existing tests' code, or just an additional nonsense line in the input of one of the existing tests?

Member:

Should the unittest be a separate test, which would duplicate some of the existing tests' code, or just an additional nonsense line in the input of one of the existing tests?

Should be a separate member function of that class.
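Such a separate member function might be sketched as follows; csv_parser_filtered is a hypothetical stand-in for the real csv_parser with the new line filter, and the field names are invented:

```python
import csv
import unittest
from io import StringIO


def csv_parser_filtered(input_csv, fieldnames, delimiter="|"):
    # Hypothetical stand-in for tardis' csv_parser with the new line filter.
    cleaned = "\n".join(
        line for line in input_csv.splitlines() if delimiter in line
    )
    with StringIO(cleaned) as csv_input:
        yield from csv.DictReader(
            csv_input, fieldnames=fieldnames, delimiter=delimiter
        )


class TestCSVParser(unittest.TestCase):
    def test_littered_input(self):
        littered = (
            "Switch to work directory on corresponding scratch directory.\n"
            "5545112||PENDING"
        )
        rows = list(
            csv_parser_filtered(littered, fieldnames=("JobId", "Host", "State"))
        )
        self.assertEqual(rows[0]["State"], "PENDING")
```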

giffels (Member) Feb 29, 2024:

I thought as well that the csv_parser could be modified to skip lines which don't contain the delimiter. But I wasn't sure if this is always intended or expected. Should we add another parameter skiplineswithoutdelimiter with some default to make it adjustable?

I think we have no use-case where the delimiter is expected to be missing. To be safe, I would suggest requiring len(fieldnames) > 1 in addition, so that the delimiter is expected to exist. There is still the possibility of running into this issue if the delimiter is a space. However, I think that is hard to avoid; in addition, our site adapters currently use either \t or | as delimiter, and generally using a space as a delimiter in a CSV is close to suicide ;-).
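The suggested guard could be sketched like this (the function name is invented for illustration):

```python
def is_garbage_line(line: str, delimiter: str, fieldnames) -> bool:
    # Only treat a delimiter-less line as garbage when more than one field
    # is expected, i.e. when valid rows must contain the delimiter.
    return len(fieldnames) > 1 and delimiter not in line


print(is_garbage_line("Switch to work directory.", "|", ("JobId", "State")))  # → True
print(is_garbage_line("5545112||PENDING", "|", ("JobId", "Host", "State")))  # → False
```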

Co-authored-by: Max Fischer <maxfischer2781@gmail.com>
giffels (Member) left a comment

See inline comments.

swozniewski (author):

I believe I addressed all comments. Please check again.

maxfischer2781 previously approved these changes Feb 29, 2024
maxfischer2781 (Member) left a comment

PR-Reviewers wish you happy merges!

(Yam, missed waiting for the CI to run... 😓 )

codecov-commenter:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.92%. Comparing base (5251da3) to head (2c3d2aa).
Report is 11 commits behind head on master.

❗ Current head 2c3d2aa differs from pull request most recent head bf59168. Consider uploading reports for the commit bf59168 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #338      +/-   ##
==========================================
+ Coverage   98.87%   98.92%   +0.04%     
==========================================
  Files          55       55              
  Lines        2225     2226       +1     
==========================================
+ Hits         2200     2202       +2     
+ Misses         25       24       -1     


giffels (Member) left a comment

Thanks a lot for your contribution. LGTM!

@giffels giffels changed the title Catch littered status feedback in slurm.py Catch littered status feedback in csv_parser Feb 29, 2024
@giffels giffels added this pull request to the merge queue Feb 29, 2024
Merged via the queue into MatterMiners:master with commit fda5a8c Feb 29, 2024
16 checks passed
@giffels giffels mentioned this pull request Feb 29, 2024
4 tasks
giffels added a commit to giffels/tardis that referenced this pull request Feb 29, 2024
giffels added a commit to giffels/tardis that referenced this pull request Apr 29, 2024