
Catch littered status feedback in csv_parser #338

Merged
merged 9 commits into from
Feb 29, 2024

Conversation

swozniewski (Contributor)

I observed a case where the ssh-executor running the squeue command returned slurm_status.stdout containing

Switch to work directory on corresponding scratch directory.
Switch to work directory on corresponding scratch directory.
Switch to work directory on corresponding scratch directory.
5545112||PENDING

where only the last line is the expected output of squeue. The other lines are presumably system messages.
This PR adds a filter that keeps only lines containing "|" before feeding them into the csv_parser.

@giffels giffels requested review from a team, giffels and maxfischer2781 and removed request for a team February 28, 2024 14:08
@giffels giffels added the bug Something isn't working label Feb 28, 2024
maxfischer2781 (Member) left a comment

Please change the loop to a comprehension. For the other comment I don't have a strong opinion.

tardis/adapters/sites/slurm.py (outdated):
cleaned_stdout = []
for row in slurm_status.stdout.splitlines():
    if "|" in row:
        cleaned_stdout.append(row)
for row in csv_parser(
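For illustration, the comprehension form requested above might look like the following sketch (the sample stdout is made up, based on the littered output quoted in the PR description):

```python
# Hypothetical sample of slurm_status.stdout, littered with a system message
slurm_stdout = (
    "Switch to work directory on corresponding scratch directory.\n"
    "5545112||PENDING"
)

# Keep only rows that contain the delimiter before handing them to csv_parser
cleaned_stdout = [row for row in slurm_stdout.splitlines() if "|" in row]
print(cleaned_stdout)  # → ['5545112||PENDING']
```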
Member:

It might be better to directly use csv.DictReader here. This would avoid the useless roundtrip of creating an iterable of lines, joining that to newline separated lines, and then creating an iterable of lines from it again.
Not sure how straightforward that is nor whether I'm overlooking something, though.

I'm fine either way, please follow your preference.
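A minimal sketch of that idea: csv.DictReader accepts any iterable of lines, so the filtered rows could be passed to it directly, skipping the join-and-resplit roundtrip. The field names below are hypothetical, not the Slurm adapter's actual ones:

```python
import csv

# Hypothetical field names; the Slurm adapter's real ones may differ.
fieldnames = ("JobId", "Host", "State")

# Already-filtered lines; DictReader consumes the list directly,
# avoiding the join-to-string / split-again roundtrip.
lines = ["5545112||PENDING"]
rows = list(csv.DictReader(lines, fieldnames=fieldnames, delimiter="|"))
print(rows[0]["State"])  # → PENDING
```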

Member:

Actually it uses csv.DictReader.

Parses CSV formatted input
:param input_csv: CSV formatted input
:type input_csv: str
:param fieldnames: corresponding field names
:type fieldnames: [List, Tuple]
:param delimiter: delimiter between entries
:type delimiter: str
:param replacements: fields to be replaced
:type replacements: dict
:param skipinitialspace: ignore whitespace immediately following the delimiter
:type skipinitialspace: bool
:param skiptrailingspace: ignore whitespace at the end of each csv row
:type skiptrailingspace: bool
"""
if skiptrailingspace:
    input_csv = "\n".join((line.strip() for line in input_csv.splitlines()))
replacements = replacements or {}
with StringIO(input_csv) as csv_input:
    csv_reader = csv.DictReader(
        csv_input,
        fieldnames=fieldnames,
        delimiter=delimiter,
        skipinitialspace=skipinitialspace,
    )
    for row in csv_reader:
        yield {
            key: value if value not in replacements.keys() else replacements[value]
            for key, value in row.items()
        }

swozniewski (author):

I think I should not touch this without a clear request from your side, so I will leave it for now.

Member:

I think the changes should definitely go to the csv_parser instead. This could also affect other site adapters that use the csv_parser, such as Moab and HTCondor.

Member:

In addition, it would be nice to have a unittest for that issue as well. It should go to the following class

class TestCSVParser(TestCase):

swozniewski (author):

I thought as well that the csv_parser could be modified to skip lines which don't contain the delimiter. But I wasn't sure if this is always intended or expected. Should we add another parameter skiplineswithoutdelimiter with some default to make it adjustable?
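A hypothetical pre-filter sketching what such a skiplineswithoutdelimiter option could do inside csv_parser (the helper name is invented for illustration):

```python
def drop_undelimited(input_csv: str, delimiter: str) -> str:
    """Drop lines that lack the delimiter (hypothetical helper)."""
    return "\n".join(
        line for line in input_csv.splitlines() if delimiter in line
    )


littered = (
    "Switch to work directory on corresponding scratch directory.\n"
    "5545112||PENDING"
)
print(drop_undelimited(littered, "|"))  # → 5545112||PENDING
```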

swozniewski (author):

Should the unittest be a separate test, which would duplicate some of the existing tests' code, or just an additional nonsense line in the input of one of the existing tests?

Member:

Should the unittest be a separate test, which would duplicate some of the existing tests' code, or just an additional nonsense line in the input of one of the existing tests?

Should be a separate member function of that class.
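Such a separate member function might be sketched as follows; csv_parser_filtered is a hypothetical stand-in for the real csv_parser with the new line filter, and the field names are invented:

```python
import csv
import unittest
from io import StringIO


def csv_parser_filtered(input_csv, fieldnames, delimiter="|"):
    # Hypothetical stand-in for tardis' csv_parser with the new line filter.
    cleaned = "\n".join(
        line for line in input_csv.splitlines() if delimiter in line
    )
    with StringIO(cleaned) as csv_input:
        yield from csv.DictReader(
            csv_input, fieldnames=fieldnames, delimiter=delimiter
        )


class TestCSVParser(unittest.TestCase):
    def test_littered_input(self):
        littered = (
            "Switch to work directory on corresponding scratch directory.\n"
            "5545112||PENDING"
        )
        rows = list(
            csv_parser_filtered(littered, fieldnames=("JobId", "Host", "State"))
        )
        self.assertEqual(rows[0]["State"], "PENDING")
```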

giffels (Member) Feb 29, 2024:

I thought as well that the csv_parser could be modified to skip lines which don't contain the delimiter. But I wasn't sure if this is always intended or expected. Should we add another parameter skiplineswithoutdelimiter with some default to make it adjustable?

I think we have no use-case where the delimiter is expected to be missing. To be safe, I would suggest requiring len(fieldnames) > 1 in addition, so that the delimiter is expected to exist. There is still the possibility of running into this issue if the delimiter is a space. However, I think that is hard to avoid; in addition, our site adapters currently use either \t or | as delimiter, and generally using a space as a delimiter in a CSV is close to suicide ;-).
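The suggested guard could be sketched like this (the function name is invented for illustration):

```python
def is_garbage_line(line: str, delimiter: str, fieldnames) -> bool:
    # Only treat a delimiter-less line as garbage when more than one field
    # is expected, i.e. when valid rows must contain the delimiter.
    return len(fieldnames) > 1 and delimiter not in line


print(is_garbage_line("Switch to work directory.", "|", ("JobId", "State")))  # → True
print(is_garbage_line("5545112||PENDING", "|", ("JobId", "Host", "State")))  # → False
```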

Co-authored-by: Max Fischer <maxfischer2781@gmail.com>
giffels (Member) left a comment

See inline comments.

swozniewski (author):

I believe I addressed all comments. Please check again.

maxfischer2781 previously approved these changes Feb 29, 2024
maxfischer2781 (Member) left a comment

PR-Reviewers wish you happy merges!

(Yam, missed waiting for the CI to run... 😓 )

codecov-commenter:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.92%. Comparing base (5251da3) to head (2c3d2aa).
Report is 11 commits behind head on master.

❗ Current head 2c3d2aa differs from pull request most recent head bf59168. Consider uploading reports for the commit bf59168 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #338      +/-   ##
==========================================
+ Coverage   98.87%   98.92%   +0.04%     
==========================================
  Files          55       55              
  Lines        2225     2226       +1     
==========================================
+ Hits         2200     2202       +2     
+ Misses         25       24       -1     


giffels (Member) left a comment

Thanks a lot for your contribution. LGTM!

@giffels giffels changed the title Catch littered status feedback in slurm.py Catch littered status feedback in csv_parser Feb 29, 2024
@giffels giffels added this pull request to the merge queue Feb 29, 2024
Merged via the queue into MatterMiners:master with commit fda5a8c Feb 29, 2024
16 checks passed
@giffels giffels mentioned this pull request Feb 29, 2024
4 tasks
giffels added a commit to giffels/tardis that referenced this pull request Feb 29, 2024
giffels added a commit to giffels/tardis that referenced this pull request Apr 29, 2024