Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add lstrip() to lines for parsing #657

Merged
merged 1 commit into from
Jun 22, 2022
Merged

add lstrip() to lines for parsing #657

merged 1 commit into from
Jun 22, 2022

Conversation

Leimeroth
Copy link
Member

Some package/lammps version change or similar shifts the line
Step Temp PotEng TotEng Pxx Pxy Pxz Pyy Pyz Pzz Volume
which is used as a trigger for parsing by some whitespace, causing problems.
To generally prevent this lstrip() is applied to all lines read from log.lammps

@pmrv
Copy link
Contributor

pmrv commented Jun 22, 2022

I'm a bit unhappy with this whole function (not your contribution), because this reads the whole file into memory and iterates multiple times over all the lines. I'm just quickly checking whether I can come up with a single pass solution.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 2541304704

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.002%) to 69.11%

Totals Coverage Status
Change from base Build 2536526622: 0.002%
Covered Lines: 11690
Relevant Lines: 16915

💛 - Coveralls

@Leimeroth
Copy link
Member Author

It would be possible to simply iterate over the lines directly instead of using readlines by looping:
for l in f:
and only create a df whenever a trigger was found or even read in the whole str up to the end and parse everything using pandas instead of creating 1000s of dataframes.
But I think the logfile does not really matter too much compared to reading the dump file anyway concerning time and memory consumption

@Leimeroth
Copy link
Member Author

Untested draft: #658

@pmrv
Copy link
Contributor

pmrv commented Jun 22, 2022

I have definitely produced log.lammps files in the 100's of MB and the old version uses ~1GB of ram for the parse, so I think it's ok to have an eye on it. Thanks for you other PR, I'll benchmark it in a bit.

@pmrv pmrv mentioned this pull request Jun 22, 2022
@Leimeroth Leimeroth merged commit de47042 into master Jun 22, 2022
@delete-merged-branch delete-merged-branch bot deleted the lammps-parsing branch June 22, 2022 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants