Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add logging #2512

Merged
merged 5 commits into from
Apr 5, 2022
Merged

add logging #2512

merged 5 commits into from
Apr 5, 2022

Conversation

loriab
Copy link
Member

@loriab loriab commented Mar 29, 2022

Description

DDD in #1351 is nominally ready, but looking through the diff, I don't think I can answer close questioning on all the pieces. So I'm going to try to pick off some topics for coherent discussion.

This is No. 1 of the DDD series, #1351.

Todos

  • New psi4.set_output_file function to both set up a logger and call psi4.core.set_output_file
  • Added some brief logging in the driver. This doesn't look much like DDD, but it's something to work with.
  • output file naming has been changed up. outputs may have double extension (e.g., myfile.log.out) to cope with the oddity of two extensions. Mostly, something.in creates something.out and something.log.
  • docs

Below is an example of output set at DEBUG with an extra gradient() call added to tu1
>>> psi4 ../tests/tu1-h2o-energy/input.dat --loglevel=10 && cat ../tests/tu1-h2o-energy/input.log

2022-03-28:21:06:46,635 INFO     [psi4.driver.driver:603] Compute energy(): method=scf, basis=cc-pvdz, molecule=h2o, nre=9.168193296424349
2022-03-28:21:06:46,644 DEBUG    [psi4.driver.driver:604] {'elbl': array(['', '', ''], dtype='<U1'),
 'elea': array([16,  1,  1]),
 'elem': array(['O', 'H', 'H'], dtype='<U1'),
 'elez': array([8, 1, 1]),
 'fix_com': False,
 'fix_orientation': False,
 'fragment_charges': [0.0],
 'fragment_multiplicities': [1],
 'fragment_separators': [],
 'geom': array([ 0.        ,  0.        , -0.06577557,  0.        , -0.75906199,
        0.52195302,  0.        ,  0.75906199,  0.52195302]),
 'mass': array([15.99491462,  1.00782503,  1.00782503]),
 'molecular_charge': 0.0,
 'molecular_multiplicity': 1,
 'name': 'h2o',
 'provenance': {'creator': 'QCElemental', 'routine': 'qcelemental.molparse.from_string', 'version': 'v0.24.0'},
 'real': array([ True,  True,  True]),
 'units': 'Angstrom'}
2022-03-28:21:06:47,79 INFO     [psi4.driver.driver:606] Return energy(): -76.0266327350908
2022-03-28:21:06:47,83 INFO     [psi4.driver.driver:748] Compute gradient(): method=scf, basis=cc-pvdz, molecule=h2o, nre=9.168193296424349
2022-03-28:21:06:47,86 DEBUG    [psi4.driver.driver:749] {'elbl': array(['', '', ''], dtype='<U1'),
 'elea': array([16,  1,  1]),
 'elem': array(['O', 'H', 'H'], dtype='<U1'),
 'elez': array([8, 1, 1]),
 'fix_com': False,
 'fix_orientation': False,
 'fragment_charges': [0.0],
 'fragment_multiplicities': [1],
 'fragment_separators': [],
 'geom': array([ 0.        ,  0.        , -0.06577557,  0.        , -0.75906199,
        0.52195302,  0.        ,  0.75906199,  0.52195302]),
 'mass': array([15.99491462,  1.00782503,  1.00782503]),
 'molecular_charge': 0.0,
 'molecular_multiplicity': 1,
 'name': 'h2o',
 'provenance': {'creator': 'QCElemental', 'routine': 'qcelemental.molparse.from_string', 'version': 'v0.24.0'},
 'real': array([ True,  True,  True]),
 'units': 'Angstrom'}
2022-03-28:21:06:47,455 INFO     [psi4.driver.driver:751] Return gradient(): -76.02663273512434
2022-03-28:21:06:47,455 INFO     [psi4.driver.driver:752] [[-0.          0.         -0.01764163]
 [ 0.         -0.01243841  0.00882081]
 [-0.          0.01243841  0.00882081]]

Questions

  • What should default loglevel be? Py default is WARN=30. That presently will produce an empty log file since psi4 starts with INFO=20. Present setup is 20 for psithon and 30 for psiapi. ANSWER: going with 20

Checklist

Status

  • Ready for review
  • Ready for merge

@loriab loriab added this to the Psi4 1.6 milestone Mar 29, 2022
@JonathonMisiewicz
Copy link
Contributor

I'm okay with this (once tests pass) but do want to see the docs. I especially need to know how developers decide what's worth logging.

@loriab
Copy link
Member Author

loriab commented Mar 29, 2022

ok, docs and hopefully a fix for efp is up.

especially need to know how developers decide what's worth logging.

I don't have a good answer. On the DDD side, stuff goes in the log that no longer appends nicely to the output (because its run in a separate process), so logging is on the generous side.

In general, the stuff we want to log doesn't map evenly to the (critical=50, error=40, warn=30, info=20, debug=10) levels that Python sets up. Something more like https://www.ibm.com/docs/en/cognos-analytics/10.2.2?topic=SSEP7J_10.2.2/com.ibm.swg.ba.cognos.ug_rtm_wb.10.2.2.doc/c_n30e74.html with "debug-low" etc. might be better. After all, geometry of the molecule is pretty essential, but it's not "progress of the job", so I called it debug.

For the moment, log contents are a free-for-all, so I've tried to make that clear in the docs.

@JonathonMisiewicz
Copy link
Contributor

On the DDD side, stuff goes in the log that no longer appends nicely to the output (because its run in a separate process), so logging is on the generous side.

Can you elaborate on how DDD affects our ability to output to file?

@loriab
Copy link
Member Author

loriab commented Mar 29, 2022

Can you elaborate on how DDD affects our ability to output to file?

Sure. For everything but analytic single points (effectively, any composite, finite difference, or many-body expansion), the individual pieces are sent to qcengine as a qcschema atomicinput job that returns a qcschema atomicresult (that has the usual output file printout as a field). So all the usual output is collected, but some of the most important stuff (the atomicresult) is in a dict, which would be weird for an output file. Also, the psi4 output file isn't a wonderfully behaved object for being opened and closed and reset, potentially for dozens of jobs running at the same time. It looks like the individual pieces outputs are being accumulated in the outfile now in DDD (see below). But logging is a natural object for accumulating the results of potentially multiple libraries into one file or stream, so it seems the right trajectory to get started along for distributed calcs.

        gof = core.get_output_file()

        # EITHER ...
        # from psi4.driver import schema_wrapper
        # self.result = schema_wrapper.run_qcschema(self.plan())
        # ... OR ...
        self.result = qcng.compute(
            self.plan(),
            "psi4",
            raise_error=True,
            # local_options below suitable for continuous mode
            local_options={
                "memory": core.get_memory() / (2 ** 30),
                "ncores": core.get_num_threads(),
            },
        )
        # ... END

        core.set_output_file(gof, True)
        core.reopen_outfile()
        logger.debug(pp.pformat(self.result.dict()))
        core.print_out(_drink_filter(self.result.dict()["stdout"]))

Copy link
Contributor

@JonathonMisiewicz JonathonMisiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right call is to get this in, and then once DDD is in, encourage devs to start experimenting with logging. We can try to standardize when the devs have more experience.

@hokru
Copy link
Member

hokru commented Mar 29, 2022

Generally I'd like the default log level to be more verbose and spam the log file instead of having something crucial missing and the user needs to repeat a calculation.

That said, I haven't used the internal python logger object and these high numbers are and multiple categories unexpected and for a compchem program a bit confusing to be honest.

Copy link
Contributor

@zachglick zachglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for breaking DDD into these smaller PRs!

--------------------------

A log file is generated alongside the output file and has extension
``.log``. Selected key information is sent to it. The log file contents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything more that can be said about the contents of the .log file than "selected key information"? I understand that logging is new, and there might not be a concrete answer until DDD is fully merged.

However, I think it would be really good to differentiate between the purpose and contents of the .log and .out files: Are the two files redundant? What kind of information should users expect to find in each file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! I've added a little more, but both DDD and I are unclear on the answer. :-)

psi4/extras.py Outdated Show resolved Hide resolved
@loriab loriab merged commit 294c406 into psi4:master Apr 5, 2022
@loriab loriab deleted the intelmultipole branch April 5, 2022 14:09
zachglick pushed a commit to zachglick/psi4 that referenced this pull request Apr 18, 2022
* add logger

* logging and efp

* Apply suggestions from code review

* Update build_planning.rst
zachglick pushed a commit to zachglick/psi4 that referenced this pull request Apr 18, 2022
* add logger

* logging and efp

* Apply suggestions from code review

* Update build_planning.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants