-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amber parser should be improved #222
Comments
These all sound like good suggestions. As you point out, remaining backwards-compatible and not breaking existing code is important. I would encourage you to start with a PR that doesn't do everything but only minimal changes. That's a lot easier to review: less code and the purpose can be stated clearly. Presumably, the biggest change is (3). The conservative approach would be to first do (4), (1) and (2) for the current parser. These are all API-relevant changes. We might be able to get this into 1.0.0, which would be important for defining the API. We could then tackle (3), which keeps the API but changes the code underneath (and adds the new |
Hi! In my fork I already implemented all the changes, I found it easier to do it in one go, as the cose was heavily duplicated before in the two functions, so unifying them before making the changes allowed me to change just one function then. My modified fork currently pass all the present tests, I want to add more tests before making a PR, though, if I need to add more files to be tested, I should also make a PR in alchemtest? |
Yes, new tests go into alchemtest.
… Am 9/7/22 um 00:11 schrieb Domenico Marson ***@***.***>:
Hi! In my fork I already implemented all the changes, I found it easier to do it in one go, as the cose was heavily duplicated before in the two functions, so unifying them before making the changes allowed me to change just one function then.
My modified fork currently pass all the present tests, I want to add more tests before making a PR, though, if I need to add more files to be tested, I should also make a PR in alchemtest?
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.
|
Thanks for this suggestion. Sorry I was in a conference.
I think this is a good idea.
This sounds good to me as well but I'm not an amber expert.
Agreed.
I think the user still needs to be warned. I mean people who know what they are doing will not care but novice users might need this.
I think the overall idea is good but we might need to discuss how to set the API up. The alchemlyb will want to read more than amber files and we want a uniformed interface across all MD engines. Given that some MD engine don't give u_nk. I don't think the API call of
This would mean that the file don't need to be read twice. This could also be expended to
To kind of preserve the original API call. If we have this infrastructure in place, we could also easily preserve the current call with a static method.
I also think this is a good idea. So I think 1/2/4 are both very good ideas and can be made into one PR, I would imagine that it could be reviewed and merged relatively quickly. The point 3 definitely deserve its own PR as we need to change the API call for all MD engines, document the changes, and fix all the offending lines and it will be a big PR needs several rounds of reviewing and would take a while. |
Sorry for the late replay! Regarding warning the user when MBAR is not read when just dHdl extraction is requested, I'm not so sure this is needed. Amber is quite different from other software, as in the same output file we could have all the information about both dHdl and MBAR. But if the user is requesting just dHdl values, he should not be warned about something he doesn't care about, while the warning should be displayed when both dHdl and u_nk are requested, returning maybe |
If I understand correctly, I could do a PR addressing 1/2/4, while to make the interface change that you proposed (which I find really interesting) we could have a discussion and then make a change in all the parsers, right? If my understanding is correct, could I please make the PR addressing 1/2/4 with the hybrid implementation that I have right now? I would just rename and hide the current Currently all present tests are OK, but I would like to implement some more test to check some missing part of the code. Regarding this, I noticed that currently some testing functions run on a collection of output file, testing for different inconsistencies, but in this way when a test fails the report is not such clear, and one have to go read what was wrong in each file. Could I rename the files and make one separate test for each of them? |
I have some comments on the specific points in #222 (comment)
We cannot make it optional for the other parsers. I'd rather keep the API the same across all parsers so that you can use the same code for analyzing, say, AMBER and GROMACS files.
Yes, adding a check is good.
I agree with @DrDomenicoMarson : If the user's intent is clearly that they only reading TI data (by using
I agree with @xiki-tempula that it still needs to be possible to parse AMBER files in the same way as other files and it should be transparent. Note that GROMACS can also save both TI and MBAR data in one file so we should come up with an approach that is general. I feel that this is a bigger question so a separate issue might be the better place. I just want to say that @xiki-tempula 's approach looks feasible but we would be effectively introducing a class
I agree with @xiki-tempula 's assessment. |
I don't think that doing a PR with (3) already included is a good idea. This will be too complicated to review. It is really important to keep development of alchemlyb focused. Reviewing PRs takes a lot of our time — time that we have to take somehow from our more than full schedules (we don't get paid for working on alchemlyb!), we need you to help us to make the process as easy as possible. Breaking down changes into independent parts is really important. It also makes for a much cleaner history (which is important when debugging). Additionally, we would like to get 1.0 out soon. It seems feasible to get (1), (2), (4) in in time, but (3) opens up a much bigger discussion. |
Could you give an example — ideally in a new issue? Better tests are always welcome. The best thing is to start with a small PR that fixes one thing.
I am not sure what files you're referring to. We really try to avoid renaming data files and data sets (i.e., files that are already in alchemtest) as this can break code elsewhere. If you want to change the |
I agree that (3) is not feasible so quickly, as it implies changing the behavior of other parsers as well. I was suggesting that I'll do a PR with the other issues solved, but from a code standpoint I will merge the two functions into one, exposing to the user the same functions that there are now, so for the user, nothing will change. As an example
I think this is a good idea because currently the code is mostly repeated twice, so the adjustment has to be made in different portions of the code, so the reviewing should not be much harder in this way, but even easier. Also, I'm trying to implement more tests. |
Regarding my question about tests, I raised an issue in alchemtest, as to keep the conversations more focused! |
If we make it optional the API will not change compared to the other parsers, the user can still call extract_u_nk(file, T) explicitly with T, but the difference for AMBER files will be that the user could also NOT specify T. If the user specifies a T that is different than the temperature at which the simulation has been run, the new parser will give a WARNING (I noticed for example that in the current tests, we extracted with T=300, while the simulations were run at 298.15 K, not a big issue obviously XD) EDIT: working on it a little bit further, I found that making T optional breaks the functionality of |
I didn't know you were trying to avoid holding state, and I think in this case it shouldn't be needed, for any parsers. We could just have a common hidden function that parses the file with some switch to select which information is requested, the function then will return only what was needed. Than there will be the two already existing parsing functions that will call the hidden function with the correct switch to read the proper values. Also, we could introduce a third function that will extract both dHdl and u_nk, calling the hidden function and requesting both values. The user in this way has the flexibility to read what he needs from the files, calling the proper function. |
If possible, we would want to avoid switches that are specific for one parser. The API should be as general as possible. Maybe the better choice is indeed as you propose to have a specific "read_u_nk_and_ti=True" switch only for the AMBER parser and then have other parsers ignore the kwarg. That's feasible. We could have a top level extractor that gets dHdl and u_nk in one function call. By default it uses the separate The basic approach is that the API should be as uniform as possible. The goal is that you should be able to write code with alchemlyb that is agnostic to where the data comes from. |
In all fairness, that should be changed eventually. Good catch. (EDIT: Make a separate issue, please. Don't fix as part of another PR.) |
I don't get where the problem is here, as with what I proposed the user is not exposed to any new switch, and the API stays the same as before with (I edited the pseudo-code a little bit)
|
I understand that everything has to be made in very small steps, I'm sorry I didn't understand this earlier, but I thought the AMBER file parsing was a little bit "left behind" here as there are many small different issues to be addressed. That's the reason 1 year ago or so I gave up on using alchemlyb to analyze my AMBER simulations as I hadn't the time to check what was wrong (I couldn't concatenate properly different files, I thought I could just fix (or try to :-) ) different things in one go, I'm sorry. I think it would be better to close this issue as "not planned", and I can re-open different issues for every simple thing I can address in small fixes. For now, I leave this open, but I opened two issues addressing points 1)/2), and 4), leaving this issue just for point 3). let me know if it's better to close it and open another dedicated issue! |
Thank you @DrDomenicoMarson for coming back to alchemlyb and starting to fix things — it's incredibly valuable to have contributors like you who work with the specific code. Reading your API proposal I agree with you that it can be done without breaking the existing API and instead introducing a new function Thank you for opening the other issues. That makes it a lot easier to break down the development into manageable chunks. |
I'm concerned with NAMD parser not having a dHdl reader so we cannot have |
I like the idea of an |
I like the idea of @xiki-tempula as well, it seems that it serves everything you wanted in a very elegant way! |
@orbeckst @xiki-tempula what are you thought about the introduction of the |
Yes, let's get |
Great! |
For other parsers just do
Then add a note saying that this is just a wrapper around extract_dHdl and extract_u_nk. |
Ok, I think I can do it without breaking anything :-) A bigger issue would be to create a proper test for each of the other parsers for this function I think! |
@DrDomenicoMarson just wanted to say a big thank you for all your recent work — well done. Please continue to contribute!!!! |
…ests (#240) Ok, I hope not to have broken anything with this PR :-) I added the extract(file, T) function to all the existing parsers, with a relative test for each parser. The only parser that was heavily hit by this PR is the AMBER parser, as was mentioned in issue #222 . Here I had to make different adjustments, on my machine all tests are passing through. I took the opportunity to make also some small, cosmetic, changes to the code in the amber parser (just a couple of them, like removing the unnecessary 'class SomeThing(Object').
Addressed with PR #240 |
Hello everyone, I think the Amber_parser needs some adjustment, other than issue #221 which was causing misreading of time information for both u_nk and dHdl parsers. Here I report the fixes that I think should be made, and I think the best thing should be a complete rewrite of the parser, properly tested:
Temperature is required in input for both parsers (extract_dHdl and extract_u_nk), but system temperature is already automatically read from the output file (reading
temp0
). We could make theT
variable optional in the parsers, and I suggest not removing it, not to break existing scripts from users. We should check that the requested temperatureT
is the same as reported by the AMBER output file and report eventual inconsistencies to the user.related to point 1), currently if
temp0
is not found we log that 'Non-constant temperature MD not currently supported.' I don't think that this is right, as temp0 is settled even in non-constant temperature MD, so I think that the check is meaningless. I agree that we could warn the users if the simulation was not performed at a constantT
, but that's not so easy to check, and I don't think a similar check exists for other parsers (the user should know the simulation have to be run in NPT/NVT ensemble).AMBER output file is one only with all the information for TI and MBAR reported every
ntpr
timesteps. Currently, I see two problems here:3a. If the user wants to extract both dHdl and u_nk values, the output file is completely parsed twice, which can waste quite some time.
3b. If the user only wants dHdl values, and the file doesn't have proper MBAR information, the user is needlessly warned that MBAR values are not read.
I think we should refactor the code to have a single run through the AMBER output file, extracting just what's needed by the user in one go. I imagine we could have the main function called
extract_dHdl_and_u_nk
, which defaults to returns both dHdl and u_nk, while maintaining the currentextract_dHdL
andextract_u_nk
which will simply call the main function with the optional switch to extract only what's needed.AMBER output can optionally report every
ntave
steps the average of dHdl and other parameters, right now parsing dHdl we are collecting these values (here we are also collecting different components reported by AMBER), but we are not using this information at all in the code... I think we could just skip the reading of the averages since we are collecting all the values (and then we'll perform better equilibration/subsampling analysis).I'm working on implementing those changes, please let me know if you think anything else should be changed (or if I should just not touch anything :-) )!
The text was updated successfully, but these errors were encountered: