-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new interface for samples and fba, fva solutions and workaround for #96 #98
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! These are really cool stuff.
I have some comments mostly on the efficiency of the proposed changes.
@@ -218,6 +218,27 @@ max_biomass_objective = fva_output[3] | |||
|
|||
The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution. | |||
|
|||
```python | |||
fva_output_df = model.fva_to_df() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Do we still need the old method model.fva()
that changes the internal state of model? I suppose that one function is enough. Then the new one could be called fva()
it is more clear I think. And the old one should be made private. Actually it is even simpler if there exist only one function fva()
with the new interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments hold for fva
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit tricky as the original fva
returns a set of 4 things that we use in other parts of the PolytopeSampler
class. Yet, I changes those parts with an _fva
and _fba
and we now have fva()
and fba()
to return the dataframes. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but as the name suggests _fva
and _fba
should be private, right?
Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).
self._lb, | ||
self._ub, | ||
self._S, | ||
self._biomass_function, | ||
self._parameters["opt_percentage"], | ||
) | ||
self._min_fluxes = min_fluxes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need them to be stored in the model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thought was about the sample_from_fva_output
function that used to get as arguments the min_fluxes
and the max_fluxes
.
At first, I thought of keeping those in the model to provide them in the function but then I thought it would be simpler if the fva is performed with the sample_from_fva_output
function.
Yet, maybe it's practical to keep them. No strong arguments on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.
For example: | ||
|
||
```python | ||
initial_medium = model.medium |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simplified avoiding to copy the entire dictionary (see #95 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for my reply on the issue.
if you have any ideas for alternatives, let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?
@@ -184,8 +187,10 @@ def generate_steady_states( | |||
self._T = np.dot(self._T, Tr) | |||
self._T_shift = np.add(self._T_shift, Tr_shift) | |||
|
|||
return steady_states | |||
|
|||
steady_states_df = pd.DataFrame(steady_states, index = self._metabolic_network.reactions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to make that transformation by default? As far as I can understand that could be memory intensive (especially for large matrices). I think it is better to add a method that takes steady_states and create a df or simply write that one line of code in the doc or README.
@@ -350,6 +351,17 @@ def sample_from_fva_output( | |||
|
|||
return steady_states | |||
|
|||
@staticmethod | |||
def samples_as_df(model, samples): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
samples or steady states?
dingo/loading_models.py
Outdated
|
||
return lb, ub, S, metabolites, reactions, \ | ||
biomass_index, biomass_function, medium, inter_medium, exchanges, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be in two lines instead of three?
dingo/loading_models.py
Outdated
|
||
return lb, ub, S, metabolites, reactions, \ | ||
biomass_index, biomass_function, medium, inter_medium, exchanges, \ | ||
reactions_map, metabolites_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should also update the documentation of the function with medium, ..., metabolites_map
model.set_slow_mode() | ||
|
||
# Check if script is running in GitHub action | ||
if os.getenv('CI', 'false').lower() == 'true': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool!
@@ -689,7 +689,7 @@ | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"execution_count": 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, notebooks cannot be reviewed in github... See https://github.com/orgs/community/discussions/12959
self._N_shift = [] | ||
self._T = [] | ||
self._T_shift = [] | ||
self._metabolic_network = copy.deepcopy(metabol_net) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making a deep copy just shifting the problem (see #96 (comment))
Shouldn't the sampler only transform A and b and leave the model as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this has to do with the fast_remove_redundant_facets
.
I suggest we keep that as is for this PR so we have something working fine for now, and we open a new PR for the fast_remove_redundant_facets
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi again, sorry for late reply.
Let me comment to each change in this PR separately,
- Regarding dataframes, is it simpler just to let is to the user to create a dataframe if needed, it is just a call
pd.DataFrame(data, index)
at the end of the day. This way we avoid complicating the interface. - Regarding Redundant methods in MetabolicNetwork #95 and medium I propose to handle it in a separate PR to solves in fully (it seems that now it is a work in progress)
- Regarding Error using fva after sampling #96 I agree to use the deepcopy workaround (but please keep the issue open)
@@ -218,6 +218,27 @@ max_biomass_objective = fva_output[3] | |||
|
|||
The output of FVA method is tuple that contains `numpy` arrays. The vectors `min_fluxes` and `max_fluxes` contains the minimum and the maximum values of each flux. The vector `max_biomass_flux_vector` is the optimal flux vector according to the biomass objective function and `max_biomass_objective` is the value of that optimal solution. | |||
|
|||
```python | |||
fva_output_df = model.fva_to_df() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but as the name suggests _fva
and _fba
should be private, right?
Also, I still cannot see the need of two fva (and fba) functions. I think it complicates the interface. Is it possible to keep the old functions and then the user (if needed) compute a dataframe on demand (it is one line of code).
For example: | ||
|
||
```python | ||
initial_medium = model.medium |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see so there is still an issue in setting the model correctly. Should we address it in a different PR?
self._N_shift = [] | ||
self._T = [] | ||
self._T_shift = [] | ||
self._metabolic_network = copy.deepcopy(metabol_net) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's go with the workaround for now, but please keep the issue open until there is a proper fix. Thanks!
self._lb, | ||
self._ub, | ||
self._S, | ||
self._biomass_function, | ||
self._parameters["opt_percentage"], | ||
) | ||
self._min_fluxes = min_fluxes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was lost a bit. Are those class variables used even after that changes of this PR? As far as understand sample_from_fva_output function is not using them.
Aim of this PR is:
With respect to aim (1) instead of searching for the index of a reaction, the user gets a dataframe with the reaction ids as indices.
and
A relevant interface is available for fba and fva solutions too.
There is also a
reactions_map
to link a reaction's id to its complete name (quite useful in case of modelseed models).Regarding aim (2), an example of how to use the
medium
was added in the README file and the setter was fixed (#95), while usingdeepcopy
for getting the bounds addressed #96