Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix file-picker downstream service notification issues #3058

Merged
merged 36 commits into from
May 25, 2022

Conversation

GitHK
Copy link
Contributor

@GitHK GitHK commented May 20, 2022

What do these changes do?

  • file-picker propagated updates to downstream services due to a side effect (FIXED)
  • file-picker properly sends notifications when the file source changes
  • dynamic-sidecar properly purges directory when it downloads new data
  • dynamic-sidecar inputs downloads are now queued and can no longer be happening in parallel

Related issue/s

How to test

  1. create a jupyter-math:2.0.5
  2. connect a file-picker
  3. add a file to the file-picker
  4. see that the contes is displayed in the inputs_1 directory
  5. remove the file form the file-picker (reset button)
  6. see that the inputs_1 is directory is empty
  7. repeat again from step 3.

Checklist

  • Unit tests for the changes exist

@GitHK GitHK self-assigned this May 20, 2022
@GitHK GitHK added a:webserver issue related to the webserver service a:dynamic-sidecar dynamic-sidecar service changelog:🐛bugfix labels May 20, 2022
@GitHK GitHK added this to the Croissant milestone May 20, 2022
@codecov
Copy link

codecov bot commented May 20, 2022

Codecov Report

Merging #3058 (a0e1311) into master (4bd0ac7) will increase coverage by 5.7%.
The diff coverage is 72.2%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #3058     +/-   ##
========================================
+ Coverage    75.0%   80.8%   +5.7%     
========================================
  Files         715     716      +1     
  Lines       30849   30903     +54     
  Branches     4024    4032      +8     
========================================
+ Hits        23166   24974   +1808     
+ Misses       6820    5061   -1759     
- Partials      863     868      +5     
Flag Coverage Δ
integrationtests 66.3% <81.5%> (+0.1%) ⬆️
unittests 76.7% <68.6%> (+5.8%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...mcore_service_dynamic_sidecar/modules/nodeports.py 23.5% <13.3%> (+0.9%) ⬆️
...service_webserver/projects/projects_nodes_utils.py 90.4% <90.4%> (ø)
...mcore_service_webserver/projects/projects_utils.py 84.1% <93.7%> (+33.0%) ⬆️
...ervice_dynamic_sidecar/api/containers_extension.py 89.1% <100.0%> (ø)
...er/src/simcore_service_webserver/catalog_client.py 59.3% <100.0%> (+20.8%) ⬆️
...webserver/computation_comp_tasks_listening_task.py 76.5% <100.0%> (-2.9%) ⬇️
.../simcore_service_webserver/projects/projects_db.py 92.5% <100.0%> (+1.6%) ⬆️
...rvice_webserver/projects/projects_handlers_crud.py 87.7% <100.0%> (+0.1%) ⬆️
...mcore_service_webserver/garbage_collector_utils.py 79.4% <0.0%> (-3.0%) ⬇️
.../simcore_service_catalog/services/access_rights.py 78.7% <0.0%> (-2.5%) ⬇️
... and 101 more

@GitHK GitHK changed the title 🐛 Fix file-picker issues 🐛 Fix file-picker downstream service notification issues May 20, 2022
@GitHK GitHK marked this pull request as ready for review May 20, 2022 09:09
@GitHK GitHK requested review from pcrespov and sanderegg as code owners May 20, 2022 09:09
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, but I have some doubts on a few things, please have a look, we can discuss these.

@@ -170,6 +170,76 @@ async def _get_data_from_port(port: Port) -> Tuple[Port, ItemConcreteValue]:
return (port, ret)


async def _download_files(
target_path: Path, download_tasks: Deque[Coroutine[Any, int, Any]]
) -> Tuple[dict[str, Any], ByteSize]:
Copy link
Member

@sanderegg sanderegg May 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Could you create a TypeDict at least, to know what is returned?
  2. check if you can use deque instead of Deque
  3. you can use tuple instead of Tuple
  4. if you bother on butting a deque here, why then not just pass a Sequence type? I mean the best and most secure/fast way is to pass a tuple because then you also pass the fact that the fct will not modify the argument, plus tuple is faster than deque... also I doubt a bit that going from list to deque brings so much of a difference... especially when downloading GBs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I've found out we already have something similar defined OutputsDict. Will be using that.
  2. Sorry I don't understand the suggestion.
  3. 👍
  4. There is no life changing improvement. The deque is just a better suited for appending data than a list. The change from list to deuque brings no real life benefit. I do not agree on the use of tuples, since they are immutable ad when constructing the sequence you need to append data to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. deque is usable for typing

return data, ByteSize(transferred_bytes)


@run_sequentially_in_context()
async def download_target_ports(
port_type_name: PortTypeName, target_path: Path, port_keys: List[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use list[str]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will decide that if I remove Dict or List from a function in a module from now on, I will also remove it from the entire module.
@pcrespov could we agree on this?

def _cast_outputs_store(dict_data: dict[str, Any]) -> None:
for data in dict_data.get("outputs", {}).values():
if "store" in data:
data["store"] = f"{data['store']}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: actually I think I would prefer going to int than to string... especially since the day it is fixed it will be an int for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can work as well. String was selected since is the safer type cast between the two.

@@ -724,6 +811,10 @@ def _update_workbench(
.returning(literal_column("*"))
)
project: RowProxy = await result.fetchone()

for node_update_task in nodes_update_tasks:
fire_and_forget_task(node_update_task)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you check what happens if you have computational services running and stuff like that? you do not end up in a loop?

Copy link
Contributor Author

@GitHK GitHK May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've checked, and sorry for the bad naming. This is only for frontend services. And for now of all the frontend services only the file picker will be used. I've changed the names to reflect this.

Computational services are not influences by this.

id="different keys but missing key and outputs do not trigger",
),
pytest.param(
{"key": "simcore/services/frontend/file-picker"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I get that correctly now this function will be called from comp_tasks_listening_task and also from PATCH /projects right?
is there any test for the case of computational services, and/or other dynamic services? or is this purely and only for file-picker? in which case please change the name of your functions to reflect that (if not this would be confusing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this affects comp_tasks_listening_task and PATCH /projects (aka, when the fronted saves the project)

The only affects services are frontend services. And also, for now only the file-picker is involved. I've renamed the functions to highlight this.

@GitHK GitHK requested a review from sanderegg May 23, 2022 07:28
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is urgent, i might let it go through but I am concerned about the circular dependency though. Will recheck again later and try to help with a solution.

# this function is called from:
# - `projects/projects_db.py`
# - `computation_comp_tasks_listening_task.py` (was originally here)
from . import projects_api
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really not good. We should have a look at it together. The more we delay solving this the worse it becomes
Please add an issue to follow up on this and make sure there is a FIXME here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added FIXME and created an issue
#3069

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and reduced a bit the dependency but there is still a loop that IMO reveals a design problem in the changes of your PR

image

The problem i see is that you are including plugin-level API (i.e. projects_api) in the implementation of db-API repository (i.e. projects_db) while it should be the other way around i.e. the plugin-level API uses the db-API.

More details on the code follow ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. How did you mange to find the loop, just by visually checking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RTFM? :-D ... jokes aside, I did it both visually and following the code.

But there is also the option to find them automatically --show-cycles

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: Perhaps add an example of --show-cycles in the doc of the script?

if "store" in data:
data["store"] = int(data["store"])

if current_dict.get("key") == "simcore/services/frontend/file-picker":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, this implementation feel very hacky provided the generic name (find_changed_dict_keys) you gave to the function! Make sure it is only used in your specific case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINOR please consider comment above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the comment in the function is explaining what the issue is. I'd guess that we need a better name for the function since it was moved from its original module. Changing to find_changed_node_keys

@GitHK GitHK mentioned this pull request May 24, 2022
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the feedback i provided you together with the PR and consider re-designing your changes in projects/projects_db.py. Try re-designing this logic on top (i.e. using) of the project's db-API.
The circular import is a warning that we misusing the hierarchies of our design.

@@ -724,6 +739,10 @@ def _update_workbench(
.returning(literal_column("*"))
)
project: RowProxy = await result.fetchone()

for frontend_node_update_task in frontend_nodes_update_tasks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main cause of the circular import discussed in https://github.com/ITISFoundation/osparc-simcore/pull/3058/files#r880401810

IMO the db-API layer for projects should know nothing about front-end nodes etc . The idea here is create a repository pattern (SEE e.g. https://www.cosmicpython.com/book/chapter_02_repository.html) whose responsibility is accessing the database and hiding all sql-alchemy specialized logic.

The operations you are doing here IMO belong outside projects_db, where the concept of e.g. front-end-node is well defined but not here, where the only things that are defined are the columns of the table etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to move it outside in the end. You are correct, those functions should have not been inside the projects_db since it has a different objective.

@GitHK GitHK requested a review from pcrespov May 24, 2022 13:49
Copy link
Member

@pcrespov pcrespov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if "store" in data:
data["store"] = int(data["store"])

if current_dict.get("key") == "simcore/services/frontend/file-picker":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINOR please consider comment above

Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very neat, this will close some gaps. Thanks a lot for doing it!

from models_library.projects import ProjectID
from .projects_utils import get_frontend_node_outputs_changes
from servicelib.utils import fire_and_forget_task
from . import projects_api
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually a little bit weird that a utils calls the project_api... maybe that is one part of your issue with circular dependencies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I've removed the circular dependency, this ok now. This is no longer called from the projects_db but from the projects_handlers_crud.py. PC actually pointed out the issue.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.4% 0.4% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:dynamic-sidecar dynamic-sidecar service a:webserver issue related to the webserver service
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants