Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grass.script: Allow init to modify only specified environment #3438

Merged
merged 21 commits into from
Jun 19, 2024

Conversation

wenzeslaus
Copy link
Member

@wenzeslaus wenzeslaus commented Feb 19, 2024

The grass.script.setup.init function modifies os.environ. While os.environ is fit for many, if not most, cases, usage in tests and parallel processing is limited and in all cases, os.environ stays partially modified even after session is finished (we don't destroy the runtime environment, i.e., variables such as GISBASE). With this change, init takes env parameter specifying the environment to modify and it modifies that environment. When no env is provided, it still modifies os.environ, so the default behavior is not changed.

This required only few changes to the initialization code, but more changes were needed for the cleanup code. A lot of tests can now take advantage of this functionality and this PR updates some of them. I plan to update all where it is applicable, but will leave as is some others, namely those which use ctypes (situation is more complex there) and grass.jupyter (env parameter is not implemented for most of them).

I'm not decided whether this should go to 8.4 or wait for 8.5. There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. However, if someone gives it a careful read, it might be possible to have this for 8.4. I don't think there is much concern in terms of API because env parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment.

This includes changes to g.message interface (python/grass/script/core.py) which I intent to submit in a separate PR (#3439). Also, python/grass/script/tests/conftest.py needs a doc fix (done).

Special thanks goes to @echoix for the advice about pytest's monkeypatch which made the test code possible (and simpler in other places).

The function create_location now works without a full session. Internally, it sets up a runtime environment to execute tools (modules) without the connection to a location (project) which is sufficient for executing g.proj.

The new test covers the new functionality, but to actually test it, the test needs to run by itself because grass.script.setup.init (still) creates a global session.
…modified and when it is not, copy is up to the user which makes it clear when that actually happens which is longer but more readable, env is keyword-only, gisbase path handling is documented
This makes the message, verbose and other functions consistent with other wrappers around run_command family calls. While not needed for multiple-mapset situations and parallelization, it is necessary when calls with messages are used without global environment being set and only custom environment available in grass.script.setup.

The PR aims at providing the interface, not updating all use cases (except the anticipated changes in grass.script.setup).

This does not have any test since the API for that is missing. This will be tested indirectly in the future.
@wenzeslaus wenzeslaus added this to the 8.4.0 milestone Feb 19, 2024
@wenzeslaus wenzeslaus self-assigned this Feb 19, 2024
@github-actions github-actions bot added Python Related code is in Python libraries labels Feb 19, 2024
@echoix
Copy link
Member

echoix commented Feb 19, 2024

I also have concerns about passing around an object that has all env vars populated. I know in CI, we need to be careful to avoid leaking them to software that doesn't require them, and to not show them in logs, as we wouldn't want to see the tokens and secrets.
I don't know what are the practices in a Python software, but I would like to make sure that we don't accidentally dump them in a test.

Is the CI-security class that sent a PR yesterday would be interested in that question, or have an answer for this?

@wenzeslaus
Copy link
Member Author

I also have concerns about passing around an object that has all env vars populated.

One thing is that the object is readily available already in Python:

import os
print(os.environ)

Another thing is that we are passing these objects around already to achieve various tasks such as data management and parallelization:

env = create_environment(...)
run_command(...env=env)

I have a WIP PR #2923 which aims at hiding some of that, but the env would be simply hidden in another object changing tyhe above to:

with gs.setup.init(...) as session:
    tools = Tools(session=session)
    tools.r_surf_gauss(output="surface")

I know in CI, we need to be careful to avoid leaking them to software that doesn't require them, and to not show them in logs, as we wouldn't want to see the tokens and secrets. I don't know what are the practices in a Python software, but I would like to make sure that we don't accidentally dump them in a test.

I don't know anything else to prevent them besides don't print them. The original variables for the process are always available on Linux with proc/{pid}/environ.

Is the CI-security class that sent a PR yesterday would be interested in that question, or have an answer for this?

Diving into this might too much of a rabbit trail, but I'll discuss it with them.

@wenzeslaus wenzeslaus changed the title grass.script: Create new location without a session grass.script: Allow init to modify only specified environment Feb 20, 2024
@echoix
Copy link
Member

echoix commented Feb 20, 2024

I'm not decided whether this should go to 8.4 or wait for 8.5. There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. However, if someone gives it a careful read, it might be possible to have this for 8.4. I don't think there is much concern in terms of API because env parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment.

From what I'm aware of, we don't have a feature-freeze policy here, and we didn't decide to start the RC process yet. If you're committed to including it and think you'd still have the bandwidth to tune up in time for release, I don't see why it couldn't be there.

Otherwise, if these changes are considered too breaking for this release (I personally think that full-code wide changes of location to project is way more impactful than this), then part of the usage of mocking env vars in tests could be integrated in some way. The only test fixture was to set no env vars though, not setting them/replacing them (that would make pytest tear them down afterwards)

@github-actions github-actions bot added vector Related to vector data processing temporal Related to temporal data processing module labels Mar 1, 2024
wenzeslaus added a commit that referenced this pull request Mar 3, 2024
This makes the _message_, _verbose_ and other functions consistent with other wrappers around _run_command_ family calls. While not needed for multiple-mapset situations and parallelization, it is necessary when calls with messages are used without global environment being set and only a custom (local) environment is available which is the case in grass.script.setup (with #3438).

The PR aims at providing the interface, not updating all use cases (it will be applied for grass.script.setup in #3438).

This does not have any test since the current API does not allow for writing these test. This will be tested indirectly in the future (e.g. by #3438).

Additionally, it fixes documentation for couple other function where the parameter documentation was misleading or wrong.
@wenzeslaus wenzeslaus marked this pull request as ready for review May 3, 2024 13:40
@wenzeslaus
Copy link
Member Author

This touches a lot of files, but that's just the update to use the better API in tests. The changes are in python/grass/script/setup.py. The direct tests for this are in python/grass/script/tests/grass_script_setup_test.py.

@wenzeslaus
Copy link
Member Author

wenzeslaus commented May 3, 2024

The CI fails with GISBASE being required in certain context. The test code accounts for that already (or at least it should). Locally, I'm not able to reproduce the issue (just as expected for how the test code is written).

@wenzeslaus
Copy link
Member Author

Merging either of #3683 and #3688 should indirectly solve the the failing test.

@echoix
Copy link
Member

echoix commented May 6, 2024

I think #3683 is the less controversial of the two, let's start with it.

@wenzeslaus
Copy link
Member Author

I also have concerns about passing around an object that has all env vars populated.

This is using a mechanism what the Python subprocess package is using. We use subprocesses and we need to customize the environment to pass different GRASS-related settings. That's how the whole grass.script works. While Bandit warns about usage of subprocess package (dangerous in certain context) and specific usages of the subprocess package, I'm not aware of a Bandit warning related to usage of the env parameter.

I know in CI, we need to be careful to avoid leaking them to software that doesn't require them,...

All the subprocesses in general get all the variables from parent process. Having env actually allows users to sanitize the environment and let the GRASS subprocesses know only about what they want them to know.

I would like to make sure that we don't accidentally dump them in a test.

Printing env parameter or printing all parameters of run_command functions (which would include env) would be probably a bad idea.

@echoix
Copy link
Member

echoix commented May 9, 2024

With #3689 merged, this PR should be rebased to remove the duplicated changes (that has conflicts)

python/grass/script/setup.py Outdated Show resolved Hide resolved
python/grass/script/tests/grass_script_setup_test.py Outdated Show resolved Hide resolved
Co-authored-by: Ondrej Pesek <pesej.ondrek@gmail.com>
@wenzeslaus wenzeslaus merged commit 9564852 into OSGeo:main Jun 19, 2024
24 checks passed
@wenzeslaus wenzeslaus deleted the env-for-init branch June 19, 2024 12:28
cyliang368 pushed a commit to cyliang368/grass that referenced this pull request Jun 22, 2024
…3438)

The _grass.script.setup.init_ function modifies os.environ. While os.environ is fit for many, if not most, cases, usage in tests and parallel processing is limited and in all cases, os.environ stays partially modified even after session is finished (we don't destroy the runtime environment, i.e., variables such as GISBASE). With this change, _init_ takes _env_ parameter specifying the environment to modify and it modifies that environment. When no _env_ is provided, it still modifies os.environ, so the default behavior is not changed.

This required only few changes to the initialization code, but more changes were needed for the cleanup code. A lot of tests can now take advantage of this functionality and this PR updates some of them. I plan to update all where it is applicable, but will leave as is some others, namely those which use ctypes (situation is more complex there) and _grass.jupyter_ (_env_ parameter is not implemented for most of them).

There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. I don't think there is much concern in terms of API because _env_ parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment. I go with modify because that makes the copy explicit in the call which is more clear (caller or reader is sure a copy is created) and it is consistent with how os.environ is modified.
@a0x8o a0x8o mentioned this pull request Jun 27, 2024
kritibirda26 pushed a commit to kritibirda26/grass that referenced this pull request Jun 29, 2024
…3438)

The _grass.script.setup.init_ function modifies os.environ. While os.environ is fit for many, if not most, cases, usage in tests and parallel processing is limited and in all cases, os.environ stays partially modified even after session is finished (we don't destroy the runtime environment, i.e., variables such as GISBASE). With this change, _init_ takes _env_ parameter specifying the environment to modify and it modifies that environment. When no _env_ is provided, it still modifies os.environ, so the default behavior is not changed.

This required only few changes to the initialization code, but more changes were needed for the cleanup code. A lot of tests can now take advantage of this functionality and this PR updates some of them. I plan to update all where it is applicable, but will leave as is some others, namely those which use ctypes (situation is more complex there) and _grass.jupyter_ (_env_ parameter is not implemented for most of them).

There is plenty of tests covering it, but the cleanup part which needed most changes does not have any coverage. I don't think there is much concern in terms of API because _env_ parameters is what we now have in many functions. The only uncertainty with that is whether it should create its own copy or modify the provided environment. I go with modify because that makes the copy explicit in the call which is more clear (caller or reader is sure a copy is created) and it is consistent with how os.environ is modified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libraries module Python Related code is in Python temporal Related to temporal data processing vector Related to vector data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants