Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Change to another layout [src] #720

Open
CarliJoy opened this issue Oct 18, 2024 · 11 comments
Open

Feature: Change to another layout [src] #720

CarliJoy opened this issue Oct 18, 2024 · 11 comments

Comments

@CarliJoy
Copy link
Contributor

Follow up on the discussion in #488.

The MR suggested to change to a flat layout to activate typing easily (see #305).

@totallyzen brought the argument:

Since I wrote the refactor to the current core and modules, I can chime in a bit.

There is a very simple reason why it is core and modules -> it is because all major testcontainers language flavors use these. I don't think I'd want to deviate from that as it allows the TC mainline documentation to converge and easily match support across languages (automation can be built for it, and issues tracked across language flavours)

I wish we had better automation and support from the TC staff (I'm only a volunteer myself like @alexanderankin) but this is what we've got for now.

However, after looking at the official .NET project, I find this argument weak. The .NET project doesn't follow this structure—there is no core or modules, only src and tests folders.

The current layout introduces several issues:

  1. It is complex and hard to understand, as only with some Poetry-specific setup are the different folders combined.
  2. It makes running poetry pytest to test the project nearly impossible due to "import file mismatches."
  3. It suggests the existence of subprojects like testcontainers-core (which actually exists on PyPi, created by @tillahoffmann), adding confusion.
  4. Properly type-checking the project is difficult because mypy expects the entire project (not only parts) to be typed when importing, and it checks for the existence of py.typed.

I propose adopting the src-layout, which has been embraced by many modern Python packages. This would:

  • Clarify the difference between tests for the final package and tests in "development mode."
  • Allow running pytest from the project root, reducing the need for a complex Makefile just to run tests. Tests for different modules can still be organized into subfolders in the tests directory.

Most importantly, this change would reduce complexity, enabling Python developers to contribute more easily with less cognitive overhead.

Lastly, it would mirror the structure of the official .NET project. :)

@CarliJoy
Copy link
Contributor Author

CarliJoy commented Oct 18, 2024

If the maintainers decide to switch to the src layout I could create a PR. But only if there is support for it and it will be merged in time.
Probably it would good to work off other PRs before creating it.

Note: I created that issue because I am currently working on the docker in docker feature. For this I wanted to simply execute a subset of all tests marked with inside_docker_check to be executed inside docker. But it turned out to be overly complex due to the structure of the project.

@alexanderankin
Copy link
Member

if you want to execute a subset of tests, you can use standard pytest features? can you explain which functions you wanted to target and i can try to craft a pytest command for you

@CarliJoy
Copy link
Contributor Author

Nope I had to change the file names of the tests.

Simply try to run poetry run pytest to see for yourself that collection does not work at the moment due to:

____________ ERROR collecting modules/mongodb/tests/test_mongodb.py ____________
import file mismatch:
imported module 'test_mongodb' has this __file__ attribute:
  /workspace/modules/cosmosdb/tests/test_mongodb.py
which is not the same as the test file we want to collect:
  /workspace/modules/mongodb/tests/test_mongodb.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
___________ ERROR collecting modules/registry/tests/test_registry.py ___________
import file mismatch:
imported module 'test_registry' has this __file__ attribute:
  /workspace/core/tests/test_registry.py
which is not the same as the test file we want to collect:
  /workspace/modules/registry/tests/test_registry.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

I renamed the test files and got it working but the fact that this can be an issues shows the problems of the current layout.

@alexanderankin
Copy link
Member

hm, I supposed in a perfect world we would be able to run ci on all projects sometimes but it was a conscious decision to cut down and isolate ci runs of the individual modules (isolation helps also with dependency management)

in .NET the tooling situation is much better than python so I would say this is why the situation differs.

I have reproduced the problem and am going to fix it, but I will point out that we have a makefile that species how we run tests, so you can see there how we run the tests and it is not all in a row.

@CarliJoy
Copy link
Contributor Author

I might not understand the actions and poetry correctly but I don't see how the "isolation" is related to structure of the project.

The suggested structure would be something like

src
├─ testcontainers
│ ├─ core
│ │ ├─ auth.py
│ │ ├─ config.py
│ │ ├─ …
│ ├─ compose
│ ├─ aws
│ ├─ …
tests
├─ conftest.py
├─ core
│ ├─ test_auth.py
│ ├─ test_conifg.py
├─ compose
│ ├─ test_compose.py
├─ aws
│ ├─ test_aws.py
├─ …

You only would need to adopt the paths inside the CI to keep the same functionality as now.
Running pytest tests/core instead of pytest core/tests.
Only difference is that in ci-community in modules=$(echo "${{ steps.changed-files.outputs.all_changed_files }}" | jq '.[] | split("/") | first' | jq -s -c '. | unique') we might need to account for removing core to not test it twice.

Also in the end we build a monolithic package so I don't see that there are "different projects".
As also noted above: If you need a make file to run tests this is clearly an indicator that the project setup is too complex.

➡️ So what is advantage of the current layout?

Sidenote

If you want to isolate the "core" from "community" modules it might be better to express this with a different package structure, something like

src
├─ testcontainers
│ ├─ auth.py
│ ├─ compose.py
│ ├─ config.py
├─ testcontainercontrib
│ ├─ aws
│ ├─ …

But as a monolithic package is released this make not a lot of sense.

@alexanderankin
Copy link
Member

the hope is that one day it is no longer a monolithic package i think - lowers the barrier to parallel community efforts that does not cost testcontainers org maintainer hours

@CarliJoy
Copy link
Contributor Author

FYI the commit you made failed in the pipeline.

I understand that goal but still even if the goal is not having a monolithic package changing the layout won't prevent it.
It even helps as indicated by the missing __init__.py in src/testcontainers that testcontainers is a PEP 420 namespace package.

@alexanderankin
Copy link
Member

yes, the pipeline failed because the cosmos emulator is not super reliable but i did not change any code, so if those tests passed before, its not something preventable by me today.

I'll leave this open for discussion in case I am wrong about this above, the situation may have changed, etc... I have yet to look at the dotnet implementation, etc.

@CarliJoy
Copy link
Contributor Author

Btw. make tests does not work for me:

Image

For some reason it detects README.md as package?

@CarliJoy
Copy link
Contributor Author

Another reason why it might be nice to change to the src layout: It is easier to create patches.
For our local conda build of testcontainers I just was about to create a patch that includes the changes of #714.

In a source layout I could simply do git diff main src > testcontainers.patch but here I need to be aware that only module files are part of the .tar.gz file, so I ended up in

git diff core/testcontainers > testcontainers.patch

If there would have been changes (not only to tests) also to modules I would have needed to include each module subfolder manually. Being very careful not include changes of tests or docs etc.

@kiview
Copy link
Member

kiview commented Oct 25, 2024

Hi @CarliJoy, thanks for initiating this discussion. I'd like to chime in from the perspective of the overarching Testcontainers Project leadership. However, I am not an experienced Python developer at all, so I can't really have a strong opinion on the details of the decision and what makes most sense for a Python project.

Please note that Testcontainers for .NET is currently in the same category of official/community as is Testcontainers for Python (even if the Testcontainers org README says otherwise), since it only has community maintainers (namely Andre Hofmeister) working on it. The difference here is, that it used to be commercially supported while we were still operating through AtomicJar. The situation is different after the Docker acquisition, which leaves us only with Testcontainers for Java and Testcontainers for Go maintainers on staff. Hence, it is not useful, to further use the .NET implementation as an example for a reference implementation, Java and Go are much better suited for this. And independent of this, there is always a historic context around past decisions and the status quo of the different languages, so a 1:1 comparison is not always helpful.

Ultimately, the goal for every Testcontainers project (which is based on our extensive experience developing and maintaining Testcontainers for Java) should be to have independently testable and releasable modules per technology, and in addition a minimal core module, that is free of these external technology dependencies. I have no opinion or guidance with regards to which folder structure is well suited to achieve this, since it depends on programming language and build system.

However, another critical aspect we want to ensure across languages, is minimizing CI load, since CI resources are shares across the GitHub organization. There can again different technical means and language dependent mechanisms to achieve this, but the desired goal is to avoid running CI for modules, that are not affected by changes.

Lastly, I leave the decision on how to proceed with this to the current maintainers and given the nature of a community supported project, I can entirely understand if there is a lack of bandwidth to currently support a bigger migration effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants