Discuss switching to a monorepo #192
Comments
@WordPress/openverse-developers can we discuss this async this week?
Some initial thoughts about the cons.
We got some feedback from a community contributor that the current multiple repository approach is also confusing. A single repository with a single README that links out to the relevant sub-projects would make it easier to direct devs to the project without having to know exactly what kind of stuff they're looking for (JavaScript? Python but not Airflow? Python but Data stuff? etc).
The size of dependencies for any of the repositories vastly outweighs the size of the codebases themselves, doesn't it? For Airflow and the API, the various Docker image dependencies are huge (several hundred MBs). A monorepo does significantly increase the repository size from a statistical perspective, though, and probably from a practical perspective in some situations; after all, we'd basically be making a single repository out of three small-to-medium sized projects and one very small one (
and
These seem less like "cons" to me and more like roadblocks for implementation. Once the problem is solved, it's solved forever (or at least until we need to make other large code infrastructure changes). It's not a long-term cost that we will continue to pay once monorepo-ization is finished. The pros...
This would be amazing, especially sharing it with production. I think @rbadillap's ongoing work in the API to deploy it with docker will reveal whether this is a reasonable possibility. I'm in favor of this change, but I don't want to trivialize the repository size considerations. There are things we can do to make the repository size more efficient and we can trim certain unused things from the history.
Completely agree with everything you've said @sarayourfriend. I just talked to @dhruvkb about this a minute ago and we raised some similar points. Ongoing efforts to clean up dependencies will probably better serve users from the perspective of download sizes, and currently dealing with 5+ discrete Openverse repositories is potentially much more confusing than one larger repo. We also discussed the idea of The last thing we mentioned: the API is currently essentially a monorepo itself, and eliminating that prior to making Openverse a monorepo should make things even simpler.
I'm all for this, and I love the points y'all have brought up. I'll also mention that another project I've worked on previously made a similar move and also had an RFC for their proposal: amundsen-io/rfcs#31
I have some questions about this move, the main one being that it is not clear to me what the need for changing to a monorepo really is.
Pros
How would that look? I guess this is related to the next point and a bunch of new
Does this make deployments easier or more complicated? I'm not sure so I'd love the opinion of @rbadillap here.
IMO the linking feature of GitHub (<organization_or_user>/#<id_of_issue_or_pr>) works pretty well for this.
Fair! But I'm not sure if it is worth the move.
What is this needed for? Do we want to check full cycles in every PR? That sounds like a heavy, large process that is prone to fail often, when dividing the system into parts is much easier to debug.
Cons
From @sarayourfriend:
Is it an issue with the multiple repositories per se, or more with lacking documentation? To me it looks like we can already do that in this repo by linking to the rest. Honestly, I don't see the advantage in merging into a monorepo instead of adding explanations in the docs (which will be required anyway).
From @zackkrida:
What do you mean here? As we are ditching the analytics server until we find a better solution, only the ingestion-server will remain as the extra service. That makes me think it probably makes more sense to try a monorepo between the API and the Catalog first. That might be a good proof of concept for a final big Openverse monorepo.
I love all the pros of a monorepo, and would be 100% on board if there was a consensus to move ahead with it. I agree with @krysal about merging the two Pythonic repositories first and, once the issues are ironed out, merging the frontend with it too, instead of going all in. But for the sake of a balanced argument, here are some of my concerns against the move (in addition to @krysal's points above).
Notifications
Here's a snippet of what my notifications look like right now. If Openverse were a monorepo, the entire section would be a huge mix of issues from every part of the stack.
Documentation
We're currently not in a very developed/stable position with documentation. A monorepo can definitely exacerbate the problem to a whole new level. Combined documentation would be difficult to organise for the documenter and hard to navigate for the reader.
Separation
We move very fast. Given the rate at which we open, close and comment on issues and PRs, the issue and PR tabs will always be in a stormy state. I'm pretty sure I'll not be able to find any issue I'm looking for without searching and filtering by labels. Not glanceable, as it mostly is right now.
Faux pros
Also, I don't agree with some of the pros:
Do we really want this? Wouldn't it be better to focus on a part of it without running all the unrelated containers? I can see some utility here but most of the time, I'd be running the API on sample data or the frontend outside of Docker. Running all these containers all the time would be a power/RAM hog while providing limited utility.
Such a pipeline, though useful, could take very long and be very fragile. To speed it up we might skip steps based on the location of the changes, but then that'll be a very complicated workflow. Fun? Yes. Challenge? Also yes.
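To make the "skip steps based on the location of the changes" idea concrete, here is a minimal Python sketch of the selection logic. It is only an illustration under stated assumptions, not the project's actual workflow: the top-level directory names (`api`, `catalog`, `frontend`), the job names, and the rule that any file outside those directories triggers every job are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical monorepo layout: top-level directory -> CI jobs it requires.
JOBS_BY_DIRECTORY = {
    "api": {"api-tests"},
    "catalog": {"catalog-tests"},
    "frontend": {"frontend-tests"},
}

# Assumption: anything outside the known directories (for example shared
# config at the repository root) is treated as affecting every job.
ALL_JOBS = set().union(*JOBS_BY_DIRECTORY.values())


def jobs_for_changes(changed_files):
    """Return the set of CI jobs to run for a list of changed file paths."""
    jobs = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # A file directly at the repository root has no top-level directory.
        top_level = parts[0] if len(parts) > 1 else None
        if top_level in JOBS_BY_DIRECTORY:
            jobs |= JOBS_BY_DIRECTORY[top_level]
        else:
            # Unknown location: be conservative and run everything.
            return set(ALL_JOBS)
    return jobs
```

Even this small version shows where the complexity comes from: every shared file or cross-project dependency needs its own rule, and a missing rule silently skips tests unless the fallback is conservative.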
Not really a pro, the location of the board isn't an issue because bookmarks exist 😄 Withdrawn in light of #199 (review).
We can do a better job of documenting the different repos in the
No actually, the CODEOWNERS file supports per-directory owners, which would work great! https://satellytes.com/blog/monorepo-codeowner-github-enterprise/
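As a rough illustration of how per-directory ownership resolves, here is a simplified Python model. It assumes plain directory prefixes only, whereas a real CODEOWNERS file uses gitignore-style patterns; the one behaviour it does mirror is that the last matching rule takes precedence. The team names and directory layout are hypothetical.

```python
# Hypothetical per-directory rules, in the order they would be listed.
# The team names are made up for illustration.
RULES = [
    ("/", ["@example-org/maintainers"]),  # catch-all fallback
    ("/api/", ["@example-org/api-team"]),
    ("/frontend/", ["@example-org/frontend-team"]),
]


def owners_for(path, rules=RULES):
    """Return the owners of a repo-relative path; the last matching rule wins."""
    normalized = "/" + path.lstrip("/")
    owners = []
    for prefix, rule_owners in rules:
        # Simplification: match by directory prefix rather than glob pattern.
        if normalized.startswith(prefix):
            owners = rule_owners
    return owners
```

With rules like these, a change under `api/` would request review only from the API owners, while files outside any listed directory fall back to the catch-all owners.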
Closing in favor of #205. It's clear we have enough interest to move forward on this RFC, and follow-up on the points raised here can be addressed in the RFC.
We likely need an RFC to evaluate the pros and cons of monorepo support. To help ease the creation of such an RFC, I think it would be first wise to discuss the pros and cons of moving to one. I'll kick things off with some quick ideas:
Pros
Cons