Separate monorepos (e.g. OS and OSD plugins) to different repositories #2188

Closed
3 of 5 tasks
peterzhuamazon opened this issue Jun 10, 2022 · 40 comments
Labels
campaign Parent issues of OpenSearch release campaigns. v2.5.0 'Issues and PRs related to version v2.5.0'

Comments

@peterzhuamazon
Member

peterzhuamazon commented Jun 10, 2022

As of now, plugins such as SQL, Reports, and Notifications use the same git repo for both their OS and OSD plugins.
This causes a lot of discrepancies between release tags and commit ids.

In 2.0.0, we built the OS and OSD artifacts at the same time.
During sanity testing, OSD found some bugs in reports, so we rebuilt OSD.
This means the reports BE in OS and the reports FE in OSD use different commit ids.
However, since they are both in the same repo, you can only tag 2.0.0 with one commit id.
We are forced to use the newer commit id in the tag.
This means anyone who checks out the 2.0.0 tag in the sql repo will get a different commit id than the one in the manifest.yml file bundled in the OS tarball.
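
To make the discrepancy concrete, here is a minimal sketch in Python. The repository name, component names, and commit ids are made up, and the dict layout is a simplified assumption rather than the real manifest.yml schema.

```python
# Hypothetical sketch: all names, ids, and data shapes are assumptions for illustration.

def find_tag_mismatches(tag_commits, manifest_components):
    """Return (name, manifest_commit, tagged_commit) for every mismatching component.

    tag_commits: maps a repository name to the commit id its release tag points at.
    manifest_components: list of dicts with 'name', 'repository', and 'commit_id',
    a simplified stand-in for the entries of a bundled manifest.
    """
    mismatches = []
    for component in manifest_components:
        tagged = tag_commits.get(component["repository"])
        if tagged is not None and tagged != component["commit_id"]:
            mismatches.append((component["name"], component["commit_id"], tagged))
    return mismatches


# The backend was built from aaa111, the frontend was rebuilt later from bbb222,
# and the single 2.0.0 tag in the shared repo points at the newer commit.
tag_commits = {"reports": "bbb222"}
os_manifest = [{"name": "reports-backend", "repository": "reports", "commit_id": "aaa111"}]
osd_manifest = [{"name": "reports-frontend", "repository": "reports", "commit_id": "bbb222"}]

print(find_tag_mismatches(tag_commits, os_manifest))
print(find_tag_mismatches(tag_commits, osd_manifest))
```

With one tag per repo, at most one of the two manifests can agree with the tag whenever the two products were built from different commits.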

We need to start the discussion on the potential separation of these plugins into different repositories.

@bbarani
Member

bbarani commented Jun 15, 2022

@anirudha @praveensameneni Please make a note of this campaign as it affects your repositories.

CC: @CEHENKLE @xinlamzn

@anirudha
Contributor

This won't be resourced for 2.1. Let's review the details in more depth.

@anirudha
Contributor

@peterzhuamazon "This cause a mass dependencies in tags commit id differences."
-> Can you better explain the problem you are trying to solve?

@anirudha
Contributor

anirudha commented Jun 21, 2022

If the requirement is that a tag should not change after it is associated with a commit id, i.e. a tag is supposed to be treated as an immutable identifier for the code, then the build system must tag only after all release activities are completed.

Why is the build system tagging repos with a release tag before we finalize the release? If there is a bug after this, we can cut a patch version, is that right?

Ideally, I would do all release activities on a branch, and the tag is then the final stamp of immutability. After a tag is cut, every bug fix should be a new patch version.

Moreover, you may also use tag formats, and the ability to run actions based on tag format, to come up with a solution that best solves this problem for the build system.
GitHub Actions supports executing different actions based on the format of a tag.
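
As a rough illustration of routing on tag format, here is a small Python sketch. It is not GitHub Actions configuration; the tag suffixes and action names are hypothetical and only show the idea of matching a tag against glob-style patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical tag formats and action names, chosen only for illustration.
TAG_ROUTES = [
    ("*-frontend", "release-frontend"),
    ("*-backend", "release-backend"),
]


def route_tag(tag):
    """Pick an action name based on the format of the tag."""
    for pattern, action in TAG_ROUTES:
        if fnmatchcase(tag, pattern):
            return action
    # A tag without a recognized suffix applies to the whole repo.
    return "release-all"


for tag in ("2.0.0.0-backend", "2.0.0.0-frontend", "2.0.0.0"):
    print(tag, "->", route_tag(tag))
```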

@anirudha
Contributor

anirudha commented Jun 21, 2022

As a developer, I would like to have a single place to build, run unit tests and ITs, and validate all components of the ecosystem. A UI without a backend is meaningless as a release artifact.

As a user and a developer, I would like to have a single place to develop, file issues and PRs, and run pre-commit checks.

As a maintainer, I would like to easily maintain projects in 1 repo rather than 6 repos for SQL, 2 for Observability, and 2 per plugin.

As a maintainer, user, and developer, simplicity is the key to the long-term sustainability and maintainability of the project.

This problem can be technically solved by the build system; monorepos are commonplace in the dev ecosystem.

@qreshi
Contributor

qreshi commented Jun 21, 2022

Are we trending towards making this a soft rule that having these in a single repo can't happen?

I'm not necessarily heavily in favor of one repo format over another but there are some tradeoffs between the two, some of which Ani mentioned above.

I think what's important is that our long-term goal for OpenSearch from a maintainer perspective (as I originally remembered it, please correct me if this changed) is to not be just a single distribution, but a project where plugin owners can follow what they deem most effective for their repo and even manage the cadence of their own releases. However, standardizing things like repo contents and structure for infra seems to go in the opposite direction. I understand these decisions are so we can move and release faster in the short term while we're still one big distribution, but we might want to think about what this looks like when we do split. We'd want to keep our mechanisms flexible to ease us into that eventuality.

On the other hand, if that isn't what we want, we can lean fully into all plugins being the same and do things like a mono-repo and lose the overhead of the build scripts and version bump/gradle changes, etc. that is scaling with each new repo being added. The long term plan should be clear for repo owners though so they have the context to recommend what they think is best from an implementation perspective.

@bbarani bbarani removed the bug Something isn't working label Jun 21, 2022
@bbarani
Member

bbarani commented Jun 21, 2022

If the requirement is that a tag should not change after it is associated with a commit id, i.e. a tag is supposed to be treated as an immutable identifier for the code, then the build system must tag only after all release activities are completed.

Why is the build system tagging repos with a release tag before we finalize the release? If there is a bug after this, we can cut a patch version, is that right?

Ideally, I would do all release activities on a branch, and the tag is then the final stamp of immutability. After a tag is cut, every bug fix should be a new patch version.

Moreover, you may also use tag formats, and the ability to run actions based on tag format, to come up with a solution that best solves this problem for the build system. GitHub Actions supports executing different actions based on the format of a tag.

@anirudha I am not sure I understand your question. The real issue is that tags are cut based on the latest commit of the branch used by a build, which makes it complicated to differentiate between the OSD and OS commit ids since the code lives in the same repo. We cut the tags based on the latest commit id in the repo used by the build, but that's not the right thing to do, since that commit id might correspond to either Dashboards or OpenSearch changes, rendering the tag almost useless in this scenario.

CC: @dblock @peterzhuamazon @qreshi

@dblock
Member

dblock commented Jun 21, 2022

The tl;dr is that we release 2 products: OpenSearch and OpenSearch Dashboards from the same repo, not at the same time. For example, a patch may be released for OpenSearch 2.0.1 at time X, then another patch will be released for OpenSearch Dashboards 2.0.1 at time X+1. What should the 2.0.1 tag be in this case for dashboards-reporting?

@CEHENKLE
Member

My goal is to move us to a point where we can release all of the different components of the OpenSearch project separately -- Not just Dashboards and OpenSearch separately, but each plugin individually as well. We have a lot of water to get under a lot of bridges before that can happen (primarily around the dependency management described here opensearch-project/OpenSearch#2447) but I see the work described in this ticket as directional. And as dB points out, it's already solving a pain point we have today.

@anirudha, @praveensameneni When can you commit to doing this work?

@joshuali925
Member

Is this to fix a technical limitation or to make it better conceptually?

However, since they are both in the same repo, you can only tag 2.0.0 with one commit id.
We are forced to use the newer commit id in the tag.

@peterzhuamazon Not sure if I understood, but this sounds like a technical limitation on the infra side: if OS and OSD plugins are in the same repo, then infra can only use one commit id for both plugins. From what I understand, the manifests for OS and OSD are separate and they take refs (i.e. branches, tags, commits), so I don't see where the limitation is coming from.
On our side we don't tag 2.0.0 in plugins. We tag 2.0.0.0, and a patch fix bumps it to 2.0.0.1 (not the current process, but it should be the correct way).


The tl;dr is that we release 2 products: OpenSearch and OpenSearch Dashboards from the same repo, not at the same time. For example, a patch may be released for OpenSearch 2.0.1 at time X, then another patch will be released for OpenSearch Dashboards 2.0.1 at time X+1. What should the 2.0.1 tag be in this case for dashboards-reporting?

@dblock I understood this. From how I see it we are releasing reporting frontend and backend together:

  1. OpenSearch 2.0.1 is released with reporting backend 2.0.1.0 and tag 2.0.1.0
  2. Dashboards 2.0.1 needs to be released, we bump reporting frontend to 2.0.1.0 and bump tag to 2.0.1.1, because this is the second 2.0.1.x release of reporting plugins

Changes between tags should be covered by release notes. If possible, we can make tags more specific (e.g. 2.0.1.0-backend).
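
As a rough sketch of the bump rule in steps 1 and 2 (Python; the function name `next_plugin_tag` is hypothetical and this is not existing infra code):

```python
def next_plugin_tag(existing_tags, base_version):
    """Return the next four-part plugin tag for a three-part base version.

    The first release of base_version gets '<base_version>.0'; each later
    release of the same base version increments the fourth number.
    """
    prefix = base_version + "."
    used = []
    for tag in existing_tags:
        if tag.startswith(prefix):
            suffix = tag[len(prefix):]
            if suffix.isdigit():
                used.append(int(suffix))
    next_number = max(used) + 1 if used else 0
    return prefix + str(next_number)


# Step 1: OpenSearch 2.0.1 ships first, so the repo gets 2.0.1.0.
first = next_plugin_tag(["2.0.0.0"], "2.0.1")
# Step 2: Dashboards 2.0.1 ships later from the same repo, so the tag becomes 2.0.1.1.
second = next_plugin_tag(["2.0.0.0", first], "2.0.1")
print(first, second)
```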

@qreshi
Contributor

qreshi commented Jun 21, 2022

@CEHENKLE @dblock I understand the request but tags are a repo concept and it seems we're scoping them to be 1 to 1 for each artifact. Does that mean we're asking all component owners that they should have a repo per artifact regardless of scope? For example Notifications backend is technically one holistic component but has the core and general plugin producing two artifacts.

My short-term reservation about increasing the number of repos is that we currently have infra overhead that comes with each of them, such as the bumping of versions, the way we're handling build scripts, all of the GitHub Actions that are not defined in a single place yet, etc. Just the alpha -> rc1 bump was quite a bit of work. Since plugins have to match up to the patch version anyway, I recall some suggestions for a Gradle plugin that plugins could consume to get the version/qualifier info for the distributions (based on the version of OS/OSD being used), with an option to override. Is this being prioritized at the same level as asking component owners to take on more repos?

@praveensameneni
Member

praveensameneni commented Jun 21, 2022

My goal is to move us to a point where we can release all of the different components of the OpenSearch project separately -

A further reason for plugins to have a single repo is so they can be autonomous in their releases.

We started with two repos, and as we grew and learnt a thing or two along the way, the team settled on one repo (combining backend and frontend), making it easier to maintain and faster to deploy.

@dblock
Member

dblock commented Jun 22, 2022

@dblock I understood this. From how I see it we are releasing reporting frontend and backend together:

  1. OpenSearch 2.0.1 is released with reporting backend 2.0.1.0 and tag 2.0.1.0
  2. Dashboards 2.0.1 needs to be released, we bump reporting frontend to 2.0.1.0 and bump tag to 2.0.1.1, because this is the second 2.0.1.x release of reporting plugins

So in (2) you re-release a second version 2.0.1.0, but only for some of the code, and tag it as 2.0.1.1? What are you going to do when it's OpenSearch 3.0 and OpenSearch Dashboards 8.0? How is anyone supposed to make sense of this?

@dblock
Member

dblock commented Jun 22, 2022

We are forced to use the newer commit id in the tag.

@peterzhuamazon Not sure if I understood, but this sounds like a technical limitation on the infra side: if OS and OSD plugins are in the same repo, then infra can only use one commit id for both plugins. From what I understand, the manifests for OS and OSD are separate and they take refs (i.e. branches, tags, commits), so I don't see where the limitation is coming from.

There's no "limitation". Manifests support refs and many other things. However, the infra automation that creates tags post release (so you don't have to) is currently not able to guess that when it releases OpenSearch Dashboards 2.0 it should be tagging your plugin as 2.0.0.1, and not 2.0.0.0 like it does for the other 12 plugins, because you're a special case.

I think there are 3 choices:

  1. Split the 4 repos as proposed in this issue.
  2. Adjust infra automation to guess what tag to create when there's already an existing tag. Make sure to use that tag in the manifest. Document for anyone who looks at the 4 repos that 2.0.0.0 is OpenSearch 2.0, but 2.0.0.1 is Dashboards 2.0.0.0.
  3. Continue tagging and updating manifests post release manually. Document as in (2).

What do you think is best @praveensameneni @qreshi @joshuali925 ?

@dblock
Member

dblock commented Jun 22, 2022

@qreshi On the 1 repo = 1 component, yes that is generally my personal preference. Repos are cheap. Having completely different CIs, languages, tools in the same repo burdens the developers in having to often install unnecessary pre-requisites, having to understand a much larger codebase, etc.

@anirudha
Contributor

My goal is to move us to a point where we can release all of the different components of the OpenSearch project separately -

We can do this without forcing teams to maintain multiple repos. The currently proposed solution creates a lot of issues for us, as described here:
#2188 (comment)

@CEHENKLE
Member

@anirudha @praveensameneni @xinlamzn Can you do a deeper dive on tagging options w/pros and cons and report back here?

@peterzhuamazon
Member Author

peterzhuamazon commented Jul 19, 2022

Hi All,

20220719 notes:

  1. We are thinking about tagging OS and OSD components with separate tags.
  2. Starting with v3.x, we will tag all backend components with <version>-opensearch and all frontend components with <version>-opensearch-dashboards. This includes both the split repos and the combined repos, core and components.
  3. As for v1.x and v2.x, we will keep using 1 tag <version> for both components. We will work with component owners to make sure their changes are delivered in time to avoid commit ID discrepancies.
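
A minimal sketch of the naming rule in points 2 and 3, in Python. The function name `release_tag` is hypothetical and this is not the actual build tooling:

```python
PRODUCTS = ("opensearch", "opensearch-dashboards")


def release_tag(version, product):
    """Build the tag name for a component release.

    version: dotted version string such as '3.0.0'.
    product: 'opensearch' for backend components,
             'opensearch-dashboards' for frontend components.
    """
    if product not in PRODUCTS:
        raise ValueError(f"unknown product: {product}")
    major = int(version.split(".")[0])
    if major >= 3:
        # v3.x and later: separate tags per product.
        return f"{version}-{product}"
    # v1.x and v2.x: one shared tag for both components.
    return version


print(release_tag("3.0.0", "opensearch"))
print(release_tag("3.0.0", "opensearch-dashboards"))
print(release_tag("2.1.0", "opensearch-dashboards"))
```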

Thanks.

@dblock dblock added v2.5.0 'Issues and PRs related to version v2.5.0' and removed v2.1.0 v3.0.0 labels Sep 27, 2022
@MaxKsyunz
Contributor

@dblock Do we have a decision for this issue? I would like to understand the next steps so we can be better prepared from build tooling perspective for 3.x releases.

Yes. We decided that we will split repos for frontend and backend plugins. The current plugin owners with joint repos (@anirudha and @praveensameneni) will start the work to split repos after OpenSearch 2.4 release, and target completion by 2.5. The plugin teams will figure out a way to run integration tests at PR level for both frontend and backend components.

Where will JDBC/ODBC drivers, opensearchsql-cli, and Tableau/PowerBI connectors go?

@dblock
Member

dblock commented Oct 24, 2022

@MaxKsyunz The intention of this issue was to split front-end and back-end plugins, for those other things I'd defer to the team that understands the code dependencies better. Maybe open another issue on those? How often do they release and what are their dependencies?

@dblock
Member

dblock commented Nov 28, 2022

I've added #2188 for dashboards-maps.

@prudhvigodithi
Member

One more change required by this separation concerns the Gradle updateVersion task (the task that auto-increments to the next development iteration): the references to package.json files that update the version number should be removed. After the separation, backend plugins will no longer have package.json files to increment, so those files should be excluded from the updateVersion task.
Example: for reporting, after the dashboards plugin is separated out, the following lines, which increment the version for the dashboards plugin, need to be excluded.

@dblock dblock changed the title Separate OS and OSD plugins to different repositories Separate monorepos (e.g. OS and OSD plugins) to different repositories Jan 11, 2023
@penghuo
Contributor

penghuo commented Jan 13, 2023

Adding existing SQL plugin issues; all of these can be categorized as release process issues:

  1. [BUG] SQL Plugin main branch BWC test faiiled OpenSearch#5815
  2. [AUTO] Increment version to 2.6.0-SNAPSHOT sql#1248
  3. [AUTO] Increment version to 2.4.2-SNAPSHOT sql#1175

As I commented, integrating OpenSearch plugins into the OpenSearch repo could solve all these problems automatically.

The pros are:

  1. Plugin owners do not need to bump the OpenSearch version. A plugin could depend directly on the version defined in the same repo.
  2. Plugin owners do not need to change the OpenSearch build manifest. The release team changes the manifest once per release.
  3. Plugin owners do not need to cut a new branch. The release team cuts the branch once per release.
  4. Plugin BWC tests on main could be executed for a PR directly against the 2.x branch without waiting for a snapshot build. This solves [BUG] SQL Plugin main branch BWC test faiiled OpenSearch#5815.
  5. A plugin could depend directly on another plugin at the project level instead of through a Maven distribution. This solves [AUTO] Increment version to 2.6.0-SNAPSHOT sql#1248 and [AUTO] Increment version to 2.4.2-SNAPSHOT sql#1175.

The cons are:
Not very clear yet. From what I have explored, Apache Spark, Apache Flink, and Trino all use the monorepo approach.

@bbarani
Member

bbarani commented Jan 30, 2023

Closing this issue as we have completed splitting the OpenSearch and OpenSearch Dashboards monolithic repos into individual repos.
