-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue with the size of the API surface of the models package #129
Comments
Hi @breml This size and tooling performance aspect is something we're aware of and working to improve. I'll share a lengthy reply about the context and our efforts here. ContextThis SDK is automatically generated from metadata using kiota. The original metadata is a CSDL populated by all the service teams that live under Microsoft Graph (one for v1, one for beta). The CSDL eventually gets converted to an OpenAPI description which is quite large (1k+ endpoints, 1.5k models....). Because of the size of the API, it would not be feasible to handcraft the SDK for the full API surface. We've thought for while about "slicing" the SDK in multiple submodules (one for files, one for emails etc...) to make it easier for people to get only the thing they care about. In fact this is what we've done with PowerShell. But due to the nature of a "graph" (all the models are somewhat related to one another) and the diversity of applications that get built, the slicing is never "right" for anyone (either too large, or too small, or leads to duplication of models...) For those reasons we've decided to ship "full SDKs" as one of our offerings. It might not be ideal for everybody, but we feel like there's a population of Go developers that "just want and SDK to build applications" (as opposed to the second option I'll describe below). Go shortcomingsThrough my explorations of Go, I've noticed a few shortcomings. At this point I'm not sure whether this is because our project/packages are not setup properly or because of limitations for Go and large projects:
To me it feels like the cost of building a large project with lots of sub-packages should only be paid "once per session" unless dependencies are updated, cache is evicted or code changes. Please feel free to outline any optimizations in our project configuration/structure that could help with that front. And if there are ways to engage with the Go community (people building the go compiler etc...) to provide that kind of feedback, I'll be more than happy to spend time doing so. I know our project is kind of an odd duck compared to most go packages out there due to its size. Improvements to comeWe already have two improvements queued up to reduce the amount of code we generate at the first place:
Please go ahead and review those, and if you find other patterns in the generated code that are redundant/could be replaced by something smaller, don't hesitate to mention it on those issues. Right size sdksLastly, we recognize that having a full SDK with all the endpoints is not going to be suitable for everybody for a variety of reasons. We're working to enable a new "right-sized self-service SDK experience" where any API user could generate an SDK that looks and feels just like this one, but with only the endpoints/models they care about for their application instead of the full API surface.
At this point we're working to document all those steps, and streamline them (maybe compressing steps 4-5). The great thing about this approach is that steps 5 through 7 can work with any OpenAPI described API you want to call, not just Microsoft Graph. I hope this lengthy post clarifies where we're heading and getting some more feedback from the Go community on all those aspects would really help! |
Hi @baywet Thanks for your detailed feedback. I can not go into all the details you mentioned due to lack of time, but I still want to share some of my thoughts with you:
Finally, I would be happy to help you with the improvement of the Go SDK, but this is not something, that I can do in my spare time nor as part of my current day job. |
Interesting, I might reach out to the team to understand what's going on, the documentation for the root package seems to be working properly https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go#GraphServiceClient.Admin
Thanks for sharing this! I want us to do our homework (improvements I outlined) before we share our issues with that audience.
Understood, we've noticed the behaviour in both situations (GH actions and local dev loop).
Shouldn't the linter only look at your code and not dependent packages? Additionally, shouldn't the compiler treeshake/trim unused types?
Noted, reducing the amount of code we're generating shouldn't hurt anyway.
How do Go devs typically handle GraphQL APIs then?
Do you have specifics about patterns that don't feel idiomatic and what API we could offer instead?
To be specific I'm not asking for you to contribute (PRs), but to keep providing feedback like you have so far. This is instrumental to us correcting course before the general availability of the language, thanks a lot for the feedback you've already provided! |
My assumption is, that there is some sort of hard limit on https://pkg.go.dev, that prevents the documentation from rendering. Some quick stats:
A quick search showed me https://github.com/golang/pkgsite/blob/master/internal/godoc/dochtml/dochtml.go#L92-L95, which defines a limit of 10 MB for the resulting documentation. Locally it is possible to render the documentation (with the official
As you wish. In order to just get some pointers about where to look, I would still recommend to reach out to the
I am not entirely sure about the internal working of these tools, but almost all of them depend on https://pkg.go.dev/golang.org/x/tools/go/analysis for the analysis of the Go code. I do not know, if and to what extend treeshake or trim is used in this process and if these techniques could speed up the process. But because Go is a statically typed language, I expect that during the process of linting and compiling, all the types must be known (not only of the code that is linted, but also from all its dependencies).
I am not a frequent user of GraphQL API. I am aware of the fact, that the dynamic nature of GraphQL can make it tricky with Go. One solution, that I think is worth looking at is: https://github.com/Khan/genqlient That being said, the Microsoft Graph, at least in my understanding, is not a GraphQL API but a highly inter-weaved REST API.
This would take some more time, maybe I can followup on this during the week.
|
just tried a
30 minutes on a 8 core Intel Core i9 with 16 GB RAM 😳 docker multiarch builds (arm, arm64, amd64) takes around 2:30:00 on GitHub and is nearly reaching the timeout limit. |
With |
Again, at this point we're really focusing on API coverage and reliability more than build time performance due to a shortage of staff. If you're watching the repo, you'll see that we get more issues about missing API segments or serialization than anything else for the time being. That being said, I've exposed our short-terms plans on this thread (use of generics to reduce the amount of code we generate) and long-terms plans (ability to generate your own right sized, subset SDK), both of which are still on the roadmap before GA. Now, I'm trying to understand why you're running go get -u in your CI? And why do you have to run it for every architecture instead of once for all the architectures, and then a go build? FYI the build of the SDK takes about ~7 minutes with the GitHub actions agent For more insights on our plans, you can watch the brand new Microsoft Build recordings for free Deep dive into Microsoft Graph SDKs |
I haven't given this a deeper look, so these are drive-by comments, but I wanted to provide some general clarifications.
These are largely not true.
At a minimum,
I'll assume you mean All that is to say that if you see seemingly spurious rebuilds, it is worth investigating why these rebuilds are happening; they're not a fundamental limitation of the Go build system. |
Thanks for chiming in @dominikh ! As mentioned previously I'd like for us to investigate the performance improvements first (amount of code we generate, generics...) before engaging with the Go "core" community. That being said, if you notice something that's obviously wrong in our repos, don't hesitate to point it out. I'm often working with "replace" (local links to the dependency tree) and it seems that any change in the package will "mark it as stale" and rebuild the whole tree of dependent packages, which might have impacted my perception. (other ecosystems have a more fine grained tracking of those things, and are able to track these things at a class/type level). Would this make sense? Also |
I also can't reproduce the extreme build times reported in this issue.
I've tested a fresh build of this repository on a 16 core 3950X and got this:
And while 43.6 seconds isn't great, it's a far cry from 30 minutes, and especially from 4 hours. Even if you build for multiple GOOS x GOARCH combinations, it would take way too many combinations to reach these times. |
Yes, Go's caching is on a per-package level, not per-class/type. If you change anything about a package, then that invalidates the whole package, and all packages that transitively depend on it.
The compiler in general is modular, compiling one package at a time, and using cached data to reuse the work done for dependencies. There is no practical difference between building each package and building all packages. |
Thanks for the additional information. In a scenario where we're building a console app which depends on this library (no replace, most basic case), should the cost of building this package one be paid once assuming dependencies are not changing? |
@baywet I can confirm, that I observed the same as you are describing and this worries me as well. I had a small conversation on Gopher Slack about this issue (this is why @dominikh chimed in). In this conversation @josharian suggested to file an issue with Go to inspect this behavior in more depth, citation:
Therefore, as mentioned before, I suggest to get in touch with the Go maintainers by filling an issue to
|
Thanks for the suggestion. From a Go community etiquette perspective, do you think it'd be ok to file that issue BEFORE we investigate the use of generics (to reduce the amount of generate code) on our end? |
@baywet Yes, I am pretty sure, that the Go team would like to understand why we see these kind of build times regardless of the fact, if the code uses generics or not. As you might know, generics only landed a few months ago and they are not expected to be completely optimized in regards to performance. In fact, if this issue gets investigated by the Go team, I would even expect them to be interested if a refactoring towards generics improves the situation. |
Yes! Feel free to tell them that I sent you. :) An excellent issue would go something like:
|
That's roughly what I would expect, yes. There is of course still a cost to using such a large package, even if it's cached. The cached export data needs to be loaded and used, the final binary needs to be linked, etc. But one would expect these costs to be much smaller than the cost of compiling the package the first time. So far I've only tested compiling the package itself, not compiling a dependent. It would be interesting to compare the two. |
If the code is minimal, loading import data could in theory end up being just as costly as doing the initial compilation. Particularly if you have to load the import data N times, and compile only once. The compiler importer has gone through roughly three generations: (1) Simple, text-based, and expensive, (2) cheaper binary but still non-lazy, and (3) binary and lazy. With the lazy importer, in theory, the compilation cost of importing a very large package ought to be proportional to the symbols you use from that package. But maybe that's not the case. (Maybe the compiler is reading the entire import data instead of mmap'ing it? Maybe the index of symbols itself is large enough to impact compilation time?) That's the sort of interesting, fixable thing one can learn from cases like this, and why it's worth filing an issue, if you can reliably reproduce and nail down what is slow (without worrying about exactly why). |
Thanks everyone! I posted an issue with a clear repro don't hesitate to jump on there and ask or provide additional information if you think I missed something. |
I left this issue a long time ago, but for some reason forgot to unsubscribe. I feel the pain most users have here so I wanted to share an alternative https://github.com/friedrichg/go-msgraph-example |
Hi everyone,
With those changes in place, we're seeing an additional 20% build time performance improvements. This is the last set of changes we're making on that front since we've exhausted all avenues to improve build times which do not lead to a feature trade-off and since build performance is now acceptable in most development environments. If build time performance or asset size are still challenging for your scenario, you might want to consider kiota the generator that powers this SDK. You'll be able to generate a client with the same development experience, but with only the endpoints your application needs. Since the early days, we're embedded the search experience for API descriptions and the filtering capabilities directly within kiota thanks to your (and others') feedback. In terms of future plans, we're still working to fix our metadata to remove references between models when it actually doesn't apply. We're also working to embed Kiota (or the Microsoft Graph self-serve story) in GUI experiences (nothing to disclose at this time). I'll leave this issue open for another couple of days, and then close it depending on the replies. To finish: The whole team would like to thank all of you on this thread for your feedback and helping shape the Go SDK! You truly had an impact on design decisions, and the final experience. |
We've had a layer over the msgraph-sdk-go, which used some code that was copied directly from the library itself. As per microsoftgraph/msgraph-sdk-go#129 (comment), it seems that this library is now improved enough that it doesn't bloat up our binary size and builds fast This PR removes all the adapter code and bumps the msgraph-sdk to the latest production (0.56) version The rest is mostly changing struct names where needed
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. |
I am no longer using this package, so I am not in a good position to provide meaningful feedback on this, sorry. |
@baywet Thanks for you summary. I understand optimizing this issue is no simple feat. At the company I work at we will have to heavily rely on Graph API and therefore apprechiate your efforts.
If in any case build times will be too high we will consider using Kiota too. Maybe, this fact is important enough to state in the projects readme or documentation... |
Thanks for the suggestion. We have some work to simplify the self-serve story before it's ready to be broadly shared in the docs, but yes, it's part of the plan. |
hi @baywet, first, thanks for the work y'all have put into investigating this issue. Unfortunately, the package is still not really in a usable place for us. From my previous comment, the initial build is down from 7m to 3m on my local machine (better! but still pretty bad for a single dependency). Rebuilds (measuring compiler linking time only) actually looks like it's gone up from 8-12s to 18-20s (baseline 4-6 seconds without this package). I'm using the v0.50.0 beta release. |
Thanks for the numbers. Could you try with 56 or 57, we've implemented additional improvements as I was mentioning earlier |
Sorry 55 in case of beta |
Sorry, I misspoke in my previous comment. These results are with the v0.55.0 beta release. Here's my upgrade diff: - github.com/microsoft/kiota-abstractions-go v0.15.1
- github.com/microsoft/kiota-authentication-azure-go v0.5.0
- github.com/microsoft/kiota-serialization-json-go v0.7.2
- github.com/microsoftgraph/msgraph-beta-sdk-go v0.50.0
- github.com/microsoftgraph/msgraph-sdk-go-core v0.31.0
+ github.com/microsoft/kiota-abstractions-go v0.17.2
+ github.com/microsoft/kiota-authentication-azure-go v0.6.0
+ github.com/microsoft/kiota-serialization-json-go v0.8.2
+ github.com/microsoftgraph/msgraph-beta-sdk-go v0.55.0
+ github.com/microsoftgraph/msgraph-sdk-go-core v0.34.1 |
Hi, We've noticed this is still an issue today. Specifically, the We'd love if there was a supported way to generate a subset of APIs, or to generate this API in such a way that the compiler could more effectively remove the pieces we aren't using (e.g., by more effectively modularizing the API). |
The way, this Go module and especially the models package are constructed, leads to severe issues with the go tooling (and linting).
We have a simple Go application where we added the functionality to send email notifications though the Microsoft Graph API using Mail.Send. So we added the necessary packages, namely:
for this purpose.
The final code did what was expected (sending the mail), but it also made the duration for
go test
to jump from 7s to 8m 50s and the linting withgolangci-lint
to jump from 36s to 7m 13s in our Github Action pipeline.We then changed the Graph API request to a "plain HTTP Post request" using the
net/http
package from the standard library (while still usinggit.luolix.top/Azure/azure-sdk-for-go/sdk/azidentity
andgit.luolix.top/microsoft/kiota-authentication-azure-go
for the authentication) and we are back at the normal times for tests and linting.Additional pointers for the excessive size of the
github.com/microsoftgraph/msgraph-sdk-go/models
package are the official documentation refuses to display the API because it is too large and also Github only shows the first 1000 files when listing the content of said package folder.So in my opinion, the API surface of the package
github.com/microsoftgraph/msgraph-sdk-go/models
is way to broad and this should be refactored massively.The text was updated successfully, but these errors were encountered: