Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with the size of the API surface of the models package #129

Closed
breml opened this issue Apr 13, 2022 · 103 comments
Closed

Issue with the size of the API surface of the models package #129

breml opened this issue Apr 13, 2022 · 103 comments
Assignees
Labels
question Further information is requested

Comments

@breml
Copy link

breml commented Apr 13, 2022

The way, this Go module and especially the models package are constructed, leads to severe issues with the go tooling (and linting).

We have a simple Go application where we added the functionality to send email notifications though the Microsoft Graph API using Mail.Send. So we added the necessary packages, namely:

import (
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	authentication "github.com/microsoft/kiota-authentication-azure-go"
	msgraphsdk "github.com/microsoftgraph/msgraph-sdk-go"
	"github.com/microsoftgraph/msgraph-sdk-go/me/sendmail"
	"github.com/microsoftgraph/msgraph-sdk-go/models"
)

for this purpose.

The final code did what was expected (sending the mail), but it also made the duration for go test to jump from 7s to 8m 50s and the linting with golangci-lint to jump from 36s to 7m 13s in our Github Action pipeline.

We then changed the Graph API request to a "plain HTTP Post request" using the net/http package from the standard library (while still using github.com/Azure/azure-sdk-for-go/sdk/azidentity and github.com/microsoft/kiota-authentication-azure-go for the authentication) and we are back at the normal times for tests and linting.

Additional pointers for the excessive size of the github.com/microsoftgraph/msgraph-sdk-go/models package are the official documentation refuses to display the API because it is too large and also Github only shows the first 1000 files when listing the content of said package folder.

So in my opinion, the API surface of the package github.com/microsoftgraph/msgraph-sdk-go/models is way to broad and this should be refactored massively.

@ghost ghost added the Needs Triage 🔍 label Apr 13, 2022
@breml breml changed the title Issue with the size of the models package Issue with the size of the API surface of the models package Apr 13, 2022
@baywet baywet self-assigned this Apr 13, 2022
@baywet baywet added question Further information is requested Needs author feedback and removed Needs Triage 🔍 labels Apr 13, 2022
@baywet
Copy link
Member

baywet commented Apr 13, 2022

Hi @breml
Thanks for trying the Go SDK and for the detailed feedback.

This size and tooling performance aspect is something we're aware of and working to improve. I'll share a lengthy reply about the context and our efforts here.

Context

This SDK is automatically generated from metadata using kiota. The original metadata is a CSDL populated by all the service teams that live under Microsoft Graph (one for v1, one for beta). The CSDL eventually gets converted to an OpenAPI description which is quite large (1k+ endpoints, 1.5k models....). Because of the size of the API, it would not be feasible to handcraft the SDK for the full API surface.

We've thought for while about "slicing" the SDK in multiple submodules (one for files, one for emails etc...) to make it easier for people to get only the thing they care about. In fact this is what we've done with PowerShell. But due to the nature of a "graph" (all the models are somewhat related to one another) and the diversity of applications that get built, the slicing is never "right" for anyone (either too large, or too small, or leads to duplication of models...)

For those reasons we've decided to ship "full SDKs" as one of our offerings. It might not be ideal for everybody, but we feel like there's a population of Go developers that "just want and SDK to build applications" (as opposed to the second option I'll describe below).

Go shortcomings

Through my explorations of Go, I've noticed a few shortcomings. At this point I'm not sure whether this is because our project/packages are not setup properly or because of limitations for Go and large projects:

  1. go build often rebuilds sub packages that haven't changed and for which dependencies have not changed.
  2. go test often rebuilds things, even when go build was run right before. Why is it not relying on some sort of cache?
  3. same issue with go lint.

To me it feels like the cost of building a large project with lots of sub-packages should only be paid "once per session" unless dependencies are updated, cache is evicted or code changes.

Please feel free to outline any optimizations in our project configuration/structure that could help with that front. And if there are ways to engage with the Go community (people building the go compiler etc...) to provide that kind of feedback, I'll be more than happy to spend time doing so. I know our project is kind of an odd duck compared to most go packages out there due to its size.

Improvements to come

We already have two improvements queued up to reduce the amount of code we generate at the first place:

  1. Move to go 1.18 and leverage generics to remove a lot of for loops used to recast slices
  2. Remove redundant nil/error checks in the generated code

Please go ahead and review those, and if you find other patterns in the generated code that are redundant/could be replaced by something smaller, don't hesitate to mention it on those issues.

Right size sdks

Lastly, we recognize that having a full SDK with all the endpoints is not going to be suitable for everybody for a variety of reasons. We're working to enable a new "right-sized self-service SDK experience" where any API user could generate an SDK that looks and feels just like this one, but with only the endpoints/models they care about for their application instead of the full API surface.
We're really early in those efforts right now, but happy to get feedback on those nonetheless. Here is what the procedure would roughly look like:

  1. Create a new go project/identify an existing project.
  2. Add the kiota dependencies or msgraph-sdk-go-core (which pulls the Kiota dependencies and adds a few additional things)
  3. Go to Graph explorer select the resources you need (left panel, second tab, ..., "add to collection").
  4. Click on preview collection and export it as a postman collection.
  5. Use hidi with the postman collection and the full OpenAPI description I shared earlier to generate a "filtered" OpenAPI description.
  6. Use Kiota to generate your Go client for Microsoft Graph in your project.
  7. Start calling the API.

At this point we're working to document all those steps, and streamline them (maybe compressing steps 4-5). The great thing about this approach is that steps 5 through 7 can work with any OpenAPI described API you want to call, not just Microsoft Graph.
Again, this last offering is still in its infancy, feel free to try it out and provide feedback at the different places.

I hope this lengthy post clarifies where we're heading and getting some more feedback from the Go community on all those aspects would really help!

@breml
Copy link
Author

breml commented Apr 14, 2022

Hi @baywet

Thanks for your detailed feedback.

I can not go into all the details you mentioned due to lack of time, but I still want to share some of my thoughts with you:

  • The current state of the generated SDK is not very attractive for me as a Go programmer for several reasons:
    • If the official place, were Go developers lookup package documentation (https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go/models) does not render the API documentation, there is surely something wrong and this will clearly hinder adoption.
    • If my build, test and linting times spike by factor > 10 (and in the case of tests even by close to 100), there is clearly something wrong and again, this will hinder adoption.
    • Just to be fair, the times I shared are from my Github Actions pipeline, where compiling of the project does not benefit from the Go build cache in the same extend as on my development machine, but still the spike is just way to big for the case, where I use I single endpoint of the API.
  • If you have proof for the Go shortcomings you mentioned, especially the part about "unnecessary" rebuilds, then I assume the Go core team will happily have a look at this. I recommend to file an issue on https://github.com/golang/go, ask on https://groups.google.com/g/golang-nuts respectively https://groups.google.com/g/golang-dev or get in touch on Gopher Slack (https://invite.slack.golangbridge.org/).
  • For the specific case in our pipeline, all the packages are in fact different between go build and go test due to the fact, that we calculate the code coverage with go test -cover. In this case, the whole code base is instrumented with the necessary code to measure the test coverage, which then leads to recompiling the code. That being said, the jump from 7s to 8m 50s is still insanely big.
  • In regards to the linting, the tool that I am talking about is golangci-lint (https://golangci-lint.run/, https://github.com/golangci/golangci-lint) which at this stage is one of the most if not the most used tool for linting in the Go ecosystem. Even though it is a meta-linter (combines a wide variety of linters), special care is taken, that the analysis of the linted Go code is only done once. I have the impression, that the main problem with the linting is, that the models package has such a huge amount of types (even though only a small fraction of it is used).
  • I am not sure, if the improvements you are suggesting, really improve the situation, because I am not sure which dimension really hurts the compile (and code analysis time in the linter) that much. It is my feeling, that this is more rooted in the number of types than in some redundant code (err checks or loops).
  • The right size sdk approach is surely an interesting idea, but I am not sure, if this will help adoption in general, because it moves a significant amount of maintenance effort (update of the dependency) to the user. I work with Go for ~7 years now and I can not remember anyone doing such an approach.
  • The usage of the whole API does not feel very Go-ish and it reminds me at the first version of the AWS SDK. They managed to improve significantly with version 2.

Finally, I would be happy to help you with the improvement of the Go SDK, but this is not something, that I can do in my spare time nor as part of my current day job.

@baywet
Copy link
Member

baywet commented Apr 14, 2022

does not render the API documentation

Interesting, I might reach out to the team to understand what's going on, the documentation for the root package seems to be working properly https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go#GraphServiceClient.Admin

I recommend to file an issue on

Thanks for sharing this! I want us to do our homework (improvements I outlined) before we share our issues with that audience.

the times I shared are from my Github Actions pipeline

Understood, we've noticed the behaviour in both situations (GH actions and local dev loop).

that the models package has such a huge amount of types (even though only a small fraction of it is used

Shouldn't the linter only look at your code and not dependent packages? Additionally, shouldn't the compiler treeshake/trim unused types?

It is my feeling, that this is more rooted in the number of types than in some redundant code

Noted, reducing the amount of code we're generating shouldn't hurt anyway.
As per types we have 2 types per model (i.e. User struct and Userable interface). That's to "support upcast". Eg. when querying /groups/{id}/members the endpoint returns directory objects, which are in fact users, service principals, and other types that derive from directory object. Because the SDK automatically deserializes to the most specialized type, we've had to introduce interfaces. The technical reason behind it being we can't upcast structs (i.e. DirectoryObject(myUserInstance) or myUserInstancePtr.(*DirectoryObject)) because they are not the same size in memory. If we had a way to upcast structs we could divide the number of types by two, which should improve build times.

I can not remember anyone doing such an approach

How do Go devs typically handle GraphQL APIs then?

The usage of the whole API does not feel very Go-ish

Do you have specifics about patterns that don't feel idiomatic and what API we could offer instead?

I would be happy to help you with the improvement of the Go SDK, but this is not something, that I can do in my spare time nor as part of my current day job.

To be specific I'm not asking for you to contribute (PRs), but to keep providing feedback like you have so far. This is instrumental to us correcting course before the general availability of the language, thanks a lot for the feedback you've already provided!

@breml
Copy link
Author

breml commented Apr 19, 2022

does not render the API documentation

Interesting, I might reach out to the team to understand what's going on, the documentation for the root package seems to be working properly https://pkg.go.dev/github.com/microsoftgraph/msgraph-sdk-go#GraphServiceClient.Admin

My assumption is, that there is some sort of hard limit on https://pkg.go.dev, that prevents the documentation from rendering. Some quick stats:

  • 2541 .go files
  • 2325 type definitions
  • 22281 functions

A quick search showed me https://github.com/golang/pkgsite/blob/master/internal/godoc/dochtml/dochtml.go#L92-L95, which defines a limit of 10 MB for the resulting documentation.

Locally it is possible to render the documentation (with the official godoc tool), and the resulting file is ~24 MB.
That being said, this (locally generated) documentation is also rather useless, the index alone is ~450 full screen browser pages long.

I recommend to file an issue on

Thanks for sharing this! I want us to do our homework (improvements I outlined) before we share our issues with that audience.

As you wish. In order to just get some pointers about where to look, I would still recommend to reach out to the #tools channel on Gopher slack. In this channel, all topics around tooling around go (official but also community maintained like golangci-lint) are discussed there and some of the Go maintainers are hanging out there as well and sharing insights.

the times I shared are from my Github Actions pipeline

Understood, we've noticed the behaviour in both situations (GH actions and local dev loop).

that the models package has such a huge amount of types (even though only a small fraction of it is used

Shouldn't the linter only look at your code and not dependent packages? Additionally, shouldn't the compiler treeshake/trim unused types?

I am not entirely sure about the internal working of these tools, but almost all of them depend on https://pkg.go.dev/golang.org/x/tools/go/analysis for the analysis of the Go code.
Additionally also gopls, the official Go language server, depends on the above mentioned package for all the Go AST and types processing.
What this means is, that these packages are really widely used and they are maintained by the Go core team. So I expect them to be of very high quality and optimized for performance.

I do not know, if and to what extend treeshake or trim is used in this process and if these techniques could speed up the process. But because Go is a statically typed language, I expect that during the process of linting and compiling, all the types must be known (not only of the code that is linted, but also from all its dependencies).

It is my feeling, that this is more rooted in the number of types than in some redundant code

Noted, reducing the amount of code we're generating shouldn't hurt anyway. As per types we have 2 types per model (i.e. User struct and Userable interface). That's to "support upcast". Eg. when querying /groups/{id}/members the endpoint returns directory objects, which are in fact users, service principals, and other types that derive from directory object. Because the SDK automatically deserializes to the most specialized type, we've had to introduce interfaces. The technical reason behind it being we can't upcast structs (i.e. DirectoryObject(myUserInstance) or myUserInstancePtr.(*DirectoryObject)) because they are not the same size in memory. If we had a way to upcast structs we could divide the number of types by two, which should improve build times.

I can not remember anyone doing such an approach

How do Go devs typically handle GraphQL APIs then?

I am not a frequent user of GraphQL API. I am aware of the fact, that the dynamic nature of GraphQL can make it tricky with Go. One solution, that I think is worth looking at is: https://github.com/Khan/genqlient

That being said, the Microsoft Graph, at least in my understanding, is not a GraphQL API but a highly inter-weaved REST API.

The usage of the whole API does not feel very Go-ish

Do you have specifics about patterns that don't feel idiomatic and what API we could offer instead?

This would take some more time, maybe I can followup on this during the week.

I would be happy to help you with the improvement of the Go SDK, but this is not something, that I can do in my spare time nor as part of my current day job.

To be specific I'm not asking for you to contribute (PRs), but to keep providing feedback like you have so far. This is instrumental to us correcting course before the general availability of the language, thanks a lot for the feedback you've already provided!

@mblaschke
Copy link

mblaschke commented May 20, 2022

just tried a go get -u (via zsh with enabled REPORTTIME):

go: downloading github.com/prometheus/client_golang v1.12.2
go: downloading github.com/Azure/go-autorest/autorest v0.11.27
go: downloading github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.0-20220512140231-539c8e751b99
go: downloading github.com/Azure/azure-sdk-for-go v64.1.0+incompatible
go: downloading github.com/microsoft/kiota/serialization/go/json v0.0.0-20220331211134-ada6b745f15a
go: downloading github.com/microsoftgraph/msgraph-sdk-go v0.24.0
go: downloading github.com/microsoft/kiota v0.1.3
go: downloading golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a
go: downloading golang.org/x/crypto v0.0.0-20220518034528-6f7dac969898
go: downloading github.com/spf13/cast v1.5.0
go: downloading github.com/Azure/go-autorest/autorest/adal v0.9.20
go: downloading github.com/prometheus/common v0.34.0
go: downloading github.com/Azure/azure-sdk-for-go/sdk/azcore v1.0.0
go: downloading github.com/golang-jwt/jwt/v4 v4.4.1
go: downloading github.com/microsoftgraph/msgraph-sdk-go-core v0.25.0
go: downloading github.com/yosida95/uritemplate/v3 v3.0.2
go: downloading github.com/microsoft/kiota/http/go/nethttp v0.0.0-20220331211134-ada6b745f15a
go: downloading github.com/Azure/azure-sdk-for-go/sdk/internal v1.0.0
go: downloading github.com/AzureAD/microsoft-authentication-library-for-go v0.5.0
go: downloading golang.org/x/net v0.0.0-20220520000938-2e3eb7b945c2
go: downloading github.com/microsoft/kiota-abstractions-go v0.7.0
go: downloading github.com/microsoft/kiota-serialization-json-go v0.4.0
go: downloading github.com/microsoft/kiota-serialization-text-go v0.2.0
go: downloading github.com/microsoft/kiota-http-go v0.4.0
go: downloading github.com/microsoft/kiota-serialization-text-go v0.3.0
go: downloading github.com/microsoft/kiota-http-go v0.4.1
go: downloading github.com/kr/pretty v0.3.0
go: upgraded github.com/Azure/azure-sdk-for-go v63.1.0+incompatible => v64.1.0+incompatible
go: upgraded github.com/Azure/azure-sdk-for-go/sdk/azcore v0.21.0 => v1.0.0
go: upgraded github.com/Azure/azure-sdk-for-go/sdk/azidentity v0.13.2 => v1.0.0
go: upgraded github.com/Azure/azure-sdk-for-go/sdk/internal v0.9.1 => v1.0.0
go: upgraded github.com/Azure/go-autorest/autorest v0.11.24 => v0.11.27
go: upgraded github.com/Azure/go-autorest/autorest/adal v0.9.18 => v0.9.20
go: upgraded github.com/AzureAD/microsoft-authentication-library-for-go v0.4.0 => v0.5.0
go: upgraded github.com/cjlapao/common-go v0.0.18 => v0.0.19
go: upgraded github.com/golang-jwt/jwt v3.2.1+incompatible => v3.2.2+incompatible
go: upgraded github.com/golang-jwt/jwt/v4 v4.2.0 => v4.4.1
go: added github.com/microsoft/kiota-abstractions-go v0.7.0
go: added github.com/microsoft/kiota-http-go v0.4.1
go: added github.com/microsoft/kiota-serialization-json-go v0.4.0
go: added github.com/microsoft/kiota-serialization-text-go v0.3.0
go: upgraded github.com/microsoft/kiota/abstractions/go v0.0.0-20220309144454-31e5897b295c => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoft/kiota/authentication/go/azure v0.0.0-20220309144454-31e5897b295c => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoft/kiota/http/go/nethttp v0.0.0-20220308162731-fb6ab0cd5ea2 => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoft/kiota/serialization/go/json v0.0.0-20220308162731-fb6ab0cd5ea2 => v0.0.0-20220331211134-ada6b745f15a
go: upgraded github.com/microsoftgraph/msgraph-sdk-go v0.13.0 => v0.24.0
go: upgraded github.com/microsoftgraph/msgraph-sdk-go-core v0.0.14 => v0.25.0
go: upgraded github.com/pkg/browser v0.0.0-20210115035449-ce105d075bb4 => v0.0.0-20210911075715-681adbf594b8
go: upgraded github.com/prometheus/client_golang v1.12.1 => v1.12.2
go: upgraded github.com/prometheus/common v0.32.1 => v0.34.0
go: upgraded github.com/spf13/cast v1.4.1 => v1.5.0
go: upgraded github.com/yosida95/uritemplate/v3 v3.0.1 => v3.0.2
go: upgraded golang.org/x/crypto v0.0.0-20220411220226-7b82a4e95df4 => v0.0.0-20220518034528-6f7dac969898
go: upgraded golang.org/x/net v0.0.0-20220127200216-cd36cc0744dd => v0.0.0-20220520000938-2e3eb7b945c2
go: upgraded golang.org/x/sys v0.0.0-20220319134239-a9b59b0215f8 => v0.0.0-20220520151302-bc2c85ada10a
go: upgraded google.golang.org/protobuf v1.27.1 => v1.28.0
go: upgraded gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b => v3.0.0-20220512140231-539c8e751b99
go get -u  26.55s user 31.55s system 3% cpu 30:01.08 total

30 minutes on a 8 core Intel Core i9 with 16 GB RAM 😳

docker multiarch builds (arm, arm64, amd64) takes around 2:30:00 on GitHub and is nearly reaching the timeout limit.

@mblaschke
Copy link

mblaschke commented May 24, 2022

With v0.24.0 build time reaches 4 hours or more (maximum seen: 5 hours)

@baywet
Copy link
Member

baywet commented May 25, 2022

Again, at this point we're really focusing on API coverage and reliability more than build time performance due to a shortage of staff. If you're watching the repo, you'll see that we get more issues about missing API segments or serialization than anything else for the time being.

That being said, I've exposed our short-terms plans on this thread (use of generics to reduce the amount of code we generate) and long-terms plans (ability to generate your own right sized, subset SDK), both of which are still on the roadmap before GA.

Now, I'm trying to understand why you're running go get -u in your CI? And why do you have to run it for every architecture instead of once for all the architectures, and then a go build? FYI the build of the SDK takes about ~7 minutes with the GitHub actions agent

For more insights on our plans, you can watch the brand new Microsoft Build recordings for free Deep dive into Microsoft Graph SDKs

@dominikh
Copy link

I haven't given this a deeper look, so these are drive-by comments, but I wanted to provide some general clarifications.

  • go build often rebuilds sub packages that haven't changed and for which dependencies have not changed.
  • go test often rebuilds things, even when go build was run right before. Why is it not relying on some sort of cache?
  • same issue with go lint.

These are largely not true. go build and go test use content-addressed caches (see the contents of go env GOCACHE) that take into consideration the source code of the package, its dependencies, and the relevant build environment. If they rebuild something, then something ought to have changed.

go test often rebuilds things, even when go build was run right before

At a minimum, go test will have to rebuild the package that is being tested, to include *_test.go files in the build. Conceptually, the package and the package while being tested are two different packages, made up of different files.

same issue with go lint

I'll assume you mean go vet here. Vet does reuse work made by go build and go test, namely it uses the export data that was cached. Of course this only helps with dependencies; the actual package being vetted needs to be loaded from source, to be analyzed. Vet also caches facts it has computed for API in dependencies.

All that is to say that if you see seemingly spurious rebuilds, it is worth investigating why these rebuilds are happening; they're not a fundamental limitation of the Go build system.

@baywet
Copy link
Member

baywet commented May 25, 2022

Thanks for chiming in @dominikh !

As mentioned previously I'd like for us to investigate the performance improvements first (amount of code we generate, generics...) before engaging with the Go "core" community. That being said, if you notice something that's obviously wrong in our repos, don't hesitate to point it out.

I'm often working with "replace" (local links to the dependency tree) and it seems that any change in the package will "mark it as stale" and rebuild the whole tree of dependent packages, which might have impacted my perception. (other ecosystems have a more fine grained tracking of those things, and are able to track these things at a class/type level). Would this make sense?

Also go get -u seems to trigger multiple rebuilds for each package it updates instead of updating all the packages and doing one single rebuild, would that be possible?

@dominikh
Copy link

I also can't reproduce the extreme build times reported in this issue.

30 minutes on a 8 core Intel Core i9 with 16 GB RAM

With v0.24.0 build time reaches 4 hours or more (maximum seen: 5 hours)

I've tested a fresh build of this repository on a 16 core 3950X and got this:

$ git clone https://github.com/microsoftgraph/msgraph-sdk-go
$ cd msgraph-sdk-go
$ time go build -v
go build -v  796.29s user 151.10s system 2172% cpu 43.614 total

And while 43.6 seconds isn't great, it's a far cry from 30 minutes, and especially from 4 hours. Even if you build for multiple GOOS x GOARCH combinations, it would take way too many combinations to reach these times.

@dominikh
Copy link

dominikh commented May 25, 2022

I'm often working with "replace" (local links to the dependency tree) and it seems that any change in the package will "mark it as stale" and rebuild the whole tree of dependent packages, which might have impacted my perception. (other ecosystems have a more fine grained tracking of those things, and are able to track these things at a class/type level). Would this make sense?

Yes, Go's caching is on a per-package level, not per-class/type. If you change anything about a package, then that invalidates the whole package, and all packages that transitively depend on it.

Also go get -u seems to trigger multiple rebuilds for each package it updates instead of updating all the packages and doing one single rebuild, would that be possible?

The compiler in general is modular, compiling one package at a time, and using cached data to reuse the work done for dependencies. There is no practical difference between building each package and building all packages.

@baywet
Copy link
Member

baywet commented May 25, 2022

Thanks for the additional information. In a scenario where we're building a console app which depends on this library (no replace, most basic case), should the cost of building this package one be paid once assuming dependencies are not changing?
In other words, if I setup my project (init & go get this library), shouldn't the subsequent go build commands only take a few milliseconds because only building the app.go and retrieving everything else from the cache?
The reason I'm asking the question is because it's not the behaviour I'm seeing locally (or that the rest of my team is seeing) and I'm worried something is not configured properly.

@breml
Copy link
Author

breml commented May 25, 2022

@baywet I can confirm, that I observed the same as you are describing and this worries me as well. I had a small conversation on Gopher Slack about this issue (this is why @dominikh chimed in).

In this conversation @josharian suggested to file an issue with Go to inspect this behavior in more depth, citation:

Years ago, when I last paid close attention, the go-vim types package used to dominate the compile juju benchmark. It had a massive exported surface area and was also a bottleneck in the compilation graph. The export data has been overhauled since then, but it is easy to imagine new bugs have been introduced, particularly with the IR generics work.
Definitely worth filing a Go issue about (golang.org/issue/new).
At least in the days when I was contributing, multi-minute build times were a top priority to fix.

Therefore, as mentioned before, I suggest to get in touch with the Go maintainers by filling an issue to

  1. confirm / rule out the Go compiler / tooling / analysis package
  2. get more hints on where to focus on in regards to the planned refactoring / optimizations.

@baywet
Copy link
Member

baywet commented May 25, 2022

Thanks for the suggestion. From a Go community etiquette perspective, do you think it'd be ok to file that issue BEFORE we investigate the use of generics (to reduce the amount of generate code) on our end?

@breml
Copy link
Author

breml commented May 25, 2022

@baywet Yes, I am pretty sure, that the Go team would like to understand why we see these kind of build times regardless of the fact, if the code uses generics or not.

As you might know, generics only landed a few months ago and they are not expected to be completely optimized in regards to performance.
Due to the lack of generics in the past, there is lots of automatically generated code in the Go ecosystem so this is not something unusual. And the overwhelming part of the existing Go code is still without generics.

In fact, if this issue gets investigated by the Go team, I would even expect them to be interested if a refactoring towards generics improves the situation.

@josharian
Copy link

josharian commented May 25, 2022

From a Go community etiquette perspective, do you think it'd be ok to file that issue BEFORE we investigate the use of generics

Yes!

Feel free to tell them that I sent you. :)

An excellent issue would go something like:

cmd/compile: slow compilation of a package with a very large API

We maintain a package with a very large API surface area that is very slow to compile. We have some ideas for how to streamline the package using generics, but @josharian also requested that we file an issue about the existing compilation time as well, in case there are any improvements available that might benefit others. To reproduce: insert precise reproduction instructions, including exact git remotes and hashes and exact go commands run and the timing you see for them.

@dominikh
Copy link

Thanks for the additional information. In a scenario where we're building a console app which depends on this library (no replace, most basic case), should the cost of building this package one be paid once assuming dependencies are not changing? In other words, if I setup my project (init & go get this library), shouldn't the subsequent go build commands only take a few milliseconds because only building the app.go and retrieving everything else from the cache? The reason I'm asking the question is because it's not the behaviour I'm seeing locally (or that the rest of my team is seeing) and I'm worried something is not configured properly.

That's roughly what I would expect, yes. There is of course still a cost to using such a large package, even if it's cached. The cached export data needs to be loaded and used, the final binary needs to be linked, etc. But one would expect these costs to be much smaller than the cost of compiling the package the first time.

So far I've only tested compiling the package itself, not compiling a dependent. It would be interesting to compare the two.

@josharian
Copy link

josharian commented May 25, 2022

The cached export data needs to be loaded and used, the final binary needs to be linked, etc. But one would expect these costs to be much smaller than the cost of compiling the package the first time.

If the code is minimal, loading import data could in theory end up being just as costly as doing the initial compilation. Particularly if you have to load the import data N times, and compile only once.

The compiler importer has gone through roughly three generations: (1) Simple, text-based, and expensive, (2) cheaper binary but still non-lazy, and (3) binary and lazy. With the lazy importer, in theory, the compilation cost of importing a very large package ought to be proportional to the symbols you use from that package. But maybe that's not the case. (Maybe the compiler is reading the entire import data instead of mmap'ing it? Maybe the index of symbols itself is large enough to impact compilation time?) That's the sort of interesting, fixable thing one can learn from cases like this, and why it's worth filing an issue, if you can reliably reproduce and nail down what is slow (without worrying about exactly why).

@baywet
Copy link
Member

baywet commented May 26, 2022

Thanks everyone! I posted an issue with a clear repro don't hesitate to jump on there and ask or provide additional information if you think I missed something.

@friedrichg
Copy link

friedrichg commented Jan 19, 2023

I left this issue a long time ago, but for some reason forgot to unsubscribe. I feel the pain most users have here so I wanted to share an alternative https://github.com/friedrichg/go-msgraph-example

@baywet
Copy link
Member

baywet commented Mar 3, 2023

Hi everyone,
With v0.56.0 we've shipped two performance related improvements:

Note: we're still working on the release for Beta, but we anticipate these improvements will yield even better results since this version has many more models.

With those changes in place, we're seeing an additional 20% build time performance improvements.

This is the last set of changes we're making on that front since we've exhausted all avenues to improve build times which do not lead to a feature trade-off and since build performance is now acceptable in most development environments.

If build time performance or asset size are still challenging for your scenario, you might want to consider kiota the generator that powers this SDK. You'll be able to generate a client with the same development experience, but with only the endpoints your application needs. Since the early days, we're embedded the search experience for API descriptions and the filtering capabilities directly within kiota thanks to your (and others') feedback.

In terms of future plans, we're still working to fix our metadata to remove references between models when it actually doesn't apply. We're also working to embed Kiota (or the Microsoft Graph self-serve story) in GUI experiences (nothing to disclose at this time).

I'll leave this issue open for another couple of days, and then close it depending on the replies.

To finish: The whole team would like to thank all of you on this thread for your feedback and helping shape the Go SDK! You truly had an impact on design decisions, and the final experience.

preslavgerchev added a commit to mondoohq/cnquery that referenced this issue Mar 6, 2023
We've had a layer over the msgraph-sdk-go, which used some code that was
copied directly from the library itself.

As per
microsoftgraph/msgraph-sdk-go#129 (comment),
it seems that this library is now improved enough that it doesn't bloat
up our binary size and builds fast

This PR removes all the adapter code and bumps the msgraph-sdk to the
latest production (0.56) version

The rest is mostly changing struct names where needed
@ghost
Copy link

ghost commented Mar 7, 2023

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

@breml
Copy link
Author

breml commented Mar 7, 2023

I am no longer using this package, so I am not in a good position to provide meaningful feedback on this, sorry.

@fabiante
Copy link

fabiante commented Mar 8, 2023

@baywet Thanks for you summary. I understand optimizing this issue is no simple feat. At the company I work at we will have to heavily rely on Graph API and therefore apprechiate your efforts.

If build time performance or asset size are still challenging for your scenario, you might want to consider kiota the generator that powers this SDK.

If in any case build times will be too high we will consider using Kiota too. Maybe, this fact is important enough to state in the projects readme or documentation...

@baywet
Copy link
Member

baywet commented Mar 8, 2023

Thanks for the suggestion. We have some work to simplify the self-serve story before it's ready to be broadly shared in the docs, but yes, it's part of the plan.

@stephenwan-opal
Copy link

hi @baywet, first, thanks for the work y'all have put into investigating this issue.

Unfortunately, the package is still not really in a usable place for us. From my previous comment, the initial build is down from 7m to 3m on my local machine (better! but still pretty bad for a single dependency). Rebuilds (measuring compiler linking time only) actually looks like it's gone up from 8-12s to 18-20s (baseline 4-6 seconds without this package). I'm using the v0.50.0 beta release.

@baywet
Copy link
Member

baywet commented Mar 10, 2023

Thanks for the numbers. Could you try with 56 or 57, we've implemented additional improvements as I was mentioning earlier

@baywet
Copy link
Member

baywet commented Mar 10, 2023

Sorry 55 in case of beta

@stephenwan-opal
Copy link

Sorry 55 in case of beta

Sorry, I misspoke in my previous comment. These results are with the v0.55.0 beta release. Here's my upgrade diff:

-  github.com/microsoft/kiota-abstractions-go v0.15.1
-  github.com/microsoft/kiota-authentication-azure-go v0.5.0
-  github.com/microsoft/kiota-serialization-json-go v0.7.2
-  github.com/microsoftgraph/msgraph-beta-sdk-go v0.50.0
-  github.com/microsoftgraph/msgraph-sdk-go-core v0.31.0
+  github.com/microsoft/kiota-abstractions-go v0.17.2
+  github.com/microsoft/kiota-authentication-azure-go v0.6.0
+  github.com/microsoft/kiota-serialization-json-go v0.8.2
+  github.com/microsoftgraph/msgraph-beta-sdk-go v0.55.0
+  github.com/microsoftgraph/msgraph-sdk-go-core v0.34.1

@swenson
Copy link

swenson commented Nov 29, 2023

Hi,

We've noticed this is still an issue today. Specifically, the msgraph-sdk-go dependency adds approximately 92 MiB to our binary size, 27 MiB of which is in the executable code itself (all on darwin/arm64). This is a problem because there is effectively a 128 MiB limit on executable code on ARM64, so 21% of the entire code budget is used by this one dependency.

We'd love if there was a supported way to generate a subset of APIs, or to generate this API in such a way that the compiler could more effectively remove the pieces we aren't using (e.g., by more effectively modularizing the API).

@baywet
Copy link
Member

baywet commented Nov 29, 2023

@swenson checkout kiota the generator behind this SDK which also allows you to select only the path segments/operations you need, drastically reducing the binary size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests