llvm-symbolizer not present in base queue #11631
Comments
Hi Kunal, we will get on this. @hoyosjs do you know if this just comes built in with llvm? lldb 3.9 is already being installed on the base ubuntu.1804 queues. Do you need a different version? This is the test queue, so I don't think it would be an issue to upgrade that to something newer, but I'd like to check before making any major changes. |
Do you know why 3.9? And llvm sounds good. |
I do not know why 3.9. Possibly historic reasons? @MattGal it looks like we set our lldb version to 3.9 back in 2020. Do you know why we're using that? Edit: Oh, actually, we set this in 2019. Edit: that is also a lie. I am still digging into how long ago we chose 3.9 and never updated it. |
Probably for diagnostics... |
Yeah. I think that's also what's on the docker images that y'all are using and upgrading to something more modern is also breaking things. I worry updating that will break y'all |
@kunalspathak we support several different linux distros, not all of which may have a usable version of llvm-symbolizer. Would it be acceptable if this were only added to Ubuntu Helix machines, or do you need it everywhere? Odds are it's not going to work with some of our more unusual linuxes. |
@hoyosjs - what do you think? |
Updating the queues the runtime uses directly would be the first priority:
We'll have to evaluate the helix containers, but those are much easier to update and we've even built the toolset in some of the containers historically. |
@MattGal do you know where the symbolizer might not be available? cc: @jkoritzinsky since this might be interesting for your *SAN work |
Offhand I'd venture it might not be available on old SLES or Mariner. It's one of those things we don't know until we try. |
Those don't tend to impact our priority scenario - the PR analysis checks |
PR to add them to the two linux based queues: https://dev.azure.com/dnceng/internal/_git/dotnet-helix-machines/pullrequest/27535 I think for OSX, we're going to have to get ddfun involved |
Opened https://portal.microsofticm.com/imp/v3/incidents/details/349676322/home to get llvm added to the OSX queue. |
(Moved to tracking while we wait for DDFun to update the systems) |
@michellemcdaniel do we have a time estimate for when DDFun will update the systems? |
I do not. I know it's been assigned, but I haven't seen any movement on it. I will ping the ICM |
In general, it takes 1-2 weeks to get this many systems updated (100ish machines), and next week is Thanksgiving, so it's likely going to be at the longer end of that estimate. |
Does this rollout |
We did not have a rollout last week due to the US holiday. The Linux changes should roll out this week. |
Heads up: DDFun says the OSX queue has been updated to have llvm on them |
I tried this out, but there still seems to be an issue. Test Infrastructure Failure: System.ComponentModel.Win32Exception (2): An error occurred trying to start process 'llvm-symbolizer' with working directory '/private/tmp/helix/working/ADD7099B/w/A75E0909/e'. No such file or directory |
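The exception above is what a missing executable looks like when the launch is attempted directly. As a minimal sketch (in Python, not the .NET harness that actually raised the error), a pre-flight lookup can turn this into a clearer message before the process is started. `find_tool` and `require_tool` are hypothetical helper names for illustration, not part of Helix or dotnet/runtime.

```python
import shutil


def find_tool(name, path=None):
    """Return the full path of `name` if it resolves on PATH, else None.

    `path` is a hypothetical override for the PATH string to search,
    so the check can be exercised without modifying os.environ.
    """
    return shutil.which(name, path=path)


def require_tool(name, path=None):
    """Raise a descriptive error up front instead of failing at process start."""
    resolved = find_tool(name, path=path)
    if resolved is None:
        raise FileNotFoundError(
            f"'{name}' was not found on PATH; crash stacks cannot be symbolized"
        )
    return resolved
```

Calling `require_tool("llvm-symbolizer")` at the start of a work item would report the missing tool once, rather than on each attempt to symbolize a crash.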
@kunalspathak the job was executed in the queue osx.1200.amd64.open but the request was to install llvm in OSX.1200.ARM64 so it is expected for it to not be available in the amd64 queue. In which queue do you need it? |
I just noticed this from @hoyosjs . I think we also need it for OSX x64, right @hoyosjs ?
|
Yes, sorry - it would be needed on |
Thanks @hoyosjs and @JulieLeeMSFT. To be clear, this isn't blocking builds or preventing releases, but it is making it hard to diagnose test failures. Is there anything else we should know to help set priority? (Unfortunately our Ops team has a rather large backlog right now and we need to be very crisp to be sure we're handling issues in the best order.) |
These three and #11868 are queues where we can't enable blocking on build analysis for runtime easily, since no crash info will be available for those. |
Is it correct that the |
Ah, I misunderstood. I see that As for the state of the MacOS queues... I'll have to dig a bit deeper there. |
We will block all PR merges on red in dotnet/runtime starting 3/19. It will be a big pain for developers if they don't get traces to debug the failure and unblock themselves to merge on green. We have worked on this feature for almost 2 years, and this is the last piece that needs to be in place to ensure a smooth developer experience when we enforce merge on green on 3/19.
|
On ubuntu that's likely enough for now. But for macOS it's likely very different :) |
@hoyosjs I'm not seeing any results from this query. Should it still be working? |
I don't have bandwidth to take up this issue yet, but in an effort to speed things up a bit I've opened a request to DDFUN asking them to check on the MacOS systems in question. I'll follow up here with the results. -- ICM 479938683 |
@hoyosjs DDFUN spot checked a few machines in the MacOS queues and have confirmed that llvm-symbolizer is installed and should be available on the path. I asked them for the specific path to the bins and they found these: AMD64: Does this match what you're seeing in your builds? |
Are these on the path? I still see hits on runs from today:
|
They've confirmed the right path is listed in I've extracted a random sample of failing machines and asked for those to be checked to rule out an inconsistent configuration. Your query gives a good view of failing cases but I wonder if we can establish if there have been any successful cases. Do you know of a message that would be printed if it was successful? |
I tried looking - I see no successful invocations of it on macOS. On linux containers it looks like:
|
Instead of symbolizer, macOS has
|
DDFUN confirmed that llvm-symbolizer is callable from the home directory, so it must be present on the path. There must be something different about the build, but without looking through YAML or debugging an actual build, I'm at a loss. Does anyone have any other suggestions on what to check? |
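One thing that could be checked is whether the PATH seen inside the work item matches the PATH of the interactive shell DDFUN used. A minimal sketch, assuming a small Python script can be run in both environments: it lists every PATH entry and whether it contains the tool, so the two outputs can be compared side by side. `path_report` is a hypothetical helper, not an existing Helix utility.

```python
import os


def path_report(tool, path_value=None):
    """List each PATH entry with whether it exists and contains `tool`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    report = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(entry, tool)
        report.append({
            "dir": entry,
            "dir_exists": os.path.isdir(entry),
            # the tool counts as present only if it is an executable file
            "has_tool": os.path.isfile(candidate) and os.access(candidate, os.X_OK),
        })
    return report


if __name__ == "__main__":
    for row in path_report("llvm-symbolizer"):
        print(row)
```

If the directory that holds the tool in the interactive shell is missing from the work item's output, the difference lies in how the work item's environment is set up rather than in the machine image.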
For recordkeeping, I've taken |
Opened IcM about this: https://portal.microsofticm.com/imp/v5/incidents/details/512138082/summary?tmpl=21c3we |
DDFUN resolved the IcM. Please follow up on the IcM if this issue persists. |
Build
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-77578-merge-965165820fec43e19e/JIT.Stress/1/console.f7c5d70b.log?helixlogtype=result
https://dev.azure.com/dnceng-public/public/_build/results?buildId=82793&view=ms.vss-test-web.build-test-results-tab&runId=1731386&resultId=102137&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab
Pull Request
dotnet/runtime#77578
Action required for the engineering services team
Additional information about the issue reported
To triage this issue (First Responder / @dotnet/dnceng):
In dotnet/runtime#77578, we are trying to generate the crash stack trace using llvm-symbolizer. While it is present in the containers, the base Linux and macOS queues don't have it, and we see an error when using it. See the logs I referenced in the issue. Can we get it and lldb installed on the base image? CC: @hoyosjs @JulieLeeMSFT
Release Note Category
Release Note Description
Add llvm and llvm-symbolizer to Ubuntu.1804.Amd64 and RedHat.7.Amd64