spurious CI error for MSVC #81890
Comments
The cargo test
So I guess something locks the file after it is written so it can't be overwritten shortly thereafter. Can the test be updated not to do this?
@hkratz I'm looking into fixing that test (and doing some other collision stuff). I'm also trying to dig into
Just on the off chance we can find someone who wants to help: @rustbot ping windows
This issue is causing significant trouble on CI, and I'm having a hard time making progress on it. A huge help would be if someone can find some way to reproduce it. I've tried running
Hey Windows Group! This bug has been identified as a good "Windows candidate". cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @nico-abram @retep998 @rylev @sivadeilra @wesleywiser
@ehuss Great! I wrote a little script to grep through the CI logs and of the last 30 failed msvc runs which had LNK1201 in it all failed at
The script for reference (excuse my Python, not a Python native):

    import github
    import requests
    from github import Github


    def main():
        token = "<replace with token>"
        g = Github(token)
        # github.enable_console_debug_logging()
        repo_name = "rust-lang-ci/rust"
        repo = g.get_repo(repo_name)
        # Failed workflow runs on the `auto` branch.
        workflow_runs = repo.get_workflow_runs(status="failure", branch="auto")
        for run in workflow_runs:
            jobs_url = run.jobs_url + "?per_page=100"
            response = requests.get(jobs_url, headers={"Authorization": f"token {token}"})
            response.raise_for_status()
            for job in response.json()["jobs"]:
                # Only failed MSVC jobs are of interest.
                if job["conclusion"] == "failure" and "msvc" in job["name"]:
                    log_url = f"https://api.github.com/repos/{repo_name}/actions/jobs/{job['id']}/logs"
                    log_response = requests.get(
                        log_url, headers={"Authorization": f"token {token}"}
                    )
                    log_response.raise_for_status()
                    log = log_response.text
                    # Print the log only when it mentions the linker error.
                    if "LNK1201" in log:
                        print(log_url)
                        print("\n\n")
                        print(log)


    if __name__ == "__main__":
        main()
Fix some tests with output collisions.

This fixes some tests which run afoul of creating colliding outputs (tracked in #6313). In particular, these tests are creating duplicate pdb files on Windows because they have a binary and a library (dylib) with the same name.

This is causing significant issues on rust-lang's CI (rust-lang/rust#81890) where the MSVC linker is failing with a mysterious LNK1201 error. Presumably two LINK.exe processes are trying to write to the same PDB file at the same time, which causes it to fail. Ideally this shouldn't happen, but I don't really have any ideas on how to resolve it, as the name of the PDB has some importance.

I have not been able to reproduce the LNK1201 error. My hope is that this change will help alleviate the issue, though.

I updated the `doc_all_member_dependency_same_name` test to illustrate that it is hitting a collision, which is a fundamental part of that test (and something we should probably figure out how to resolve in the future).
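To make the collision described above concrete, here is a minimal sketch of how a binary and a dylib with the same name end up mapped to the same PDB file. This is not Cargo's actual logic: the `pdb_name` and `find_pdb_collisions` helpers, the `(package, target name, kind)` tuples, and the rule that the PDB is simply `<target name>.pdb` are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumption: only these target kinds go through the linker and emit a PDB.
LINKED_KINDS = {"bin", "dylib", "cdylib"}


def pdb_name(target_name):
    # Simplified assumption: the PDB is named after the target itself.
    return target_name + ".pdb"


def find_pdb_collisions(targets):
    """Return the PDB names that more than one linked target would write to."""
    by_pdb = defaultdict(list)
    for package, name, kind in targets:
        if kind in LINKED_KINDS:
            by_pdb[pdb_name(name)].append((package, kind))
    return {pdb: users for pdb, users in by_pdb.items() if len(users) > 1}


# Hypothetical workspace: `foo` has both a binary and a dylib named `foo`.
targets = [
    ("foo", "foo", "bin"),
    ("foo", "foo", "dylib"),
    ("bar", "bar", "bin"),
    ("baz", "baz", "lib"),
]
collisions = find_pdb_collisions(targets)
print(collisions)
# {'foo.pdb': [('foo', 'bin'), ('foo', 'dylib')]}
```

If the two `foo` targets are linked in parallel, both linker invocations would target `foo.pdb`, which matches the suspected cause of the LNK1201 failures; renaming one of the targets in the test removes the overlap.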
I talked with an engineer internally and they said that LNK1201 most often occurs because some program has a lock on the file. I don't know what the configuration of the GH runners looks like but perhaps Windows Defender (or similar AV/AM software) is scanning the file right after the linker finishes writing it the first time? Regardless, rust-lang/cargo#10137 looks like the correct approach to resolving this to me. Thanks @ehuss!
I tried to upgrade the toolchain to rust Interestingly, this issue occurs only when running tests using
@wesleywiser, did you mean [rust-lang/cargo#10137] is a fix only for the CI error mentioned in this issue, or is it supposed to fix the transient LNK1201 errors (due to something, most likely MsDefender, locking the artifacts) in all cases? I assume you meant the former, right? Do we have an issue tracking the LNK1201 errors that happen (occasionally) even when there are no colliding artifacts?
I'm going to close as I do not think we have been hitting LNK1201 errors in a while. I don't know what happened with the
There have been a few instances of this error happening on CI:
Or some variant of that. See:
- `expr_method_call` in derive(Ord,PartialOrd,RustcEncode,RustcDecode) #81411 (comment)
- `fewer_names` in `uncached_llvm_type`" #80122 (comment)

(and probably many more)