This repository has been archived by the owner on Feb 13, 2023. It is now read-only.

Guest hangs after 5 minutes of inactivity on macOS Big Sur host #2154

Closed
mcroyle opened this issue Mar 10, 2021 · 19 comments

Comments

@mcroyle

mcroyle commented Mar 10, 2021

Issue Type

  • Bug Report / Support Request

Your Environment

Vagrant 2.2.14
Virtualbox 6.1.18r142142

Your OS

  • macOS (Big Sur 11.2.3)

Summary

I'm using the "Drupal VM as a Composer Dependency" setup for a project. Multiple colleagues are experiencing an issue where, when the VM is left idle (no interaction in the guest's terminal and no access to a site via the web) for approximately 5 minutes, everything hangs. Running vagrant ssh from the host gets you into the guest as far as showing the last login but no prompt, interacting within the guest just hangs, and accessing the website yields a timeout. I am running the same project on my machine with the same versions of Vagrant and VirtualBox, but on macOS Catalina 10.15.7, and I have not experienced any of these issues.

Running vagrant halt && vagrant up gets us back to a functional state again until these periods of inactivity. However, running vagrant suspend && vagrant up does not and it remains non-functional.

My focus in troubleshooting and issue searching has been on NFS timeouts, but this only turns up results for Windows machines. I've compared /etc/exports on macOS Catalina with /etc/exports on macOS Big Sur and found no differences, I've granted VirtualBox Full Disk Access, and I've reviewed the VirtualBox logs, but nothing stands out.

At this point I'm unsure what else I can check or test, and I'm hoping for some ideas or direction.

@purplecones

I get this exact same issue. I'm running the same Vagrant and VirtualBox versions, but I'm still on macOS 11.2.1. Running vagrant reload fixes it, but it's annoying because it will hang again if I don't interact with the VM within a few minutes.

@gregj0

gregj0 commented Mar 19, 2021

Multiple people on our team are seeing something similar on these combinations
Your Environment
Vagrant 2.2.14
Virtualbox 6.1.16r140961 (we have other issues with 6.1.18)
Your OS
macOS (Big Sur 11.2.3)

Seeing it on
drupal-vm 6.0.2/Centos7
and
drupal-vm 6.0.3/Centos8

@geerlingguy
Owner

Strange; it seems like something that could be a bug in Vagrant, but I haven't seen any bug reports in that issue queue that would give any clues.

@oxyc
Collaborator

oxyc commented Mar 19, 2021

I have a colleague with the same problem and we couldn't figure it out :/ I think it was the NFS mount that became unavailable.

Could anyone verify if it's possible to run commands inside the VM while you're outside of /var/www/drupalvm?

vagrant ssh
cd ~/
# wait for it to hang in the browser...

# Check if commands work in the home directory
ls -la

# Check if entering the NFS mount hangs the shell or not
cd /var/www/drupalvm
ls -la

In my case everything runs fine on macOS 11.0.1 with the latest Vagrant, VirtualBox, etc., so I think it might have to do with the latest macOS releases, but that's a guess.
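
A minimal sketch of the same check with a time limit, so a hung mount doesn't take the shell down with it. It assumes GNU coreutils `timeout` is available in the guest, and the function name `check_path` is made up for illustration. On some mounts even a killed `ls` may take a while to return, so treat the result as a hint rather than proof.

```shell
#!/bin/bash
# Hypothetical helper: report whether listing a directory completes in time.
# Assumes GNU coreutils `timeout` exists in the guest.
check_path() {
  local dir="$1"
  local limit="${2:-5}"   # seconds to wait before giving up
  if timeout -s KILL "$limit" ls -la "$dir" > /dev/null 2>&1; then
    echo "ok: $dir"
  else
    echo "hung or failed: $dir"
    return 1
  fi
}
```

Once the browser hangs, something like `check_path ~` followed by `check_path /var/www/drupalvm` should show whether only the NFS mount is affected.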

@mcroyle
Author

mcroyle commented Mar 20, 2021

Could anyone verify if it's possible to run commands inside the VM while you're outside of /var/www/drupalvm?

After the five minutes pass you can continue to access files from within the VM. As soon as you attempt to access any files that are from the NFS mount, it locks up.

I was able to switch the users to use the Virtualbox shares and everything behaves as expected.

@jldust

jldust commented Mar 30, 2021

I have been having this exact issue for weeks since I updated to Big Sur. Switching the synced folder type to virtualbox kind of worked for me. It still seems to time out but I just open another terminal window and it works again.

vagrant_synced_folders:
  - local_path: .
    destination: /var/www/docroot
    type: virtualbox
    create: true

Environment
Vagrant 2.2.14
Virtualbox 6.1.18

OS
macOS (Big Sur 11.2.3)

Drupal VM Version
6.0.3

@mcroyle
Author

mcroyle commented Mar 30, 2021

@jldust The following is what I've had my users switch to until we can figure this issue out.

vagrant_synced_folders:
    - local_path: .
      destination: /var/www/docroot
      type: virtualbox
      create: true
      mount_options: ["dmode=775", "fmode=664"]
      owner: "vagrant"
      group: "www-data"

@jeffreysmattson

jeffreysmattson commented Apr 22, 2021

I am also having this issue running Big Sur. I don't know what else to try; I have tried everything in this thread. Some things seem to work for a while but eventually fail, within 10 minutes at most, and the VM locks up. The only thing that fixes it is a vagrant reload.

Update:

vagrant_synced_folders:
  - local_path: ./
    destination: /vagrant
    id: vagrant
    type: virtualbox
    create: true

  - local_path: ../webfolder
    destination: /var/www/webfolder
    id: sites
    type: virtualbox
    create: true
    mount_options: [ "dmode=777", "fmode=777" ]
    owner: "vagrant"
    group: "www-data"

Because I had a script in the /vagrant folder that I used often, the shell would hang whenever I ran 'ls' on this folder. When I looked in the VirtualBox GUI, I saw that the '/vagrant' folder wasn't listed as shared. Then I discovered this section in the Vagrantfile:

# VirtualBox.
  config.vm.provider :virtualbox do |v, override|
    v.linked_clone = true
    v.name = vconfig['vagrant_hostname']
    v.memory = vconfig['vagrant_memory']
    v.cpus = vconfig['vagrant_cpus']
    v.customize ['modifyvm', :id, '--natdnshostresolver1', 'on']
    v.customize ['modifyvm', :id, '--ioapic', 'on']
    v.gui = vconfig['vagrant_gui']

    nfsPath = "."
    if Dir.exist?("/System/Volumes/Data")
        nfsPath = "/System/Volumes/Data" + Dir.pwd
    end
    #override.vm.synced_folder nfsPath, "/vagrant", type: "nfs"
  end

Commenting out the 'override.vm.synced_folder nfsPath, "/vagrant", type: "nfs"' line allowed the folder to be shared the way I asked for in the 'vagrant_synced_folders' array above. So far this works.

@MokDevelopment

MokDevelopment commented May 4, 2021

As I don't have time to wait for a proper fix from the maintainers, nor time for investigation, I came up with the cheapest solution I could think of :)

  • A dumb bash script that keeps writing to the file system by appending a line to a temporary text file every 10 seconds, to keep the VirtualBox file system stream up and running.

At the moment I start the bash script in a second terminal window. In theory the script could also be started via the Vagrantfile. But again, it is just a nasty workaround.

bashloop.zip

#!/bin/bash
# Keep the shared file system busy by appending to a file every 10 seconds.
counter=1
: > bashloop_output.txt   # truncate the output file
echo "Starting session:" >> bashloop_output.txt
while [ "$counter" -le 100000000000 ]
  do
    echo "writing to file $counter times..."
    echo "$counter" >> bashloop_output.txt
    ((counter++))
    sleep 10
  done
echo 'End of Space & Time'

@oxyc
Collaborator

oxyc commented May 4, 2021

Haha, I love that solution! Thanks for sharing. Once I upgrade macOS and hit this bug (I'm still on an older version partly because I'm avoiding this bug), I would just make it into a system service on the VM that runs as soon as it boots up.

Does it work if you run it from within the VM too? Eg by writing to /vagrant/.bashloop_output

@susannecoates
Contributor

susannecoates commented May 4, 2021

Seeing the same problem here.
Host OS: Mac OSX Big Sur (11.2.1)
Guest OS: Ubuntu 18.04.5 LTS
Drupal VM: 2.0.10
Vagrant: 2.2.5
Ansible: 2.6.4

@susannecoates
Contributor

susannecoates commented May 4, 2021

Inspired by @MokDevelopment's script and @oxyc's suggestion of a service, I created a simple script that touches and then removes a file at 30-second intervals.

These scripts are run in the VM.

keepalive.sh

#!/bin/bash
while :
  do
    touch /var/www/umdarhuweb/keepalive.tmp
    sleep 30
    rm /var/www/umdarhuweb/keepalive.tmp
    sleep 30
  done

NOTE: The file path for the file that will be touched and removed must be located in the NFS mounted volume.

The file to go in /etc/systemd/system:

vagrant-keepalive.service

[Unit]
Description=Workaround for vagrant NFS inactivity bug
StartLimitIntervalSec=0
[Service]
Type=simple
Restart=always
RestartSec=1
User=www
ExecStart=/var/www/umdarhuweb/scripts/workarounds/keepalive.sh

[Install]
WantedBy=multi-user.target

NOTE: You will need to change the User to someone who has appropriate privileges for running the script and writing to the location of the temporary file, and change the path for ExecStart to match wherever you install the script.

Once you've made the changes, you should be able to run the script as a service as follows:

sudo systemctl start vagrant-keepalive.service

To set the service to run when the machine boots do this:

sudo systemctl enable vagrant-keepalive

@geerlingguy
Owner

I'd be okay with committing a fix into Drupal VM that sets up something like this (maybe a role like vagrant-nfs-fix or something?).

@susannecoates
Contributor

@geerlingguy Although it's kind of a kluge, I don't see another solution at present. I agree with your earlier assessment that it's a vagrant bug.

@susannecoates
Contributor

susannecoates commented May 4, 2021

It's probably worth exploring whether there are lower-impact operations than touch and rm that could be used in the script. I just went for those because I needed a quick solution.
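
One untested possibility, sketched here under assumptions: refresh the timestamp of a single persistent file instead of creating and deleting one each cycle, which halves the number of operations. Whether a lone timestamp update is enough to keep the NFS mount responsive has not been confirmed in this thread. The function name `keepalive` and its `rounds` argument are hypothetical, added so the loop can be bounded when trying it out.

```shell
#!/bin/bash
# Hypothetical variant: refresh one persistent file instead of create + delete.
# Whether this is enough to keep the NFS mount responsive is untested.
keepalive() {
  local file="$1"
  local interval="${2:-30}"   # seconds between touches
  local rounds="${3:-0}"      # 0 means run until stopped
  local i=0
  while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
    touch "$file"             # updates the timestamp; creates the file on first run
    sleep "$interval"
    i=$((i + 1))
  done
}
```

As with the original script, the file must live on the NFS-mounted volume, for example `keepalive /var/www/umdarhuweb/.keepalive 30` (the file name is only an example).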

oxyc added a commit to oxyc/drupal-vm that referenced this issue May 4, 2021
…ssues on macOS

Co-authored-by: MokDevelopment
Co-authored-by: Susanne Coates <scoates@susannecoates.net>
geerlingguy added a commit that referenced this issue May 8, 2021
Issue #2154: Add vagrant-nfs-fix role to fix NFS timeout issues on macOS
@geerlingguy
Owner

Marking as an upstream bug, but since we now have a workaround (thanks especially to @susannecoates and @oxyc!) in #2162, once the next release is out the fix will be adding the following line in your config.yml:

vagrant_nfs_fix_enabled: true

@davidtrainer

Noting here for posterity that I had an issue which was similar but not quite the same. I set up my drupalvm-based project after upgrading to Big Sur, and everything provisioned successfully, but when I tried vagrant ssh there was no shell prompt. Ctrl-C did nothing. Drush commands using the local alias outside the VM would just hang. All I could do is close the terminal tab. vagrant_nfs_fix_enabled: true in box/config.yml fixed it.

@Oscaner

Oscaner commented Oct 10, 2021

Thanks, everyone. vagrant_nfs_fix_enabled: true works fine for me.

I also found an additional solution: set nfs_udp: true. The issue seems to come from the NFS TCP connection getting stuck.

# Provide the path to the project root to Vagrant.
vagrant_synced_folders:
  # Set the local_path for the first synced folder to `.`.
  -
    local_path: .
    # Set the destination to the Acquia Cloud subscription machine name.
    destination: /var/www/drupalvm
    type: nfs
    nfs_udp: true

The TCP connection is stuck at FIN_WAIT2 in the VirtualBox guest and at CLOSE_WAIT on macOS.

image
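
A rough sketch of how the guest side of this could be checked. It assumes the NFS server listens on the standard port 2049 and that the guest has `ss` from iproute2; the helper name `count_nfs_states` is made up for illustration. It reads `ss -tan` output on stdin and counts connection states for anything involving port 2049.

```shell
#!/bin/bash
# Hypothetical helper: count TCP states for connections that involve port 2049 (NFS).
# Reads `ss -tan` style output on stdin: state is column 1,
# local address is column 4, and peer address is column 5.
count_nfs_states() {
  awk '$4 ~ /:2049$/ || $5 ~ /:2049$/ { n[$1]++ }
       END { for (s in n) print n[s], s }' | sort -k2
}
```

In the guest this would be run as `ss -tan | count_nfs_states`. The macOS `netstat` output uses a different column layout, so on the host a plain `netstat -an -p tcp | grep 2049` is the simpler check.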

@uberjay

uberjay commented Mar 6, 2022

@Oscaner I wonder if you've found any additional information here. Using UDP would be an OK workaround, except that recent Linux kernels have disabled NFS over UDP by default, due to the risk of data corruption. Building a custom kernel to enable UDP NFS support is an option, but it's not a particularly great one.

(In case it's not clear, I'm having this issue as well -- it's still present in macOS Monterey. I have a suspicion that the biggest consumers of the macOS nfsd are vagrant users, but who knows!!? 😉)

So far, the best workaround I've found is to use SMB/CIFS instead of NFS, which is not awesome, but it does sidestep the buggy macOS nfs server. Sigh.
