
panic: runtime error: index out of range #49

Closed
bilco105 opened this issue Aug 6, 2019 · 32 comments

Comments


bilco105 commented Aug 6, 2019

Hi @sevagh

First of all, really appreciate the work you've put into this tool - we use it a lot.

Recently though, I've randomly started encountering a problem that I've not seen before, specifically this error:

INFO[0000] Running goat for ebs
INFO[0000] WELCOME TO GOAT
INFO[0000] 1: COLLECTING EC2 INFO
INFO[0000] Establishing metadata client
INFO[0000] Retrieved metadata successfully               instance_id=i-0a403f852293bd8e3
INFO[0000] Using metadata to initialize EC2 SDK client   az=eu-west-1a instance_id=i-0a403f852293bd8e3 region=eu-west-1
INFO[0000] 2: COLLECTING EBS INFO
INFO[0000] Searching for EBS volumes
INFO[0000] Classifying EBS volumes based on tags
INFO[0000] 3: ATTACHING EBS VOLS
INFO[0000] 4: MOUNTING ATTACHED VOLS
panic: runtime error: index out of range

goroutine 1 [running]:
main.prepAndMountDrives(0xc0004388e0, 0x5, 0xab95d0, 0x0, 0x0)
	/home/shanssian/go/src/github.com/sevagh/goat/ebs.go:42 +0x1350
main.GoatEbs(0x7d1b00, 0x7cd503, 0x7)
	/home/shanssian/go/src/github.com/sevagh/goat/ebs.go:35 +0x256
main.main()
	/home/shanssian/go/src/github.com/sevagh/goat/main.go:66 +0x4c

The EBS volumes attach fine; however, they fail to mount. This looked the same as #47 at first, but I've upgraded to 1.0.2 and the error persists.

The tagging all appears to be correct (GOAT-IN:FsType, GOAT-IN:MountPath, GOAT-IN:NodeId, GOAT-IN:Prefix, and GOAT-IN:VolumeName are all set), and I've verified the IAM permissions are correct.

Any ideas?


sevagh commented Aug 6, 2019

I'm glad the tool has been useful.

It could be that the disks are somehow already attached, in which case my code ignores them. I believe that's a corner case I hadn't thought through: I should be able to proceed to the next steps (mount, etc.) even if the disks are already attached.

How do you create EC2 machines + EBS vols? Terraform?

Do you have an example of a log from a successful run stored somewhere? I believe the emptiness between steps 2-3 and 3-4 could support the hypothesis that the disks are already attached and therefore never get added to the array of disks to mount.
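A minimal, self-contained Go sketch of that hypothesis (the names and types here are placeholders, not the actual goat code): if every volume is skipped as already-attached, the slice of drives to mount ends up empty, and indexing into it panics exactly like the trace above.

```go
package main

import "fmt"

// Placeholder volume record; not goat's actual struct.
type ebsVol struct {
	VolumeID     string
	AttachedName string // non-empty if the volume is already attached
}

func prepAndMountDrives(vols []ebsVol) {
	var toMount []ebsVol
	for _, v := range vols {
		if v.AttachedName != "" {
			continue // already attached: never added to the mount slice
		}
		toMount = append(toMount, v)
	}
	// If every volume was already attached, toMount is empty and the index
	// below panics with "index out of range"; a len(toMount) check (or
	// mounting already-attached volumes too) would avoid it.
	fmt.Println("first drive to mount:", toMount[0].VolumeID)
}

func main() {
	prepAndMountDrives([]ebsVol{{VolumeID: "vol-0123456789abcdef0", AttachedName: "/dev/xvdf"}})
}
```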


bilco105 commented Aug 6, 2019

Thanks for the speedy response.

I've just downgraded to 0.6.0 which seems to have got things working again.

That's certainly quite possible. Goat runs at boot, so it may well have attached all the volumes, then failed to mount for some other reason and died.

I'm then running goat ebs a second time, which is where I'm seeing the error above.

I'll try upgrading back to 1.0.2 shortly and running it with no volumes attached to see what happens there. Sounds like I'm potentially running into two different issues.


bilco105 commented Aug 6, 2019

OK, with everything detached, I get a different problem:

FATA[0020] Drive /dev/xvdl doesn't exist after attaching - checked with stat 10 times  vol_id=vol-0c17eb2e0d145df15 vol_name=export

Interestingly, that volume took longer to attach than goat was willing to wait.
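For context, that error reads like a bounded wait loop timing out. A rough sketch of that kind of check, using the retry count from the error message but an assumed interval (not necessarily goat's actual values):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForDrive polls for the block device to appear after the attach call
// returns, since attachment completes asynchronously.
func waitForDrive(path string, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("drive %s doesn't exist after attaching - checked with stat %d times", path, attempts)
}

func main() {
	if err := waitForDrive("/dev/xvdl", 10, 2*time.Second); err != nil {
		fmt.Println(err)
	}
}
```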


sevagh commented Aug 6, 2019

Can you test this hotfix? https://github.com/sevagh/goat/releases/tag/1.0.3-rc0


bilco105 commented Aug 6, 2019

@sevagh That appears to have resolved the original issue 👍

I'm now getting this:

FATA[0000] Couldn't mount: no such device                vol_name=archive vols="[{vol-06ca06283f3290a4e archive -1 1 /dev/xvdb /var/www/html ext4}]"

We're using nvme, which is probably why - it should be /dev/nvme1n1


sevagh commented Aug 6, 2019

I should be getting that attached device name from the AWS API: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#VolumeAttachment
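A minimal sketch of what that looks like with aws-sdk-go v1 (placeholder IDs; note the returned Device is still the requested /dev/xvd* name, which is not what the kernel exposes on Nitro/NVMe instances):

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-west-1")))
	svc := ec2.New(sess)

	// AttachVolume returns an *ec2.VolumeAttachment describing the attachment.
	att, err := svc.AttachVolume(&ec2.AttachVolumeInput{
		Device:     aws.String("/dev/xvdf"),
		InstanceId: aws.String("i-000000000000"),
		VolumeId:   aws.String("vol-000000000000000"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("device:", aws.StringValue(att.Device), "state:", aws.StringValue(att.State))
}
```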


bilco105 commented Aug 9, 2019

@sevagh Do you want me to raise a new issue for the above? Then you can close this one off.


sevagh commented Aug 9, 2019

Keeping it in this same issue is OK. I'll get back to you soon.


sevagh commented Aug 9, 2019

Can you print the logs surrounding the above error message? There are some additional functions related to nvme (which another user helped contribute) - I'm not very well versed in Nitro and nvme devices.


bilco105 commented Aug 11, 2019

Sure, here you go:

INFO[0000] Running goat for ebs
INFO[0000] WELCOME TO GOAT
INFO[0000] 1: COLLECTING EC2 INFO
INFO[0000] Establishing metadata client
INFO[0000] Retrieved metadata successfully               instance_id=i-000000000000
INFO[0000] Using metadata to initialize EC2 SDK client   az=eu-west-1a instance_id=i-000000000000 region=eu-west-1
INFO[0000] 2: COLLECTING EBS INFO
INFO[0000] Searching for EBS volumes
INFO[0000] Classifying EBS volumes based on tags
INFO[0000] 3: ATTACHING EBS VOLS
INFO[0000] Volume is unattached, picking drive name      vol_id=vol-000000000000000 vol_name=archive
INFO[0000] Executing AWS SDK attach command              vol_id=vol-000000000000000 vol_name=archive
INFO[0000] {
  AttachTime: 2019-08-11 10:52:23.463 +0000 UTC,
  Device: "/dev/xvdf",
  InstanceId: "i-000000000000",
  State: "attaching",
  VolumeId: "vol-000000000000000"
}  vol_id=vol-000000000000000 vol_name=archive
INFO[0014] 4: MOUNTING ATTACHED VOLS
INFO[0014] Label already exists, jumping to mount phase  vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Checking if something already mounted at %s/var/www/html  vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Appending fstab entry                         vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Now mounting                                  vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
FATA[0014] Couldn't mount: no such device                vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"

The issue with NVMe volumes is that the attachment info shows the device as /dev/xvdb, but it actually presents on the instance as /dev/nvme*n1.

The volume ID is embedded in the device's serial number, so that might be the easiest way of determining it, though it would require having the utility installed:

[root@ip-10-11-16-200 ~]# nvme list -o json
{
  "Devices" : [
    {
      "DevicePath" : "/dev/nvme0n1",
      "Firmware" : "1.0",
      "Index" : 0,
      "ModelNumber" : "Amazon Elastic Block Store",
      "ProductName" : "Non-Volatile memory controller: Amazon.com, Inc. Device 0x8061",
      "SerialNumber" : "vol000000000000000",
      "UsedBytes" : 0,
      "MaximiumLBA" : 33554432,
      "PhysicalSize" : 17179869184,
      "SectorSize" : 512
    },
  ]
}
[root@ip-10-11-16-200 ~]#
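An alternative that avoids depending on nvme-cli would be to read the controller serial from sysfs. A rough sketch, with the sysfs paths being an assumption (this is not what goat currently does):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// findNvmeDevice returns the /dev path whose controller serial matches the
// EBS volume ID (EBS encodes "vol-0123..." as "vol0123..." in the serial).
func findNvmeDevice(volumeID string) (string, error) {
	want := strings.Replace(volumeID, "vol-", "vol", 1)
	ctrls, _ := filepath.Glob("/sys/class/nvme/nvme*")
	for _, ctrl := range ctrls {
		raw, err := os.ReadFile(filepath.Join(ctrl, "serial"))
		if err != nil || strings.TrimSpace(string(raw)) != want {
			continue
		}
		// Pick the first namespace under this controller, e.g. nvme0n1.
		ns, _ := filepath.Glob(filepath.Join(ctrl, "nvme*n*"))
		if len(ns) > 0 {
			return "/dev/" + filepath.Base(ns[0]), nil
		}
	}
	return "", fmt.Errorf("no NVMe device found for %s", volumeID)
}

func main() {
	dev, err := findNvmeDevice("vol-000000000000000")
	fmt.Println(dev, err)
}
```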


bilco105 commented Aug 11, 2019

Actually, looking at the code, there is already some NVMe logic using go-ebsnvme, which seems to try to find the correct NVMe device using the attachment device name.

It's probably better to stick with that, as it doesn't require an external package (nvme-cli).

@bilco105
Contributor Author

Just had a quick look and have a feeling that the issue is potentially here:

goat/ebs.go

Lines 52 to 53 in 81f1aa3

if filesystem.DoesDriveExist("/dev/disk/by-label/GOAT-" + volName) {
	driveLogger.Info("Label already exists, jumping to mount phase")

Basically, the logic to determine the NVMe device name is skipped over completely when the device already has a label, which will always be the case on the second mounting attempt.


sevagh commented Aug 11, 2019

Nice sleuthing - it looks like I should put a call to this function, GetActualBlockDeviceName, somewhere:

// GetLocalBlockDeviceName returns the actual name of the block device seen

Maybe when writing the fstab entry - but I'm having trouble discovering where the link between the label and nvme device name exists.


sevagh commented Aug 11, 2019

I'm building a 1.0.3-rc1


sevagh commented Aug 11, 2019

https://github.com/sevagh/goat/releases/tag/1.0.3-rc1

The gist is that I put the GetActualBlockDeviceName call early in the code, so the volume struct holds the real NVMe name from the start rather than substituting it in later.
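Roughly, the reordering looks like this (a hypothetical, self-contained simplification; all helpers below are stubs and assumptions, not goat's actual code):

```go
package main

import (
	"fmt"
	"os"
)

type ebsVol struct {
	Name         string // GOAT-IN:VolumeName
	AttachedName string // device name passed to the attach call, e.g. /dev/xvdf
	MountPath    string
	FsType       string
}

// getActualBlockDeviceName stands in for goat's NVMe-aware lookup; on a Nitro
// instance it would map /dev/xvdf to something like /dev/nvme1n1.
func getActualBlockDeviceName(name string) string { return name }

func doesDriveExist(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func prepDrive(vol ebsVol) {
	// Resolve the real device name up front, before any label check, so the
	// "label already exists" fast path mounts the right device on re-runs.
	vol.AttachedName = getActualBlockDeviceName(vol.AttachedName)

	if doesDriveExist("/dev/disk/by-label/GOAT-" + vol.Name) {
		fmt.Printf("label exists, mounting %s at %s (%s)\n", vol.AttachedName, vol.MountPath, vol.FsType)
		return
	}
	fmt.Println("no label yet: mkfs, label, then mount", vol.AttachedName)
}

func main() {
	prepDrive(ebsVol{Name: "archive", AttachedName: "/dev/xvdf", MountPath: "/var/www/html", FsType: "ext4"})
}
```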


sevagh commented Aug 11, 2019

What's funny is that I think this whole workaround comes from me choosing /dev/xvd{b,c,d,e,f,g,...} as an arbitrary device name for both non-NVMe and NVMe devices. So this "rediscovery" of /dev/nvme0n1 from /dev/xvdb is correcting a mistaken initial assumption in my code. But it's a bit entrenched, and for now I want your bug to be solved.


bilco105 commented Aug 11, 2019

Made some progress: it's still failing to mount, but it's now using the correct device name in the log message:

FATA[0000] Couldn't mount: no such device                vol_name=media vols="[{vol-000000000 media -1 1 /dev/nvme4n1 /var/www/html ext4}]"

/dev/nvme4n1 does exist, so presumably it's still trying to mount the old assumed /dev/xvd* device behind the log message.


sevagh commented Aug 11, 2019

If you run ls -l /dev/disk/by-label, perhaps the old LABEL still points to the bad name.


bilco105 commented Aug 11, 2019

Nope, they all point to the correct NVMe devices. Bear in mind the labels were created some time ago.


sevagh commented Aug 11, 2019

I might have found a leftover place where I don't correct the AttachedName for nvme. Hold tight for rc2...


sevagh commented Aug 11, 2019

https://github.com/sevagh/goat/releases/tag/1.0.3-rc2

I'm cutting rcs frequently since it's a safe way for us to share trusted binaries (vs me emailing you a binary).

@bilco105
Contributor Author

Yeah, no worries.

Same error, although interestingly, it has now managed to mount two volumes before bombing out.

It's only put one of the above volumes into /etc/fstab as well.

@bilco105
Contributor Author

Also, as a side note, it bombs out if something is already mounted, even though potentially it was Goat that mounted it in the first place (i.e. on a second run).

Feels like it should just break out of the loop for that particular volume, rather than exiting completely.

Admittedly, I guess it being run multiple times (as opposed to just at boot) wasn't an originally intended use case.
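Something like this is what I mean: check whether the path is already mounted and skip that volume instead of dying (a rough sketch; reading /proc/mounts is my assumption here, not necessarily how goat checks for existing mounts):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// isMounted reports whether mountPath appears as a mount point in /proc/mounts.
func isMounted(mountPath string) bool {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		return false
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == mountPath {
			return true
		}
	}
	return false
}

func main() {
	mountPaths := []string{"/var/www/html", "/data/export"}
	for _, p := range mountPaths {
		if isMounted(p) {
			// Already mounted (possibly by a previous goat run): skip this
			// volume rather than calling log.Fatal and killing the whole run.
			fmt.Println("already mounted, skipping:", p)
			continue
		}
		fmt.Println("would mount:", p)
	}
}
```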


sevagh commented Aug 11, 2019

Idempotence is a good goal to have - it's strange for a tool to panic because of something it did a few minutes ago.

Unfortunately I don't use AWS (and haven't since writing this tool), so I can't easily iterate on the code to solve this issue.

@bilco105
Contributor Author

I'll have a play locally and see if I can get it working, then submit a PR back.

@bilco105
Contributor Author

Submitted a PR for this. Not ideal, but it works.

FWIW, we're fairly heavy users of Goat, and an AWS consultancy. We'd be happy to take over maintenance if you're not using AWS at your current place.


sevagh commented Aug 12, 2019

Thanks for the PR. Re: taking over, that's a good idea. What's the best way to denote the change? Perhaps a note in the description that https://github.com/steamhaus/goat is official?


And I could archive this one.

@bilco105
Contributor Author

Sure.

I'm pretty sure within the repo settings there's a 'transfer' option. That then keeps a redirect in place from sevagh/goat to steamhaus/goat.

I'd need to get rid of the fork first, though.

@bilco105
Contributor Author

I've removed the fork, so feel free to do the transfer if you'd like.


sevagh commented Aug 12, 2019

I'd prefer not to do a transfer. Would you be OK with continuing main development on your fork? Like I said, I'll point to it from this repo.

@bilco105
Contributor Author

Sure, can do that if you'd prefer. Will fork it again later this evening.


sevagh commented Aug 12, 2019

Thanks @bilco105 - I'll go ahead and archive goat for now. If you have any questions about the build process, etc., let me know (but make docker_build should build all the artifacts and put them in ./pkg).

sevagh closed this as completed Jun 3, 2024