Host USB devices changes when sample runs #2

Open
dmc5179 opened this issue Aug 12, 2019 · 1 comment


dmc5179 commented Aug 12, 2019

I'm running your device plugin on OpenShift 3.11, which has Kubernetes under the hood. I realize you might not have done any testing with OCP, but figured you might be able to help. Here is the setup:

  • Physical host (with Edge TPU USB device attached)
  • OCP cluster virtual machines

I am able to connect the TPU to the physical host and run the Python demo code to show it works. I can even assign the USB device to a compute node VM and run the Python demo code inside the VM to show that the VM sees the device and can talk to it.

Before I do anything with the daemonset I see this on the physical host:

$ lsusb
Bus 002 Device 005: ID 1a6e:089a Global Unichip Corp. 

I then use your YAML to deploy the DaemonSet. One of the pods in the DaemonSet shows:

$ oc logs -f edgetpu-device-plugin-52sjt
I0812 16:17:10.264373       1 plugin.go:98] Started gRPC service on plugin socket
I0812 16:17:10.264399       1 plugin.go:101] Started monitoring devices
I0812 16:17:10.264404       1 plugin.go:49] gRPC server started.
I0812 16:17:10.264607       1 plugin.go:118] Opened connection to kubelet socket
I0812 16:17:10.268002       1 server.go:56] Start watching devices
I0812 16:17:10.268025       1 server.go:66] Update a device list
I0812 16:17:10.268092       1 plugin.go:132] Registered device plugin
I0812 16:17:15.369094       1 server.go:150] Edge TPU became active.
I0812 16:17:15.369137       1 server.go:66] Update a device list
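
The log lines above appear to follow the glog/klog header layout (severity letter, month and day, time with microseconds, thread ID, source file and line). Below is a minimal sketch for pulling out the timestamps and measuring the delay between plugin registration and the Edge TPU being reported active. The function names are illustrative, and the year is an assumption because the log header does not print one.

```python
import re
from datetime import datetime

# Header layout: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file:line>] <message>
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# Two lines copied from the plugin log in this issue.
SAMPLE_LOG = """\
I0812 16:17:10.268092       1 plugin.go:132] Registered device plugin
I0812 16:17:15.369094       1 server.go:150] Edge TPU became active.
"""


def parse_klog_line(line, year=2019):
    """Parse one log line into a dict, or return None if it does not match.

    The year is assumed (the header omits it); 2019 is the date of this issue.
    """
    m = KLOG_RE.match(line.strip())
    if m is None:
        return None
    stamp = f"{year}{m['date']} {m['time']}"
    return {
        "severity": m["sev"],
        "time": datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"),
        "source": m["src"],
        "message": m["msg"],
    }


def seconds_between(log_text, first_msg, second_msg, year=2019):
    """Seconds from the first line containing first_msg to the first containing second_msg."""
    times = {}
    for raw in log_text.splitlines():
        entry = parse_klog_line(raw, year)
        if entry is None:
            continue
        for key in (first_msg, second_msg):
            if key in entry["message"] and key not in times:
                times[key] = entry["time"]
    if first_msg not in times or second_msg not in times:
        return None
    return (times[second_msg] - times[first_msg]).total_seconds()


if __name__ == "__main__":
    gap = seconds_between(SAMPLE_LOG, "Registered device plugin", "Edge TPU became active")
    print(f"registration -> active: {gap:.3f} s")
```

In this log the plugin reports the Edge TPU as active roughly five seconds after it registers with the kubelet.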

So far that all looks good. I then deploy the sample with your YAML file and it comes back with:

$ oc logs -f edgetpu-demo-9cb92

ERROR: Failed to retrieve TPU context.
ERROR: Node number 0 (edgetpu-custom-op) failed to prepare.

Failed in Tensor allocation, status_code: 1

And then if I go back to the physical host:

$ lsusb
Bus 002 Device 006: ID 18d1:9302 Google Inc. 

It changed from 002:005 to 002:006. It is as if the physical host thinks the USB device was disconnected and reconnected. I have seen this before I started using your code: I'd run a container on the VM and it would fail, I'd see that the device had changed on the host, re-add the device to the VM, and run the container on the VM again, and it would work.
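
The before/after comparison described above can be made mechanical. This is a minimal sketch that parses the two `lsusb` lines quoted in this issue and reports which fields differ. The regular expression covers only the single-line `lsusb` format shown here, and the function names are illustrative.

```python
import re

# Matches lines such as: "Bus 002 Device 005: ID 1a6e:089a Global Unichip Corp."
LSUSB_RE = re.compile(
    r"^Bus (?P<bus>\d{3}) Device (?P<device>\d{3}): ID "
    r"(?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})\s*(?P<name>.*)$"
)

# The two lines reported in this issue, before and after running the sample.
BEFORE = "Bus 002 Device 005: ID 1a6e:089a Global Unichip Corp. "
AFTER = "Bus 002 Device 006: ID 18d1:9302 Google Inc. "


def parse_lsusb_line(line):
    """Split one lsusb line into bus, device, vendor, product and name."""
    m = LSUSB_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an lsusb device line: {line!r}")
    return m.groupdict()


def changed_fields(before, after):
    """Return {field: (old, new)} for every field that differs."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


if __name__ == "__main__":
    diff = changed_fields(parse_lsusb_line(BEFORE), parse_lsusb_line(AFTER))
    for field, (old, new) in diff.items():
        print(f"{field}: {old} -> {new}")
```

For these two lines the bus number stays the same, while the device number, vendor ID, product ID, and name all differ.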

Would you have any insight into why talking to the device or somehow assigning it to a container causes this name change? Thank you.


dmc5179 commented Aug 12, 2019

Update: As long as the device doesn't flip on me, your device plugin works great and I'm able to run your sample code. What is really strange is that the device doesn't just change its device number on the bus; it also changes its vendor ID, product ID, and name. It's almost like the first time you talk to it, udev gets more info and sees it as a new device...
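
If the device really does show up under two identities, anything that watches for it on the host would have to accept both ID pairs. Below is a minimal sketch, assuming the standard Linux sysfs layout in which each USB device directory under `/sys/bus/usb/devices` exposes `idVendor` and `idProduct` files. The two ID pairs are copied from the `lsusb` output in this issue; treating them as the same physical device is the hypothesis under discussion, not a confirmed fact. The function name is illustrative and is not taken from the plugin's code.

```python
from pathlib import Path

# Assumption: both pairs belong to the same Edge TPU, as the lsusb output in this issue suggests.
KNOWN_IDS = {("1a6e", "089a"), ("18d1", "9302")}


def find_matching_devices(sysfs_root="/sys/bus/usb/devices", known_ids=KNOWN_IDS):
    """Return [(sysfs_name, (vendor, product)), ...] for devices whose IDs are in known_ids."""
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "idVendor"
        product_file = dev / "idProduct"
        # Interface entries have no ID files of their own, so skip them.
        if not (vendor_file.is_file() and product_file.is_file()):
            continue
        ids = (
            vendor_file.read_text().strip().lower(),
            product_file.read_text().strip().lower(),
        )
        if ids in known_ids:
            matches.append((dev.name, ids))
    return matches


if __name__ == "__main__":
    for name, (vendor, product) in find_matching_devices():
        print(f"{name}: {vendor}:{product}")
```

Running this on the physical host before and after the sample pod starts would show whether the same port reports a different ID pair each time.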
