Grid Load Balancing Not Working #1673

Closed
alexkogon opened this issue Feb 19, 2016 · 22 comments

@alexkogon

Hi,

I have long suspected that something in the load balancing for the Grid is not working, as I see it using the same nodes over and over. I did some investigation and indeed it seems to be the case. The code in:

selenium/java/server/src/org/openqa/grid/internal/ProxySet.java

says:

public List<RemoteProxy> getSorted() {
  List<RemoteProxy> sorted = new ArrayList<>(proxies);
  Collections.sort(sorted, proxyComparator);
  return sorted;
}

private Comparator<RemoteProxy> proxyComparator = new Comparator<RemoteProxy>() {
  @Override
  public int compare(RemoteProxy o1, RemoteProxy o2) {
    // fraction of each proxy's test slots currently in use
    double p1used = (o1.getTotalUsed() * 1.0) / o1.getTestSlots().size();
    double p2used = (o2.getTotalUsed() * 1.0) / o2.getTestSlots().size();

    if (p1used == p2used) return 0;
    return p1used < p2used ? -1 : 1;
  }
};

public TestSession getNewSession(Map<String, Object> desiredCapabilities) {
  // sort the proxies first, by default by total number of
  // test running, to avoid putting all the load of the first
  // proxies.
  List<RemoteProxy> sorted = getSorted();
  log.info("Available nodes: " + sorted);

  for (RemoteProxy proxy : sorted) {
    TestSession session = proxy.getNewSession(desiredCapabilities);
    if (session != null) {
      return session;
    }
  }
  return null;
}

The comment in getNewSession says the proxies will be sorted, but it doesn't seem to work: as you can see under "Actual Behavior" below, the hub has 8 nodes but keeps using the same 4 over and over.

// sort the proxies first, by default by total number of
// test running, to avoid putting all the load of the first
// proxies.

I've already got a fix in place, as you can see from the "Expected Behavior" output. I will submit it unless anyone has a better idea, @dima-groupon.

It just uses a linked list to make sure nodes being used go to the back of the list.
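For readers following along, the idea can be sketched roughly as below. This is not the actual patch: `RotatingNodeQueue`, `pick`, and the `canAccept` predicate (standing in for `proxy.getNewSession(...) != null`) are hypothetical names, and nodes are represented by their URLs only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for ProxySet: keeps nodes in a queue and moves
// whichever node accepts a session to the back.
class RotatingNodeQueue {
  private final Deque<String> nodes = new ArrayDeque<>();

  void add(String node) {
    nodes.addLast(node);
  }

  // Returns the first node that accepts the request, or null if none does.
  // canAccept is an assumption standing in for the real capability match.
  String pick(Predicate<String> canAccept) {
    Iterator<String> it = nodes.iterator();
    while (it.hasNext()) {
      String node = it.next();
      if (canAccept.test(node)) {
        it.remove();          // take it out of its current position
        nodes.addLast(node);  // and put it at the back of the queue
        return node;          // iterator is not used again after this point
      }
    }
    return null;
  }

  // Copy of the current order, front of the queue first.
  List<String> snapshot() {
    return new ArrayList<>(nodes);
  }
}
```

With eight nodes added and eight requests that every node accepts, each node is picked exactly once before any node is picked a second time, regardless of how many sessions run concurrently.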

Expected Behavior -

19:58:44.397 INFO - Registered a node http://myhost:5552
19:58:44.946 INFO - Registered a node http://myhost:5553
19:58:45.659 INFO - Registered a node http://myhost:5554
19:58:46.346 INFO - Registered a node http://myhost:5555
19:58:46.996 INFO - Registered a node http://myhost:5556
19:58:47.487 INFO - Registered a node http://myhost:5557
19:58:47.984 INFO - Registered a node http://myhost:5558
19:58:48.792 INFO - Registered a node http://myhost:5551
19:59:16.451 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:16.452 INFO - Available nodes: [http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552]
19:59:16.452 INFO - Trying to create a new session on node http://myhost:5551
19:59:16.452 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:16.744 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:16.744 INFO - Available nodes: [http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551]
19:59:16.745 INFO - Trying to create a new session on node http://myhost:5558
19:59:16.745 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:16.872 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:16.873 INFO - Available nodes: [http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558]
19:59:16.873 INFO - Trying to create a new session on node http://myhost:5557
19:59:16.873 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:16.899 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:16.900 INFO - Available nodes: [http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557]
19:59:16.900 INFO - Trying to create a new session on node http://myhost:5556
19:59:16.901 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:37.672 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:37.673 INFO - Available nodes: [http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556]
19:59:37.673 INFO - Trying to create a new session on node http://myhost:5555
19:59:37.673 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:38.922 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:38.922 INFO - Available nodes: [http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555]
19:59:38.922 INFO - Trying to create a new session on node http://myhost:5554
19:59:38.922 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:39.053 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:39.054 INFO - Available nodes: [http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554]
19:59:39.054 INFO - Trying to create a new session on node http://myhost:5553
19:59:39.054 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
19:59:39.386 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
19:59:39.386 INFO - Available nodes: [http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553]
19:59:39.386 INFO - Trying to create a new session on node http://myhost:5552
19:59:39.387 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:16.613 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:16.614 INFO - Available nodes: [http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552]
20:01:16.614 INFO - Trying to create a new session on node http://myhost:5551
20:01:16.614 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:16.755 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:16.755 INFO - Available nodes: [http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551]
20:01:16.755 INFO - Trying to create a new session on node http://myhost:5558
20:01:16.756 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:16.790 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:16.790 INFO - Available nodes: [http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558]
20:01:16.790 INFO - Trying to create a new session on node http://myhost:5557
20:01:16.790 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:16.865 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:16.866 INFO - Available nodes: [http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557]
20:01:16.866 INFO - Trying to create a new session on node http://myhost:5556
20:01:16.866 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:38.037 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:38.037 INFO - Available nodes: [http://myhost:5555, http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556]
20:01:38.037 INFO - Trying to create a new session on node http://myhost:5555
20:01:38.038 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:38.400 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:38.400 INFO - Available nodes: [http://myhost:5554, http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555]
20:01:38.401 INFO - Trying to create a new session on node http://myhost:5554
20:01:38.401 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:38.503 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:38.503 INFO - Available nodes: [http://myhost:5553, http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554]
20:01:38.504 INFO - Trying to create a new session on node http://myhost:5553
20:01:38.504 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
20:01:38.584 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
20:01:38.585 INFO - Available nodes: [http://myhost:5552, http://myhost:5551, http://myhost:5558, http://myhost:5557, http://myhost:5556, http://myhost:5555, http://myhost:5554, http://myhost:5553]
20:01:38.585 INFO - Trying to create a new session on node http://myhost:5552
20:01:38.585 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}

Actual Behavior -

18:48:13.143 INFO - Registered a node http://myhost:5551
18:48:13.688 INFO - Registered a node http://myhost:5552
18:48:14.319 INFO - Registered a node http://myhost:5553
18:48:15.025 INFO - Registered a node http://myhost:5554
18:48:15.692 INFO - Registered a node http://myhost:5555
18:48:16.345 INFO - Registered a node http://myhost:5556
18:48:16.821 INFO - Registered a node http://myhost:5557
18:48:17.339 INFO - Registered a node http://myhost:5558
18:48:50.905 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:48:50.906 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:48:50.906 INFO - Trying to create a new session on node http://myhost:5551
18:48:50.906 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:48:51.171 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:48:51.171 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:48:51.172 INFO - Trying to create a new session on node http://myhost:5552
18:48:51.172 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:48:51.202 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:48:51.203 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:48:51.203 INFO - Trying to create a new session on node http://myhost:5553
18:48:51.203 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:48:51.350 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:48:51.350 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:48:51.350 INFO - Trying to create a new session on node http://myhost:5554
18:48:51.351 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:12.802 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:12.803 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:49:12.803 INFO - Trying to create a new session on node http://myhost:5551
18:49:12.803 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:13.486 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:13.487 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:49:13.487 INFO - Trying to create a new session on node http://myhost:5552
18:49:13.487 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:13.594 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:13.595 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:49:13.595 INFO - Trying to create a new session on node http://myhost:5553
18:49:13.595 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:17.307 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:17.307 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:49:17.307 INFO - Trying to create a new session on node http://myhost:5554
18:49:17.307 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:39.365 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:39.365 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:49:39.365 INFO - Trying to create a new session on node http://myhost:5551
18:49:39.366 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:39.937 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:39.938 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:49:39.938 INFO - Trying to create a new session on node http://myhost:5552
18:49:39.938 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:39.970 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:39.972 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:49:39.972 INFO - Trying to create a new session on node http://myhost:5553
18:49:39.972 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:49:39.982 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:49:39.982 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:49:39.982 INFO - Trying to create a new session on node http://myhost:5554
18:49:39.983 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:50:01.086 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:50:01.087 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:50:01.087 INFO - Trying to create a new session on node http://myhost:5551
18:50:01.087 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:50:01.962 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:50:01.963 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:50:01.963 INFO - Trying to create a new session on node http://myhost:5552
18:50:01.963 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:50:01.969 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:50:01.970 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:50:01.970 INFO - Trying to create a new session on node http://myhost:5553
18:50:01.970 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:50:02.082 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:50:02.083 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:50:02.083 INFO - Trying to create a new session on node http://myhost:5554
18:50:02.083 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:16.983 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:16.983 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:51:16.983 INFO - Trying to create a new session on node http://myhost:5551
18:51:16.984 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:17.028 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:17.029 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:51:17.029 INFO - Trying to create a new session on node http://myhost:5552
18:51:17.029 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:17.037 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:17.037 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:51:17.037 INFO - Trying to create a new session on node http://myhost:5553
18:51:17.038 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:17.111 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:17.112 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:51:17.112 INFO - Trying to create a new session on node http://myhost:5554
18:51:17.112 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:38.576 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:38.577 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:51:38.577 INFO - Trying to create a new session on node http://myhost:5551
18:51:38.577 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:38.666 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:38.667 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:51:38.667 INFO - Trying to create a new session on node http://myhost:5552
18:51:38.667 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:39.137 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:39.137 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:51:39.137 INFO - Trying to create a new session on node http://myhost:5553
18:51:39.137 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}
18:51:39.208 INFO - Got a request to create a new session: Capabilities [{browserName=firefox, version=, platform=ANY}]
18:51:39.209 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:51:39.209 INFO - Trying to create a new session on node http://myhost:5554
18:51:39.209 INFO - Trying to create a new session on test slot {seleniumProtocol=WebDriver, browserName=firefox, maxInstances=1, platform=LINUX}

Steps to reproduce -

Create a hub
Connect 8 nodes to it
Run a bunch of tests 4 at a time

@lukeis
Member

lukeis commented Feb 22, 2016

I just created a test that starts up a grid with 8 nodes, 3 test slots each of chrome. The test then starts up 8 sessions, checks to make sure all 8 nodes have 1 session. Starts up 8 more sessions, checks they then all have 2 sessions. Starts up one more session and makes sure only one node has all 3 test slots used.

670c42f

It would appear this is functioning correctly. Can you please provide a complete test case to show it not working?

@alexkogon
Author

I have eight nodes on my grid, and run four tests at a time continuously. In the current implementation, the first four nodes are used over and over and over and the other four are never used. In the fix I made all eight nodes are used. You can see this in the output I posted above.

@lukeis
Member

lukeis commented Feb 22, 2016

If you're only using 4 sessions at a time, this is expected behavior. If you want to change the default behavior to also sort nodes by least recently used, then we can do that, but I would not strip away the code that sorts the nodes by current load.

Also, you could write your own scheduler (and override the existing one).
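A rough sketch of what "load first, then least recently used" could look like, and of why idle nodes currently come back in the same order: Java's list sort is stable, so nodes that tie on load keep their input order. Everything here is hypothetical for illustration; in particular `NodeLoad` and its `lastUsed` counter are assumptions and not something RemoteProxy is known to expose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical node model used only for this sketch.
class NodeLoad {
  final String url;
  final int used;       // sessions currently running
  final int slots;      // total test slots
  final long lastUsed;  // assumed counter: larger means used more recently

  NodeLoad(String url, int used, int slots, long lastUsed) {
    this.url = url;
    this.used = used;
    this.slots = slots;
    this.lastUsed = lastUsed;
  }

  double load() {
    return (double) used / slots;
  }
}

class LoadThenLruOrder {
  // Load only: idle nodes all compare equal.
  static final Comparator<NodeLoad> BY_LOAD =
      Comparator.comparingDouble(NodeLoad::load);

  // Load first, then least recently used as a tiebreaker.
  static final Comparator<NodeLoad> BY_LOAD_THEN_LRU =
      BY_LOAD.thenComparingLong(n -> n.lastUsed);

  static List<String> order(List<NodeLoad> nodes, Comparator<NodeLoad> cmp) {
    List<NodeLoad> sorted = new ArrayList<>(nodes);
    sorted.sort(cmp); // stable sort: ties keep their input order
    List<String> urls = new ArrayList<>();
    for (NodeLoad n : sorted) {
      urls.add(n.url);
    }
    return urls;
  }
}
```

With four idle nodes a, b, c, d where only a and b have been used recently, `BY_LOAD` leaves them in input order, while `BY_LOAD_THEN_LRU` moves c and d to the front. A busy node still sorts after every idle one in both cases, so the load ordering is preserved.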

@dima-groupon
Contributor

I tried to fix this by spreading the load as much as possible: if you have 2 nodes with 1 test per node, we try to attach to the 3rd, then the 4th node, etc. I'm not sure if that got reverted out.


@alexkogon
Author

@dima-groupon There is something wrong with your implementation, Dima: it doesn't use all the nodes, though it looks like it should. Something in the sorting algorithm (the comparator) is obviously causing some of the nodes to go unused. Instead of figuring it out and fixing it, I just put in something simple with a queue, where nodes are put at the end of the queue when they are used.

@lukeis It is true that it would probably be better to lock the whole iteration in the synchronized block, because the for(:) loop may have a problem if the queue is modified while another thread is iterating over it. The current code may also have a very unlikely race condition in the proxy's accept-DesiredCapabilities method if it is executed by two threads at the same time, but I don't know. Locking would fix this, so I will do so. If you don't want the code, that is fine with me; it works for what I need :)
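As a minimal sketch of the locking discussed here (again not the actual patch), the scan and the move-to-back can both run inside one synchronized method, so no thread can modify the list while another is iterating. `SynchronizedNodeQueue`, `pick`, `canAccept`, and the `hammer` demo helper are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.function.Predicate;

class SynchronizedNodeQueue {
  private final LinkedList<String> nodes = new LinkedList<>();

  synchronized void add(String node) {
    nodes.addLast(node);
  }

  // The whole scan-and-rotate holds the lock, so the list cannot change
  // underneath the for loop.
  synchronized String pick(Predicate<String> canAccept) {
    String chosen = null;
    for (String node : nodes) {
      if (canAccept.test(node)) {
        chosen = node;
        break;
      }
    }
    if (chosen != null) {
      nodes.remove(chosen);   // modify only after the loop has ended
      nodes.addLast(chosen);
    }
    return chosen;
  }

  // Demo helper: several threads pick concurrently; returns picks per node.
  static Map<String, Integer> hammer(SynchronizedNodeQueue queue,
                                     int threads, int picksPerThread) {
    Map<String, Integer> counts = new HashMap<>();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < picksPerThread; i++) {
          String node = queue.pick(n -> true);
          synchronized (counts) {
            counts.merge(node, 1, Integer::sum);
          }
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return counts;
  }
}
```

Because each pick is atomic, the picks form a strict round robin in lock order, so when every node accepts and the total number of picks is a multiple of the node count, every node ends up with the same count.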

@freynaud34

The load balancer functionality works as expected. It sorts by the number of tests currently running and ignores what happened in the past.

However, the code was originally designed to be extended, and that's not possible at the moment because of the
double p1used = (o1.getTotalUsed() * 1.0) / o1.getTestSlots().size(); logic.

It shouldn't be sorted by getTotalUsed / size, but should use the getUsedResource from https://github.com/SeleniumHQ/selenium/blob/master/java/server/src/org/openqa/grid/internal/RemoteProxy.java#L193
That method is public and meant to be overridden by custom proxies.
The default getUsedResource matches the current sort, so that fix will not change the expected behavior but will allow extensions.
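To illustrate the shape of that change, here is a minimal sketch in which the comparator delegates to an overridable method instead of computing the ratio itself. `BasicProxy`, `WeightedProxy`, `usedResource`, and `ProxyOrdering` are hypothetical stand-ins; the real RemoteProxy method name and signature should be taken from the linked source, not from this example.

```java
import java.util.Comparator;

// Hypothetical stand-in for a proxy with a default usage calculation.
class BasicProxy {
  private final int totalUsed;
  private final int slotCount;

  BasicProxy(int totalUsed, int slotCount) {
    this.totalUsed = totalUsed;
    this.slotCount = slotCount;
  }

  // Default: fraction of test slots in use, matching the current sort.
  double usedResource() {
    return (double) totalUsed / slotCount;
  }
}

// A custom proxy can change how "busy" it reports itself to be.
class WeightedProxy extends BasicProxy {
  private final double penalty;

  WeightedProxy(int totalUsed, int slotCount, double penalty) {
    super(totalUsed, slotCount);
    this.penalty = penalty;
  }

  @Override
  double usedResource() {
    return super.usedResource() + penalty; // e.g. a slower machine
  }
}

class ProxyOrdering {
  // The comparator no longer knows how usage is computed.
  static final Comparator<BasicProxy> BY_USED_RESOURCE =
      Comparator.comparingDouble(BasicProxy::usedResource);
}
```

With only `BasicProxy` instances the ordering is the same as the existing ratio-based sort, while a subclass such as `WeightedProxy` can push itself later in the list without any change to the comparator.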

@alexkogon
Author

Hi Francois,

I'm not sure if it worked before, but in its current state it was not working for us: it reused the same nodes over and over and never used many of the others. Going for a quick solution, I replaced the sorted list with a queue, which is probably more efficient than sorting the list every time a session is requested (especially as the number of nodes grows large) and is certainly quite easy to implement.

In general, iterating through every node until we find one that matches the DesiredCapabilities may also not be extremely efficient. Some kind of map that returns the list of nodes matching the requested Capabilities would probably be more efficient, but the code seems to work for the most part, so there is no need to over-engineer it (and that map could be somewhat complicated). Our only problem is that the load is not being spread across the grid, so I came up with this simple solution.

All of this is in response to the fact that our nodes (from the Docker Selenium project here) are getting stuck sometimes for no apparent reason running some of our tests; the tests run fine for a while then some of them just sit there doing nothing. It only happens to one of our groups and I need to figure out what is causing it (I ran a simulation just opening browsers, loading Google, and closing the browser for a few days and it never hung, but their test runs can hang quite quickly), but when I noticed the load balancing wasn't working I fixed that first. Unfortunately using the build system seems to be insanely inefficient; just changing one line of Java seems to require a full recompile of all the code to test, which takes nearly an hour on the machine I was using to do this...

@freynaud34

I would still recommend fixing the underlying cause and using getUsedResource instead of hacking for your specific environment.
Every single company uses grid differently, so the default is normally a naive implementation, and people can tailor their grid by extending proxies.
I would like to avoid putting too much logic in the default implementation.

@alexkogon
Author

To be clear, here is the existing behavior; as you can see, it looks like it works until we get past node #4, when suddenly it goes back to the beginning, with 5551, 5552, 5553, and 5554 being used over and over, and 5555, 5556, 5557, and 5558 never coming to the front of the list. That tells me there is a bug with the sorting algorithm.

18:48:50.906 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:48:51.171 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:48:51.203 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:48:51.350 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:49:12.803 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:49:13.487 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:49:13.595 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:49:17.307 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:49:39.365 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:49:39.938 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:49:39.972 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:49:39.982 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:50:01.087 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:50:01.963 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:50:01.970 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:50:02.083 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:51:16.983 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:51:17.029 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:51:17.037 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:51:17.112 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]
18:51:38.577 INFO - Available nodes: [http://myhost:5551, http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558]
18:51:38.667 INFO - Available nodes: [http://myhost:5552, http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551]
18:51:39.137 INFO - Available nodes: [http://myhost:5553, http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552]
18:51:39.209 INFO - Available nodes: [http://myhost:5554, http://myhost:5555, http://myhost:5556, http://myhost:5557, http://myhost:5558, http://myhost:5551, http://myhost:5552, http://myhost:5553]

@alexkogon
Author

I don't believe I am hacking for my specific environment--I am replacing a fancy solution which sorts the entire list of nodes using a comparator for every request to find the least used one (which is O(n log n) I believe, and a lot of sorting if the number of nodes is big, but is probably not a big deal), and replacing it with a really stupid one that puts the nodes into a queue, finds the first node that has the capabilities, and then puts it at the end of the queue (O(n)). I was merely trying to get something done quickly, which fixes the problem shown above where many of the nodes are never used, and didn't want to get into debugging why the sort was not working.
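The queue approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: `GridNode` and `RoundRobinSketch` are hypothetical names, each node is assumed to offer a single browser, and the sketch ignores whether a node is currently busy.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a grid node; not Selenium's RemoteProxy.
final class GridNode {
  final String url;
  final String browser;

  GridNode(String url, String browser) {
    this.url = url;
    this.browser = browser;
  }
}

final class RoundRobinSketch {
  private final Deque<GridNode> queue = new ArrayDeque<>();

  RoundRobinSketch(List<GridNode> nodes) {
    queue.addAll(nodes);
  }

  // Scan from the head, take the first node offering the requested browser,
  // and move it to the tail so it is considered last next time. O(n) per call.
  synchronized GridNode pick(String browser) {
    Iterator<GridNode> it = queue.iterator();
    while (it.hasNext()) {
      GridNode node = it.next();
      if (node.browser.equals(browser)) {
        it.remove();
        queue.addLast(node);
        return node;
      }
    }
    return null; // no node offers the requested browser
  }
}
```

With three nodes where the first and third offer the same browser, repeated calls to `pick` alternate between those two instead of always returning the first.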

The current solution would be fine with me if it actually worked (I'm not coding for fun here, especially with the time it takes to build), but my stupid queue solution works and I'm not exactly sure what the getTotalUsed() @dima-groupon Dima implemented does:

public int compare(RemoteProxy o1, RemoteProxy o2) {
  double p1used = (o1.getTotalUsed() * 1.0) / o1.getTestSlots().size();
  double p2used = (o2.getTotalUsed() * 1.0) / o2.getTestSlots().size();

  if (p1used == p2used) return 0;
  return p1used < p2used ? -1 : 1;
}

but it is clearly not working. I don't know why as in the case above, after four test runs getTotalUsed() should be 0 for nodes 5 to 8 and >0 for nodes 1 to 4, so 5 to 8 should be sorted first in the list, but they never are.

I could fix this but using the queue was an easier solution and I'm not quite convinced the other solution is better even if it did work. However if it did work I wouldn't be bothering to fix it :)

@barancev
Member

Why is the suggested solution better than the existing one?
(sorting by last use time vs sorting by the current load)
What problem is it intended to solve?

@alexkogon
Author

The problem, as I have clearly indicated twice now, is that the current solution does not work.

The output above is from using a grid with 8 nodes running 4 tests at a time. The first four nodes are used over, and over, and over, ad infinitum, as the logs pasted above clearly show.

@alexkogon
Author

@freynaud34 if you are going to invest time in fixing something I'd vote for whatever is causing the nodes to hang :) I'm not sure if that is a Docker Selenium problem or if it is still part of the node...

@alexkogon
Author

Reviewing Dima's code again, it looks like this looks for current usage:

  double p1used = (o1.getTotalUsed() * 1.0) / o1.getTestSlots().size();

/**
 * Returns the total number of test slots used on this node.
 *
 * @return the total number of test slots in use.
 */
int getTotalUsed();

/**
 * Each test running on the node will occupy a test slot. A test slot can either be in use (have a session) or be
 * available for scheduling (no associated session). This method allows retrieving the total state of the node,
 * both test slots in use and those unused.
 *
 * @return the test slots.
 */
List getTestSlots();

and pays no attention to total usage. This would probably be better replaced with Francois' suggestion:

/**
 * Return how much resources are currently used on the proxy. Default implementation is runningTests / maxTests
 * on the proxy. For a proxy with more knowledge about its resources, a finer implementation can also take into
 * account CPU usage, RAM usage etc.
 *
 * @return the percentage of the available resource used. Can be greater than 100 if the grid is under heavy load.
 */
float getResourceUsageInPercent();

which would apparently give a more accurate measurement of current load on the Node at the time, and not require Integer to Double math.

I guess the question here is more philosophical--should the Grid use all of its nodes, or always reuse the same ones over and over if they are not running jobs? The current code simply measures how many jobs are running on the Node at the current time and picks it based on that. If there is a node that is early in the list (based on registry), it will always be used if nothing is running on it, and if there is a node late in the list it will never be used unless all of the other nodes are being used. This concentrates usage in a certain number of the nodes.
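A small sketch of why idle nodes keep falling back to registration order, assuming the comparator only looks at the current-load ratio. `LoadSample` and `CurrentLoadSortSketch` are hypothetical names used for illustration; they are not part of Selenium.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical snapshot of one node; not Selenium's RemoteProxy.
final class LoadSample {
  final String url;
  final int used;   // stands in for getTotalUsed()
  final int slots;  // stands in for getTestSlots().size()

  LoadSample(String url, int used, int slots) {
    this.url = url;
    this.used = used;
    this.slots = slots;
  }
}

final class CurrentLoadSortSketch {
  // Same ratio as the comparator quoted earlier: slots in use / total slots.
  static final Comparator<LoadSample> BY_CURRENT_LOAD =
      (a, b) -> Double.compare((a.used * 1.0) / a.slots, (b.used * 1.0) / b.slots);

  // Collections.sort is stable, so nodes with equal load keep their input order.
  static List<String> sortedUrls(List<LoadSample> registered) {
    List<LoadSample> sorted = new ArrayList<>(registered);
    Collections.sort(sorted, BY_CURRENT_LOAD);
    List<String> urls = new ArrayList<>();
    for (LoadSample node : sorted) {
      urls.add(node.url);
    }
    return urls;
  }
}
```

When every node is idle, all ratios are 0 and the stable sort returns the list unchanged, so the first registered node is offered first again. A busy node only moves to the back while its test is running, which is consistent with the rotation visible in the log above.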

This would theoretically not be a problem, but because there is some kind of bug in the Node software which causes it to lock up with usage (or at least that has always been my experience), having a balancer which distributes usage more evenly across all the nodes in the grid is beneficial.

It would seem to me to make sense that we try to balance the load across all the nodes, which the current implementation does not do, but apparently that is by design. I would vote that we improve the algorithm to distribute load across all the nodes in the grid based on their total usage.

@barancev
Member

Please define "does not work".

@alexkogon
Author

"Does not work" being defined as does not spread the load across all the Nodes in the Grid. I (and everyone I've ever worked with) assumed that would be part of the design, but apparently it is not.

Upon reviewing the code now, it seems it is by design perfectly happy to put most of the load on a few Nodes in the Grid. I personally would expect that you would want to distribute the load across the Nodes, but apparently not. Were it not for the fact that using the nodes eventually causes them to hang this would be less of a problem, but in the current reality I think it is less than desirable.

@alexkogon
Author

Anyhow I will just use my version for now and concentrate on fixing the other problem that causes the Nodes to hang. I would still think that a solution that distributes usage across the nodes would be better than the current solution.

@barancev
Member

I'd say we have to find and eliminate the cause of the hangs. Or, detect "cursed" nodes and "heal" them. No load balancing scheme resolves hangs. Maybe it reduces the probability of picking a hanging node, but it is still the same problem.

@alexkogon
Author

I agree entirely; however, I do think it makes sense for the Hub to distribute the load around the Nodes. It shouldn't matter if using the Nodes doesn't degrade them, but for the moment distributing the load helps, as using the nodes makes them stop working.

@alexkogon
Author

@barancev's idea about detecting dead nodes seems to make a lot of sense. Healing them would be best, but at a minimum the Grid should know when a node stops working and take it out of rotation, I would think.

@lukeis lukeis closed this as completed in a74cfe8 Feb 24, 2016
@lukeis
Member

lukeis commented Feb 24, 2016

with this change the nodes are chosen by looking at both current load and by least recently used (priority given to current node utilization).

I agree that we should look at kicking out sick nodes (something that's on my backlog at work too), but that is essentially a separate issue from this one as originally reported.
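For readers following along, an ordering of the kind described (current load first, least recently used as the tie-breaker) could look roughly like the sketch below. It is not the code from the referenced commit; `UsageRecord`, `lastUsedMillis`, and `LoadThenLruSketch` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-node record; not a Selenium class.
final class UsageRecord {
  final String url;
  final int used;            // slots currently in use
  final int slots;           // total slots on the node
  final long lastUsedMillis; // 0 means the node has never been used

  UsageRecord(String url, int used, int slots, long lastUsedMillis) {
    this.url = url;
    this.used = used;
    this.slots = slots;
    this.lastUsedMillis = lastUsedMillis;
  }
}

final class LoadThenLruSketch {
  // Lower current load wins; among equally loaded nodes, the one used longest ago wins.
  static final Comparator<UsageRecord> LOAD_THEN_LRU =
      Comparator.comparingDouble((UsageRecord n) -> (double) n.used / n.slots)
          .thenComparingLong(n -> n.lastUsedMillis);

  static List<String> sortedUrls(List<UsageRecord> nodes) {
    List<UsageRecord> sorted = new ArrayList<>(nodes);
    sorted.sort(LOAD_THEN_LRU);
    List<String> urls = new ArrayList<>();
    for (UsageRecord node : sorted) {
      urls.add(node.url);
    }
    return urls;
  }
}
```

With this ordering, an idle node that has never been used sorts ahead of an idle node that just finished a test, so idle nodes late in the registration order are no longer starved.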

@alexkogon
Author

yes that is what i was looking to do--make sure the same nodes aren't used over and over when other nodes just sit there idle. to me that would fall under load balancing.

@lock lock bot locked and limited conversation to collaborators Aug 20, 2019