Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crosstalk between WiFi clients. Multiple clients assigned same socket number. #176

Open
bobfromdallas opened this issue May 6, 2021 · 17 comments
Labels
topic: code Related to content of the project itself type: imperfection Perceived defect in any part of project

Comments

@bobfromdallas
Copy link

I observed crosstalk between clients. For example, a response to a local FTP command was seen on a client doing an HTTP GET.

This crosstalk occurred after a loss of WiFi and subsequent reconnection.

I have limited understanding of the WiFiNINA software and even less understanding of the associated firmware. In reviewing that code, it seemed that the observed crosstalk could happen if multiple WiFi clients were using the same socket to communicate with the WiFi module. The sketch below shows that duplicate assignment of sockets can occur.

#include "arduino_secrets.h"     // defines SECRET_SSID and SECRET_PASS

#include <Arduino_ConnectionHandler.h>
WiFiConnectionHandler conMan(SECRET_SSID, SECRET_PASS);

//  'printf' style macro
#define dbprintf(fmt, ...) _dbprintf(fmt, ##__VA_ARGS__)
#define DB_BUF_LEN 256
static char dbBufIn[DB_BUF_LEN] = "";
void _dbprintf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(dbBufIn, DB_BUF_LEN, fmt, ap);
    Serial.write(dbBufIn, strlen(dbBufIn));
    va_end(ap);
}

#define M2MS 60000  // minutes to ms
uint32_t eventTime[5] {1*M2MS, 2*M2MS, 3*M2MS, 4*M2MS, 5*M2MS};   // events at 1, 2, 3, 4, 5 minutes
bool eventDone[5] {false, false, false, false, false};

/////////////// WiFi users
WiFiClient gclient;
constexpr char gHost[] {"script.google.com"};
constexpr uint16_t gPort {443};

WiFiClient yclient;
constexpr char yHost[] {"yahoo.com"};
constexpr uint16_t yPort {80};
///////////////

void setup() {
    Serial.begin(115200);
    delay(2000);  //    routinely takes 1.5 s to begin Serial so just wait 2 s
    if (Serial) 
        dbprintf("\nSuccessfully started serial connection\n");
            
// part of Arduino_DebugUtils included by Arduino_ConnectionHandler
    setDebugMessageLevel(DBG_WARNING); // equals Debug.setDebugLevel
    Debug.timestampOn();

    conMan.addCallback(NetworkConnectionEvent::CONNECTED, onNetworkConnect);
    conMan.addCallback(NetworkConnectionEvent::DISCONNECTED, onNetworkDisconnect);
    conMan.addCallback(NetworkConnectionEvent::ERROR, onNetworkError);

    // start WiFi
    for (uint8_t i=0; i<5; i++) {
        delay(500);
        conMan.check();
    }

    // start the WiFi clients
    gclient.connect(gHost, gPort);
    dbprintf("\t\t\tbegan gclient with socket %u at %u ms\n", gclient._sock, millis());
    yclient.connect(yHost, yPort);
    dbprintf("\t\t\tbegan yclient with socket %u at %u ms\n", yclient._sock, millis());
 
    printSockets();
}  // setup()


void loop() {
    if (!eventDone[0] && millis() > eventTime[0]) {
        eventDone[0] = true;
        dbprintf("\tending WiFi to simulate loss of connection at %u ms\n", millis());
        // WiFi.end();
        WiFi.disconnect();
        delay(5000);
        printSockets();
    }
    if (!eventDone[1] && millis() > eventTime[1]) {
        eventDone[1] = true;
        dbprintf("\nreconnect (1) clients in reverse order\n");
        yclient.connect(yHost, yPort);
        dbprintf("\t\t\tbegan yclient with socket %u at %u ms\n", yclient._sock, millis());
        gclient.connect(gHost, gPort);
        dbprintf("\t\t\tbegan gclient with socket %u at %u ms\n", gclient._sock, millis());
        printSockets();
    }

    conMan.check();
} // loop


void printSockets() {
    dbprintf("gclient using socket %u at %6u. Status: %d.\n", gclient._sock, millis(), gclient.status());
    dbprintf("yclient using socket %u at %6u. Status: %d.\n", yclient._sock, millis(), yclient.status());
}


void onNetworkConnect() {
    Serial.println(">>>>> WiFi CONNECTED to network");
    RGB(0, 0, 8);
}

void onNetworkDisconnect() {
    Serial.println(">>>>> WiFi DISCONNECTED from network");
    Serial.print(">>>>> DISCONNECT reason code: ");
    Serial.println(WiFi.reasonCode());
    RGB(32, 16, 8);
}

void onNetworkError() {
    Serial.println(">>>>> WiFi ERROR");
    Serial.print(">>>>> ERROR reason code: ");
    Serial.println(WiFi.reasonCode());
    RGB(32, 0, 0);
}

void RGB(uint8_t r, uint8_t g, uint8_t b) {
    WiFiDrv::pinMode(25, OUTPUT);  //GREEN
    WiFiDrv::pinMode(26, OUTPUT);  //RED
    WiFiDrv::pinMode(27, OUTPUT);  //BLUE
    
    WiFiDrv::analogWrite(25, g);
    WiFiDrv::analogWrite(26, r);
    WiFiDrv::analogWrite(27, b);
}

This sketch was executed on a MKR WiFi 1010 using version 1.8.8 of the WiFiNINA library. To see what sockets were being assigned the "_sock" variable in WiFiClient.h was changed from private to public.

Here's an output from the sketch.

Successfully started serial connection
[ 3383 ] Current WiFi Firmware: 1.4.4
>>>>> WiFi CONNECTED to network
				began gclient with socket 0 at 7201 ms
				began yclient with socket 1 at 7575 ms
gclient using socket 0 at   7575. Status: 4.
yclient using socket 1 at   7575. Status: 4.
		ending WiFi to simulate loss of connection at 60001 ms
gclient using socket 0 at  65014. Status: 0.
yclient using socket 1 at  65015. Status: 0.
[ 65015 ] Connection to "Guest" lost.
[ 65015 ] Attempting reconnection
>>>>> WiFi DISCONNECTED from network
>>>>> DISCONNECT with reason code: 8
[ 66877 ] Current WiFi Firmware: 1.4.4
[ 69434 ] Connection to "Guest" failed
>>>>> WiFi CONNECTED to network

reconnect (1) clients in reverse order
				began yclient with socket 0 at 120251 ms
				began gclient with socket 0 at 120335 ms
gclient using socket 0 at 120335. Status: 4.
yclient using socket 0 at 120335. Status: 4.

Note that at the end both gclient and yclient are using socket 0.

It seems that the WiFiNINA library and the NINA firmware get out of sync on socket assignments.
A partial sequence of events:
  restarting WiFi clears all socket assignments;
  reconnecting 'yclient' assigns socket 0 to 'gclient'; and
  reconnecting 'gclient' results in duplicate assignment of socket 0 because
    WiFiClient::connect calls WiFiClient::stop because gclient._sock is 0 indicating that socket 0 is still assigned to 'gclient',
    WiFiClient::stop clears the assignment of socket 0, and
    socket 0 then gets assigned to 'gclient'.

Any suggestions on how to prevent this issue would be appreciated.

@aentinger aentinger added the type: imperfection Perceived defect in any part of project label May 7, 2021
@learnfromgirls
Copy link

learnfromgirls commented Dec 15, 2021

I confirm this crosstalk. I have debugged my problem with webserver and MQTT client running at same time. The fault is exactly that webserver is using client sockets that it does not own.

The scenario is that a socket is used for webserver client in available(), say sock=1.
This is then used by user app to deliver a web page and the sock 1 is then closed ( making it available for reuse).
A different service (mqtt) then connects and is allocated sock 1 which is correct and fine.
Unfortunately WiFiServer incorrectly remembered the last client sock in _lastsock even though it has no rights of ownership
and incorrectly reuses this _lastsoc=1 to make a Client . _lastsock should never be used to make a client! You cannot assume
that there are no other connections used by the app for different protocols or different remote IPs.
The result is that mqtt traffic is mixed with http traffic.

This section of code in WiFiClient::available() should be deleted, along with _lastsock member.

 // check previous received client socket
      if (_lastSock != NO_SOCKET_AVAIL) {
          WiFiClient client(_lastSock);

          if (client.connected() && client.available()) {
              sock = _lastSock;
          }
      }

If the desire was to reuse open sockets then some alternative correct arrangement should be devised.
Just deleting that code fixes the problem as the app closing the client after delivering the response works fine to recycle sockets.

I would also like to be able to close the listening server socket so please also provide an WiFiServer.end() method which closes the listening _sock.
Here was my impl but I have no idea if this is appropriate for a listening server socket.

void WiFiServer2::end() {
  if (_sock == NO_SOCKET_AVAIL)
    return;

  ServerDrv::stopClient(_sock);

  int count = 0;
  // wait maximum 5 secs for the connection to close
  while (status() != CLOSED && ++count < 50)
    delay(100);

  WiFiSocketBuffer.close(_sock);
  _sock = NO_SOCKET_AVAIL;

  //stop remembering the last client, hopefully someone else will close this guy.
  _lastSock = NO_SOCKET_AVAIL;
}

@ocrdu
Copy link

ocrdu commented Feb 18, 2022

Probably related to #87, also see https://github.com/ocrdu/WiFiNINA.

@LisaRowell
Copy link

I see this problem extensively in a MQTT broker I implemented on a MKR 1010. Any time a WiFiClient is kept open for an incoming connection past the next call to WiFiServer,available() there's a chance it gets returned as a new connection and you end up with two WiFiClients for the same socket. It's as if WiFiServer::available() was trying to serve as both an accept and a socket polling mechanism, but that only works if a single client were to be open as a time (as in the example code).

While the above suggested fix of deleting the whole _lastSock mechanism suggested by learnfromgirls seems spot on, and should be done, I was able to work around the issue with the following: Keeping a side table of open clients based on a tuple of remote IP addr and TCP port. When calling WiFiServer::available(), and getting a client back, check its addr and port against the side table. If there's a match, ignore the available result and loop back to servicing clients. If whatever client has the _lastsock socket is empty of data when the available call is done, you get a shot at getting an actual new client back. It's not a complete workaround though as continuous traffic to the _lastsock client could create starvation for new incoming connections, but it might help people get past this until a fix is made.

@JAndrassy
Copy link

JAndrassy commented Oct 19, 2023

@LisaRowell do you know server.accept()?
https://github.com/arduino-libraries/WiFiNINA/pull/257/files

I think your problem is that you don't understand how server.available() works and use it wrong.

@per1234 per1234 added the topic: code Related to content of the project itself label Oct 19, 2023
@LisaRowell
Copy link

I do know about server.accept() and would use it as a workaround if it were available.

If server.available() was designed to return both new and existing socks then it is doing it wrong and will starve socks. If it's designed to just return new socks, then it is also doing that wrong as it returns the same twice. Either way you're a close away from bad things and with the released code there isn't a clean way of implementing a server that accepts multiple persistent client connections.

@JAndrassy
Copy link

JAndrassy commented Oct 19, 2023

@LisaRowell With server.available() you have to let the server manage the clients and not store the returned objects. You can differentiate the returned clients by remoteIP and remotePort if you have to (even then you will not know about a reconnect from the same IP and port). The sockets are freed when the client disconnects or when you decide to stop() it in your sketch.
With a request-response protocol, handle the read of the request and write of the response in one go and then stop the client.

@LisaRowell
Copy link

LisaRowell commented Oct 22, 2023

I understand what you are saying, however, at for me it doesn't work that way. We might have different firmware, I'm running stock 1.5.0.

I instrument up WiFiServer to keep a history of what socks it gets back from ServerDrv::availServer() and added a dump route to both it and the WiFiClient. When at the protocol level I get a timeout (MQTT PINGREQ didn't arrive), I do a dump of the server and the socks it opened. What I'm seeing is that there is indeed data available for the client, but ServerDrv::availServer(), is returning 255 (NO_SOCKET_AVAIL). The client's status is 4 which is I believe an established, healthy TCP connection. If I read the data from the client it's a valid PINGREQ as expected so that looks good, it's just that available() isn't returning the client.

The thing I find found that looks odd to me is that SpiDrv::available() is returning false, this would cause ServerDrv::availServer() to return NO_SOCKET_AVAIL. I haven't looked at the firmware side of things yet, so I'm unfamiliar with how the NINA_GPIO0 (aka NINA_GPIOIRQ) pin, which SpiDrv::available() is reading, is being used. Doing a delay of 10 s and re-reading gets the same result.

Connected to WiFi network LunaMoth, IP Address 10.42.0.210
Connected to WiFi, starting MQTT server.
Established MQTT Connection from 10.42.0.1:57884
Established MQTT Connection from 10.42.0.1:57896
Established MQTT Connection from 10.42.0.1:57904
Keep alive time expired for Client 'lisa-toughbook-Dash4752'. Disconnecting...
Dumping WiFiServer with port = 1883
  _sock = 0; _lastsock = 3
  status = 1
  Socks opened:
    1: 10.42.0.1:57884; status = 4; available = 0
    2: 10.42.0.1:57896; status = 4; available = 2
    3: 10.42.0.1:57904; status = 4; available = 0
  ServerDrv::availServer returned 255
  SpiDrv::available returned 0
  Times last sock was returned = 111
  Times other than last sock was returned = 5
Delaying...
  SpiDrv::available returned 0

When I close the client, things get cleared up somehow and it's able to reopen and available will service it.

Any ideas what to look at from here?

Thanks.

@JAndrassy
Copy link

JAndrassy commented Oct 23, 2023

I try to replicate your problem with my PagerServer example. But it works without a problem. What is the difference?
https://github.com/arduino-libraries/WiFiNINA/pull/205/files

@LisaRowell
Copy link

LisaRowell commented Oct 24, 2023

I'm still trying to figure that what's significant and what's not. I'm potentially sending more data with multiple writes happening with no flush(). I also do an WiFiClient::available() to get the current amount of data available and read that in a chunk vs reading character by character. I have no idea if that's significant or not.

I've found that I only need two clients in my setup to cause a failure and if I comment out the line that sets _lastSock it will fail for one. I think this translates to meaning that I was getting the underlying condition (pin not set) before with one client, but I wasn't having an issue because the _lastSock mechanism was masking it.

I'm continuing to try and come up with a simple way of creating the issue.

@LisaRowell
Copy link

LisaRowell commented Oct 24, 2023

Hmmm if you insert calls to WiFiClient::connected() does it still work for you?

I'm looking at the code for that, and of course this is new to me code, but it looks like WiFiClient::connected() will call WiFiClient::available(), which would in turn call WiFiSocketBuffer::available(). WiFiSocketBuffer::available() doesn't just do a check of available data on the NINA though, it actually does a read to the local buffer. As far as the NINA is considered at that there's no data for the client as it's been read.

Since this was triggered by a WiFiClient::connected() call, by code has no idea about this transfer of data and continues to rely on WiFiServer::available() to check for a client with data. This won't work though as (ignoring _lastSock cases), it's going straight to the checking the pin from NINA for whether or not there's data, ignoring what the connected() call did under the hood buffering data.

I think, and you know this code way better than I do, that this is why WiFiClient::available() returns > 0 while SpiDrv::available() is false as there's data in WiFiSocketBufferClass::_buffers.

Does this make sense? Again appreciate your help on all this.

If I'm correct, should this be split to a different issue?

@LisaRowell
Copy link

I removed my check to WiFiClient::connected() and it definitely seems to get around the problem.

@JAndrassy
Copy link

JAndrassy commented Oct 24, 2023

but WiFiServer in the WiFiNINA library will not return any other client until there is buffered data in the _lastSock client.
how did _lastSock change in your case? or how did other than _lastSock client read data into the buffer?

@LisaRowell
Copy link

LisaRowell commented Oct 24, 2023

Until WiFiClient::connected() is called when there happens to be data on arriving on NINA, calls to WiFiServer::available() can change _lastSock, but if you call WifiClient::connected() with a sock != _lastSock and that sock has data on the NINA, you end up with data read into to the local buffer for the not lastSock sock and the pin goes into the not available state. From there WiFiServer::available() will only return _lastSock until something gives. A new connection or the stuck client sending more data will kick it out of the state, but you can't rely on that happening.

@JAndrassy
Copy link

WifiClient::connected() with a sock != _lastSock

how?

@LisaRowell
Copy link

LisaRowell commented Oct 25, 2023

Let me back up and try and explain it another way. In your simple example you had a server accepting client connections and taking their input and doing a wall type broadcast to send the input to all connected clients. A different type of application might have to send different data, perhaps a subset, to each client, so it maintains state on a client basis and sends out info periodically that's relevant for each client. Follow me?

Now, since the application is holding state for a client, it has to deal with when the client disconnects, cleaning up that state and ceasing to try to send it data. Because of how TCP works, and I can explain this in painstaking detail if you want but it's not super important, a disconnection happens without data being transmitted at the application level. This means that the application needs to somehow figure out if the connection is still alive. The mechanism to do that in this API is WiFiClient::connected(). So, as a service type function, a well former application holding state, would periodically loop through its clients and call connected to see if they're still there, cleaning up state as it goes. One of those of the clients will be _lastSock, the rest won't be.

The not obvious thing here is that WiFiClient::connected() will, under the hood, transfer any data that's sitting in the NINA up into the WiFiSocketBufferClass::_buffer on the main processor. This happens on a client, by client basis. When the transfer happens, the NINA firmware will say it doesn't have any available data, and set the pin accordingly.

Understand still? There are two places data can be, on the NINA and locally in WiFiSocketBufferClass. The pin only reflects the state of the former. That's the thing to get.

If you now look at WiFiServer::available(). It does two things, the first of which is to check to see if whatever sock is _lastSock has data available by calling WiFiClient::available(). This checks BOTH the WiFiSocketBufferClass AND the NINA. That's why that particular client gets serviced consistently.

The next thing WiFiServer::available() does (in the case of the above not finding data) is to call ServerDrv::availServer() which will check for the NINA saying it has either a new client or new data. If you look at the code, the first thing it does is look at that pin, if it's in the no data state, it's done. That's it. It never is going to look for data in the WiFiSocketBuffetClass where the above call to WiFiClient::connected() stashed away. This means that non-lastSock clients can have data can be available for them but, that will not be returned by the WiFiServer::available() mechanism.

It's a definite issue.

@JAndrassy
Copy link

sorry, but any WiFiClient returned by server.available() is invalid as soon as a new server.available() call is invoked.
the clients are manged by the WiFiServer object and you can't detect when a client disconnects.

you need server.accept()

@LisaRowell
Copy link

As I said, I already have a workaround for this and have long since moved on. I was taking the time to explain to you that hoping you could understand the issue and fix it. It is not a complex change.

As for server.accept() as being the answer, that commit is coming up on it's two year anniversary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: code Related to content of the project itself type: imperfection Perceived defect in any part of project
Projects
None yet
Development

No branches or pull requests

7 participants