Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: opening https://dashboard.nm.[IP].nip.io/enrollment-keys takes 30 sec #3026

Closed
1 task done
cryply opened this issue Aug 3, 2024 · 22 comments
Closed
1 task done
Assignees
Labels
bug Something isn't working

Comments

@cryply
Copy link

cryply commented Aug 3, 2024

Contact Details

No response

What happened?

Recently I have found that EnrollmentKeys generation page taking very long to load. About 30sec. It is bearable but annoying.
Click on createkey also takes a minute.
Same 30 sec during login. Also OK. Worst is when u join host with netclient.
There timeout exceeded and I simply cannot join client to netmaker. Has to use wireguard client.

image

image

OS: Ubuntu 22.04

Version

v0.23.0

What OS are you using?

Linux

Relevant log output

172.20.0.1 - - [03/Aug/2024:09:01:23 +0000] "GET /enrollment-keys HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:09:01:23 +0000] "GET /loader.svg HTTP/1.1" 200 593 "https://dashboard.nm.MYIP.nip.io/enrollment-keys" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:09:06:19 +0000] "GET /enrollment-keys HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:09:06:20 +0000] "GET /loader.svg HTTP/1.1" 200 593 "https://dashboard.nm.MYIP.nip.io/enrollment-keys" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:09:06:20 +0000] "GET /nmui-config.js HTTP/1.1" 200 156 "https://dashboard.nm.MYIP.nip.io/enrollment-keys" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:09:12:16 +0000] "GET /enrollment-keys HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:12:23:35 +0000] "GET /enrollment-keys HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:12:23:35 +0000] "GET /loader.svg HTTP/1.1" 200 593 "https://dashboard.nm.MYIP.nip.io/enrollment-keys" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:12:23:35 +0000] "GET /nmui-config.js HTTP/1.1" 200 156 "https://dashboard.nm.MYIP.nip.io/enrollment-keys" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1"
172.20.0.1 - - [03/Aug/2024:12:23:35 +0000] "GET /login HTTP/1.1" 200 1233 "https://dashboard.nm.MYIP.nip.io/login" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0" "172.20.0.1

in netmaker docker logs:

{"time":"2024-08-03T12:38:52.641384513Z","level":"ERROR","source":"handlers.go 95}","msg":"failed to decrypt message for host","id":"XXXXXXX-XXXX-XXXX-XXXXXXXXX","error":"could not decrypt message, [

Contributing guidelines

  • Yes, I did.
@cryply cryply added the bug Something isn't working label Aug 3, 2024
@yabinma
Copy link
Collaborator

yabinma commented Aug 3, 2024

The decrypt message issue was reported in #2686. There was changes in v0.24.3. The issue should not be in server side in latest version.

Regarding the join slow issue, what's the server size in hardware and how many hosts totally in the network?

@cryply
Copy link
Author

cryply commented Aug 4, 2024

Thanks Max Ma

  1. I will try to upgrade server to v0.24.3. Is it 100% necessary to upgrade netclients?
  2. Server is 12 Core , 64GB RAM physical server. 100% idle.
  3. We have about 80 hosts.

Problem started AFAIK after I have tinkered with docker-compose.yml

and changed :
image: coredns/coredns -> image: coredns/coredns:1.10.1
It did actually started CoreDNS but I am still not sure how clients can access it.
also I commented all turn related stuff. and restart.

@cryply
Copy link
Author

cryply commented Aug 4, 2024

Regarding join

docker compose logs -f

netmaker       | [netmaker] 2024-08-04 14:25:11 received registration attempt with token eXXXXXXXXXXX9 
netmaker       | [netmaker] 2024-08-04 14:25:51 user1 xxxxxxxx-cccccc-zzzzz-yyyyy-xxxxxxxx registered with Netmaker 
netmaker       | [netmaker] 2024-08-04 14:25:51 added new node 11111-22222-33333-44444-555555555 to host user1 

but on client side I get:

sudo netclient join -t eXXXXXXXXXXXXXXXX9
setting host fields
[netclient] 2024-08-04 16:25:41 registration failed Post "https://api.nm.MUIP.nip.io/api/v1/host/register/
eXXXXXXXXXXXXXXXXXXXXXXX9": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

@cryply
Copy link
Author

cryply commented Aug 4, 2024

lol. I have stopped CoreDNS and managed to join client!
I also monitored process on join in htop. CoreDNS skyrocket for 100% CPU for few sec.

@cryply
Copy link
Author

cryply commented Aug 5, 2024

Unable to join again

[netclient] 2024-08-05 11:12:21 registration failed Post "https://api.nm.PYIP.nip.io/api/v1/host/register/ETOKEYXXX": context deadline exceeded (Client.Timeout exceeded while awaiting headers) - OMG this error is back. No matter if CoreDNS up/down.

@yabinma
Copy link
Collaborator

yabinma commented Aug 5, 2024

Unable to join again

[netclient] 2024-08-05 11:12:21 registration failed Post "https://api.nm.PYIP.nip.io/api/v1/host/register/ETOKEYXXX": context deadline exceeded (Client.Timeout exceeded while awaiting headers) - OMG this error is back. No matter if CoreDNS up/down.

Is there any error from server side?
Even without any error, what's the log when the server received the registration request?

@cryply
Copy link
Author

cryply commented Aug 5, 2024

On server side it is working fine(just taking long).
client added but just 15 seconds after popup error on client side.

@yabinma
Copy link
Collaborator

yabinma commented Aug 5, 2024

On server side it is working fine(just taking long). client added but just 15 seconds after popup error on client side.

How long does it take in server side? From the registration is received to node added.
What's the network range when adding a new network?

@cryply
Copy link
Author

cryply commented Aug 5, 2024

netmaker | [netmaker] 2024-08-05 12:44:55 received registration attempt with token eToken
netmaker | [netmaker] 2024-08-05 12:45:35 hostX xxxxxxx-rrrrr-gggg-ffff-zzzzzzz registered with Netmaker
netmaker | [netmaker] 2024-08-05 12:45:35 added new node yyyyyyy-wwwww-qqqq-hhhhh-zzzzzzz to host hostX

network range /16

@yabinma
Copy link
Collaborator

yabinma commented Aug 5, 2024

netmaker | [netmaker] 2024-08-05 12:44:55 received registration attempt with token eToken netmaker | [netmaker] 2024-08-05 12:45:35 hostX xxxxxxx-rrrrr-gggg-ffff-zzzzzzz registered with Netmaker netmaker | [netmaker] 2024-08-05 12:45:35 added new node yyyyyyy-wwwww-qqqq-hhhhh-zzzzzzz to host hostX

network range /16

There is no timeout in default settings. The client side should be waiting for the response from sever side even it took 40 sec as shown in the log.
Is there any timeout setting in network?

@cryply
Copy link
Author

cryply commented Aug 5, 2024

time sudo netclient join -t eXXXXXX
setting host fields
[netclient] 2024-08-05 13:38:20 registration failed Post "https://api.nm.IP.nip.io/api/v1/host/register/eXXXXXXXXXXX" : context deadline exceeded (Client.Timeout exceeded while awaiting headers)

real 0m30.110s
user 0m0.009s
sys 0m0.000s

@cryply
Copy link
Author

cryply commented Aug 5, 2024

It was working for 1 year without issues.

@cryply
Copy link
Author

cryply commented Aug 5, 2024

30 seconds in client

image

@cryply
Copy link
Author

cryply commented Aug 5, 2024

It seems it comes from devilcoves httpclient

https://github.com/devilcove/httpclient/blob/master/httpclient.go

func init() { Client = http.Client{ Timeout: 30 * time.Second, } }

@yabinma
Copy link
Collaborator

yabinma commented Aug 6, 2024

It seems it comes from devilcoves httpclient

https://github.com/devilcove/httpclient/blob/master/httpclient.go

func init() { Client = http.Client{ Timeout: 30 * time.Second, } }

This should be the right timeout setting for restful api.

Revisited the log and found abnormal,
netmaker | [netmaker] 2024-08-05 12:44:55 received registration attempt with token eToken
netmaker | [netmaker] 2024-08-05 12:45:35 hostX xxxxxxx-rrrrr-gggg-ffff-zzzzzzz registered with Netmaker

The host registration took more than 30s already(before the htttp response). I did not meet the issue before.

Can you please change the server log verbosity to 4 and re-produce the issue and let's see if there are more clues found?

@cryply
Copy link
Author

cryply commented Aug 6, 2024

Can you please change the server log verbosity to 4 and re-produce the issue and let's see if there are more clues found?
How?

@yabinma
Copy link
Collaborator

yabinma commented Aug 6, 2024

Can you please change the server log verbosity to 4 and re-produce the issue and let's see if there are more clues found?
How?

Edit netmaker.env in install directory and change to VERBOSITY=4, and then restart the containers.

@cryply
Copy link
Author

cryply commented Aug 6, 2024

Now , after restarting , its registering client in 20 sec. Will monitor further

@cryply
Copy link
Author

cryply commented Aug 6, 2024

Creating Enrollment Token faster now. Takes only 15 sec not 1 min as before.
In netmaker logs I can see :

netmaker       | [netmaker] 2024-08-06 09:17:47 [hosts.go-73] getHosts(): netadmin fetched all hosts                                                                                                                                                                                                                           
netmaker       | [netmaker] 2024-08-06 09:17:47 [node.go-324] getAllNodes(): netadmin fetched all nodes they have access to                                                                                                                                                                                                    
netmaker       | {"time":"2024-08-06T09:17:47.273107578Z","level":"INFO","source":"handlers.go 103}","msg":"recieved host update","name":"myuser","id":"XXXX"}                                                                                                                     
netmaker       | {"time":"2024-08-06T09:17:47.292303329Z","level":"INFO","source":"handlers.go 304}","msg":"check-in processed for host","name":"myuser","id":"xxxxxxx"}                                                                                                              
....... repeat of above for all nm hosts......
netmaker       | [netmaker] 2024-08-06 09:17:57 [hosts.go-73] getHosts(): netadmin fetched all hosts                                                                                                                                                                                                                           
netmaker       | [netmaker] 2024-08-06 09:17:57 [node.go-324] getAllNodes(): netadmin fetched all nodes they have access to  
netmaker       | [netmaker] 2024-08-06 09:18:02 [enrollmentkeys.go-178] createEnrollmentKey(): netadmin created enrollment key                                                                                                                                                                                                 
netmaker       | [netmaker] 2024-08-06 09:18:02 [publishers.go-237] sendPeers(): sending scheduled peer update (5 min)  

so u can see timing... token generated after getAllNodes() executed.

well its happened 2 times...

@testingNetqa
Copy link

Unfortunately, I have to say that the problem described occurs very frequently in our company. This means that 80% of the netclient pull executions end in this error.
It would be very helpful if either the timeout was set to a higher value by default or if it could be configured.

@cryply
Copy link
Author

cryply commented Aug 25, 2024

Ok you can close it. Issue was DNS request wasnt processed ion netmaker hosts and as results :
and access to ip.server.gravitl.com was blocked. after enabling DNS all start working

@yabinma
Copy link
Collaborator

yabinma commented Aug 29, 2024

closing as request. Thanks for the update.

@yabinma yabinma closed this as completed Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants