Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve v6 addresses #8861

Merged
merged 19 commits into from
Nov 22, 2021
Merged

Resolve v6 addresses #8861

merged 19 commits into from
Nov 22, 2021

Conversation

cross
Copy link
Contributor

@cross cross commented Oct 18, 2021

Replace calls to socket.gethostbyname to a new resolver method in util/network.py that uses a config pref to return IPv4 or IPv6 addresses appropriately. Discussed in #8552

@cross
Copy link
Contributor Author

cross commented Oct 18, 2021

Okay, clearly the way I'm loading config doesn't work for tests. This echo's back to my question in #8552 :

Adding [the prefer_ipv6 preference] to the initial-config.yaml is clear, but is there code that already handles putting that into configs that don't have it, or is there a way that a default value is set in memory somehow if it's not in the config?

Actually, it's a little different. Based on @hoffmang9 's comment in the discussion, I will set a default for this if it's not in the config. The question is, what if there is no config at all? It looks like the run-time tests have no chia config in the filesystem when these are running? chia init isn't run for those environments?

@cross
Copy link
Contributor Author

cross commented Oct 18, 2021

I could change the code in get_host_addr to check to see if there's a config before loading it, but it seems to me that this area of the code shouldn't really be running without a config in place so the error is reasonable. But, the tests expect to run without a config, it seems. I'm not sure what the right answer is. I'll wait for someone more familiar with the CI workflows to advise me.

chia/server/reconnect_task.py Outdated Show resolved Hide resolved
chia/util/network.py Outdated Show resolved Hide resolved
@emlowe
Copy link
Contributor

emlowe commented Oct 18, 2021

The tests do create a config.yaml file - if you look at create_block_tools() and the "BlockTools" class in block_tools.py. That creates a config.yaml in a temp directory

You can also look at the simpler TestConfig class in test_config.py

@cross
Copy link
Contributor Author

cross commented Oct 18, 2021

The tests do create a config.yaml file - if you look at create_block_tools() and the "BlockTools" class in block_tools.py. That creates a config.yaml in a temp directory

You can also look at the simpler TestConfig class in test_config.py

Looking at https://github.com/Chia-Network/chia-blockchain/runs/3929558844?check_suite_focus=true it seems many of the core CI runs are failing because they aren't able to load the config.yaml when they hit my new method. So, something must not be initialized yet.

chia/wallet/wallet_node.py Outdated Show resolved Hide resolved
@emlowe
Copy link
Contributor

emlowe commented Oct 19, 2021

It looks like the tests are failing
introducer full_node_server : ERROR Exception: <class 'concurrent.futures._base.CancelledError'>, closing connection {'host': '::1', 'port': 21233}.

The code seems to be using the ::1 IPv6 address for localhost, but perhaps the server is not listening properly for that.

@cross
Copy link
Contributor Author

cross commented Oct 19, 2021

It looks like the tests are failing introducer full_node_server : ERROR Exception: <class 'concurrent.futures._base.CancelledError'>, closing connection {'host': '::1', 'port': 21233}.

The code seems to be using the ::1 IPv6 address for localhost, but perhaps the server is not listening properly for that.

Thanks for that detail. I've been wondering how my change could've been causing these issues, but now it's clear.

Let me know if you can confirm that the environment the tests are running in are listening IPv4-only, and can be changed. And let me know if you have any more feedback on the PR as it is.

@emlowe
Copy link
Contributor

emlowe commented Oct 19, 2021

I can, one option is you can try switching the default to false and see if the tests pass. Meanwhile I'll see what I can dig up.

I'm thinking one way to handle the default is to leverage Optional and None - so rather than specify the default in the various "get" calls, let the get return None, and then change get_host_addr is None is passed in that means True or False. One line then to switch the default.

@hoffmang9
Copy link
Member

actions/runner-images#668

Certainly no third party IPv6 available on the runners. Lame...

@cross
Copy link
Contributor Author

cross commented Oct 19, 2021

actions/virtual-environments#668

Certainly no third party IPv6 available on the runners. Lame...

Wow. This is super lame. And, that says it was marked "wontfix" and closed 17 months ago. I'm very disappointed.

@cross
Copy link
Contributor Author

cross commented Oct 19, 2021

Okay. And, I think this may also bring to light a concern I had about making "ipv6 preferred" the default. There will be many people, sadly, who don't have IPv6 working and will get burned in the same way as the test runners. This is among the reasons I wanted it to be in the on-disk config, to be easier to find and switch-flip.
I'll try to rearchitect the preference default into get_host_addr as @emlowe suggests, and [to my deep sadness] switch the default to false and see if the tests pass.

@emlowe
Copy link
Contributor

emlowe commented Oct 19, 2021

It's interesting however that the socket.getaddrinfo() call returns ::1 for localhost in this case - so the OS stack has IPv6 enabled?

@cross
Copy link
Contributor Author

cross commented Oct 19, 2021

It's interesting however that the socket.getaddrinfo() call returns ::1 for localhost in this case - so the OS stack has IPv6 enabled?

Yes. If you look at the ticket Gene referenced, you can see that the ifconfig output showed link local addresses on the interface, but no packets would route out.
Interestingly, I would think the chia services are running on IPv6 sockets then. So I'm not sure why the attempts to connect to ::1 are failing. But. IPv6 is certainly not useful on the runners, so not too much worth worrying about.

…e have only that one place where the coded default lives. also fix an oversight where we were building a PeerInfo from a PeerInfo in some cases.
@emlowe
Copy link
Contributor

emlowe commented Oct 19, 2021

I'm not much of a python expert, but this is a case where I think the only reasonable option is change the parameter type to Optional[Any]

@hoffmang9
Copy link
Member

All modern OSes have ipv6 enabled even if it can't get to outside addresses.

@cross
Copy link
Contributor Author

cross commented Oct 19, 2021

All modern OSes have ipv6 enabled even if it can't get to outside addresses.

Yeah. In which case, I'd expect chia's tests to be able to listen on ::1 and connect that way. The cancellation Earle found suggests that isn't working. Again, I'm not sure how key that is to pursue at this point. I changed the in-code default to False, so I hope that avoids this issue on githubs test runners.

…py. It seems that github testers can't handle trying to use IPv6 and this may be easier for average users (sadly)
@cross cross requested a review from emlowe October 20, 2021 00:39
@cross
Copy link
Contributor Author

cross commented Oct 20, 2021

So, having "fixed" the code I added to always use IPv4, GitHub is okay with it. frown. But, this at least gives a config variable that people can use.
I'd really like for there to be a plan to eventually turn it on by default, or at least in some default cases, but with the test runners impaired the way they are, I'm not sure how we can plan to do that later.

@cross
Copy link
Contributor Author

cross commented Oct 21, 2021

I also have another concern. When I wrote this code on my Ubuntu system, I was wondering if ip6-localhost was actually prominent, because despite being in the /etc/hosts on that machine, I wasn't familiar with it. I now see that this test fails on s FreeBSD system, because that hostname does not have any addresses.

Any recommendations? I can take that test out. I could try "localhost" with IPv6, except that it seems that doesn't always contain both family addresses, so would also fail a "localhost",True == "::1" test.

@emlowe
Copy link
Contributor

emlowe commented Oct 21, 2021

I'd probably just remove it

@cross
Copy link
Contributor Author

cross commented Oct 25, 2021

Hey @emlowe . Can you and others take a look at this, and pull it in if it meets requirements?

chia/util/network.py Outdated Show resolved Hide resolved
@emlowe emlowe requested a review from altendky October 26, 2021 21:07
chia/util/network.py Outdated Show resolved Hide resolved
…issues figuring out where/how they're defined. So, quote them here.
@emlowe
Copy link
Contributor

emlowe commented Oct 27, 2021

I'm pretty happy as it is now without adding a bunch more classes I'm not sure add much in terms of code correctness or readability.

emlowe
emlowe previously approved these changes Oct 27, 2021
@cross
Copy link
Contributor Author

cross commented Nov 3, 2021

@emlowe can you get anyone else you want to review this attached, and then we can get it merged in? Thanks.

@emlowe
Copy link
Contributor

emlowe commented Nov 3, 2021

I approved it, but we had some other more pressing issues, so it will likely wait until after the next release. It will get in though.

@cross
Copy link
Contributor Author

cross commented Nov 16, 2021

The last test that failed failed to retrieve blocks from localhost, but I don't think my changes affect that flow. Let me know, but I'm assuming that updating my branch didn't introduce this, but that it's a testing infra oddity.

@altendky
Copy link
Contributor

I just kicked that workflow for you.

@wjblanke
Copy link
Contributor

wanted to merge but conflicts need to be resolved. sorry its taken so long!

@cross
Copy link
Contributor Author

cross commented Nov 20, 2021

I assume: ERROR WebSocket Error: WSMessage(type=<WSMsgType.ERROR: 258>, data=WebSocketError(<WSCloseCode.MESSAGE_TOO_BIG: 1009>, 'Message size 62914560 exceeds limit 52428800'), extra=None) isn't likely a defect with my changes. Is that a failure the tests sometimes see? Maybe can we kick the tests again @altendky ?

@altendky
Copy link
Contributor

For the record on the above error...

https://github.com/Chia-Network/chia-blockchain/runs/4268413400?check_suite_focus=true
https://gist.github.com/altendky/cdfcf9aae28e1449863008b4819466b7#file-gistfile1-txt-L1623-L1626

2021-11-19T21:38:28.5710359Z 21:38:28 full_node_server: ERROR WebSocket Error: WSMessage(type=<WSMsgType.ERROR: 258>, data=WebSocketError(<WSCloseCode.MESSAGE_TOO_BIG: 1009>, 'Message size 62914560 exceeds limit 52428800'), extra=None)
2021-11-19T21:38:28.5713537Z 2021-11-19T21:38:28.570 full_node full_node_server        : �[31mERROR   �[0m WebSocket Error: WSMessage(type=<WSMsgType.ERROR: 258>, data=WebSocketError(<WSCloseCode.MESSAGE_TOO_BIG: 1009>, 'Message size 62914560 exceeds limit 52428800'), extra=None)�[0m
2021-11-19T21:38:28.5716275Z 21:38:28 full_node_server: WARNING Trying to ban localhost for 300, but will not ban
2021-11-19T21:38:28.5718898Z 2021-11-19T21:38:28.570 full_node full_node_server        : �[31mERROR   �[0m WebSocket Error: WSMessage(type=<WSMsgType.ERROR: 258>, data=WebSocketError(<WSCloseCode.MESSAGE_TOO_BIG: 1009>, 'Message size 62914560 exceeds limit 52428800'), extra=None)�[0m

But I don't think that's the interesting bit.

https://gist.github.com/altendky/cdfcf9aae28e1449863008b4819466b7#file-gistfile1-txt-L2147-L2183

2021-11-19T21:40:56.1173457Z =================================== FAILURES ===================================
2021-11-19T21:40:56.1174024Z ______________________ TestRateLimits.test_periodic_reset ______________________
2021-11-19T21:40:56.1174423Z 
2021-11-19T21:40:56.1174959Z self = <test_rate_limits.TestRateLimits object at 0x7fe1ac415d90>
2021-11-19T21:40:56.1175403Z 
2021-11-19T21:40:56.1175778Z     @pytest.mark.asyncio
2021-11-19T21:40:56.1176280Z     async def test_periodic_reset(self):
2021-11-19T21:40:56.1176780Z         r = RateLimiter(True, 5)
2021-11-19T21:40:56.1177523Z         tx_message = make_msg(ProtocolMessageTypes.respond_transaction, bytes([1] * 500 * 1024))
2021-11-19T21:40:56.1178232Z         for i in range(10):
2021-11-19T21:40:56.1178706Z             assert r.process_msg_and_check(tx_message)
2021-11-19T21:40:56.1179113Z     
2021-11-19T21:40:56.1179478Z         saw_disconnect = False
2021-11-19T21:40:56.1179879Z         for i in range(300):
2021-11-19T21:40:56.1180361Z             response = r.process_msg_and_check(tx_message)
2021-11-19T21:40:56.1180856Z             if not response:
2021-11-19T21:40:56.1181265Z                 saw_disconnect = True
2021-11-19T21:40:56.1181706Z         assert saw_disconnect
2021-11-19T21:40:56.1182212Z         assert not r.process_msg_and_check(tx_message)
2021-11-19T21:40:56.1182718Z         await asyncio.sleep(6)
2021-11-19T21:40:56.1183225Z         assert r.process_msg_and_check(tx_message)
2021-11-19T21:40:56.1183644Z     
2021-11-19T21:40:56.1183978Z         # Counts reset also
2021-11-19T21:40:56.1184402Z         r = RateLimiter(True, 5)
2021-11-19T21:40:56.1185096Z         new_tx_message = make_msg(ProtocolMessageTypes.new_transaction, bytes([1] * 40))
2021-11-19T21:40:56.1185807Z         for i in range(3000):
2021-11-19T21:40:56.1186298Z             assert r.process_msg_and_check(new_tx_message)
2021-11-19T21:40:56.1186732Z     
2021-11-19T21:40:56.1187074Z         saw_disconnect = False
2021-11-19T21:40:56.1187479Z         for i in range(3000):
2021-11-19T21:40:56.1187972Z             response = r.process_msg_and_check(new_tx_message)
2021-11-19T21:40:56.1188476Z             if not response:
2021-11-19T21:40:56.1188899Z                 saw_disconnect = True
2021-11-19T21:40:56.1189329Z >       assert saw_disconnect
2021-11-19T21:40:56.1189710Z E       assert False
2021-11-19T21:40:56.1189954Z 
2021-11-19T21:40:56.1190430Z tests/core/server/test_rate_limits.py:157: AssertionError

I started to address something maybe similar for another test over in #9206. I'll link here and note that this test should be reviewed as well.

Rerunning the failed workflow now.

@cross
Copy link
Contributor Author

cross commented Nov 22, 2021

@wjblanke this shows clean now, though @altendky 's test change is still pending. that will only remove the possibility of the false negative we were seeing earlier here, shouldn't need to wait for it IMO.

@wjblanke wjblanke merged commit 2c4aaa1 into Chia-Network:main Nov 22, 2021
@cross cross deleted the resolve-v6 branch November 22, 2021 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants