Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devex: add support for macOS in the CI #263

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

HalosGhost
Copy link
Collaborator

@HalosGhost HalosGhost commented May 31, 2024

This is the core step towards making macOS officially-supported rather than a bit of a second-class citizen.

cf. #201

Copy link
Collaborator

@maurermi maurermi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some investigation, it appears that the reason that basic_raft_cluster_failure_test hangs on the MacOS CI is because of lines util/raft/node.cpp:46-63:

        m_raft_instance = m_launcher.init(m_sm,
                                          m_smgr,
                                          m_raft_logger,
                                          m_port,
                                          m_asio_opt,
                                          params,
                                          m_init_opts);

        if(!m_raft_instance) {
            m_log->error("Failed to initialize raft launcher");
            return false;
        }

        m_log->info("Waiting for raft initialization");
        static constexpr auto wait_time = std::chrono::milliseconds(100);
        while(!m_raft_instance->is_initialized()) {
            std::this_thread::sleep_for(wait_time);
        }

On MacOS, m_launcher.init appears to return true, when it should not, and so the below loop is infinite.

  while(!m_raft_instance->is_initialized()) {
      std::this_thread::sleep_for(wait_time);
  }

In more detail, when m_launcher.init() is called, we expect asio_listener_ to be null in the following block (from NuRaft-*/src/launcher.cxx:38-40)

    asio_svc_ = cs_new<asio_service>(asio_options, lg);
    asio_listener_ = asio_svc_->create_rpc_listener(port_number, lg);
    if (!asio_listener_) return nullptr;

Running in an Ubuntu VM, this block does return nullptr but on my Mac (running macOS Sonoma), this does not return nullptr (verified using gdb/lldb). I have not verified why this difference occurs, but this does show we could use the following change:

In the loop in node.cpp:61-63, we would likely benefit from a timeout (ideally configurable) such that the loop does not continue infinitely. Something like the following may make sense:

  uint elapsed_time = 0;
  auto timeout = <config_raft_init_timeout>;
  while(!m_raft_instance->is_initialized() || elapsed_time < timeout) {
      std::this_thread::sleep_for(wait_time);
      elapsed_time += wait_time;
  }

@eolesinski
Copy link
Contributor

@HalosGhost Have you tried rerunning this now that #290 has been merged in?

This is the core step towards making macOS officially-supported rather
than a bit of a second-class citizen.

Signed-off-by: Sam Stuewe <stuewe@mit.edu>
@HalosGhost
Copy link
Collaborator Author

@HalosGhost Have you tried rerunning this now that #290 has been merged in?

Rebased (resolved conflicts), and pushed; CI running!

@HalosGhost
Copy link
Collaborator Author

HalosGhost commented Aug 6, 2024

@eolesinski and @maurermi, any idea why these might now be failing on macOS?

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] tcp_rpc_test.echo_test
[  FAILED  ] tcp_rpc_test.response_error_test
[  FAILED  ] tcp_rpc_test.timeout_test
[  FAILED  ] tcp_rpc_test.no_callback_test
[  FAILED  ] tcp_rpc_test.cancel_test
[  FAILED  ] tcp_rpc_test.async_echo_test
[  FAILED  ] tcp_rpc_test.async_error_test
[  FAILED  ] sentinel_2pc_test.bad_rpc_server_endpoint

Presumably related to the timeout that was added?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants