Getting HG_OPNOTSUPPORTED when performing RDMA on data living in CUDA memory #7

Closed
thomas-bouvier opened this issue Jan 3, 2023 · 16 comments

@thomas-bouvier

Following #6, I'm trying to create a bulk to transfer CUDA variables over RDMA.

#include <torch/extension.h>
#include <iostream>
#include <thallium.hpp>

#define __DEBUG
#include "debug.hpp"

namespace tl = thallium;

int main(int argc, char** argv) {
    struct hg_init_info hii;
    memset(&hii, 0, sizeof(hii));
    hii.na_init_info.request_mem_device = true;
    tl::engine myEngine("tcp", MARGO_CLIENT_MODE, true, 1, &hii);

    tl::remote_procedure remote_do_rdma = myEngine.define("do_rdma");
    tl::endpoint server_endpoint = myEngine.lookup("tcp://127.0.0.1:1234");

    auto options = torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
    torch::Tensor aug_samples = torch::zeros({3, 224, 224}, options);
    std::vector<std::pair<void*, std::size_t>> segments;
    segments.emplace_back(aug_samples.data_ptr(), aug_samples.nbytes());

    struct hg_bulk_attr attr;
    memset(&attr, 0, sizeof(attr));
    if (aug_samples.is_cuda()) {
        DBG("Samples are in CUDA memory!");
        attr.mem_type = (hg_mem_type_t) HG_MEM_TYPE_CUDA;
        attr.device = 0;
    } else {
        attr.mem_type = (hg_mem_type_t) HG_MEM_TYPE_HOST;
    }

    tl::bulk local_bulk = myEngine.expose(segments, tl::bulk_mode::write_only, attr);
    remote_do_rdma.on(server_endpoint)(local_bulk);

    return 0;
}

When running the code above, I get the following error:

[DEBUG 0] [client.cpp:37:main] Samples are in CUDA memory!
Function returned HG_OPNOTSUPPORTED
terminate called after throwing an instance of 'thallium::margo_exception'
  what():  [/opt/software/linux-ubuntu20.04-skylake_avx512/gcc-9.4.0/mochi-thallium-main-6blapl4zngvcquy5lciliiogjzydzupr/include/thallium/engine.hpp:1132][margo_bulk_create] Function returned HG_OPNOTSUPPORTED
Aborted (core dumped)

As suggested by @carns:

  1. I initialized Mercury with device memory support:
struct hg_init_info hii;
memset(&hii, 0, sizeof(hii));
hii.na_init_info.request_mem_device = true;
tl::engine myEngine("tcp", MARGO_CLIENT_MODE, true, 1, &hii);
  2. I built libfabric using the +cuda variant https://github.com/mochi-hpc/mochi-spack-packages/blob/e222ad18083171a2e6806a0d363621f9c142e45e/packages/libfabric/package.py#L75. However, this failed as I am working with Spack in a Docker container, and the build process checks for runtime CUDA availability:
1 error found in build log:
     132    checking cuda_runtime.h presence... yes
     133    checking for cuda_runtime.h... yes
     134    configure: looking for library in lib64
     135    checking for cudaMemcpy in -lcudart... no
     136    configure: looking for library in lib
     137    checking for cudaMemcpy in -lcudart... no
  >> 138    configure: error: CUDA support requested but CUDA runtime not avail
            able.

I used the --enable-cuda-dlopen flag as suggested in ofiwg/libfabric#7790 (comment) to overcome this issue, and opened a PR mochi-hpc/mochi-spack-packages#16 to add the corresponding variant in the mochi-spack-packages repo. I should probably first make sure that libfabric supports CUDA independently of Thallium.

Any pointers to debug that issue? Thanks!

@carns
Member

carns commented Jan 6, 2023

Hi @thomas-bouvier. It's a little hard to tell whether the error is coming from Mercury or libfabric. Can you repeat running your example with more logging enabled?

Specifically, exporting FI_LOG_LEVEL=debug and HG_LOG_LEVEL=debug should turn on just about everything at each level, I think.

Tagging @soumagne in case he has any insight. I think the problem might be more obvious with libfabric debug messages, though.

@carns
Member

carns commented Jan 6, 2023

Somewhat orthogonal to debugging the problem at hand, but I'll leave this thought here anyway: would it be helpful in the long run to have a standalone margo utility (similar to the margo-info tool) that can validate a given software stack's ability to register a CUDA region for RDMA on libfabric? That's the portion that's failing here, rather than the RDMA transfer itself. That means it could be validated with a single command-line process, I think.
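
For illustration, a minimal check along those lines could look like the sketch below. It only reuses the Thallium calls already shown in this issue plus the CUDA runtime API; the protocol string, buffer size, and device index are placeholders, not a proposal for the actual utility's interface.

// Minimal sketch (untested): try to register a CUDA buffer with Mercury
// through Thallium and report success or failure. "ofi+verbs", the 1 MiB
// buffer size, and device 0 are placeholders.
#include <cuda_runtime.h>
#include <cstring>
#include <iostream>
#include <utility>
#include <vector>
#include <thallium.hpp>

namespace tl = thallium;

int main() {
    struct hg_init_info hii;
    memset(&hii, 0, sizeof(hii));
    hii.na_init_info.request_mem_device = true; // ask the NA layer for device memory support

    tl::engine engine("ofi+verbs", THALLIUM_CLIENT_MODE, true, 1, &hii);

    void* dev_ptr = nullptr;
    std::size_t size = 1 << 20; // 1 MiB test buffer
    if (cudaMalloc(&dev_ptr, size) != cudaSuccess) {
        std::cerr << "cudaMalloc failed" << std::endl;
        return 1;
    }

    struct hg_bulk_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.mem_type = (hg_mem_type_t) HG_MEM_TYPE_CUDA;
    attr.device = 0;

    std::vector<std::pair<void*, std::size_t>> segments;
    segments.emplace_back(dev_ptr, size);

    try {
        tl::bulk b = engine.expose(segments, tl::bulk_mode::read_write, attr);
        std::cout << "CUDA memory registration succeeded" << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "CUDA memory registration failed: " << e.what() << std::endl;
    }

    cudaFree(dev_ptr);
    return 0;
}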

@thomas-bouvier
Author

thomas-bouvier commented Jan 9, 2023

Hi @carns, thank you very much for your feedback. Here are the complete logs:

root@gemini-1:~/bug# FI_LOG_LEVEL=debug HG_LOG_LEVEL=debug LD_LIBRARY_PATH="/opt/view/lib;/opt/view/lib64;/opt/view/lib/python3.10/site-packages/torch/lib" ./client
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable perf_cntr=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable hook=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable hmem_cuda_use_gdrcopy=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable hmem_cuda_enable_xfer=<not set>
libfabric:1073237:1673298660::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:1073237:1673298660::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_ZE not supported
libfabric:1073237:1673298660::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_NEURON not supported
libfabric:1073237:1673298660::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable hmem_disable_p2p=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable mr_cache_max_size=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable mr_cache_max_count=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable mr_cache_monitor=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable mr_cuda_cache_monitor_enabled=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable mr_rocr_cache_monitor_enabled=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable mr_ze_cache_monitor_enabled=<not set>
libfabric:1073237:1673298660::core:mr:ofi_default_cache_size():77<info> default cache size=3380746444
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable provider=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable universe_size=<not set>
libfabric:1073237:1673298660::core:core:fi_param_get_():278<info> variable provider_path=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable enable_passthru=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable buffer_size=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable tx_size=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable rx_size=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable msg_tx_size=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable msg_rx_size=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable cm_progress_interval=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable cq_eq_fairness=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable data_auto_progress=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable use_rndv_write=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable def_wait_obj=<not set>
libfabric:1073237:1673298660::ofi_rxm:core:fi_param_get_():278<info> variable def_tcp_wait_obj=<not set>
libfabric:1073237:1673298660::core:core:ofi_register_provider():466<info> registering provider: ofi_rxm (116.10)
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable tx_size=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable rx_size=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable tx_iov_limit=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable rx_iov_limit=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable inline_size=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable min_rnr_timer=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable use_odp=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable prefer_xrc=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable xrcd_filename=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable cqread_bunch_size=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable gid_idx=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable device_name=<not set>
libfabric:1073237:1673298660::verbs:core:vrb_read_params():716<info> dmabuf support is disabled
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable dgram_use_name_server=<not set>
libfabric:1073237:1673298660::verbs:core:fi_param_get_():278<info> variable dgram_name_server_port=<not set>
libfabric:1073237:1673298660::verbs:fabric:verbs_devs_print():887<info> list of verbs devices found for FI_EP_MSG:
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_0: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_0: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_0: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_1: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_1: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_1: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_2: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_2: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():618<info> device mlx5_2: first found active port is 1
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():614<info> device mlx5_3: there are no active ports
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():614<info> device mlx5_3: there are no active ports
libfabric:1073237:1673298661::verbs:fabric:vrb_get_device_attrs():614<info> device mlx5_3: there are no active ports
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: verbs (116.10)
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable port_high_range=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable port_low_range=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable tx_size=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable rx_size=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable nodelay=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable staging_sbuf_size=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable prefetch_rbuf_size=<not set>
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable zerocopy_size=<not set>
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: tcp (116.10)
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable prov_name=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable port_high_range=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable port_low_range=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable tx_size=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable rx_size=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable nodelay=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable staging_sbuf_size=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable prefetch_rbuf_size=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable zerocopy_size=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable poll_fairness=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable poll_cooldown=<not set>
libfabric:1073237:1673298661::net:core:fi_param_get_():278<info> variable disable_auto_progress=<not set>
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: net (116.10)
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: ofi_hook_perf (116.10)
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: ofi_hook_debug (116.10)
libfabric:1073237:1673298661::core:core:fi_param_get_():278<info> variable hmem_cuda_use_gdrcopy=<not set>
libfabric:1073237:1673298661::core:core:fi_param_get_():278<info> variable hmem_cuda_enable_xfer=<not set>
libfabric:1073237:1673298661::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:1073237:1673298661::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_ZE not supported
libfabric:1073237:1673298661::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_NEURON not supported
libfabric:1073237:1673298661::core:core:ofi_hmem_init():247<info> Hmem iface FI_HMEM_SYNAPSEAI not supported
libfabric:1073237:1673298661::core:core:fi_param_get_():278<info> variable hmem_disable_p2p=<not set>
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: ofi_hook_hmem (116.10)
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: ofi_hook_dmabuf_peer_mem (116.10)
libfabric:1073237:1673298661::core:core:ofi_register_provider():466<info> registering provider: ofi_hook_noop (116.10)
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_fabric_attr():410<info> Requesting provider tcp;ofi_rxm, skipping verbs
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():689<info> Unsupported protocol
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():690<info> Supported: FI_PROTO_RXM_TCP
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():690<info> Requested: FI_PROTO_RXM
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_check_ep_attr():745<info> Provider requires use of shared rx context
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_check_ep_attr():745<info> Provider requires use of shared rx context
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_fabric_attr():410<info> Requesting provider tcp;ofi_rxm, skipping verbs
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():689<info> Unsupported protocol
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():690<info> Supported: FI_PROTO_RXM_TCP
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():690<info> Requested: FI_PROTO_RXM
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_check_ep_attr():745<info> Provider requires use of shared rx context
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_check_ep_attr():745<info> Provider requires use of shared rx context
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661::tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661::tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661::tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661::tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661::tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661::tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661::tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661::tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661::tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661::tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661::tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661::tcp:core:util_getinfo_ifs():333<info> Chosen addr for using: 172.16.53.1, speed 10000
libfabric:1073237:1673298661::core:core:fi_fabric_():1340<info> Opened fabric: 172.16.48.0/20
libfabric:1073237:1673298661::core:core:fi_fabric_():1340<info> Opened fabric: 172.16.48.0/20
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661::tcp:core:ofi_check_rx_attr():805<info> Tx only caps ignored in Rx caps
libfabric:1073237:1673298661::tcp:core:ofi_check_tx_attr():903<info> Rx only caps ignored in Tx caps
libfabric:1073237:1673298661::tcp:core:ofi_check_rx_attr():805<info> Tx only caps ignored in Rx caps
libfabric:1073237:1673298661::tcp:core:ofi_check_tx_attr():903<info> Rx only caps ignored in Tx caps
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable enable_dyn_rbuf=<not set>
libfabric:1073237:1673298661::ofi_rxm:av:util_av_init():487<info> AV size 1024
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable comp_per_progress=<not set>
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_fabric_attr():410<info> Requesting provider tcp;ofi_rxm, skipping verbs
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():689<info> Unsupported protocol
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():690<info> Supported: FI_PROTO_RXM_TCP
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():690<info> Requested: FI_PROTO_RXM
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_rx_attr():805<info> Tx only caps ignored in Rx caps
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_tx_attr():903<info> Rx only caps ignored in Tx caps
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_rx_attr():805<info> Tx only caps ignored in Rx caps
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_tx_attr():903<info> Rx only caps ignored in Tx caps
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():789<info> Tag size exceeds supported size
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():790<info> Supported: 6148914691236517205
libfabric:1073237:1673298661::ofi_rxm:core:ofi_check_ep_attr():790<info> Requested: 12297829382473034410
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable use_srx=<not set>
libfabric:1073237:1673298661:ofi_rxm:core:core:ofi_layering_ok():1025<info> Provider ofi_rxm is excluded
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:fi_param_get_():278<info> variable iface=<not set>
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.16.53.1, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: 172.17.0.1, iface name: docker0, speed: 100
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_get_list_of_addr():2051<info> Available addr: fe80::dac4:97ff:feb8:3283, iface name: enp1s0f0, speed: 10000
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1883<info> available addr: : fi_sockaddr_in://127.0.0.1:0
libfabric:1073237:1673298661:ofi_rxm:tcp:core:ofi_insert_loopback_addr():1898<info> available addr: : fi_sockaddr_in6://[::1]:0
libfabric:1073237:1673298661::tcp:core:ofi_check_rx_attr():805<info> Tx only caps ignored in Rx caps
libfabric:1073237:1673298661::tcp:core:ofi_check_tx_attr():903<info> Rx only caps ignored in Tx caps
libfabric:1073237:1673298661::tcp:core:ofi_check_rx_attr():805<info> Tx only caps ignored in Rx caps
libfabric:1073237:1673298661::tcp:core:ofi_check_tx_attr():903<info> Rx only caps ignored in Tx caps
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable enable_direct_send=<not set>
libfabric:1073237:1673298661::ofi_rxm:core:fi_param_get_():278<info> variable eager_limit=<not set>
libfabric:1073237:1673298661::ofi_rxm:core:rxm_ep_settings_init():1259<info> Settings:
                 MR local: MSG - 0, RxM - 0
                 Completions per progress: MSG - 1
                 Buffered min: 0
                 Min multi recv size: 16384
                 inject size: 128
                 Protocol limits: Eager: 16384, SAR: 16384
libfabric:1073237:1673298661::ofi_rxm:av:ofi_av_insert_addr():291<info> inserting addr
: fi_sockaddr_in://172.16.53.1:38933
libfabric:1073237:1673298661::ofi_rxm:av:ofi_av_insert_addr():314<info> fi_addr: 0
libfabric:1073237:1673298661::ofi_rxm:av:ofi_av_insert_addr():291<info> inserting addr
: fi_sockaddr_in://127.0.0.1:1234
libfabric:1073237:1673298661::ofi_rxm:av:ofi_av_insert_addr():314<info> fi_addr: 1
[DEBUG 1] [client.cpp:37:main] Samples are in CUDA memory!
Function returned HG_OPNOTSUPPORTED
terminate called after throwing an instance of 'thallium::margo_exception'
  what():  [/tmp/opt/software/linux-debian11-broadwell/gcc-10.2.1/mochi-thallium-main-akonmmfedv533tq745tml3x7x7fb6jvn/include/thallium/engine.hpp:1132][margo_bulk_create] Function returned HG_OPNOTSUPPORTED
Aborted

Many lines display <not set>, which looks suspicious to me. libfabric@1.16.1+cuda and rdma-core@41.0 are installed.

Input spec
--------------------------------
mochi-thallium@main
    ^argobots
    ^libfabric+cuda fabrics=rxm,tcp,verbs
    ^mercury@2.2.0~boostsys~checksum+ofi

I checked with the Grid'5000 team: OFED is not what is installed on the machine initially; they use the rdma-core packaged in Debian. So it should be OK to build it myself using Spack (that's what I did).

I think the tool you describe would be really convenient, as RDMA+CUDA seems to be quite tricky to achieve :)

@carns
Member

carns commented Jan 10, 2023

I actually don't see anything alarming in that output. Unfortunately it doesn't have any messages from Mercury. I forgot that you have to build the Mercury Spack package with +debug for it to emit messages, though :(

Can you try again with it built that way?

I'll open an issue to track the CUDA validation command line utility idea.

@soumagne
Member

@carns error and warning messages should be printed regardless; the +debug variant only affects debug messages. Actually, in that case, setting FI_LOG_LEVEL=warn and HG_LOG_LEVEL=warn should be sufficient, I would expect.

@thomas-bouvier
Author

I built mercury+debug to be sure, and I got the exact same output :(

Input spec
--------------------------------
mochi-thallium@main
    ^argobots
    ^libfabric+cuda fabrics=rxm,tcp,verbs
    ^mercury@2.2.0~boostsys~checksum+debug+ofi

@soumagne
Member

You could also try setting HG_LOG_SUBSYS=na just in case something is somehow hiding the output.

@thomas-bouvier
Author

Interesting, HG_LOG_SUBSYS=na FI_LOG_LEVEL=warn HG_LOG_LEVEL=warn gives the following:

root@gemini-1:~/bug# HG_LOG_SUBSYS=na FI_LOG_LEVEL=warn HG_LOG_LEVEL=warn ./client
[DEBUG 1] [client.cpp:37:main] Samples are in CUDA memory!
# [10797.956487] mercury->mem: [error] /tmp/spack-stage/root/spack-stage-mercury-2.2.0-3eaxn43bgam7of6jxmjoq6jvbrcfn5md/spack-src/src/na/na_ofi.c:6076
 # na_ofi_mem_register(): selected provider does not support device registration
Function returned HG_OPNOTSUPPORTED
terminate called after throwing an instance of 'thallium::margo_exception'
  what():  [/tmp/opt/software/linux-debian11-broadwell/gcc-10.2.1/mochi-thallium-main-ruzuusshqrs2jyz4wfiror6icwsefmui/include/thallium/engine.hpp:1132][margo_bulk_create] Function returned HG_OPNOTSUPPORTED
Aborted

The full output with FI_LOG_LEVEL=debug HG_LOG_LEVEL=debug is a bit long (but seems useful):

full_output.txt

@soumagne
Member

Which provider are you trying to use? tcp? I had somehow missed that in your previous log, but it can only work with the verbs and shm providers.

@thomas-bouvier
Author

I was using tcp, as in this example from the docs: https://github.com/mochi-hpc/mochi-doc/blob/29e3a87d30b500a7d70a4601893975254ec3e5de/code/thallium/08_rdma/client.cpp

I tried the verbs providers verbs and ofi+verbs, but the Thallium engine can't be initialized with these; margo-info shows them in red. I discussed this with @carns in the past: a suggestion was to build libfabric against the (external) vendor rdma-core. It turns out that what is installed on my system is the rdma-core packaged with Debian. Building rdma-core myself should not be a problem.
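
For reference, the only difference from the tcp client above was the engine initialization, roughly as below; the exact address format expected by Mercury for the libfabric verbs provider is an assumption on my part.

// Sketch only: same client as above, with the protocol string swapped.
// The "ofi+verbs" address format is an assumption, not something verified in this issue.
tl::engine myEngine("ofi+verbs", MARGO_CLIENT_MODE, true, 1, &hii);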

Anyway, let's focus on shm providers for now. I updated my code as follows to leverage the na+sm provider:

client.cpp

#include <torch/extension.h>
#include <iostream>
#include <thallium.hpp>

#define __DEBUG
#include "debug.hpp"

namespace tl = thallium;

int main(int argc, char** argv) {
    struct hg_init_info hii;
    memset(&hii, 0, sizeof(hii));
    hii.na_init_info.request_mem_device = true;
    tl::engine myEngine("na+sm://122-1", THALLIUM_CLIENT_MODE, true, 1, &hii);

    tl::remote_procedure remote_do_rdma = myEngine.define("do_rdma");
    tl::endpoint server_endpoint = myEngine.lookup("na+sm://123-1");

    auto options = torch::TensorOptions().dtype(torch::kFloat32).device(torch::kCUDA);
    torch::Tensor aug_samples = torch::zeros({3, 224, 224}, options);
    std::vector<std::pair<void*, std::size_t>> segments;
    segments.emplace_back(aug_samples.data_ptr(), aug_samples.nbytes());

    struct hg_bulk_attr attr;
    memset(&attr, 0, sizeof(attr));
    if (aug_samples.is_cuda()) {
        DBG("Samples are in CUDA memory!");
        attr.mem_type = (hg_mem_type_t) HG_MEM_TYPE_CUDA;
        attr.device = 0;
    } else {
        attr.mem_type = (hg_mem_type_t) HG_MEM_TYPE_HOST;
    }

    tl::bulk local_bulk = myEngine.expose(segments, tl::bulk_mode::write_only, attr);
    remote_do_rdma.on(server_endpoint)(local_bulk);

    return 0;
}

server.cpp

#include <torch/extension.h>
#include <iostream>
#include <thallium.hpp>
#include <thallium/serialization/stl/string.hpp>

namespace tl = thallium;

int main(int argc, char** argv) {

    tl::engine myEngine("na+sm://123-1", THALLIUM_SERVER_MODE);

    std::function<void(const tl::request&, tl::bulk&)> f =
        [&myEngine](const tl::request& req, tl::bulk& b) {
            auto options = torch::TensorOptions().dtype(torch::kFloat32);
            torch::Tensor tensor = torch::zeros({3, 224, 224}, options);
            std::vector<std::pair<void*, std::size_t>> segments;
            segments.emplace_back(tensor.data_ptr(), tensor.nbytes());

            tl::bulk bulk = myEngine.expose(segments, tl::bulk_mode::read_only);
            bulk >> b.on(req.get_endpoint());
        };
    myEngine.define("do_rdma", f).disable_response();
}

With this code, I get the following error on the server:

Function returned HG_FAULT
terminate called after throwing an instance of 'thallium::margo_exception'
  what():  [/tmp/opt/software/linux-debian11-broadwell/gcc-10.2.1/mochi-thallium-main-ruzuusshqrs2jyz4wfiror6icwsefmui/include/thallium/remote_bulk.hpp:157][margo_bulk_transfer] Function returned HG_FAULT
Aborted

Maybe something is wrong with my code though?

client.txt
Failling server.txt

@soumagne
Member

OK, yeah, I think it's better to focus on getting the verbs provider to work. Sorry for the confusion and the redundancy there, caused by multiple libraries providing the same functionality: for shared memory I actually meant ofi+shm (the shared-memory provider from libfabric, which supports GDRCopy), not na+sm (the shared-memory plugin from Mercury, which is not GPU enabled). I have not tested ofi+shm with CUDA, but it is enabled to do what you want, so in theory it should work. I don't think it has been tested with Thallium yet either, so you can try, but it may be safer to stick with verbs.
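
For what it's worth, switching to the libfabric shared-memory provider would presumably only require changing the protocol string on both sides, roughly as in the sketch below; the exact "ofi+shm" address syntax is an assumption based on Mercury's ofi+<provider> naming and, as said, has not been tested with CUDA here.

// Sketch, untested: use the libfabric shm provider (GPU-aware) instead of na+sm.
// The "ofi+shm" protocol string is an assumption.
tl::engine server("ofi+shm", THALLIUM_SERVER_MODE);
tl::engine client("ofi+shm", THALLIUM_CLIENT_MODE, true, 1, &hii);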

@carns
Member

carns commented Jan 11, 2023

@soumagne Do you have any idea why we weren't getting this message originally (or even when setting HG_LOG_LEVEL=debug)?

# [10797.956487] mercury->mem: [error] /tmp/spack-stage/root/spack-stage-mercury-2.2.0-3eaxn43bgam7of6jxmjoq6jvbrcfn5md/spack-src/src/na/na_ofi.c:6076
 # na_ofi_mem_register(): selected provider does not support device registration

That might be a separate issue to open up. It would have helped a lot to see this sooner :)

@soumagne
Member

Yes, the default log subsys is fatal only, so it still requires setting HG_LOG_SUBSYS to something other than fatal (like hg or na) to get the full log. HG_Set_log_level() does that implicitly, but setting the env variable does not; we could maybe improve that...
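
In application code, that would look roughly like the snippet below; HG_Set_log_level() is the call named above, while HG_Set_log_subsys() is my guess at the corresponding subsystem setter and may not be the exact name.

// Sketch: enable verbose Mercury/NA logging programmatically before creating
// the engine. HG_Set_log_level() was named above; HG_Set_log_subsys() is an
// assumed analogue of the HG_LOG_SUBSYS environment variable.
#include <mercury.h>

int main() {
    HG_Set_log_level("debug");
    HG_Set_log_subsys("na"); // assumption: may not be the exact API name
    // ... create the Thallium engine and proceed as in the client above ...
    return 0;
}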

@thomas-bouvier
Author

@soumagne what about ucx? Does it support CUDA?

@thomas-bouvier
Author

Closing this, as I now get a different error than HG_OPNOTSUPPORTED. Thank you for your input :)

@soumagne
Member

@thomas-bouvier UCX is another option that also supports CUDA; we enabled it to support that type of transfer as well, though I don't think anybody has tested it just yet :) The code is there, though.
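
Trying it would presumably just mean pointing the engine at the UCX plugin, roughly as below; the "ucx+all" protocol string and address format are assumptions and, as said, untested for this use case.

// Sketch, untested: UCX-backed engine. The "ucx+all" protocol string is an assumption.
tl::engine myEngine("ucx+all", MARGO_CLIENT_MODE, true, 1, &hii);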
