ENHANCE: Select smallest host as owner when ketama hash collided #113 #237

uhm0311 · 2022-02-24T09:10:30Z

Ketama hash collision 시의 owner 결정 #113 이슈에 관한 PR입니다.
기존 구현에서 hash ring은 항상 qsort() 함수로 정렬되는 특성을 이용했습니다.
continuum item 정렬 시 compare 함수에서 해시 값이 같을 때 Replication인 경우 groupname을, 아닌 경우 hostname과 port를 기준으로 정렬하도록 했습니다.
위의 추가적인 정렬 기준을 통해 dispatch_host() 함수에서 최종적으로 선택된 hash 값이 중복이고 왼쪽(smaller host)에 hash 값이 같은 경우 중복되는 hash 값 중 가장 왼쪽의 rgroup 혹은 server를 사용하도록 하는 로직을 추가했습니다.

SuhwanJang

@uhm0311
1차 리뷰하였고 오프라인으로 코멘트에 대한 보충설명 전달할 예정입니다.

SuhwanJang · 2022-02-25T02:50:55Z

libmemcached/hosts.cc

+      return 1;
+    else
+      return -1;
+#else


여기 변경은 기존 코드 return_value= (int) (a->port - b->port); 로 둬도 상관 없지 않나요?

@SuhwanJang
port의 타입이 uint16_t로 확인되었는데, unsigned 정수의 특성상 a->port보다 b->port가 더 크면 그 결과값이 언더플로우 되어 큰 값을 반환하게 됩니다. 본래 의도는 b->port가 더 크면 음수를 반환하는 것으로 보여 수정했습니다.

libmemcached/hosts.cc

SuhwanJang · 2022-02-25T03:26:21Z

libmemcached/continuum.hpp

+#ifdef ENABLE_REPLICATION
+  memcached_rgroup_st *rgroup;
+#endif
+#endif


node_name 이 필요한 것이므로 server, rgroup 대신 node_name[MAX_NODENAME_LENGTH] 를 두어 cluster type 에 따라 입력하면 되고,
이것을 이 구조체에 두면 node 하나당 continuum item 160개를 생성하므로 메모리 낭비입니다.

별도 구조체 (memcached_continuum_node_st)에 node_name, memcached_continuum_item_st 를 두고,
정렬은 아래 순서로 해야 될 것 입니다.

memcached_continuum_node_st array 를 node_name 순으로 정렬

전체 memcached_continuum_item_st array 를 hpoint 순으로 정렬

@SuhwanJang
MAX_NODENAME_LENGTH를 몇으로 줘야 좋은지는 모르겠지만, 16을 초과해야 한다면 현재 상태가 더 적은 메모리를 사용합니다. 64비트 머신 기준으로 두 개의 포인터는 16바이트를 차지하기 때문입니다.

uhm0311 · 2022-02-25T08:21:33Z

@jhpark816

@SuhwanJang 님 리뷰 반영하여 정렬 방법을 바꿔 메모리 사용량을 줄였습니다.
기존 PR에서는 continuum item 구조체에 server 포인터와 rgroup 포인터를 넣고 hash 값이 같은 두 아이템의 정렬 기준에 server->hostname, server->port 혹은 rgroup->groupname으로 정렬하는 새로운 기준을 추가했습니다.
1차 리뷰 반영 후 변경된 방법은 다음과 같습니다. arcus-memcached 내에서 수행하는 방식과 유사합니다.
1. continuum item 구조체에 uint32_t nindex 매개변수를 추가합니다.
2. 각 server 혹은 rgroup을 node라는 단위로 취급하여 memcached_server_count(ptr) 개수의 배열로 만듭니다.
3. 각 node 안에 nodename을 넣습니다. nodename은 ("%s:%u", server->hostname, server->port) 혹은 ("%s", rgroup->groupname) 입니다.
4. 각 node에 들어갈 160개 continuum item의 포인터를 저장합니다.
5. nodename 기준으로 node 배열을 정렬합니다.
6. node 배열에서 for문을 돌면서 continuum item 포인터를 통해 nindex 값을 node의 인덱스 값으로 정합니다. nodename 기준으로 정렬되어 있으므로 nindex 값은 곧 기존 PR에서 새로 추가되었던 정렬 기준과 같은 용도의 정렬 기준으로 삼을 수 있습니다.
7. continnum item 배열을 정렬하면서 hash 값이 같은 경우 nindex 값을 기준으로 정렬합니다.

SuhwanJang · 2022-02-28T02:13:34Z

@uhm0311 리뷰하였습니다.
run_distribution() 에서 use_sort_hosts 가 true 이면, rgroups/servers 를 group_name/hostname 순으로 정렬합니다.
정렬 후에 update_continuum() 하면, 기존에 memcached_continuum_item_st 에 있던 index 필드로 구현이 가능할 것 같습니다.
가능한지 확인해보시고 가능하다면 use_sort_hosts 를 true 로 사용하도록 변경하는 것이 나을 것 같습니다.

uhm0311 · 2022-02-28T04:42:32Z

@jhpark816 @SuhwanJang

STARTING SERVER(pid:96258): /usr/local/bin/memcached -P /tmp/memcached.pidhT1Y9Y -d -p 11221 -l 127.0.0.1  -m 512  -g 1  -M -E /usr/local/lib/default_engine.so 
 
 
STARTING SERVER(pid:96260): /usr/local/bin/memcached -P /tmp/memcached.pide375LU -d -p 11222 -l 127.0.0.1  -m 512  -g 1  -M -E /usr/local/lib/default_engine.so 
 
 
STARTING SERVER(pid:96263): /usr/local/bin/memcached -P /tmp/memcached.pidQq2wgX -d -p 11223 -l 127.0.0.1  -m 512  -g 1  -M -E /usr/local/lib/default_engine.so 
 
 
STARTING SERVER(pid:96265): /usr/local/bin/memcached -P /tmp/memcached.pids2hGVx -d -p 11224 -l 127.0.0.1  -m 512  -g 1  -M -E /usr/local/lib/default_engine.so 
 
 
STARTING SERVER(pid:96267): /usr/local/bin/memcached -P /tmp/memcached.pidFb5i5Z -d -p 11225 -l 127.0.0.1  -m 512  -g 1  -M -E /usr/local/lib/default_engine.so 
 
Collection: libmemcachedutil
	Testing libmemcached_util_ping()					0.1[ ok ]
	Testing libmemcached_util_getpid()					0.1[ ok ]
	Testing libmemcached_util_getpid(MEMCACHED_CONNECTION_FAILURE)					0.1[ ok ]
 
Collection: basic
	Testing init					0.0[ ok ]
	Testing clone					0.0[ ok ]
	Testing reset					0.0[ ok ]
	Testing reset heap					0.0[ ok ]
	Testing reset stack clone					0.0[ ok ]
	Testing reset heap clone					0.0[ ok ]
	Testing memcached_return_t					0.0[ ok ]
 
Collection: hsieh_availability
	Testing hsieh_avaibility_test					[ skipped ]
 
Collection: murmur_availability
	Testing murmur_avaibility_test					0.0[ ok ]
 
Collection: memcached_server_add
	Testing memcached_server_add("")					0.0[ ok ]
	Testing memcached_server_add(NULL)					0.0[ ok ]
 
Collection: immediate_reconnect
	Testing immediate_reconnect					0.1[ ok ]
 
Collection: block
	Testing util_version					0.0[ ok ]
	Testing flush					0.3[ ok ]
	Testing init					0.0[ ok ]
	Testing allocation					0.0[ ok ]
	Testing server_list_null_test					0.0[ ok ]
	Testing server_unsort					0.0[ ok ]
Assertion failed: (bigger <= memcached_server_port(server)), function server_display_function, file mem_functions.cc, line 132.
FAIL tests/testapp (exit status: 134)

오류가 나는 코드는 다음과 같습니다.

static memcached_return_t server_display_function(const memcached_st *ptr,
                                                  const memcached_server_st *server,
                                                  void *context)
{
  /* Do Nothing */
  size_t bigger= *((size_t *)(context));
  (void)ptr;
  assert(bigger <= memcached_server_port(server));
  *((size_t *)(context))= memcached_server_port(server);

  return MEMCACHED_SUCCESS;
}

static test_return_t server_sort_test(memcached_st *ptr)
{
  size_t bigger= 0; /* Prime the value for the test_true in server_display_function */

  memcached_return_t rc;
  memcached_server_fn callbacks[1];
  memcached_st *local_memc;
  (void)ptr;

  local_memc= memcached_create(NULL);
  test_true(local_memc);
  memcached_behavior_set(local_memc, MEMCACHED_BEHAVIOR_SORT_HOSTS, 1);

  for (uint32_t x= 0; x < TEST_PORT_COUNT; x++)
  {
    test_ports[x]= (in_port_t)random() % 64000;
    rc= memcached_server_add_with_weight(local_memc, "localhost", test_ports[x], 0);
    test_compare(memcached_server_count(local_memc), x +1);
#if 0 // Rewrite
    test_true(memcached_server_list_count(memcached_server_list(local_memc)) == x+1);
#endif
    test_compare(MEMCACHED_SUCCESS, rc);
  }

  callbacks[0]= server_display_function;
  memcached_server_cursor(local_memc, callbacks, (void *)&bigger,  1);


  memcached_free(local_memc);

  return TEST_SUCCESS;
}

java-client와 동일한 방식으로 정렬하기 위해서는 ("%s:%u", hostname, port)를 하나의 문자열로 취급하여 strcmp() 함수를 호출해야 합니다.
기존의 정렬 방식이 hostname은 strcmp() 함수로, port는 숫자로 정렬합니다.
테스트 코드가 port는 숫자로 정렬하는 것을 기준으로 짜여져 있어서 에러가 발생합니다.
새로운 정렬 기준은 다음과 같습니다.

static int compare_servers(const void *p1, const void *p2)
{
  int return_value;
  memcached_server_instance_st a= (memcached_server_instance_st)p1;
  memcached_server_instance_st b= (memcached_server_instance_st)p2;

  return_value= strcmp(a->hostname, b->hostname);
  if (return_value == 0) {
#ifdef KETAMA_HASH_COLLSION
#define PORT_LENGTH 6
    char a_port[PORT_LENGTH] = "";
    char b_port[PORT_LENGTH] = "";

    int a_length = snprintf(a_port, PORT_LENGTH, "%u", (uint32_t)a->port);
    int b_length = snprintf(b_port, PORT_LENGTH, "%u", (uint32_t)b->port);

    if (not (a_length >= PORT_LENGTH or a_length < 0 or b_length >= PORT_LENGTH or b_length < 0))
      return_value= strcmp(a_port, b_port);
    else
#endif
    return_value= (int) (a->port - b->port);
  }
  return return_value;
}

compare_servers2() 함수를 만들고 sort_hosts2() 함수를 만들어서 사용하는 방안이 있고, 테스트 코드를 수정하는 방안이 있습니다. 테스트 코드 수정 시 port 비교를 snprintf(port_str, "%u", port) 하여 strcmp() 함수를 호출하도록 변경하면 될 것 같습니다. 어느 쪽 방안으로 하면 좋을까요?

SuhwanJang

@uhm0311 리뷰 하였습니다.
테스트 코드도 arcus node 선택 기준에 맞춰 수정해야 합니다.

SuhwanJang · 2022-02-28T04:33:13Z

libmemcached/hosts.cc

+    if (not (a_length >= PORT_LENGTH or a_length < 0 or b_length >= PORT_LENGTH or b_length < 0))
+      return_value= strcmp(a_port, b_port);
+    else
+#endif


server, java-client 와 동일하게 port 를 문자열로 비교하려고 그런 것이네요.
매번 port 문자열을 만드는 것보다 memcached_server_instance_st 구조체에 port 문자열 필드를 추가하고 port 설정하는 시점에 같이 문자열도 설정하여 사용하는 것이 나은 것 같습니다.
아니면 hostname 필드에 hostname:port 문자열을 넣고 이를 통해 비교하는게 좋을 것 같은데 hostname 이 바뀜에 따라서 기존 동작에 문제가 되는 부분이 있는지 확인 바랍니다.

SuhwanJang · 2022-02-28T04:35:40Z

libmemcached/hosts.cc

@@ -228,6 +251,10 @@ static memcached_return_t update_continuum(memcached_st *ptr)
  }
 #endif

+#ifdef KETAMA_HASH_COLLSION
+  if (not ptr->flags.use_sort_hosts)


use_sort_hosts 가 false 인데 sort_hosts 를 하는 코드를 두면 이상하지 않나요?
arcus init 시에 use_sort_hosts 를 true 로 설정하면 됩니다.

uhm0311 · 2022-02-28T12:44:55Z

@SuhwanJang

리뷰 반영했습니다.
dispatch_host() 에서 변경 사항이 있습니다.

while (left < right)
{
  middle= left + (right - left) / 2;
  if (middle->value < hash)
    left= middle + 1;
  else
#ifdef KETAMA_HASH_COLLSION
  {
#endif
    right= middle;
#ifdef KETAMA_HASH_COLLSION
    if (right->value == hash)
      break;
  }
#endif
}
if (right == end)
  right= begin;
#ifdef KETAMA_HASH_COLLSION
if (right != begin) {
  left= right - 1;
  while (left->value == right->value) {
    right= left;
    if (right == begin) {
      break;
    }
    left= right - 1;
  }
}
#endif

서버가 1개일 때 middle == begin이며 right = middle - 1로 대입할 경우 seg fault가 발생합니다.
또한 middle == begin일 때 right != end이면서 right == begin이므로 아래 조건문도 변경했습니다.

SuhwanJang

@uhm0311 리뷰하였습니다.

SuhwanJang · 2022-03-02T01:25:59Z

libmemcached/server.h

@@ -105,6 +105,9 @@ struct memcached_server_st {
  char read_buffer[MEMCACHED_MAX_BUFFER];
  char write_buffer[MEMCACHED_MAX_BUFFER];
  char hostname[MEMCACHED_NI_MAXHOST];
+#ifdef KETAMA_HASH_COLLSION
+  char port_string[MEMCACHED_MAX_PORT_LENGTH];


이전 코멘트에 적어두었지만, hostname 필드에 port 기입이 가능하면 hostname 필드를 사용하는 방안이 나은 것 같습니다.
가능한지 확인바랍니다.기존에 hostname, port 를 별도 필드로 두어야 하는 이유가 있는지를 보면 됩니다.

@SuhwanJang

arcus-c-client/libmemcached/connect.cc

Lines 134 to 160 in cfe538c

char str_port[NI_MAXSERV];

int length= snprintf(str_port, NI_MAXSERV, "%u", (uint32_t)server->port);

if (length >= NI_MAXSERV or length < 0)

{

return MEMCACHED_FAILURE;

}

struct addrinfo hints;

memset(&hints, 0, sizeof(struct addrinfo));

#if 0

hints.ai_family= AF_INET;

#endif

if (server->type == MEMCACHED_CONNECTION_UDP)

{

hints.ai_protocol= IPPROTO_UDP;

hints.ai_socktype= SOCK_DGRAM;

}

else

{

hints.ai_socktype= SOCK_STREAM;

hints.ai_protocol= IPPROTO_TCP;

}

server->address_info= NULL;

int errcode;

switch(errcode= getaddrinfo(server->hostname, str_port, &hints, &server->address_info))

getaddrinfo() 함수 구현

port를 string으로 변환하여 사용하고 있습니다. getaddrinfo() 함수를 수행할 때 hostname과 port가 모두 정해져야 지정된 노드에 연결이 가능하므로 port 정보를 hostname에 통합하지 않고 따로 갖고 있는 건 맞아 보입니다.

arcus-c-client/libmemcached/server.h

Lines 182 to 183 in cfe538c

LIBMEMCACHED_API

in_port_t memcached_server_port(const memcached_server_instance_st self);

내부적으로만 보자면 숫자 port를 사용하지 않고 문자열로 변환하여 사용하므로 문자열 port만 갖고 있어도 될 것 같습니다. 그런데 port 정보를 memcached_server_port() 함수로 응용에 노출하고 있기 때문에 우선은 둘 다 갖고 있는 게 좋아 보입니다.

@uhm0311
hostname 에서 port 문자열 추출 및 이를 정수로 변환해서 사용하면 됩니다만 따로 둬서 사용하는 곳들이 많아서 기존 방식 유지하는 것이 좋을 것 같습니다.

필드명 변경

port_string => str_port

아래 함수를 보면, str_port 사용하는 코드가 있습니다.
여기도 이미 알고 있는 str_port 사용하도록 하시죠.

그리고, 질문이 하나 있는 데
아래 함수에서 str_port 생성하기 위해 NI_MAXSERV 크기의 char array 사용합니다.
NI_MAXSERV 사용하는 이유는 무엇일까요?

static memcached_return_t set_hostinfo(memcached_server_st *server) { if (server->address_info) { freeaddrinfo(server->address_info); server->address_info= NULL; server->address_info_next= NULL; } char str_port[NI_MAXSERV]; int length= snprintf(str_port, NI_MAXSERV, "%u", (uint32_t)server->port); if (length >= NI_MAXSERV or length < 0) { return MEMCACHED_FAILURE; } . . . }

SuhwanJang · 2022-03-02T01:30:26Z

libmemcached/arcus.cc

@@ -100,6 +100,9 @@ static inline arcus_return_t do_arcus_init(memcached_st *mc,
  mc->flags.use_sort_hosts= false;
  memcached_behavior_set(mc, MEMCACHED_BEHAVIOR_NO_BLOCK,        1);
  memcached_behavior_set(mc, MEMCACHED_BEHAVIOR_TCP_NODELAY,     1);
+#ifdef KETAMA_HASH_COLLSION
+  memcached_behavior_set(mc, MEMCACHED_BEHAVIOR_SORT_HOSTS,      1);


위에서 use_sort_hosts= false 설정하므로 여기서 다시 변경하는 것은 맞지 않습니다.
위에서 false 를 true 로 변경하면 되지 않나요?

SuhwanJang · 2022-03-02T01:37:24Z

libmemcached/hash.cc

+          left= right - 1;
+        }
+      }
+#endif


코드가 이해하기 어려워 아래와 같이 하고 문제 없는 지 검토 바랍니다.

while (left <= right) { middle= left + (right - left) / 2; if (middle->value < hash) left= middle + 1; else if (middle->value > hash) right= middle - 1; else break; /* found */ } if (middle->value == hash) { while (middle > begin && (middle-1)->value == hash) { /* duplicate hpoints */ middle= middle-1; } return middle->index; } else { return (left >= end ? begin->index : left->index); }

@SuhwanJang

hash 값이 (end - 1)->value 보다 큰 경우

right == end, left == right - 1인 상황이 오게 됩니다. 이 때 middle == right - 1이 될 것이고, 그 다음 left = middle + 1이므로 left == end가 됩니다. left <= right 조건을 만족하므로 left == middle == right == end, left == end + 1이 됩니다. while 문을 빠져나와 middle->value 접근은 곧 end->value 접근이므로 seg fault가 날 수 있습니다.

left < right 조건으로 변경하면 문제가 없어 보입니다.

hash 값이 begin->value 보다 작은 경우

문제 없는 것으로 보입니다.

hash 값이 begin->value 이상이며 (end-1)->value 이하이지만 정확히 맞는 값은 없는 경우

문제 없는 것으로 보입니다.

현재 논의되는 이진탐색 부분은 java TreeMap의 ceilingKey() 함수로 가져오는 부분을 구현한 것입니다. 그래서 3번과 같은 경우, left == right - 1인 상황에서 hash 값이 left->value 보다 크고 right->value 보다 작을 때 기존과 같이 right 값을 반환하는 것이 ceilingKey() 함수의 구현이라는 관점에서 보면 더욱 직관적이라고 생각합니다.

@uhm0311
hash 값이 (end - 1)->value 보다 큰 경우 문제가 있네요.
그런데 while 문 빠져나오기 전에 에러가 날 것이고, left < right 조건으로 변경하면 다른 케이스에서 결과가 틀려져 맞지 않습니다.

while 문에 left < end 조건이 필요하네요.

while (left <= right && left < end) { middle= left + (right - left) / 2; if (middle->value < hash) left= middle + 1; else if (middle->value > hash) right= middle - 1; else break; /* found */ } if (middle->value == hash) { while (middle > begin && (middle-1)->value == hash) { /* duplicate hpoints */ middle= middle-1; } return middle->index; } else { return (left == end ? begin->index : left->index); }

@SuhwanJang

if (hash > (end-1)->value || hash < begin->value) return begin->index; while (left < right) { middle= left + (right - left) / 2; if (middle->value < hash) left= middle + 1; else { right= middle; if (right->value == hash) break; } } while (right > begin && (right - 1)->value == hash) right= right - 1; return right->index;

이런 코드는 어떨까요? middle이나 left를 다루는 것 보다 right를 다루는 것이 ceilingKey() 함수의 구현이라는 관점에서 그 의미를 살리는 방향이라고 생각합니다.

@uhm0311 괜찮은 것 같습니다. 처음 if 문은 수정하죠

if (hash > (end-1)->value || hash <= begin->value) return begin->index;

@SuhwanJang @uhm0311
binary search 관련 코멘트하면, 아래 코드가 읽기가 쉬운 것 같습니다.

begin= left= ptr->ketama.info->continuum; end= right= ptr->ketama.info->continuum + num - 1; while (left <= right) { middle= left + (right - left) / 2; if (middle->value == hash) break; if (middle->value < hash) left= middle + 1; else right= middle - 1; } if (left <= right) { while (middle > begin && (middle-1)->value == hash) { /* duplicate hpoints */ middle= middle-1; } return middle->index; } else { return (left <= end ? left->index : begin->index); }

@jhpark816

left == right - 1이고 left->value < hash < right->value라고 가정합시다.
middle = left가 대입됩니다.
middle->value < hash이므로 left = middle + 1, 곧 left == right가 됩니다.
이 때 while (left <= right)이면 middle = right가 대입됩니다.
middle->value > hash이므로 right = middle - 1, 곧 left - 1 == right가 됩니다.
while (left <= right)를 빠져 나와 if 문에서 left > right이므로 else로 들어갑니다.
left->value == (left-1)->value인 경우에 대한 고려가 else 문에는 없습니다.
이러한 문제로 인해 while 문을 left < right 조건으로 둔 것이며, 조건을 따지지 않고 right->value == (right-1)->value인 경우를 고려하도록 한 것입니다.

uhm0311 · 2022-03-02T06:03:05Z

@SuhwanJang

리뷰 반영했습니다.

SuhwanJang · 2022-03-02T06:10:11Z

@jhpark816 @uhm0311
리뷰 완료하였습니다. use_sort_hosts 는 사용할 일이 없어보여 제거해도 될 것 같은데 의견 부탁드립니다.

jhpark816

일부 review.

port 비교 부분에 대해 코멘트를 먼저 남깁니다.

jhpark816 · 2022-03-02T06:30:47Z

libmemcached/hosts.cc

    return_value= (int) (a->port - b->port);
+#endif


port number를 string 변환하여 비교할 필요가 없습니다.
int 타입 캐스팅 결과를 리턴하는 데, 기존 코드가 문제되는 지를 시험을 통해 먼저 확인해 주세요.

문제가 된다면, 아래 코드를 다시 확인 바랍니다.

return_value= (int)a->port - (int)b->port;

@jhpark816

java-client와 동일한 방식으로 정렬하기 위해서는 ("%s:%u", hostname, port)를 하나의 문자열로 취급하여 strcmp() 함수를 호출해야 합니다.
기존의 정렬 방식이 hostname은 strcmp() 함수로, port는 숫자로 정렬합니다.

a->port == 350, b->port == 1150인 경우 숫자로 비교하면 a가 앞으로 가지만 문자열로 비교하면 b가 앞으로 갑니다. 이 방식이 java-client에서의 방식과 동일합니다.

문제가 된다면, 아래 코드를 다시 확인 바랍니다.

해당 코드와는 무관합니다.

@uhm0311 OK. 이해했습니다.

jhpark816

추가 리뷰입니다.

jhpark816 · 2022-03-02T07:49:06Z

libmemcached/arcus.cc

  mc->flags.use_sort_hosts= false;
+#endif


use_sort_hosts == true 이면,
앞서 수정한 udpate_cachelist 동작에 비효율성이 생기게 될 것 같은 데,
어떻게 처리하면 좋을까요?

@jhpark816
sort_hosts 작업을 update_continuum() 내부로 옮겨 master update 시에만 수행하도록 처리해야 할 것 같습니다.

static memcached_return_t update_continuum(memcached_st *ptr) { uint32_t continuum_index= 0; memcached_server_st *list; uint32_t pointer_counter= 0; uint32_t pointer_per_server= MEMCACHED_POINTS_PER_SERVER; uint32_t pointer_per_hash= 1; uint32_t live_servers= 0; struct timeval now; #ifdef LIBMEMCACHED_WITH_ZK_INTEGRATION arcus_st *arcus= static_cast<arcus_st *>(memcached_get_server_manager(ptr)); if (arcus && arcus->pool) { if (memcached_pool_get_master(arcus->pool) != ptr) { /* get the reference to the ketama hash of the master */ memcached_st *master = memcached_pool_get_master(arcus->pool); memcached_ketama_release(ptr); memcached_ketama_reference(ptr, master); return MEMCACHED_SUCCESS; } } #endif if (ptr->flags.use_sort_hosts) { sort_hosts(ptr); } ...

@SuhwanJang

해당 부분에 대해 저도 고민했었습니다. 그런데 master mc에 대해서만 정렬을 하면 master mc의 서버 리스트는 정렬되어 올바른 continuum_item->index를 가리키는데, 해당 index로 member mc의 서버 객체에 접근하면 의도하지 않은 서버 객체를 가져오기 때문에 모두 정렬해줘야 하는 것으로 보았습니다.

@uhm0311
아 그렇네요. servers 는 member 걸로 사용하므로 문제 있겠네요.

@uhm0311 @SuhwanJang

정렬해야 한다면, master, member mc 모두 정렬해야 합니다.

비효율성이 생기는 부분은 아래 함수입니다.

memcached_update_cachelist_with_master()

그래서, 아래 방안들을 검토해 주시죠.

충돌이 생기는 hpoint에 대해서만 정렬하여 처리하는 방안

전체 정렬할 경우, memcached_update_cachelist_with_master() 수정 방안

@jhpark816 @SuhwanJang

충돌이 생기는 hpoint에 대해서만 정렬하여 처리하는 방안

현재 형태의 코드가 되기 전에 continuum_item 구조체에 memcached_server_st 포인터와 memcached_rgroup_st 포인터를 추가하여 hash 값이 같은 경우에 replication 여부에 따라 server->hostname, server->port_string 혹은 rgorup->groupname 기준으로 정렬하게 했었습니다. 이 때 메모리 사용량이 높아서 리뷰를 받아 코드가 수정되었습니다. continuum_item 구조체에 memcached_server_st 포인터와 memcached_rgroup_st 포인터를 추가하는 방안은 C++ 라이브러리를 도입하지 않고 수정하는 하나의 방안이 될 수 있습니다.

@uhm0311
memcached_continuum_item_st 구조체에 node index가 있으니,
memcached_server_st 포인터와 memcached_rgroup_st 포인터는 없어도 될 것입니다.

어떤 형태의 코드가 되는 지가 궁금합니다.

@jhpark816

qsort() 함수를 사용하기 위해서는 compare 함수가 필요하고, compare 함수는 return type int에 const void* 두 개 만을 매개변수로 사용합니다. 전역으로 서버 리스트 정보를 두지 않으면 compare 함수 내에서 현재 비교 대상인 두 개의 continuum_item 구조체를 통해 서버 정보를 가져올 방법이 없습니다. 그래서 continuum_item 구조체에 memcached_server_st 포인터와 memcached_rgroup_st 포인터를 추가하는 방안을 사용했습니다.

전역으로 서버 리스트 정보를 둔다고 하면 lock을 걸어야 할 것 같습니다. creation thread, worker thread, zk manager thread에 의해 update_continuum() 함수가 호출될 수 있기 때문입니다.

update_continuum() 함수의 매개변수로 들어오는 mc가 master mc인 경우에만 서버 리스트 정보를 전역으로 둔다고 가정해도 lock이 필요합니다. memcached_behavior_set() 함수나 memcached_behavior_set_distribution() 함수에 의해 master mc에 대해 run_distribution() 함수가 호출되기 때문입니다. 이 때 zk manager thread와 creation thread에 의한 동시 접근 문제가 발생할 수 있습니다.

@jhpark816

전역으로 서버 리스트 정보를 둔다고 하면 lock을 걸어야 할 것 같습니다. creation thread, worker thread, zk manager thread에 의해 update_continuum() 함수가 호출될 수 있기 때문입니다.

이 부분은 잘못 생각한 것 같습니다. master mc의 서버 리스트가 변화하는 경우만 상정하면 되기 때문입니다.

uhm0311 · 2022-03-02T11:03:04Z

@jhpark816 @SuhwanJang

충돌이 생기는 hpoint에 대해서만 정렬하여 처리하는 방안

캐시 리스트에 대해선 정렬하지 않고, 전역으로 캐시 리스트 정보를 둔 다음 continuum_item compare 함수에서 hash 값이 동일한 경우 전역 캐시 리스트[item->index]를 통해 정렬하도록 했습니다.

jhpark816

review 완료

jhpark816 · 2022-03-07T06:02:11Z

libmemcached/hash.cc

+#ifdef KETAMA_HASH_COLLSION
+      if (hash > (end-1)->value || hash <= begin->value)
+        return begin->index;
+#endif


위의 if 문은 추가하지 않도록 합시다.

위의 if 조건으로 바로 결과를 얻는 경우가 희박합니다.

그 외의 경우는 수행 cost가 가지게 됩니다.

jhpark816 · 2022-03-07T06:02:47Z

libmemcached/hash.cc

+          if (right->value == hash)
+            break;
+        }
+#endif


위의 비교도 추가하지 않도록 합시다.
즉, 기존 로직을 그대로 유지하시죠.

jhpark816 · 2022-03-07T06:04:17Z

libmemcached/hash.cc

+#ifdef KETAMA_HASH_COLLSION
+      while (right > begin && (right-1)->value == hash)
+        right= right-1;
+#else
      if (right == end)
        right= begin;


이 위치에 while 문으로 first item 찾는 로직만 추가하도록 하시죠.

기존 로직을 그대로 유지하면서
smallest host 찾는 로직만 추가하는 형태가 됩니다.

jhpark816 · 2022-03-07T06:04:51Z

libmemcached/hosts.cc

+memcached_rgroup_st *rgroups= NULL;
+#endif
+memcached_server_st *servers= NULL;
+#endif


변수 명을 more specific하게 변경하시죠.

cmp_rgroups

cmp_servers

jhpark816 · 2022-03-07T06:11:19Z

libmemcached/hosts.cc

+  {
+#ifdef ENABLE_REPLICATION
+    if (rgroups != NULL)
+      return strcmp(rgroups[ct1->index].groupname, rgroups[ct2->index].groupname);


중괄호({ })를 붙입시다.

jhpark816 · 2022-03-07T06:12:36Z

libmemcached/hosts.cc

+        return_value= strcmp(servers[ct1->index].port_string, servers[ct2->index].port_string);
+
+      return return_value;
+    }


함수의 completeness를 위해 이 위치에 retun 0; 문장을 추가합시다.

jhpark816 · 2022-03-07T06:14:23Z

libmemcached/hosts.cc

    return 0;
+#endif


결국, 아래와 같이 코드 정리하는 것이 좋겠습니다.

{ #ifdef KETAMA_HASH_COLLSION #ifdef ENABLE_REPLICATION if (rgroups != NULL) { return strcmp(rgroups[ct1->index].groupname, rgroups[ct2->index].groupname); } #endif if (servers != NULL) { int return_value= strcmp(servers[ct1->index].hostname, servers[ct2->index].hostname); if (return_value == 0) return_value= strcmp(servers[ct1->index].port_string, servers[ct2->index].port_string); return return_value; } #endif return 0; }

사소한 것이지만, 아래 형태로 코드를 작성하시죠.
이유는 ifdef KETAMA_HASH_COLLSION 문장을 하나만 넣으려고 합니다.

```C { #ifdef KETAMA_HASH_COLLSION #ifdef ENABLE_REPLICATION if (rgroups != NULL) { return strcmp(rgroups[ct1->index].groupname, rgroups[ct2->index].groupname); } #endif if (servers != NULL) { int return_value= strcmp(servers[ct1->index].hostname, servers[ct2->index].hostname); if (return_value == 0) return_value= strcmp(servers[ct1->index].port_string, servers[ct2->index].port_string); return return_value; } #endif return 0; }

jhpark816 · 2022-03-07T06:15:47Z

libmemcached/server.h

@@ -105,6 +105,9 @@ struct memcached_server_st {
  char read_buffer[MEMCACHED_MAX_BUFFER];
  char write_buffer[MEMCACHED_MAX_BUFFER];
  char hostname[MEMCACHED_NI_MAXHOST];
+#ifdef KETAMA_HASH_COLLSION
+  char port_string[MEMCACHED_MAX_PORT_LENGTH];


필드명 변경

port_string => str_port

jhpark816 · 2022-03-07T06:19:40Z

libmemcached/server.h

@@ -105,6 +105,9 @@ struct memcached_server_st {
  char read_buffer[MEMCACHED_MAX_BUFFER];
  char write_buffer[MEMCACHED_MAX_BUFFER];
  char hostname[MEMCACHED_NI_MAXHOST];
+#ifdef KETAMA_HASH_COLLSION
+  char port_string[MEMCACHED_MAX_PORT_LENGTH];


아래 함수를 보면, str_port 사용하는 코드가 있습니다.
여기도 이미 알고 있는 str_port 사용하도록 하시죠.

그리고, 질문이 하나 있는 데
아래 함수에서 str_port 생성하기 위해 NI_MAXSERV 크기의 char array 사용합니다.
NI_MAXSERV 사용하는 이유는 무엇일까요?

static memcached_return_t set_hostinfo(memcached_server_st *server) { if (server->address_info) { freeaddrinfo(server->address_info); server->address_info= NULL; server->address_info_next= NULL; } char str_port[NI_MAXSERV]; int length= snprintf(str_port, NI_MAXSERV, "%u", (uint32_t)server->port); if (length >= NI_MAXSERV or length < 0) { return MEMCACHED_FAILURE; } . . . }

jhpark816 · 2022-03-07T06:20:09Z

libmemcached/server.cc

+
+  return self->port_string;
+}
+#endif


이 함수는 호출하는 데가 없으므로, 제거하도록 하시죠.

@jhpark816

NI_MAXSERV 사용하는 이유는 무엇일까요?

getaddrinfo() 함수의 첫 인자는 hostname이고, 다음 인자는 service name입니다.
일반적으로 port 번호를 주로 사용하지만, 그렇지 않은 경우도 있습니다.
미리 정의된 포트인 ftp, ssh, http 등과 같은 경우가 이에 해당합니다.
service name으로 ftp를 주면 21로, ssh를 주면 22로, http를 주면 80으로 변환됩니다.
이 때 service name의 최대 길이를 NI_MAXSERV로 정의해둔 것입니다.
port 번호를 문자열에 담기 위해 service name의 최대 길이인 NI_MAXSERV를 사용한 것으로 보입니다.

uhm0311 · 2022-03-07T07:22:23Z

@jhpark816

리뷰 반영했습니다.

jhpark816

review 완료

jhpark816 · 2022-03-07T07:14:36Z

libmemcached/hosts.cc

    return 0;
+#endif


사소한 것이지만, 아래 형태로 코드를 작성하시죠.
이유는 ifdef KETAMA_HASH_COLLSION 문장을 하나만 넣으려고 합니다.

```C { #ifdef KETAMA_HASH_COLLSION #ifdef ENABLE_REPLICATION if (rgroups != NULL) { return strcmp(rgroups[ct1->index].groupname, rgroups[ct2->index].groupname); } #endif if (servers != NULL) { int return_value= strcmp(servers[ct1->index].hostname, servers[ct2->index].hostname); if (return_value == 0) return_value= strcmp(servers[ct1->index].port_string, servers[ct2->index].port_string); return return_value; } #endif return 0; }

jhpark816 · 2022-03-07T07:18:50Z

libmemcached/server.cc

@@ -75,6 +75,9 @@ static inline void _server_init(memcached_server_st *self, memcached_st *root,
 #ifdef IMMEDIATELY_RECONNECT
  self->immediate_reconnect= false;
 #endif
+#ifdef KETAMA_HASH_COLLSION
+  snprintf(self->str_port, MEMCACHED_MAX_PORT_LENGTH, "%u", (uint32_t)port);
+#endif


코드 위치를 hostname 설정 직후로 옮깁시다.

jhpark816 · 2022-03-07T07:26:19Z

libmemcached/server.h

@@ -106,6 +106,9 @@ struct memcached_server_st {
  char read_buffer[MEMCACHED_MAX_BUFFER];
  char write_buffer[MEMCACHED_MAX_BUFFER];
  char hostname[MEMCACHED_NI_MAXHOST];
+#ifdef KETAMA_HASH_COLLSION
+  char str_port[MEMCACHED_MAX_PORT_LENGTH];
+#endif


기존 코드가 port 문자열을 담기 위해 NI_MAXSERV 사용하고 있으므로,
일관되게 아래와 같이 정의하여 사용하시죠.

참고로, 이 파일에 MEMCACHED_NI_MAXHOST가 define 되어 있으므로,
다음 위치에 MEMCACHED_NI_MAXSERV 정의하여 사용하시죠.

#ifdef NI_MAXHOST #define MEMCACHED_NI_MAXHOST NI_MAXHOST #else #define MEMCACHED_NI_MAXHOST 1025 #endif #ifdef NI_MAXSERV #define MEMCACHED_NI_MAXSERV NI_MAXSERV #else #define MEMCACHED_NI_MAXSERV 32 #endif

…r#113

uhm0311 · 2022-03-07T07:50:56Z

@jhpark816

리뷰 반영했습니다.

jhpark816 · 2022-03-07T07:55:46Z

리뷰 완료

@SuhwanJang 최종 리뷰 바랍니다.

SuhwanJang · 2022-03-08T02:10:09Z

@uhm0311 @jhpark816 PR 확인하였습니다.

uhm0311 requested review from jhpark816 and SuhwanJang February 24, 2022 09:10

uhm0311 force-pushed the uhm0311/hash_collision branch from 9bbc588 to 3a7cb39 Compare February 25, 2022 01:36

SuhwanJang reviewed Feb 25, 2022

View reviewed changes

uhm0311 changed the title ~~ENHANCE: Select smallest owner when ketama hash collided #113~~ ENHANCE: Select smallest host as owner when ketama hash collided #113 Feb 25, 2022

uhm0311 force-pushed the uhm0311/hash_collision branch from 3a7cb39 to b3b8970 Compare February 25, 2022 08:07

uhm0311 force-pushed the uhm0311/hash_collision branch 6 times, most recently from 94052a5 to e529693 Compare February 25, 2022 10:38

uhm0311 force-pushed the uhm0311/hash_collision branch 2 times, most recently from ca7287d to 9f78f04 Compare February 28, 2022 04:16

SuhwanJang reviewed Feb 28, 2022

View reviewed changes

uhm0311 force-pushed the uhm0311/hash_collision branch 3 times, most recently from efd73c5 to 3d63f42 Compare February 28, 2022 12:40

uhm0311 force-pushed the uhm0311/hash_collision branch from 3d63f42 to 32a94d4 Compare March 1, 2022 04:00

SuhwanJang reviewed Mar 2, 2022

View reviewed changes

uhm0311 force-pushed the uhm0311/hash_collision branch from 32a94d4 to 6f8a01f Compare March 2, 2022 05:49

jhpark816 reviewed Mar 2, 2022

View reviewed changes

uhm0311 force-pushed the uhm0311/hash_collision branch from 6f8a01f to ce80798 Compare March 2, 2022 10:56

jhpark816 reviewed Mar 7, 2022

View reviewed changes

uhm0311 force-pushed the uhm0311/hash_collision branch from ce80798 to 1d53c36 Compare March 7, 2022 06:48

jhpark816 reviewed Mar 7, 2022

View reviewed changes

uhm0311 force-pushed the uhm0311/hash_collision branch from 1d53c36 to b7fcc34 Compare March 7, 2022 07:35

ENHANCE: Select smallest host as owner when ketama hash collided nave…

49dcab9

…r#113

uhm0311 force-pushed the uhm0311/hash_collision branch from b7fcc34 to 49dcab9 Compare March 7, 2022 07:49

jhpark816 merged commit 83073f1 into naver:develop Mar 8, 2022

uhm0311 deleted the uhm0311/hash_collision branch March 14, 2022 08:07

	char str_port[NI_MAXSERV];
	int length= snprintf(str_port, NI_MAXSERV, "%u", (uint32_t)server->port);
	if (length >= NI_MAXSERV or length < 0)
	{
	return MEMCACHED_FAILURE;
	}

	struct addrinfo hints;
	memset(&hints, 0, sizeof(struct addrinfo));

	#if 0
	hints.ai_family= AF_INET;
	#endif
	if (server->type == MEMCACHED_CONNECTION_UDP)
	{
	hints.ai_protocol= IPPROTO_UDP;
	hints.ai_socktype= SOCK_DGRAM;
	}
	else
	{
	hints.ai_socktype= SOCK_STREAM;
	hints.ai_protocol= IPPROTO_TCP;
	}

	server->address_info= NULL;
	int errcode;
	switch(errcode= getaddrinfo(server->hostname, str_port, &hints, &server->address_info))

	LIBMEMCACHED_API
	in_port_t memcached_server_port(const memcached_server_instance_st self);

ENHANCE: Select smallest host as owner when ketama hash collided #113 #237

ENHANCE: Select smallest host as owner when ketama hash collided #113 #237

Conversation

uhm0311 commented Feb 24, 2022

SuhwanJang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uhm0311 commented Feb 25, 2022

SuhwanJang commented Feb 28, 2022

uhm0311 commented Feb 28, 2022

SuhwanJang left a comment

Choose a reason for hiding this comment

SuhwanJang Feb 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uhm0311 commented Feb 28, 2022

SuhwanJang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SuhwanJang Mar 2, 2022 • edited Loading

Choose a reason for hiding this comment

SuhwanJang Mar 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SuhwanJang Mar 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uhm0311 commented Mar 2, 2022

SuhwanJang commented Mar 2, 2022

jhpark816 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jhpark816 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uhm0311 commented Mar 2, 2022

jhpark816 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uhm0311 commented Mar 7, 2022

jhpark816 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uhm0311 commented Mar 7, 2022

jhpark816 commented Mar 7, 2022

SuhwanJang commented Mar 8, 2022

SuhwanJang Feb 28, 2022 •

edited

Loading

SuhwanJang Mar 2, 2022 •

edited

Loading

SuhwanJang Mar 2, 2022 •

edited

Loading

SuhwanJang Mar 2, 2022 •

edited

Loading