-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]Hopscotch concurrent map #106
base: master
Are you sure you want to change the base?
Conversation
http://people.csail.mit.edu/shanir/publications/disc2008_submission_98.pdf |
Что бросается в глаза при беглом просмотре:
|
You should change the authors to whom you copied this code from. This code bears a striking resemblance replete with bugs and deficiencies to this and this. The header clearly states that this code is free to use so long as due acknowledgement is given. You even have the same comments, the same dependencies to pthread, same variable names, other oddities like taking the values by pointer, and you also never use the value in the key/value hashmap. Correct me if I'm wrong but this is similar to the level of extreme suspicion. Edit: Grammar. |
@DaKellyFella we apoligise for this misunderstandings and surely will change it soon. |
Another issue is that the paper you link starts off by describing Hopscotch Hashing with each bucket having a fixed size neighbourhood ( For my research I use the implementation given here, which was written by the paper's original authors. The code can be quite hard to understand but it would be great if you modify it in accordance with what @khizmax stated and change your underlying algorithm so that is uses offsets rather than bit-masks. Edit: I added that the code I linked is written by the original authors. |
We don't see what paper you exactly want us to consider looking at. But, we agree that looking at source you provided link to is worth it. It would also be great if we could find some kind of comparison why we should use infinite size neighbourhoods. |
Sorry, I didn't explain it very well. Here's a more thorough explanation: There are two versions of Hopscotch.
The code you wrote/found is the first version. The second version is the code I linked. The second version is superior as it cannot suffer from bucket saturation where too many items get added to a single bucket, overflowing the bit-mask. If more than H items are added to a single bucket then the algorithm breaks. If you check the appendix of their paper you can see each bucket has two |
@DaKellyFella Thank you for the last link, that is what we were looking for, we will look into it soon. As fas as I understand for the first kind of realisation ("first version"), if it happened so that 32 items were added into the same bucket, then we just resize the table, rearrange items with new table size and try again. It can happen that it won't help, but it is almost impossible, isn't it? |
Yes, you're right. It should be noted the authors make the same argument but their final code is implemented with version 2. Perhaps there were other concerns relating to performance that they found and as such stuck with that version. You could measure the performance of both implementations and see for yourself. |
@khizmax, объясните пожалуйста как должным образом добавить тестирование в проект. По образу и подобию не можем сообразить - куда смотреть вообще? |
Начнем с того, что ваш set на самом деле является map'ом, раз у него в явном виде задаются Key и Value через template-параметры. У set данные не разделены на key и value. Но это в данном случае не принципиально. Есть два типа тестов:
Для добавления unit-тестов следует просто добавить новый файл hopschotch_map.cpp сюда: libcds\test\unit\striped-map\ и добавить этот файл в солюшн и в CMakeList.txt. По большому счету Вы вольны написать в hopschotch_map.cpp все что угодно, используя google test framework. Для добавления stress-тестов критерии пожестче: интерфейс вашего map'а должен соответствовать принятому в libcds интерфейсу map'а. Здесь libcds\test\stress\map\ находятся реализации stress-тестов для map. Вам необходимо создать файл map_type_hopscotch.h, в котором объявляются все интересные специализации Вашего map'а (см. в качестве примера map_type_cuckoo.h). Далее для каждого stress-теста (это подкаталоги libcds\test\stress\map) создается файл с вашей реализацией, в котором просто прописывается тот define, который вы объявили в map_type_hopscotch.h. См, например, libcds\test\stress\map\del3\map_del3_cuckoo.cpp). Libcds test framework разработан так, чтобы при добавлении новых реализаций контейнеров не нужно было изменять сам framework - файл main.cpp, файлы из libcds\test\stress\framework\ и c:\Works\libcds\test\include\cds_test. |
cds/container/hopscotch_hashmap.h
Outdated
template <typename K, typename... Args> | ||
bool emplace(K&& key, Args&&... args) | ||
{ | ||
mapped_type val = std::forward<Args>(args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Странная конструкция... Это работает в случае, если args - это 10 аргументов?..
Мне кажется, проще вызвать
insert( std::forward<Key>( key ), mapped_type( std::forward<Args>(args)... ))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Пока пишем остальной интерфейс не могу проверить, скорее всего так и есть.
cds/container/hopscotch_hashmap.h
Outdated
@@ -125,7 +125,8 @@ namespace cds { | |||
template <typename K> | |||
bool contains(K const& key) | |||
{ | |||
return get(key, [=](K const& one, K const& two) { return one != two; }) != NULL; | |||
DATA data = get(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khizmax, почему-то не хочет компилировать на этом моменте, второй день понять не можем почему, есть идеи?
MSVC: d:\sources\libcds\cds\container\hopscotch_hashmap.h(128): error C2672: 'cds::container::hopscotch_hashmap<cds_test::striped_map_fixture::key_type,cds_test::striped_map_fixture::value_type>::get': no matching overloaded function found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Во-первых, надо просто убрать = из [=]
, он тут не нужен:
return get(key, [](K const& one, K const& two) { return one != two; }) != nullptr;
Вторая причина - в template-параметре K функции: если этот тип отличен от template-параметра KEY класса, то конечно же такая лямбда не подойдет.
cds/container/hopscotch_hashmap.h
Outdated
DATA data = get(key); | ||
return data != NULL; | ||
{ | ||
return find_with(key, [=](K const& one, K const& two) { return one != two; }, [](mapped_type&) {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@khizmax, посмотрите ещё пожалуйста вот эту конструкцию. Нет вопроса почему это не компилируется, но есть вопрос как это лучше разрулить в этой структуре: на место K из тестов подставляется структура cds_test::striped_map_fixture::key_type, для которой не отпределён оператор !=, а использовать cds_test::striped_map_fixture::cmp в файле реализации, очевидно, неправильно.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Как разрулить в этой структуре - я не знаю.
Во всех контейнерах типа map в libcds есть дополнительный третий template-параметр - traits
. Он задает характеристики мапы. Одна из характеристик - compare functor: это функтор, задающий сравнение элементов:
cmp( a, b )
< 0, если a < bcmp( a, b )
== 0, если a == bcmp( a, b )
> 0, если a > b
Например, если тип KEY естьstd::string
, этот функтор мог бы выглядеть как-то так:
struct compare {
int operator()( std::string const& lhs, std::string const& rhs ) const
{
return lhs.compare ( rhs );
}
int operator()( std::string const& lhs, char const* rhs ) const
{
return lhs.compare( rhs );
}
int operator()( chr const* lhs, std::string const& rhs ) const
{
return -rhs.compare( lhs );
}
};
Заметьте, что этот compare построен так, что в качестве ключа принимает один из типов std::string
или char const*
- компилятор сам разрулит, какой оператор следует вызвать.
Вообще, в traits выносятся все стратегии, чтобы можно было гибко настраивать контейнер под свои нужды, - hash-функтор, item_counter и т.п.
С таким подходом вам вообще не надо будет никаких лямбд.
И перестаньте везде использовать [=]
в вызовах лямбды, это до добра не доведет ;-)
[=]
значит, что вы в лямбду передаете копии всех локальных данных текущей функции, - это довольно редко нужно. В данном конкретном случае
[=](K const& one, K const& two) { return one != two; }
достаточно []
, так как тело лямбды использует только аргументы функции и ничего более. См. http://en.cppreference.com/w/cpp/language/lambda
|
||
~hopscotch_hashmap() { | ||
std::free(BUSY); | ||
std::free(segments_arys); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be delete[]
and not std::free
right? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaKellyFella, the aspect of memory allocation is a point to consider rewriting in this implementation. Probably, power-of-2-policy will be enough as a default policy, not MAX_SEGMENTS, for sure.
In general, you are right - delete should stand here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, power-of-2-policy will be enough as a default policy
Awesome, then you can have a mask instead of a modulus for calculating the table index.
cds/container/hopscotch_hashmap.h
Outdated
template <typename K, typename Predicate, typename Func> | ||
bool find_with(K const& key, Predicate pred, Func f) | ||
{ | ||
unsigned int hash = calc_hash(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash function should probably return a std::size_t
so that all indexes can be reached using the hash. If they are capped at the max int then a table whose size is max int + 1
will never have those slots touched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaKellyFella, I actually already changed this in a local copy, haven't committed it yet. 😄
cds/container/hopscotch_hashmap.h
Outdated
unsigned int hash = calc_hash(key); | ||
Bucket* start_bucket = segments_arys + hash; | ||
unsigned int try_counter = 0; | ||
unsigned int timestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for timestamp as is for the hash. In order to prevent any misbehaviour or ABA on timestamp checking they should be as large as possible.
cds/container/hopscotch_hashmap.h
Outdated
++check_bucket; | ||
} | ||
++try_counter; | ||
} while (timestamp != start_bucket->_timestamp && try_counter < MAX_TRIES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MAX_TRIES
really shouldn't exist. It's a holdover from the initial version and it isn't a good addition. It breaks linearisability under heavy insertion and deletion contention. Keep trying until you get a timestamp match, there's no reason to have a limit on the times you check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaKellyFella, since hop_info is limited to the sizeof(unsigned int), we can't keep tring because we can get undefined behaviour. In case of offsets' implementation that could be true, but it seems that not in this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since hop_info is limited to the sizeof(unsigned int), we can't keep tring because we can get undefined behaviour.
No I mean MAX_TRIES
shouldn't exist. You should keep rereading the bucket until you have completed a scan of the buckets hop_info
indicates as useful until the timestamps haven't changed. That way you can guarantee nothing has changed since you scanned the table.
Does that make sense? Am I missing something? Apologies if I am.
cds/container/hopscotch_hashmap.h
Outdated
|
||
if (temp & 1) { | ||
if (cmp(key, *(check_bucket->_key)) == 0) { | ||
return *(check_bucket->_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lines are incorrect, again a bad holdover from the initial version. It could be the case that in between reading a matching key and then reading the data at that address that someone could have deleted that bucket and inserted a new item there - therefore reading the incorrect data from the table. This is because reading both the key and the value is not an atomic operation and can change during each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaKellyFella, see it. If we stay with hop_info-based implementation, may be we can do optimistic synchronization in current bucket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! You can read at will and then check the timestamp at the very end. It relates to the comment above. Once the before and after timestamps match you know for sure everything matches up.
@khizmax, из-за проблемы с Traits видится, что имеет смысл как-то заново синтегрировать алгоритм в библиотеку. Собственно, в этом основная проблема и состоит у нас сейчас: тесты регулярно требуют каких-то новых typedef, а после добавления всё ломается по новой, и до непосредственно проверки дойти даже не получается. 😞 |
Думаю, это потому, что вместо того, чтобы изучить внутреннее строение библиотеки перед кодированием и:
было принято распространенное на сегодняшний день решение - взять нечто уже готовое и выдать за свое, авось прокатит. Что ж, ok.
Честно говоря, ваша бригада первая, кто задает подобные вопросы. Остальные как-то сами разбираются... Contribution guide никогда не будет, - главное понять смысл, почему так сделано, а не следовать гайдам. Но если вы по результатам своих работ напишете что-то вроде такого guide, - обещаю, я выложу его сюда. На конкретные вопросы я готов ответить. |
Check, please, sychronization in get() method, is it correct now? Also, as I understand, the same problem is before calling fuction f in find_with() method. |
By @LeoSko, @Dron642 and me
Leonid Skorospelov 2303
Chulanov Andrey 2381
Stetskevich Roman 2381