Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recap of all issues + Other problems #52

Open
iColgateZz opened this issue Jul 16, 2024 · 0 comments
Open

Recap of all issues + Other problems #52

iColgateZz opened this issue Jul 16, 2024 · 0 comments

Comments

@iColgateZz
Copy link

iColgateZz commented Jul 16, 2024

What are the values for HT_PRIME_1 and HT_PRIME_2?

I used the following:

#define HT_INITIAL_BASE_SIZE 53
#define HT_PRIME_1 163
#define HT_PRIME_2 157

#24 Hash function computing not the way described in the pseudocode

Just take the modulus operator out of the loop and get the following:

static int ht_hash(const char* s, const int a, const int m) {
    long hash = 0;
    const int len_s = strlen(s);
    for (int i = 0; i < len_s; i++)
        hash += (long)pow(a, len_s - (i+1)) * s[i];
    hash = hash % m;
    return (int)hash;
}

#51 #45 #12 Infinite loop caused by the double hashing

the solution is explained in issue #35. The code should be as follows:

static int ht_get_hash(const char* s, const int num_buckets, const int attempt) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);
    if (hash_b % num_buckets == 0)
        hash_b = 1;
    return (hash_a + (attempt * hash_b)) % num_buckets;
}

It handles 2 cases:

  1. When hash_b % num_buckets == 0. That was the problematic place causing the infinite loop.
  2. When hash_b == 0. This was the problem described by the author. It is also handled by the condition
    if (hash_b % num_buckets == 0) hash_b = 1; .

Modifying the delete function

Here is the one provided by the author:

void ht_delete(ht_hash_table* ht, const char* key) {
    int index = ht_get_hash(key, ht->size, 0);
    ht_item* item = ht->items[index];
    int i = 1;
    while (item != NULL) {
        if (item != &HT_DELETED_ITEM) {
            if (strcmp(item->key, key) == 0) {
                ht_del_item(item);
                ht->items[index] = &HT_DELETED_ITEM;
            }
        }
        index = ht_get_hash(key, ht->size, i);
        item = ht->items[index];
        i++;
    } 
    ht->count--;
}

As a hashtable does not support same keys, why do we then keep on iterating if we found the item with the key that matches the provided one? I believe it should be:

void ht_delete(ht_hash_table* ht, const char* key) {
    int index = ht_get_hash(key, ht->size, 0);
    ht_item* item = ht->items[index];
    int i = 1;
    while (item != NULL) {
        if (item != &HT_DELETED_ITEM) {
            if (strcmp(item->key, key) == 0) {
                ht_del_item(item);
                ht->items[index] = &HT_DELETED_ITEM;
                ht->count--;
                return;
            }
        }
        index = ht_get_hash(key, ht->size, i);
        item = ht->items[index];
        i++;
    } 
}

The problem with resizing down

First of all, I think the per cent when the hash table should be resized should be greater. 10% seems to be too small. I might be wrong here, but imagine your hash table has space for 100 key-value pairs and you wait for it to only have 9 items to resize it. Isn't that a huge loss of memory space? I changed it to 30%. So the delete function now looks like this:

void ht_delete(ht_hash_table* ht, const char* key) {
    const int load = ht->count * 100 / ht->size;
    if (load < 30)
        ht_resize_down(ht);
    int index = ht_get_hash(key, ht->size, 0);
    ht_item* item = ht->items[index];
    int i = 1;
    while (item != NULL) {
        if (item != &HT_DELETED_ITEM) {
            if (strcmp(item->key, key) == 0) {
                ht_del_item(item);
                ht->items[index] = &HT_DELETED_ITEM;
                ht->count--;
                return;
            }
        }
        index = ht_get_hash(key, ht->size, i);
        item = ht->items[index];
        i++;
    } 
}

Secondly, increasing the per cent to 30 helps with cleaning the dictionary. The author explained very well why we do not delete an item from the dictionary but free it and then change it to a pointer to the sentinel value. But the problem here is that the dictionary gets dirty. The more pointers to sentinel values there are, the less space there is for the actual key-value pairs and the more unnecessary computations need to be done in order to find the right index in the list to insert, search or delete.

The problem with resizing down. Part 2

In the resize function, we check whether the provided base_size is less than HT_INITIAL_BASE_SIZE. If the condition is true, we just return. But coming back to the part 1, I would allow it to resize to get rid of all space consuming and useless sentinel values. I propose the following:

if (base_size < HT_INITIAL_BASE_SIZE)
        base_size = HT_INITIAL_BASE_SIZE;

Do not finish function execution there , but set the base_size value to the allowed minimum. In order to do that you need to change base_size to int. I guess, you might argue that resizing the dictionary more often is not that performant, but while playing with this hash table I once encountered a situation when I could not insert a key-value pair for quite a while because of the amount of sentinel values and collisions.

#42 Dead items

In the issue #42 a situation is described where a dead item might occur. I believe, there is no problem with that. The author handles this with the sentinel value. After inserting 2 items that collide and deleting the first one, in order to not break the collision chain the author uses the sentinel value. In all methods that we implement there are checks for the sentinel value. If the item we are currently inspecting is a pointer to the sentinel value, take the next one. So, updating the second value should not be an issue. After first hashing we get the index that has the pointer to the sentinel value, we iterate once more, find a new index with a pointer to a struct that has the key we are looking for, we free this struct and replace it with a new one.

It was a great repository to work through, I liked it a lot. If you have any comments regarding my comments, please post them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant