Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ht_get_hash not working correctly #45

Open
elgin9507 opened this issue Dec 3, 2022 · 6 comments
Open

ht_get_hash not working correctly #45

elgin9507 opened this issue Dec 3, 2022 · 6 comments

Comments

@elgin9507
Copy link

ht_get_hash returns the same slot subsequently up to very high attempts in some cases. Example:

(gdb) p ht_get_hash("mybrujmaonutk", 211, 0)
$32 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 1)
$33 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 2)
$34 = 38
// and so on up to ...
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177647)
$35 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177648)
$36 = -13

My definitions:

static int ht_hash(const char* s, const int a, const int m) {
    long hash = 0;
    const int len_s = strlen(s);
    for (int i = 0; i < len_s; i++) {
        hash += (long)pow(a, len_s - (i+1)) * s[i];
        hash = hash % m;
    }
    return (int)hash;
}

static int ht_get_hash(
    const char* s, const int num_buckets, const int attempt
) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    const int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);

    return (hash_a + (attempt * (hash_b + 1))) % num_buckets;
}
@nouritsu
Copy link

What have you set HT_PRIME_1 and HT_PRIME_2 to? I am facing the same issue

@nouritsu
Copy link

I have found the solution to this issue.
Make sure the HT_PRIME values are as follows ->

#define HT_PRIME_1 151;
#define HT_PRIME_2 163;

The ht_get_hash function no longer returns negative values and also works in far fewer attempts.

@elgin9507
Copy link
Author

elgin9507 commented Apr 21, 2024

@nouritsu my HT_PRIME_1 and HT_PRIME_2 were set to 129 and 131. After changing them to 151 and 163, the issue still persists. See my driver code below for reproducing:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include "hash_table.h"

char* rand_word(int min_len, int max_len) {
    int i, length;
    char c;

    // seed random
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* using nano-seconds instead of seconds */
    srand((time_t)ts.tv_nsec);

    length = rand() % (max_len - min_len + 1) + min_len;
    char* word = calloc((size_t)(length+1), sizeof(char));

    for (i = 0; i < length; i++) {
        c = 'a' + rand() % 26;
        word[i] = c;
    }
    word[i] = 0;

    return word;
}

int main() {
    ht_hash_table* ht = ht_new();

    for (int i = 0; i < 5000; i++) {
        char *k = rand_word(2, 30);
        char *v = rand_word(10, 100);
        ht_insert(ht, k, v);
    }

    ht_del_hash_table(ht);

    return 0;
}

@winterrdog
Copy link

Do you ever get out-of-bounds writes?

I see you get negative indices🤔

@elgin9507
Copy link
Author

@winterrdog yes, program ends with "Segmentation fault" in some cases (not always though). It means that negative index works in some cases (or at least it doesn't cause segmentation fault) while not in others. I have no idea how this happens.

@winterrdog
Copy link

winterrdog commented Apr 30, 2024

ht_get_hash returns the same slot subsequently up to very high attempts in some cases. Example:

(gdb) p ht_get_hash("mybrujmaonutk", 211, 0)
$32 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 1)
$33 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 2)
$34 = 38
// and so on up to ...
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177647)
$35 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177648)
$36 = -13

My definitions:

static int ht_hash(const char* s, const int a, const int m) {
    long hash = 0;
    const int len_s = strlen(s);
    for (int i = 0; i < len_s; i++) {
        hash += (long)pow(a, len_s - (i+1)) * s[i];
        hash = hash % m;
    }
    return (int)hash;
}

static int ht_get_hash(
    const char* s, const int num_buckets, const int attempt
) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    const int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);

    return (hash_a + (attempt * (hash_b + 1))) % num_buckets;
}

the reason is because of an edge case that was not handled with hash_b i.e. when hash_b is equivalent to number of buckets. so i dealt with it in my code like so. When it goes unhandled it can lead very weird hashes that lead to negative indices which should never happen at all.

You could deal with it in two ways e.g. either by always checking if the calculated index is within bounds of the table's items member before using the index to access any element in items member. sth like this:

static int is_valid_index(ht_hash_table* table, int index)
{
    return index >= 0 && index < table->size;
}

or

do a single check like the one i did in the ht_get_hash function in my code repo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants