`ht_get_hash` not working correctly #45

elgin9507 · 2022-12-03T11:31:37Z

ht_get_hash returns the same slot subsequently up to very high attempts in some cases. Example:

(gdb) p ht_get_hash("mybrujmaonutk", 211, 0)
$32 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 1)
$33 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 2)
$34 = 38
// and so on up to ...
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177647)
$35 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177648)
$36 = -13

My definitions:

static int ht_hash(const char* s, const int a, const int m) {
    long hash = 0;
    const int len_s = strlen(s);
    for (int i = 0; i < len_s; i++) {
        hash += (long)pow(a, len_s - (i+1)) * s[i];
        hash = hash % m;
    }
    return (int)hash;
}

static int ht_get_hash(
    const char* s, const int num_buckets, const int attempt
) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    const int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);

    return (hash_a + (attempt * (hash_b + 1))) % num_buckets;
}

The text was updated successfully, but these errors were encountered:

nouritsu · 2023-04-19T12:04:15Z

What have you set HT_PRIME_1 and HT_PRIME_2 to? I am facing the same issue

nouritsu · 2023-04-20T04:45:00Z

I have found the solution to this issue.
Make sure the HT_PRIME values are as follows ->

#define HT_PRIME_1 151;
#define HT_PRIME_2 163;

The ht_get_hash function no longer returns negative values and also works in far fewer attempts.

elgin9507 · 2024-04-21T09:54:37Z

@nouritsu my HT_PRIME_1 and HT_PRIME_2 were set to 129 and 131. After changing them to 151 and 163, the issue still persists. See my driver code below for reproducing:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include "hash_table.h"

char* rand_word(int min_len, int max_len) {
    int i, length;
    char c;

    // seed random
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* using nano-seconds instead of seconds */
    srand((time_t)ts.tv_nsec);

    length = rand() % (max_len - min_len + 1) + min_len;
    char* word = calloc((size_t)(length+1), sizeof(char));

    for (i = 0; i < length; i++) {
        c = 'a' + rand() % 26;
        word[i] = c;
    }
    word[i] = 0;

    return word;
}

int main() {
    ht_hash_table* ht = ht_new();

    for (int i = 0; i < 5000; i++) {
        char *k = rand_word(2, 30);
        char *v = rand_word(10, 100);
        ht_insert(ht, k, v);
    }

    ht_del_hash_table(ht);

    return 0;
}

winterrdog · 2024-04-27T06:27:15Z

Do you ever get out-of-bounds writes?

I see you get negative indices🤔

elgin9507 · 2024-04-30T14:14:19Z

@winterrdog yes, program ends with "Segmentation fault" in some cases (not always though). It means that negative index works in some cases (or at least it doesn't cause segmentation fault) while not in others. I have no idea how this happens.

winterrdog · 2024-04-30T16:51:46Z

ht_get_hash returns the same slot subsequently up to very high attempts in some cases. Example:

(gdb) p ht_get_hash("mybrujmaonutk", 211, 0)
$32 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 1)
$33 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 2)
$34 = 38
// and so on up to ...
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177647)
$35 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177648)
$36 = -13

My definitions:

static int ht_hash(const char* s, const int a, const int m) {
    long hash = 0;
    const int len_s = strlen(s);
    for (int i = 0; i < len_s; i++) {
        hash += (long)pow(a, len_s - (i+1)) * s[i];
        hash = hash % m;
    }
    return (int)hash;
}

static int ht_get_hash(
    const char* s, const int num_buckets, const int attempt
) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    const int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);

    return (hash_a + (attempt * (hash_b + 1))) % num_buckets;
}

the reason is because of an edge case that was not handled with hash_b i.e. when hash_b is equivalent to number of buckets. so i dealt with it in my code like so. When it goes unhandled it can lead very weird hashes that lead to negative indices which should never happen at all.

You could deal with it in two ways e.g. either by always checking if the calculated index is within bounds of the table's items member before using the index to access any element in items member. sth like this:

static int is_valid_index(ht_hash_table* table, int index)
{
    return index >= 0 && index < table->size;
}

or

do a single check like the one i did in the ht_get_hash function in my code repo

iColgateZz mentioned this issue Jul 16, 2024

Recap of all issues + Other problems #52

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ht_get_hash` not working correctly #45

`ht_get_hash` not working correctly #45

elgin9507 commented Dec 3, 2022

nouritsu commented Apr 19, 2023

nouritsu commented Apr 20, 2023

elgin9507 commented Apr 21, 2024 •

edited

Loading

winterrdog commented Apr 27, 2024

elgin9507 commented Apr 30, 2024

winterrdog commented Apr 30, 2024 •

edited

Loading

ht_get_hash not working correctly #45

ht_get_hash not working correctly #45

Comments

elgin9507 commented Dec 3, 2022

nouritsu commented Apr 19, 2023

nouritsu commented Apr 20, 2023

elgin9507 commented Apr 21, 2024 • edited Loading

winterrdog commented Apr 27, 2024

elgin9507 commented Apr 30, 2024

winterrdog commented Apr 30, 2024 • edited Loading

`ht_get_hash` not working correctly #45

`ht_get_hash` not working correctly #45

elgin9507 commented Apr 21, 2024 •

edited

Loading

winterrdog commented Apr 30, 2024 •

edited

Loading