Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect handling of hash collisions in ctrie #194

Closed
stephane-martin opened this issue Jun 1, 2018 · 1 comment
Closed

Incorrect handling of hash collisions in ctrie #194

stephane-martin opened this issue Jun 1, 2018 · 1 comment

Comments

@stephane-martin
Copy link
Contributor

Hello,

suppose you have two strings X and Y that have the same hash. We know that such strings exist with every hash function.

In the ctrie, when we insert X and Y, they will be stored in a common lNode. lNode as currently implemented is a simple persistent Linked List. That implementation is not correct for the ctrie, as duplicates will not be detected in the lNode.

package main

import (
	"fmt"
	"hash"

	"github.com/Workiva/go-datastructures/trie/ctrie"
)

type fakehash struct{}

func (h *fakehash) Sum32() uint32 {
	return 42
}

func (h *fakehash) Sum(b []byte) []byte {
	return nil
}

func (h *fakehash) Size() int {
	return 0
}

func (h *fakehash) BlockSize() int {
	return 0
}

func (h *fakehash) Reset() {

}

func (h *fakehash) Write(b []byte) (int, error) {
	return 0, nil
}

func factory() hash.Hash32 {
	return &fakehash{}
}

func main() {
	trie := ctrie.New(factory)
	trie.Insert([]byte("foobar"), 1)
	fmt.Println(trie.Lookup([]byte("foobar")))
	trie.Insert([]byte("zogzog"), 2)
	fmt.Println(trie.Lookup([]byte("zogzog")))
	trie.Insert([]byte("foobar"), 3)
	fmt.Println(trie.Lookup([]byte("foobar")))
	trie.Remove([]byte("foobar"))
	// foobar is supposed to be removed, but the next lookup returns 1
	fmt.Println(trie.Lookup([]byte("foobar")))
}
@stephane-martin
Copy link
Contributor Author

I don't really understand why the PR takes so long to be integrated. Is there a problem ?

Playing with hashes, it looks like that the words 'costarring' and 'liquid' have the same fnv32a hash.

So the previous POC can be simplified, without any "fakehash":

func main() {
	trie := ctrie.New(factory)
	trie.Insert([]byte("costarring"), 1)
	fmt.Println(trie.Lookup([]byte("costarring")))
	trie.Insert([]byte("liquid"), 2)
	fmt.Println(trie.Lookup([]byte("liquid")))
	trie.Insert([]byte("costarring"), 3)
	fmt.Println(trie.Lookup([]byte("costarring")))
	trie.Remove([]byte("costarring"))
	// foobar is supposed to be removed, but the next lookup returns 1
	fmt.Println(trie.Lookup([]byte("costarring")))
}

Thank you in advance for your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant