What a magnificent project! #264

Open
dchristal opened this issue Sep 18, 2022 · 1 comment

Comments

@dchristal

Hi Josh,
I invented a search system for a product that competed with Napster way back in the day.
I've been retired for several years and thought it might be fun to revisit and explore that technology as it has progressed, which led me to your software. I've tried to write it myself, but C# hardly resembles what it was in my day and neither does my brain. Instead of searching Pi, I'm trying to index and search one really big string. It crashes when I try to create a suffix array, apparently due to maxValue blowing up. Long strings aside, as far as I can tell it crashes on any string, "abracadabra" for example.

When I try to create a suffix array from the loaded string, it crashes on this line:
IBigArray bigArray = (IBigArray)Activator.CreateInstance(suffixArrayType, arrBigArrayArgs);

I don't think I necessarily need a big array for my project but see no harm in it. I've tried adding a maxValue so that a proper constructor exists, but the deeper I go, the more lost I become.
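
One way to narrow down a crash on that line is to compare the constructors the type actually exposes with the arguments being passed. In .NET, Activator.CreateInstance(Type, object[]) throws MissingMethodException when no public constructor matches the supplied arguments. The issue does not say which exception is thrown, so that is an assumption here. DemoArray and ConstructorProbe below are hypothetical names used only for illustration and are not part of PiSearch.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a collection whose only constructor needs two arguments.
public sealed class DemoArray
{
    public long Length { get; }
    public long MaxValue { get; }

    public DemoArray(long length, long maxValue)
    {
        Length = length;
        MaxValue = maxValue;
    }
}

public static class ConstructorProbe
{
    // Lists the public constructor signatures of a type, e.g. "(Int64 length, Int64 maxValue)".
    public static string[] DescribeConstructors(Type type)
    {
        return type.GetConstructors()
            .Select(c => "(" + string.Join(", ",
                c.GetParameters().Select(p => p.ParameterType.Name + " " + p.Name)) + ")")
            .ToArray();
    }

    // Returns false when no public constructor matches the given arguments.
    public static bool HasMatchingConstructor(Type type, object[] args)
    {
        try
        {
            Activator.CreateInstance(type, args);
            return true;
        }
        catch (MissingMethodException)
        {
            return false;
        }
    }
}
```

Calling ConstructorProbe.DescribeConstructors(suffixArrayType) and comparing the result against the contents of arrBigArrayArgs should show whether an argument such as maxValue is missing or has the wrong type.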

I'll pay you a reasonable amount if you can easily fix it reasonably soon. I'm hoping that it's just a simple oversight because it's not related to searching PI.

Kudos and kindest regards,

Dave

@JoshKeegan
Owner

Hi Dave,
I haven’t touched the bulk of the code for this project in a long time, having only done basic patching to keep the PiSearch API & website online, so I don’t remember many details of the code by now.
From what I remember though, the optimisations being made for the collections make the assumption that it’s a string of digits, not arbitrary characters. This allows multiple digits to be stored per byte of memory, whilst still keeping a fixed size per character. I was only focusing on Pi when writing this so it was a good trade-off for me.
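
To illustrate the kind of trade-off described above, here is a minimal sketch that stores two decimal digits per byte at a fixed 4 bits each. It is not PiSearch's actual collection code: the PackedDigits class, its layout, and its member names are assumptions made for this example. It also shows why a string such as "abracadabra" would be rejected by a digits-only structure.

```csharp
using System;

// Hypothetical illustration of fixed-width digit packing, not PiSearch's real type.
public sealed class PackedDigits
{
    private readonly byte[] data; // two digits per byte, 4 bits each

    public int Length { get; }

    // Number of bytes actually used to hold the digits.
    public int ByteCount => data.Length;

    public PackedDigits(string digits)
    {
        if (digits == null) throw new ArgumentNullException(nameof(digits));

        Length = digits.Length;
        data = new byte[(Length + 1) / 2];

        for (int i = 0; i < Length; i++)
        {
            char c = digits[i];
            if (c < '0' || c > '9')
            {
                throw new ArgumentException(
                    "Only digits 0-9 are supported, found '" + c + "' at index " + i + ".");
            }

            int value = c - '0';
            // Even indices go in the high nibble, odd indices in the low nibble.
            if (i % 2 == 0) data[i / 2] |= (byte)(value << 4);
            else data[i / 2] |= (byte)value;
        }
    }

    // Reads back the digit at the given position.
    public int this[int index]
    {
        get
        {
            if (index < 0 || index >= Length) throw new ArgumentOutOfRangeException(nameof(index));
            byte b = data[index / 2];
            return index % 2 == 0 ? b >> 4 : b & 0x0F;
        }
    }
}
```

With this layout, new PackedDigits("314159") occupies 3 bytes, while new PackedDigits("abracadabra") throws an ArgumentException, because a letter does not fit the digits-only assumption.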

Perhaps start by getting something working just using digits in the string. If you then wanted to expand to handling other characters it could certainly be done but would need the code changing.

If you’re hitting other issues generating the suffix array, it might help to look at this: https://github.com/JoshKeegan/PiSearch/blob/master/src/StringSearchConsole/Program.cs
I never documented the process and the code isn’t great, just a proof of concept really. That is my code for generating the suffix array though.
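
As a baseline for arbitrary characters, independent of the linked Program.cs, a naive suffix array can be sketched with a plain int[] and ordinal character comparisons. This is only an illustration: NaiveSuffixArray and its methods are hypothetical names, the sort is far too slow for a really big string, and an int[] caps the text length well below what the big array types are meant for.

```csharp
using System;

// Hypothetical naive suffix array for arbitrary characters (not PiSearch code).
public static class NaiveSuffixArray
{
    // Sorts all suffix start positions by comparing the suffixes character by character.
    public static int[] Build(string text)
    {
        int[] sa = new int[text.Length];
        for (int i = 0; i < sa.Length; i++) sa[i] = i;
        Array.Sort(sa, (a, b) => CompareSuffixes(text, a, b));
        return sa;
    }

    // Returns every start position of pattern in text, in ascending order.
    public static int[] FindAll(string text, int[] sa, string pattern)
    {
        int first = FirstIndex(text, sa, pattern, c => c >= 0);
        int end = FirstIndex(text, sa, pattern, c => c > 0);
        int[] hits = new int[end - first];
        Array.Copy(sa, first, hits, 0, hits.Length);
        Array.Sort(hits);
        return hits;
    }

    private static int CompareSuffixes(string text, int a, int b)
    {
        while (a < text.Length && b < text.Length)
        {
            if (text[a] != text[b]) return text[a].CompareTo(text[b]);
            a++;
            b++;
        }
        // The suffix that ran out first is the shorter one and sorts first.
        return b - a;
    }

    // Compares the suffix at 'start' with the pattern, looking at no more than pattern.Length characters.
    private static int ComparePrefix(string text, int start, string pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
        {
            if (start + i >= text.Length) return -1; // suffix ended first, so it sorts before the pattern
            int diff = text[start + i].CompareTo(pattern[i]);
            if (diff != 0) return diff;
        }
        return 0; // the suffix starts with the pattern
    }

    // Binary search for the first suffix whose comparison result satisfies 'reached'.
    private static int FirstIndex(string text, int[] sa, string pattern, Func<int, bool> reached)
    {
        int lo = 0, hi = sa.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (reached(ComparePrefix(text, sa[mid], pattern))) hi = mid;
            else lo = mid + 1;
        }
        return lo;
    }
}
```

For "abracadabra", Build returns the positions 10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2, and FindAll for the pattern "abra" returns 0 and 7.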

Let me know how you get on 👍
Josh
