Anyone know why there is a segmentation fault/core dump after submitting a URL to crawl? #204
Comments
Hey I realize this is an old question, but the segfaults are due to a couple different issues. I fixed a few of them and am able to run a cluster of nodes doing lots of spidering (with and without proxies) and service random queries for 24 hours or so without a segfault so far. My code is here: https://code.moldybits.net/Forks/open-source-search-engine/commits/branch/devel The relevant commits are: My plan is to get the optimization turned on across the Makefile, get it to run reliably, then publish some new packages and tune up some of the documentation. All in all, I think this is a cool project and deserves a little love to keep it alive. |
Thanks for continuing the work. I cloned the dev URL and tried to compile on Debian 12, but it throws some errors
I changed it to
It is also throwing a lot of warnings. Here is an example of one of the most common types of warning:
|
Update: after changing that one line, it compiled just fine. However, just like all the other times, it crashes as soon as I press "submit" on the Spider search:
Better luck on this next time. |
If I download this and build it, am I still able to browse the internet with this search engine? |
Good luck on getting it running. It seems to be crashing for most people, and those who have managed to get it running just won't explain how they did it. Anyway, to answer your question, it's a search engine, not a browser. |
@tcreek , did you ever figure this out? |
No, I gave up. There is another search engine available which is called Qwazr. It was formerly called Open Search Server. |
hey @Overdrive5 & @tcreek - I just saw these messages now...if either of you are still interested in trying to use this codebase I might fork it on GitHub to maintain (so people can open issues there). Let me know if this is the case. I was able to build it on almalinux & ubuntu (and fedora in the past), but I haven't tried debian recently. I will try it and let you know. Just out of curiosity - what were you going to use this for? I personally played with it for a bit & thought the search was better than I have gotten from the other open source engines I've found. Let me know what OS's you are planning to use. I believe I had the RPM building a few months ago on alma & fedora but my memory of that is hazy now. |
@twistdroach You will be wasting your life, trust me. |
Negativity aside - I pushed a Docker container for experimenting with this here: https://hub.docker.com/r/moldybits/open-source-search-engine I also did fork this repo on GitHub & began mirroring my personal repo: https://github.com/twistdroach/open-source-search-engine I'd be interested in hearing about anyone playing with this and their experiences. The code is old & dusty, but I really do like how well the search & "gigabits" feature seems to work (at least with the small amount of data I have fed it). Anyway, I wouldn't use it for anything important, but it's a fun toy at this point. Next thing on my list is to revive a patch I had at one point that fixed the segfaults from setting optimization (-O3), but that is not applied in these changes as I haven't played with it in about 6 months and I don't remember where I left off. |
Thank you
|
We spent years working with the code at Findx, and it is among the biggest regrets of my life. It never became production ready, despite our major rewrites and stability fixes. But as you say, as a toy, interesting enough.. |
Open Search Server has been an abandoned project for some years now https://sourceforge.net/p/opensearchserve/discussion/947147/thread/a2ef9cfb/?limit=25#3ba5 It seems OSS 2.0 was supposed to be out, but instead they started calling it Qwazr for some reason. Now there has been no activity with Qwazr in a couple of years. I plan on using Debian 12. For this GigaBlast I went back to Debian 9 to try to get it to work, and I still get the same result: a segmentation fault. As for why I want to use it? https://en.wikipedia.org/wiki/Search_engine_manipulation_effect and censorship, of course. |
@tcreek, I am using a lightweight derivative of Ubuntu Fossa called FossaPup64 with a dev environment added in. I got it to compile and work without segfaults by editing this project's Makefile, changing -O2 to -O0 on lines 86, 97 and 101. I have only tested it with 1 core. It seems my ISP or DNS clamped down on my account after I scanned 500k+ websites, though. I want censorship-free searching as well. |
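For anyone who wants to script the Makefile change described above, here is a minimal sketch. It assumes the flag appears literally as `-O2` on the lines in question; the function name `lower_optimization` and the sample Makefile text are hypothetical, and the line numbers 86, 97 and 101 come from the comment above and may differ between checkouts.

```python
import re


def lower_optimization(makefile_text, line_numbers, old="-O2", new="-O0"):
    """Replace the `old` flag with `new` on the given 1-indexed lines only."""
    # Match the flag as a whole token so that a longer word containing it is left alone.
    pattern = re.compile(r"(?<!\S)" + re.escape(old) + r"(?!\S)")
    lines = makefile_text.splitlines(keepends=True)
    for number in line_numbers:
        if 1 <= number <= len(lines):
            lines[number - 1] = pattern.sub(new, lines[number - 1])
    return "".join(lines)


# Hypothetical three-line Makefile fragment; only line 2 is targeted.
sample = "CC=g++\nCPPFLAGS = -g -O2 -Wall\nLIBS = -lz\n"
patched = lower_optimization(sample, [2])
```

To apply it to a real checkout, read the Makefile into a string, call `lower_optimization(text, [86, 97, 101])`, and write the result back after confirming those lines really are the ones carrying `-O2`.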
@brianrasmusson care to share where you left off so we don't have to reinvent the wheel? |
@brianrasmusson - ah - findx was previously privacore right? So then you were the source of the privacore repo? Sorry you had such a bad experience with it, but cool to see you still lurking :) @Overdrive5 - I think their repo is here: https://github.com/privacore/open-source-search-engine Somewhere I remember reading a comment from someone at privacore - maybe it was you - about the gigablast crawler having a bug that would get you banned when crawling sites - I'm assuming for not honoring robots.txt or going crazy and request bombing due to some bug. Do you know what the bug was? I saw you guys rewrote the crawler portion, so I was never clear on what went wrong there, but it always makes me a little nervous when I play with it... |
Several bugs resulting in both scenarios you describe. Sometimes not respecting robots.txt, other times bombarding a site with requests. Got our crawl servers blocked by firewalls multiple times, and it was a pain. Yes, the privacore fork is ours, but I won't comment on anything in it. That is all behind me. Just got an email notification from github about a case update here, which is what triggered me. So my advice is still - run away. |
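To illustrate the two behaviours a crawler is expected to get right here (honoring robots.txt and spacing out requests to one site), below is a minimal sketch using Python's standard `urllib.robotparser`. It is not the Gigablast or privacore crawler logic; the class name `PoliteSiteGate`, the user agent `examplebot`, the sample robots.txt and the one-second default delay are all assumptions made for illustration.

```python
import time
from urllib import robotparser

# Hypothetical robots.txt content for a single site.
ROBOTS_TXT = [
    "User-agent: *",
    "Crawl-delay: 2",
    "Disallow: /private/",
]


class PoliteSiteGate:
    """Decides whether, and how soon, a URL on one site may be fetched."""

    def __init__(self, robots_lines, user_agent, default_delay=1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.default_delay = default_delay
        self.clock = clock
        self.parser = robotparser.RobotFileParser()
        self.parser.parse(robots_lines)
        self.next_allowed = None  # earliest time the next request may start

    def seconds_to_wait(self, url):
        """Return None if robots.txt forbids the URL, else the wait in seconds."""
        if not self.parser.can_fetch(self.user_agent, url):
            return None
        now = self.clock()
        start = now if self.next_allowed is None else max(now, self.next_allowed)
        # Prefer the site's Crawl-delay, fall back to our own default.
        delay = self.parser.crawl_delay(self.user_agent) or self.default_delay
        # Reserve the slot after this request so repeated calls are spaced out.
        self.next_allowed = start + delay
        return start - now
```

A caller would sleep for the returned number of seconds before fetching and skip the URL entirely when the result is `None`. Keeping one gate per site means a burst of queued URLs for the same host is throttled rather than sent all at once.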
@brianrasmusson understood - thanks for the response. Let me know if you have another open source web search/crawler that needs contributing to. |
It worked! Thanks so much! I guess the next update to it should be images. @Overdrive5 Any way to get it to use more than one core? |
@tcreek, I did get a second core running after quite a bit of frustration, amongst other things I can no longer remember. Overall, I'm not recommending it. Not worth the frustration. Do it at your own risk. I experimented with a fresh clean Linux install instead of my production install. Also, the multi-core process used here was developed 10-25 years ago, before other, better technologies were developed. It will populate a separate database for each core in use, so I am guessing it will fill up a drive much faster, with some redundant replication. I would love to figure out how to do multi-core/single-database spidering and single-core searching. I am playing with this for my own personal censorship-free search engine for select subject matters I am interested in. I need to improve filtering. My rough calc is that "only" USA websites right now (~3.3 billion) would need 100TB+ before mirroring. Maybe more. I have ZERO interest in sucking up the whole internet for general searching. I have built @twistdroach's fork and it compiles and works with "-O3" for me. So I will probably head in that direction. Good luck in your endeavors! |
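As a quick sanity check on the rough figures in the comment above, the arithmetic below works out the average storage per site they imply. The two input numbers come from the comment; treating "TB" as decimal terabytes and "mirroring" as one extra full copy are assumptions.

```python
SITES = 3.3e9           # "~3.3 billion" websites, from the comment above
STORAGE_BYTES = 100e12  # "100TB+", taken as decimal terabytes (assumption)

# Average index size per site implied by the two figures.
bytes_per_site = STORAGE_BYTES / SITES
print(f"{bytes_per_site / 1e3:.1f} KB per site")  # prints "30.3 KB per site"

# Assumption: mirroring keeps one additional full copy of the data.
with_one_mirror_tb = STORAGE_BYTES * 2 / 1e12
```

Roughly 30 KB per site is a small budget, which fits the "Maybe more" caveat in the comment.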
I had a system setup about 6 months ago that had 8 or 16 cluster members...it is fiddly. I'll reproduce and try to doc on my fork in a day or two. I find it a shame that this codebase is left mostly unusable due to lack of clear docs and a few good builds. Going to update my fork shortly to default to -O3, I have done minimal testing with it this way and it seemed mostly stable. A note about the segfaults - the technique of aborting when reaching some unrecoverable scenario in a server app is common, but it is taken to the extreme here - missing a config file or many other circumstances will result in a segfault. On top of that, there certainly are many real issues left that will cause legit segfaults (I fixed the ones that I ran into using the system lightly). Anyway, just wanted to say, don't be dismayed if you get the occasional crash. The app is built to do that and restart to recover from "unrecoverable" situations. If you get one that is reproducible, feel free to patch and submit a fix if you are able or file an issue on my fork. @Overdrive5 your use case is exactly what brought me to this codebase! |
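The crash-and-restart pattern mentioned above can be sketched with a small supervisor loop. This is a generic illustration, not how Gigablast itself restarts; the function name `run_with_restart`, the restart limit and the delay are hypothetical, and the demo command is a stand-in child process that simply exits with an error code.

```python
import subprocess
import sys
import time


def run_with_restart(cmd, max_restarts=3, delay_seconds=1.0):
    """Run cmd, restarting it after each abnormal exit; return all exit codes."""
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown, nothing to recover from
        # On POSIX a negative code means the child died from a signal.
        if attempt < max_restarts:
            time.sleep(delay_seconds)
    return exit_codes


# Stand-in for a crashing server: a child process that always exits with code 3.
demo_codes = run_with_restart(
    [sys.executable, "-c", "raise SystemExit(3)"],
    max_restarts=2,
    delay_seconds=0,
)
```

Capping the number of restarts keeps a reproducible crash from looping forever, which is also the point at which filing an issue or patching makes sense.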
Have you or @twistdroach tried Qwazr aka Open Search Server? https://sourceforge.net/p/opensearchserve/discussion/947147/thread/a2ef9cfb/?limit=25#3ba5 |
Qwazr is Java-based. I have near-zero Java experience, and I am slightly knowledgeable in C. So currently, I favor this codebase, unless I find another C-based crawler/search engine. |
Java's syntax is modeled on C and C++, so it should not be that hard to adapt to |
I am getting that, and seems others are also:
#199 (comment)