Home
Steve Cook edited this page Aug 5, 2013 · 3 revisions
Welcome to the open-source-search-engine wiki!
Quick install instructions are available.
- Live demo at http://www.gigablast.com/
- Written in C/C++ for optimal performance.
- Over 500,000 lines of C/C++.
- 100% custom. A single binary. The web server, the database, and everything else are contained in this one source tree and implemented for efficiency.
- Scalable to thousands of servers. Has scaled to over 12 billion web pages on over 200 servers.
- Reliable. Has been tested in live production since 2002 on billions of queries on indexes of over 12 billion web pages.
- Track record. Has been used by many clients. Has been successfully used in distributed enterprise software.
- Cached web pages with query term highlighting.
- Supports any document conversion plugin that converts PDF and other formats to HTML.
- Shows popular topics of search results (Gigabits)
- Email alert monitoring.
- "Synonyms" based on Wiktionary data, applied through query expansion.
- Customizable "synonym" file: my-synonyms.txt
- Stores position and format information of each word in an indexed document.
- Complete scoring details are displayed in the search results.
- Indexes anchor text of inlinks to a web page.
- Can cluster results from the same site.
- Duplicate removal from search results.
- Distributed web crawler.
- Crawler/Spider is highly programmable, and URLs are binned into priority queues. Each priority queue has its own throttle and maximum-outstanding-connections parameters.
- Complete REST/XML API
- Can inject documents into the index in real time using XML or HTML.
- Automated detection and repair of data corruption caused by hardware failures.
- Boolean query support
- Spellchecker
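
The feature list mentions that spider URLs are binned into priority queues, each with a throttle and a cap on outstanding connections. The following is a minimal sketch of that idea, not the project's actual code: `QueueParams`, `SpiderQueue`, `Scheduler`, and all member names are hypothetical, and the throttle is assumed to be a minimum delay in milliseconds between fetch starts from the same queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-queue settings (names are illustrative only).
struct QueueParams {
    int minDelayMs;      // assumed throttle: minimum gap between fetch starts
    int maxOutstanding;  // cap on simultaneous in-flight fetches
};

struct SpiderQueue {
    QueueParams params{};
    std::deque<std::string> urls;  // pending URLs, oldest first
    int outstanding = 0;           // fetches started but not yet finished
    long long lastStartMs = -1;    // -1 means no fetch has started yet
};

class Scheduler {
public:
    // Index 0 is treated as the highest priority.
    explicit Scheduler(const std::vector<QueueParams>& params) {
        for (const auto& p : params) {
            assert(p.minDelayMs >= 0 && p.maxOutstanding > 0);
            SpiderQueue q;
            q.params = p;
            queues_.push_back(q);
        }
    }

    void add(std::size_t priority, std::string url) {
        assert(priority < queues_.size());
        queues_[priority].urls.push_back(std::move(url));
    }

    // Picks the next URL that may start at time nowMs. Scans queues from
    // highest to lowest priority and skips any queue that is empty, at its
    // connection cap, or still inside its throttle window.
    bool next(long long nowMs, std::string& out) {
        for (auto& q : queues_) {
            if (q.urls.empty()) continue;
            if (q.outstanding >= q.params.maxOutstanding) continue;
            if (q.lastStartMs >= 0 &&
                nowMs - q.lastStartMs < q.params.minDelayMs) continue;
            out = std::move(q.urls.front());
            q.urls.pop_front();
            ++q.outstanding;
            q.lastStartMs = nowMs;
            return true;
        }
        return false;
    }

    // Call when a fetch from the given queue completes.
    void done(std::size_t priority) {
        assert(priority < queues_.size());
        assert(queues_[priority].outstanding > 0);
        --queues_[priority].outstanding;
    }

private:
    std::vector<SpiderQueue> queues_;
};
```

In this sketch a caller would invoke `next` with the current time whenever a connection slot frees up, and `done` when a fetch finishes. A lower-priority queue is served only when every higher-priority queue is empty, throttled, or at its cap.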
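
The list also describes synonyms applied through query expansion. A rough illustration of the general technique is sketched below, under stated assumptions: the synonym table is an in-memory map (the format of my-synonyms.txt is not described here, so no parsing is attempted), and `WeightedTerm`, `SynonymTable`, `expandQuery`, and the `synonymWeight` parameter are hypothetical names rather than anything taken from the Gigablast source.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A query term plus a relative weight (1.0 for the term the user typed).
struct WeightedTerm {
    std::string text;
    double weight;
};

// Hypothetical in-memory synonym table: term -> list of synonyms.
using SynonymTable = std::map<std::string, std::vector<std::string>>;

// Expands each query term into a group holding the original term followed
// by its synonyms. Synonyms receive synonymWeight so that, in a scorer that
// uses these weights, an exact match can outrank a synonym match.
std::vector<std::vector<WeightedTerm>> expandQuery(
        const std::vector<std::string>& terms,
        const SynonymTable& table,
        double synonymWeight) {
    assert(synonymWeight > 0.0 && synonymWeight <= 1.0);
    std::vector<std::vector<WeightedTerm>> groups;
    for (const auto& term : terms) {
        std::vector<WeightedTerm> group;
        group.push_back({term, 1.0});
        auto it = table.find(term);
        if (it != table.end()) {
            for (const auto& syn : it->second) {
                group.push_back({syn, synonymWeight});
            }
        }
        groups.push_back(group);
    }
    return groups;
}
```

Each group can then be treated as an OR of its members while the groups themselves are combined as the original query would be. The discount applied to synonyms is a design choice of this sketch, not a documented Gigablast behavior.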