-
What is the engine/algorithm behind the "Find similar documents" functionality?
-
Take a look at the updated manual: https://github.com/sepinf-inc/IPED/wiki/User-Manual#similar-document-search
Yes, it uses Lucene's MoreLikeThis internally, with some customizations. If you are interested, take a look at the implementation: https://github.com/sepinf-inc/IPED/blob/master/iped-engine/src/main/java/dpf/sp/gpinf/indexer/search/SimilarDocumentSearch.java
The code is a few years old and could certainly be improved, so help is welcome.
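
For readers who have not used MoreLikeThis, here is a rough, self-contained sketch of the general idea behind it: pick the terms of the source document with the highest tf*idf weight, then rank the other documents by how many of those terms they share. This is not IPED's code and not Lucene's implementation. The class name, method names, tokenizer, and idf formula are all made up for illustration, and the real MoreLikeThis works against an index and has more tuning options (minimum term frequency, minimum document frequency, maximum number of query terms, and so on).

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical illustration of the "interesting terms" idea; not IPED or Lucene code.
class MoreLikeThisSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Count, for each term, how many corpus documents contain it.
    static Map<String, Integer> docFreq(List<String> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            for (String term : new HashSet<>(tokenize(doc))) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Keep up to maxQueryTerms terms of the source text with the highest tf*idf.
    // Ties are broken alphabetically so the result is deterministic.
    static Map<String, Double> interestingTerms(String source, List<String> corpus, int maxQueryTerms) {
        Map<String, Integer> df = docFreq(corpus);
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(source)) {
            tf.merge(t, 1, Integer::sum);
        }
        int n = corpus.size();
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            double idf = Math.log((n + 1.0) / (d + 1.0)) + 1.0; // assumed smoothing
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .limit(maxQueryTerms)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    // Score every other document as an OR query over the interesting terms
    // and return the indices of matching documents, best match first.
    static List<Integer> findSimilar(int sourceIdx, List<String> corpus, int maxQueryTerms) {
        Map<String, Double> query = interestingTerms(corpus.get(sourceIdx), corpus, maxQueryTerms);
        Map<Integer, Double> scores = new HashMap<>();
        for (int i = 0; i < corpus.size(); i++) {
            if (i == sourceIdx) continue;
            Set<String> terms = new HashSet<>(tokenize(corpus.get(i)));
            double score = 0;
            for (Map.Entry<String, Double> q : query.entrySet()) {
                if (terms.contains(q.getKey())) score += q.getValue();
            }
            if (score > 0) scores.put(i, score);
        }
        List<Integer> result = new ArrayList<>(scores.keySet());
        result.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "invoice payment bank transfer account",
                "bank transfer payment receipt for invoice",
                "holiday photos from the beach",
                "meeting about the bank account");
        System.out.println(findSimilar(0, corpus, 5)); // prints [1, 3]
    }
}
```

In the example, document 1 shares four of the source terms and ranks first, document 3 shares two, and document 2 shares none and is left out.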