Skip to content

Latest commit

 

History

History
52 lines (30 loc) · 2.58 KB

File metadata and controls

52 lines (30 loc) · 2.58 KB

Entity-Context-Based Search Suggestion

TU Darmstadt Summer Term 2019 Data Management Project

Introdcution

Elasticsearch is now a very popular search engine based on Lucene. But we think, in future the simple keywords search is not enough. Therefore we consider to merge the machine learning technology, especially word embeddings into traditional search engine.

Implementation

  1. Elasticsearch development framework was forked from https://github.com/panholly/esfilesearch.

  2. Tensorflow model was trained and saved in Google Colaboratory, and then loaded by java code locally.

  3. Loaded training model used for entity suggestion, the entity autocompletion function is implemented based on trieTree structure

Environment

Elasticsearch 7.2.0

Mysql 8.0.13

Entity Embedding

rather than using single word embedding in traiditional NLP task, in this assignment we use the specific entity embedding for model training and predicting.

Result

Valid result after 50 epochs with windows_size = 10 :

  • Nearest to donald_trump: toby_keith, yemen, superfund, eric_schmidt, max_rose, adam_goldman, united_states_office_of_special_counsel, appalachian_trail,
  • Nearest to china: xi_jinping, central_military_commission, china_daily, lindsay_kemp, dandong, ashok_rajagopalan, rupert_brooke, forum_on_chinaafrica_cooperation,
  • Nearest to barack_obama: melania_trump, bessie_coleman, werner_heisenberg, victor_trumper, howard_county, eileen_atkins, bobby_fischer, fayez_alsarraj,
  • Nearest to angela_merkel: christian_democratic_union_of_germany, germany, schlumberger, danube, berlin, arab_world, wiesbaden, friedrich_merz,
  • Nearest to harry_potter: j_k_rowling, shannon_hale, sheryl_crow, hogwarts, citizens_united_v_fec, ellen_muth, sofia, h_a_hellyer,
  • Nearest to olympic_games: international_olympic_committee, toshir_mut, yuriko_koike, board_of_audit, bykada, american_banker, berkeley_heights, uur_erdener,
  • Nearest to wikipedia: gerontology_research_group, konstantin_novoselov, wikipedia_community, battle_of_gettysburg, katherine_harris, college_of_william__mary, mv_tsgt_john_a_chapman, church_square,
  • Nearest to alibaba_group: jack_ma, shanghai, claude_taylor, saeb_erekat, qingdao, national_retail_federation, domain_name_system, tokyo_stock_exchange,

Figure

50 example entity vectors

entity_vector_similarity

Website demo

website_demo