Building a search engine for emojis
The inverse word frequency (
where
The median frequency (
The emoji-word frequency (
where
Finally, the score is computed as:
Given an input query
See the repository https://github.com/MagnusCardell/emoji-whisperer for an implementation of the index in Node.js.
Example input sentences and the five top-scoring emoji groups returned for each:
Who else is excited for the new Avengers movie? #MarvelFan,"๐๐, โจ, ๐ค, ๐๐คฃ, ๐คทโโ๏ธ๐"
Can't believe how beautiful the sunset was today. #NaturePhotography,"๐, ๐๐๐ผ, ๐, ๐๐, ๐๐ผ๐๐ผ"
Dinner at my favorite sushi place #Foodie,"๐ญ, ๐๐๐, ๐ค๐ฝ, ๐๐ป, ๐"
Throwback to my trip to Paris last summer #TravelDiaries,"๐๐ค๐ฝ, ๐ค๐ฝ, ๐โ, ๐๐, ๐๐๐"
Feeling so blessed to have such amazing people in my life #Blessed,"๐๐๐ผ, ๐ฅ, โค๏ธ๐๐พ๐ฏ, ๐โ๐ผ, ๐ธ๐ด"
That was the best concert ever!,"๐ฉ๐, ๐๐ฝ, ๐คฉ๐, ๐ค๐ผ, ๐ค๐ผ"
I'm scared of spiders.,"๐, โจ๐ค๐ผ, ๐๐ค, ๐๐, ๐โ๏ธ"
My heart is broken.,"๐, โค, ๐, ๐๐ฝ๐ฉ, ๐ฅฐ๐"
I can't wait for my birthday.,"๐๐, โจ, ๐ช๐พ๐ฅ, ๐ฅ๐ฅ๐ฅ๐๐๐, ๐ฌ๐น"
angry,"๐บ, ๐ข, ๐ฏ, ๐, ๐ฃ"
love,"๐๐ฝ๐, ๐, โฅ, โฃ, ๐ฉ"
hate,"๐๐ฝ๐ฏ๐, ๐๐๐ผ, ๐๐๐ป๐ฅ, โ๐ผ๐, ๐ช๐พ๐"
food,"๐ญ, ๐ฎ, ๐ฏ, ๐ถ, ๐ฝ"
hungry,"๐ญ๐, ๐๐ค๐ฝ, ๐๐๐๐ญ, ๐๐๐โค๏ธ๐, ๐๐ญ"
tired,"๐ซ, ๐, ๐, ๐, ๐ช"
excited,"๐ค๐๐ผ๐๐ผ, ๐ญ๐๐ป, ๐ฉ๐๐ผ, ๐คฉ๐๐ป๐๐ป, ๐คช๐๐ป"
work,"๐๐ผ๐๐ผ, โจ, ๐ข, ๐ป, ๐ผ"
home,"๐ด๓ ง๓ ข๓ ฅ๓ ฎ๓ ง๓ ฟโ๏ธ, ๐ ๐ ๐ , โ๏ธ, ๐, ๐ "
play,"โถ, ๐ด, ๐๐๐๐๐ฝ๐๐ฝ๐๐ฝ, ๐ฏ๐๐ฝ, ๐๐๐๐"
game,"๐๐๐๐ธ๐ป, โ , โฃ, โฅ, โฆ"
sports,"โฝ, โพ, โท, โธ, ๐ฑ"
music,"๐๐พ๐๐พ๐ฅบ, ๐, ๐, ๐, ๐ต"
movie,"๐, ๐, ๐ฅ, ๐ฆ, ๐ซ"
book,"๐, ๐, ๐, ๐, ๐"
travel,"โฉ, โฐ, โฑ, โฒ, โด"
adventure,"๐๐ฝ๐, ๐๐พ๐๐พ๐, ๐๐๐๐๐๐๐ฅ๐ฅ๐ฅ, ๐๐ค๐ผ, ๐๐๐ฝ"
family,"๐จโ๐ฉโ๐ฆ, ๐จโ๐จโ๐ฆ, ๐จโ๐จโ๐ฆโ๐ฆ, ๐จโ๐จโ๐ง, ๐จโ๐จโ๐งโ๐ฆ"
party,"๐ท, ๐พ, ๐, ๐, ๐"
Building the index:
- collect documents (sentences that contain emojis)
- tokenize the documents
- preprocess the tokens: lowercase, clean up, keep English only
- index the documents with an inverted index
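
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the Node.js code from the repository: the emoji regex, the tiny stopword list, and the names split_document and build_index are all assumptions made for the example, and the index stores raw co-occurrence counts only, without the frequency-based scoring described earlier.

```python
import re
from collections import defaultdict

# Rough emoji matcher (an assumption): main emoji blocks plus the
# zero-width joiner and variation selector, so adjacent emojis form one group.
EMOJI_RUN = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\u200D\uFE0F]+")
WORD = re.compile(r"[a-z']+")
# Hypothetical, deliberately tiny English stopword list.
STOPWORDS = {"the", "a", "an", "to", "of", "is", "my", "i", "for"}


def split_document(text):
    """Return (word tokens, emoji groups) for one sentence."""
    emoji_groups = EMOJI_RUN.findall(text)
    # Remove the emojis, lowercase, and keep only plain word tokens.
    words_only = EMOJI_RUN.sub(" ", text).lower()
    tokens = [w for w in WORD.findall(words_only) if w not in STOPWORDS]
    return tokens, emoji_groups


def build_index(documents):
    """Inverted index: token -> {emoji group: number of documents shared}."""
    index = defaultdict(lambda: defaultdict(int))
    for text in documents:
        tokens, emoji_groups = split_document(text)
        for token in set(tokens):  # count each token once per document
            for group in emoji_groups:
                index[token][group] += 1
    return index


docs = [
    "Pizza night with friends 🍕🎉",
    "I love pizza ❤️",
    "So tired today 😴",
]
index = build_index(docs)
print(dict(index["pizza"]))
```

Looking up a query word in the index returns every emoji group it appeared with, which is the raw material a scoring step would then rank.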