My submission to Telegram Data Clustering contest (ranked 5th/122, team of 2)

mbalesni/tgnews

Telegram Data Clustering Contest Submission

by Kooky Dragon (Andy and Mikita)

Demo

https://mbalesni.github.io/tgnews

Description

The task of the contest was to create a command-line application for classifying and sorting news articles. See more details on the official contest page.

Solution commentary

Our solutions to tasks 1, 2, and 3 are based on supervised learning with fastText:

  • For the English dataset, we used the Google Cloud NLP text categorization service to label the Telegram-provided sample datasets.

  • For the Russian dataset, we used Google Translate to translate part of the English dataset into Russian.
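As a rough illustration of the supervised fastText setup, the sketch below turns labeled articles into fastText's one-example-per-line training format (`__label__<name> <text>`) and trains a classifier. It is not our actual pipeline: the helper names, the label strings, the lowercasing step, and the hyperparameter values are all assumptions made for the example.

```python
import re


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as '__label__<label> <text on a single line>'."""
    # fastText reads one example per line, so collapse newlines and runs of spaces.
    # Lowercasing is an assumed normalization step, not something fastText requires.
    flat = re.sub(r"\s+", " ", text).strip().lower()
    # Labels must not contain spaces; replace them with underscores.
    return f"__label__{label.replace(' ', '_')} {flat}"


def write_training_file(examples, path):
    """Write (label, text) pairs to `path` in fastText supervised format."""
    with open(path, "w", encoding="utf-8") as f:
        for label, text in examples:
            f.write(to_fasttext_line(label, text) + "\n")


def train_classifier(path):
    """Train a supervised model; the hyperparameters here are illustrative only."""
    import fasttext  # imported lazily so the formatting helpers work without it

    return fasttext.train_supervised(input=path, epoch=25, wordNgrams=2)
```

With the `fasttext` package installed, `train_classifier("train.txt").predict("some article text")` returns the predicted labels and their probabilities.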

Our solutions to tasks 4 and 5 use DBSCAN from scikit-learn.
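To give an idea of how DBSCAN can group articles about the same story, here is a minimal sketch that clusters short texts by cosine distance. The TF-IDF features, the `eps` and `min_samples` values, and the `cluster_texts` helper are assumptions chosen for illustration; they are not necessarily the features or parameters used in the submission.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_texts(texts, eps=0.5, min_samples=2):
    """Return one cluster label per text; -1 marks texts DBSCAN treats as noise."""
    # Represent each text as a TF-IDF vector (an assumed feature choice).
    vectors = TfidfVectorizer().fit_transform(texts)
    # Two texts are neighbors when their cosine distance is at most `eps`.
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine")
    return model.fit_predict(vectors).tolist()


if __name__ == "__main__":
    titles = [
        "Apple unveils new iPhone",
        "Apple unveils new iPhone model",
        "Storm hits Florida coast",
        "Storm hits Florida coast hard",
        "Local bakery receives award",
    ]
    for title, label in zip(titles, cluster_texts(titles)):
        print(label, title)
```

DBSCAN does not need the number of clusters in advance, which suits news threads whose count is unknown, and articles with no close neighbors are left unclustered with label -1.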


It was our first experience with NLP, so this contest was rather challenging for us. However, we had A LOT of fun! Thank you for this opportunity ;)
