
TASE (Telegram Audio Search Engine): A lightning fast audio full-text search engine on top of Telegram

appheap/TASE


TASE lets users quickly find audio that is of genuine interest or value, without wading through numerous irrelevant channels. Its search results lead directly to relevant, high-quality audio files.



What makes TASE special?

TASE is a growing open-source full-text audio search engine platform that serves high-volume requests from users. It is based on Python and Telegram. The latest major update introduces many new features, including a highly abstracted, modular design powered by Elasticsearch and ArangoDB, with support for parallel clusters on servers located in different parts of the world.

TASE at a glance

  1. Advanced full-text search engine for audio files
  2. Extremely fast audio file indexer (benchmark: minimum 4 million songs per day per client)
  3. Support for multiple parallel clients as indexer
  4. Support for distributed parallel clusters on multiple servers (searching and indexing) (all audio files, graph and document models)
  5. Graph of users and items
  6. Dynamic URLs
  7. Asynchronous
  8. Rich admin tools
  9. Multilingual
  10. Audio file caching
  11. Easy configuration and customization
  12. Friendly look and feel

TASE is free and always will be. Help us out… If you love free stuff and great software, give us a star! :star::star2:


How to install and run

    * Note: please read the "Configuration and customization" section and configure the tase.json and .env files before you run the project.

    There are two ways to set up the services TASE depends on: install them manually or run them with Docker Compose.

    1. Clone the repository

    2. Set up the services (choose one of the two options):

      1. Manually install the dependencies

        1. Install Elasticsearch (v8.3) (instructions)
        2. Install ArangoDB (v3.9.1) (instructions)
        3. Install RabbitMQ (instructions)
        4. Install Redis (instructions)
      2. Run using Docker Compose (recommended, the easier method)

        * Note: configure the tase.json file before running this command
        docker compose up -d
        * Install Docker Compose if you haven't already (instructions)
    3. Install the Python dependencies with poetry install

      * Install Poetry if you haven't already (instructions)
    4. Run the tase_client.py file located in the tase package

Configuration and customization

    Before you run the project, customize the tase.json file in the root directory. TASE uses it as its config file.

    To run the project, you must provide the basic information the bot works with. For instance, you must provide a Telegram bot token and your Telegram client authentication information to run your own clients.
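
Because both tase.json and .env must be in place before the first run, a small pre-flight check can catch a missing or malformed file early. The following is a minimal sketch, not part of TASE: the function name `check_config` is hypothetical, and it assumes both files live in the project root. It only verifies that the files exist and that tase.json parses as JSON. It does not validate any keys, since the schema is defined by the project itself.

```python
import json
from pathlib import Path


def check_config(root: Path) -> list[str]:
    """Return a list of problems found; an empty list means the check passed.

    Hypothetical helper: it assumes tase.json and .env sit directly in `root`.
    """
    problems = []
    env_path = root / ".env"
    config_path = root / "tase.json"

    if not env_path.is_file():
        problems.append(f"missing {env_path.name}")

    if not config_path.is_file():
        problems.append(f"missing {config_path.name}")
    else:
        try:
            # Only checks that the file is syntactically valid JSON.
            json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{config_path.name} is not valid JSON: {exc}")

    return problems
```

Calling `check_config(Path.cwd())` from the repository root before starting the client returns an empty list when both files are present and tase.json is well-formed.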


Features

Features for developers

  1. Add new languages in locales (we recommend using Poedit)
  2. Easily add new buttons and functionalities (query and inline) by implementing the abstract methods in the base button class
  3. Realtime visualizations for graph models and audio files (Kibana, ArangoDB)
  4. Abstraction and facade design pattern
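
The "implement the abstract methods in the base button class" pattern mentioned above can be sketched as follows. This is an illustration of the general technique only: `BaseButton`, `HelpButton`, `label`, `on_press`, and `BUTTONS` are hypothetical names, and TASE's real base class and method signatures may differ.

```python
from abc import ABC, abstractmethod


class BaseButton(ABC):
    """Hypothetical base class; TASE's real class and method names may differ."""

    # Unique name used to look the button up when a callback arrives.
    name: str

    @abstractmethod
    def label(self, language: str) -> str:
        """Return the text shown on the button for the given language code."""

    @abstractmethod
    def on_press(self, user_id: int) -> str:
        """Handle a press and return the reply text."""


class HelpButton(BaseButton):
    name = "help"

    def label(self, language: str) -> str:
        # Falls back to English when the language has no translation.
        return {"en": "Help", "es": "Ayuda"}.get(language, "Help")

    def on_press(self, user_id: int) -> str:
        return f"User {user_id} opened the help menu"


# Registry mapping button names to instances, so handlers can dispatch by name.
BUTTONS = {button.name: button for button in (HelpButton(),)}
```

With this layout, adding a button means writing one subclass and registering it. Python refuses to instantiate a subclass that leaves an abstract method unimplemented, so an incomplete button fails as soon as it is created rather than when a user presses it.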

Wide Range of Features 💡

  1. Search engine

    • Search audio files through the direct bot search
    • Search audio files from groups and private chats using @bot_name mention and send them directly to the chat
    • Real-time search using @bot_name mention, by showing an inline list of results
    • Real-time search directly in the private and group chats
    • Search based on file-name, performer name, and audio-name
    • Shows the top 10 relevant results in a message and an unlimited number under "more results", returned as an inline list
    • Play the songs in the inline lists before downloading them
    • Caches searched audio files to avoid redundant DB requests
    • Dynamic URL for the results
    • Allows the owner to trace the downloaded audio files
    • High accuracy and relevance
    • Search in a wide variety of languages
    • Show the source-channel name and the link to the file
    • Sort results in reverse mode (so that more relevant results appear at the bottom)
    [Screenshot: search example]

    [Screenshot: result audio example]
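
The caching bullet above ("Caches searched audio files to avoid redundant DB requests") can be illustrated with a small least-recently-used cache. This is a sketch of the general technique under stated assumptions, not TASE's implementation: `AudioCache` and the `fetch` callback are hypothetical, and the real project may cache differently (for example in Redis).

```python
from collections import OrderedDict
from typing import Callable


class AudioCache:
    """Small LRU cache placed in front of a slower lookup function."""

    def __init__(self, fetch: Callable[[str], dict], max_size: int = 1000):
        self._fetch = fetch          # hypothetical database lookup
        self._max_size = max_size
        self._items = OrderedDict()  # audio_id -> record, least recent first

    def get(self, audio_id: str) -> dict:
        if audio_id in self._items:
            # Cache hit: mark as most recently used and skip the database.
            self._items.move_to_end(audio_id)
            return self._items[audio_id]
        record = self._fetch(audio_id)
        self._items[audio_id] = record
        if len(self._items) > self._max_size:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return record
```

Repeated requests for the same audio file hit the in-memory dictionary, and only the first request (or a request after eviction) reaches the database.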
  2. Indexing features

    • Automatically finds new channels in an optimistic way (first assumes it is a valid channel and validates it later before starting to index)
      1. Extract from texts and captions
      2. Extract from "forwarded mention"
      3. Extract from links
    • Automatically indexes new channels
    • Iterates through previous channels and resumes indexing from the previous checkpoint
    • Extremely fast indexing (minimum 4 million songs per day per client)
    • Analyzes channels and calculates a score (0-5) based on their
      1. Density of audio files (ratio of audio files
      2. Activity of the channel (how frequently it shares new files)
      3. Number of members
    • Avoids getting banned by the Telegram servers
    • Support for parallel indexing using multiple Telegram clients
    • Hashes the file IDs in a specific way that avoids conflicts to a high degree and still keeps them as short as eight characters
    • Users and channel owners can send a request to index a specific channel using "/index channel_name"
    • Constructs a graph for users and audio files in real time which can be used for recommendation systems and link prediction tasks
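
The eight-character file ID hashes mentioned in the list above can be illustrated as follows. The source does not describe TASE's actual hashing scheme, so this is only one possible approach: it assumes a 6-byte BLAKE2b digest encoded as URL-safe base64, and `short_id` is a hypothetical name.

```python
import base64
import hashlib


def short_id(file_id: str) -> str:
    """Map a file ID to a deterministic 8-character URL-safe string.

    Illustrative only; TASE's real scheme is not documented in this README.
    """
    # A 6-byte digest encodes to exactly 8 base64 characters with no padding.
    digest = hashlib.blake2b(file_id.encode("utf-8"), digest_size=6).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")
```

A 48-bit hash keeps collisions rare but not impossible, so a design like this would still need a uniqueness check when a new ID is stored.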

  3. User limiting/controlling features

    • Handle user membership in your channel(s) in near real-time
    • Set limitations for users based on their membership status
    • Non-members can search 5 audio files freely; after that, they must wait one minute before they receive their searched audio files
    • Non-members have limitations with direct in-chat searches
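
The non-member limit described above can be sketched as a per-user counter. This is one reading of the rule, not TASE's code: it assumes the first 5 searches are answered immediately and every later one is delayed by 60 seconds, and it never resets the counter. `SearchLimiter` and `delay_for` are hypothetical names.

```python
from collections import defaultdict

FREE_SEARCHES = 5   # searches a non-member gets without waiting
WAIT_SECONDS = 60   # delay applied once the free searches are used up


class SearchLimiter:
    """Tracks searches per user and reports how long to delay the results."""

    def __init__(self):
        self._used = defaultdict(int)  # user_id -> non-member searches so far

    def delay_for(self, user_id: int, is_member: bool) -> int:
        """Return the number of seconds to wait before sending results."""
        if is_member:
            return 0
        self._used[user_id] += 1
        return 0 if self._used[user_id] <= FREE_SEARCHES else WAIT_SECONDS
```

The caller would check channel membership first, then sleep or schedule the reply according to the returned delay.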

  4. User interface

    • User guide
    • Multiple menus (home, help, playlist etc.)
    • A keyboard for each part to ease the process for users
    • Multilingual bot - currently supported:
      • 🇺🇸 English
      • 🇪🇸 Spanish
      • 🇷🇺 Russian
      • 🇦🇪 Arabic
      • 🇧🇷 Portuguese
      • 🇮🇳 Hindi
      • 🇩🇪 German
      • 🇹🇯 Kurdish (Sorani)
      • 🇹🇯 Kurdish (Kurmanji)
      • 🇳🇱 Dutch
      • 🇮🇹 Italian
      • 🇮🇷 Persian
    • Sends greeting messages to users who have been inactive for more than one week or more than two weeks
    • Shows each user's search history in a scrollable inline list when they press the history button in the home keyboard
    • Beautiful and vibrant user interface (messages and emojis)
    • Playlists
      1. Users can have unlimited playlists and save unlimited audio files in each
      2. Users can edit playlist meta-data
      3. Users can edit saved audio files
    [Screenshot: main menu]
  5. Admin features

    • Real-time graph visualization (supports ArangoDB dashboard)
    • Real-time indexed audio file visualization (supports Kibana dashboard)
      * Kibana is a data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.
  6. Other

    • Extremely fast
    • Documentation is provided in the code (docstrings)
    • Handles database related exceptions
    • Multi-threaded search (searches multiple requests asynchronously)
    • Handles RTL texts perfectly
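
Handling several search requests asynchronously, as listed above, can be sketched with `asyncio`. This is a minimal illustration and not TASE's search code: `search` is a stand-in that only simulates I/O latency, and `search_many` is a hypothetical helper.

```python
import asyncio


async def search(query: str) -> list[str]:
    # Stand-in for a real search backend call; it only simulates I/O latency.
    await asyncio.sleep(0.01)
    return [f"{query} result"]


async def search_many(queries: list[str]) -> dict[str, list[str]]:
    # gather() runs the coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(search(q) for q in queries))
    return dict(zip(queries, results))
```

Running `asyncio.run(search_many(["a", "b"]))` issues both searches at once, so the total wait is close to the slowest single request rather than the sum of all of them.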

Technology stack

The main tools and technologies used in developing TASE are as follows:

  • Elasticsearch
  • ArangoDB
  • Pyrogram
  • Python get_text
  • Celery
  • RabbitMQ
  • Redis
  • Pydantic
  • Jinja

Call for Contributions

We welcome your expertise and enthusiasm!

Ways to contribute to the Telegram Audio Search Engine:

  • Write code
  • Review pull requests
  • Develop tutorials, presentations, documentation, and other educational materials
  • Translate documentation and readme contents

We love your contributions and do our best to provide you with mentorship and support. If you are looking for an issue to tackle, take a look at issues.

Issues

If you encounter any issue in the code, please report it here. Better still, fork the repository on GitHub and/or create a pull request.

Future work

  • Voice search
  • Add artist support

If you found it helpful, please give us a ⭐


License

TASE is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Copyright © 2020-2022