It allows users to quickly and easily find information that is of genuine interest or value, without the need to wade through numerous irrelevant channels. It provides users with search results that lead to relevant information on high-quality audio files.
TASE is a growing open-source, full-text audio search engine platform that serves high-volume requests from users. It is built on Python and Telegram. The latest major update introduces many new features, including a highly abstracted, modular design powered by Elasticsearch and ArangoDB, with support for parallel clusters on servers located in different parts of the world.
- Advanced full-text search engine for audio files
- Extremely fast audio file indexer (benchmark: minimum 4 million songs per day per client)
- Support for multiple parallel clients as indexer
- Support for distributed parallel clusters on multiple servers (searching and indexing) (all audio files, graph and document models)
- Graph of users and items
- Dynamic URLs
- Asynchronous
- Rich admin tools
- Multilingual
- Audio file caching
- Easy configuration and customization
- Friendly look and feel
* Note: please make sure to read the configuration and customization section before you run the project
- Install Elasticsearch (v8.3) (instructions)
- Install ArangoDB (v3.9.1) (instructions)
- Install RabbitMQ (instructions)
- Install Redis (instructions)
- The easier method (recommended) (note: before running the project, make sure to configure the tase.json file)
docker compose up -d
* install docker compose if you haven't already (instructions)
* install poetry if you haven't already (instructions)
- Run the tase_client.py file located in the tase package
Before you run the project, you need to customize the tase.json file in the root directory, which TASE uses as its config file.
To run the project, you have to provide the basic information the bot works with. For instance, you must provide a Telegram bot token and your Telegram client authentication information to run your own clients.
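As a minimal sketch of checking the configuration before startup, the snippet below loads a JSON config file and fails early when a required value is missing. The key names (`bot_token`, `api_id`, `api_hash`) are assumptions made for illustration only; the actual schema of tase.json is defined by the project and may differ.

```python
import json
from pathlib import Path

# Hypothetical required keys; the real tase.json schema may differ.
REQUIRED_KEYS = ("bot_token", "api_id", "api_hash")


def load_config(path):
    """Load a JSON config file and fail early if a required value is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Treat absent keys and empty values the same way: both block startup.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"missing config values: {', '.join(missing)}")
    return config
```

Calling `load_config("tase.json")` at startup would surface an incomplete config as a single readable error instead of a failure later during Telegram authentication.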
- Add new languages in locales (we recommend using Poedit)
- Easily add new buttons and functionalities (query and inline) by implementing the abstract methods in the base button class
- Realtime visualizations for graph models and audio files (Kibana, ArangoDB)
- Abstraction and facade design pattern
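The button extension point described above can be sketched roughly as follows. This is not the project's actual class: `BaseButton`, `on_callback_query`, and `on_inline_query` are hypothetical names chosen to illustrate the idea of adding a button by implementing abstract methods for the query and inline cases.

```python
from abc import ABC, abstractmethod


class BaseButton(ABC):
    """Hypothetical stand-in for the project's base button class."""

    name: str = ""

    @abstractmethod
    def on_callback_query(self, user_id: int) -> str:
        """Handle a press on a regular (query) button."""

    @abstractmethod
    def on_inline_query(self, user_id: int, query: str) -> list[str]:
        """Return result titles for an inline button."""


class HelpButton(BaseButton):
    """A new button only has to fill in the two abstract methods."""

    name = "help"

    def on_callback_query(self, user_id: int) -> str:
        return f"Help text for user {user_id}"

    def on_inline_query(self, user_id: int, query: str) -> list[str]:
        return [f"help: {query}"]
```

Because `BaseButton` is abstract, forgetting to implement one of the methods raises a `TypeError` as soon as the subclass is instantiated, rather than when a user presses the button.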
- Search audio files through the direct bot search
- Search audio files from groups and private chats using @bot_name mention and send them directly to the chat
- Real-time search using @bot_name mention, by showing an inline list of results
- Real-time search directly in the private and group chats
- Search based on file-name, performer name, and audio-name
- Shows the top 10 most relevant results in a message, with an unlimited number available under "more results", returned as an inline list
- Play the songs in the inline lists before downloading them
- Caches searched audio files to avoid unnecessary redundant DB requests
- Dynamic URL for the results
- Allows the owner to trace the downloaded audio files
- High accuracy and relevance
- Search in a wide variety of languages
- Show the source-channel name and the link to the file
- Sort results in reverse order (to place the more relevant results at the bottom)
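The top-10 and reverse-ordering behaviour listed above can be illustrated with a small sketch. The `Hit` type and `first_page` helper are hypothetical and only show the ordering logic; they are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    score: float  # higher means more relevant


def first_page(hits, size=10, reverse=True):
    """Return the `size` most relevant hits, best one last when `reverse` is set."""
    # Rank by relevance first, then keep only the first page.
    top = sorted(hits, key=lambda h: h.score, reverse=True)[:size]
    # Reverse mode flips the page so the most relevant hit ends up at the bottom.
    return top[::-1] if reverse else top
```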
- Automatically finds new channels in an optimistic way (first assumes it is a valid channel and validates it later, before starting to index)
- Extract from texts and captions
- Extract from "forwarded mention"
- Extract from links
- Automatically indexes new channels
- Iterates through previous channels and resumes indexing from the previous checkpoint
- Extremely fast indexing (minimum 4 million songs per day per client)
- Analyzes channels and calculates a score (0-5) based on their
- Density of audio files (ratio of audio files)
- Activity of the channel (how frequently it shares new files)
- Number of members
- Avoids getting banned by the Telegram servers
- Support for parallel indexing using multiple Telegram clients
- Hashes the file IDs in a specific way that avoids conflicts to a high degree and still keeps them as short as eight characters
- Users and channel owners can send a request to index a specific channel using "/index channel_name"
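The eight-character file-ID hashes mentioned above could be produced in many ways. The sketch below shows one generic approach, truncating a URL-safe Base64 encoding of a SHA-256 digest; it is an assumption for illustration and not necessarily the scheme TASE uses. The function name `short_id` is hypothetical.

```python
import base64
import hashlib


def short_id(file_id: str, length: int = 8) -> str:
    """Derive a short, URL-safe identifier from a long file ID."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    # URL-safe Base64 carries 6 bits per character, so 8 characters keep 48 bits.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]
```

With 48 bits, accidental collisions only become likely once the number of stored IDs reaches the order of tens of millions, so a real implementation would still need a collision check when inserting into the database.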
- Constructs a graph for users and audio files in real time which can be used for recommendation systems and link prediction tasks
- Handle user membership in your channel(s) in near real-time
- Set limitations for users based on their membership status
- Lets non-members retrieve 5 searched audio files freely; after that, they must wait one minute before receiving their searched audio files
- Non-members have limitations on direct in-chat searches
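One possible reading of the non-member limit above is sketched below: the first five requests are served immediately and every later one is delayed by a minute, while members are never delayed. The class name, its parameters, and the in-memory counter are assumptions for illustration; the project may track this differently.

```python
import time


class NonMemberLimiter:
    """Allow `free_quota` immediate deliveries, then delay each further one."""

    def __init__(self, free_quota=5, delay_seconds=60.0):
        self.free_quota = free_quota
        self.delay_seconds = delay_seconds
        self._used = {}  # user_id -> number of requests seen so far

    def delivery_time(self, user_id, is_member, now=None):
        """Return the timestamp at which the requested file may be sent."""
        now = time.time() if now is None else now
        if is_member:
            return now  # members are not counted or delayed
        used = self._used.get(user_id, 0)
        self._used[user_id] = used + 1
        return now if used < self.free_quota else now + self.delay_seconds
```

Passing `now` explicitly keeps the logic easy to test; in a running bot the default `time.time()` would be used.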
- User guide
- Multiple menus (home, help, playlist etc.)
- A keyboard for each part to ease the process for users
- Multilingual bot - currently supported:
- 🇺🇸 English
- 🇪🇸 Spanish
- 🇷🇺 Russian
- 🇦🇪 Arabic
- 🇧🇷 Portuguese
- 🇮🇳 Hindi
- 🇩🇪 German
- 🇹🇯 Kurdish (Sorani)
- 🇹🇯 Kurdish (Kurmanji)
- 🇳🇱 Dutch
- 🇮🇹 Italian
- 🇮🇷 Persian
- Sends greeting messages to users who have been inactive for more than one week or more than two weeks
- Shows the search history for each user in a scrollable inline list when the history button on the home keyboard is pressed
- Beautiful and vibrant user interface (messages and emojis)
- Users can have unlimited playlists and save unlimited audio files in each
- Users can edit playlist meta-data
- Users can edit saved audio files
- Real-time graph visualization (supports ArangoDB dashboard)
- Real-time indexed audio file visualization (supports Kibana dashboard)
* Kibana is a data visualization and exploration tool used for log and time-series analytics, application monitoring, and operational intelligence use cases. It offers powerful and easy-to-use features such as histograms, line graphs, pie charts, heat maps, and built-in geospatial support.
- Extremely fast
- Documentation is provided in the code (docstrings)
- Handles database related exceptions
- Multi-threaded search (searches multiple requests asynchronously)
- Handles RTL texts perfectly
Result audio example screenshot
The main tools and technologies used in developing TASE are as follows:
We welcome your expertise and enthusiasm!
Ways to contribute to the Telegram audio search engine:
- Writing code
- Reviewing pull requests
- Developing tutorials, presentations, documentation, and other educational materials
- Translating documentation and README contents
We love your contributions and do our best to provide you with mentorship and support. If you are looking for an issue to tackle, take a look at issues.
If you encounter any issue in the code, please report it here. A better way is to fork the repository on GitHub and/or create a pull request.
- Voice search
- Add artist support
TASE is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
Copyright © 2020-2022
- Soran Ghaderi (soran.gdr.cs@gmail.com)
- Personal website: soran-ghaderi.github.io
- Linkedin: Soran-Ghaderi
- Twitter: SoranGhadri
- Taleb Zarhesh (taleb.zarhesh@gmail.com)
- Linkedin: Taleb Zarhesh
- Twitter: Taleb Zarhesh