Telegram content proxy #102

Closed
4 of 6 tasks
Tracked by #149
ForNeVeR opened this issue Apr 5, 2020 · 1 comment · Fixed by #163

ForNeVeR commented Apr 5, 2020

As an improvement over #26 (see #99), which doesn't always work reliably, I think we could provide an actual Telegram content proxy.

Whenever anyone sends us a piece of Telegram-only content (e.g. a photo, an audio log, a file, whatever), we receive a content identifier for that material. We can then send a getFile request to the Telegram server, and eventually we'll receive a link to download the content. That link is only valid for a short amount of time, so it won't work for long-term storage inside of our logs.
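
To make the getFile round trip concrete, here is a minimal sketch of the two URLs involved, assuming the bot talks to the public Telegram Bot API over HTTPS. The helper names (`get_file_request_url`, `download_url`) and the token and file values are made up for illustration; only the URL shapes come from the Bot API documentation.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.telegram.org"


def get_file_request_url(bot_token: str, file_id: str) -> str:
    """URL of the getFile call that resolves a content identifier."""
    query = urlencode({"file_id": file_id})
    return f"{API_ROOT}/bot{bot_token}/getFile?{query}"


def download_url(bot_token: str, file_path: str) -> str:
    """Temporary download link built from the file_path that getFile returns."""
    return f"{API_ROOT}/file/bot{bot_token}/{quote(file_path)}"
```

Note that the download link embeds the bot token, which is one more reason not to store it in logs or hand it out unchanged.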

I suggest we do the following:

  • Collect the content identifiers from incoming messages and save them to persistent storage, so that each Telegram content identifier is mapped to an id from our storage. The internal storage id should be resistant to brute force (i.e. no sequential ids, no pseudo-random values, and no GUIDs).
  • Provide simple HTTP access to the Telegram content that accepts our internal id, like codingteam.org.ru/content/{internalid}, and include this content link in the message sent to XMPP.
  • When receiving an HTTP request for content, asynchronously send a getFile Telegram content request, and await the response.
  • Once the response arrives, then, depending on the file size:
    • for smaller files (say, less than 5 MiB): get the contents and stream them to the requesting user directly, while saving the contents to a private LRU cache of some small size (say, several hundred MiB) to serve subsequent requests for the same content faster
    • for bigger files: await the Telegram link, and then answer the user with an HTTP redirect to that link (if that's technically viable)
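
The size-based decision and the cache from the list above could be sketched roughly as follows. This is an illustration only, not the bot's implementation: `ByteBudgetLruCache` and `choose_delivery` are hypothetical names, the cache lives in memory, and the 5 MiB threshold is the example figure from the proposal.

```python
from collections import OrderedDict

SMALL_FILE_LIMIT = 5 * 1024 * 1024  # 5 MiB, the example threshold from the proposal


class ByteBudgetLruCache:
    """In-memory LRU cache bounded by the total size of the stored contents."""

    def __init__(self, max_bytes: int) -> None:
        self._max_bytes = max_bytes
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0

    def get(self, key: str):
        data = self._items.get(key)
        if data is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self._max_bytes:
            return  # would never fit, so don't evict everything for it
        old = self._items.pop(key, None)
        if old is not None:
            self._used -= len(old)
        self._items[key] = data
        self._used += len(data)
        while self._used > self._max_bytes:
            _, evicted = self._items.popitem(last=False)  # least recently used
            self._used -= len(evicted)


def choose_delivery(file_size: int) -> str:
    """Small files are streamed and cached; bigger ones get an HTTP redirect."""
    return "stream_and_cache" if file_size < SMALL_FILE_LIMIT else "redirect"
```

Bounding the cache by bytes rather than by entry count keeps the "several hundred MiB" limit meaningful even when file sizes vary a lot.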

The bot should also accumulate anonymized file delivery statistics: the number of requests for every file, average download/upload speed, cache hits. No client IPs should be saved (we value users' privacy).
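
A rough sketch of what such counters might look like, keyed only by our internal id so that nothing about the client is recorded. `FileStats` and `DeliveryStats` are hypothetical names, and the data is kept in memory here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    requests: int = 0
    cache_hits: int = 0
    bytes_sent: int = 0
    seconds_spent: float = 0.0

    @property
    def average_speed(self) -> float:
        """Average delivery speed in bytes per second."""
        if self.seconds_spent <= 0:
            return 0.0
        return self.bytes_sent / self.seconds_spent


class DeliveryStats:
    """Per-file counters; no client address or other client data is accepted."""

    def __init__(self) -> None:
        self._by_file: dict[str, FileStats] = {}

    def record(self, internal_id: str, byte_count: int,
               seconds: float, cache_hit: bool) -> None:
        stats = self._by_file.setdefault(internal_id, FileStats())
        stats.requests += 1
        stats.bytes_sent += byte_count
        stats.seconds_spent += seconds
        if cache_hit:
            stats.cache_hits += 1

    def for_file(self, internal_id: str) -> FileStats:
        return self._by_file.get(internal_id, FileStats())
```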

Q&A:

  1. Why not just use the Telegram content identifiers as-is: receive them from the user, and then proxy them as follows? Why ever use our own ids instead of Telegram-provided ones?
    This would open a direct way of abusing the bot, which would effectively provide a proxy to all Telegram content ever. Instead of that, we'll filter to only the Telegram content posted in our chat, and the simplest way of filtering it would be just to remap the identifiers.
  2. Why use our internal cache at all?
    To avoid overloading the Telegram infrastructure (otherwise they could simply ban us), and to improve the content delivery speed for a bunch of smaller requests to the same resource.
  3. What's the use of the file download statistics?
    To monitor and prevent any abuse of our content delivery system: e.g. if someone shares the content links with a wide audience, this could exhaust our traffic limits pretty fast, and/or lead to us being banned on Telegram.

Depends on:

@ForNeVeR

Some implementation notes.

  1. Why not just use the Telegram content identifiers as-is: receive them from the user, and then proxy them as follows? Why ever use our own ids instead of Telegram-provided ones?
    This would open a direct way of abusing the bot, which would effectively provide a proxy to all Telegram content ever. Instead of that, we'll filter to only the Telegram content posted in our chat, and the simplest way of filtering it would be just to remap the identifiers.

We may choose to use nanoid for that purpose.
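
For illustration, id generation in the spirit of nanoid (not the nanoid library itself) could look like the sketch below, using a cryptographically secure random source. The 21-character length and the 64-symbol URL-safe alphabet mirror nanoid's defaults; `new_internal_id` and `ContentRegistry` are hypothetical names, and the dictionary is only a stand-in for the persistent storage.

```python
import secrets
import string

# 64 URL-safe symbols, the same character set as nanoid's default alphabet.
ALPHABET = string.ascii_letters + string.digits + "_-"


def new_internal_id(length: int = 21) -> str:
    """Random id drawn from a CSPRNG, so it cannot be enumerated or predicted."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class ContentRegistry:
    """Maps our internal ids to Telegram content identifiers."""

    def __init__(self) -> None:
        self._telegram_ids: dict[str, str] = {}

    def register(self, telegram_file_id: str) -> str:
        internal_id = new_internal_id()
        self._telegram_ids[internal_id] = telegram_file_id
        return internal_id

    def resolve(self, internal_id: str):
        # Unknown ids resolve to None, so only content seen in our chat is served.
        return self._telegram_ids.get(internal_id)
```

Since only registered ids resolve, a request with a guessed or Telegram-native identifier never reaches getFile.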

Also, for data storage, I'd like to try EFCore.FSharp. I'm already evaluating it in a separate branch.

ForNeVeR added a commit that referenced this issue Oct 2, 2021
Microsoft.Extensions.Configuration.Abstractions package is referenced by
the platform anyway.
ForNeVeR added a commit that referenced this issue Oct 5, 2021
@ForNeVeR ForNeVeR mentioned this issue Aug 7, 2022
15 tasks
ForNeVeR added a commit that referenced this issue Aug 21, 2022
ForNeVeR added a commit that referenced this issue Aug 27, 2022
ForNeVeR added a commit that referenced this issue Aug 28, 2022
ForNeVeR added a commit that referenced this issue Aug 28, 2022
ForNeVeR added a commit that referenced this issue Aug 28, 2022