Telegram content proxy #102

ForNeVeR · 2020-04-05T06:55:04Z

As an improvement on #26 (see #99), which doesn't always work reliably, I think we could provide an actual Telegram content proxy.

Whenever anyone sends us a piece of Telegram-only content (e.g. a photo, an audio log, a file, whatever), we receive a content identifier for it. We may afterwards send a getFile request to the Telegram server, and in response we'll receive a link to download the content. That link is only valid for a short time, so it isn't suitable for long-term storage in our logs.
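To make the getFile round trip concrete, here is a minimal sketch in Python (the project itself is not written in Python; the function names are hypothetical and chosen for illustration only). It assumes the public Bot API conventions: getFile returns a `file_path`, and the temporary download link is built from the bot token and that path. No network call is made here; the response body is passed in as a string.

```python
import json
from urllib.parse import quote, urlencode

API_ROOT = "https://api.telegram.org"


def get_file_request_url(bot_token: str, file_id: str) -> str:
    """URL of the getFile call that resolves a file_id into a file_path."""
    return f"{API_ROOT}/bot{bot_token}/getFile?{urlencode({'file_id': file_id})}"


def download_url_from_response(bot_token: str, response_body: str) -> str:
    """Build the temporary download link from a getFile JSON response."""
    payload = json.loads(response_body)
    if not payload.get("ok"):
        raise ValueError(f"getFile failed: {payload.get('description')}")
    file_path = payload["result"]["file_path"]
    # Slashes in file_path are kept as-is; other unsafe characters are escaped
    return f"{API_ROOT}/file/bot{bot_token}/{quote(file_path)}"
```

The Bot API documentation only guarantees such a link for a limited time (at least an hour), which is why the link itself cannot be written to the logs.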

I suggest we do the following:

1. Collect the content identifiers from incoming messages and save them to persistent storage, so that each Telegram content identifier gets mapped to an id from our storage. The internal storage id should be resistant to brute force (i.e. no sequential ids, no pseudo-random ids or GUIDs).
2. Provide simple HTTP access to the Telegram content that takes our internal id, like codingteam.org.ru/content/{internalid}, and include this content link in the message sent to XMPP.
3. When receiving an HTTP request for content, asynchronously send a getFile request to Telegram and await the response.
4. When the response arrives, proceed depending on the file size:
   - for smaller files (say, less than 5 MiB): get the contents and stream them to the requesting user directly, while saving the contents to a private LRU cache of some small size (say, several hundred MiB) to serve subsequent requests for the same content faster
   - for bigger files: take the Telegram link from the response and answer the user with an HTTP redirect to that link (if that's technically viable)
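The size-based decision in the last step can be sketched as a small pure function. This is an illustration in Python under stated assumptions, not the project's code: the `Stream` and `Redirect` types and the `plan_response` name are hypothetical, and the 5 MiB threshold is the example figure from the list above.

```python
from dataclasses import dataclass
from typing import Optional, Union

SMALL_FILE_LIMIT = 5 * 1024 * 1024  # 5 MiB, the example threshold from the proposal


@dataclass(frozen=True)
class Stream:
    """Download the file, cache it, and stream it to the client."""
    download_url: str


@dataclass(frozen=True)
class Redirect:
    """Answer with an HTTP redirect to the temporary Telegram link."""
    location: str


def plan_response(download_url: str, file_size: Optional[int]) -> Union[Stream, Redirect]:
    """Choose how to serve a file once getFile has returned its link and size."""
    # An unknown size is treated as "big": redirect rather than risk buffering a large file
    if file_size is not None and file_size < SMALL_FILE_LIMIT:
        return Stream(download_url)
    return Redirect(download_url)
```

One thing to check before relying on the redirect branch: a Bot API download link contains the bot token in its path, so redirecting a client to it would reveal the token to that client.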

The bot should also accumulate anonymized file delivery statistics: the number of requests for every file, average download/upload speed, and cache hits. No client IPs should be saved (we value users' privacy).
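A possible shape for such statistics, as a minimal Python sketch (all names are hypothetical): counters are keyed by the internal content id only, and the recording function has no parameter through which a client address could be passed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileStats:
    requests: int = 0
    cache_hits: int = 0
    bytes_sent: int = 0
    seconds_spent: float = 0.0

    @property
    def average_speed(self) -> float:
        """Average delivery speed in bytes per second (0 if nothing was timed)."""
        return self.bytes_sent / self.seconds_spent if self.seconds_spent > 0 else 0.0


class DeliveryStats:
    """Per-file counters keyed by internal id; no client data is stored."""

    def __init__(self) -> None:
        self._by_file = defaultdict(FileStats)

    def record(self, internal_id: str, size: int, seconds: float, cache_hit: bool) -> None:
        """Account for one completed delivery of the given file."""
        stats = self._by_file[internal_id]
        stats.requests += 1
        stats.cache_hits += int(cache_hit)
        stats.bytes_sent += size
        stats.seconds_spent += seconds

    def for_file(self, internal_id: str) -> FileStats:
        return self._by_file[internal_id]
```

A sudden jump in `requests` for a single id would then be the signal that a link has been shared with a wide audience.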

Q&A:

Why not just use the Telegram content identifiers as-is: receive them from the user, and then proxy them as follows? Why ever use our own ids instead of Telegram-provided ones?
This would open a direct way of abusing the bot, which would effectively provide a proxy to all Telegram content ever. Instead of that, we'll filter to only the Telegram content posted in our chat, and the simplest way of filtering it would be just to remap the identifiers.
Why use our internal cache at all?
To avoid overloading the Telegram infrastructure (otherwise they could simply ban us), and to improve the content delivery speed for a bunch of smaller requests to the same resource.
What's the use of the file download statistics?
To monitor and prevent any abuse of our content delivery system: e.g. if someone shares the content links with a wide audience, this could exhaust our traffic limits pretty fast, and/or lead to us being banned on Telegram.
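The cache discussed in the Q&A is bounded by total size rather than by item count. A minimal in-memory sketch in Python follows; the class name is hypothetical, and keeping the contents in memory instead of on disk is a simplification made only for this example.

```python
from collections import OrderedDict
from typing import Optional


class ByteBoundedLruCache:
    """LRU cache that evicts least recently used entries once a byte budget is exceeded."""

    def __init__(self, max_total_bytes: int) -> None:
        self._max_total_bytes = max_total_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._total_bytes = 0

    def get(self, key: str) -> Optional[bytes]:
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self._max_total_bytes:
            return  # can never fit into the budget; serve it uncached
        old = self._entries.pop(key, None)
        if old is not None:
            self._total_bytes -= len(old)
        self._entries[key] = data
        self._total_bytes += len(data)
        while self._total_bytes > self._max_total_bytes:
            _, evicted = self._entries.popitem(last=False)  # drop the least recently used
            self._total_bytes -= len(evicted)
```

With a budget of a few hundred MiB and a 5 MiB per-file limit, such a cache holds at least several dozen of the most recently requested files.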

Depends on:


ForNeVeR · 2021-09-26T15:29:56Z

Some implementation notes.

> Why not just use the Telegram content identifiers as-is: receive them from the user, and then proxy them as follows? Why ever use our own ids instead of Telegram-provided ones?
> This would open a direct way of abusing the bot, which would effectively provide a proxy to all Telegram content ever. Instead of that, we'll filter to only the Telegram content posted in our chat, and the simplest way of filtering it would be just to remap the identifiers.

We may choose to use nanoid for that purpose.
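For illustration, a nanoid-style identifier can be sketched in Python with the standard `secrets` module. This is not the nanoid library itself: it only mirrors nanoid's default parameters (a 64-symbol URL-safe alphabet and a length of 21, roughly 126 bits of randomness). The `ContentRegistry` class is hypothetical and uses in-memory dictionaries where the real thing would use persistent storage.

```python
import secrets
import string
from typing import Dict, Optional

ALPHABET = string.ascii_letters + string.digits + "_-"  # 64 URL-safe symbols
ID_LENGTH = 21


def new_internal_id() -> str:
    """Generate an unguessable id from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


class ContentRegistry:
    """Maps internal ids to Telegram file ids; only registered content is resolvable."""

    def __init__(self) -> None:
        self._file_id_by_internal: Dict[str, str] = {}
        self._internal_by_file_id: Dict[str, str] = {}

    def register(self, telegram_file_id: str) -> str:
        """Return the internal id for a file id, creating one on first sight."""
        existing = self._internal_by_file_id.get(telegram_file_id)
        if existing is not None:
            return existing
        internal_id = new_internal_id()
        self._file_id_by_internal[internal_id] = telegram_file_id
        self._internal_by_file_id[telegram_file_id] = internal_id
        return internal_id

    def resolve(self, internal_id: str) -> Optional[str]:
        """Return the Telegram file id, or None for ids we never issued."""
        return self._file_id_by_internal.get(internal_id)
```

Because `resolve` only knows ids that `register` has issued, a request with a guessed or foreign identifier ends in a "not found" response instead of a getFile call.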

Also, for data storage, I'd like to try EFCore.FSharp. I'm already evaluating it in a separate branch.


ForNeVeR added kind:feature kind:infrastructure status:up-for-grabs labels Apr 5, 2020

ForNeVeR removed the status:up-for-grabs label Sep 26, 2021

ForNeVeR self-assigned this Sep 26, 2021

ForNeVeR added a commit that referenced this issue Sep 26, 2021

(#102) Database: use EFCore to initialize a database

2ca51af

ForNeVeR mentioned this issue Sep 26, 2021

Telegram message processing refactoring #145

Closed

ForNeVeR added a commit that referenced this issue Sep 30, 2021

(#102) Database: use EFCore to initialize a database

04d6346

ForNeVeR mentioned this issue Sep 30, 2021

HTTP redirector service for Telegram content #147

Closed

ForNeVeR added a commit that referenced this issue Oct 2, 2021

(#102) Emulsion: update Configuration dependencies version

e671c7b

Microsoft.Extensions.Configuration.Abstractions package is referenced by the platform anyway.

ForNeVeR added a commit that referenced this issue Oct 2, 2021

(#102) LinkGenerator: extract to a separate module

85a2459

ForNeVeR added a commit that referenced this issue Oct 2, 2021

(#102) LinkGenerator: extract to a separate module

4816be7

ForNeVeR added a commit that referenced this issue Oct 3, 2021

(#102) LinkGenerator: migrate to async

1177910

ForNeVeR added a commit that referenced this issue Oct 3, 2021

(#102) LinkGeneratorTests: test basic generation

2e063dc

ForNeVeR added a commit that referenced this issue Oct 5, 2021

(#102) ContentProxy: add an empty project

c3ab60d

ForNeVeR added a commit that referenced this issue Oct 5, 2021

(#102) LinkGenerator: more test stubs

68b4d18

ForNeVeR added a commit that referenced this issue Oct 10, 2021

(#102) LinkGenerator: more test stubs

c1f5cdc

ForNeVeR added a commit that referenced this issue Oct 10, 2021

(#102) LinkGenerator: more test stubs

a65aa70

ForNeVeR mentioned this issue Nov 7, 2021

Telegram content proxy: messages deleted from Telegram #149

Open

1 task

ForNeVeR added a commit that referenced this issue Jul 31, 2022

(#102) Telegram, Messaging: extract to separate projects

0c7e5b6

ForNeVeR mentioned this issue Jul 31, 2022

(#102) Telegram, Messaging: extract to separate projects #162

Merged

ForNeVeR added a commit that referenced this issue Aug 6, 2022

(#102) Web: add a dependency on Telegram link resolver

9eb8f30

ForNeVeR mentioned this issue Aug 7, 2022

Telegram content proxy #163

Merged

15 tasks

ForNeVeR added a commit that referenced this issue Aug 7, 2022

(#102) ContentProxy: add a FileCache (WIP)

55137e9

ForNeVeR added a commit that referenced this issue Aug 19, 2022

[WIP] (#102) Content Proxy: add initial cache settings

1270ab1

ForNeVeR added a commit that referenced this issue Aug 20, 2022

[WIP] (#102) FileCache: quick code sync with a broken MailboxProcessor

a762d65

ForNeVeR added a commit that referenced this issue Aug 20, 2022

(#102) ContentProxy: finally, make it compile

52e8715

ForNeVeR added a commit that referenced this issue Aug 20, 2022

(#102) FileCacheTests: preliminary test API

4966c7e

ForNeVeR added a commit that referenced this issue Aug 20, 2022

(#102) TestFramework: extract the code from TestUtils

37fed71

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) ContentProxy: finish working FileCache

0e2be47

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) ContentProxy: finish working FileCache

4c9895f

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) ContentProxy: add a FileCache

afbe2f8

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) ContentProxy: finally, make it compile

800546a

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) FileCacheTests: preliminary test API

3e7b422

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) TestFramework: extract the code from TestUtils

97db22a

ForNeVeR added a commit that referenced this issue Aug 21, 2022

(#102) ContentProxy: finish working FileCache

4d6bfc5

ForNeVeR added a commit that referenced this issue Aug 25, 2022

(#102) FileCacheTests: implement an ordering test

a5e26a1

ForNeVeR added a commit that referenced this issue Aug 25, 2022

(#102) FileCache: cache directory validation tests

6ea4892

ForNeVeR added a commit that referenced this issue Aug 27, 2022

(#102) FileCache: additional tests

cb218b6

ForNeVeR added a commit that referenced this issue Aug 27, 2022

(#102) FileCache: finish the last tests

b50d615

ForNeVeR added a commit that referenced this issue Aug 27, 2022

(#102) ContentController: test redirect mode

9100471

ForNeVeR added a commit that referenced this issue Aug 27, 2022

(#102) ContentController: last test groundwork

107c4be

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) FileCache: async stream optimization

e8e8153

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) ContentController: add last tests

067da2d

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) ContentController: make it work in manual tests

9292428

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) ContentProxy: some small fixes

b02512c

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) ContentProxy: add file names and MIME types

5d954d6

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) CI: upgrade to a new Windows version

27d0db9

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) FileCache: support older versions of Windows

3977248

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) FileCache: drop redundant rec

a58f54e

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) FileCacheTests: more workarounds for Windows

6952a9f

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) FileCache: improve the workarounds for the older versions of Windows

2a3c54d

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) FileCache: improve the workarounds for the older versions of Windows

b2cccee

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) ContentProxy: redesign the attribute optionality

f6bcfff

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) ContentProxy: redesign the attribute optionality

2861ee8

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) Settings: update the example

1735749

ForNeVeR added a commit that referenced this issue Aug 28, 2022

(#102) Settings: update the example

7936682

ForNeVeR closed this as completed in #163 Aug 28, 2022
