Overview

Creates a virtual file system from files posted on Telegram.

VERY ALPHA SO FAR

Requirements

  • Linux
  • Python 3.10 (3.9 and earlier are untested)

Installation:

pip install tgmount

Basic usage

To obtain your API id, follow the official manual. Running the program for the first time will require authentication.

$ export TGAPP=1234567:deadbeef0d04a3efe93e1af778773d6f0 TGSESSION=tgfs
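
The export above appears to pack both credentials into one variable as `api_id:api_hash`. A minimal sketch of splitting such a value, assuming that format holds (the helper name `parse_tgapp` is hypothetical and not part of tgmount):

```python
def parse_tgapp(value: str) -> tuple[int, str]:
    """Split an `<api_id>:<api_hash>` string into its two parts.

    Assumption: the id is a decimal integer and the hash is everything
    after the first colon.
    """
    api_id, sep, api_hash = value.partition(":")
    if not sep or not api_id.isdigit() or not api_hash:
        raise ValueError("expected TGAPP in the form <api_id>:<api_hash>")
    return int(api_id), api_hash
```

For example, `parse_tgapp(os.environ["TGAPP"])` would yield the pair that the `client` section of a config file stores as `api_id` and `api_hash`.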

To mount a channel/chat/group

tgmount mount tgmounttestingchannel ~/mnt/tgmount1/

To mount an entity that doesn't have a username you will need to get its id.

tgmount list dialogs | grep 'my friends private chat'

To mount zip files as directories, use the UnpackedZip producer

tgmount mount tgmounttestingchannel ~/mnt/tgmount1/ --producer UnpackedZip

Use a config file to create a more complex vfs structure

tgmount mount tgmounttestingchannel ~/mnt/tgmount1/ --root-config examples/root_config.yaml

Mounting multiple entities

To mount multiple entities, use the mount-config command

tgmount mount-config examples/config.yaml

Sample config

# can be overwritten by --mount-dir argument
mount_dir: /home/horn/mnt/tgmount1

client:
  session: tgfs
  api_id: 123
  api_hash: deadbeed121212121

# 
message_sources:

  ru2chmu:
    entity: ru2chmu
    updates: False
    limit: 1000

  friends:
    entity: -388004022
    limit: 1000
    
caches:
  memory1:
    type: memory
    capacity: 300MB
    block_size: 128KB

root:
  muzach:
    # A document will not be mounted more than once when it appears in
    # different messages. `recursive` means this filter will also be applied
    # down the folder tree
    filter: { filter: OnlyUniqueDocs, recursive: True }
    # Messages from `ru2chmu` will be used to produce content in the nested folders
    source: { source: ru2chmu, recursive: True }
    # creates subfolder named `music`
    music:
      filter: 
        # the directory will contain music and zip archives
        Union: [MessageWithMusic, MessageWithZip]
      # zip archives will be mounted as folders
      producer: UnpackedZip
      # using cache speeds up reading from the archives
      cache: memory1
    texts:
      # messages with text 
      filter: MessageWithText
      # this tells tgmount to treat messages with both a document and text
      # as text files
      treat_as: MessageWithText

  friends:
    source: {source: friends, recursive: True}
    music-by-senders:
      producer:
        # this producer creates a separate directory for every sender in the entity
        BySender:
          dir_structure:
            # these directories will only contain music 
            filter: MessageWithMusic
    liked-music:
      # this directory will contain all music with a thumbs-up reaction
      filter: 
        And: 
          - MessageWithMusic
          - ByReaction: 
              reaction: 👍
    images:
      filter: 
        Union: [MessageWithCompressedPhoto, MessageWithDocumentImage]

Read more about the config structure in Config file structure.

Client commands

tgmount mount

tgmount mount [--filter FILTER] [--root-config ROOT_CONFIG]
[--producer PRODUCER] [--offset-date OFFSET_DATE] [--offset-id OFFSET_ID] 
[--max-id MAX_ID] [--min-id MIN_ID] [--wait_time WAIT_TIME] [--limit LIMIT] 
[--reply-to REPLY_TO] [--from-user FROM_USER] [--reverse] [--mount-texts] [--no-updates] 
[--debug-fuse] [--min-tasks MIN_TASKS] entity mount-dir

Define the structure of the mounted folder with one of these options

--producer PRODUCER
--root-config ROOT_CONFIG

Available producers:

PlainDir    # just a list of files (default)
UnpackedZip # PlainDir but zips are mounted as folders
BySender    # files grouped in folders by sender
ByForward   # forwarded files grouped by source entity
ByPerformer # music grouped by performers
ByReactions # files grouped by reaction
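
As an illustration of what a grouping producer such as BySender does conceptually, here is a minimal sketch. It is not tgmount's implementation: messages are modelled as plain dicts with hypothetical `sender` and `file` keys.

```python
from collections import defaultdict


def group_by_sender(messages):
    """Return {sender: [file names]} for messages that carry a file.

    Assumption: each message is a dict with a `sender` key and an
    optional `file` key holding a file name.
    """
    tree = defaultdict(list)
    for msg in messages:
        if msg.get("file") is not None:
            tree[str(msg["sender"])].append(msg["file"])
    return dict(tree)
```

Each key of the returned mapping corresponds to one directory, and each list to the files inside it.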

The following arguments work as described in TelegramClient.get_messages.

--filter [FILTER]
--offset-date OFFSET_DATE
--offset-id OFFSET_ID
--max-id MAX_ID
--min-id MIN_ID
--wait_time WAIT_TIME
--limit LIMIT
--reply-to REPLY_TO
--from-user FROM_USER
--reverse

Available telegram filters:

InputMessagesFilterDocument
InputMessagesFilterPhotos
InputMessagesFilterVideo
InputMessagesFilterPhotoVideo
InputMessagesFilterUrl
InputMessagesFilterGif
InputMessagesFilterVoice
InputMessagesFilterMusic
InputMessagesFilterRoundVoice
InputMessagesFilterRoundVideo
InputMessagesFilterMyMentions

Using these filters speeds up the fetching process, but they cannot be composed.

If you don't need updates

--no-updates

If you also want to mount text messages as text files

--mount-texts

Other arguments

--debug-fuse
--min-tasks MIN_TASKS

tgmount mount-config

tgmount mount-config [--mount-dir MOUNT_DIR] CONFIG_FILE MOUNT_DIR

tgmount list dialogs

tgmount list dialogs

tgmount list documents

tgmount list documents [--filter FILTER] [--offset-date OFFSET_DATE] [--offset-id OFFSET_ID]
[--max-id MAX_ID] [--min-id MIN_ID] [--wait_time WAIT_TIME] [--limit LIMIT] 
[--reply-to REPLY_TO] [--from-user FROM_USER] [--reverse] [--json]
[--print-message] [--include-unsupported] [--only-unsupported] [--all-types]
[--only-unique-docs] entity

--print-message

Include stringified message object in the output

--all-types

Print all classes a message matches

--only-unique-docs

Exclude repeating documents

--include-unsupported

Include messages that are not supported for mounting

--only-unsupported

Print only messages that are not supported for mounting

--json

Print in json format

tgmount download

tgmount download [--output-dir OUTPUT_DIR] [--keep-filename] [--request-size REQUEST_SIZE] entity ids [ids ...]

--keep-filename

Keep original filenames

--output-dir

Destination folder for files

--request-size

How much data to fetch per request

entity

Entity to download from

ids

Message ids

Example:

tgmount download -O /tmp -R 256KB tgmounttestingchannel 532 11 51 18 

In combination with list documents

tgmount download ru_python $(tgmount list documents ru_python --filter InputMessagesFilterDocument --limit 10 --json | jq  '.[]|.id') -O /tmp
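
The `jq '.[]|.id'` step in the command above implies that `--json` prints an array of objects, each with an `id` field. Under that assumption, the same extraction could be sketched in Python (the function name is hypothetical):

```python
import json


def message_ids(listing_json: str) -> list[int]:
    """Extract the `id` of every record from a JSON array of objects.

    Assumption: the input has the shape implied by `jq '.[]|.id'`,
    i.e. a top-level list whose items each have an `id` key.
    """
    records = json.loads(listing_json)
    return [record["id"] for record in records]
```

The resulting ids could then be passed as the `ids` arguments of `tgmount download`.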

Config file structure

Config file has the following sections:

  • client
  • message_sources
  • caches
  • root

caches section is optional.

Top level properties

# optional. can be overwritten by --mount-dir argument 
mount_dir: ~/mnt/tgmount

client

Contains settings for the telegram client

client:
  # telethon session name 
  session: session_name

  # telegram api credentials
  api_id: int
  api_hash: str

  # optional field
  request_size: 128KB

  # optional field. Default: False
  use_ipv6: True

message_sources

A message source defines a list of messages that will be used to construct the vfs tree. Every message source is a separate TelegramClient.get_messages request. A message source is also subscribed to events for posting, removing and editing messages in the entity it is sourced from.

message_sources:
  # key defines id of the message source to reference in the `root` section 
  source1: 
    # channel/group/chat id to fetch messages from
    # string or int
    entity: tgmounttestingchannel
    
    # all the following fields are optional

    # whether to listen for updates. Default: true
    updates: True

    # Filter for message types. If not set, all message types including text
    # messages will be fetched
    filter: MessageWithMusic

    # limits the number of messages
    limit: 1000

    # format is `31/12/2023` or '31/12/2023 13:00'
    offset_date: '31/12/2023'

    offset_id: 0
    min_id: 0
    max_id: 0
    wait_time: None
    reply_to: int
    from_user: str | int
    reverse: False
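
The two `offset_date` formats mentioned in the comment above can be parsed with the standard library. This is a sketch of one way to accept both, assuming day/month/year order and a 24-hour clock; it is not taken from tgmount's code.

```python
from datetime import datetime

# Try the longer format first so a value with a time part is not rejected.
OFFSET_DATE_FORMATS = ("%d/%m/%Y %H:%M", "%d/%m/%Y")


def parse_offset_date(text: str) -> datetime:
    """Parse `31/12/2023` or `31/12/2023 13:00` into a datetime."""
    for fmt in OFFSET_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported offset_date: {text!r}")
```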

caches

Defines cache storages for documents. Cached parts of a document will not be fetched twice. Usually this is not needed because the OS file system does its own caching. A cache is useful together with the UnpackedZip producer, since the OS file system cache is not applied when this producer is used.

caches:
  # the key defines cache id to be referenced in `root` section
  cache1:
    # currently only memory cache is supported 
    type: memory
    # The size of the cache
    capacity: 300MB
    # optional block size, default: 128KB
    block_size: 256KB
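
To make the relation between `capacity` and `block_size` concrete, the sketch below parses size strings and computes which blocks a read touches. It assumes binary units (1KB = 1024 bytes) and a cache addressed by fixed-size blocks; both are assumptions for illustration, not confirmed details of tgmount's cache.

```python
import re

# Assumption: binary multiples.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert strings such as `300MB` or `128KB` to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def blocks_for_read(offset: int, size: int, block_size: int) -> range:
    """Indices of the blocks covered by reading `size` bytes at `offset`."""
    if size <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return range(first, last + 1)
```

Under these assumptions a 300MB cache with 128KB blocks holds `parse_size("300MB") // parse_size("128KB")` blocks, and only the blocks returned by `blocks_for_read` that are not yet cached would need to be fetched.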

root

This section defines the structure of the mounted folder.

root:
  # optional. sets the message source for the current directory. If this is not
  # set and no recursive filter has been defined before, the folder
  # will not contain any files
  source: source1
  source: {source: source1}

  # sets the message source for the current and for nested folders
  source: {source: source1, recursive: True}
  
  # optional. sets a filter for the current folder. Default is no filter
  filter: MessageWithMusic
  filter: {filter: MessageWithMusic}

  # sets a filter for the current folder and subfolders
  filter: {filter: MessageWithMusic, recursive: True}

  # sets a filter for the current folder and subfolders overwriting another 
  # recursive filter if any 
  filter: {filter: MessageWithMusic, overwright: True, recursive: True}

  # the following combines multiple filters. Only messages that match every filter
  # in the list will pass. The filter below allows all documents that
  # are not video, photo or audio and not a zip file
  filter: 
    - MessageWithOtherDocument
    - Not:
      - ByExtension: .zip

  # in one line
  filter: {filter: [MessageWithOtherDocument, Not: {ByExtension: .zip}], overwright: True, recursive: True}

  # defines a producer that controls the content of the folder. 
  # Default is PlainDir
  producer: BySender

  # producer may have properties
  producer: 
    BySender:
      dir_structure:
        music: 
          filter: MessageWithMusic
        voices: 
          filter: MessageWithVoice
      use_get_sender: true

  # sets a cache for the current folder
  # referencing a cache defined in the `caches` section
  cache: memory1

  # dynamically creates a cache to use in this folder
  cache: 
    type: memory
    capacity: 300MB

  # optional. wrapper that modifies the resulting content of the folder 
  wrapper: ExcludeEmptyDirs

  # optional. Defines which class takes priority when a message matches
  # multiple classes, e.g. a message with both a document and text
  treat_as: MessageWithText

  # to define subfolders
  documents:
    # 'documents' folder will only contain the two following subfolders 
    docs_from_source1:
      source: source1
      filter: MessageWithDocument
    docs_from_source2:  
      source: source2
      filter: MessageWithDocument

source

A message source is a list of messages used to produce directory content. It is initialized from a get_messages() request and is updated by events for posting, removing and editing messages in the corresponding entity.

A producer is subscribed to a message source and takes care of the directory it is responsible for, adding and removing files and subfolders as needed.

The content of a folder is defined by a combination of properties source, filter, producer and treat_as.

This will create a tree of empty folders

root:
  everything:
  photos:
  texts:
  round-and-voice:
    rounds:
    voices:

This config results in

/everything
/photos
/texts
/round-and-voice
/round-and-voice/rounds
/round-and-voice/voices
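
The mapping from the nested config to the folder list above can be sketched as a recursive walk. This simplified version treats every key as a folder name, which holds for the empty tree shown here but not for configs that also contain properties such as `source` or `filter`; the function name is hypothetical.

```python
def folder_paths(tree, prefix=""):
    """List the folder paths described by a nested mapping.

    Assumption: every key is a folder and an empty folder is
    represented by None (as an empty YAML value would be loaded).
    """
    paths = []
    for name, subtree in (tree or {}).items():
        path = f"{prefix}/{name}"
        paths.append(path)
        paths.extend(folder_paths(subtree, path))
    return paths


root = {
    "everything": None,
    "photos": None,
    "texts": None,
    "round-and-voice": {"rounds": None, "voices": None},
}
```

Calling `folder_paths(root)` returns the six paths in the same order as the listing above.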

To fill the directories with files we need to specify a source for every folder that is supposed to contain files

root:
  everything:
    source: source1 
  photos:
    source: source1
  texts:
    source: source1
  round-and-voice:
    rounds:
      source: source1
    voices:
      source: source1

As a result, every directory that has a source property will contain all the files from the specified source.

Let's add filters

root:
  everything:
    # don't need filter here
    source: source1
  photos:
    source: source1
    filter: MessageWithCompressedPhoto
  texts:
    source: source1
    filter: MessageWithText
    treat_as: MessageWithText
  round-and-voice:
    rounds:
      source: source1
      filter: MessageWithKruzhochek
    voices:
      source: source1
      filter: MessageWithVoice

Since the only source used in the structure is "source1", we can avoid repeating it by using the recursive property of source.

root:
  source: {source: source1, recursive: True}
  everything:
    filter: All
  photos:
    filter: MessageWithCompressedPhoto
  texts:
    filter: MessageWithText
    treat_as: MessageWithText
  round-and-voice:
    rounds:
      filter: MessageWithKruzhochek
    voices:
      filter: MessageWithVoice

Note that

  1. The root itself will not contain any files because a source with the recursive flag doesn't trigger file production.
  2. We had to specify a filter in "everything" to trigger the file producer. For the same effect we could have specified a producer instead:

everything:
  # triggers producing from the recursive source
  producer: PlainDir

The complete rules:

A folder is populated with content from a message source when any of the following holds:

  1. source is specified and it is not recursive
  2. a recursive source is in the context and a filter property is specified that is not recursive
  3. a recursive source is in the context and a producer property is specified
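
The three rules can be restated as a small predicate. This is a sketch for reasoning about configs, not tgmount's code: the `Folder` class and its boolean fields are hypothetical, and whether a recursive source is "in the context" is left for the caller to decide.

```python
from dataclasses import dataclass


@dataclass
class Folder:
    """Which properties a folder's config sets (hypothetical model)."""
    source: bool = False
    source_recursive: bool = False
    filter: bool = False
    filter_recursive: bool = False
    producer: bool = False


def produces_content(folder: Folder, recursive_source_in_context: bool) -> bool:
    # Rule 1: an explicit, non-recursive source.
    if folder.source and not folder.source_recursive:
        return True
    # Rule 2: a recursive source in context plus a non-recursive filter.
    if recursive_source_in_context and folder.filter and not folder.filter_recursive:
        return True
    # Rule 3: a recursive source in context plus a producer.
    if recursive_source_in_context and folder.producer:
        return True
    return False
```

For the recursive example above, the root (`Folder(source=True, source_recursive=True)` with no recursive source in its context) yields `False`, while `everything` (`Folder(filter=True)` with a recursive source in context) yields `True`.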

filter

By message type:

MessageWithDocument # Message with a document attached (message with compressed
#  image doesn't match) 
MessageWithCompressedPhoto # with a compressed image (photo)
MessageDownloadable # `MessageWithDocument` or `MessageWithCompressedPhoto`
MessageWithAnimated # stickers, gifs
MessageWithAudio # voices and music
MessageWithVoice # voice
MessageWithKruzhochek # round video
MessageWithDocumentImage # uncompressed image
MessageWithFilename # document with a filename attribute
MessageWithMusic # music
MessageWithVideo # round video, video documents, stickers, gifs
MessageWithVideoFile # video documents
MessageWithSticker # sticker
MessageWithOtherDocument # Any document that doesn't fall in the previous categories
MessageWithZip # zip file
MessageWithText # message with text
MessageWithoutDocument # message with no document and no photo
MessageWithReactions # message with reactions
MessageForwarded # forwarded message

# Telegram filters
InputMessagesFilterPhotos     # MessageWithCompressedPhoto
InputMessagesFilterVideo      # MessageWithVideo
InputMessagesFilterPhotoVideo # MessageWithCompressedPhoto | MessageWithVideo
InputMessagesFilterDocument   # MessageWithOtherDocument | MessageWithDocumentImage
InputMessagesFilterGif        # MessageWithAnimated
InputMessagesFilterVoice      # MessageWithVoice
InputMessagesFilterMusic      # MessageWithMusic
InputMessagesFilterRoundVideo # MessageWithKruzhochek
InputMessagesFilterRoundVoice # MessageWithKruzhochek | MessageWithVoice

Other filters

# Filter wrapper to reverse a filter. 
Not: MessageWithReactions

# Combines multiple filters. Matches if any of them matches
Union: 
  - MessageWithDocumentImage
  - MessageWithCompressedPhoto

# Combines multiple filters. Matches only if every one of them matches
And:
  - MessageForwarded
  - MessageWithVideo

# same as
filter: [MessageForwarded, MessageWithVideo]

# takes first `count` messages
First:
  count: 10

# takes last `count` messages
Last:
  count: 10

# Filter by a filename extension
ByExtension: .zip

# will only leave unique docs
OnlyUniqueDocs:
  # optional. Controls which document stays: the first or the last to appear.
  # default: first
  picker: last 
  picker: first 

# passthrough filter. Used to trigger tgmount to produce content in the folder
# or to reset recursive filter
All

# sequentially filters messages. E.g. last 10 unique documents 
Seq:
  - MessageWithDocument
  - OnlyUniqueDocs
  - Last: 10

# matches reactions
ByReaction:
  reaction: 👍
  # optional. default: 1
  minimum: 5
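
The semantics of the composite filters can be illustrated with plain functions over a list of messages. This is a conceptual sketch only: messages are dicts with a hypothetical `doc_id` key, and none of the names below come from tgmount.

```python
def where(pred):
    """Turn a per-message predicate into a list filter."""
    return lambda msgs: [m for m in msgs if pred(m)]


def union(*preds):  # like Union: any predicate matches
    return where(lambda m: any(p(m) for p in preds))


def and_(*preds):  # like And: every predicate matches
    return where(lambda m: all(p(m) for p in preds))


def not_(pred):  # like Not: reverse a predicate
    return where(lambda m: not pred(m))


def first(count):
    return lambda msgs: msgs[:count]


def last(count):
    # msgs[-0:] would return the whole list, so guard count == 0.
    return lambda msgs: msgs[-count:] if count > 0 else []


def only_unique_docs(picker="first"):
    """Keep one message per document: the first or the last to appear."""
    def run(msgs):
        ordered = msgs if picker == "first" else list(reversed(msgs))
        seen, kept = set(), []
        for m in ordered:
            if m["doc_id"] not in seen:
                seen.add(m["doc_id"])
                kept.append(m)
        return kept if picker == "first" else list(reversed(kept))
    return run


def seq(*filters):  # like Seq: apply filters one after another
    def run(msgs):
        for flt in filters:
            msgs = flt(msgs)
        return msgs
    return run


def has_document(m):
    return m["doc_id"] is not None
```

With these helpers, `seq(where(has_document), only_unique_docs(), last(10))` mirrors the `Seq` example above: keep messages with a document, drop repeated documents, then take the last ten.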

producer

PlainDir
BySender
ByForward
ByPerformer
ByReactions
SysInfo
UnpackedZip

Playing flac and mp3 from a zip archive

  1. Seeking in files stored in a zip archive only works by reading all the bytes up to the offset.
  2. id3v1 tags are stored at the end of a media file :) https://github.com/quodlibet/mutagen/blob/master/mutagen/id3/_id3v1.py#L34

Most players try to read these tags, so just adding an mp3 or flac to a player will fetch the whole file from the Telegram cloud.

Currently this is solved by a custom read function for mp3 and flac files in archives. The read call returns 4096 zero bytes when

  1. less than max_total_read = 128KB has been read from the file so far
  2. file_size - offset < distance_to_file_end = 16KB
  3. size == 4096 (players usually read this amount when looking for id3v1; further investigation is needed to find a less hacky way)

See FileContentZipFixingId3v1 class
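
The conditions above can be sketched as a predicate. This is not the FileContentZipFixingId3v1 class itself: it assumes all three conditions must hold at once and that 1KB means 1024 bytes, and the function names and the `fetch` callback are hypothetical.

```python
KB = 1024  # assumption: binary kilobytes
MAX_TOTAL_READ = 128 * KB
DISTANCE_TO_FILE_END = 16 * KB
PROBE_SIZE = 4096


def should_return_zeros(total_read: int, file_size: int, offset: int, size: int) -> bool:
    """True when a read looks like a player probing for an id3v1 tag."""
    return (
        total_read < MAX_TOTAL_READ
        and file_size - offset < DISTANCE_TO_FILE_END
        and size == PROBE_SIZE
    )


def read(fetch, total_read: int, file_size: int, offset: int, size: int) -> bytes:
    """Answer a suspected tag probe with zeros, otherwise fetch real data."""
    if should_return_zeros(total_read, file_size, offset, size):
        return bytes(size)  # `size` zero bytes, nothing is downloaded
    return fetch(offset, size)
```

The point of the design is that a probe near the end of a freshly opened file is answered locally, so the archive does not have to be read all the way up to that offset.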

To disable this behavior, use the --no-fix-id3v1 argument with the mount command. When mounting from a config, set the fix_id3v1 property of UnpackedZip to False:

producer: {UnpackedZip: {fix_id3v1: False}}

Known bugs

  • No updates received during reconnection
  • Combination of --filter, --offset-date and --reverse always returns empty result