This repository has been archived by the owner on Feb 9, 2023. It is now read-only.

Save share on file rather in memory and allow share index update #9

Open
drd33m opened this issue Feb 23, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@drd33m

drd33m commented Feb 23, 2020

Hi, I know I already have another issue open, but I'm putting this here for when you have time.

I would like files.xml.bz2 to be stored on disk in the directory the bot runs from, since it's not viable to index large shares every time the bot starts up. Another useful feature would be the ability to force a re-index when files are added.

@drd33m drd33m changed the title Save share on file rather in memory Save share on file rather in memory and allow share index update Feb 23, 2020
@aler9 aler9 added the enhancement New feature or request label Feb 23, 2020
@drd33m
Author

drd33m commented Mar 25, 2020

UPDATE: I thought I should explain this in more detail, since my original description is very vague.

When indexing large folders (>40 GB), indexing takes a long time, and the memory usage is not viable for very large shares: with 20 GB indexed, memory usage was already at 1.5 GB, so there would be no way to index large folders without running out of memory. I'm not sure how to reduce the memory used by indexing, as I have little experience in Go.

To avoid reindexing the share every time the bot starts, the file list could be saved to disk in the filelist.xml.bz2 format. Loading this file into memory at startup (and then calling ShareUpdate if a filelist.xml.bz2 is detected), as well as after each share refresh, would also avoid thrashing the disk.
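As a rough illustration of the loading half, here is a minimal sketch assuming the usual DC++ files.xml layout (a FileListing root containing nested Directory and File elements with Name, Size and TTH attributes). The type and function names are hypothetical, not part of the library. Note that Go's standard compress/bzip2 package only implements decompression, so writing the .bz2 file back to disk would need a third-party encoder.

```go
package main

import (
	"bytes"
	"compress/bzip2"
	"encoding/xml"
	"io"
	"os"
)

// File is one <File> element of a DC++-style file list (assumed layout).
type File struct {
	Name string `xml:"Name,attr"`
	Size int64  `xml:"Size,attr"`
	TTH  string `xml:"TTH,attr"`
}

// Directory is one <Directory> element; directories can nest.
type Directory struct {
	Name  string      `xml:"Name,attr"`
	Dirs  []Directory `xml:"Directory"`
	Files []File      `xml:"File"`
}

// FileListing is the root element. Unknown attributes such as Version are ignored.
type FileListing struct {
	XMLName xml.Name    `xml:"FileListing"`
	Dirs    []Directory `xml:"Directory"`
	Files   []File      `xml:"File"`
}

// decodeFileList decodes an uncompressed XML listing from r.
func decodeFileList(r io.Reader) (*FileListing, error) {
	var fl FileListing
	if err := xml.NewDecoder(r).Decode(&fl); err != nil {
		return nil, err
	}
	return &fl, nil
}

// ParseFileList decodes an uncompressed listing held in memory.
func ParseFileList(data []byte) (*FileListing, error) {
	return decodeFileList(bytes.NewReader(data))
}

// LoadFileList streams a bzip2-compressed listing from disk, so the
// decompressed XML is never held in memory as one big buffer.
func LoadFileList(path string) (*FileListing, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err // e.g. the file does not exist yet on first start
	}
	defer f.Close()
	return decodeFileList(bzip2.NewReader(f))
}

// CountFiles returns the number of files, including nested directories.
func (fl *FileListing) CountFiles() int {
	n := len(fl.Files)
	for _, d := range fl.Dirs {
		n += d.countFiles()
	}
	return n
}

func (d Directory) countFiles() int {
	n := len(d.Files)
	for _, sub := range d.Dirs {
		n += sub.countFiles()
	}
	return n
}
```

At startup, an error satisfying `errors.Is(err, fs.ErrNotExist)` would mean falling back to a full index; otherwise the loaded listing could seed a rescan instead of re-hashing everything.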

From what I understand of the code, there is no way to refresh the file list; I only see ShareAdd and ShareDel. ShareAdd causes the whole file list to be reindexed, which is not viable for large shares. I propose adding a ShareUpdate function that looks for file updates. This function could be called manually by the user, giving them control over when share updates happen.

I know this is a large addition and might be a bit hard to follow; it's hard for me to convey. In general, I would like to see the file list functionality of normal DC clients added.

UPDATE:

This could also allow you to run multiple bots that share a local files.xml.bz2, instead of every instance indexing its own.

@aler9
Owner

aler9 commented Mar 25, 2020

Hello, I'll handle this feature when I have some free time again, but in the meantime, if you want to help, feel free to try writing a patch yourself.

From what I remember, file indexing can consume a lot of RAM for two reasons:

  • the file list, which can be 100 MB or larger and is kept in RAM, but this isn't the biggest piece of the cake
  • the hash algorithm, whose memory use depends on the file size: it requires 24 bytes (192 bits, one Tiger digest) in RAM for each 1024 bytes, so for a 100M file the consumed RAM is 100 * 1024 * 1024 / 1024 * 24 = 2,457,600 bytes (about 2.4 MB)
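The arithmetic in the second bullet can be written as a small helper. This is a minimal sketch assuming one digest is kept in memory per 1024-byte leaf until the file is done; the digest size is a parameter (a Tiger digest, which TTH is built on, is 24 bytes), and the name LeafHashMemory is hypothetical.

```go
package main

// leafSize is the TTH leaf block size in bytes.
const leafSize = 1024

// LeafHashMemory estimates how many bytes are needed to keep one digest of
// digestSize bytes per leaf for a file of fileSize bytes. A trailing partial
// block still produces a leaf, so the count is rounded up, and an empty file
// is hashed as a single empty leaf.
func LeafHashMemory(fileSize, digestSize int64) int64 {
	leaves := (fileSize + leafSize - 1) / leafSize
	if leaves == 0 {
		leaves = 1
	}
	return leaves * digestSize
}
```

For a 100 MiB file, `LeafHashMemory(100*1024*1024, 24)` gives 2,457,600 bytes, and the estimate grows linearly with file size, which is why very large files dominate indexing memory.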

Then there's the question of progressive share updates, a feature that isn't implemented at all.

So, I'd start by implementing progressive share updates, writing a function ShareRescan() that:

  • scans the share directories
  • detects changed files by size and edit time
  • recomputes their hashes and adds them to the list

@drd33m
Author

drd33m commented Mar 25, 2020

Thank you! I will give the ShareRescan() function a go, but my knowledge of how TTH and the file list structure work is limited, so no promises :).

One thing I forgot to mention: I am running multiple client instances in goroutines. If all 4 of them rescanned at the same time, what would stop them from adding the same file twice? I thought an option to specify the file list name per client would also be handy, so that ShareRescan could be called via a channel and staggered.

@drd33m
Author

drd33m commented Aug 17, 2020

Hey, just bumping this, as it's still a feature I would love to see, since this is the only NMDC/ADC library out there. Maybe the ncdc sources could help: https://g.blicky.net/ncdc.git/tree/src. I can see you're busy, so I hope you can find some room for this. 👍

The main idea is to have a files.xml.bz2 like all DC++ clients. I had rigged together loading a files.xml.bz2 from ncdc, and sending the list itself worked fine; sending the files in that list is where I stopped. This TTH stuff is well beyond my level of knowledge.

2 participants