EDIT: I think this is actually a bug, so I made a report: #866.
I've gotten good use out of this program, so well done to everyone who has helped make it so feature-rich.
I have read the documentation, by the way.
This is what I really want to know: should clone take longer than download, all things being equal? Is it just saving data that download would delete, or could it be pulling more from praw than download mode does?
There are four features that I would find useful for downloading more efficiently, in order of importance:
An option for the maximum total time to spend on a post. It's possible this is no longer needed, but in the past I've had the problem of getting stuck on a post. I should have tried the -v option, but I didn't think of it. Anyway, it's frustrating to leave a program running overnight only to find that it got stuck an hour or so in. It would be nice to be able to say: skip any post that takes longer than 10 minutes, or 1 minute. If it were an option, I would definitely use it, just in case.
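Just to illustrate the behaviour I'm asking for, here's a rough sketch. download_post is a made-up placeholder, not a real bdfr function, and this isn't how bdfr is structured internally:

```python
import concurrent.futures

def download_post(post_id: str) -> None:
    """Made-up placeholder for whatever work bdfr does for one submission."""
    ...

def download_with_timeout(post_id: str, max_seconds: float = 600.0) -> bool:
    """Give up on a post after max_seconds (600 s = 10 minutes)."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(download_post, post_id)
    try:
        future.result(timeout=max_seconds)
        return True
    except concurrent.futures.TimeoutError:
        # The hung worker thread can't be force-killed, but the main loop
        # at least moves on to the next post instead of hanging overnight.
        print(f"skipping {post_id}: took longer than {max_seconds}s")
        return False
    finally:
        pool.shutdown(wait=False)
```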
An inclusive file extension filter. The key with this is that it should completely skip a post unless the url field links to a matching file. As it is, you can blacklist the files you don't want, but when there isn't any file linked, bdfr seems to save a .txt containing the selftext of the post, and I haven't found a way to stop that from happening.
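Roughly the check I have in mind, as a standalone sketch; ALLOWED_EXTENSIONS and the function are my own names (not an existing bdfr option), and the praw submission's url field is what I'd test with it, e.g. post_has_wanted_file(submission.url):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# My own whitelist, purely for illustration.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webm"}

def post_has_wanted_file(url: str) -> bool:
    """True only if the post's url field links to a file with a wanted extension.

    Self posts point back at the reddit permalink and have no extension,
    so they would be skipped entirely instead of producing a selftext .txt.
    """
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in ALLOWED_EXTENSIONS
```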
An option for clone to not archive unless it is also downloading. I use clone, but I only want the archive data when it comes with a file I want. It would be nice to have that option.
As others have suggested, a no-comments option for archive and clone. I don't think this matters that much, though, as I don't think it could actually save you from downloading the comments; it would just amount to deleting them automatically afterwards. But I may be wrong.
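As a workaround, I could run something like this after the fact. It saves no bandwidth (the comments are already fetched, which is exactly my doubt), and it assumes the archive JSON has a top-level "comments" key; check that against your own files, since the layout may differ between bdfr versions:

```python
import json
from pathlib import Path

def strip_comments(archive_dir: str) -> None:
    """Blank out the "comments" field in already-archived JSON files."""
    for path in Path(archive_dir).rglob("*.json"):
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict) and "comments" in data:
            data["comments"] = []
            path.write_text(json.dumps(data, indent=2), encoding="utf-8")

strip_comments("newdata")
```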
The last three suggestions were aimed at reducing bandwidth, to download faster. But I'm not sure whether any of them could actually work that way, or whether they would just avoid saving data that has to be downloaded anyway. I suppose it depends on praw. I don't think they should be added unless they would reduce wasted bandwidth, since you could just process the output with a separate program to get the same result.
End of feature suggestions. The rest are general asides.
I used to have a problem with high RAM usage when running 10 instances, but it improved a lot after switching from no limit to a limit of 12. So that's a tip.
This is how I'm using it, by the way:
```bash
#!/bin/bash
# echo the run label (0 plus the first argument, also passed to -m below)
echo 0$1
# list the existing {POSTID} folders so already-cloned posts are skipped via --exclude-id-file
ls newdata > newdataids
python3 -m bdfr clone --file-scheme "$(date --utc +%Y-%m-%d-%H-%M-%S)-gmt_{POSTID}" --opts new.yaml --disable-module Imgur --skip mp4 --skip gif --skip mkv --exclude-id-file newdataids --user -m 0$1 ~/bdfr/newdata/
```
new.yaml:

```yaml
folder_scheme: "{POSTID}"
sort: new
limit: 12
```

And I run the script in a loop:

```shell
$ while :; do ./script.sh 1; done
```
In my current setup, exclude_id_file isn't having the effect it should. I think it's from using clone mode, because it was working fine before in download mode. And now it does work for downloading, but not for archiving: it will do both the first time, but once the id is excluded, it will only archive. It doesn't really matter much to me, but reddit must be annoyed by me requesting the same data repeatedly.