Added Score and Awards #3

Open
profucius opened this issue Dec 19, 2022 · 11 comments

Comments

@profucius

profucius commented Dec 19, 2022

Hi again, just a quick request here. I'm not yet versed in pull requests on GitHub, or else I would submit this as one instead.

I wanted to also grab Score and Awards from my saved reddit posts. These are useful datapoints when sorting in the Dataview query. Would you be willing to add those to the script for future updates? I do have it working successfully on my end.

I added the bottom two lines at the end of this section of code (although I don't think the placement matters):

	for(let child in obj.data.children) {
		let thisSub = obj.data.children[child].data.subreddit;
		let thisAuthor = obj.data.children[child].data.author;
		let thisUrl = 'https://www.reddit.com' + obj.data.children[child].data.permalink;
		let thisId = obj.data.children[child].data.id;
		let thisTimestamp = new Date(obj.data.children[child].data.created_utc * 1000);
		let thisScore = obj.data.children[child].data.score;
		let thisAwards = obj.data.children[child].data.total_awards_received;

And I added the bottom two lines at the end of this section (although I don't think the placement matters):

		let thisNote = '---'
			+ '\nsub: ' + thisSub
			+ '\ntitle: ' + escChars(thisTitle)
			+ '\nscore: ' + thisScore
			+ '\nawards: ' + thisAwards
@erohtar
Owner

erohtar commented Dec 20, 2022

Hi there, Thank you for these snippets!

They are undoubtedly very useful additions. And I was about to merge these with the main code when it occurred to me why I didn't add these in the first place - they are only true at any given point in time. The awards and score of a post can/will be different based on the time the script downloads it.

And while it's true that an old post would likely not have those numbers move very much, it's also unlikely that the majority of posts/comments a user is currently saving would be inactive ones.

While it's possible to refresh data in previously downloaded posts going all the way back (and not just last 25), it's way too much overhead and runs the risk of corrupting any user edited content in those notes/files.

If you could show me a way around these concerns, I'm happy to merge your changes here.

@profucius
Author

profucius commented Dec 20, 2022

Thanks for replying. I have a few questions so I can understand more about what you're mentioning here.

The awards and score of a post can/will be different based on the time the script downloads it.

This is precisely why I want to include the Score and Awards in my saved posts. I save a lot of things on Reddit (probably 10-20 per day minimum) and now I have your script running on an hourly basis to ensure I do not miss grabbing any. However, the downside of this method is that I am only capturing "fresh" data; I want to be able to capture "mature" data, after the post has had, say, a week to gather 90% of its upvotes and awards. Then I use a customized Dataview query to show all the reddit posts sorted by the highest Scores, and another one for the highest Award count. This is invaluable for sifting through the large dataset, and is why I want them included in my capturing.

runs the risk of corrupting any user edited content in those notes/files.

I was unaware this would be possible, do you mean in regards to the difference of HDD users vs SSD users? Afaik the OS reads and writes data far faster and more frequently than your script does, so I am not sure as of now what increased risk of corruption your script would pose were you to include the Score and Awards per saved post.

Also, if a user uses the overwrite = 0 setting, that would avoid any risk mentioned.

it's possible to refresh data in previously downloaded posts going all the way back (and not just last 25)

I'm very curious about this! I thought that 25 was the hard limit imposed by Reddit and there was no way around it. Is this something that I could try for myself, as a power user?

@erohtar
Owner

erohtar commented Dec 20, 2022

I was unaware this would be possible, do you mean in regards to the difference of HDD users vs SSD users?

Oh no nothing that complicated. I'm just saying if I download and recreate new .md files for old posts, if the user had made edits to their notes, they'd get overwritten. And if I just try to edit the YAML, I might accidentally corrupt something.

Also, if a user uses the overwrite = 0 setting, that would avoid any risk mentioned.

Yes, but that would also mean I'm not touching old note files in any way, including updating awards/scores.

I'm very curious about this! I thought that 25 was the hard limit imposed by Reddit and there was no way around it. Is this something that I could try for myself, as a power user?

No, that actually is a hard limit. What I meant was traversing the reddit folder, reading all the note files, picking up the link from each, and trying to grab the award/comments from the post itself. It's a ton of effort and overhead, with gains that I can't justify. Also, we again run into the issue of editing existing notes.
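
For illustration only, the "read each note and pick up the link" step on its own could be sketched roughly as below. The `url` frontmatter key and the helper name `readFrontmatterField` are assumptions made for this example; the snippets in this thread don't show which key the script writes the permalink to.

```javascript
// Hypothetical helper: pull one key out of a note's leading '---' frontmatter block.
// Assumes simple single-line `key: value` pairs, as in the snippets above.
function readFrontmatterField(noteText, key) {
	const lines = noteText.split(/\r?\n/);
	if (lines[0].trim() !== '---') return null; // note has no frontmatter
	for (let i = 1; i < lines.length; i++) {
		const line = lines[i];
		if (line.trim() === '---') break; // end of the frontmatter block
		const sep = line.indexOf(':');
		if (sep === -1) continue;
		if (line.slice(0, sep).trim() === key) {
			return line.slice(sep + 1).trim();
		}
	}
	return null;
}

// Made-up note content; the 'url' key is an assumption.
const sampleNote = [
	'---',
	'sub: ObsidianMD',
	'url: https://www.reddit.com/r/ObsidianMD/comments/abc123/example/',
	'---',
	'Note body that the user may have edited.',
].join('\n');

console.log(readFrontmatterField(sampleNote, 'url'));
```

This only reads the note, so it doesn't touch user edits; the costly parts remain the extra request per note and writing the result back.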

I save a lot of things on Reddit (probably 10-20 per day minimum) and now I have your script running on an hourly basis to ensure I do not miss grabbing any

That's truly very impressive!

So right now I can't think of any solution to include award/score data that's all around great - and I'm saying that with a lot better understanding of your particular use case and how important those data points would be for you.

Just brainstorming with you now: let's say the scores/awards were saved in an entirely separate file that would get updated for all the entries on each run. Would Dataview somehow be able to read scores/awards from this file, and the rest of the data like it does currently? This solution, if possible, would also avoid editing old note files, while still keeping the scores updated.

@profucius
Author

profucius commented Dec 20, 2022

if the user had made edits to their notes, they'd get overwritten
that would also mean I'm not touching old note files in any way, including updating awards/scores.

I totally understand the concern for this, in fact in the beginning I was sure that I would want my Overwrite=0 so that any edits I made to the file wouldn't be overwritten. But the longer I thought about it, the more I considered that I would rather let the saved reddit file be a source of absolute truth to the source material, allowing any edits the poster/commenter made, and now any score/awards updates. I am going to be pulling data out of these files into my own note files, referencing and using plugins that create "blocks" from other files. This way, it doesn't matter if the file is updated/overwritten, my separate notes of those files will not.

This is the ethos I am working from here, and I know that I'm not the only one, as this is considered part of the Zettelkasten method: to slip everything into a box, and reference from it as a source of always-updated truth. So I hope that helps you understand why I am requesting the Score and Awards be included, although I can also understand it from your perspective too.


Just brainstorming with you now: let's say the scores/awards were saved in an entirely separate file that would get updated for all the entries on each run. Would Dataview somehow be able to read scores/awards from this file, and the rest of the data like it does currently? This solution, if possible, would also avoid editing old note files, while still keeping the scores updated.

This is a very interesting idea! Unfortunately I don't know enough about Dataview to know if this would be possible. If it were to be possible, however, I would love to know about it, as I could use this concept in other areas of my vault!


Here is a proposal, if you are still on the fence: Would you include the snips from my first post in your script, dropped in place but commented out? This would give new users, and myself, the freedom to uncomment them so that we can use these features on our own, without needing to manually edit your script to (re)add them any time you update the original script in the future. I can write myself an automation to handle that any time the modification date on the .js file changes, so that the features are always available to me.
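
As a rough sketch of a variation on the same idea, the two lines could sit behind on/off settings instead of being commented out. The names `includeScore`, `includeAwards`, and `optionalFrontmatter` are made up for this example and are not in the script; the 0/1 style just mirrors the existing overwrite setting.

```javascript
// Hypothetical settings; these names are not in the original script.
const includeScore = 1;  // 1 = write 'score' to frontmatter, 0 = skip it
const includeAwards = 0; // 1 = write 'awards' to frontmatter, 0 = skip it

// Builds only the optional frontmatter lines, in the same '\nkey: value' style
// as the thisNote snippet in the first post.
function optionalFrontmatter(data, opts) {
	let out = '';
	if (opts.includeScore) out += '\nscore: ' + data.score;
	if (opts.includeAwards) out += '\nawards: ' + data.total_awards_received;
	return out;
}

// Made-up sample standing in for obj.data.children[child].data
const sampleData = { score: 1234, total_awards_received: 3 };
const thisNote = '---'
	+ '\nsub: ' + 'ObsidianMD'
	+ optionalFrontmatter(sampleData, { includeScore, includeAwards })
	+ '\n---';
console.log(thisNote);
```

With both settings at 0 the frontmatter comes out exactly as it does today, so the default behaviour would not change for existing users.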

Perhaps in a future update it would be possible to have the script be customizable with some type of UI that allows for checkboxing features on and off, so that users can customize their own frontmatter choices. But I understand this is not something that happens overnight, and so my pre-commented-out proposal could be a stop-gap in the meantime. What do you think?

@erohtar
Owner

erohtar commented Dec 20, 2022

But the longer I thought about it, the more I considered that I would rather let the saved reddit file be a source of absolute truth to the source material

That's actually very risky. I'm not sure if you considered this, but posts and comments get deleted all the time on reddit. If a note was created fine initially, but some time later I go back to it and it turns out the content got wiped when the script tried to refresh it to the latest copy, I'm sure any user would be upset.

So even if you don't want to make any edits to your downloaded saves, it's still a good idea to keep overWrite=0

Unfortunately I don't know enough about Dataview to know if this would be possible. If it were to be possible, however, I would love to know about it, as I could use this concept in other areas of my vault!

That's the thing - I've found the same questions about Dataview keep getting asked again and again by newer users because its documentation is sorely lacking. A pity, given how amazing the plugin is. Anyway, if you happen to find out whether this is possible, I'm willing to look into adding this feature.

What do you think?

Well, even if I were to add those lines, and the user kept overwrite at 1, and say no posts got deleted, we'd still only be refreshing the scores of the top 25 posts on each run. Once you have saved a fresh set of 25, which for you happens much sooner than for others, the scores of the older posts are stuck at their old values. So you see, even if we were to ignore all those downsides, the upside is severely handicapped.
That's why I'm more inclined toward finding out the feasibility of the other solution.

@profucius
Author

profucius commented Dec 20, 2022

posts and comments get deleted all the time on reddit

I did consider this in setting up my system, and I settled on using a combination of the Obsidian Core plugin "File recovery" with a backup of online reddit archiving sites to find text that was removed. I also take nightly snapshots of my Obsidian vault to an external drive. It might seem like more work than it's worth, and technically it is if compared to your idea of splitting off the dynamic data (scores, awards, etc) to be updated in a separate file somehow. If we can figure out a way to implement that idea, that will definitely eliminate the need to overwrite with changes.


I'll give it some thought as to how this might be possible. Would you find it acceptable if it were possible by leaning on another plugin to service the function? That is, to allow for writing certain frontmatter to a separate file, which could then be accessed by Dataview in some way.

I am brainstorming here, perhaps your script could have a function added that writes to this separate file the dynamic data (score, awards, etc) and then another function that calls this file and checks the newer score/awards data using the public reddit api (?) to then update those fields in the dynamic file. This would then be read by Dataview or another plugin to create a single view of data, between the static reddit saved md files, and the dynamic score/awards/etc. Thoughts?

@erohtar
Owner

erohtar commented Dec 21, 2022

I settled on using a combination of the Obsidian Core plugin "File recovery" with a backup of online reddit archiving sites to find text that was removed. I also take nightly snapshots of my Obsidian vault to an external drive

That's a very elaborate backup setup you got going there. And while it's commendable and motivating, I can't expect other users to be so committed to the cause - they'll likely lose some of their important saves and would not be very happy about it.

I'll give it some thought as to how this might be possible. Would you find it acceptable if it were possible by leaning on another plugin to service the function? That is, to allow for writing certain frontmatter to a separate file, which could then be accessed by Dataview in some way.

I'm open to considering anything. If it adds positive value with a reasonable effort from my and the end users' side, then it should be worth adding. Let's see what you come up with, though you might want to look into using DataviewJS first: I've seen people do very creative things with it, and it doesn't require leaning on another plugin.

@profucius
Author

profucius commented Dec 21, 2022

I think I may have found a solution already! I am still exploring it, and I need to write up some documentation for you on the steps I followed to create this example. In the meantime, here is the summary:

I'm using an Obsidian plugin called Database Folders. This allows for creating Notion-like databases as a file within Obsidian, which can then reference your Obsidian vault via a Dataview query, or simply reference the Tags, etc. It's still in early development, but is already quite far along feature-wise, and I can see the potential in using this to implement a working system with your script.

The very general summary of how this works is that I've created a Database Folder file that references two files which would be generated by your script: a Content file and a Metadata file. In this system, your script would generate two files per piece of reddit content saved, and put them in separate folders under the same root folder. One folder could be called "Content" and the other "Metadata", for example.

The files that your script saves to the Metadata folder contain only the frontmatter of the reddit content being saved. The opposite is true for the files being saved to the Content folder, no frontmatter, only the reddit content in the body. However, the caveat is that the Metadata file will also have a line in the body (below frontmatter section) which the Database Folders plugin will use for the "Reference" field.

This is very similar to how Notion works, if you've ever used it: you have two tables, one references the other, and then you can do things like Rollup fields from one into the other.

So now, we create a Database Folder file, which is simply a Dataview query that references the Metadata folder. This makes it possible to create column headers in the table of all the frontmatter fields. There is also this Reference field, which then references the text in the body of the Metadata file, which can be generated text by your script in the same format as the plugin expects.

The end result here is that your script automatically generates the two files (Content, and Metadata). Then, inside the Database Folders plugin, it loads this table of Metadata, and has a column which serves as a link to open the Content itself.

This means that all of the Reddit content itself can be Overwrite=0, but all of the metadata of that post is handled in a separate file (named exactly the same in a cousin folder) which can be Overwrite=1 to be updated at any time.

Even better, the Metadata file can have some unique ID for that post, within the frontmatter, so that a different part of your script can be written to grab the updated Metadata via the Reddit API, making it possible to update any of the Metadata across all saved content, and not be limited to just 25 items.

After typing this out, I realize how complex it may seem simply from explanation. If it helps you understand, I can prepare a downloadable vault with the demonstration, so you can play around with the idea. Also I can create a screen capture.

What are your thoughts so far?

@erohtar
Owner

erohtar commented Dec 22, 2022

No, you did a great job explaining it, and I'm with you partially on this. The part I'm not willing to go with is moving ALL frontmatter to separate files and refreshing them on each run, for the same reasons as earlier:

  • Deleted posts would risk losing frontmatter if refreshed
  • Still only last 25 would get refreshed

And the added reason:

  • Doubles the number of files saved per note, which I find unappealing so far, though it's hard to explain why

I've not used Notion myself though I hear of it all the time, and Database Folders is a plugin I've been meaning to take for a spin since last week or so, so thank you for bringing it up.

Here's my take on how this whole thing would work ideally:

  • The note files work exactly like they work right now
  • There is just one more file (maybe in json but that's flexible), that has awards/score data for each post/comment saved
  • This file could be (optionally) entirely refreshed on each run and carries no risk of affecting the YAML or note contents in case anything breaks or is deleted on reddit
  • The Dataview table (somehow) would read the above file along with the individual files and display the results
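
A minimal sketch of what that one extra file might hold, assuming it is keyed by post/comment id. The function name `updateScoreStore`, the `updated` field, and the overall layout are placeholders for illustration, not a settled format.

```javascript
// Merge fresh score/award numbers into a store keyed by post/comment id.
// `store` is a plain object that would be loaded from, and saved back to, one JSON file.
function updateScoreStore(store, children, now = new Date()) {
	const next = { ...store }; // entries not in this run are kept as they were
	for (const child of children) {
		const d = child.data;
		next[d.id] = {
			score: d.score,
			awards: d.total_awards_received,
			updated: now.toISOString(),
		};
	}
	return next;
}

// Made-up sample shaped like obj.data.children from the snippets earlier in the thread.
const children = [
	{ data: { id: 'abc123', score: 10, total_awards_received: 0 } },
	{ data: { id: 'def456', score: 250, total_awards_received: 2 } },
];
const oldStore = { zzz999: { score: 5, awards: 0, updated: '2022-12-01T00:00:00.000Z' } };
const newStore = updateScoreStore(oldStore, children, new Date('2022-12-22T00:00:00Z'));
console.log(JSON.stringify(newStore, null, '\t'));
```

Because only this object is rewritten, nothing in the note files or their YAML is touched, and an entry that stops appearing in the listing simply keeps its last known numbers.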

Did you look into DataviewJS? I strongly suspect using that could achieve the above.
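
To make the idea a bit more concrete, here is a rough, untested-in-Obsidian sketch of the join itself in plain JavaScript. It assumes each note exposes an `id` (plus `title` and `sub`) in its frontmatter and that the separate file is an object keyed by that id; `buildRows` is a made-up name. The Dataview calls appear only in the trailing comment, and the folder and file paths there are invented.

```javascript
// Join note metadata with the separate score data, highest score first.
// `notes` stands in for what a Dataview query would return; `scores` is the parsed JSON file.
function buildRows(notes, scores) {
	// Notes without a score entry sort to the bottom.
	const rank = (row) => (row[2] === null ? Number.NEGATIVE_INFINITY : row[2]);
	return notes
		.map((n) => {
			const s = scores[n.id] || {};
			return [n.title, n.sub, s.score ?? null, s.awards ?? null];
		})
		.sort((a, b) => (rank(a) === rank(b) ? 0 : rank(b) > rank(a) ? 1 : -1));
}

// Made-up sample data
const notes = [
	{ id: 'abc123', title: 'First post', sub: 'ObsidianMD' },
	{ id: 'def456', title: 'Second post', sub: 'Zettelkasten' },
	{ id: 'ghi789', title: 'No score yet', sub: 'productivity' },
];
const scores = { abc123: { score: 10, awards: 0 }, def456: { score: 250, awards: 2 } };
const rows = buildRows(notes, scores);
console.log(rows);

// Inside a dataviewjs block this might be wired up roughly as follows
// (an assumption, not verified here; the paths are made up):
//   const scores = JSON.parse(await dv.io.load('reddit/scores.json'));
//   const notes = dv.pages('"reddit"').array();
//   dv.table(['Title', 'Sub', 'Score', 'Awards'], buildRows(notes, scores));
```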

@profucius
Author

Hey, thanks for the reply. Here are my thoughts so far, great ideas you had!

Still only last 25 would get refreshed

The idea I presented previously might be able to avoid this limitation. My thought is that by having the permanent URL or unique post ID of the reddit post/comment, your script can update its score, awards (any dynamic values) against the reddit API itself, which should provide live data of any post/comment you request from it. Your script would simply go through a list of unique IDs/URLs for all of the user's saved posts, and check for new score/awards/etc values. I'm not fully aware of how this API works or where to access it, but I see other apps utilizing some form of reddit data API calls.
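
As a rough sketch of the lookup side: as far as I can tell, Reddit exposes an `/api/info` endpoint that accepts a comma-separated `id` list of "fullnames" (`t3_` prefix for posts, `t1_` for comments). The batch size of 100, and whether authentication or rate limits apply, are assumptions to check against the API docs; `buildInfoUrls` is a made-up helper name. The snippet only builds the URLs and makes no request.

```javascript
// Build batched lookup URLs for Reddit's /api/info endpoint.
// Fullnames are type-prefixed ids: 't3_' for posts, 't1_' for comments.
// The default batch size is an assumption, not a confirmed limit.
function buildInfoUrls(fullnames, batchSize = 100) {
	const urls = [];
	for (let i = 0; i < fullnames.length; i += batchSize) {
		const batch = fullnames.slice(i, i + batchSize);
		urls.push('https://www.reddit.com/api/info.json?id=' + batch.join(','));
	}
	return urls;
}

// Made-up ids; a tiny batch size is used here just to show the splitting.
const saved = ['t3_abc123', 't1_def456', 't3_ghi789'];
const urls = buildInfoUrls(saved, 2);
console.log(urls);
```

Each response should then have the same `data.children[...].data` shape the script already reads, so the existing `score` and `total_awards_received` lookups could presumably be reused.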

Deleted posts would risk losing frontmatter if refreshed

I think deleted posts preserve the scores, but might lose awards? I know that on my 3rd party Reddit android app (Boost) I can still see scores on posts/comments that are deleted, so perhaps that data is still available via the API, and therefore would not be lost? But a better idea would be to write a validation: if the script finds a post is now deleted while updating values on an existing note, it skips that note.

one more file (maybe in json)

JSON is a great idea here. In fact I'm surprised I didn't think of that rather than my previous idea! JSON can serve as a basic database if given proper forethought, and in the scope of your script I believe it could work nicely.

Did you look into DataviewJS?

I have not yet peeked at that. I don't think I'll be able to until after the new year, as the holidays are going to get quite busy for me here. But if you think that this could serve the function of pulling from a JSON while rendering a Dataview table, then that is a promising idea! Hopefully it will also be compatible with the likes of "Database Folders" plugins and the other similar ones popping up, too.

@erohtar
Owner

erohtar commented Dec 25, 2022

I think deleted posts preserve the scores, but might lose awards?
...But a better idea would be to write a validation, if the script finds a post is now deleted while updating values on an existing note, then skips it.

I have been considering running a little test: make a post in the sub with the title "Mods - Test Post. Pls delete 5 mins after posting" and add a test comment to it that also asks to be deleted. Then I save and download both of them. Once they are deleted, I check what exactly that JSON looks like compared to the original.
Once I have all that info, I can update my code to look for deletions and skip updating those notes even if overwrite is set to 1.
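
Until that test is done, the check can only be guessed at. A placeholder sketch, assuming deleted or removed content shows up as the literal strings `[deleted]` or `[removed]` in `author`, `selftext` (posts), or `body` (comments); those markers and the names `looksDeleted` and `shouldRefresh` are assumptions to be replaced by whatever the test actually shows.

```javascript
// Placeholder strings Reddit is commonly seen to use; treat these as guesses until tested.
const DELETED_MARKERS = ['[deleted]', '[removed]'];

// `data` is one child's .data object; posts carry selftext, comments carry body.
function looksDeleted(data) {
	const text = data.selftext !== undefined ? data.selftext : data.body;
	return DELETED_MARKERS.includes(data.author) || DELETED_MARKERS.includes(text);
}

// An existing note is refreshed only when overwriting is on AND the source still exists.
function shouldRefresh(data, overWrite) {
	return overWrite === 1 && !looksDeleted(data);
}

// Made-up samples
console.log(shouldRefresh({ author: 'someone', selftext: 'Hello' }, 1));
console.log(shouldRefresh({ author: '[deleted]', body: '[deleted]' }, 1));
console.log(shouldRefresh({ author: 'someone', body: '[removed]' }, 1));
```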

I'm guessing what I described above is pretty much what you meant as well, right?

I have not yet peeked at that. I don't think I'll be able to until after the new year, as the holidays are going to get quite busy for me here.

No worries, we can put a pin in it till then.

Btw I got a chance to play with 'Database Folders' plugin and now I have switched to using that for my Books/Reading list instead of a Dataview table. So far I've only used a limited set of its features, but the part that really sold me on it is that I can edit frontmatter in it directly and also have some columns converted to a dropdown, which works great in switching a book between Unread/Reading/Done.

Happy Holidays
