HPI local installation caches Reddit exported data and does not refresh #187
Comments
This is probably intended behavior, and the sqlite files are created in the .cache folder by cachew by design. The question is how do I get those files recreated after re-running the rexport script? Perhaps removing them as part of the script execution is the most logical approach.
On line 88 in
diff --git a/my/reddit/rexport.py b/my/reddit/rexport.py
index cca3e35..5c4d045 100755
--- a/my/reddit/rexport.py
+++ b/my/reddit/rexport.py
@@ -85,7 +85,7 @@ Upvote = dal.Upvote
 def _dal() -> dal.DAL:
     inp = list(inputs())
     return dal.DAL(inp)
-cache = mcachew(depends_on=inputs) # depends on inputs only
+cache = mcachew(depends_on=inputs, logger=logger) # depends on inputs only
 @cache
If you modify the line to add the logger (this should probably be done by default), you can then see what cachew is doing by setting the
In most cases you'll see the same If you then add a new one by running the
You should hopefully see it recalculating (
Oh, the only case where I see an issue is if the filenames of the new data are the same as the old, and you seem to be using
so it may be expecting that exports made by If you change the date command to be specific to the second rather than the date, to something like:
... may fix this issue, unsure.
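To illustrate the per-second naming idea in Python rather than with the date command: a minimal sketch, assuming the export files only need to match the `*.json` glob from the config. The `export_filename` helper and the sample timestamps are hypothetical, not part of rexport or HPI.

```python
from datetime import datetime, timezone
from pathlib import Path


def export_filename(directory: Path, now: datetime) -> Path:
    """Build an export path that is unique down to the second (hypothetical helper)."""
    # A fixed-width UTC stamp sorts lexicographically in chronological order.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return directory / f"export-{stamp}.json"


# Two exports taken five seconds apart on the same day get different names.
first = export_filename(Path("reddit"), datetime(2021, 9, 1, 10, 0, 0, tzinfo=timezone.utc))
second = export_filename(Path("reddit"), datetime(2021, 9, 1, 10, 0, 5, tzinfo=timezone.utc))
print(first.name)
print(second.name)
```

With a date-only name, both of these exports would have been written to the same file, so the list of inputs would not have changed between runs.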
Yep, I think @seanbreckenridge is right, it would be due to There is something experimental to use the file modification time, but we still need to think about how/if we should rely on it by default https://github.com/karlicoss/cachew/blob/49d349f5c32ae25d6f5a36279c8f0c5090242da2/src/cachew/__init__.py#L623-L626 And yeah, IMO it's best to keep the full timestamp, either by
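A rough sketch of why a cache that depends only on the input file names cannot notice an overwritten file, while one that also looks at the modification time can. This is not cachew's implementation; the two key functions and the file name are made up for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import List


def key_from_names(paths: List[Path]) -> str:
    """Hypothetical cache key that depends only on the sorted file names."""
    joined = "\n".join(sorted(str(p) for p in paths))
    return hashlib.sha256(joined.encode()).hexdigest()


def key_from_names_and_mtimes(paths: List[Path]) -> str:
    """Hypothetical cache key that also folds in each file's modification time."""
    parts = [f"{p}:{p.stat().st_mtime_ns}" for p in sorted(paths)]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    export = Path(tmp) / "export-2021-09-01.json"  # date-only name, made up
    export.write_text("[]")
    names_before = key_from_names([export])
    mtimes_before = key_from_names_and_mtimes([export])

    # Overwrite the same file, as a second export on the same day would.
    export.write_text('[{"id": 1}]')
    # Bump the mtime explicitly so the demo does not depend on clock resolution.
    st = export.stat()
    os.utime(export, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))

    names_after = key_from_names([export])
    mtimes_after = key_from_names_and_mtimes([export])

print(names_before == names_after)    # the name-only key does not see the overwrite
print(mtimes_before == mtimes_after)  # the mtime-aware key changes
```

Keeping the full timestamp in the file name sidesteps the question entirely, since every export then shows up as a new input.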
I am experimenting with HPI as I was looking for a system that would allow me to create a repository of my digital traces: cool stuff.
I've installed HPI as per the local/editable option.
I'm testing it with Reddit.
I've configured the path to the Reddit export file in $HOME/.config/my/my/__init__.py by adding:
export_path = "/home/ubuntu/hpi/reddit/*.json"
Rexport uses the information in secrets.py to dump the Reddit data:
python3 -m rexport.export --secrets $HOME/git/rexport/secrets.py > ./reddit/"export-$(date -I).json"
This piece of code I've found in the documentation should report the list of the 4 subreddits with most saved posts:
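For readers without the documentation at hand, a query of that kind can be sketched as follows. This is a self-contained approximation, not the documentation's snippet: the `Saved` dataclass, the `top_subreddits` helper and the sample data are hypothetical stand-ins, and the only assumption about the real records is that each one has a `subreddit` attribute.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Saved:
    """Hypothetical stand-in for a saved-post record."""
    subreddit: str


def top_subreddits(saved: Iterable[Saved], n: int = 4) -> List[Tuple[str, int]]:
    """Return the n subreddits with the most saved posts, most frequent first."""
    return Counter(s.subreddit for s in saved).most_common(n)


# Made-up sample data standing in for the real saved posts.
sample = [
    Saved("python"), Saved("python"), Saved("orgmode"),
    Saved("selfhosted"), Saved("python"), Saved("orgmode"),
]
print(top_subreddits(sample))
```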
But what happens is that the information processed by my.reddit gets cached in $HOME/.cache and does not update when I rerun the rexport script.
To see the refreshed dump I must first delete the cached files.
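The manual workaround could be scripted along these lines. This is only a sketch: the exact directory cachew writes to under $HOME/.cache is not stated here, so `clear_cache` takes the directory as a parameter, and the file name used in the demo is made up.

```python
import tempfile
from pathlib import Path


def clear_cache(cache_dir: Path) -> int:
    """Delete regular files directly inside cache_dir; return how many were removed."""
    removed = 0
    if not cache_dir.is_dir():
        return removed
    for entry in cache_dir.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed


# Demo against a temporary directory rather than the real cache location.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    (cache_dir / "stale-cache-file").write_text("stale")  # hypothetical file name
    removed = clear_cache(cache_dir)
    remaining = list(cache_dir.iterdir())

print(removed, remaining)
```

Point it at the real cache directory only after checking what is in there, since it deletes every file at the top level of that directory.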
What am I missing?
Thanks
s.