
Enhancement: Do existence check for linked articles instead of downloads #222

Closed
mandolyte opened this issue Oct 13, 2021 · 7 comments · Fixed by #217

Comments

@mandolyte
Contributor

Consider adding "existence checks" as an option.

Rationale: this would gain speed while retaining the benefit of checking that linked articles exist, just without checking the content of each linked article.

Method:
This Gitea API endpoint returns a JSON tree structure that can be kept in memory to check whether a TW article exists:

https://git.door43.org/api/v1/repos/unfoldingword/en_tw/git/trees/master?recursive=true&per_page=99999
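As a sketch of the idea (hypothetical code, not from the repo): fetch that tree JSON once, build an in-memory set of file paths, and turn each link check into a set lookup. The helper names `build_path_set` and `article_exists` are assumptions for illustration; the JSON shape follows the Gitea git/trees response, a `"tree"` list whose entries carry `"path"` and `"type"` fields.

```python
# Hypothetical sketch: index the repo tree once so that link checks
# become set-membership tests instead of per-article HTTP downloads.

def build_path_set(tree_json):
    """Collect every blob (file) path from a Gitea git/trees response."""
    return {entry["path"]
            for entry in tree_json.get("tree") or []
            if entry.get("type") == "blob"}

def article_exists(path_set, article_path):
    """Existence check: no download, just a set lookup."""
    return article_path in path_set

# Example with a response shaped like the API's output:
sample = {
    "tree": [
        {"path": "bible/kt/grace.md", "type": "blob"},
        {"path": "bible/kt", "type": "tree"},
    ],
    "truncated": False,
}
paths = build_path_set(sample)
print(article_exists(paths, "bible/kt/grace.md"))   # True
print(article_exists(paths, "bible/kt/missing.md"))  # False
```

Only `"blob"` entries are indexed, so directory entries like `bible/kt` don't count as articles.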

@RobH123
Contributor

RobH123 commented Oct 13, 2021

Will definitely investigate this -- thanks @mandolyte and @richmahn!

@RobH123 RobH123 linked a pull request Oct 19, 2021 that will close this issue
@RobH123
Contributor

RobH123 commented Oct 19, 2021

Hmmh, even after adding one more 9 to that per_page value, it still can't fetch the entire tree for en_ugl -- it only fetches the first set out of the 29,969 total entries! I'll have to work out how to loop to get the next page(s) and then how to combine the JSON!

@RobH123
Contributor

RobH123 commented Oct 20, 2021

Ok, it seems that Gitea supplies a maximum of 12,000 entries at a time. I was a bit confused that the truncated flag is set even for the last page, so it really seems to mean "not all entries are in this fetch" rather than "more entries are still to come after these ones".

@mandolyte
Contributor Author

mandolyte commented Oct 20, 2021 via email

@richmahn
Member

richmahn commented Dec 9, 2021

It is configurable, but only for the whole API. It is usually best practice to avoid making the server serve everything at once; instead, keep querying and appending results to an array until you get no results (or until truncated is false, as mentioned above). The response also tells you the total count when you query the first page, so you can tell from that when you have them all.
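The paging strategy described above can be sketched as follows (hypothetical code, not from the repo). The `fetch_page` callable stands in for the real HTTP GET against the git/trees endpoint with a `page` parameter; the fake `pages` dict below only imitates the response shape for illustration.

```python
# Hypothetical sketch: keep requesting pages and appending "tree"
# entries until a page comes back empty, or until total_count (reported
# on the first page) tells us we already have everything.

def fetch_all_tree_entries(fetch_page):
    """Accumulate paged tree entries until the API has no more."""
    entries = []
    page = 1
    while True:
        data = fetch_page(page)
        tree = data.get("tree")
        if not tree:          # "tree": null signals we are past the end
            break
        entries.extend(tree)
        total = total = data.get("total_count")
        if total is not None and len(entries) >= total:
            break             # total_count says we have them all
        page += 1
    return entries

# Fake paged responses for illustration (real pages hold up to 12,000):
pages = {
    1: {"tree": [{"path": "a.md"}, {"path": "b.md"}],
        "truncated": True, "total_count": 5},
    2: {"tree": [{"path": "c.md"}, {"path": "d.md"}],
        "truncated": True, "total_count": 5},
    3: {"tree": [{"path": "e.md"}],
        "truncated": True, "total_count": 5},
    4: {"tree": None, "truncated": False, "total_count": 5},
}
all_entries = fetch_all_tree_entries(pages.__getitem__)
print(len(all_entries))  # 5
```

With the total_count check, the loop stops after page 3 without ever requesting the empty page 4; drop that check and the `"tree": null` test terminates it instead.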

[screenshot of the API response]

@richmahn
Member

richmahn commented Dec 9, 2021

Actually, it looks like you will need to fetch a page that does not return anything (i.e. "tree": null) before you get "truncated": false. Not sure if that is common API procedure or not; I didn't write this.

[screenshot of the API response]

@RobH123
Contributor

RobH123 commented Dec 9, 2021

@richmahn cc @mandolyte Yes, I ended up writing an append loop, and that wasn't hard once I realised that even the last page has truncated set to true (as mentioned above).
