arxiv-ml-reviews

arxiv-ml-reviews uses a primarily keyword-based search to extract a list of review articles from arXiv's machine learning and artificial intelligence categories.

Links

Setup

In a new Python 3.9 virtual environment or container, run make install from the project's directory.

Usage commands

refresh

Running python -m arxivmlrev refresh will rerun the full online search and write the results to data/articles.csv and data/articles.md.

Use git to check whether the diff of the updated CSV file looks acceptable. If the CSV file has become smaller for any reason, the search query failed and should be rerun. Do not run this command excessively, as it burdens the arXiv search server.

If there is any extraneous new entry in data/articles.csv, update arxivmlrev/_config/articles.csv and/or arxivmlrev/_config/terms.csv with a new blacklist entry. This is expected to be done rarely. Blacklisted entries are those with Presence = 0. Before committing the updated configuration files to revision control, consider running scripts/sort_config_articles.py and/or scripts/sort_config_terms.py respectively. If a configuration file was updated, rerun the command. Note that a sufficiently longer query may cause arXiv to return incomplete results, which would require a rearchitecture of the search.
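As an illustration of the Presence convention described above, here is a minimal sketch that splits the rows of a configuration CSV into kept and blacklisted entries. Only the Presence column comes from this README; the Term column name and the function name split_by_presence are assumptions made for the example and may not match the actual files.

```python
import csv
import io
from typing import List, Tuple


def split_by_presence(config_csv_text: str, key_column: str = "Term") -> Tuple[List[str], List[str]]:
    """Split config rows into (kept, blacklisted) by their Presence value.

    Assumption: the CSV has a header row containing `key_column` (hypothetical
    name) and a `Presence` column, where Presence = 0 marks a blacklist entry.
    """
    kept: List[str] = []
    blacklisted: List[str] = []
    for row in csv.DictReader(io.StringIO(config_csv_text)):
        target = blacklisted if int(row["Presence"]) == 0 else kept
        target.append(row[key_column])
    return kept, blacklisted


# Example with made-up rows (not taken from the project's configuration):
sample = "Term,Presence\nsurvey,1\nuser survey,0\n"
kept, blacklisted = split_by_presence(sample)
```

For arxivmlrev/_config/articles.csv, the key column would presumably identify an article rather than a term; pass the appropriate column name via key_column.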

refresh-and-publish

Running python -m arxivmlrev refresh-and-publish will refresh and also conditionally publish the results. Specifically, if the data/results.csv file changed but didn't decrease in its number of rows, the command will publish the written markdown file to GitHub per the GitHub-specific configuration in config.py. In this configuration file, refer to parameters starting with the prefix GITHUB_. The environment variable GITHUB_ACCESS_TOKEN is also required.
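The publish condition described above can be sketched as follows. This is not the project's implementation; the function names count_rows and should_publish are hypothetical, and the sketch assumes the CSV has a single header row.

```python
import csv
import io


def count_rows(csv_text: str) -> int:
    """Count data rows in CSV text, excluding the (assumed) header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return max(len(rows) - 1, 0)


def should_publish(old_csv_text: str, new_csv_text: str) -> bool:
    """Return True if the CSV changed and its row count did not decrease."""
    if old_csv_text == new_csv_text:
        return False  # Nothing changed, so there is nothing new to publish.
    return count_rows(new_csv_text) >= count_rows(old_csv_text)
```

A shrinking file is treated as a failed search, consistent with the guidance under refresh, so nothing is published in that case.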

write-feed

Running python -m arxivmlrev write-feed will perform an online search to write the XML file data/feed.xml. This file is excluded from git.

write-md

Running python -m arxivmlrev write-md will perform an offline refresh of the markdown file data/articles.md from data/articles.csv.

publish-md

Running python -m arxivmlrev publish-md will publish the markdown file data/articles.md to GitHub. This requires GitHub-specific configuration in config.py. In this configuration file, refer to parameters starting with the prefix GITHUB_. The environment variable GITHUB_ACCESS_TOKEN is also required.

Deployment

Serverless deployment of the RSS feed to Google Cloud Functions is configured. It requires the following files:

requirements.txt
main.py (having callable serve(request: flask.Request) -> Tuple[bytes, int, Dict[str, str]])
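For orientation, a main.py with the stated signature might look roughly like the sketch below. The helper _build_feed and its placeholder XML are hypothetical stand-ins for the project's actual feed generation, and the Content-Type value is an assumption.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Dict, Tuple

if TYPE_CHECKING:
    import flask  # Needed only for the type annotation of `request`.


def _build_feed() -> bytes:
    """Hypothetical placeholder for the project's real feed generation."""
    return (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<rss version="2.0"><channel><title>arxiv-ml-reviews</title></channel></rss>'
    )


def serve(request: flask.Request) -> Tuple[bytes, int, Dict[str, str]]:
    """HTTP entry point returning (body, status code, headers)."""
    headers = {"Content-Type": "application/rss+xml; charset=utf-8"}  # Assumed header value.
    return _build_feed(), 200, headers
```

Google Cloud Functions accepts a Flask-style response tuple from an HTTP function, which is why the callable returns the body, status, and headers together.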

Deployment version updates are not automated. They can be performed manually by editing and saving the function configuration.

These deployment links require access:

To do

By default, run an incremental update, and provide an option to do a full rerun. An incremental update assumes an unchanged configuration. This requires query results to be sorted by lastUpdatedDate.
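One way the sorted query could be requested, assuming the search goes through the public arXiv API query endpoint, is sketched below. The function name build_incremental_query_url and the example query string are hypothetical; sortBy and sortOrder are parameters of the arXiv API.

```python
from urllib.parse import urlencode

# Assumption: the public arXiv API endpoint is used for the search.
ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_incremental_query_url(search_query: str, max_results: int = 100) -> str:
    """Build a query URL whose results are sorted newest-updated first."""
    params = {
        "search_query": search_query,
        "sortBy": "lastUpdatedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


# Example with a made-up query (not the project's actual search terms):
url = build_incremental_query_url("cat:cs.LG AND ti:survey", max_results=50)
```

With results in this order, an incremental update could stop paging once it reaches an entry whose update date is not newer than the latest one already stored.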
