Get articles from tech blogs' RSS feeds. The library uses feedparser as a dependency.
Blog list:
| Blog | Object (class) | Argument | Mandatory? |
|---|---|---|---|
| AWS Blog | Aws() | topic | Yes |
| Microsoft Azure Blog | Azure() | category | No |
| Comparitech | Comparitech() | - | - |
| DataHen | DataHen() | - | - |
| Dev Community | Devto() | - | - |
| DZone | Dzone() | category | No |
| Hashnode | Hashnode() | tag | Yes |
| Machine Learning Mastery | MachineLearningMastery() | category | No |
| O'Reilly | OReilly() | topic | No |
| Medium | Medium() | publication, tag | Yes (either) |
| Meta (Facebook) | Meta() | category | No |
| Salesforce Engineering Blog | Salesforce() | - | - |
| Software Engineering Daily | SoftwareEngDaily() | category | No |
| Toptal Blog | Toptal() | - | - |
| Uber Development Blog | Uber() | - | - |
The listing is a work in progress (spreadsheet).
Clone or download the repository and save it to your working directory:

```sh
git clone https://github.com/tesserakh/articlefeeds.git
```
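As noted above, the library depends on feedparser; assuming it is not installed already, it can be pulled in with pip:

```sh
pip install feedparser
```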
Each blog has its own BlogFeed object, which pulls the XML feed from the blog's website. When instantiated, it gives us the raw XML structured by feedparser:
```python
from articlefeeds import Azure

azure = Azure()
print(azure.rawfeed)
```
To make the result simpler and ready to use, keeping only the fields needed, the parse() method is provided. The result is stored in feed:
```python
azure.parse()
print(azure.feed)
```
For some blogs, arguments can be used to filter posts by category or tag; a few even require one.
```python
from articlefeeds import Aws

aws = Aws("database").parse()
print(aws.feed)
```
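Arguments marked optional in the blog list can be omitted, while mandatory ones must be supplied. A sketch of both cases, assuming the keyword names match the Argument column of the table (the category value here is hypothetical):

```python
from articlefeeds import Dzone, Hashnode

# category is optional for DZone: fetch everything, or filter
dzone_all = Dzone().parse()
dzone_filtered = Dzone(category="ai").parse()  # hypothetical category name

# tag is mandatory for Hashnode
hashnode = Hashnode(tag="python").parse()
print(hashnode.feed)
```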
The AWS blog, Azure blog, Hashnode, DZone, and others take a category, tag, or topic as input; see the blog list table for details. Below is a sample for the AWS blog:
```python
import articlefeeds
import json

aws_topics = ["storage", "database"]

for topic in aws_topics:
    feed = articlefeeds.Aws(topic=topic).parse()
    filename = articlefeeds.create_filename(topic.lower(), prefix="aws")
    filepath = articlefeeds.create_storage_path(filename, path="data")
    with open(filepath, "w") as fout:
        json.dump(feed.feed, fout)
```
Medium can take a publication name, a tag, or both (a tag within a certain publication) as arguments:
```python
import articlefeeds
import json

medium_pubs = [
    {"publication": "airbnb-engineering", "tag": None},
    {"publication": "Lyft Engineering", "tag": "Data"},
    {"publication": "medium engineering"},
    {"publication": None, "tag": "web scraping"},
    {"tag": "python"},
]

for pub in medium_pubs:
    feed = articlefeeds.Medium(**pub).parse()
    # Build a filename label from the non-empty argument values
    label = "-".join(value for value in pub.values() if value)
    filename = articlefeeds.create_filename(label.lower(), prefix="medium")
    filepath = articlefeeds.create_storage_path(filename, path="data")
    with open(filepath, "w") as fout:
        json.dump(feed.feed, fout)
```
Other blogs need to be filtered manually using the tags attached to each entry:
```python
import articlefeeds
import json

def fetch(tags, feeds):
    """Keep only the feed entries whose tags match one of the given tags."""
    wanted = {tag.lower() for tag in tags}
    matched = []
    for entry in feeds:
        avail_tags = entry.get("tags")
        if avail_tags and wanted & {t.lower() for t in avail_tags}:
            matched.append(entry)
    return matched

toptal_tags = [
    "artificial-intelligence",
    "big-data",
    "data-science",
    "api-development",
    "database",
    "json",
    "r",
    "python",
]

toptal_feeds = articlefeeds.Toptal().parse()
toptal_filter = fetch(toptal_tags, toptal_feeds.feed)
toptal_filepath = articlefeeds.create_storage_path("toptal.json", path="data")
with open(toptal_filepath, "w") as fout:
    json.dump(toptal_filter, fout)
```
Sample result using the tag "python" for the Medium blog:
```json
[
    {
        "title": "Let’s Practice Python in AWS",
        "author": "Cristin Jenkins",
        "link": "https://medium.com/@cristinj/lets-practice-python-in-aws-ca27379d786c",
        "published": "2023-06-19 14:10:37",
        "tags": [
            "aws",
            "linux",
            "github",
            "cloud9",
            "python"
        ],
        "source": "Python on Medium"
    },
    ...
]
```
The published timestamp is in UTC/GMT.
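If local times are needed, the timestamp can be converted with the standard library. A minimal sketch, assuming the "YYYY-MM-DD HH:MM:SS" format shown in the sample above:

```python
from datetime import datetime, timezone

# The published field is a UTC timestamp in "YYYY-MM-DD HH:MM:SS" format
published = "2023-06-19 14:10:37"
dt_utc = datetime.strptime(published, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
print(dt_utc.astimezone())  # converted to the local timezone
```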