Parses a podcast RSS feed and returns an easy-to-use object.
Takes an opinionated view of what should be included, so not every field in the feed is kept. The goal is for the result to be as normalized as possible across multiple feeds.
{
  "title": "<Podcast title>",
  "description": {
    "short": "<Podcast subtitle>",
    "long": "<Podcast description>"
  },
  "link": "<Podcast link (usually website for podcast)>",
  "image": "<Podcast image>",
  "language": "<ISO 639 language>",
  "copyright": "<Podcast copyright>",
  "updated": "<pubDate or latest episode pubDate>",
  "explicit": "<Podcast is explicit, true/false>",
  "categories": [
    "Category>Subcategory"
  ],
  "author": "<Author name>",
  "owner": {
    "name": "<Owner name>",
    "email": "<Owner email>"
  },
  "episodes": [
    {
      "guid": "<Unique id>",
      "title": "<Episode title>",
      "description": "<Episode description>",
      "explicit": "<Episode is explicit, true/false>",
      "image": "<Episode image>",
      "published": "<date>",
      "duration": 120,
      "categories": [
        "Category"
      ],
      "enclosure": {
        "filesize": 5650889,
        "type": "audio/mpeg",
        "url": "<mp3 file>"
      }
    }
  ]
}
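In the format above, a subcategory is joined to its parent with `>` in a single string. A minimal sketch of splitting those strings back into a parent-to-subcategories map follows; `groupCategories` is a hypothetical helper for illustration and is not part of node-podcast-parser.

```javascript
// Hypothetical helper, not part of node-podcast-parser.
// Turns ["Arts>Design", "Arts", "News"] into { Arts: ["Design"], News: [] }.
function groupCategories(categories = []) {
  const grouped = new Map();
  for (const entry of categories) {
    // Only the first ">" separates the parent from the subcategory
    const [parent, ...rest] = entry.split('>');
    const key = parent.trim();
    if (!key) continue; // skip empty strings
    if (!grouped.has(key)) grouped.set(key, []);
    const sub = rest.join('>').trim();
    if (sub && !grouped.get(key).includes(sub)) grouped.get(key).push(sub);
  }
  return Object.fromEntries(grouped);
}
```

Calling `groupCategories(data.categories)` on a parsed feed would then give one key per top-level category.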
yarn add node-podcast-parser
const parsePodcast = require('node-podcast-parser');
parsePodcast('<podcast xml>', (err, data) => {
  if (err) {
    console.error(err);
    return;
  }
  // data looks like the format above
  console.log(data);
});
node-podcast-parser only takes care of the parsing itself; you'll need to download the feed yourself first.
Download the feed however you want, for instance using request.
Example:
const request = require('request');
const parsePodcast = require('node-podcast-parser');
request('<podcast url>', (err, res, body) => {
  if (err) {
    console.error('Network error', err);
    return;
  }
  // body is the raw feed XML; hand it to the parser
  parsePodcast(body, (parseErr, data) => {
    if (parseErr) {
      console.error('Parsing error', parseErr);
      return;
    }
    console.log(data);
  });
});
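The same download-then-parse flow can also be sketched with promises, assuming Node 18 or later for the built-in `fetch`. `fetchAndParse` is a hypothetical wrapper and not part of node-podcast-parser; the parser is passed in as an argument, so any Node-style `(xml, callback)` parser can be used.

```javascript
const { promisify } = require('util');

// Hypothetical wrapper, not part of node-podcast-parser.
// `parse` is any (xml, callback) function, e.g. require('node-podcast-parser').
// `fetchImpl` defaults to the global fetch (assumes Node 18+).
async function fetchAndParse(url, parse, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Network error: HTTP ${res.status}`);
  }
  const xml = await res.text();
  // promisify turns the (err, data) callback into a promise
  return promisify(parse)(xml);
}
```

With the module installed, a call might look like `fetchAndParse('<podcast url>', require('node-podcast-parser')).then(console.log, console.error)`.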
yarn install
yarn run test
yarn install
yarn run cover
Many podcasts set the language to a bare code such as en. The spec requires the language to be ISO 639, so it will be converted to en-us. A non-English language becomes lang-lang, such as de-de. The language is always lowercase.
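The rule can be illustrated with a short sketch. This is not the library's source: `normalizeLanguage` is a hypothetical function, and the handling of values that already contain a region part (for example `en-GB`) is an assumption, since the text above only covers bare codes.

```javascript
// Illustrative sketch of the rule described above, not the library's source.
function normalizeLanguage(raw) {
  const lang = raw.trim().toLowerCase();
  // Assumption: a value that already has a region part is only lowercased
  if (lang.includes('-')) return lang;
  // Bare "en" becomes "en-us"; any other bare code is doubled ("de" -> "de-de")
  return lang === 'en' ? 'en-us' : `${lang}-${lang}`;
}

console.log(normalizeLanguage('en')); // en-us
console.log(normalizeLanguage('de')); // de-de
```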
Most content is left as is, but whitespace at the beginning and end of strings is trimmed.
Unfortunately, not all podcasts contain all properties. Properties that are missing from a feed are simply omitted from the output.
These properties include:
- feed TTL
- episode categories
- episode image
- etc
Episode categories are included as an empty array if the podcast doesn't contain any categories.
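Because properties can be absent, code that consumes the parsed result may want to supply its own fallbacks. A minimal sketch, assuming the output shape shown earlier; `describeEpisode` and its fallback choices are hypothetical and not part of node-podcast-parser.

```javascript
// Hypothetical helper: builds a display record from parsed output,
// supplying fallbacks for properties the feed may not have provided.
function describeEpisode(podcast, episode) {
  return {
    title: episode.title || '(untitled)',
    // Fall back to the podcast-level image when the episode has none
    image: episode.image || podcast.image || null,
    // categories is expected to be an array, but guard anyway
    categories: Array.isArray(episode.categories) ? episode.categories : [],
    // A missing duration is reported as unknown (null) rather than 0
    duration: typeof episode.duration === 'number' ? episode.duration : null,
  };
}
```

Something like `data.episodes.map((ep) => describeEpisode(data, ep))` would then yield records with a consistent set of keys.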
This module is specifically aimed at parsing RSS feeds and doesn't cater for more generic feeds from blogs etc.
Use node-feedparser for those instead.