From 036232c5bb3f404f5f658b5aa37f524d3b3c3704 Mon Sep 17 00:00:00 2001
From: Vagish Vela
Date: Wed, 15 Feb 2023 08:57:10 -0500
Subject: [PATCH 1/2] Fix broken link to May 2015 reddit comment dump

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ccec1a4..dc2e024 100644
--- a/README.md
+++ b/README.md
@@ -102,9 +102,10 @@ Alphabetical list of free/public domain datasets with text data for use in Natur

 * [Personae Corpus](http://www.clips.uantwerpen.be/datasets/personae-corpus): collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays by 145 different students. (on request)

-* [Reddit Comments](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/): every publicly available reddit comment as of july 2015. 1.7 billion comments (250 GB)
+* [Reddit Comments](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_
+_comment/): every publicly available reddit comment as of july 2015. 1.7 billion comments (250 GB)

-* [Reddit Comments (May ‘15) [Kaggle]](https://www.kaggle.com/reddit/reddit-comments-may-2015): subset of above dataset (8 GB)
+* [Reddit Comments (May ‘15) [Kaggle]](https://www.kaggle.com/datasets/kaggle/reddit-comments-may-2015): subset of above dataset (8 GB)

 * [Reddit Submission Corpus](https://www.reddit.com/r/datasets/comments/3mg812/full_reddit_submission_corpus_now_available_2006/): all publicly available Reddit submissions from January 2006 - August 31, 2015). (42 GB)

From 16c3f8a1161628f0299082f84e555686ea9dcfcf Mon Sep 17 00:00:00 2001
From: Vagish Vela
Date: Wed, 15 Feb 2023 08:58:33 -0500
Subject: [PATCH 2/2] Undo change to different link

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index dc2e024..52ba869 100644
--- a/README.md
+++ b/README.md
@@ -102,8 +102,7 @@ Alphabetical list of free/public domain datasets with text data for use in Natur

 * [Personae Corpus](http://www.clips.uantwerpen.be/datasets/personae-corpus): collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays by 145 different students. (on request)

-* [Reddit Comments](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_
-_comment/): every publicly available reddit comment as of july 2015. 1.7 billion comments (250 GB)
+* [Reddit Comments](https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/): every publicly available reddit comment as of july 2015. 1.7 billion comments (250 GB)

 * [Reddit Comments (May ‘15) [Kaggle]](https://www.kaggle.com/datasets/kaggle/reddit-comments-may-2015): subset of above dataset (8 GB)