Non-authentic Hadith Corpus
Arabic Hadith corpus
It contains 452,624 words from different lesser-known Hadith books.
It also includes several annotated Hadith books, which help to determine the switch points between the Isnad, the Matan, and the comment, providing a ground truth.
Some of these books contain both authentic and non-authentic Hadiths (NAH), while others contain only NAH.
The NAH_Contents.csv file lists all the Hadith books in this corpus.
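As a minimal sketch of how the book list could be inspected, the Python snippet below reads a CSV file and returns its column names and rows. The column layout of NAH_Contents.csv is not described here, so the code makes no assumption about it and simply reports whatever header the file has. The function name read_contents and the utf-8-sig encoding are assumptions for illustration.

```python
import csv
from pathlib import Path


def read_contents(path, encoding="utf-8-sig"):
    """Return (column_names, rows) for a CSV file such as NAH_Contents.csv.

    The encoding is an assumption; change it if the file is stored differently.
    """
    with Path(path).open(newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    contents_file = Path("NAH_Contents.csv")
    if contents_file.exists():
        columns, books = read_contents(contents_file)
        print("Columns:", columns)
        print("Number of rows:", len(books))
```

Printing the header first is a cheap way to confirm the real column names before relying on them in further processing.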
Each Hadith in this corpus was annotated with eight primary features:
No.: The Hadith reference number.
Full Hadith: The Hadith as it appears in the book, without annotations.
Isnad: The chain of narrators.
Matan: The act of the Prophet Muhammad.
Authors Comments: The author's description of the authenticity of each Hadith.
Hadith Type: The Hadith type (Maqtu` مقطوع, Mawquf موقوف, and Marfoʻ مرفوع) or the Hadith degree (ضعيف, موضوع, and so on).
Authenticity: Whether this Hadith is authentic or non-authentic.
Topic: The chapter title.
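The eight features above can be pictured as one record per Hadith. The Python sketch below is illustrative only: the class HadithRecord, its field names, and the helper switch_point are hypothetical and are not part of the corpus files. The helper assumes that the Matan appears verbatim inside the Full Hadith text, which may not hold for every entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HadithRecord:
    """One annotated Hadith; field names are hypothetical, mirroring the list above."""
    no: str
    full_hadith: str
    isnad: str
    matan: str
    authors_comments: str
    hadith_type: str
    authenticity: str
    topic: str


def switch_point(record: HadithRecord) -> Optional[int]:
    """Return the character offset in full_hadith where the Matan starts.

    Assumes the Matan occurs verbatim in the full text; returns None otherwise.
    """
    if not record.matan:
        return None
    offset = record.full_hadith.find(record.matan)
    return offset if offset >= 0 else None


if __name__ == "__main__":
    # Placeholder strings only; these are not taken from the corpus.
    example = HadithRecord(
        no="1",
        full_hadith="chain of narrators | reported text",
        isnad="chain of narrators",
        matan="reported text",
        authors_comments="",
        hadith_type="",
        authenticity="",
        topic="",
    )
    print(switch_point(example))
```

A lookup that returns None instead of raising keeps entries whose segments were normalized or edited during annotation from stopping a batch run.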
Tarmom T, Atwell E, Alsalka MA. 2020. Non-authentic Hadith Corpus: Design and Methodology. International Journal on Islamic Applications in Computer Science And Technology. 8(3), 13-19.