BBC-News-Classification-using-hugging-face

Classify news articles into five categories: business, entertainment, politics, sport, and tech.

Introduction:
The goal of the project was to classify news articles into five categories: business, entertainment, politics, sport, and tech.
The BBC News dataset was used for this task.
Preprocessing:
The dataset was loaded using the datasets library, and the text was cleaned by removing punctuation and stopwords.
The cleaned text was then tokenized using the Hugging Face tokenizer, and the labels were one-hot encoded. The tokenized articles were split into training and validation sets, and the model was fine-tuned on the training set only.
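The cleaning, label-encoding, and splitting steps described above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the label order in LABELS, the tiny STOPWORDS set, the 20% validation fraction, and the function names are all assumptions made for the example. Tokenization with the Hugging Face tokenizer would follow the cleaning step and is not shown here.

```python
import random
import string

# Hypothetical label order; the notebook's actual mapping is not shown in this README.
LABELS = ["business", "entertainment", "politics", "sport", "tech"]

# Tiny illustrative stopword set (assumption); a real pipeline would likely use a fuller list.
STOPWORDS = {"a", "an", "and", "for", "has", "in", "is", "of", "the", "to"}


def clean_text(text):
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in no_punct.split() if word not in STOPWORDS)


def one_hot(label):
    """Return a 0/1 vector with a single 1 at the label's index in LABELS."""
    return [1 if name == label else 0 for name in LABELS]


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split off a validation fraction (assumed 20%)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


print(clean_text("The government has proposed a new tax policy."))
print(one_hot("sport"))
```

Fixing the random seed keeps the split reproducible, so validation scores from different runs are computed on the same held-out articles.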
Architecture and Fine-tuning:
The pre-trained BERT model was used as the architecture for the text classification task.
The model was fine-tuned on the tokenized articles using the Adam optimizer, a learning rate of 2e-5, and a batch size of 32. The model was trained for 5 epochs, and a checkpoint was saved after each epoch.
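To make the training schedule concrete, the sketch below records the stated hyperparameters and works out at which optimizer step each per-epoch checkpoint would fall. It assumes one optimizer step per batch (no gradient accumulation) and that the final partial batch is kept. The class name, function name, and the figure of 1,000 training examples are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    learning_rate: float = 2e-5  # stated in the text
    batch_size: int = 32         # stated in the text
    num_epochs: int = 5          # stated in the text


def checkpoint_steps(num_train_examples, config):
    """Return the cumulative optimizer step count at the end of each epoch."""
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    return [steps_per_epoch * epoch for epoch in range(1, config.num_epochs + 1)]


config = FineTuneConfig()
# 1,000 training examples is a made-up figure, not the size of the BBC News training split.
print(checkpoint_steps(1000, config))  # [32, 64, 96, 128, 160]
```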
Evaluation:
The trained model was evaluated on the validation set using accuracy, precision, recall, and F1-score metrics.
The model achieved an accuracy of 97.7%, precision of 97.9%, recall of 97.7%, and an F1-score of 97.7%.
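The README does not say how precision, recall, and F1 were averaged across the five classes. The sketch below assumes support-weighted averaging, under which recall is always equal to accuracy; that would be consistent with both being reported as 97.7%, but it remains an assumption. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Compute accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)                                   # true count per class
    predicted = Counter(y_pred)                                 # predicted count per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)   # correct count per class

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = tp[label] / predicted[label] if predicted[label] else 0.0
        r = tp[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(tp.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example with one misclassified article (a sport article predicted as tech).
scores = weighted_metrics(
    ["sport", "sport", "tech", "business"],
    ["sport", "tech", "tech", "business"],
)
print(scores)
```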
Discussion:
The model achieved high accuracy on the validation set, indicating that it performs well on this particular dataset.
However, the model may not perform as well on other datasets or on real-world data. Possible improvements include increasing the size of the training set, using a different pre-trained model architecture, or tuning the hyperparameters.
Sample Predictions:
Here are a few sample predictions made by the trained model:
Text: "The new iPhone is set to be released next month."
Predicted label: tech
Text: "The government has proposed a new tax policy."
Predicted label: politics
Text: "The latest movie from Steven Spielberg has received mixed reviews."
Predicted label: entertainment
Text: "The Manchester United soccer team won the game yesterday."
Predicted label: sport
Text: "The company has announced record profits for the year."
Predicted label: business
In each of these examples, the model assigns the sentence to the expected category.
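A predicted label like the ones above is typically obtained by taking the highest-scoring class from the model's five output scores. The sketch below shows that last step only, assuming the model emits one logit per class and that the classes are ordered as in LABELS; both the ordering and the example logits are made up for illustration.

```python
import math

# Hypothetical class order; the notebook's actual id-to-label mapping is not shown.
LABELS = ["business", "entertainment", "politics", "sport", "tech"]


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits in which the last class (tech) scores highest.
label, confidence = predict_label([0.1, 0.2, 0.3, 0.4, 4.0])
print(label)
```

Reporting the probability alongside the label makes it easier to spot low-confidence predictions on text that falls outside the five categories.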


priyansh4320/BBC-News-Classification-using-hugging-face
