This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

Added final draft
dmnkf committed Jun 7, 2024
1 parent 0dd3346 commit 0004f5b
Showing 1 changed file with 26 additions and 4 deletions.
30 changes: 26 additions & 4 deletions notebooks/main.ipynb
@@ -34,8 +34,8 @@
"\n",
"plt.style.use('ggplot')"
],
"execution_count": null,
"outputs": []
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
@@ -271,8 +271,8 @@
"neg_reviews_text = \" \".join(eda_df[eda_df['label'] == \"negative\"]['content'].values)\n",
"generate_word_cloud(neg_reviews_text, \"Word Cloud for Negative Reviews\")"
],
"execution_count": null,
"outputs": []
"outputs": [],
"execution_count": null
},
{
"metadata": {},
@@ -630,6 +630,18 @@
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"Looking at the individual splits, we can see a distinct difference between the 0.25 split and the others. The 0.25 split has very high true positive and false positive counts, indicating that the model trained on little data is biased towards classifying reviews as positive.\n",
"\n",
"As soon as we increase the training data size, this bias appears to flip: for the 0.5 and 0.75 splits, the model favors classifying reviews as negative. This suggests that the decision threshold of the models trained on less data is not optimal and that these models do not generalize well.\n",
"\n",
"Only with the 1.0 split do we see the first signs of the model picking up the differences between the classes. This is reflected in the confusion matrix, where the true positive and true negative counts are much higher than the false positive and false negative counts."
],
"id": "cf00c0f2aa69ac80"
},
{
"cell_type": "markdown",
"id": "23d7a8e1",
@@ -675,6 +687,16 @@
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"Apart from this model performing better than the transfer learning model, we can see the exact same pattern as with the transfer learning model. The 0.25 split has very high true positive and false positive counts, indicating that the model trained on little data is biased towards classifying reviews as positive.\n",
"\n",
"Only with the 1.0 split do we see the first signs of the model picking up the differences between the classes. This is reflected in the confusion matrix, where the true positive and true negative counts are much higher than the false positive and false negative counts."
],
"id": "41820dcbed0a3e98"
},
{
"cell_type": "markdown",
"id": "22cfcf6e",
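
The two added markdown cells argue from confusion matrix counts: many true positives together with many false positives are read as a bias towards the positive class. A minimal sketch of that reading follows. It is not code from the notebook. The helper names `confusion_counts` and `predicted_positive_rate` and the toy labels are made up for illustration; only the `"positive"` / `"negative"` label strings are taken from the diff above.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="positive"):
    """Count TP, FP, TN and FN for a binary labelling (hypothetical helper)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "tn", "fn")}


def predicted_positive_rate(counts):
    """Share of all samples that the model labelled as positive."""
    total = sum(counts.values())
    return (counts["tp"] + counts["fp"]) / total if total else 0.0


# Made-up toy data, not results from the notebook.
y_true = ["positive", "negative", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "positive", "positive", "negative", "positive"]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(f"predicted positive rate: {predicted_positive_rate(counts):.2f}")
```

In this toy data half of the true labels are positive, yet the model predicts positive for five of six samples. A predicted positive rate well above the true positive share is the kind of skew the 0.25 split is described as showing; a rate well below it would correspond to the flip described for the 0.5 and 0.75 splits.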
