Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: Add user feedback for responses #287

Merged
merged 4 commits into from
Feb 12, 2024

Conversation

azaylamba
Copy link
Contributor

@azaylamba azaylamba commented Dec 24, 2023

Issue #151
Description of changes: Added basic feedback mechanism for responses generated by the chatbot.
As part of this change, we have stored the user feedback in S3 bucket.

We are storing the feedback in the following format:
{
"feedbackId": "",
"sessionId": "",
"userId": "",
"key": "",
"prompt": "",
"completion": "",
"model": "",
"feedback": "1/0",
"createdAt": ""
}

This feedback can be used by AWS Athena/Glue etc to do analysis and also to train the model if required. We are keeping one object per feedback in S3 'STANDARD_IA' class.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Added basic feedback mechanism for responses generated by the chatbot.
The feedbacks are stored in DynamoDB which can be queried to do analysis as required by admin users.
In future we can add a UI page to display the feedbacks, but for now these are being stored and manual analysis would be required.
The feedbacks are not adding to the learning of the chatbot.
@azaylamba
Copy link
Contributor Author

@massi-ang could you please have a look?

@bigadsoleiman
Copy link
Collaborator

bigadsoleiman commented Jan 16, 2024

Hi @azaylamba thanks for your submission. I really like this feature as it opens up to offline analyses and more importantly fine tuning of models where possible and I think this should be the end goal.

Though, I think this should be stored in a form of dataset that then can be used for fine-tuning. See Bedrock's prepare for finetuning docs which requires dataset in the following format:

{"prompt": "<prompt text>", "completion": "<generated text>"}

Now, in this case, we can bring on more fields such as

{
   "prompt": "<prompt text>", 
   "completion": "<generated text>", 
   "model" ..., 
   "feedback": 0/1,
   "timestamp":  ..., 
   ...any other data of interest
}

and stored in S3 maybe with a Infrequent Access storage class.

The advantages would be:

  • Cheaper option
  • Admins or systems can leverage Athena to query this feedbacks
  • Glue can be used to re-format this feedbacks for model-specific finetuning
  • Given the above we could then leverage QuickSight and embed rich dashboard in the chatbot without coding custom charts

You can even store 1 object per feedback, Athena, Glue/EMR will do the heavy-lifting to merge them when needed.

@bigadsoleiman bigadsoleiman self-requested a review January 16, 2024 14:17
@azaylamba
Copy link
Contributor Author

@bigadsoleiman Thanks for your input. I will try the approach you suggested.

…mal information. This was good for beta phase but was not useful if we want to do analysis on it using other services like Athena/Glue etc.

As part of this change, we have changed the mechanism and the feedback is being stored in S3 bucket now.
We are storing the feedback in the following format:
{
"feedbackId": "<feedbackId>",
"sessionId": "<sessionId>",
"userId": "<userId>",
"key": "<key>",
"prompt": "<question>",
"completion": "<generated text>",
"model": "<model name>",
"feedback": "thumbsUp/thumbsDown",
"createdAt": "<timestamp>"
}

This feedback can be used by AWS Athena/Glue etc to do analysis and also to train the model if required. We are keeping one object per feedback in S3 'STANDARD_IA' class.
@azaylamba
Copy link
Contributor Author

@bigadsoleiman I have modified the code to store the user feedback in S3 bucket as per the suggested format. Please have a look.

Copy link
Collaborator

@bigadsoleiman bigadsoleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!! going in right direction.

One thing I would change is the value of the feedback, I would avoid strings such as thumbsUp/thumbsDown as it's easier in a context of EMR, Athena to deal with numbers especially when it comes to aggregations so I'd rather use 0/1

Unless there is a specific reason I'm missing for using thumbsUp/thumbsDown, let me know.

Key=feedbackId,
Body=json.dumps(item),
ContentType="application/json",
StorageClass='STANDARD_IA',
Copy link
Collaborator

@bigadsoleiman bigadsoleiman Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's handle storage class in CDK through lifecycle rules instead.

something along these line should ensure that objects are transitioned to the right class and it's easier to handle more complex rules in future

    const userFeedbackBucket = new s3.Bucket(this, 'UserFeedbackBucket', {
       ...
      lifecycleRules: [{
        transitions: [{
          storageClass: s3.StorageClass.INFREQUENT_ACCESS,
          transitionAfter: cdk.Duration.days(0), // Transition immediately after creation
        }],
      }],
    });
  }

Copy link
Contributor Author

@azaylamba azaylamba Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimum allowed value of transitionAfter days is 30. So we cannot use transitionAfter: cdk.Duration.days(0),
The recommended approach to directly store the objects in STANDARD_IA is setting the storage class at object level.

@@ -39,7 +40,31 @@ export class ChatBotS3Buckets extends Construct {
],
});

const userFeedbackBucket = new s3.Bucket(this, "UserFeedbackBucket", {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's set the lifecycle rules here for simplicity and having them handled in one place (instead of object creation)

    const userFeedbackBucket = new s3.Bucket(this, 'UserFeedbackBucket', {
       ...
      lifecycleRules: [{
        transitions: [{
          storageClass: s3.StorageClass.INFREQUENT_ACCESS,
          transitionAfter: cdk.Duration.days(0), // Transition immediately after creation
        }],
      }],
    });
  }

transferAcceleration: true,
enforceSSL: true,
serverAccessLogsBucket: logsBucket,
cors: [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why CORS are enabled here? We are not dealing with objects directly from web clients

blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
removalPolicy: cdk.RemovalPolicy.DESTROY,
autoDeleteObjects: true,
transferAcceleration: true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need transfer acceleration here right?

Changed the user feedback type from string to number as number are easy to handle in Athena and other such tools.
@azaylamba
Copy link
Contributor Author

@bigadsoleiman Addressed the review comments related to cors, transfer acceleration and feedback type.

@bigadsoleiman bigadsoleiman merged commit 6f0ad46 into aws-samples:main Feb 12, 2024
@azaylamba
Copy link
Contributor Author

thanks for your inputs @bigadsoleiman

lloydclowes pushed a commit to lloydclowes/gen-ai-playground that referenced this pull request Oct 5, 2024
Enhancement: Add user feedback for responses
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants