Kendra context summary encoding/decoding issues. #700

cboin1996 · 2024-03-12T21:10:10Z

Describe the bug
I have a QnABot deployment with a Kendra index that loads CSV and PDF documents from an S3 bucket.
The bot is integrated with Bedrock via the LLM plugin, and the summarized responses include the Context drop-down. When I open the Context drop-down, â€¢ sequences appear in place of the proper characters.

I quickly read about encodings here: https://stackoverflow.com/questions/2477452/%C3%A2%E2%82%AC-showing-on-page-instead-of

Checking the FulfillmentLambda logs, it appears that Kendra itself returns its query results with these characters, so they are passed through bot fulfillment to the LLM Lambda and then back to the client interface.
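To illustrate the mechanism described in the linked Stack Overflow answer, here is a minimal Python sketch (not part of QnABot): encoding text as UTF-8 and then mistakenly decoding those bytes as Windows-1252 (cp1252) produces exactly the kind of sequence seen in the Context drop-down, and reversing the two steps restores the original. The helper names `garble` and `repair` are hypothetical, and cp1252 as the wrong codec is an assumption that matches the 'â€¢' pattern.

```python
def garble(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as Windows-1252 (cp1252)."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo it: re-encode as cp1252 to recover the bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


for original in ["•", "’", "café"]:
    garbled = garble(original)
    # e.g. '•' becomes 'â€¢' and '’' becomes 'â€™'; repair() brings them back.
    print(repr(original), "->", repr(garbled), "->", repr(repair(garbled)))
```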

To Reproduce

Create a Kendra index and upload a PDF containing characters such as & or ’; the issue may then reproduce.

Expected behavior
I wonder whether it is possible to intercept these symbols, or to re-encode the query results returned by Kendra as UTF-8 in bot fulfillment before handing them off to the LLM Lambda functions.

Please complete the following information about the solution:

Version: v5.5.0
Region: [e.g. ca-central-1]
Was the solution modified from the version published on this repository? Yes.

These changes were merged in via a YAML merge:

```yaml
Resources:
  # Modified UserPool including custom welcome message to point to the chat client.
  UserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      UserPoolName:
        Fn::Join:
          - '-'
          - - UserPool
            - Ref: AWS::StackName
      AdminCreateUserConfig:
        AllowAdminCreateUserOnly:
          Fn::If:
            - AdminSignUp
            - true
            - false
        InviteMessageTemplate:
          EmailMessage:
            Fn::Sub: |
              <p>Hello {username},
              <p>Welcome to QnABot! Your temporary password is:
              <p>     {####}
              <p>
              <p>When the CloudFormation stack is COMPLETE, use the links below to interact with QnABot.
              <p>
              <p>To access the chat client, use the below link:
              <p>     ${ApiUrl.Name}/pages/client
              <p>
              <p>The below link is only accessible by admins:
              <p>     ${ApiUrl.Name}/pages/designer
              <p>
              <p>Good luck!
              <p>QnABot (www.amazon.com/qnabot)
          EmailSubject: Welcome to QnABot!
      AliasAttributes:
        - email
      AutoVerifiedAttributes:
        - email
      Schema:
        - Required: true
          Name: email
          AttributeDataType: String
          Mutable: true
      LambdaConfig:
        CustomMessage:
          Fn::GetAtt:
            - MessageLambda
            - Arn
        PreSignUp:
          Fn::GetAtt:
            - SignupLambda
            - Arn

Outputs:

  # Additional output from qnabot-addons.yaml that creates
  # an output for the api gateway.
  QnaBotAddonApiGatewayId:
    Description: Id of the QnaBot ApiGateway
    Value: !Ref API
    Export:
      Name: !Sub "qnabot-addon-api-gateway-api-id"

  ## create an output for the api gateway stage name
  QnaBotAddonApiGatewayStageName:
    Description: Name of Qnabot ApiGateway Stage
    Value: !Ref Stage
    Export:
      Name: !Sub "qnabot-addon-api-gateway-stage-name"
```

If the answer to the previous question was yes, are the changes available on GitHub? No
Have you checked your service quotas for the services this solution uses? No, but this is unrelated.
Were there any errors in the CloudWatch Logs? No Errors.


fhoueto-amz · 2024-03-13T00:47:48Z

Hi @cboin1996
Thanks for reporting this.
Based on my initial read, the encoding issue appears to originate at the Kendra level, so I do not see this as a bug in QnABot but potentially as an enhancement. However, we will look into this and get back to you.

cboin1996 · 2024-03-14T14:50:43Z

Yeah, it does look to be at the Kendra level, but I still wonder whether it could be addressed in the bot fulfillment Lambda.

One idea is to force specific encodings within the bot fulfillment Lambda:

Add settings to the QnABot content designer, such as 'EncodeKendraResults' and 'DecodeKendraResults'.
Decode the search results in bot fulfillment before passing them to the client interface, or to the LLM if the LLM plugin is enabled.

For example:

```
EncodeKendraResults = 'cp1252'
DecodeKendraResults = 'utf-8'
```

Then, in the bot fulfillment Lambda, do something like:

```python
'â€¢'.encode('cp1252').decode('utf-8')
```

but applied to the Kendra search results instead of just 'â€¢'.

Running the code above returns the correct character, '•'.

The caveat is that this would only work for a single document encoding type, so maybe I need to figure out whether I can specify document encodings within Kendra, so that when the fulfillment Lambda calls Kendra, each document is decoded with the encoding that matches it.
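One way to soften the single-encoding caveat, sketched here in Python as an assumption rather than anything QnABot or Kendra provides, is to apply the cp1252 to UTF-8 round trip to each run of non-ASCII characters separately, and to keep a run unchanged whenever the round trip raises an error. Correctly decoded characters such as 'é' or '•' do not survive that round trip (their cp1252 bytes are not valid UTF-8 on their own), so they are left alone, while garbled runs such as 'â€¢' are repaired. `repair_mojibake` and its `wrong`/`right` parameters are hypothetical; they play the role of the proposed 'EncodeKendraResults'/'DecodeKendraResults' settings.

```python
import re

# Mojibake only ever occurs inside runs of non-ASCII characters.
_NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def _repair_run(run: str, wrong: str, right: str) -> str:
    """Undo one wrong decode (`right` bytes read as `wrong`), or return `run` as-is."""
    try:
        return run.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not cleanly reversible, so the run was most likely already correct.
        return run


def repair_mojibake(text: str, wrong: str = "cp1252", right: str = "utf-8") -> str:
    """Repair each non-ASCII run independently so correct and garbled text can coexist."""
    return _NON_ASCII_RUN.sub(lambda m: _repair_run(m.group(0), wrong, right), text)


print(repair_mojibake("Itâ€™s a cafÃ© â€¢ already fine: café"))
# It’s a café • already fine: café
```

This is a heuristic: a run that mixes correct and garbled characters with no ASCII between them is left unrepaired, and legitimate text that merely looks like mojibake (for example a literal 'Ã©') would be "repaired". The third-party `ftfy` package (`ftfy.fix_text`) implements a far more thorough version of this idea.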

abhirpat · 2024-09-06T01:25:36Z

Hi @cboin1996, have you tried implementing this logic in a post-processing Lambda hook, which could correct the response before it is passed to the client interface? With post-processing Lambda hooks, you can customize QnABot's output as needed.

ajaysw · 2024-10-04T14:41:56Z

Hi @cboin1996, I would suggest adding that to a post-processing Lambda hook and referencing that Lambda in the content designer under the setting "LAMBDA_POSTPROCESS_HOOK"; you can catch all special characters as needed. Please close this issue if that resolves it. Thank you.
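Following that suggestion, a post-processing hook could apply the same cp1252 to UTF-8 round trip to the final response. The Python sketch below assumes that QnABot invokes the function named in LAMBDA_POSTPROCESS_HOOK with an event containing `req` and `res` objects and expects the (possibly modified) event back; check the Lambda hooks documentation for your QnABot version before relying on that shape. Rather than depending on specific field names (for example, wherever the Context drop-down markdown is stored), it repairs every string inside `res`. The helper names `_fix` and `_fix_all` are hypothetical.

```python
import re
from typing import Any

# Mojibake only ever occurs inside runs of non-ASCII characters.
_NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def _fix(text: str) -> str:
    """Reverse a UTF-8-read-as-cp1252 mistake, run by run."""
    def repair(match):
        run = match.group(0)
        try:
            return run.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return run  # already-correct text is left alone
    return _NON_ASCII_RUN.sub(repair, text)


def _fix_all(value: Any) -> Any:
    """Recursively repair every string inside nested dicts and lists."""
    if isinstance(value, str):
        return _fix(value)
    if isinstance(value, dict):
        return {key: _fix_all(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_fix_all(item) for item in value]
    return value  # numbers, booleans, None are returned unchanged


def lambda_handler(event: dict, context: Any) -> dict:
    # Assumption: the hook receives {"req": ..., "res": ...} and must return the event.
    if "res" in event:
        event["res"] = _fix_all(event["res"])
    return event
```

Walking the whole `res` object costs a little extra work per response but avoids tracking which fields the LLM plugin populates; restrict it to specific fields if other strings must not be touched.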

ajaysw · 2024-10-16T19:40:57Z

Closing this issue, thank you.

cboin1996 added the bug label Mar 12, 2024

fhoueto-amz added enhancement and removed bug labels Mar 15, 2024

abhirpat self-assigned this Sep 6, 2024

fhoueto-amz closed this as completed Oct 16, 2024
