Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python Rebuff SDK #88

Merged
merged 7 commits into from
Jan 10, 2024
Merged

Python Rebuff SDK #88

merged 7 commits into from
Jan 10, 2024

Conversation

mehrinkiani
Copy link
Member

This PR adds a first class support for Python SDK for Rebuff.

Please note you will need the following credentials for testing the PR:
pinecone_environment
pinecone_apikey
pinecone_index
openai_model
openai_apikey

I will open new issues for creating support for Chroma and for adding tests for Python SDK.

@mehrinkiani mehrinkiani changed the base branch from main to python-sdk December 5, 2023 22:56
Copy link
Contributor

@ristomcgehee ristomcgehee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mehrin, I don't think we've interacted yet, but I'm Risto and I've been contributing a little to this project. I think that's great that you're adding a Python SDK! I thought I'd go through and add some comments for where I think the code could be slightly improved. Let me know if you have any questions or clarifications on my review.

python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/detect_pi_vectorbase.py Outdated Show resolved Hide resolved
python-sdk/rebuff/detect_pi_vectorbase.py Outdated Show resolved Hide resolved
python-sdk/rebuff/detect_pi_vectorbase.py Outdated Show resolved Hide resolved
@mehrinkiani
Copy link
Member Author

Hi Mehrin, I don't think we've interacted yet, but I'm Risto and I've been contributing a little to this project. I think that's great that you're adding a Python SDK! I thought I'd go through and add some comments for where I think the code could be slightly improved. Let me know if you have any questions or clarifications on my review.

Hi Risto, Thank you so much for your review and valuable contributions to the project. I have tried to work on your review. The main changes include:

  1. Updated the API call to call Pinecone directly in log_leakage function.
  2. Updated type hints.
  3. Changed class name from Rebuff to RebuffPython to avoid possible conflict.

python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ristomcgehee ristomcgehee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments. I have a few more... 😁


rebuff_vector_score = 0
similarity_threshold = 0.3
vector_store._text_key = "input"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initialization of Pinecone client requires a _text_key field. I am not sure what is the _text_key exactly for though I have added the reference as a comment in code as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that link was helpful. _text_key is being use as the key for a metadata field that always gets added to a vector when it's inserted. If you don't specify it, it defaults to "text" as the metadata key. In the JS SDK, we're not specifying any value, so it's using the default value of "text". Since we want Python and JS to match, I think you should remove this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for elaborating on it, it makes more sense now. I think _text_key is also being used when searching for similar documents. Hence, if I remove the said line, I get the error: Found document with no 'text' key. Skipping.

@seanpmorgan can probably shed more light on it but I think the previous/earlier entries in Pinecone have been made against self._text_key = 'input'

Copy link
Contributor

@ristomcgehee ristomcgehee Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. In the particular Pinecone database you're testing against, the entries have a metadata field of "input" but not "text", so the results all get skipped unless you set self._text_key = 'input'.

I understand why Langchain has this behavior (when querying a vector database, you almost always want the text content of the vector), but in our use case, we're not actually doing anything with the text content of the vector, just the score. I think the correct way forward at this point is to submit a PR to Langchain to have the option to not skip results when the metadata doesn't contain the text key (I'll submit that PR).

I think you can still move forward with this PR in the meantime by instead of doing self._text_key = 'input', do:

vector_store = Pinecone.from_existing_index(index, openai_embeddings, text_key="input")

in init_pinecone. Whenever possible, it's better to set attributes via constructor or function arguments instead of directly on the instance (in Python, it's a convention that attributes that start with an underscore should not be accessed outside of that class).

I looked at the JS code, and Langchain does not skip vectors when this.textKey is not found in the metadata, so we don't need to make any JS changes in regards to this issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for elaborating on it, and for submitting the Langchain PR. I have updated init_pinecone as you suggested.

python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
python-sdk/rebuff/rebuff_pythonsdk.py Outdated Show resolved Hide resolved
class RebuffSdk:
def __init__(
self,
openai_model: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to match the JS SDK and make this an optional parameter with a default of "gpt-3.5-turbo".

@@ -4,7 +4,7 @@
name="rebuff",
version="0.0.5",
packages=find_packages(),
install_requires=["pydantic>=1", "requests<3,>=2"],
install_requires=["pydantic>=1", "requests<3,>=2", "langchain>=0.0.100"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs pinecone-client and openai. You could then remove langchain and openai from "dev" below.

I also got an error about not having tiktoken installed. I believe you'll need to add that as well.

tiktoken error

Traceback (most recent call last):
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 450, in _get_len_safe_embeddings
    import tiktoken
ModuleNotFoundError: No module named 'tiktoken'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/risto/repos/protectai/rebuff/python-sdk/detect_rebuff.py", line 25, in <module>
    print(rb.detect_injection(user_input))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/sdk.py", line 87, in detect_injection
    vector_score = detect_pi_using_vector_database(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/detect_pi_vectorbase.py", line 26, in detect_pi_using_vector_database
    results = vector_store.similarity_search_with_score(input, top_k)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/vectorstores/pinecone.py", line 175, in similarity_search_with_score
    self._embed_query(query), k=k, filter=filter, namespace=namespace
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/vectorstores/pinecone.py", line 93, in _embed_query
    return self._embedding.embed_query(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 697, in embed_query
    return self.embed_documents([text])[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 668, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=engine)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 452, in _get_len_safe_embeddings
    raise ImportError(
ImportError: Could not import tiktoken python package. This is needed in order to for OpenAIEmbeddings. Please install it with `pip install tiktoken`.

"""
openai.api_key = api_key

completion = openai.ChatCompletion.create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I installed the latest openai version, and I got this error when executing:

Traceback (most recent call last):
  File "/home/risto/repos/protectai/rebuff/python-sdk/detect_rebuff.py", line 25, in <module>
    print(rb.detect_injection(user_input))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/sdk.py", line 97, in detect_injection
    model_response = call_openai_to_detect_pi(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/detect_pi_openai.py", line 71, in call_openai_to_detect_pi
    completion = openai.ChatCompletion.create(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/openai/lib/_old_api.py", line 39, in __call__
    raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1: 

You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

Instead of using a deprecated API, can you use the updated API for openai?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for highlighting this. I have updated the code for updated openai API. Also, I noticed that the class OpenAIEmbeddings is deprecated in v0.1.0 and might get removed in v0.2.0 so also updated code with the alternative they have suggested: langchain_openai.OpenAIEmbeddings . I have also added langchain_openai in setup.py under install_requires

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great!


rebuff_vector_score = 0
similarity_threshold = 0.3
vector_store._text_key = "input"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ristomcgehee
Copy link
Contributor

This PR looks good from my perspective. Thanks for all your work on it, Mehrin!

@mehrinkiani mehrinkiani changed the base branch from python-sdk to main January 9, 2024 21:34
Copy link
Member

@seanpmorgan seanpmorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @mehrinkiani and @ristomcgehee for the review!

@seanpmorgan seanpmorgan merged commit a8aba0a into main Jan 10, 2024
@seanpmorgan seanpmorgan deleted the initial-python-sdk branch January 10, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants