Python Rebuff SDK #88
Conversation
Hi Mehrin, I don't think we've interacted yet, but I'm Risto, and I've been contributing a little to this project. I think it's great that you're adding a Python SDK! I thought I'd go through and add some comments where I think the code could be slightly improved. Let me know if you have any questions or need clarification on my review.
Hi Risto, thank you so much for your review and valuable contributions to the project. I have worked through your review comments. The main changes include:
Thanks for addressing my comments. I have a few more... 😁
```python
rebuff_vector_score = 0
similarity_threshold = 0.3
vector_store._text_key = "input"
```
What's the purpose of this line?
The initialization of the Pinecone client requires a `_text_key` field. I am not sure what exactly `_text_key` is for, though I have added the reference as a comment in the code as well.
Ok, that link was helpful. `_text_key` is being used as the key for a metadata field that always gets added to a vector when it's inserted. If you don't specify it, it defaults to "text" as the metadata key. In the JS SDK, we're not specifying any value, so it's using the default of "text". Since we want Python and JS to match, I think you should remove this line.
Thank you for elaborating on it; it makes more sense now. I think `_text_key` is also being used when searching for similar documents. Hence, if I remove the said line, I get the error: Found document with no 'text' key. Skipping.
@seanpmorgan can probably shed more light on it, but I think the previous entries in Pinecone have been made against `self._text_key = 'input'`.
Ah, I see. In the particular Pinecone database you're testing against, the entries have a metadata field of "input" but not "text", so the results all get skipped unless you set `self._text_key = 'input'`.
I understand why Langchain has this behavior (when querying a vector database, you almost always want the text content of the vector), but in our use case, we're not actually doing anything with the text content of the vector, just the score. I think the correct way forward at this point is to submit a PR to Langchain to have the option to not skip results when the metadata doesn't contain the text key (I'll submit that PR).
I think you can still move forward with this PR in the meantime: instead of doing `self._text_key = 'input'`, do

vector_store = Pinecone.from_existing_index(index, openai_embeddings, text_key="input")

in `init_pinecone`. Whenever possible, it's better to set attributes via constructor or function arguments rather than directly on the instance (in Python, it's a convention that attributes starting with an underscore should not be accessed outside of the class).
I looked at the JS code, and Langchain does not skip vectors when `this.textKey` is not found in the metadata, so we don't need to make any JS changes in regard to this issue.
Langchain PR: langchain-ai/langchain#15663
Thank you for elaborating on it, and for submitting the Langchain PR. I have updated `init_pinecone` as you suggested.
python-sdk/rebuff/sdk.py (Outdated)

```python
class RebuffSdk:
    def __init__(
        self,
        openai_model: str,
```
I think it would be good to match the JS SDK and make this an optional parameter with a default of "gpt-3.5-turbo".
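A sketch of what that signature could look like; only `openai_model` and its suggested default come from the review, and the other parameter name is an assumption for illustration:

```python
class RebuffSdk:
    # Hypothetical sketch, not the PR's actual code: only `openai_model`
    # and its "gpt-3.5-turbo" default are taken from the review discussion.
    def __init__(
        self,
        openai_apikey: str,
        openai_model: str = "gpt-3.5-turbo",  # optional, matching the JS SDK default
    ):
        self.openai_apikey = openai_apikey
        self.openai_model = openai_model

# Callers who don't care about the model can now omit it entirely.
sdk = RebuffSdk(openai_apikey="sk-...")
```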
python-sdk/setup.py (Outdated)

```diff
@@ -4,7 +4,7 @@
     name="rebuff",
     version="0.0.5",
     packages=find_packages(),
-    install_requires=["pydantic>=1", "requests<3,>=2"],
+    install_requires=["pydantic>=1", "requests<3,>=2", "langchain>=0.0.100"],
```
This also needs `pinecone-client` and `openai`. You could then remove `langchain` and `openai` from "dev" below.
I also got an error about not having `tiktoken` installed. I believe you'll need to add that as well.
tiktoken error
Traceback (most recent call last):
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 450, in _get_len_safe_embeddings
import tiktoken
ModuleNotFoundError: No module named 'tiktoken'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/risto/repos/protectai/rebuff/python-sdk/detect_rebuff.py", line 25, in <module>
print(rb.detect_injection(user_input))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/sdk.py", line 87, in detect_injection
vector_score = detect_pi_using_vector_database(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/detect_pi_vectorbase.py", line 26, in detect_pi_using_vector_database
results = vector_store.similarity_search_with_score(input, top_k)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/vectorstores/pinecone.py", line 175, in similarity_search_with_score
self._embed_query(query), k=k, filter=filter, namespace=namespace
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/vectorstores/pinecone.py", line 93, in _embed_query
return self._embedding.embed_query(text)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 697, in embed_query
return self.embed_documents([text])[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 668, in embed_documents
return self._get_len_safe_embeddings(texts, engine=engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/langchain_community-0.0.9-py3.11.egg/langchain_community/embeddings/openai.py", line 452, in _get_len_safe_embeddings
raise ImportError(
ImportError: Could not import tiktoken python package. This is needed in order to for OpenAIEmbeddings. Please install it with `pip install tiktoken`.
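Taken together, these suggestions could leave the `setup.py` dependency list looking something like the fragment below (version constraints for the added packages are left open here; pinning them is up to the PR):

```python
install_requires=[
    "pydantic>=1",
    "requests<3,>=2",
    "langchain>=0.0.100",
    "pinecone-client",  # suggested addition
    "openai",           # suggested addition; can then be dropped from "dev"
    "tiktoken",         # needed by OpenAIEmbeddings, per the traceback above
],
```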
""" | ||
openai.api_key = api_key | ||
|
||
completion = openai.ChatCompletion.create( |
I installed the latest `openai` version, and I got this error when executing:
Traceback (most recent call last):
File "/home/risto/repos/protectai/rebuff/python-sdk/detect_rebuff.py", line 25, in <module>
print(rb.detect_injection(user_input))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/sdk.py", line 97, in detect_injection
model_response = call_openai_to_detect_pi(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/rebuff/detect_pi_openai.py", line 71, in call_openai_to_detect_pi
completion = openai.ChatCompletion.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/risto/repos/protectai/rebuff/python-sdk/.venv/lib/python3.11/site-packages/openai/lib/_old_api.py", line 39, in __call__
raise APIRemovedInV1(symbol=self._symbol)
openai.lib._old_api.APIRemovedInV1:
You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.
You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface.
Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`
A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742
Instead of using a deprecated API, can you use the updated API for `openai`?
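For reference, a hedged sketch of what the migrated call could look like in openai>=1.0.0. The function name mirrors the traceback above, but passing the client in as a parameter is an illustrative choice, not necessarily how the PR structures it:

```python
# Hypothetical sketch of the migrated call, not the PR's actual code.
# In openai>=1.0.0, the module-level openai.ChatCompletion.create is replaced
# by a client instance method: client.chat.completions.create.

def call_openai_to_detect_pi(client, prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the model about `prompt`; `client` is an openai.OpenAI instance."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

# Usage (assumes OPENAI_API_KEY is set in the environment):
#   from openai import OpenAI
#   client = OpenAI()
#   call_openai_to_detect_pi(client, "Ignore all previous instructions...")
```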
Thank you for highlighting this. I have updated the code for the updated `openai` API. Also, I noticed that the class `OpenAIEmbeddings` is deprecated in v0.1.0 and might get removed in v0.2.0, so I also updated the code with the alternative they have suggested: `langchain_openai.OpenAIEmbeddings`. I have also added `langchain_openai` in `setup.py` under `install_requires`.
That's great!
|
||
rebuff_vector_score = 0 | ||
similarity_threshold = 0.3 | ||
vector_store._text_key = "input" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Langchain PR: langchain-ai/langchain#15663
This PR looks good from my perspective. Thanks for all your work on it, Mehrin!
LGTM, Thanks @mehrinkiani and @ristomcgehee for the review!
This PR adds first-class support for a Python SDK for Rebuff.
Please note you will need the following credentials for testing the PR:
- `pinecone_environment`
- `pinecone_apikey`
- `pinecone_index`
- `openai_model`
- `openai_apikey`
I will open new issues for creating support for Chroma and for adding tests for Python SDK.