feat: add support to upload image and generate contents from image
HanaokaYuzu committed Mar 7, 2024
1 parent 6c48d50 commit d2274c2
Showing 5 changed files with 160 additions and 34 deletions.
31 changes: 28 additions & 3 deletions README.md
Expand Up @@ -15,10 +15,11 @@

# <img src="assets/logo.svg" width="35px" alt="Gemini Icon" /> Gemini-API

A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gemini.google.com) web chat (formerly Bard).
A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gemini.google.com) web app (formerly Bard).

## Features

- **(WIP) Auto Cookie Management**
- **ImageFx Support** - Supports retrieving images generated by ImageFx, Google's latest AI image generator.
- **Extension Support** - Supports generating contents with [Gemini extensions](https://gemini.google.com/extensions) on, like YouTube and Gmail.
- **Classified Outputs** - Auto categorizes texts, web images and AI generated images from the response.
Expand All @@ -33,7 +34,8 @@ A reverse-engineered asynchronous python wrapper for [Google Gemini](https://gem
- [Authentication](#authentication)
- [Usage](#usage)
- [Initialization](#initialization)
- [Generate contents from text inputs](#generate-contents-from-text-inputs)
- [Generate contents from text](#generate-contents-from-text)
- [Generate contents from image](#generate-contents-from-image)
- [Conversations across multiple turns](#conversations-across-multiple-turns)
- [Retrieve images in response](#retrieve-images-in-response)
- [Generate images with ImageFx](#generate-images-with-imagefx)
Expand All @@ -56,6 +58,7 @@ pip install gemini_webapi
- Click any request and copy cookie values of `__Secure-1PSID` and `__Secure-1PSIDTS`

> [!TIP]
>
> `__Secure-1PSIDTS` can expire frequently if <https://gemini.google.com> is kept open in the browser after the cookies are copied. If you are building a keep-alive service with this package, it's recommended to get cookies from a separate session (e.g. a new login in the browser's private mode).
>
> For more details, please refer to discussions in [issue #6](https://github.com/HanaokaYuzu/Gemini-API/issues/6)
Expand All @@ -82,9 +85,10 @@ asyncio.run(main())
```

> [!TIP]
>
> `auto_close` and `close_delay` are optional arguments for automatically closing the client after a certain period of inactivity. This feature is disabled by default. In a keep-alive service such as a chatbot, it's recommended to set `auto_close` to `True` together with a reasonable `close_delay` (in seconds) for better resource management.
### Generate contents from text inputs
### Generate contents from text

Ask a one-turn quick question by calling `GeminiClient.generate_content`.

Expand All @@ -97,8 +101,21 @@ asyncio.run(main())
```

> [!TIP]
>
> Simply use `print(response)` to get the same output if you just want to see the response text
### Generate contents from image

Gemini supports image recognition and can generate contents from an image (currently only one image at a time). Optionally, you can pass the image data as `bytes`, or its path as a `str`, to `GeminiClient.generate_content` together with the text prompt.

```python
async def main():
    response = await client.generate_content("Describe the image", image="assets/banner.png")
    print(response.text)

asyncio.run(main())
```
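
Since `image` accepts either a path or raw `bytes`, an image that is already in memory can be passed without writing it to disk first. The following is a minimal sketch of loading a file into `bytes`; `read_image_bytes` is a hypothetical helper that is not part of the package, and the commented call assumes an initialized `client` as in the sections above.

```python
from pathlib import Path


def read_image_bytes(path: str) -> bytes:
    """Hypothetical helper: load an image file into memory as raw bytes."""
    return Path(path).read_bytes()


# Assumed usage with an initialized client (not executed here):
# image_bytes = read_image_bytes("assets/banner.png")
# response = await client.generate_content("Describe the image", image=image_bytes)
```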

### Conversations across multiple turns

To keep a conversation going across turns, use `GeminiClient.start_chat` to create a `ChatSession` object and send messages through it. The conversation history is handled automatically and updated after each turn.
Expand All @@ -113,6 +130,10 @@ async def main():
asyncio.run(main())
```

> [!TIP]
>
> Like `GeminiClient.generate_content`, `ChatSession.send_message` also accepts `image` as an optional argument.
### Retrieve images in response

Images in the API's output are stored as a list of `Image` objects. You can access the image title, URL, and description by calling `image.title`, `image.url` and `image.alt` respectively.
Expand All @@ -131,6 +152,7 @@ asyncio.run(main())
In February 2024, Google introduced a new AI image generator called ImageFx and integrated it into Gemini. You can ask Gemini to generate images with ImageFx simply by using natural language.

> [!IMPORTANT]
>
> Google has some limitations on the image generation feature in Gemini, so its availability could be different per region/account. Here's a summary copied from [official documentation](https://support.google.com/gemini/answer/14286560) (as of February 15th, 2024):
>
> > Image generation in Gemini Apps is available in most countries, except in the European Economic Area (EEA), Switzerland, and the UK. It’s only available for **English prompts**.
Expand All @@ -149,6 +171,7 @@ asyncio.run(main())
```

> [!NOTE]
>
> By default, when asked to send images (as in the previous example), Gemini will send images fetched from the web instead of generating them with an AI model, unless you explicitly ask it to "generate" images in your prompt. In this package, web images and generated images are treated differently as `WebImage` and `GeneratedImage`, and are automatically categorized in the output.
### Save images to local files
Expand All @@ -167,6 +190,7 @@ asyncio.run(main())
### Generate contents with Gemini extensions

> [!IMPORTANT]
>
> To access Gemini extensions in API, you must activate them on the [Gemini website](https://gemini.google.com/extensions) first. Same as image generation, Google also has limitations on the availability of Gemini extensions. Here's a summary copied from [official documentation](https://support.google.com/gemini/answer/13695044) (as of February 18th, 2024):
>
> > To use extensions in Gemini Apps:
Expand All @@ -191,6 +215,7 @@ asyncio.run(main())
```

> [!NOTE]
>
> The region limitation actually only requires your Google account's **preferred language** to be set to one of the three supported languages listed above. You can change your language settings [here](https://myaccount.google.com/language).
### Check and switch to other reply candidates
Expand Down
89 changes: 61 additions & 28 deletions src/gemini_webapi/client.py
Expand Up @@ -9,6 +9,7 @@

from .types import WebImage, GeneratedImage, Candidate, ModelOutput
from .exceptions import APIError, AuthError, TimeoutError, GeminiError
from .utils import upload_file
from .constant import HEADERS


Expand All @@ -34,16 +35,16 @@ async def wrapper(self: "GeminiClient", *args, **kwargs):

class GeminiClient:
"""
Async httpx client interface for gemini.google.com
Async httpx client interface for gemini.google.com.
Parameters
----------
secure_1psid: `str`
__Secure-1PSID cookie value
__Secure-1PSID cookie value.
secure_1psidts: `str`, optional
__Secure-1PSIDTS cookie value, some google accounts don't require this value, provide only if it's in the cookie list
__Secure-1PSIDTS cookie value, some google accounts don't require this value, provide only if it's in the cookie list.
proxy: `dict`, optional
Dict of proxies
Dict of proxies.
"""

__slots__ = [
Expand All @@ -65,12 +66,12 @@ def __init__(
):
self.cookies = {"__Secure-1PSID": secure_1psid}
self.proxy = proxy
self.client: AsyncClient | None = None
self.access_token: Optional[str] = None
self.client: AsyncClient = None
self.access_token: str = None
self.running: bool = False
self.auto_close: bool = False
self.close_delay: float = 300
self.close_task: Task | None = None
self.close_task: Task = None

if secure_1psidts:
self.cookies["__Secure-1PSIDTS"] = secure_1psidts
Expand All @@ -84,12 +85,12 @@ async def init(
Parameters
----------
timeout: `float`, optional
Request timeout of the client in seconds. Used to limit the max waiting time when sending a request
Request timeout of the client in seconds. Used to limit the max waiting time when sending a request.
auto_close: `bool`, optional
If `True`, the client will close connections and clear resource usage after a certain period
of inactivity. Useful for keep-alive services
of inactivity. Useful for keep-alive services.
close_delay: `float`, optional
Time to wait before auto-closing the client in seconds. Effective only if `auto_close` is `True`
Time to wait before auto-closing the client in seconds. Effective only if `auto_close` is `True`.
"""
try:
self.client = AsyncClient(
Expand Down Expand Up @@ -132,7 +133,7 @@ async def close(self, delay: float = 0) -> None:
Parameters
----------
delay: `float`, optional
Time to wait before closing the client in seconds
Time to wait before closing the client in seconds.
"""
if delay:
await asyncio.sleep(delay)
Expand All @@ -155,23 +156,28 @@ async def reset_close_task(self) -> None:

@running
async def generate_content(
self, prompt: str, chat: Optional["ChatSession"] = None
self,
prompt: str,
image: Optional[bytes | str] = None,
chat: Optional["ChatSession"] = None,
) -> ModelOutput:
"""
Generates contents with prompt.
Parameters
----------
prompt: `str`
Prompt provided by user
Prompt provided by user.
image: `bytes` | `str`, optional
File data in bytes, or path to the image file to be sent together with the prompt.
chat: `ChatSession`, optional
Chat data to retrieve conversation history. If None, will automatically generate a new chat id when sending post request
Chat data to retrieve conversation history. If None, will automatically generate a new chat id when sending post request.
Returns
-------
:class:`ModelOutput`
Output data from gemini.google.com, use `ModelOutput.text` to get the default text reply, `ModelOutput.images` to get a list
of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output
of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output.
"""
assert prompt, "Prompt cannot be empty."

Expand All @@ -184,7 +190,23 @@ async def generate_content(
data={
"at": self.access_token,
"f.req": json.dumps(
[None, json.dumps([[prompt], None, chat and chat.metadata])]
[
None,
json.dumps(
[
image
and [
prompt,
0,
None,
[[[await upload_file(image), 1]]],
]
or [prompt],
None,
chat and chat.metadata,
]
),
]
),
},
)
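
The nested `and`/`or` expression in this hunk selects between two shapes for the first element of the inner payload. The sketch below mirrors that structure as a standalone function, which may be easier to read and test in isolation. `build_f_req` and its parameter names are hypothetical; `file_id` stands for the identifier returned by `upload_file`, and the meaning of the literal `0` and `1` fields is not documented in this commit.

```python
import json
from typing import Optional


def build_f_req(
    prompt: str,
    file_id: Optional[str] = None,
    metadata: Optional[list] = None,
) -> str:
    """Hypothetical helper mirroring the `f.req` structure shown in the diff."""
    if file_id:
        # Shape used when an image accompanies the prompt.
        message = [prompt, 0, None, [[[file_id, 1]]]]
    else:
        # Shape used for a text-only prompt.
        message = [prompt]

    # The inner list is JSON-encoded first, then embedded as a string
    # in the outer list, which is JSON-encoded again.
    inner = json.dumps([message, None, metadata])
    return json.dumps([None, inner])
```

Decoding the result twice with `json.loads` recovers the inner list, which is a convenient way to check the payload shape without sending a request.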
Expand Down Expand Up @@ -280,7 +302,7 @@ def start_chat(self, **kwargs) -> "ChatSession":
Returns
-------
:class:`ChatSession`
Empty chat object for retrieving conversation history
Empty chat object for retrieving conversation history.
"""
return ChatSession(geminiclient=self, **kwargs)

Expand All @@ -292,15 +314,15 @@ class ChatSession:
Parameters
----------
geminiclient: `GeminiClient`
Async httpx client interface for gemini.google.com
Async httpx client interface for gemini.google.com.
metadata: `list[str]`, optional
List of chat metadata `[cid, rid, rcid]`, can be shorter than 3 elements, like `[cid, rid]` or `[cid]` only
List of chat metadata `[cid, rid, rcid]`, can be shorter than 3 elements, like `[cid, rid]` or `[cid]` only.
cid: `str`, optional
Chat id, if provided together with metadata, will override the first value in it
Chat id, if provided together with metadata, will override the first value in it.
rid: `str`, optional
Reply id, if provided together with metadata, will override the second value in it
Reply id, if provided together with metadata, will override the second value in it.
rcid: `str`, optional
Reply candidate id, if provided together with metadata, will override the third value in it
Reply candidate id, if provided together with metadata, will override the third value in it.
"""

# @properties needn't have their slots pre-defined
Expand Down Expand Up @@ -339,23 +361,29 @@ def __setattr__(self, name: str, value: Any) -> None:
self.metadata = value.metadata
self.rcid = value.rcid

async def send_message(self, prompt: str) -> ModelOutput:
async def send_message(
self, prompt: str, image: Optional[bytes | str] = None
) -> ModelOutput:
"""
Generates contents with prompt.
Use as a shortcut for `GeminiClient.generate_content(prompt, self)`.
Use as a shortcut for `GeminiClient.generate_content(prompt, image, self)`.
Parameters
----------
prompt: `str`
Prompt provided by user
Prompt provided by user.
image: `bytes` | `str`, optional
File data in bytes, or path to the image file to be sent together with the prompt.
Returns
-------
:class:`ModelOutput`
Output data from gemini.google.com, use `ModelOutput.text` to get the default text reply, `ModelOutput.images` to get a list
of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output
of images in the default reply, `ModelOutput.candidates` to get a list of all answer candidates in the output.
"""
return await self.geminiclient.generate_content(prompt, self)
return await self.geminiclient.generate_content(
prompt=prompt, image=image, chat=self
)

def choose_candidate(self, index: int) -> ModelOutput:
"""
Expand All @@ -364,7 +392,12 @@ def choose_candidate(self, index: int) -> ModelOutput:
Parameters
----------
index: `int`
Index of the candidate to choose, starting from 0
Index of the candidate to choose, starting from 0.
Returns
-------
:class:`ModelOutput`
Output data of the chosen candidate.
"""
if not self.last_output:
raise ValueError("No previous output data found in this chat session.")
Expand Down
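
Because `image` was inserted before `chat` in the signature of `generate_content`, the old positional call `generate_content(prompt, self)` would now bind the chat session to `image`. Switching `send_message` to keyword arguments avoids that. The stand-in below illustrates the difference; it is a stub with the same parameter order, not the real client method.

```python
from typing import Optional


def generate_content(
    prompt: str,
    image: Optional[object] = None,
    chat: Optional[object] = None,
) -> dict:
    """Stub with the new parameter order; returns how the arguments were bound."""
    return {"prompt": prompt, "image": image, "chat": chat}


session = object()  # stands in for a ChatSession instance

# Old call style: the session lands in `image`, and `chat` stays None.
positional = generate_content("hello", session)

# New call style from the diff: every argument is bound by name.
keyword = generate_content(prompt="hello", image=None, chat=session)
```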
2 changes: 2 additions & 0 deletions src/gemini_webapi/constant.py
Expand Up @@ -6,3 +6,5 @@
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"X-Same-Domain": "1",
}

UPLOAD_PUSHID = "feeds/mcudyrk2a4khkz"
41 changes: 41 additions & 0 deletions src/gemini_webapi/utils.py
@@ -0,0 +1,41 @@
from httpx import AsyncClient
from pydantic import validate_call

from .constant import UPLOAD_PUSHID


@validate_call
async def upload_file(file: bytes | str) -> str:
    """
    Upload a file to Google's server and return its identifier.

    Parameters
    ----------
    file : `bytes` | `str`
        File data in bytes, or path to the file to be uploaded.

    Returns
    -------
    `str`
        Identifier of the uploaded file.
        E.g. "/contrib_service/ttl_1d/1709764705i7wdlyx3mdzndme3a767pluckv4flj"

    Raises
    ------
    `httpx.HTTPStatusError`
        If the upload request failed.
    """

    if isinstance(file, str):
        with open(file, "rb") as f:
            file = f.read()

    async with AsyncClient() as client:
        response = await client.post(
            url="https://content-push.googleapis.com/upload/",
            headers={"Push-ID": UPLOAD_PUSHID},
            files={"file": file},
            follow_redirects=True,
        )
        response.raise_for_status()
        return response.text
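
`upload_file` reads the file with a blocking `open` call inside a coroutine. For large images, one option is to move the read to a worker thread so the event loop stays responsive. This is a minimal sketch of that idea, assuming Python 3.9+ for `asyncio.to_thread`; `read_file_async` is a hypothetical helper and not part of this commit.

```python
import asyncio
from pathlib import Path
from typing import Union


async def read_file_async(file: Union[bytes, str]) -> bytes:
    """Hypothetical helper: return file data as bytes without blocking the loop."""
    if isinstance(file, str):
        # Run the blocking disk read in a worker thread.
        return await asyncio.to_thread(Path(file).read_bytes)
    # Data already in memory is returned unchanged.
    return file
```

Whether this is worth the extra thread hop depends on file size; for small images the direct read in the commit is likely fine.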