Using MIPROv2 max_bootstrapped_demos > 0 with dspy.Image causes context to blow up #1954

Open
vbeutner opened this issue Dec 18, 2024 · 2 comments


What (I think) is the problem

When optimizing a VLM program with MIPROv2 and max_bootstrapped_demos > 0, the optimizer puts the base64-encoded image string into the prompt, which blows the context up beyond the available 200k-token context window.
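
A rough back-of-the-envelope check of why even one inlined image can exhaust the window. This is only a sketch: it assumes about 4 characters per token, which is a common rule of thumb and not a real tokenizer (base64 may tokenize worse), and it assumes the numbers in the dataset repr in this report are character counts of the base64 strings. `estimate_tokens` is a hypothetical helper, not a DSPy function.

```python
# Assumption: ~4 characters per token (rule of thumb, not a real tokenizer).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 200_000  # context limit quoted in this report

# Lengths copied from the dataset repr in this report (assumed to be character counts).
base64_lengths = [626_104, 85_120, 212_204]


def estimate_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Hypothetical helper: crude token estimate for a string of num_chars characters."""
    return -(-num_chars // chars_per_token)  # ceiling division


estimates = [estimate_tokens(n) for n in base64_lengths]
for n, t in zip(base64_lengths, estimates):
    print(f"{n:>7} chars -> ~{t:>6} tokens ({t / CONTEXT_WINDOW:.0%} of the window)")
print("all three inlined:", sum(estimates), "tokens")
```

Under these assumptions the largest image alone would use most of the window, and the three images together would exceed it.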

Code example

import dspy
from dspy.teleprompt import MIPROv2

# setup LLM
lm = dspy.LM("bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
dspy.configure(lm=lm)

# setup signature and module
class ExtractInfo(dspy.Signature):
    """ 
    Please find signatures in this image.
    """
    #question: str = dspy.InputField()
    image: dspy.Image = dspy.InputField(desc="Base64 data of the image", is_image=True)
    
    signatures_found: bool = dspy.OutputField()
    
find_signature = dspy.Predict(ExtractInfo)

# dataset (repr of `train`, truncated; base64 payloads elided)
[Example({'image': Image(url = data:image/png;base64,<IMAGE_BASE_64_ENCODED(626104)>), 'signatures_found': True}) (input_keys={'image'}),
 Example({'image': Image(url = data:image/png;base64,<IMAGE_BASE_64_ENCODED(85120)>), 'signatures_found': True}) (input_keys={'image'}),
 Example({'image': Image(url = data:image/png;base64,<IMAGE_BASE_64_ENCODED(212204)>), 'signatures_found': True}) (input_keys={'image'}),

# setup validator and optimizer
def validate_answer(truth, pred, trace=None):
    truth = truth.toDict()['signatures_found']
    pred = pred.toDict()['signatures_found']
    return truth == pred

teleprompter = MIPROv2(metric=validate_answer, init_temperature=0, verbose=True, num_candidates=4, max_bootstrapped_demos=1,
                max_labeled_demos=2,)

# compile
compiled = teleprompter.compile(find_signature, trainset=train, requires_permission_to_run=False,)

Output

2024/12/18 09:29:49 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2024/12/18 09:29:49 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2024/12/18 09:29:49 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=4 sets of demonstrations...

Bootstrapping set 1/4
Bootstrapping set 2/4
Bootstrapping set 3/4

  0%|          | 0/19 [00:00<?, ?it/s]
  5%|▌         | 1/19 [00:03<01:03,  3.51s/it]

Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 4/4

  0%|          | 0/19 [00:00<?, ?it/s]
  5%|▌         | 1/19 [00:03<00:55,  3.06s/it]
2024/12/18 09:29:55 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2024/12/18 09:29:55 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.

Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
SOURCE CODE: 



2024/12/18 09:30:18 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...


DATA SUMMARY: This is a curated dataset of high-resolution document scans designed for training signature detection models, with images encoded in base64 format and binary classification labels. The dataset shows a significant class imbalance with 90% of examples containing signatures, and exhibits varying file sizes (85KB-626KB) suggesting diverse document types. The consistent structure and high-resolution nature of the scans make it suitable for supervised learning applications in automated document processing systems.
Using a randomly generated configuration for our grounded proposer.
Selected tip: persona
PROGRAM DESCRIPTION: Unable to provide description due to missing input code and example.
  0%|          | 0/76 [02:20<?, ?it/s]
  0%|          | 0/25 [01:31<?, ?it/s]
task_demos 




[2024-12-18T09:30:27.124743]

System message:

Your input fields are:
1. `dataset_description` (str): A description of the dataset that we are using.
2. `program_code` (str): Language model program designed to solve a particular task.
3. `program_description` (str): Summary of the task the program is designed to solve, and how it goes about solving it.
4. `module` (str): The module to create an instruction for.
5. `task_demos` (str): Example inputs/outputs of our module.
6. `basic_instruction` (str): Basic instruction.
7. `tip` (str): A suggestion for how to go about generating the new instruction.

Your output fields are:
1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## dataset_description ## ]]
{dataset_description}

[[ ## program_code ## ]]
{program_code}

[[ ## program_description ## ]]
{program_description}

[[ ## module ## ]]
{module}

[[ ## task_demos ## ]]
{task_demos}

[[ ## basic_instruction ## ]]
{basic_instruction}

[[ ## tip ## ]]
{tip}

[[ ## proposed_instruction ## ]]
{proposed_instruction}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.


User message:

[[ ## dataset_description ## ]]
This is a curated dataset of high-resolution document scans designed for training signature detection models, with images encoded in base64 format and binary classification labels. The dataset shows a significant class imbalance with 90% of examples containing signatures, and exhibits varying file sizes (85KB-626KB) suggesting diverse document types. The consistent structure and high-resolution nature of the scans make it suitable for supervised learning applications in automated document processing systems.

[[ ## program_code ## ]]




[[ ## program_description ## ]]
Unable to provide description due to missing input code and example.

[[ ## module ## ]]
Predict(image) -> signatures_found

[[ ## task_demos ## ]]


[[ ## basic_instruction ## ]]
Please find signatures in this image.

[[ ## tip ## ]]
Include a persona that is relevant to the task in the instruction (ie. "You are a ...")

Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## proposed_instruction ## ]]
You are a professional document examiner with expertise in signature verification and detection. Please carefully analyze this document image and identify any handwritten signatures present. Pay special attention to the bottom sections of the document where signatures typically appear, as well as margins and designated signature lines. Indicate whether you find any signatures in the image, considering both cursive and printed signature styles.

[[ ## completed ## ]]
PROPOSED INSTRUCTION: Analyze this document image carefully and identify any handwritten signatures present. A signature typically appears as a personalized, stylized handwriting that serves as a unique identifier. Pay special attention to:
1. The bottom sections of the document where signatures are commonly placed
2. Areas near printed text that might require authorization
3. Margins and designated signature lines
4. Both cursive and printed-style signatures
5. Signatures in any color (black, blue, or other inks)

Please indicate whether you find any signatures in the image (yes/no). Consider both clear, well-defined signatures and partial or less distinct signature marks. Ignore other handwritten text or markings that don't appear to be signatures.
Using a randomly generated configuration for our grounded proposer.
Selected tip: persona
Error getting program description. Running without program aware proposer.
task_demos Image: url='data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABqQAAAiYCAIAAAA+NVHkAAEAAElEQVR4nOzdd5xU1d0/8HPb9LKzbbb3Qlu6FEGaGLDxiJJYUMQWY6KIXRS7UcSo5NGoRI3lhwaxoaBIR3ovyy7be5ve+22/P47cZ5wtLLAg5ft+vUx279y599wyw85nvuccQhRFBAAAAAAAAAAAAAAAOP............

(I believe the base64 string above takes up most of the context window. Compilation eventually fails with a context-window-exceeded error.)
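
One possible mitigation, sketched here purely as an illustration: replace base64 payloads with a short placeholder before a demo is rendered as text, much like the `<IMAGE_BASE_64_ENCODED(n)>` form in the dataset repr earlier in this report. `redact_data_uris` is a hypothetical helper and not part of DSPy; whether and where the optimizer could apply something like it is an open question.

```python
import re

# Matches base64 data URIs such as "data:image/png;base64,iVBOR...".
DATA_URI_RE = re.compile(r"data:(image/[\w.+-]+);base64,([A-Za-z0-9+/=]+)")


def redact_data_uris(text: str) -> str:
    """Hypothetical helper (not part of DSPy): swap each base64 payload for a short placeholder."""

    def _placeholder(match: re.Match) -> str:
        mime, payload = match.group(1), match.group(2)
        return f"data:{mime};base64,<IMAGE_BASE_64_ENCODED({len(payload)})>"

    return DATA_URI_RE.sub(_placeholder, text)


# A stand-in demo string with a 50,000-character fake payload.
demo = "Image: url='data:image/png;base64," + "A" * 50_000 + "'"
short = redact_data_uris(demo)
print(len(demo), "->", len(short))
print(short)
```

The placeholder keeps the MIME type and the original payload length, so a text-only proposer prompt still conveys that an image was present without carrying its bytes.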

Feature request (while I'm here)

It would be nice to have a shorthand string notation for this, e.g. image -> answer.

@okhat changed the title from "Using MIPROv2 max_bootstrapped_demos > 0 with vlm causes context to blow up" to "Using MIPROv2 max_bootstrapped_demos > 0 with dspy.Image causes context to blow up" on Dec 18, 2024
okhat (Collaborator) commented Dec 18, 2024

Thanks @vbeutner! I'm not entirely sure, since this is a pretty long issue, but I can offer two thoughts:

  1. We've had many reports of successful few-shot optimization with dspy.Image, so the error here might be fixable.
  2. Support for dspy.Image is experimental and undocumented, so we can't guarantee debugging help at the moment.

A month from now, I expect dspy.Image to have matured, and we'll surely have caught any typical issues by then.

vbeutner (Author) commented Dec 18, 2024

Thanks @okhat
To be clear, it works well for me otherwise and isn't a blocker for me. I just wanted to put it on your radar. Cheers!
