What (I think) is the problem
When optimizing a program that uses a VLM with MIPROv2 and max_bootstrapped_demos > 0, the optimizer puts the base64-encoded image string into the prompt as text, which blows the context up beyond the available 200k context window.
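To give a sense of scale, here is a rough back-of-the-envelope sketch using the three base64 payload lengths that appear in the dataset dump in this issue. The 4-characters-per-token ratio is an assumption (a common rule of thumb, not the model's real tokenizer, and likely optimistic for base64), and the helper names are hypothetical.

```python
# Lengths (in characters) of the three base64 payloads shown in the dataset dump.
BASE64_LENGTHS = [626_104, 85_120, 212_204]

CONTEXT_WINDOW_TOKENS = 200_000  # window size quoted in this issue
ASSUMED_CHARS_PER_TOKEN = 4      # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(num_chars: int, chars_per_token: int = ASSUMED_CHARS_PER_TOKEN) -> int:
    """Rough token estimate for num_chars characters (ceiling division)."""
    return -(-num_chars // chars_per_token)


def fits_in_context(lengths, window: int = CONTEXT_WINDOW_TOKENS):
    """Return (estimated_total_tokens, fits) if all payloads were pasted in as text."""
    total = sum(estimate_tokens(n) for n in lengths)
    return total, total <= window


if __name__ == "__main__":
    total, fits = fits_in_context(BASE64_LENGTHS)
    print(f"estimated tokens: {total}, fits in window: {fits}")
```

Under that assumption, even these three images alone would not fit if their base64 strings were inserted into a prompt as plain text.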
Code example
import dspy
from dspy.teleprompt import MIPROv2

# setup LLM
lm = dspy.LM("bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
dspy.configure(lm=lm)

# setup signature and module
class ExtractInfo(dspy.Signature):
    """
    Please find signatures in this image.
    """
    # question: str = dspy.InputField()
    image: dspy.Image = dspy.InputField(desc="Base64 data of the image", is_image=True)
    signatures_found: bool = dspy.OutputField()

find_signature = dspy.Predict(ExtractInfo)
# dataset
[Example({'image': Image(url = data:image/png;base64,<IMAGE_BASE_64_ENCODED(626104)>), 'signatures_found': True}) (input_keys={'image'}),
Example({'image': Image(url = data:image/png;base64,<IMAGE_BASE_64_ENCODED(85120)>), 'signatures_found': True}) (input_keys={'image'}),
Example({'image': Image(url = data:image/png;base64,<IMAGE_BASE_64_ENCODED(212204)>), 'signatures_found': True}) (input_keys={'image'}),
# setup validator and optimizer
def validate_answer(truth, pred, trace=None):
    # Compare the labeled and predicted values of the boolean output field.
    truth = truth.toDict()['signatures_found']
    pred = pred.toDict()['signatures_found']
    return truth == pred

teleprompter = MIPROv2(metric=validate_answer, init_temperature=0, verbose=True, num_candidates=4,
                       max_bootstrapped_demos=1, max_labeled_demos=2)

# compile (train is the list of examples shown under "# dataset")
compiled = teleprompter.compile(find_signature, trainset=train, requires_permission_to_run=False)
Output
2024/12/18 09:29:49 INFO dspy.teleprompt.mipro_optimizer_v2:
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2024/12/18 09:29:49 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.
2024/12/18 09:29:49 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=4 sets of demonstrations...
Bootstrapping set 1/4
Bootstrapping set 2/4
Bootstrapping set 3/4
0%| | 0/19 [00:00<?, ?it/s]
5%|████████████▎ | 1/19 [00:03<01:03, 3.51s/it]
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 4/4
0%| | 0/19 [00:00<?, ?it/s]
5%|████████████▎ | 1/19 [00:03<00:55, 3.06s/it]
2024/12/18 09:29:55 INFO dspy.teleprompt.mipro_optimizer_v2:
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2024/12/18 09:29:55 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
SOURCE CODE:
2024/12/18 09:30:18 INFO dspy.teleprompt.mipro_optimizer_v2:
Proposing instructions...
DATA SUMMARY: This is a curated dataset of high-resolution document scans designed for training signature detection models, with images encoded in base64 format and binary classification labels. The dataset shows a significant class imbalance with 90% of examples containing signatures, and exhibits varying file sizes (85KB-626KB) suggesting diverse document types. The consistent structure and high-resolution nature of the scans make it suitable for supervised learning applications in automated document processing systems.
Using a randomly generated configuration for our grounded proposer.
Selected tip: persona
PROGRAM DESCRIPTION: Unable to provide description due to missing input code and example.
0%| | 0/76 [02:20<?, ?it/s]
0%| | 0/25 [01:31<?, ?it/s]
task_demos
[2024-12-18T09:30:27.124743]
System message:
Your input fields are:
1. `dataset_description` (str): A description of the dataset that we are using.
2. `program_code` (str): Language model program designed to solve a particular task.
3. `program_description` (str): Summary of the task the program is designed to solve, and how it goes about solving it.
4. `module` (str): The module to create an instruction for.
5. `task_demos` (str): Example inputs/outputs of our module.
6. `basic_instruction` (str): Basic instruction.
7. `tip` (str): A suggestion for how to go about generating the new instruction.
Your output fields are:
1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## dataset_description ## ]]
{dataset_description}
[[ ## program_code ## ]]
{program_code}
[[ ## program_description ## ]]
{program_description}
[[ ## module ## ]]
{module}
[[ ## task_demos ## ]]
{task_demos}
[[ ## basic_instruction ## ]]
{basic_instruction}
[[ ## tip ## ]]
{tip}
[[ ## proposed_instruction ## ]]
{proposed_instruction}
[[ ## completed ## ]]
In adhering to this structure, your objective is:
Use the information below to learn about a task that we are trying to solve using calls to an LM, then generate a new instruction that will be used to prompt a Language Model to better solve the task.
User message:
[[ ## dataset_description ## ]]
This is a curated dataset of high-resolution document scans designed for training signature detection models, with images encoded in base64 format and binary classification labels. The dataset shows a significant class imbalance with 90% of examples containing signatures, and exhibits varying file sizes (85KB-626KB) suggesting diverse document types. The consistent structure and high-resolution nature of the scans make it suitable for supervised learning applications in automated document processing systems.
[[ ## program_code ## ]]
[[ ## program_description ## ]]
Unable to provide description due to missing input code and example.
[[ ## module ## ]]
Predict(image) -> signatures_found
[[ ## task_demos ## ]]
[[ ## basic_instruction ## ]]
Please find signatures in this image.
[[ ## tip ## ]]
Include a persona that is relevant to the task in the instruction (ie. "You are a ...")
Respond with the corresponding output fields, starting with the field `[[ ## proposed_instruction ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
Response:
[[ ## proposed_instruction ## ]]
You are a professional document examiner with expertise in signature verification and detection. Please carefully analyze this document image and identify any handwritten signatures present. Pay special attention to the bottom sections of the document where signatures typically appear, as well as margins and designated signature lines. Indicate whether you find any signatures in the image, considering both cursive and printed signature styles.
[[ ## completed ## ]]
PROPOSED INSTRUCTION: Analyze this document image carefully and identify any handwritten signatures present. A signature typically appears as a personalized, stylized handwriting that serves as a unique identifier. Pay special attention to:
1. The bottom sections of the document where signatures are commonly placed
2. Areas near printed text that might require authorization
3. Margins and designated signature lines
4. Both cursive and printed-style signatures
5. Signatures in any color (black, blue, or other inks)
Please indicate whether you find any signatures in the image (yes/no). Consider both clear, well-defined signatures and partial or less distinct signature marks. Ignore other handwritten text or markings that don't appear to be signatures.
Using a randomly generated configuration for our grounded proposer.
Selected tip: persona
Error getting program description. Running without program aware proposer.
task_demos Image: url='data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABqQAAAiYCAIAAAA+NVHkAAEAAElEQVR4nOzdd5xU1d0/8HPb9LKzbbb3Qlu6FEGaGLDxiJJYUMQWY6KIXRS7UcSo5NGoRI3lhwaxoaBIR3ovyy7be5ve+22/P47cZ5wtLLAg5ft+vUx279y599wyw85nvuccQhRFBAAAAAAAAAAAAAAAOP............
(I believe the base64 string above takes up most of the context window. Compilation eventually fails with a context-window-exceeded error.)
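One possible mitigation, sketched here only as an illustration, is to replace base64 payloads with a short placeholder before a demo is rendered into a text field such as task_demos. The function name redact_data_uris is hypothetical and is not part of DSPy; the placeholder format mirrors the <IMAGE_BASE_64_ENCODED(n)> form that appears in the dataset dump above. Where such a step would hook into MIPROv2 is not shown and is not something this issue establishes.

```python
import re

# Matches a data URI with a base64 payload, e.g. data:image/png;base64,iVBORw0...
DATA_URI_RE = re.compile(
    r"data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<payload>[A-Za-z0-9+/=]+)"
)


def redact_data_uris(text: str) -> str:
    """Replace each base64 payload with a placeholder that records its length.

    Hypothetical helper for illustration; not a DSPy API.
    """
    def _placeholder(match: re.Match) -> str:
        mime = match.group("mime")
        size = len(match.group("payload"))
        return f"data:{mime};base64,<IMAGE_BASE_64_ENCODED({size})>"

    return DATA_URI_RE.sub(_placeholder, text)


if __name__ == "__main__":
    # Synthetic stand-in for a stringified demo containing an image.
    demo = "Image: url='data:image/png;base64," + "A" * 1000 + "'"
    print(redact_data_uris(demo))
```

The redacted string keeps the MIME type and the payload size, so a text-only prompt still records that an image was present without carrying its bytes.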
Feature request (while I'm here)
It would be cool to have a shorthand string notation for this, such as image -> answer.
okhat changed the title from "Using MIPROv2 max_bootstrapped_demos > 0 with vlm causes context to blow up" to "Using MIPROv2 max_bootstrapped_demos > 0 with dspy.Image causes context to blow up" on Dec 18, 2024