`--xl/--extract-last` option #717

simonw · 2025-01-24T18:25:19Z

Currently the -x/--extract option returns the first fenced code block. Dirk Loss:

Why the first code block and not the last? Don't some models self-correct their output sometimes?

The text was updated successfully, but these errors were encountered:

simonw · 2025-01-24T18:26:44Z

Will complete this in the PR:

--xl/--extract-last #718

simonw · 2025-01-24T18:28:23Z

As an experiment I asked o1 to review the diff for me:

git diff | llm -m o1 'did I miss anything in adding this feature?'

2025-01-24T17:54:26 conversation: 01jjcq8q9ety0wxa6scdcmgz2a

Model: o1

Prompt:

git diff

diff --git a/llm/cli.py b/llm/cli.py
index 1bde74b..606ff3b 100644
--- a/llm/cli.py
+++ b/llm/cli.py
@@ -37,7 +37,7 @@ from .utils import (
     mimetype_from_path,
     mimetype_from_string,
     token_usage_string,
-    extract_first_fenced_code_block,
+    extract_fenced_code_block,
 )
 import base64
 import httpx
@@ -210,6 +210,13 @@ def cli():
 @click.option("async_", "--async", is_flag=True, help="Run prompt asynchronously")
 @click.option("-u", "--usage", is_flag=True, help="Show token usage")
 @click.option("-x", "--extract", is_flag=True, help="Extract first fenced code block")
+@click.option(
+    "extract_last",
+    "--xl",
+    "--extract-last",
+    is_flag=True,
+    help="Extract last fenced code block",
+)
 def prompt(
     prompt,
     system,
@@ -229,6 +236,7 @@ def prompt(
     async_,
     usage,
     extract,
+    extract_last,
 ):
     """
     Execute a prompt
@@ -318,6 +326,8 @@ def prompt(
             to_save["defaults"] = dict(param)
         if extract:
             to_save["extract"] = True
+        if extract_last:
+            to_save["extract_last"] = True
         path.write_text(
             yaml.dump(
                 to_save,
@@ -335,6 +345,7 @@ def prompt(
             raise click.ClickException("Cannot use -t/--template and --system together")
         template_obj = load_template(template)
         extract = template_obj.extract
+        extract_last = template_obj.extract_last
         prompt = read_prompt()
         try:
             prompt, system = template_obj.evaluate(prompt, params)
@@ -343,7 +354,7 @@ def prompt(
         if model_id is None and template_obj.model:
             model_id = template_obj.model
 
-    if extract:
+    if extract or extract_last:
         no_stream = True
 
     conversation = None
@@ -428,7 +439,9 @@ def prompt(
                     )
                     text = await response.text()
                     if extract:
-                        text = extract_first_fenced_code_block(text) or text
+                        text = extract_fenced_code_block(text) or text
+                    if extract_last:
+                        text = extract_fenced_code_block(text, last=True) or text
                     print(text)
                 return response
 
@@ -448,7 +461,9 @@ def prompt(
             else:
                 text = response.text()
                 if extract:
-                    text = extract_first_fenced_code_block(text) or text
+                    text = extract_fenced_code_block(text) or text
+                if extract_last:
+                    text = extract_fenced_code_block(text, last=True) or text
                 print(text)
     # List of exceptions that should never be raised in pytest:
     except (ValueError, NotImplementedError) as ex:
@@ -862,6 +877,13 @@ order by prompt_attachments."order"
 @click.option("-u", "--usage", is_flag=True, help="Include token usage")
 @click.option("-r", "--response", is_flag=True, help="Just output the last response")
 @click.option("-x", "--extract", is_flag=True, help="Extract first fenced code block")
+@click.option(
+    "extract_last",
+    "--xl",
+    "--extract-last",
+    is_flag=True,
+    help="Extract last fenced code block",
+)
 @click.option(
     "current_conversation",
     "-c",
@@ -891,6 +913,7 @@ def logs_list(
     usage,
     response,
     extract,
+    extract_last,
     current_conversation,
     conversation_id,
     json_output,
@@ -996,10 +1019,10 @@ def logs_list(
                 for attachment in attachments_by_id.get(row["id"], [])
             ]
         output = json.dumps(list(rows), indent=2)
-    elif extract:
+    elif extract or extract_last:
         # Extract and return first code block
         for row in rows:
-            output = extract_first_fenced_code_block(row["response"])
+            output = extract_fenced_code_block(row["response"], last=extract_last)
             if output is not None:
                 break
     elif response:
diff --git a/llm/templates.py b/llm/templates.py
index b540fad..0cf1616 100644
--- a/llm/templates.py
+++ b/llm/templates.py
@@ -9,8 +9,9 @@ class Template(BaseModel):
     system: Optional[str] = None
     model: Optional[str] = None
     defaults: Optional[Dict[str, Any]] = None
-    # Should first fenced code block be extracted?
+    # Should a fenced code block be extracted?
     extract: Optional[bool] = None
+    extract_last: Optional[bool] = None
 
     class Config:
         extra = "forbid"
diff --git a/llm/utils.py b/llm/utils.py
index a4f57a0..b58d42a 100644
--- a/llm/utils.py
+++ b/llm/utils.py
@@ -156,9 +156,9 @@ def token_usage_string(input_tokens, output_tokens, token_details) -> str:
     return ", ".join(bits)
 
 
-def extract_first_fenced_code_block(text: str) -> Optional[str]:
+def extract_fenced_code_block(text: str, last: bool = False) -> Optional[str]:
     """
-    Extracts and returns the first Markdown fenced code block found in the given text.
+    Extracts and returns Markdown fenced code block found in the given text.
 
     The function handles fenced code blocks that:
     - Use at least three backticks (`).
@@ -169,6 +169,7 @@ def extract_first_fenced_code_block(text: str) -> Optional[str]:
 
     Args:
         text (str): The input text to search for a fenced code block.
+        last (bool): Extract the last code block if True, otherwise the first.
 
     Returns:
         Optional[str]: The content of the first fenced code block, or None if not found.
@@ -185,7 +186,8 @@ def extract_first_fenced_code_block(text: str) -> Optional[str]:
         r"""(?m)^(?P<fence>`{3,})(?P<lang>\w+)?\n(?P<code>.*?)^(?P=fence)(?=\n|$)""",
         re.DOTALL,
     )
-    match = pattern.search(text)
-    if match:
+    matches = list(pattern.finditer(text))
+    if matches:
+        match = matches[-1] if last else matches[0]
         return match.group("code")
     return None

did I miss anything in adding this feature?

Response:

It looks like you covered almost everything needed for the “extract last code block” feature—both the CLI wiring and the actual extraction logic. Here are a few small details worth double-checking or clarifying:

Docstring Consistency
• In utils.py, the docstring for extract_fenced_code_block still refers mostly to extracting the “first” fenced code block. It would be good to adjust that text so it clearly mentions extracting either the first or the last block, depending on the last parameter.
Handling of Both --extract and --extract-last
• If a user passes both --extract and --extract-last together, the code first extracts the first block, then immediately extracts the last block from what is left. You might want to clarify whether that’s the intended behavior or if only one flag should ever be “active” at a time.
Potential Variation in Code Fences
• Right now, you only match triple-backtick fences (), and only if the language is attached directly (e.g. python). If you need to handle either tildes (~~~) or a space between the backticks and the language name (``` python), you’ll need a slightly more flexible regex. This may or may not matter for your use case, but it’s something to be aware of.
Documentation / Help Text
• You’ve added the new --xl/--extract-last flag, but if there’s a README or other documentation, you’ll want to mention it there too.

Other than those points, the patch looks consistent: you’ve added a new Template field (extract_last), updated both prompt and logs_list commands to take the new flag and store it, and updated the regex function to handle first-vs-last extraction. Those are the main moving parts, so you should be all set once you pin down the small details above.

simonw · 2025-01-24T18:29:14Z

That cost 1,504 input, 2,250 output, {"completion_tokens_details": {"reasoning_tokens": 1856}} - 15.756 cents.

simonw · 2025-01-24T18:31:35Z

I'll make it so --extract --extract-list is treated the same as --extract-last.

Refs #717 (comment)

simonw · 2025-01-24T18:54:54Z

Updated help and docs:

Refs #717, #728

simonw added the enhancement New feature or request label Jan 24, 2025

simonw added a commit that referenced this issue Jan 24, 2025

Initial implementation of --xl/--extract-last, refs #717

1b2381a

simonw mentioned this issue Jan 24, 2025

--xl/--extract-last #718

Merged

simonw added a commit that referenced this issue Jan 24, 2025

Updated docstring

04caf20

Refs #717 (comment)

simonw added a commit that referenced this issue Jan 24, 2025

--xl/--extract-last in docs, refs #717

256027f

simonw linked a pull request Jan 24, 2025 that will close this issue

--xl/--extract-last #718

Merged

simonw closed this as completed in #718 Jan 24, 2025

simonw closed this as completed in 656d8fa Jan 24, 2025

simonw added a commit that referenced this issue Jan 31, 2025

Release 0.21

f8dcc67

Refs #717, #728

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`--xl/--extract-last` option #717

`--xl/--extract-last` option #717

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025 •

edited

Loading

--xl/--extract-last option #717

--xl/--extract-last option #717

Comments

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

2025-01-24T17:54:26 conversation: 01jjcq8q9ety0wxa6scdcmgz2a

Prompt:

Response:

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025

simonw commented Jan 24, 2025 • edited Loading

`--xl/--extract-last` option #717

`--xl/--extract-last` option #717

simonw commented Jan 24, 2025 •

edited

Loading