Feature specs discussion board for umBRELA #1

Open
UShivani3 opened this issue May 4, 2024 · 5 comments

Comments

@UShivani3
Member

I am starting this thread for feature-spec discussion for umBRELA. cc @lintool @ronakice.

Suggestions from my side:

  • A parameter specifying the number of samples to draw at inference time, with majority voting over the sampled labels to get the final result (see the sketch after this list).
  • Maybe add some additional instructions to the prompt to guide the LLM. We already have an option to specify a prompt file, though, so I'm not sure how useful this would be.
  • If the input dictionary already includes a relevance label, we can add a key to the output indicating the correctness of that label. This would allow verifying already-available relevance assessments.
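For the first suggestion, here is a rough sketch of the sampling-plus-voting idea. The judge_with_voting helper and num_samples parameter are hypothetical (nothing like this exists in umbrela yet), and it assumes judge() returns a list of judgment dicts, one per candidate:

from collections import Counter

# Hypothetical helper: run the judge several times and majority-vote
# the parsed relevance labels per candidate.
def judge_with_voting(judge, input_dict, num_samples=5):
    # One list of judgment dicts per sampling run.
    runs = [judge.judge(input_dict) for _ in range(num_samples)]
    voted = []
    # Group the runs by candidate position and take the most common label.
    for per_candidate in zip(*runs):
        labels = [j["judgment"] for j in per_candidate]
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted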
@ronakice
Member

ronakice commented May 5, 2024

@UShivani3 can you give some demo usage here so @lintool is aware of the exact state of the framework so far? A few snippets would help.

@UShivani3
Member Author

UShivani3 commented May 5, 2024

Yes, my bad!

Here are the snippets.

Setting up the model judge:

from umbrela.vicuna_judge import VicunaJudge

judge_vicuna = VicunaJudge("dl19-passage")

Passing qrel-passages for evaluations:

input_dict = {
    "query": {"text": "how long is life cycle of flea", "qid": "264014"},
    "candidates": [
        {
            "doc": {
                "segment": "The life cycle of a flea can last anywhere from 20 days to an entire year. It depends on how long the flea remains in the dormant stage (eggs, larvae, pupa). Outside influences, such as weather, affect the flea cycle. A female flea can lay around 20 to 25 eggs in one day."
            },
            "docid": "4834547",
            "score": 14.971799850463867,
        },
    ]
}

judgments = judge_vicuna.judge(input_dict)

Output format for each judgment:

judgment = {
    "model": model_name,
    "query": query,
    "passage": passage,
    "prompt": prompt,
    "prediction": model_response,
    "judgment": relevance_label_after_parsing_model_response,
}
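So downstream code can read the parsed label directly, e.g. (assuming judge() returns one such dict per candidate):

for j in judgments:
    print(j["query"], "->", j["judgment"])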

I have also added sample code using the OSLLMJudge class here: https://github.com/castorini/umbrela/blob/main/src/eval/test.py.

@ronakice
Member

ronakice commented May 6, 2024

@thakur-nandan can you give your thoughts on the design so far too?

@thakur-nandan
Member

Sure, thanks @UShivani3! Overall, I like the minimalistic code and easy-to-use repository design. Both prompts look good, and the installation instructions in the README are helpful.

One suggestion I have is to decouple the prompts from the LLM judge code. Otherwise this will get complicated in the future, as one would need to keep updating the base LLMJudge with every new prompt via branches like the one shown below:

if prompt_type:

How I think we can restructure the design:

  • PromptTemplate class: takes prompt_type, prompt_file, and fewshot_count as input and outputs any prompt we like (either bing or basic) for the query-passage pair.
  • LLMJudge class: takes the prompt from the PromptTemplate class as input and outputs the relevance judgment.
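A rough sketch of what this split could look like (all names and signatures here are hypothetical, just to illustrate the idea, not the actual umbrela API):

class PromptTemplate:
    """Builds a prompt for a query-passage pair, independent of any judge."""

    def __init__(self, prompt_type="basic", prompt_file=None, fewshot_count=0):
        self.prompt_type = prompt_type
        self.prompt_file = prompt_file
        # fewshot_count would control how many few-shot examples get prepended.
        self.fewshot_count = fewshot_count

    def build(self, query: str, passage: str) -> str:
        if self.prompt_file:
            with open(self.prompt_file) as f:
                template = f.read()
        else:
            # Placeholder for the built-in templates keyed by prompt_type
            # (bing or basic).
            template = "Query: {query}\nPassage: {passage}\nRelevance (0-3):"
        return template.format(query=query, passage=passage)


class LLMJudge:
    """Consumes a ready-made PromptTemplate; knows nothing about prompt internals."""

    def __init__(self, template: PromptTemplate):
        self.template = template

    def judge(self, query: str, passage: str) -> int:
        prompt = self.template.build(query, passage)
        response = self._call_model(prompt)
        return self._parse_label(response)

    def _call_model(self, prompt: str) -> str:
        raise NotImplementedError  # model-specific subclasses (e.g. a Vicuna judge)

    def _parse_label(self, response: str) -> int:
        raise NotImplementedError  # parse the relevance label from the model response

This way, new prompts only touch PromptTemplate, and new models only touch LLMJudge subclasses.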

@ronakice @UShivani3, I would be happy to hear your suggestions.

@thakur-nandan
Member

One more question, @UShivani3: what does the score field in input_dict signify? Is it a retrieval/reranking score?

Does it affect the LLMJudge response?
