Describe the feature or potential improvement
Sometimes the best scoring tool is human comparison, so being able to inspect what your model generated side by side with the expected output, or with the output from another run, would be helpful. As of now, you can only see previews side by side, without any markdown support.
I think this naturally extends to being able to score each generation of each run in the same view as well.
Scenario: you have an LLM-based system that creates Telegram posts by transforming a large input into a summarized post. You are using Langfuse to trace the system, but you also want to evaluate it and see how the post would change if you used another version of your prompt.
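Roughly, the tracing side of that scenario might look like the sketch below (a minimal example assuming the v2-style Langfuse Python SDK; the trace name, the prompt-version labels, and the `summarize_to_post` helper are hypothetical placeholders):
```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* environment variables for credentials

def summarize_to_post(article: str, prompt_version: str) -> str:
    # Stand-in for the real LLM call that turns the big input into a post.
    return f"[post generated with {prompt_version}]"

article = "long source text ..."
trace_ids = {}

# Run the same input through both prompt versions so the two generations
# can later be inspected side by side.
for version in ("prompt-v1", "prompt-v2"):
    trace = langfuse.trace(name="telegram-post", metadata={"prompt_version": version})
    post = summarize_to_post(article, version)
    trace.generation(name="summarize", input=article, output=post)
    trace_ids[version] = trace.id

# Once a human has compared the two posts, the preferred run could be
# marked with a score so the comparison is queryable later.
langfuse.score(trace_id=trace_ids["prompt-v2"], name="human-preference", value=1)

langfuse.flush()  # send buffered events before the script exits
```
The missing piece is the UI: rendering both outputs side by side with markdown support, and letting the reviewer attach that score from the same comparison view.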
Additional information
No response
-
Hey @pfurovYnP, thanks for your feedback! Very helpful. I'll address your points one-by-one: