[ENH] - Limit displayed document pills in header #224

Open
peachkeel opened this issue Dec 2, 2023 · 5 comments

@peachkeel
Contributor

Feature description

When there are a lot of documents underlying a chat, the chat header becomes unwieldy. It might be better to display the total number of documents underlying the chat in the header than all of their filenames.

[image: header]

Value and/or benefit

Removing the chat_document_pills greatly improves client-side performance when dealing with large corpora.

Anything else?

I've had to comment out the following lines of code to make my system performant:

if (
    self.current_chat is not None
    and "metadata" in self.current_chat
    and "documents" in self.current_chat["metadata"]
):
    doc_names = [d["name"] for d in self.current_chat["metadata"]["documents"]]
    for doc_name in doc_names:
        pill = pn.pane.HTML(
            f"""<div class="chat_document_pill">{doc_name}</div>""",
            stylesheets=[
                """
                :host {
                    background-color: rgb(241,241,241);
                    margin-top: 15px;
                    margin-left: 5px;
                    margin-right: 5px;
                    padding: 5px 15px;
                    border-radius: 10px;
                    color:var(--accent-color);
                }
                """
            ],
        )
        chat_documents_pills.append(pill)

@peachkeel peachkeel added the type: enhancement 💅 New feature or request label Dec 2, 2023
@pmeier
Member

pmeier commented Dec 4, 2023

> Remove chat_document_pills from header

That is not going to happen. Having the documents used for the chat visible by default is intended. Instead, what we should do in this case is truncate the list of visible documents. The full list is still visible when clicking the chat info button.

> When there are a lot of documents underlying a chat, the chat header becomes unwieldy. It might be better to display the total number of documents underlying the chat in the header than all of their filenames.

This is specific to your use case of having the whole corpus of documents active at once. In that case I would even say the number of documents shouldn't be displayed at all, as it provides no value to the user.

It becomes more and more clear that we need to support this use case in general. Will open an issue about this soon.

[image]

It seems that you are using .doc documents converted to .txt in your corpus. Would it help if Ragna supported .doc / .docx out of the box? Any other formats that are needed? We have a long list in #202 (reply in thread) although I'm against adding support for everything listed there.

@pmeier pmeier changed the title [ENH] - Remove chat_document_pills from header [ENH] - Limit displayed document pills in header Dec 4, 2023
@peachkeel
Contributor Author

peachkeel commented Dec 4, 2023

Your take on the situation sounds reasonable. My colleague, @Tengal-Teemo, plans to post some of his insights into the performance of Ragna's UI in the discussion section sometime this week. Those insights might be helpful in making sure the UI stays responsive across a variety of conditions.

As far as data connectors are concerned, .doc support would be great in general. For our specific use-cases, though, most of the document preprocessing is already standardized and done. Thus, we're mainly using Ragna to help prototype and do discovery in the middle of these preexisting workflows. Honestly, ingesting text in JSONL format would probably be a nice feature from our perspective:

{"text": "..."}
{"text": "..."}
{"text": "..."}

See: https://github.com/leogao2/lm_dataformat
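
For readers unfamiliar with the format, here is a minimal sketch of reading such a file. It assumes only what the example above shows: one JSON object per line, each with a "text" key. The function name iter_jsonl_texts is hypothetical and is not part of Ragna or lm_dataformat.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" field of every non-empty JSONL line.

    Hypothetical helper for illustration; not an existing Ragna API.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            # Tolerate blank lines, e.g. a trailing newline at end of file.
            continue
        record = json.loads(line)
        if not isinstance(record, dict) or "text" not in record:
            raise ValueError(
                f"line {line_number}: expected a JSON object with a 'text' key"
            )
        yield record["text"]
```

Because an open file is itself an iterable of lines, this could be used as `with open(path, encoding="utf-8") as f: texts = list(iter_jsonl_texts(f))`, and each extracted text could then be treated as its own document.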

@pmeier
Member

pmeier commented Dec 4, 2023

> post some of his insights into the performance of Ragna's UI in the discussion section sometime this week. Those insights might be helpful in making sure the UI stays responsive across a variety of conditions.

Thanks a ton 🚀

> As far as data connectors are concerned, .doc support would be great in general.

I've opened #225.

> ingesting text in JSONL format would probably be a nice feature from our perspective

Could you open an issue for that? I'm not familiar with the format.

@pmeier
Member

pmeier commented Dec 4, 2023

We also need some handling for the popup:

[image]

Here we shouldn't truncate, but rather provide a scrollable view.
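
One possible shape for this, as a rough sketch rather than the actual popup code: render the full list inside a container with a fixed maximum height and overflow-y: auto, so nothing is dropped and long lists scroll. The function name scrollable_document_list and the 200px default are assumptions made for illustration. Escaping the names is also an addition here; the snippet in the issue description interpolates them directly.

```python
from html import escape
from typing import Sequence


def scrollable_document_list(doc_names: Sequence[str], max_height_px: int = 200) -> str:
    """Build an HTML list that scrolls instead of growing without bound.

    Hypothetical helper for illustration; not the existing Ragna popup code.
    """
    # Escape each name so characters such as "<" or "&" cannot break the markup.
    items = "".join(f"<li>{escape(name)}</li>" for name in doc_names)
    return (
        f'<ul style="max-height: {max_height_px}px; overflow-y: auto; margin: 0;">'
        f"{items}</ul>"
    )
```

The resulting string could be handed to an HTML pane such as the pn.pane.HTML used in the snippet from the issue description.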

@pmeier
Member

pmeier commented Dec 6, 2023

I've added a hard limit of 20 visible documents in #235. This doesn't solve any of the graphics issues raised here, but at least prevents performance hits when one is using a large number of documents.
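
To illustrate the general idea of capping the pills, here is a minimal sketch. It is not the code from #235. The names visible_document_names and header_labels and the "+N more" summary label are assumptions made for this example.

```python
from __future__ import annotations

from typing import Sequence


def visible_document_names(
    doc_names: Sequence[str], limit: int = 20
) -> tuple[list[str], int]:
    """Return the names to show as pills and the count of hidden documents.

    Hypothetical helper for illustration; not the implementation in #235.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    shown = list(doc_names[:limit])
    hidden = len(doc_names) - len(shown)
    return shown, hidden


def header_labels(doc_names: Sequence[str], limit: int = 20) -> list[str]:
    """Labels for the header: at most `limit` names plus an optional summary."""
    shown, hidden = visible_document_names(doc_names, limit)
    if hidden:
        # A single summary pill keeps the header size bounded.
        shown.append(f"+{hidden} more")
    return shown
```

With a cap like this, the number of pills created for the header is bounded by limit + 1 regardless of corpus size, which is the property that matters for client-side performance.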
