[WIP] Unify case sensitive topic names #733

Open: wants to merge 8 commits into base: main

Conversation

preetmishra
Member

@preetmishra preetmishra commented Jul 23, 2020

This unifies case sensitive topic names using lowercase topic names as an invariant.

Crux

We need some kind of invariant that we can rely upon for lookups and comparisons, given that topics can change their casing at any point. Consequently, I propose to use lowercase topic names as keys wherever we store or index data.
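As an illustration of the proposal above, here is a minimal sketch of an index keyed by lowercase topic names. The helper name `index_topics` and the message shape are assumptions made for this example, not code from the PR; only the `subject` and `id` fields mirror what the diff excerpt later in this thread uses.

```python
from collections import defaultdict
from typing import Dict, List, Set


def index_topics(messages: List[dict]) -> Dict[str, Set[int]]:
    """Group message ids by lowercased topic name.

    Hypothetical helper: the key is always topic.lower(), so lookups do
    not depend on the casing a particular message happened to use.
    """
    topics: Dict[str, Set[int]] = defaultdict(set)
    for msg in messages:
        topics[msg['subject'].lower()].add(msg['id'])
    return topics


messages = [
    {'id': 1, 'subject': 'case'},
    {'id': 2, 'subject': 'CASE'},
    {'id': 3, 'subject': 'other'},
]
index = index_topics(messages)
# Both casings of the same topic land under the single key 'case'.
```

Any lookup into such an index would need to lowercase the query topic in the same way, which is the invariant the proposal relies on.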

Commits

The current commit structure is temporary. I have fixed one thing per commit to represent how I went about the changes but we would definitely want to squash them (except the first) before merging.

I would greatly appreciate feedback about the proposal and what else should be fixed.

Collaborator

@neiljp neiljp left a comment

@preetmishra This seems like a good first step (and the refactoring exposes a potential edge case bug?), but there are various points that we need to address, as demonstrated by manual testing. This is not a complete list, but for example:

  • If a user sends to case and CASE then two separate unread counts appear in the topic list, and only one appears in the message list if you're in that topic narrow
  • I have an existing instance which triggered this where there are unreads in topics with two 'different' names (by case), both with unreads on czo, but your code is not combining them.
  • Editing a message topic which matches by case causes it to disappear, but not appear in a new/different topic

Essentially, while fetching messages goes through index_messages, there are other situations we need to consider - anywhere where we compare topics, which is potentially a lot of different places.
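To make the first two points concrete, one possible shape for combined unread counts is sketched below. The function `count_unreads` and the `(stream_id, topic)` input pairs are hypothetical, chosen only for illustration; the actual unread structures in the model may differ.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_unreads(unread: Iterable[Tuple[int, str]]) -> Counter:
    """Count unread messages per (stream_id, lowercase topic).

    Hypothetical sketch: case variants of a topic within one stream
    share a single counter entry instead of appearing separately.
    """
    counts: Counter = Counter()
    for stream_id, topic in unread:
        counts[(stream_id, topic.lower())] += 1
    return counts


counts = count_unreads([(7, 'case'), (7, 'CASE'), (7, 'Case'), (9, 'case')])
# Stream 7 has one combined entry; stream 9 keeps its own.
```

The same normalization would have to be applied at every other comparison point (narrow checks, topic edits, muted topics), which is why the list of places to audit is potentially long.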

Perhaps fundamentally, if we get an update to a topic (e.g. an edited latest message, or a new message), should we change the topic of every message we have stored? Or just update the rendering?

topics_in_stream[msg['subject']] = set()
topics_in_stream[msg['subject']].add(msg['id'])
if msg['type'] == 'stream' and len(narrow) == 2:
    narrow_topic = narrow[1][1]
Collaborator

narrow_topic vs narrow_topics?

Member Author

The narrow can only have one topic, right?

zulipterminal/helper.py (resolved)
@neiljp neiljp requested a review from sumanthvrao July 23, 2020 16:25
@zulipbot zulipbot added the size: L [Automatic label added by zulipbot] label Jul 24, 2020
@preetmishra
Member Author

@neiljp Thanks for the review and the pointers! 👍

I have reworked the fundamental approach that I had to now store lowercase topic names for lookups and comparisons (see #733 (comment)). I have also addressed the three issues that you reported.

@preetmishra preetmishra changed the title Index case insensitive topic names by narrowed topic Unify case sensitive topic names Jul 25, 2020
zulipterminal/model.py (outdated, resolved)
@preetmishra preetmishra changed the title Unify case sensitive topic names [WIP] Unify case sensitive topic names Jul 27, 2020
@neiljp neiljp added this to the Release after upcoming milestone Jul 27, 2020
@zulipbot zulipbot added size: XL [Automatic label added by zulipbot] and removed size: L [Automatic label added by zulipbot] labels Jul 27, 2020
@preetmishra
Member Author

Updated with improved commits, more comments and test amendments (except those related to muted topics).

@preetmishra preetmishra changed the title [WIP] Unify case sensitive topic names Unify case sensitive topic names Jul 27, 2020
@preetmishra preetmishra added the PR needs review PR requires feedback to proceed label Jul 27, 2020
@preetmishra
Member Author

Updated to resolve conflicts.

@preetmishra preetmishra changed the title Unify case sensitive topic names [AWAITING] Unify case sensitive topic names Aug 6, 2020
@preetmishra preetmishra removed the PR needs review PR requires feedback to proceed label Aug 6, 2020
@preetmishra preetmishra changed the title [AWAITING] Unify case sensitive topic names [WIP] Unify case sensitive topic names Aug 14, 2020
This extracts msg_topic and narrow_topics as variables and amends the
conditional accordingly.
The intent is to use lowercase topic names, as an invariant, in the data
structures that we use locally to keep track of topics and their metadata
(e.g. unread count).

canonicalize_topic() and compare_lowercase() are added as helpers.

Tests amended.
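
The bodies and signatures of these two helpers are not shown in this conversation. The sketch below is only a guess at what helpers with these names might look like, assuming both operate on plain strings and rely on `str.lower()`:

```python
def canonicalize_topic(topic: str) -> str:
    """Return the form of a topic name used as a dictionary key.

    Assumed behavior: plain lowercasing, per the commit message.
    """
    return topic.lower()


def compare_lowercase(first: str, second: str) -> bool:
    """Return True if two topic names are equal ignoring case.

    Assumed signature: two strings in, one boolean out.
    """
    return first.lower() == second.lower()


same = compare_lowercase('Release Notes', 'release notes')
key = canonicalize_topic('Release Notes')
```

Keeping the normalization in one small helper means a later change of policy (for example, switching to `str.casefold()`) would only touch one place.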
@preetmishra preetmishra changed the title [WIP] Unify case sensitive topic names Unify case sensitive topic names Aug 21, 2020
@preetmishra preetmishra added the PR needs review PR requires feedback to proceed label Aug 21, 2020
@neiljp
Collaborator

neiljp commented Aug 30, 2020

@preetmishra This looks to cover a good number of comparison cases; this was blocked on another PR?

To clarify, this uses:

  • lower case in internal structures (that seems clearest)
  • "latest" topic names in display? (last-but-one commit references that?)
  • the 'real' case in what remaining places? (for display ^)

I think this will be clearer with the now-merged #675 and when the topic list updates with something like #785, as we should be able to test internally more easily.

This looks good, though I've not dug into all the cases so far. Pending further review, this seems reasonable - my concern is whether we might consider locally handling topics with ids, which may simplify this issue - ie. each topic_id in a stream and (stream_id, topic_id) would be unique, and so we can have a 'latest name' (for display) for each id, and the comparisons would all occur at the point where topic names are converted to ids.
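
A rough sketch of the topic-id idea, to show where the comparison would happen. The class `TopicRegistry` and its methods are hypothetical names invented for this example; nothing like it exists in the PR as described here.

```python
from typing import Dict, Tuple


class TopicRegistry:
    """Map topic names to per-stream ids (hypothetical sketch).

    Case-insensitive matching happens only in topic_id(); everything
    else can compare (stream_id, topic_id) pairs directly.
    """

    def __init__(self) -> None:
        # stream_id -> {lowercase topic name -> topic_id}
        self._ids: Dict[int, Dict[str, int]] = {}
        # (stream_id, topic_id) -> most recently seen casing
        self._latest_name: Dict[Tuple[int, int], str] = {}

    def topic_id(self, stream_id: int, topic_name: str) -> int:
        stream_topics = self._ids.setdefault(stream_id, {})
        # A new topic gets the next free id within its stream.
        topic_id = stream_topics.setdefault(
            topic_name.lower(), len(stream_topics)
        )
        # Remember the latest casing for display purposes.
        self._latest_name[(stream_id, topic_id)] = topic_name
        return topic_id

    def display_name(self, stream_id: int, topic_id: int) -> str:
        return self._latest_name[(stream_id, topic_id)]


registry = TopicRegistry()
first = registry.topic_id(7, 'case')
second = registry.topic_id(7, 'CASE')
other = registry.topic_id(7, 'other')
```

Here `first` and `second` are the same id, and the display name for that id follows whichever casing was seen last, which matches the "latest name" behavior discussed above.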

The first commit seems like a separate cleanup, is that correct?

@neiljp neiljp removed the PR needs review PR requires feedback to proceed label Aug 30, 2020
@preetmishra preetmishra changed the title Unify case sensitive topic names [WIP] Unify case sensitive topic names Aug 30, 2020
@neiljp neiljp modified the milestones: 0.6.0, Release after next Jan 28, 2021
Base automatically changed from master to main January 30, 2021 20:30
@zulipbot
Member

Heads up @preetmishra, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/main branch and resolve your pull request's merge conflicts accordingly.

Labels
has conflicts size: XL [Automatic label added by zulipbot]
Projects
None yet
4 participants