Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't sort by sentiment in sentiment_by #13

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vspinu
Copy link
Contributor

@vspinu vspinu commented Apr 13, 2016

Usually grouping operators preserve the order of the groups. In this sense the behavior of sentiment_by is non-intuitive. Especially with data.table by would better have the same semantics as in data.table.

While sorting by sentiment is potentially useful in some contexts, there is virtually no advantage for it with current plotting functionality. Plotting by grouping values is probably more meaningful in majority of the cases.

If they so desire, users can always sort by sentiment with a simple setkey. Sorting by keys is more verbose and results in unnecessary groups names replication.

For these reasons I would propose to remove this "feature".

@trinker
Copy link
Owner

trinker commented May 7, 2016

Thanks for the PR and interest in sentimentr...

Still debating on this. I use sentimentr for analysis and usually care most about the order of the output. Just like with plots, tables shine when they are ordered by the variable of interest. I think you have a specific reason of why it's screwing something up for you but am unsure what that is. I would appreciate a stronger argument for changing the behavior. I'd be more inclined to add this as an argument but retain the default behavior.

@vspinu
Copy link
Contributor Author

vspinu commented May 8, 2016

I don't have a "stronger" argument than the one I have already said. If I do processing by a factor variable G, then I expect my output to be ordered by original order of data or by the order of the grouped factor. That's what all group_by processor do in R. The fact that the output is sorted by something else is unexpected unexpected. Groups are almost certainly have a meaningful order so, I think, preserving that order is desirable independently of the context.

In my case the order is given by time. I have thousands sentences which are grouped by messages which in turn are ordered in time. Sorting it by sentiment doesn't make much sense AFAIC.

I'd be more inclined to add this as an argument but retain the default behavior.

It's up to you to decide of course, but I think sorting is an 'extra" feature. It adds up to the default computation, so it should an extra optional option. Otherwise users would need to revert that sorting computation.

@trinker
Copy link
Owner

trinker commented Sep 14, 2016

I have been bitten by this twice this week. You're right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants