Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored report related functionalities #3885

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

user12986714
Copy link
Contributor

Refactored report related functionalities to reduce redundancy in code. Also implemented: issue #3712 and issue #2795 .

Refactored report related functionalities to reduce redundancy in code. Also implemented: issue Charcoal-SE#3712 and issue Charcoal-SE#2795 .
@user12986714 user12986714 force-pushed the patch-21 branch 8 times, most recently from 6cac5fb to c8e4fa4 Compare May 26, 2020 18:50
@makyen
Copy link
Contributor

makyen commented May 26, 2020

I need to look at this more closely, but wanted to pose a general question:

The prior organization was that there was a function report_posts() which took a list of post URLs, along with some state/control info, and then did everything needed to report those posts.

The report function was a wrapper which decoded the !!/report, !!/scan, etc. and did some validation.

The allspam() function figured out which posts (i.e. the URLs) were owned by the user, potentially on multiple sites, and did a bit of validation. Once it had a list of URLs, it could have just passed them all to report_posts() as a single call and let the list be handled, just like any other set of URLs. It didn't need to fetch the data or construct a post record, it just needed the URLs.

So far, this appears to be heading in the direction of still requiring allspam to collect all that information. Why do that instead of just passing a list of URLs from both report and allspam to report_posts?

@user12986714
Copy link
Contributor Author

(@makyen ) because allspam does not get a list of urls. It first get a list of accounts a user has, and for each account, it get a list of post_data directly, rather than urls of them. Although it is possible to extract url from post_data, this is a waste of api requests.

@makyen
Copy link
Contributor

makyen commented May 26, 2020

Yes, and no. It still makes a single request for each answer, which are a significant percentage of the posts reported. Admittedly, it could reduce that by making one request for all the answers by the user. But, frankly, the reduction in duplicated code seems worth what is small number of redundant API requests.

@makyen
Copy link
Contributor

makyen commented May 26, 2020

Hmmm... It may be that I'm reacting to the fact that we're manually constructing the post record in at least three different files, without any actual coordination between them and the only documentation as to what it is to contain held in apigetpost.py or by looking at the filters for the SE API calls.

If you want to retain getting most/all of the post data from the call to the users/{}/posts endpoint, which isn't a bad choice, then it's better to refactor it to be in apigetpost.py, or some other place that encapsulates generating a Post record.

@user12986714
Copy link
Contributor Author

user12986714 commented May 26, 2020

Redundant code removed. Hopefully we won't have more than 10 allspam commands a day, and for those there won't be more than 10 posts each. Hence the trade off in API requests is quite reasonable.

@makyen
Copy link
Contributor

makyen commented May 26, 2020

Well, if we're really concerned about optimizing API requests here, then we could implement something like get_posts_for_user in apigetpost.py, or it's successor. We could then also consolidate getting the data for answers down to a single request per user/site/100 posts. We could also have a get_posts_for_urls, where we take a list of URLs and optimize API requests by sorting by SE site and grouping by questions and answers.

Overall, there have only been on the order of 550 !!/allspam or !!/reportuser commands executed over the last 4.5 years, which is << 10/day. :)

Thanks for taking care of it. I'm sorry if my comments came off a bit negative. I allowed my frustration with things which are not your responsibility to leak through, which I shouldn't have permitted.

@user12986714
Copy link
Contributor Author

Stuff to add to refdoc.md:

# File chatcommands.py  
## Chat commands  
Chat commands are funtions in chatcommands.py that are under @command decorator. Those not are assumed private to this file unless otherwise noted.  
### Implementation  
#### Details  
- `report(msg, args, alias_used="report")`: Decide if user is not privileged and `alias_used` is not `"scan"`. If yes, raise `CmdException` with non-privileged warning.
Decide if user can report or scan now. If not, raise `CmdException` asking user to wait.
Split `args` into urls and the custom comment, and store the result in `argsraw`. Seperate urls into `urls`.
Generate `message_url` by substituting in host, room id and message id.
Try to extract the custom comment into `custom_reason`. If failed, set `custom_reason` to `None`.
If there are more than `5` urls in `urls`, raise `CmdException` indicating one can at most report `5` urls together.
Set `output` to `[]`.
For each `url` in `urls`, call `report_post()` with proper arguments, and store the return value into `current_output`;
if `current_output` is not empty, append it to `output`.
Decide if `output` is non empty.
If yes, decide if the number of urls is greater than both `1` and count of elements in `output`.
If yes, call `add_or_update_multiple_reporter()` with proper arguments.
Return elements of `output` joined by `"\n"`.
- `allspam(msg, url, alias_used="allspam")`: Decide if `alias_used` is in the set `{"report_user", "allspam"}`.
If yes, set operation to `"report"`. Otherwise set operation to `"report-force"`.
Set `custom_reason` to `None`.
Set `api_key` to a proper value.
Decide if user can issue allspam now. If not, raise `CmdException` asking user to wait.
Call `get_user_from_url()` with `url` as the argument, and store the return value into `user`.
Decide if `user` is `None`. If yes, raise `CmdException` indicating an invalid url.
Set `user_sites` and `user_post_urls` as `[]`.
Decide if the user profile is a network wide one. If yes, use proper methods to get a list of site specific profile and store into `user_sites`.
Otherwise append the specific site to `user_sites`.
For the user profile at each site, use proper API calls and extract a list of `post_data`, and raise `CmdException` if the user has reputation higher than `100`;
for each `post` in `post_data`, extract its url and append it to `user_post_urls`.
Decide if `user_post_urls` is empty. If yes, raise `CmdException` indicating the user has no post.
Decide if `user_post_urls` has more than `15` elements. If yes, raise `CmdException` indicating there are too many posts by the user.
Set `output` to `[]`.
For each `current_url` in `user_post_urls`, call `report_post()` with proper arguments, and store the return value into `current_output`;
if `current_output` is not empty, append it to `output`.
Decide if `user_post_urls` has more than `2` elements. If yes, call `add_or_update_multiple_reporter()` with proper arguments.

(to be continued)

@user12986714
Copy link
Contributor Author

user12986714 commented May 26, 2020

(continued)
Removed for formatting

@user12986714
Copy link
Contributor Author

user12986714 commented May 26, 2020

Note: it is better to review the overall file changes and to use the split view. git diff is doing a terrible job on this pr and is only making things more confusing (especially on individual commits).

chatcommands.py Show resolved Hide resolved
chatcommands.py Outdated
@@ -1752,6 +1752,16 @@ def allspam(msg, url, alias_used="allspam"):

def report_post(url, reported_by, reported_in=None, blacklist_by=None, operation="report", custom_reason=None):
""" Scan or report a single post """
# Arguments:
# - url: url to the post being reported
# - reported_by: user reporting the post
Copy link
Contributor

@makyen makyen Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is reported_by? A string? A dict? Similar information should be provided for the other arguments too. This should provide enough information such that someone could understand what they need to pass to the function, without them having to look through the function to figure out how it's used.

@user12986714
Copy link
Contributor Author

## Chat command utils  
Functions used by various chat commands. They are private to this file.
### Public interface
- report_post(url, reported_by, reported_in, blacklist_by, operation="report", custom_reason=None): report a post.
  - Arguments:
    - url: url to the post being reported (string)
    - reported_by: user reporting the post (msg.owner where msg is ChatExchange message object)
    - reported_in: room in which the post is reported (None, True if from MS API or integer that is the room id)
    - blacklist_by: the chat transcript url of the report chat msg (None or string)
    - operation: Operation to perform (string, must be in {"scan", "scan-force", "report", "report-force", "report-direct"})
    - custom_reason: a custom reason why to report the post (None or string)
### Implementation
Chat command  are implemented independently per function. See individual documentation.
#### Details  
- `report_post(url, reported_by, reported_in=None, blacklist_by=None, operation="report", custom_reason=None)`: Private to the file.
Extract `reporter_name` from `reported_by.name`. Set `operation` to `"report"` if not have been set.
If `operation` is in set `{"scan-force", "report-force", "report-direct"}`, set `is_forced` to `True` and otherwise to `False`.
Use proper method to format `report_info` string with action performed, by whom, from where and the custom reason.
Generate shortlink with `url_to_shortlink()` from `url` and store into `shortlink`.
Decide if `shortlink` is empty. If yes, return an indication that an invalid url is provided.
Get post data from `shortlink` with `api_get_post()` and store into `post_data`.
Decide if `post_data` is `None`. If yes, return an indication that an invalid url is provided.
Decide if `post_data` is `False`. If yes, return an indication that no data is fetched and the post is likely deleted.
Read `post_data.post_url` into `abs_url`. Call `to_protocal_relative()` with argument `abs_url`, and storing the return value into `url`.
Decide if the post has already been reported recently. If yes, decide if metasmoke is integrated;
if yes, post `custom_reason` as a metasmoke comment, if present, and return an indication that the post is reported providing a link to the metasmoke report;
otherwise, return an indication that the post is reported.
Generate post from `post_data` with `Post()` and store into `post`. Extract the auther of the post from `post_data.owner_url` into `user`.
Decide if the post is an answer. If yes, use proper method to fetch its parent post.
Scan the post.
Decide if `operation` is in the set `{"report", "report-force", "report-direct"}`. If yes, blacklist the user.
Decide if the post is manually ignored, and `operation` is not `"scan"`. If yes, adjust the scan result accordingly.
Decide if the post is spam, and `operation` is not `"report-direct"`. If yes, handle the spam and post `custom_reason` as a metasmoke comment, if present.
Decide if `operation` is in the set `{"report", "report-force", "report-direct"}`. If yes, handle the spam using the manually reporting reason,
and post `custom_reason` as a metasmoke comment, if present.
Decide if the post is ignored. If yes, return an indication that the post is likely spam but manually ignored. Otherwise return an indication that the post is unlikely spam.

@makyen
Copy link
Contributor

makyen commented Jun 10, 2020

At least from my point of view, the only thing needed to get this PR merged is to have a similar level of detail (or more) in the docstring for report_post() as there was prior to this change. Basically, the docstring should state the type of each parameter or return value, so that anyone calling it will know if what they are passing to it is what is expected by the function.

@user12986714
Copy link
Contributor Author

Please ignore CircleCI failure. The only change is in comments and I can be almost certain that this is a network issue on CircleCI server.

# if the post wasn't previously reported for the cases where we want to process it
# as such.
# Expand real scan results from dirty returm value when not "!!/scan"
# Presence of "scan_why" indicates the post IS spam but ignored
Copy link
Contributor

@makyen makyen Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please return these comments to being in the code.

The code performs a unintuitive manipulation of the data here, because what's provided by check_if_spam()'s return value has a couple of different formats, which are really not very clear. I found it frustrating to have to go figure out what the various different posibilities were which it could return in order to have the code do what it's supposed to here. I added the above comment so that someone else doesn't need to repeat that same work.

@makyen
Copy link
Contributor

makyen commented Jul 1, 2020

First, sorry for the delay in getting back to this. I finally did more testing (starting here). As you can see for the !!/allspam, the report reason transitioned from "Manually reported question" to "Blacklisted user". It would be preferable that the reporting reason remains "Manually reported question" for all of the reports. The code that's calling report_post in allspam is either going to need to be able to know if the user was/was not blacklisted prior to starting, and then telling report_post to ignore the blacklisting on all the to be reported posts, or it's going to need to be able to tell report_post not to blacklist the user until the last one that it is reporting on this SE site.

In addition, it was a helpful feature that !!/allspam indicates that the reports are part of a sequence and the post's number within that sequence. It would be nice to have a way for that to be in the chat message/report. This could be accomplished by allowing the call to report_post to have a parameter which is a generic additional portion of text which is included in the report/chat message. Alternately, a count and total could be passed to report_post, but that seems like it's making report_post more specialized, rather than the generic "include this text" and allowing the calling function to figure out what that text should be.

@stale
Copy link

stale bot commented Aug 3, 2020

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

@stale stale bot closed this Aug 3, 2020
@makyen makyen added the status: confirmed Confirmed as something that needs working on. label Aug 3, 2020
@makyen makyen reopened this Aug 3, 2020
@stale stale bot removed the status: stale label Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: confirmed Confirmed as something that needs working on.
Development

Successfully merging this pull request may close these issues.

2 participants