Refactored report related functionalities #3885

user12986714 · 2020-05-26T17:07:52Z

Refactored report related functionalities to reduce redundancy in code. Also implemented: issue #3712 and issue #2795 .

Refactored report related functionalities to reduce redundancy in code. Also implemented: issue Charcoal-SE#3712 and issue Charcoal-SE#2795 .

makyen · 2020-05-26T19:05:32Z

I need to look at this more closely, but wanted to pose a general question:

The prior organization was that there was a function report_posts() which took a list of post URLs, along with some state/control info, and then did everything needed to report those posts.

The report function was a wrapper which decoded the !!/report, !!/scan, etc. and did some validation.

The allspam() function figured out which posts (i.e. the URLs) were owned by the user, potentially on multiple sites, and did a bit of validation. Once it had a list of URLs, it could have just passed them all to report_posts() as a single call and let the list be handled, just like any other set of URLs. It didn't need to fetch the data or construct a post record, it just needed the URLs.

So far, this appears to be heading in the direction of still requiring allspam to collect all that information. Why do that instead of just passing a list of URLs from both report and allspam to report_posts?

user12986714 · 2020-05-26T19:08:47Z

(@makyen ) because allspam does not get a list of urls. It first get a list of accounts a user has, and for each account, it get a list of post_data directly, rather than urls of them. Although it is possible to extract url from post_data, this is a waste of api requests.

makyen · 2020-05-26T19:15:26Z

Yes, and no. It still makes a single request for each answer, which are a significant percentage of the posts reported. Admittedly, it could reduce that by making one request for all the answers by the user. But, frankly, the reduction in duplicated code seems worth what is small number of redundant API requests.

makyen · 2020-05-26T19:42:49Z

Hmmm... It may be that I'm reacting to the fact that we're manually constructing the post record in at least three different files, without any actual coordination between them and the only documentation as to what it is to contain held in apigetpost.py or by looking at the filters for the SE API calls.

If you want to retain getting most/all of the post data from the call to the users/{}/posts endpoint, which isn't a bad choice, then it's better to refactor it to be in apigetpost.py, or some other place that encapsulates generating a Post record.

user12986714 · 2020-05-26T20:18:46Z

Redundant code removed. Hopefully we won't have more than 10 allspam commands a day, and for those there won't be more than 10 posts each. Hence the trade off in API requests is quite reasonable.

makyen · 2020-05-26T20:44:48Z

Well, if we're really concerned about optimizing API requests here, then we could implement something like get_posts_for_user in apigetpost.py, or it's successor. We could then also consolidate getting the data for answers down to a single request per user/site/100 posts. We could also have a get_posts_for_urls, where we take a list of URLs and optimize API requests by sorting by SE site and grouping by questions and answers.

Overall, there have only been on the order of 550 !!/allspam or !!/reportuser commands executed over the last 4.5 years, which is << 10/day. :)

Thanks for taking care of it. I'm sorry if my comments came off a bit negative. I allowed my frustration with things which are not your responsibility to leak through, which I shouldn't have permitted.

user12986714 · 2020-05-26T21:00:32Z

Stuff to add to refdoc.md:

# File chatcommands.py  
## Chat commands  
Chat commands are funtions in chatcommands.py that are under @command decorator. Those not are assumed private to this file unless otherwise noted.  
### Implementation  
#### Details  
- `report(msg, args, alias_used="report")`: Decide if user is not privileged and `alias_used` is not `"scan"`. If yes, raise `CmdException` with non-privileged warning.
Decide if user can report or scan now. If not, raise `CmdException` asking user to wait.
Split `args` into urls and the custom comment, and store the result in `argsraw`. Seperate urls into `urls`.
Generate `message_url` by substituting in host, room id and message id.
Try to extract the custom comment into `custom_reason`. If failed, set `custom_reason` to `None`.
If there are more than `5` urls in `urls`, raise `CmdException` indicating one can at most report `5` urls together.
Set `output` to `[]`.
For each `url` in `urls`, call `report_post()` with proper arguments, and store the return value into `current_output`;
if `current_output` is not empty, append it to `output`.
Decide if `output` is non empty.
If yes, decide if the number of urls is greater than both `1` and count of elements in `output`.
If yes, call `add_or_update_multiple_reporter()` with proper arguments.
Return elements of `output` joined by `"\n"`.
- `allspam(msg, url, alias_used="allspam")`: Decide if `alias_used` is in the set `{"report_user", "allspam"}`.
If yes, set operation to `"report"`. Otherwise set operation to `"report-force"`.
Set `custom_reason` to `None`.
Set `api_key` to a proper value.
Decide if user can issue allspam now. If not, raise `CmdException` asking user to wait.
Call `get_user_from_url()` with `url` as the argument, and store the return value into `user`.
Decide if `user` is `None`. If yes, raise `CmdException` indicating an invalid url.
Set `user_sites` and `user_post_urls` as `[]`.
Decide if the user profile is a network wide one. If yes, use proper methods to get a list of site specific profile and store into `user_sites`.
Otherwise append the specific site to `user_sites`.
For the user profile at each site, use proper API calls and extract a list of `post_data`, and raise `CmdException` if the user has reputation higher than `100`;
for each `post` in `post_data`, extract its url and append it to `user_post_urls`.
Decide if `user_post_urls` is empty. If yes, raise `CmdException` indicating the user has no post.
Decide if `user_post_urls` has more than `15` elements. If yes, raise `CmdException` indicating there are too many posts by the user.
Set `output` to `[]`.
For each `current_url` in `user_post_urls`, call `report_post()` with proper arguments, and store the return value into `current_output`;
if `current_output` is not empty, append it to `output`.
Decide if `user_post_urls` has more than `2` elements. If yes, call `add_or_update_multiple_reporter()` with proper arguments.

(to be continued)

user12986714 · 2020-05-26T21:27:16Z

(continued)
Removed for formatting

user12986714 · 2020-05-26T21:41:39Z

Note: it is better to review the overall file changes and to use the split view. `git diff` is doing a terrible job on this pr and is only making things more confusing (especially on individual commits).

chatcommands.py

makyen · 2020-06-08T02:02:44Z

chatcommands.py

@@ -1752,6 +1752,16 @@ def allspam(msg, url, alias_used="allspam"):

 def report_post(url, reported_by, reported_in=None, blacklist_by=None, operation="report", custom_reason=None):
    """ Scan or report a single post """
+    # Arguments:
+    # - url: url to the post being reported
+    # - reported_by: user reporting the post


What is reported_by? A string? A dict? Similar information should be provided for the other arguments too. This should provide enough information such that someone could understand what they need to pass to the function, without them having to look through the function to figure out how it's used.

user12986714 · 2020-06-08T14:56:15Z

## Chat command utils  
Functions used by various chat commands. They are private to this file.
### Public interface
- report_post(url, reported_by, reported_in, blacklist_by, operation="report", custom_reason=None): report a post.
  - Arguments:
    - url: url to the post being reported (string)
    - reported_by: user reporting the post (msg.owner where msg is ChatExchange message object)
    - reported_in: room in which the post is reported (None, True if from MS API or integer that is the room id)
    - blacklist_by: the chat transcript url of the report chat msg (None or string)
    - operation: Operation to perform (string, must be in {"scan", "scan-force", "report", "report-force", "report-direct"})
    - custom_reason: a custom reason why to report the post (None or string)
### Implementation
Chat command  are implemented independently per function. See individual documentation.
#### Details  
- `report_post(url, reported_by, reported_in=None, blacklist_by=None, operation="report", custom_reason=None)`: Private to the file.
Extract `reporter_name` from `reported_by.name`. Set `operation` to `"report"` if not have been set.
If `operation` is in set `{"scan-force", "report-force", "report-direct"}`, set `is_forced` to `True` and otherwise to `False`.
Use proper method to format `report_info` string with action performed, by whom, from where and the custom reason.
Generate shortlink with `url_to_shortlink()` from `url` and store into `shortlink`.
Decide if `shortlink` is empty. If yes, return an indication that an invalid url is provided.
Get post data from `shortlink` with `api_get_post()` and store into `post_data`.
Decide if `post_data` is `None`. If yes, return an indication that an invalid url is provided.
Decide if `post_data` is `False`. If yes, return an indication that no data is fetched and the post is likely deleted.
Read `post_data.post_url` into `abs_url`. Call `to_protocal_relative()` with argument `abs_url`, and storing the return value into `url`.
Decide if the post has already been reported recently. If yes, decide if metasmoke is integrated;
if yes, post `custom_reason` as a metasmoke comment, if present, and return an indication that the post is reported providing a link to the metasmoke report;
otherwise, return an indication that the post is reported.
Generate post from `post_data` with `Post()` and store into `post`. Extract the auther of the post from `post_data.owner_url` into `user`.
Decide if the post is an answer. If yes, use proper method to fetch its parent post.
Scan the post.
Decide if `operation` is in the set `{"report", "report-force", "report-direct"}`. If yes, blacklist the user.
Decide if the post is manually ignored, and `operation` is not `"scan"`. If yes, adjust the scan result accordingly.
Decide if the post is spam, and `operation` is not `"report-direct"`. If yes, handle the spam and post `custom_reason` as a metasmoke comment, if present.
Decide if `operation` is in the set `{"report", "report-force", "report-direct"}`. If yes, handle the spam using the manually reporting reason,
and post `custom_reason` as a metasmoke comment, if present.
Decide if the post is ignored. If yes, return an indication that the post is likely spam but manually ignored. Otherwise return an indication that the post is unlikely spam.

makyen · 2020-06-10T04:02:29Z

At least from my point of view, the only thing needed to get this PR merged is to have a similar level of detail (or more) in the docstring for report_post() as there was prior to this change. Basically, the docstring should state the type of each parameter or return value, so that anyone calling it will know if what they are passing to it is what is expected by the function.

user12986714 · 2020-06-10T19:08:33Z

Please ignore CircleCI failure. The only change is in comments and I can be almost certain that this is a network issue on CircleCI server.

makyen · 2020-07-01T23:21:55Z

chatcommands.py

-        # if the post wasn't previously reported for the cases where we want to process it
-        # as such.
-        # Expand real scan results from dirty returm value when not "!!/scan"
-        # Presence of "scan_why" indicates the post IS spam but ignored


Please return these comments to being in the code.

The code performs a unintuitive manipulation of the data here, because what's provided by check_if_spam()'s return value has a couple of different formats, which are really not very clear. I found it frustrating to have to go figure out what the various different posibilities were which it could return in order to have the code do what it's supposed to here. I added the above comment so that someone else doesn't need to repeat that same work.

makyen · 2020-07-01T23:34:00Z

First, sorry for the delay in getting back to this. I finally did more testing (starting here). As you can see for the !!/allspam, the report reason transitioned from "Manually reported question" to "Blacklisted user". It would be preferable that the reporting reason remains "Manually reported question" for all of the reports. The code that's calling report_post in allspam is either going to need to be able to know if the user was/was not blacklisted prior to starting, and then telling report_post to ignore the blacklisting on all the to be reported posts, or it's going to need to be able to tell report_post not to blacklist the user until the last one that it is reporting on this SE site.

In addition, it was a helpful feature that !!/allspam indicates that the reports are part of a sequence and the post's number within that sequence. It would be nice to have a way for that to be in the chat message/report. This could be accomplished by allowing the call to report_post to have a parameter which is a generic additional portion of text which is included in the report/chat message. Alternately, a count and total could be passed to report_post, but that seems like it's making report_post more specialized, rather than the generic "include this text" and allowing the calling function to figure out what that text should be.

Instead of the non-indicative "[Post](<url>): Error message" reply, use the explicit "Post <url>: Error message".

…spam()

stale · 2020-08-03T21:53:58Z

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

Refactored report related functionalities

1835965

Refactored report related functionalities to reduce redundancy in code. Also implemented: issue Charcoal-SE#3712 and issue Charcoal-SE#2795 .

user12986714 force-pushed the patch-21 branch 8 times, most recently from 6cac5fb to c8e4fa4 Compare May 26, 2020 18:50

user12986714 force-pushed the patch-21 branch from c8e4fa4 to 218f4b5 Compare May 26, 2020 19:06

Minor fixes

d37ce14

user12986714 force-pushed the patch-21 branch from 218f4b5 to d37ce14 Compare May 26, 2020 19:17

user12986714 added 2 commits May 26, 2020 16:07

Removed redundant code (code vs API request trade off)

8bdde2e

Minor fixes

9f38064

makyen reviewed Jun 8, 2020

View reviewed changes

chatcommands.py Show resolved Hide resolved

makyen reviewed Jun 8, 2020

View reviewed changes

Minor changes in chatcommands.py: Improved comments

dc315ea

user12986714 force-pushed the patch-21 branch from b8a8294 to dc315ea Compare June 10, 2020 13:55

makyen mentioned this pull request Jul 1, 2020

findspam.py: Add reason user unregistered or non-existent #4131

Closed

makyen reviewed Jul 1, 2020

View reviewed changes

user12986714 added 6 commits July 1, 2020 22:31

chatcommands.py: Removed confusing links

cec7cf4

Instead of the non-indicative "[Post](<url>): Error message" reply, use the explicit "Post <url>: Error message".

chatcommands.py: Do not blacklist the user until the last post in all…

fbc7d8f

…spam()

test_chatcommands.py: Adjust test cases accordingly

2385ceb

chatcommands.py: Add batch report prefix

23989d1

Use .owner.name instead of .owner

6fd56f8

Adjust test cases accordingly

d02c9ec

user12986714 force-pushed the patch-21 branch from 69c6d0b to d02c9ec Compare July 2, 2020 15:13

stale bot added the status: stale label Aug 2, 2020

stale bot closed this Aug 3, 2020

makyen added the status: confirmed Confirmed as something that needs working on. label Aug 3, 2020

makyen reopened this Aug 3, 2020

stale bot removed the status: stale label Aug 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactored report related functionalities #3885

Refactored report related functionalities #3885

user12986714 commented May 26, 2020

makyen commented May 26, 2020

user12986714 commented May 26, 2020

makyen commented May 26, 2020

makyen commented May 26, 2020

user12986714 commented May 26, 2020 •

edited

Loading

makyen commented May 26, 2020

user12986714 commented May 26, 2020

user12986714 commented May 26, 2020 •

edited

Loading

user12986714 commented May 26, 2020 •

edited

Loading

makyen Jun 8, 2020 •

edited

Loading

user12986714 commented Jun 8, 2020

makyen commented Jun 10, 2020

user12986714 commented Jun 10, 2020

makyen Jul 1, 2020 •

edited

Loading

makyen commented Jul 1, 2020

stale bot commented Aug 3, 2020

Refactored report related functionalities #3885

Are you sure you want to change the base?

Refactored report related functionalities #3885

Conversation

user12986714 commented May 26, 2020

makyen commented May 26, 2020

user12986714 commented May 26, 2020

makyen commented May 26, 2020

makyen commented May 26, 2020

user12986714 commented May 26, 2020 • edited Loading

makyen commented May 26, 2020

user12986714 commented May 26, 2020

user12986714 commented May 26, 2020 • edited Loading

user12986714 commented May 26, 2020 • edited Loading

Note: it is better to review the overall file changes and to use the split view. git diff is doing a terrible job on this pr and is only making things more confusing (especially on individual commits).

makyen Jun 8, 2020 • edited Loading

Choose a reason for hiding this comment

user12986714 commented Jun 8, 2020

makyen commented Jun 10, 2020

user12986714 commented Jun 10, 2020

makyen Jul 1, 2020 • edited Loading

Choose a reason for hiding this comment

makyen commented Jul 1, 2020

stale bot commented Aug 3, 2020

user12986714 commented May 26, 2020 •

edited

Loading

user12986714 commented May 26, 2020 •

edited

Loading

user12986714 commented May 26, 2020 •

edited

Loading

Note: it is better to review the overall file changes and to use the split view. `git diff` is doing a terrible job on this pr and is only making things more confusing (especially on individual commits).

makyen Jun 8, 2020 •

edited

Loading

makyen Jul 1, 2020 •

edited

Loading