thesecretmaster / ips-comment-bot

Ruby 100.00%

ips-comment-bot's Issues

Have a new detection type for comments on old posts

As requested by Catija.

It would be useful if comments on posts over a certain age were detected, like the regexes and magic comment. This would probably be most useful as a new type of detection (old_post or something like that) that can be adjusted for posting into child rooms as needed.

It'd also be good if the age threshold were adjustable in the config. For starters, though, two weeks is probably old enough to start detecting.
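
A minimal sketch of the age check, assuming the post's creation time is available as a Unix timestamp in seconds. The names `old_post?`, `OLD_POST_THRESHOLD_DAYS`, and `SECONDS_PER_DAY` are hypothetical and not taken from the bot's code.

```ruby
# Hypothetical config value: posts older than this many days count as "old".
OLD_POST_THRESHOLD_DAYS = 14

SECONDS_PER_DAY = 24 * 60 * 60

# Returns true when the post was created more than threshold_days before `now`.
# post_creation_epoch is assumed to be a Unix timestamp in seconds.
def old_post?(post_creation_epoch, now: Time.now, threshold_days: OLD_POST_THRESHOLD_DAYS)
  age_in_seconds = now.to_i - post_creation_epoch
  age_in_seconds > threshold_days * SECONDS_PER_DAY
end
```

Keeping the threshold as a keyword argument would let each child room pass its own value if that turns out to be needed.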

Add validation around adding a new regex

People often forget to specify a reason (which will create a null reason) or accidentally capitalize letters in their regex (resulting in a regex that'll never match anything).

Add some validation around Add that will:

1. Disallow adding a regex without a valid reason
   a. Where a "valid reason" is any reason with 1 or more characters
2. Run a weaker version of howgood on the new regex. If it matches fewer than 10 existing comments, print a warning after the "Added" message that says something like: "Note: this regex only matches X (Y% of total) comments in the database. Please ensure you're adding a useful regex." (or maybe something a little less harsh than that on the second sentence)
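
The two checks could be sketched roughly as follows. This is only an illustration: `validate_new_regex` and `MIN_USEFUL_MATCHES` are hypothetical names, a plain array of strings stands in for the comments in the database, and the sketch assumes comments are downcased before matching (which is what would make a regex with capitals never match).

```ruby
# Hypothetical threshold below which a warning is appended.
MIN_USEFUL_MATCHES = 10

# Returns [ok, message]. comment_bodies stands in for the stored comments.
def validate_new_regex(pattern, reason, comment_bodies)
  # Check 1: a valid reason has at least one non-blank character.
  return [false, "Please specify a reason for this regex."] if reason.nil? || reason.strip.empty?

  begin
    regex = Regexp.new(pattern)
  rescue RegexpError => e
    return [false, "Invalid regex: #{e.message}"]
  end

  # Check 2: count matches against downcased comment text (assumption).
  matched = comment_bodies.count { |body| regex.match?(body.downcase) }
  message = "Added"
  if matched < MIN_USEFUL_MATCHES
    percent = comment_bodies.empty? ? 0.0 : (matched * 100.0 / comment_bodies.length).round(2)
    message += "\nNote: this regex only matches #{matched} (#{percent}% of total) comments in the database."
  end
  [true, message]
end
```

A regex containing capitals would show up here as matching 0 comments, so the warning covers the accidental-capitalization case as well.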

Proposed `howgood` output change

I propose we change the howgood output to be an ASCII table. This would require changing lines 198-206 of comment_scan.rb to:

tp_msg = [ #Generate tp line
  'tp'.center(6),
  tps.round(0).to_s.center(11),
  ((tps*100/total).round(8).to_s+"%").center(14),
  ((tps*100/Comment.where("tps >= ?", 1).count).round(8).to_s+"%").center(15),
  ((tps*100/Comment.count).round(8).to_s+"%").center(18),
].join('|')

fp_msg = [ #Generate fp line
  'fp'.center(6),
  fps.round(0).to_s.center(11),
  ((fps*100/total).round(8).to_s+"%").center(14),
  ((fps*100/Comment.where("fps >= ?", 1).count).round(8).to_s+"%").center(15),
  ((fps*100/Comment.count).round(8).to_s+"%").center(18),
].join('|')

total_msg = [ #Generate total line
  'Total'.center(6),
  total.round(0).to_s.center(11),
  '-'.center(14),
  '-'.center(15),
  # Assumes Comment.count is the intended denominator; the original draft hardcoded 159229
  ((total*100/Comment.count).round(8).to_s+"%").center(18),
].join('|')

#Generate header line
header = " Type | # Matched | % of Matched | % of all type | % of ALL comments"

final_output = [ #Add 4 spaces for formatting and newlines
  header, '-'*68, tp_msg, fp_msg, total_msg
].join("\n    ")
say "    #{final_output}"

#Could also be this depending on programming preference
#puts "    #{header}\n    #{'-'*68}\n    #{tp_msg}\n    #{fp_msg}\n    #{total_msg}"

I would fork and push but I have no way of testing the bot after I've made the changes. So for fear of breaking things, I'm posting my suggestion here instead :)

No accounting for 'deleted user' in the metadata post

You can see this here: the metadata post reads "edited 122 days ago by []() ( rep)" because the last revision was by a now-deleted user. It's a minor issue, but it would still be worth saying 'by a deleted user' or some such, as Smokey does.

Add some gamification to encourage more feedbacks

(Normally, I'd just implement this, but I'm not sure if this is a direction we want to take the bot. I'm posting this here to get my thoughts down and have something to point TAS users to.)

With the (near) completion of #28, we'll have each new feedback (tp/fp/rude) linked to a chat user. I propose we add some soft gamification in TAS to encourage more feedbacks. This plan has two parts:

1. On feedback (tp/fp/rude) given, check the total number of feedbacks attributed to that user. Print a congratulatory message (instead of the usual message) if they've reached a milestone, e.g.:
   a. "Congratulations on your first feedback, @user! Now feed me moooaaaare :D"
   b. "Let's all give @user a pat on the back for their 25th feedback!"
   c. "Holy moly did you really just give me your 100th feedback, @user?!"
2. Create a scoreboard mention command (similar to cats, i.e. "@ips, gimme teh scoreboard"). This command will show the top 10 users with the most feedbacks in a table like
   User | XXX Total | YYY tp's | ZZZ fp's
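
Both parts might look something like the sketch below. Everything here is illustrative: `MILESTONES`, `feedback_reply`, and `scoreboard` are hypothetical names, and feedbacks are modeled as plain hashes with `:user` and `:type` keys instead of database rows.

```ruby
# Hypothetical milestone messages keyed by a user's total feedback count.
MILESTONES = {
  1   => "Congratulations on your first feedback, @%s! Now feed me moooaaaare :D",
  25  => "Let's all give @%s a pat on the back for their 25th feedback!",
  100 => "Holy moly did you really just give me your 100th feedback, @%s?!"
}.freeze

# Returns the milestone message when the count hits one, else the usual reply.
def feedback_reply(username, total_feedbacks, default_reply)
  template = MILESTONES[total_feedbacks]
  template ? format(template, username) : default_reply
end

# Returns the top `limit` users by feedback count, with tp/fp breakdowns.
def scoreboard(feedbacks, limit: 10)
  feedbacks.group_by { |f| f[:user] }
           .map do |user, rows|
             { user: user,
               total: rows.length,
               tp: rows.count { |r| r[:type] == 'tp' },
               fp: rows.count { |r| r[:type] == 'fp' } }
           end
           .sort_by { |row| -row[:total] }
           .first(limit)
end
```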

However, I'm not sure if this is something we want to do. I'll run it by TAS now that it's written up and see what they say.

Disable "Invalid Input" response when reply was >1 parameter long

Right now, when the bot gets a response to a comment report that it can't parse (that doesn't begin with i), it prints

Invalid feedback type. Valid feedback types are tp, fp, rude, and wrongo

However, more and more often people are responding to bot messages with human responses (to draw conversation to a reported comment). Adding a prepended i is difficult to remember and looks ugly. Instead, why don't we ignore any reply that is longer than 1 word (since all replies to comment reports are 1-word commands)?

This FR was run by TAS and seems to have the room's support (see here).

This will probably just involve adding a check on msg.body.split(' ').length right after line 121 of comment_scan.rb.
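
A rough sketch of that check, assuming the reply body has already had any ping or reply prefix stripped. The method name `conversational_reply?` is hypothetical.

```ruby
# Returns true when a reply has more than one word and should be ignored
# instead of being parsed as a feedback command.
def conversational_reply?(body)
  body.split(' ').length > 1
end
```

With this in place, "tp" would still be parsed as feedback, while "wow, that's a bad one" would be skipped without the "Invalid feedback type" response.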

Bonus/Easter Egg: have a 1/5 chance of replying with a random comment from the list

Ain't that the truth.

You're telling me.

Yep. That's about the size of it.

That's what I've been saying for $(AGE_OF_BOT)!

What else is new?

For real?

Humans, amirite?

Remove a regex reason if the only regex it has is removed

Right now if I add a new regex reason and then remove the regex under it, the reason sticks around like chewing gum under a high school desk--not really doing any damage, but nobody really wants it there if they have a choice.

Can we add a feature to !!/del where if the regex being deleted is the only one under a reason, the reason is deleted too?
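
As an illustration of the cleanup, here is a sketch that models reasons as a hash mapping each reason name to its list of regex strings. The real bot stores these in its database, so the method name `delete_regex` and the hash layout are assumptions.

```ruby
# Removes `pattern` from `reason`; drops the reason too if it ends up empty.
# regexes_by_reason is a Hash of reason name => Array of regex strings.
def delete_regex(regexes_by_reason, reason, pattern)
  patterns = regexes_by_reason[reason]
  return regexes_by_reason if patterns.nil?

  patterns.delete(pattern)
  regexes_by_reason.delete(reason) if patterns.empty?
  regexes_by_reason
end
```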

Have an option to not detect all comments and just report the regex hits

To make this bot a little more useful for using on other sites that may not want a record of all comments ever posted, there should be an option to run the bot and just have the regex hits posted into chat, such as is happening with The Awkward Silence as of right now, but without having the HQ room.

So, for instance, if I want to detect certain comments on Scifi.SE but not to have all the comments, just the regex hits, this would be useful for that.

Use perspective as an actual reason

Score > 0.7

[FR] Adding command !!/cats to display random picture of cat

As discussed in chat (https://chat.stackexchange.com/transcript/message/51960200#51960200) it would be nice to have a bot displaying (cute) pictures of cats when we ask for it.

This API can be used: aws.random.cat/meow

The proposed command could be !!/cats

(This shouldn't be implemented on the meta bot, to avoid having two bots responding to the same command.)

Don't duplicate post manually reported comments

When a user manually reports a comment, it gets reposted in every chat room that the bot is in, including the room it was reported from. It can be a little confusing seeing the same comment posted twice, so I propose that the bot not repost manually reported comments in the room they were reported from.

Ignore the OP for regexes, with the exception of offensive

A lot of the regex fp detections I see, especially for the possible-aic ones, come from the OP responding to the comments with more comments.

This could be avoided by excluding the OP from the regex, much like moderators. If you've posted the post that this comment was posted on, you should be excluded from non-abusive regexes.
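
One way to express the rule, sketched with hypothetical names (`OP_EXCLUSION_EXEMPT_REASONS`, `report_regex_hit?`) and assuming the comment author's and post owner's user IDs are both known at scan time:

```ruby
# Hypothetical list of reasons that still fire when the OP is the commenter.
OP_EXCLUSION_EXEMPT_REASONS = %w[offensive].freeze

# Returns false when the commenter is the post's owner, unless the regex
# reason is one that should always be reported.
def report_regex_hit?(comment_author_id, post_owner_id, regex_reason)
  return true if OP_EXCLUSION_EXEMPT_REASONS.include?(regex_reason)

  comment_author_id != post_owner_id
end
```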

Add unit tests (that GitHub will auto-trigger when pushing new code)

This is maybe more of a longterm goal, but it'd be really awesome if we could get some unit tests in the project. Bonus points if we could get them to auto-run when someone tries to make a pull request (to make it easy to see if there are build errors/obvious semantic errors that break stuff).

This'd probably require a pretty big refactor to make the core functions callable from a test with test data. We'd have to add a level of abstraction so that we can fake the SE API calls.

I definitely won't be trying this any time soon, but maybe some day...

Add "rescan" reply-to option

A common workflow is to see a bad comment, !!/add some new regex and then rescan the comment to be caught by the regex. This requires finding the comment id and formulating a !!/manscan command.

Instead, it would be easier to reply to a reported comment with "rescan" to have the bot rescan (call scan_comments) the comment. For example:

@IPSCommentBot rescan

Get rid of tp/fp/rude on the Comment table

With the advent of the Feedbacks table, storing tp/fp/rude ints on the Comment table is now redundant.

That being said, everything looking at tp/fp/rude numbers is looking at those columns, so this may be a bit of an undertaking.

What will likely need to be done:

Remove tp/fp/rude int columns on the Comment table
Add some easy way to fetch tp/fp/rude numbers for a comment
- Maybe a view? Or the SQLite equivalent? Or just a function on the Comment class that returns a map?
Update all code fetching/updating those values to ensure that it's fetching/updating Feedbacks instead
Write a migration script or SQLite query to create Feedback rows from anonymous users (maybe user id -1) for all of the legacy comments before this db table was added
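
The "function that returns a map" option could be sketched like this. Feedback rows are modeled as plain hashes with `:comment_id` and `:type` keys, which is an assumption about the Feedbacks table, and `feedback_counts` is a hypothetical name.

```ruby
# Returns a Hash of feedback type => count for one comment, derived from
# feedback rows instead of from columns on the comment itself.
def feedback_counts(feedbacks, comment_id)
  counts = { 'tp' => 0, 'fp' => 0, 'rude' => 0 }
  feedbacks.each do |feedback|
    next unless feedback[:comment_id] == comment_id

    type = feedback[:type]
    counts[type] += 1 if counts.key?(type)
  end
  counts
end
```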

Have a "x comments in time y on post z" alert

To help identify problematic comment threads, the bot could post a message to the chatroom when a post receives a certain number of comments within a certain timeframe.

For instance, 10 comments in 5 minutes, 20 comments in an hour, or 30 in the past day.

Such a message would probably look like

Possible argument: 10 comments in 5 minutes on post Title by user.

This would be posted in both the control and child rooms.
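
A sketch of the threshold check, using the example numbers from this issue. `BURST_RULES` and `burst_rule_hit` are hypothetical names, and comment times are assumed to be Unix timestamps in seconds for a single post.

```ruby
# Each rule is [comment_count, window_in_seconds], taken from the examples above.
BURST_RULES = [
  [10, 5 * 60],
  [20, 60 * 60],
  [30, 24 * 60 * 60]
].freeze

# Returns the first rule [count, window] that the post's comments satisfy,
# or nil when no rule is hit.
def burst_rule_hit(comment_times, now)
  BURST_RULES.find do |count, window|
    recent = comment_times.count { |t| t <= now && now - t <= window }
    recent >= count
  end
end
```

The returned pair can be used to fill in the "10 comments in 5 minutes" part of the chat message.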

Have the DB viewable from the web

It'd be cool, as well as useful for people who want to look at some data, to have the database that the bot is running somehow be available to people on the web. Would there be some way to have the bot automatically put the database into e.g. a DB reader? I don't exactly know how this works ;)

Delete offensive detections from child rooms after a minute

It's probably best not to leave the offensive content in child rooms for long. (This caused an argument in The Closet a bit back.)

If the bot deleted the messages about the offensive detection a minute (or possibly a minute and a half) after reporting, that's long enough for it to be seen and have action taken on it, but within the time limit for deleting your own chat messages.

Alert HQ room when a user passes a threshold of tp comments

It'd be nice to have a warning when a user has been leaving a lot of delete-worthy comments.

I'm thinking a message in the HQ room every time someone passes multiples of 20 tp comments.

E.g.:

**Alert**: User <ID> has left 40 comments marked tp. (@Mithrandir)

This would happen at 20, 40, 60, 80...

This would be without using the username, just the ID.
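
The multiples-of-20 check might be sketched as below. `TP_ALERT_STEP` and `tp_alert` are hypothetical names, and the sketch assumes the user's tp count before and after the new feedback is known.

```ruby
# Hypothetical step size: alert at 20, 40, 60, 80, ...
TP_ALERT_STEP = 20

# Returns the alert text when the count crosses a multiple of the step,
# or nil otherwise. Only the user ID is used, never the username.
def tp_alert(user_id, previous_tp_count, new_tp_count)
  return nil unless new_tp_count / TP_ALERT_STEP > previous_tp_count / TP_ALERT_STEP

  reached = (new_tp_count / TP_ALERT_STEP) * TP_ALERT_STEP
  "**Alert**: User #{user_id} has left #{reached} comments marked tp."
end
```

Comparing the integer quotients before and after means the alert fires once per threshold, even if the count jumps past it.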

Add a !!/whitelist command

Having a command to add someone to the whitelist would be useful. For instance, !!/whitelist ips 10814 would whitelist @thesecretmaster on IPS.

Add a way to tell which regex triggered a report

Add the ability to reply to a reported comment to have the bot respond with which regex triggered the comment. Eg:

 > Hi, and welcome to IPS! Could you provide additional detail to the situation? What is the context of your disagreement with your friend? What have you tried so far? — Jess K. 2 mins ago
 #19255 Jess K. | Q: How to apologize to a friend when you know you did something wrong but can't confess it? (score: 0) | posted 5 minutes ago by user10477618 (1 rep) | edited 1 minutes ago by A J (6679 rep) | Toxicity 0.06513069 | tps/fps: 0/0

 > @IPSCommentBot huh?

 > Comment matched "experimental-aic(@scohe001)" for regex: - q: have\Wyou\Wtried

Implement Perspective to find rude comments

Implementing Perspective would help in finding comments that are flaggable but do not directly contain any keywords that would trip the regex.

This is not high-priority at all, but it would be useful to have at some point in the future.

Link feedbacks to people

This would just be a handy thing to have. Also we should maybe add some logging around feedback time, location, and the original regexes that caught it. I may go in and do this myself when I get time.

Link to users should just be /u/###, not /u/###/name

Currently, Smelly posts links to users in the format https://interpersonal.stackexchange.com/u/31/arwen-und%c3%b3miel. However, this link format doesn't work. It should just be https://interpersonal.stackexchange.com/u/31, as SE doesn't support /u/<id>/<name> format links.

Implement a feedback system

As requested by M.A.R..

Currently, there's no way to give feedback on a comment detected by regexes and so no way to evaluate the effectiveness of a single pattern.

Implementing a feedback system that would allow feedback on the individual regexes tripping would be useful for refining the detections.

Move handling of offensive words out of user defined regular expressions and into code

We currently have several regular expressions under the category of "offensive". These regular expressions get printed whenever someone uses the !!/regexes command in chat. This means that the bot is routinely posting messages that contain content we've deemed offensive. In order to not keep putting this into chat, where someone might come across it, we should have these cases handled within the bot's code where they won't be constantly printed in chat. This will make it slightly harder to add new offensive words to the list of things we catch, but should be fairly maintainable moving forward.

"Huh?" functionality broken for Toxicity

It looks like "huh?" doesn't report "high toxicity" even if the toxicity is greater than or equal to 0.7.

Here is the line that should be reporting it: https://github.com/thesecretmaster/ips-comment-bot/blob/master/comment_scan.rb#L111

And here is an example of it failing: https://chat.stackexchange.com/transcript/message/50545563#50545563

This is probably something silly, but I'm writing up an issue so I'll remember to poke at it when I have the time.
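
For reference, the intended behavior could be sketched like this. It is not the code at the linked line, and it does not diagnose the actual bug: `TOXICITY_THRESHOLD` and `huh_reasons` are hypothetical names, and the `to_f` conversion only covers one guess at a cause (the score arriving as a string).

```ruby
# Threshold from this issue: report at 0.7 and above.
TOXICITY_THRESHOLD = 0.7

# Returns the list of reasons a comment was reported, appending
# "high toxicity" when the score meets the threshold.
def huh_reasons(matched_regex_reasons, toxicity)
  reasons = matched_regex_reasons.dup
  # to_f guards against a String or nil score (an assumption, not a diagnosis).
  reasons << "high toxicity" if toxicity.to_f >= TOXICITY_THRESHOLD
  reasons
end
```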

Exclude moderators from the regex

Excluding moderators from being posted to the child room when they trip the regex would be useful and lead to less clutter in the child room.