sobotics / socvfinder



Queen: searches for and reports questions for close-vote review, and scans comments for heat.

License: GNU General Public License v2.0

NSIS 2.01% Java 94.96% Batchfile 2.92% Shell 0.11%

stack-overflow heat-detection review

socvfinder's Introduction

SOCVFinder

Background

This bot was developed to address the following issues:

The problem of duplicate questions, which not only confuse search engines but also encourage rep hunting by copying and pasting answers. SE specifically introduced the mighty Mjölnir to address this problem, but too often it is not used, for various reasons.
The inefficiency of the current close-vote review queue. There are too many questions and too few reviewers; as a result, the few reviewers often see all their work age away, especially if they filter on low-traffic tags.

Objectives

To give users who hold a gold badge in a tag the means to find duplicates faster.
To help users easily identify (and review) questions with certain (customizable) characteristics, a.k.a. "cherry-picking" or "filtering", so that they feel they are making a difference in helping the community.

1. Dupe hammer aids

1.1 Live duplicate notifications

Allow users with a gold badge to opt in to real-time notifications of possible duplicates. These notifications are messages sent to configurable chat rooms.

Example:

[tag:possible-duplicate] [tag:python] [tag:mysql] [tag:django] [tag:django-1.7] Link to a question @User1 @User2

Users are only pinged if they are present in the room. It is possible to give feedback on a duplicate notification (true positive or false positive).
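Below is a minimal sketch of how such a notification could be assembled. It assumes the bot knows two things: the tags in which each opted-in user holds a gold badge, and which users are currently in the room. The class and method names (`DuplicateNotification.format`) are hypothetical. This is not Queen's actual code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical builder for a live duplicate notification (illustrative, not Queen's code). */
final class DuplicateNotification {

    private DuplicateNotification() {}

    /**
     * Builds a message like "[tag:possible-duplicate] [tag:python] LINK @User1".
     *
     * @param questionTags     tags of the newly posted question
     * @param questionLink     link to the question
     * @param goldBadgesByUser opted-in users mapped to the tags they hold a gold badge in
     * @param presentUsers     users currently in the chat room
     */
    static String format(List<String> questionTags, String questionLink,
                         Map<String, Set<String>> goldBadgesByUser, Set<String> presentUsers) {
        StringBuilder message = new StringBuilder("[tag:possible-duplicate]");
        for (String tag : questionTags) {
            message.append(" [tag:").append(tag.toLowerCase(Locale.ROOT)).append(']');
        }
        message.append(' ').append(questionLink);

        // Ping only users who are in the room AND hold a gold badge in one of the question's tags.
        String pings = goldBadgesByUser.entrySet().stream()
                .filter(e -> presentUsers.contains(e.getKey()))
                .filter(e -> e.getValue().stream().anyMatch(questionTags::contains))
                .map(Map.Entry::getKey)
                .sorted()
                .map(user -> "@" + user)
                .collect(Collectors.joining(" "));
        if (!pings.isEmpty()) {
            message.append(' ').append(pings);
        }
        return message.toString();
    }
}
```

The pinged names are sorted so the message is deterministic. The real bot may order them differently.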

1.2 Duplicate search

Allow users to search for questions with duplicate flags or close votes as they wish, optionally specifying filters such as a maximum number of questions to review, a date span, or a score. This is similar to the tag filter in the /review interface, but it provides more filtering options and is independent of the current review interface.

Screenshot of an example Java duplicate search:

The bot won't return previously reviewed questions.
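The search amounts to a chain of filters over the scanned questions, plus the exclusion of questions already reviewed. The sketch below rests on several assumptions:

- `Candidate` is a simplified question model.
- The caller supplies a set of reviewed question IDs.
- The date span is half-open.

All of these, and the names `Candidate` and `DuplicateSearch`, are illustrative.

```java
import java.time.Instant;
import java.util.List;
import java.util.Set;

/** Simplified question model; the fields are assumptions for illustration. */
record Candidate(long id, Instant created, int score, boolean hasDuplicateFlagOrVote) {}

/** Hypothetical search options: max results, a [from, to) creation-date span and a minimum score. */
record DuplicateSearch(int maxQuestions, Instant from, Instant to, int minScore) {

    List<Candidate> run(List<Candidate> scanned, Set<Long> alreadyReviewed) {
        return scanned.stream()
                .filter(Candidate::hasDuplicateFlagOrVote)
                .filter(q -> !alreadyReviewed.contains(q.id()))   // never return reviewed questions
                .filter(q -> !q.created().isBefore(from) && q.created().isBefore(to))
                .filter(q -> q.score() >= minScore)
                .limit(maxQuestions)
                .toList();
    }
}
```

Putting the reviewed-ID check in the same pipeline ensures the `maxQuestions` limit counts only questions the user has not seen yet.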

2. Enjoyable and targeted reviewing

Make your reviews count!

Leverage the community's work: don't let it go to waste, and have some fun doing it.

2.1 What can make reviewing enjoyable?

Knowing that your vote counts;
Cherry-picking questions;
Getting statistics on reviewing (users/tags/rooms).

2.2 How can you make your vote count?

Review questions that already have a high or low close-vote count.
Review questions that will not be deleted by the Roomba.
Review questions together with others.

2.3 Cherry-pick your questions

Query the bot for the desired number of questions in the tags of your choice. You can optionally specify a desired CV count, a question score, whether the question is safe from the Roomba, whether it has answers, whether it has an accepted answer, etc. The bot won't return questions that have already been reviewed.
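A cherry-pick request can be sketched as composing one predicate per option. This sketch does not attempt to reproduce the Roomba's actual deletion criteria. Instead, `roombaCandidate` is assumed to be computed elsewhere. `CvQuestion` and `CherryPick` are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Simplified question model; roombaCandidate is assumed to be computed elsewhere. */
record CvQuestion(long id, List<String> tags, int closeVotes, int score,
                  int answerCount, boolean hasAcceptedAnswer, boolean roombaCandidate) {}

/** Hypothetical cherry-pick request: each option adds one predicate. */
final class CherryPick {
    private Predicate<CvQuestion> filter = q -> true;

    CherryPick tag(String tag)             { return and(q -> q.tags().contains(tag)); }
    CherryPick closeVotes(int count)       { return and(q -> q.closeVotes() == count); }
    CherryPick minScore(int score)         { return and(q -> q.score() >= score); }
    CherryPick notRoomba()                 { return and(q -> !q.roombaCandidate()); }
    CherryPick hasAnswers(boolean wanted)  { return and(q -> (q.answerCount() > 0) == wanted); }
    CherryPick hasAccepted(boolean wanted) { return and(q -> q.hasAcceptedAnswer() == wanted); }

    private CherryPick and(Predicate<CvQuestion> condition) {
        filter = filter.and(condition);
        return this;
    }

    /** Returns up to {@code limit} matching questions, skipping ones already reviewed. */
    List<CvQuestion> pick(List<CvQuestion> questions, Set<Long> reviewed, int limit) {
        return questions.stream()
                .filter(q -> !reviewed.contains(q.id()))
                .filter(filter)
                .limit(limit)
                .toList();
    }
}
```

For example, a "3 close votes in Swift" request would be `new CherryPick().tag("swift").closeVotes(3).notRoomba().pick(questions, reviewed, 10)`.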

Screenshot of an example cherry-pick request with the filter "3 close votes" in the Swift tag:

2.4 Enjoyable statistics

Query the bot for some statistics so you can enjoy your efforts and see the status of your favorite tags.

Examples:

This is your effort that I have registered all time
   nr [tag]             Reviews  CV virt.  CV count    Closed
-----------------------------------------------------------------
   1. java                  370       221       216        87
   2. blinking               45        22        22         4
   3. javascript             25         8         8         3
   4. c                      11         2         2         2
   5. c#                      1         1         1         1
   6. maven                   1         1         1         0
   7. c++                     1         1         1         1
   8. python                  1         1         1         1
   9. swift                   2         0         0         0
-----------------------------------------------------------------
      TOTAL                 457       257       252        99

Tag statistics all time
   nr [tag]             Reviews  CV virt.  CV count    Closed
-----------------------------------------------------------------
   1. python                839       814       672       286
   2. java                  779       721       537       272
   3. c#                    272       185       185        69
   4. php                   111        98        98        89
   5. ios                   160        93        93        33
   6. c                     122        84        83        42
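Each table is a per-tag sum of four counters, closed by a TOTAL row. Below is a sketch of that aggregation and of the row layout, assuming each row already holds the four counts.

- The meaning of "CV virt." is not defined here, so it is carried as an opaque counter.
- The example tables appear to be ordered by that column, which the sketch mimics with a stable descending sort.
- `TagStats` and `StatsReport` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** One per-tag row; "CV virt." is carried as an opaque counter (cvVirtual). */
record TagStats(String tag, int reviews, int cvVirtual, int cvCount, int closed) {

    /** Adds the counters of another row, keeping this row's tag label. */
    TagStats plus(TagStats other) {
        return new TagStats(tag, reviews + other.reviews, cvVirtual + other.cvVirtual,
                cvCount + other.cvCount, closed + other.closed);
    }
}

/** Hypothetical helpers that build the statistics tables. */
final class StatsReport {

    private StatsReport() {}

    /** Merges rows of the same tag, then sorts by the "CV virt." column, descending. */
    static List<TagStats> merge(List<TagStats> rows) {
        Map<String, TagStats> byTag = new LinkedHashMap<>();
        for (TagStats row : rows) {
            byTag.merge(row.tag(), row, TagStats::plus);
        }
        List<TagStats> merged = new ArrayList<>(byTag.values());
        // List.sort is stable, so ties keep their first-seen order.
        merged.sort((a, b) -> Integer.compare(b.cvVirtual(), a.cvVirtual()));
        return merged;
    }

    /** Sums all rows into the TOTAL line. */
    static TagStats total(List<TagStats> rows) {
        TagStats sum = new TagStats("TOTAL", 0, 0, 0, 0);
        for (TagStats row : rows) {
            sum = sum.plus(row);
        }
        return sum;
    }

    /** Formats one numbered row in the column layout used above. */
    static String formatRow(int nr, TagStats s) {
        return String.format(Locale.ROOT, "%4d. %-17s %7d%10d%10d%10d",
                nr, s.tag(), s.reviews(), s.cvVirtual(), s.cvCount(), s.closed());
    }
}
```

Feeding `total` the nine rows of the first table yields that table's TOTAL row (457, 257, 252, 99).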

3. Limitations

The bot only uses the Stack Exchange API and does not do any screen scraping.
The API cannot filter on close votes, so all questions need to be scanned; to reduce API calls, an index system is implemented.
Whether a question is currently in the close-vote review queue is not available from the API. As a result, we cannot redirect users to the /review interface.
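The text above does not describe how the index system works. The sketch below is one plausible approach, offered purely as an assumption:

- Remember each question's last activity date, so that unchanged questions are not processed again.
- Keep a high-water mark that can serve as the lower activity-date bound of the next request.

The record fields mirror the API's `question_id` and `last_activity_date`. `ScanIndex` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal view of an API question; names mirror question_id and last_activity_date. */
record ApiQuestion(long questionId, long lastActivityDate) {}

/** Hypothetical scan index that skips questions unchanged since the last scan. */
final class ScanIndex {
    private final Map<Long, Long> lastSeenActivity = new HashMap<>();
    private long highWaterMark;   // newest last_activity_date seen so far (epoch seconds)

    /** Records a page of results and returns the IDs that are new or have changed. */
    List<Long> update(List<ApiQuestion> page) {
        List<Long> changed = new ArrayList<>();
        for (ApiQuestion q : page) {
            Long previous = lastSeenActivity.get(q.questionId());
            if (previous == null || previous < q.lastActivityDate()) {
                lastSeenActivity.put(q.questionId(), q.lastActivityDate());
                changed.add(q.questionId());
            }
            highWaterMark = Math.max(highWaterMark, q.lastActivityDate());
        }
        return changed;
    }

    /** Lower activity-date bound for the next scan, assuming earlier pages were fully processed. */
    long nextLowerBound() {
        return highWaterMark;
    }
}
```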

4. Commands

For the full command specification by privilege level, see the Quick guide.

5. Accounts

The bot uses the Queen account and is also a registered Stack App. Testing currently takes place in the SOCVFinder chat room.

6. Source code

Source code is available on GitHub at /jdd-software/SOCVFinder.

socvfinder's People


gunr2171 rschrieken bhargav-rao adeak double-fault aralun raisingagent danbopes sardar-usama jmcabandara mdoubledash

socvfinder's Issues

Hydrant - A Dashboard for Heat Detector


After collecting some important points about the new dashboard, I'm compiling them here and formalizing the creation of a new dashboard.

Need

Heat Detector is quite famous network-wide for its ability to detect rude and abusive comments. However, the problem we are facing is that not many people give accurate feedback (the bot uses ML, and feedback is very important for its improvement). This issue could easily be solved with a web dashboard.

Thoughts

Unlike CopyPastor, which had to be written from scratch, the dashboard for HD can easily be modeled on Sentinel. The similarities between Natty and HD are:

Both are related to flagging (comments vs. answers)
Both have various reasons for detections
Both need feedback

Changes

Changes needed on the HeatDetector side:

A proper/strict rule to differentiate between what should be a true positive and what should be a false positive.
A way to provide feedback through a command, so that a userscript can make use of it.
And, of course, calling the dashboard in every report.

Challenges

Framing strict rules to distinguish a heated argument from a non-heated one is often subjective. Quantifying this in some way in order to improve the algorithm would be the first challenge. (This is not related to the dashboard development work, but to the way we provide feedback.)
Some posts might be borderline, and we wouldn't want to give feedback on these. Hence, a way to "not provide feedback" would be needed.
We are also tracking a few keywords, like "spam" and "rude", which aren't noisy comments but are meant to help us discover spam or rude/abusive (r/a) posts. We need a different feedback type for these.
We need to find a way to anonymize the data we display on the dashboard. Some kind of login would be nice, but another idea is to display minimal information (just the comment text and reasons, for example) and show all the details after logging in.
Art is very busy at the moment and certainly won't have much time to design the dashboard. Hence, we'll need to either write it in one of the languages we know or recruit someone who knows Ruby and Ruby on Rails (RoR). Given that we still haven't put up an open-source ad this time, that might be a bit hard.
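The feedback types discussed above could be modeled as a small enum. Each type carries a flag that says whether it should reach the classifier. That flag makes the borderline and keyword cases explicit rather than silently dropped. The command tokens (`tp`, `fp`, `skip`, `kw`) and the `KEYWORD_HIT` type are hypothetical placeholders for whatever the team settles on.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical feedback types for Heat Detector reports. */
enum Feedback {
    TRUE_POSITIVE("tp", true),
    FALSE_POSITIVE("fp", true),
    /** Borderline comment: acknowledged, but deliberately not sent to the classifier. */
    SKIP("skip", false),
    /** Keyword report ("spam", "rude", ...) that helped find a spam or r/a post. */
    KEYWORD_HIT("kw", false);

    private final String command;
    private final boolean trainsClassifier;

    Feedback(String command, boolean trainsClassifier) {
        this.command = command;
        this.trainsClassifier = trainsClassifier;
    }

    /** Whether this feedback should be used to train the ML model. */
    boolean trainsClassifier() {
        return trainsClassifier;
    }

    /** Parses a command token such as "tp"; unknown tokens yield an empty Optional. */
    static Optional<Feedback> parse(String token) {
        String normalized = token.trim().toLowerCase(Locale.ROOT);
        return Arrays.stream(values())
                .filter(f -> f.command.equals(normalized))
                .findFirst();
    }
}
```

A chat command handler and a dashboard endpoint could both call `Feedback.parse`, which keeps the two feedback paths consistent.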

Hosting

We can host it on the sobotics webserver. That would not be an issue.

Update chatexchange library

As you might have heard, Stack Exchange will remove OpenID.

Since tunaki/chatexchange is currently using OpenID to connect to chat, we had to release an update. If you don't update, you won't be able to connect to chat after July 25, 2018.

We've moved the project to org.sobotics.*. This is the updated Maven dependency:

<dependency>
  <groupId>org.sobotics</groupId>
  <artifactId>chatexchange</artifactId>
  <version>2.0.0</version>
</dependency>

You'll have to update your import statements. All APIs are still the same, so you don't have to modify the implementation.
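If many files need their imports updated, a small helper can rewrite just the import lines. An IDE's find-and-replace does the same job. In the sketch below, the old and new package prefixes are parameters. The example pair `fr.tunaki.stackoverflow.chat` → `org.sobotics.chatexchange.chat` is an assumption, so check it against the classes in the 2.0.0 jar before relying on it. `ImportMigrator` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that rewrites package prefixes in Java import lines only. */
final class ImportMigrator {

    private ImportMigrator() {}

    /** Replaces oldPrefix with newPrefix in "import" and "import static" lines. */
    static String migrate(String source, String oldPrefix, String newPrefix) {
        Pattern importLine = Pattern.compile(
                "(?m)^([ \\t]*import\\s+(?:static\\s+)?)"   // group 1: the import keyword(s)
                + Pattern.quote(oldPrefix)
                + "(?=[.;\\s])");                             // stop at a package boundary
        return importLine.matcher(source)
                .replaceAll(m -> Matcher.quoteReplacement(m.group(1) + newPrefix));
    }
}
```

Only import lines change, and the boundary check leaves packages that merely share a name prefix untouched.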

If you have issues updating your project, feel free to ask for help in our chatroom.

Suggestion: show [SOCVFinder] at the start of all auto report messages

A typical SmokeDetector message is formatted like this:

[ SmokeDetector ] Bad keyword in body: How to read some columns from a CSV file with C++? by Adrian on stackoverflow.com (..mentions..) …

Would it be worth adding that prefix to this bot as well? I'm assuming that most people interact with the bot by reacting to auto-announced reports. It would also help advertise that it's a bot and give a quick way to access the commands.

[ SOCVFinder ] [tag:possible-duplicate] [tag:java] [tag:rest] [tag:spring-restcontroller] [tag:spring-rest] Currency Symbols like €,£ are Corrupted in Spring HTTP.GET
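Below is a sketch of the proposed prefix. To give quick access to the commands, it assumes the bot name may be a chat Markdown link, for example to the Quick guide. The `helpUrl` parameter and the `ReportPrefix` name are hypothetical. Checking for an existing prefix makes the helper safe to call twice.

```java
/** Hypothetical helper that adds the bot-name prefix to auto-reported messages. */
final class ReportPrefix {
    private static final String BOT_NAME = "SOCVFinder";

    private ReportPrefix() {}

    /** Returns "[ SOCVFinder ] message", linking the name when helpUrl is given. */
    static String withPrefix(String message, String helpUrl) {
        String name = (helpUrl == null || helpUrl.isBlank())
                ? BOT_NAME
                : "[" + BOT_NAME + "](" + helpUrl + ")";
        String prefix = "[ " + name + " ] ";
        // Idempotent: do not prefix a message twice.
        return message.startsWith(prefix) ? message : prefix + message;
    }
}
```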

The cherry picker still uses the old dumps API

http://socvr.org:222/api/socv-finder/dump-report is no more.

We should move to https://github.com/SOBotics/Reports

The number of messages in SOCVR is too damn high.

Queen is posting a bit too many messages in SOCVR, more than our own cv-pls requests.

Here are my ideas to lower the report count:

Only allow a set number of reports per time frame. For example: 5 reports per 60 minutes.
Only report when some part of the post has reached a threshold, such as a score < -1.
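The first option is a sliding-window rate limit. Below is a minimal sketch. It assumes the bot checks the limiter before each post and drops (or queues) the report when the limiter refuses it. `ReportRateLimiter` is a hypothetical name, and the 5-per-60-minutes figure comes from the example above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window limiter: at most maxReports within any window of the given length. */
final class ReportRateLimiter {
    private final int maxReports;
    private final Duration window;
    private final Deque<Instant> sent = new ArrayDeque<>();

    ReportRateLimiter(int maxReports, Duration window) {
        this.maxReports = maxReports;
        this.window = window;
    }

    /** Returns true (and records the report) if posting at {@code now} stays within the limit. */
    synchronized boolean tryAcquire(Instant now) {
        Instant cutoff = now.minus(window);
        // Drop timestamps that are no longer inside (now - window, now].
        while (!sent.isEmpty() && !sent.peekFirst().isAfter(cutoff)) {
            sent.pollFirst();
        }
        if (sent.size() >= maxReports) {
            return false;
        }
        sent.addLast(now);
        return true;
    }
}
```

`new ReportRateLimiter(5, Duration.ofMinutes(60))` allows at most five reports in any 60-minute window. The score threshold from the second option could simply be checked before calling `tryAcquire`.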

These two options, however, reduce the speed of the reporting system. If the point of the bot is to let people dup-hammer newly created posts quickly, these options are less than ideal.

Two other options:

Don't ping people in SOCVR; ping them in the SOCVFinder room instead. (This doesn't really solve the "massive number of reports in SOCVR" problem.)
Instead of posting each report in SOCVR, tell the user:

There are 3 [tag:possible-duplicate] posts created recently in your watched tags. View them [here].

The link would send you to the first message in SOCVFinder.
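The digest alternative can be sketched as one summary per user, built from the reports recently posted in the SOCVFinder room. The sketch assumes the following, all for illustration:

- Each report's chat message link is stored alongside it.
- The list of recent reports is in posting order.
- `Report` and `DigestNotifier` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** A report already posted in the SOCVFinder room; messageLink points at that chat message. */
record Report(long questionId, List<String> tags, String messageLink) {}

/** Hypothetical digest builder replacing per-report posts in SOCVR. */
final class DigestNotifier {

    private DigestNotifier() {}

    /** Returns the summary for one user, or empty if no recent report matches their watched tags. */
    static Optional<String> digestFor(Set<String> watchedTags, List<Report> recentReports) {
        List<Report> matching = recentReports.stream()
                .filter(r -> r.tags().stream().anyMatch(watchedTags::contains))
                .toList();
        if (matching.isEmpty()) {
            return Optional.empty();
        }
        int n = matching.size();
        boolean one = n == 1;
        // Link to the first matching report message (recentReports is assumed to be in posting order).
        return Optional.of(String.format(Locale.ROOT,
                "There %s %d [tag:possible-duplicate] %s created recently in your watched tags. View %s [here](%s).",
                one ? "is" : "are", n, one ? "post" : "posts", one ? "it" : "them",
                matching.get(0).messageLink()));
    }
}
```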
