discordwikibot's Introduction

DiscordWikiBot

DiscordWikiBot [ˈdɪskɔːdˈwiːkibɒt] is a Discord bot that transforms [[wiki]] and {{template}} links in chat messages into actual links using MediaWiki APIs, and informs about recent changes in Wikimedia projects and on Translatewiki.net. It supports editing and deleting its own messages. It was originally developed for the Discord server of Russian Wikipedians. A private instance of the bot is available for Discord servers of Wikimedia communities.

DiscordWikiBot is a cross-platform console app built with .NET Core. It uses DSharpPlus, WikiClientLibrary, and EvtSource for most of the heavy lifting. Its code is published under the MIT licence.

Installation

  1. Download the source files.
  2. Create token.txt in the project folder with a private token for your Discord bot. If you haven’t created your own Discord bot, create it first. Do not share your private token.
  3. Adjust config.json to your needs, following the instructions in the file.
  4. Add an eventStreams.json file containing only {} to the folder with config.json if you intend to use recent changes streams (Wikimedia projects only).
  5. Compile the bot’s binaries using any toolchain that supports .NET Core (an IDE such as Visual Studio or MonoDevelop, or the dotnet CLI).

Important: When developing or updating the bot, be careful with the folder the application is compiled into. The bot stores eventStreams.json and overrides.json there while running, so accidentally cleaning that folder will destroy all of that data.
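One way to guard against that loss is to copy the two files somewhere safe before cleaning the build folder. Below is a minimal sketch in Python (the bot itself is written in C#, so this is a standalone helper, not part of the project). The function name backup_bot_data and both directory arguments are hypothetical; only the two file names come from the note above.

```python
import shutil
from pathlib import Path

# File names taken from the note above; everything else is illustrative.
DATA_FILES = ("eventStreams.json", "overrides.json")


def backup_bot_data(build_dir, backup_dir):
    """Copy the bot's data files out of the build folder.

    Returns the names of the files that were actually copied.
    """
    build = Path(build_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in DATA_FILES:
        source = build / name
        if source.is_file():
            # copy2 also preserves the modification time
            shutil.copy2(source, target / name)
            copied.append(name)
    return copied
```

Calling backup_bot_data with the build output folder and any folder outside it, before a clean rebuild, leaves a copy that can be moved back afterwards.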

Configuration

The version in this repository is configured for Russian Wikipedia by default. You can change this for your own instance by editing the values in config.json. Below is short documentation for every available variable (remove the lines starting with // if you copy from here). Required parameters are marked.

{
	// Link to the bot’s source code
	"repo":  "<https://github.com/stjohann/DiscordWikiBot> (C# / MIT)",

	// Default domain for recent changes streams (only Wikimedia projects work here)
	"domain": "ru.wikipedia.org",
	
	// REQUIRED: Default language of the bot
	"lang": "ru",

	// REQUIRED: Default wiki link configuration
	"wiki": "https://ru.wikipedia.org/wiki/$1"
}

Most variables in config.json can be overridden per server by members with ‘Manage server’ permission.
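To illustrate the two rules above (comment lines must go, and server overrides win over defaults), here is a small Python sketch. It is not the bot's C# code: load_config, effective_setting and the shape of the overrides dictionary are assumptions made for the example; only the key names lang and wiki and the // comment convention come from the text.

```python
import json

# The parameters marked REQUIRED in the sample above.
REQUIRED_KEYS = ("lang", "wiki")


def load_config(text):
    """Parse config text after dropping whole-line // comments."""
    lines = [
        line for line in text.splitlines()
        if not line.lstrip().startswith("//")
    ]
    config = json.loads("\n".join(lines))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    return config


def effective_setting(config, server_overrides, key):
    """A per-server override wins; otherwise use the config.json value."""
    return server_overrides.get(key, config.get(key))
```

Only lines that begin with // are dropped, so the // inside URL values such as the wiki pattern is left untouched.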

Usage

When the bot is enabled, it transforms [[link syntax]] into real URLs to pages on your wiki or to its interwiki links, and {{template syntax}} into real URLs to templates on your wiki. To stop the bot from reacting to links in your message, wrap them in backticks (`[[example]]`) or escape the [[ symbols by putting \ before them.
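The behaviour described above can be approximated in a few lines. This Python sketch is a simplification under stated assumptions, not the bot's implementation: it uses plain regular expressions instead of MediaWiki APIs, does no interwiki resolution, and the function name find_links is made up for the example. The default URL pattern is the one from config.json.

```python
import re
from urllib.parse import quote

# Inline code spans are removed first so links inside backticks are ignored.
CODE_SPAN = re.compile(r"`[^`]*`")
# [[Title]] or [[Title|label]], unless the brackets are escaped with a backslash.
WIKI_LINK = re.compile(r"(?<!\\)\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# {{Name}} or {{Name|args}}, with the same escaping rule.
TEMPLATE = re.compile(r"(?<!\\)\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")


def find_links(message, wiki="https://ru.wikipedia.org/wiki/$1"):
    """Return URLs for wiki links, then template links, found in a message."""
    text = CODE_SPAN.sub("", message)
    titles = [m.group(1).strip() for m in WIKI_LINK.finditer(text)]
    titles += ["Template:" + m.group(1).strip() for m in TEMPLATE.finditer(text)]
    return [
        wiki.replace("$1", quote(title.replace(" ", "_"), safe=":/#"))
        for title in titles
        if title
    ]
```

With the default pattern, a message containing [[link syntax]] yields https://ru.wikipedia.org/wiki/link_syntax, while the same link wrapped in backticks or preceded by \ yields nothing.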

DiscordWikiBot can be configured per server by server members with ‘Manage server’ permission. Available configuration includes the language of the bot, the default wiki URL, recent changes streams parameters etc. Up-to-date instructions for configuration of the bot are provided on Meta-Wiki.

Versioning

DiscordWikiBot uses semver for versioning:

  • Major versions (X.0.0) are changes that remove backwards-compatibility of any of the configuration files, including introducing new expectations from bot owners.
  • Minor versions (0.X.0) are changes that introduce new features to the bot.
  • Patch versions (0.0.X) are changes that fix existing code without introducing new features.
  • Third-party bot maintainers can be expected to update the bot to minor/patch versions without any required changes.
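For maintainers who script their updates, the rules above reduce to comparing version components. A minimal Python sketch, assuming plain X.Y.Z version strings with no pre-release suffixes; the function names are hypothetical.

```python
def parse_version(version):
    """Split 'X.Y.Z' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def update_kind(current, new):
    """Return 'major', 'minor', 'patch' or 'none' for the first differing part."""
    labels = ("major", "minor", "patch")
    for label, old, fresh in zip(labels, parse_version(current), parse_version(new)):
        if old != fresh:
            return label
    return "none"


def needs_owner_action(current, new):
    """Only major versions may change what the configuration files expect."""
    return update_kind(current, new) == "major"
```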

Pull requests should, if possible, include a change in DiscordWikiBot/DiscordWikiBot.csproj file with an appropriate version change.

Translations

Translations are done by volunteers on translatewiki.net. Pull requests with simple translation changes will not be accepted (except for en.json).

discordwikibot's People

Contributors

dependabot[bot], peterbowman, stjohann, tamzinhadasa, translatewiki

discordwikibot's Issues

EventStreams: Change used streams to make redirect filtering work

The biggest current problem with the EventStreams implementation is that --type new streams have no access to data about whether a page is a redirect or a legitimate page. At best, there is an auto-filled summary that can potentially be ignored.

Some things could be done to fix this, but they require too much effort from me right now. Rough plan:

  1. See how different streams correspond in terms of available data.
  2. Figure out whether the transition to another stream(s) by default could be something viable, even only for page creations or page edits.
  3. Implement the code for different EventStreams schemas if they would provide more options for filtering.
  4. Implement new filters augmented by new data (--redirect true/false/null for one, --usergroup autoreview might be another example), while still supporting all the old ones.
  5. (Optional) Implement a way to display log actions (categorise is useless given Discord rate limits IMO).

IIRC, the page-create stream can provide more data than the recentchange stream I am using for everything right now, so there definitely might be something there. You can pass and parse multiple streams (like /page-create,recentchange), so that is not a concern. At the same time, this issue is still more of a problem statement, and I do not have the time or will to work on this myself yet.
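A rough Python sketch of steps 2 and 4, not the bot's C# code. It assumes that page-create events carry a boolean page_is_redirect field and that recentchange events do not; that field name should be checked against the actual stream schemas before relying on it. The function names and the three-valued redirect argument (mirroring the proposed --redirect true/false/null) are illustrative.

```python
BASE_URL = "https://stream.wikimedia.org/v2/stream/"


def stream_url(*streams):
    """Build one URL that subscribes to several streams at once."""
    return BASE_URL + ",".join(streams)


def passes_redirect_filter(event, redirect=None):
    """Apply a --redirect style filter to a decoded event.

    redirect=None  : no filtering, every event passes
    redirect=True  : only redirects pass
    redirect=False : only non-redirects pass
    """
    if redirect is None:
        return True
    # Assumed field name; events without it cannot be classified.
    is_redirect = event.get("page_is_redirect")
    if is_redirect is None:
        return False
    return is_redirect == redirect
```

Rejecting events that lack the flag when a filter is set is one possible design choice; letting them through would preserve current behaviour for streams without redirect data.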

Support gender-specific namespaces

Currently, if you send the bot a link like [[Участница:Dlom]] (the female form of ‘user’), it will return [[Участник:Dlom]] (the male form of ‘user’). Ideally, links to user and user talk pages should respect the user’s gender identity, but two things stand in the way:

  1. Since we expand every namespace alias like MediaWiki does, ‘Участница:’ gets treated just like another namespace alias.
  2. In order for it to not be treated like that, we would need to know whether it is a gender-specific alias or not. There is currently no way to know this in MediaWiki API.

This affects every language with gender-specific namespace names. If you think this needs improvement, give thumbs up on a Phabricator task below, or maybe even create a patch for it.

Phabricator task:
https://phabricator.wikimedia.org/T204610

EventStreams: Some interwiki links in messages do not work

Since MediaWiki only handles some links as forwarding, all links on a typical interwiki map that are not forwarding currently result in link errors such as this:
https://ru.wikipedia.org/wiki/translatewiki:MediaWiki:Coll-create_a_book_tooltip/ru?uselang=en

This needs to be fixed. The best way to fix it would be to add some code to Linking that would use default interwiki map for the site (if it exists) to do transformations to the link without any API requests (i. e. replicating much of Linking.AddLink without any API calls for interwiki links), and then use it where the Linking.GetLink is currently used. It should be fairly easy to write this, but code duplication should be avoided. Ideally, EventStreams.ParseComment and Linking.AddLink should call the same function, but the latter should be able to replace basic (requestless) interwiki link handling with more advanced.

A simpler way to solve the same problem is to add the Special:GoToInterwiki/ prefix everywhere, but given that interwiki links can follow basically any rules, it would mean handling every link with : in it this way. [This is now done, so links would start to work, but this issue still needs to be fixed in a better way.]
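The stopgap described in the last paragraph amounts to a one-line rule: any title containing a colon is routed through Special:GoToInterwiki. A Python sketch of that rule follows; the function name stream_link is hypothetical and the bot's own C# code may differ in detail.

```python
from urllib.parse import quote


def stream_link(title, wiki="https://ru.wikipedia.org/wiki/$1"):
    """Build a URL for a title found in an edit summary.

    Titles with a colon may be interwiki links, so they are sent through
    Special:GoToInterwiki instead of being treated as local pages.
    """
    target = title.replace(" ", "_")
    if ":" in target:
        target = "Special:GoToInterwiki/" + target
    return wiki.replace("$1", quote(target, safe=":/"))
```

For the failing example above, this produces https://ru.wikipedia.org/wiki/Special:GoToInterwiki/translatewiki:MediaWiki:Coll-create_a_book_tooltip/ru, at the cost of also rerouting ordinary namespaced titles.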

EventStreams: Allow title matching using regular expressions

Some server owners have long requested adding ways to stream a number of defined pages using the bot.

I have thought before that the best way for doing this would be something like glob patterns, but this has multiple problems. For one, you would have to re-implement or take a library that is doing glob matching. There are also questions on whether it would be clashing with actual MediaWiki titles. After researching this question for a bit, I decided that just allowing people to use regular expressions (regexps) is good enough to solve this need.

Here are the theoretical requirements for any potential implementation:

  1. Regexps can be passed only to --title attribute of the configuration.
  2. Regexps should be passed using --title /.*/ syntax (i. e. always wrapped into //), since this would keep the params to the minimum and introduce a simple way to tell what is a regexp and what is not (str.StartsWith('/')). This needs to account for articles like https://en.wikipedia.org/wiki//b/ which are unlikely to have their own stream feeds but probably still need some way to reference them in EventStreams (e. g. :/b/?).
  3. The code should define a reasonable MatchTimeout (0.5 seconds?) and try/catch errors from slow regexps to prevent ReDoS attacks.
  4. Passed regexps should be tested with the timeout and slow regexps should be rejected by the bot on the configuration step (!openStream).
  5. Passed regexps should match the whole string for clarity (^…$) and should not ignore case.
  6. (If we can find a way) Regexps should be as simple as possible in the number of features allowed.
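Requirements 2 and 5 can be sketched as follows. This is a Python illustration, not the bot's C# code, and it deliberately leaves out the timeout from requirements 3 and 4: Python's standard re module has no match timeout, whereas .NET offers Regex.MatchTimeout for that purpose. The leading-colon escape for titles such as /b/ follows the tentative ":/b/" idea above and is hypothetical, as are the function names.

```python
import re


def parse_title_filter(value):
    """Classify a --title value as ('regex', pattern) or ('literal', title)."""
    if value.startswith(":/"):
        # Hypothetical escape: ":/b/" means the literal title "/b/".
        return "literal", value[1:]
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        try:
            return "regex", re.compile(value[1:-1])
        except re.error as error:
            # Reject broken patterns at configuration time.
            raise ValueError(f"invalid regular expression: {error}")
    return "literal", value


def title_matches(title, parsed):
    """Match a page title against a parsed --title value."""
    kind, pattern = parsed
    if kind == "regex":
        # fullmatch acts like an implicit ^...$; no IGNORECASE, so case matters.
        return pattern.fullmatch(title) is not None
    return title == pattern
```

A value of /Foo.*/ then matches "Foobar" but neither "foobar" nor "XFoobar".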

There might be other notable things I forgot, please report them if you read the issue and can think of them.

Issues, community wishes, and acknowledgements

For years, I have documented all the issues and feature requests in my personal bug tracker. This mega-issue is created in an effort to open up this list and to facilitate outside contributions to the library, while not getting myself bogged down in documenting every single bug and feature request.

Please note that the list here is mostly not vetted for validity, and I can change my mind on whether to accept pull requests resolving them. All items starting with Bug: can be safely resolved by outside contributors. Items starting with Controversial: will probably require some discussion and PRs for them are not guaranteed to be accepted fast.

If you have noticed a bug or have a feature request, either create a new issue with a detailed description for it or write a comment here.

  • Bug: Sometimes TranslateWiki.cs code can send the messages twice to the same channel
  • Bug: [[Unsupported titles/:)]] [[:)]] [[)]] [[=)]] are not rendering correct links
  • Bug: [[🌹]] (https://ru.wikipedia.org/wiki/🌹) doesn’t render as a link
  • Bug: https://ru.m.wikipedia.org/wiki/Обсуждение_участника:Matsievsky#Добавление_ссылок_в_{{lang-en}}_и_{{lang-ja}} should not get a link response
  • Bug: [[Grants:Programs/Wikimedia_Community_Fund#General_Support_Fund]] does not get link response because #general is an extremely common Discord channel name
  • Bug: TranslateWiki.cs code can have problems with getting out of limit of 1024 characters sometimes
  • Bug: !help can fail if the command documentation in a language is too long and gets out of limit of 2048 (?) characters
  • Bug: Since the bot can run for weeks if undisturbed, provide a way to update stored wiki site data
  • Bug: API requests for user pages (for gender support) should be done better (bundled in one request and done per user)
  • Bug: Linking.Remove should be re-implemented
  • Bug: Introduce a way to change bot’s User-Agent string from config.json (and make it mandatory to set one)
  • Bug: Add special pages localisation, like in MediaWiki
  • Bug: [[:az:voy:]] etc. should return a link without "Main page" appended to it
  • Bug: !serverDomain should not be a thing if EventStreams are disabled
  • Add namespace=all to EventStreams, but have some way for the bot owner to allow/deny using it on a server
  • Support {{fullurl:}} / {{localurl:}} in Linking.cs
  • Controversial: Add !channelDomain like already done with !channelWiki
  • Interwiki links where title starts with capital letter and has no namespaces can drop API requests for full siteinfo
  • Code cleanup: Add a method for Locale.GetMessage that accepts a language and returns a method to get locale info without specifying (the same) language
  • Add an optional ability to enable bot responses to bot messages (!serverRespondToBots?)
  • Code cleanup: Write unit tests for more bot responses
  • Add !search command to allow correct linking to complex wiki searches like insource:"class" insource:"messagebox" insource:/messagebox/ prefix:all: (some of which can be denied by MediaWiki invalid title rules)
  • Add command !link [[test]] [[second test]] that would return link response with embeds (for up to 3)
  • Add command !linkfile [[File:Example.png]] [[File:MediaWiki.png]] (or do it as part of !link) that would return link response with proper embeds (for up to 3)
  • Add command !linkdiff [[Special:Diff/…]] (or do it as part of !link) that would return diff info like from EventStreams (for up to 3)
  • Add lazy checks on redirects (redirect=no or resolve?) and page existence to the bot
  • Controversial: Should [[google:test meow]] be hard-coded to resolve correctly?
  • MediaWiki invalid title rules should be handled by siteinfo API for the wiki instead of hard-coding it in the bot
  • Controversial: Files from Wikimedia Commons or other common media repo should get direct links to Wikimedia Commons pages
  • [[Wikipedia talk:Twinkle/Development#Using the [rollback] or [vandalism] button in Contributions]] does not get link response due to MediaWiki invalid title rules (including in MediaWiki)
  • Links to Wikidata/Wikibase pages should have labels in current server language next to them
  • Controversial: Drop code for Translatewiki data conversion and release a new major version because of it (only with other breaking changes)
  • Media: file names return their full file URL in MediaWiki
  • Code cleanup: Write Config methods that take DiscordServer/DiscordGuild for simplicity, or refactor current ones
  • Provide desktop links to mobile links posted by users
  • Add a preference for Special:MyLanguage/ links, like in jhsoby/telegram-wikilinksbot

See other open issues as well.

Acknowledgements

These are issues that have been resolved by community members, compiled in a list here to show the appreciation of their work. If you feel like your contribution is missing from here, feel free to remind me of it here or via Discord DMs. A valid contribution is an adopted technical proposal or contributed code, either formally (via PRs) or informally (via Discord chats).

  • Append underscore if title ends with a full stop (#20)
  • Hide links that are located inside spoiler blocks (#18)

DiscordWikiBot is also continuously improved by the translators on Translatewiki.net and (previously) on GitHub.

Allow channel-based configuration

In #2, it was requested to make it easier to link to sister projects. The main driver for this suggestion was that people have multiple wiki communities on the same server and need to write interwiki prefixes for every link (also, they can’t use template links, since those lead to a different place). The proposed solution is to implement channel-based configuration, which would need to be enabled or disabled by commands like !channelWiki etc.

Currently, I see the use for this only for wiki link configuration. Having other options be customised on channel level seems pointless and potentially confusing. If you have use cases for it, feel free to comment them.

Technicality:
When implementing this feature, one must take into account a collision between Discord channel IDs and server IDs (server IDs being the IDs of the first channel created on the server). Otherwise, channel-based configuration can end up leaking into being server-wide.
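One way to avoid that collision is to namespace the stored keys by scope, so a channel override can never be read as a server one even when the numeric IDs are equal. The Python sketch below shows the idea only; how the bot actually stores overrides is not described here, and override_key, wiki_for and the flat dictionary layout are assumptions.

```python
def override_key(scope, discord_id):
    """Prefix an ID with its scope so equal numeric IDs cannot clash."""
    if scope not in ("server", "channel"):
        raise ValueError("scope must be 'server' or 'channel'")
    return f"{scope}:{discord_id}"


def wiki_for(overrides, server_id, channel_id, default):
    """Resolve the wiki URL: channel override, then server override, then default."""
    for key in (override_key("channel", channel_id), override_key("server", server_id)):
        if key in overrides:
            return overrides[key]
    return default
```

With this layout, an override stored for the first channel of a server (which shares the server's ID) applies to that channel alone.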

Link to sister project if there's a namespace for it in link

Hi, a suggestion from French Wikimedia Discord. We think it would be a good thing if the link to a sister project could be automatic if the expression between [[]] starts with a sister project-specific namespace (e.g. [[Wiktionary:something]] or [[WT:something]] for Wiktionary), without having to add the project before (e.g. [[wikt:WT:something]]).

Thanks in advance, and by the way, thank you for this great bot!

Ignore magic words when linking to templates

There are MediaWiki APIs that can return a list of all magic words and parser functions in a wiki. While we already ignore parser functions (since # at the start can’t denote a template), magic words sometimes get linked by people. It is better to ignore these when they are written like {{BASEPAGENAME}}, but still allow them when someone explicitly writes [[Template:BASEPAGENAME]].

CXuesong/WikiClientLibrary doesn’t have an interface for getting this information, so we will need to do separate API requests. CXuesong/WikiClientLibrary#89 added SiteInfo.MagicWords that can be used here in some way.

Example of an API response:
https://ru.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=magicwords|variables&formatversion=2
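A Python sketch of how such a response could be used, offered as an assumption-laden illustration rather than the bot's C# code. It expects the decoded "query" object of a formatversion=2 response in which each magicwords entry has an "aliases" list and a boolean "case-sensitive" flag; verify that shape against a live response. The function names are hypothetical.

```python
def build_magic_lookup(query):
    """Split magic word aliases into case-sensitive and case-insensitive sets."""
    exact, folded = set(), set()
    for word in query.get("magicwords", []):
        for alias in word.get("aliases", []):
            if word.get("case-sensitive", False):
                exact.add(alias)
            else:
                folded.add(alias.lower())
    return exact, folded


def is_magic_word(name, lookup):
    """True if {{name}} should be ignored instead of linked as a template."""
    exact, folded = lookup
    bare = name.strip()
    return bare in exact or bare.lower() in folded
```

An explicit [[Template:BASEPAGENAME]] link would not go through this check at all, which keeps it linkable as requested.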

Filtering by new pages doesn’t work

DiscordWikiBot uses XmlRcs to stream recent changes to Discord servers. With the new filtering options, I wanted to provide an ability to filter by new pages only. Right now, it doesn’t work.

In my testing, huggle-rc.wmflabs.org didn’t respond with any events that matched ChangeType.New, no matter how I’ve written the relevant code. I haven’t done thorough investigation, but for now everything suggests that the problem is upstream. I’ve tried to reach out to Huggle devs that have written that helpful library but haven’t got a response yet.

Relevant code:

Phabricator issue:
https://phabricator.wikimedia.org/T220856

Translatewiki: Move the config to channel overrides

While Translatewiki module currently already sets channel overrides (for translatewiki-key data), it also uses server configs for everything else, which duplicates the data. This is more of a legacy thing than a conscious decision, since it makes no real difference if a server would track multiple Translatewiki languages or not. This change would enable the servers to do it as a side-effect, or even just posting the same language into multiple channels on one server (which would make testing the case where multiple servers track the same language easier).

What I want to do with this is to bring Translatewiki module closer to EventStreams/Linking modules:

  1. move all config into channel overrides (but in config.json in this case, not translatewiki.json like with eventStreams.json);
  2. rename !guildTW command to !trackTranslatewiki or similar (I generally wish to move away from !guildX to !serverX names, and this would not make sense if you could add multiple languages to one server);
  3. add code converting the old syntax to the new syntax, like one that already exists in Translatewiki.cs, so the old data does not get lost;
  4. eventually remove that code after some time.

Edit: One concern though is that it will be harder to figure out which channels require Translatewiki streaming in this setup. So when implementing this one should be careful about not, say, for-looping the channel list. It is easier for Linking to have overrides in comparison because it reacts to events in channels.

The removal of the conversion code for the data will introduce a new major version (v.N.0.0) of the bot.

Use MediaWiki API for validating language codes

MediaWiki has 2 language-related APIs, one of which is new. action=query&meta=siteinfo&siprop=languages can return all language codes that are present in a wiki, and new action=query&meta=languageinfo (MediaWiki 1.34 and above) can return all language codes in a wiki with their preferred fallbacks.

We can use these APIs for validating language codes in configuration for DiscordWikiBot, but there are two requirements:

  1. It should be fetched only once, when starting the application.
  2. There should be some graceful degradation if API is not present, either to older module without fallbacks or to native language code validation.
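The two requirements can be sketched as a small class that loads once and falls back in order. This is a Python illustration with hypothetical names; the fetchers are passed in as plain callables (for example, one wrapping languageinfo and one wrapping siprop=languages) so that no particular HTTP client or response format is assumed.

```python
class LanguageValidator:
    """Validate language codes against an API list fetched at most once."""

    def __init__(self, fetchers, native_check):
        # fetchers: callables tried in order, each returning an iterable
        # of language codes or raising if that API is unavailable.
        self._fetchers = list(fetchers)
        # native_check: last-resort validation when every API fails.
        self._native_check = native_check
        self._codes = None
        self._loaded = False

    def load(self):
        """Fetch the code list; intended to run once at startup."""
        if self._loaded:
            return
        self._loaded = True
        for fetch in self._fetchers:
            try:
                self._codes = frozenset(fetch())
                return
            except Exception:
                # Degrade gracefully to the next, older source.
                continue

    def is_valid(self, code):
        self.load()
        if self._codes is not None:
            return code in self._codes
        return self._native_check(code)
```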

Current native language code validation causes issues for some languages, like Serbo-Croatian.

Information about APIs:
https://phabricator.wikimedia.org/T220415

Improve support for MediaWiki wikis with non-standard URLs

Since support for different servers was added, the bot uses the /wiki/$1 pattern to detect whether something is a wiki or not. Judging by the interwiki map on Meta-Wiki, this is not enough to determine whether something is a MediaWiki wiki: wikis can also use a simple /$1 or something convoluted like /index.php?title=$1 or /index.php/$1 and still be valid.

This poses two problems for the current code:

  1. Easier one: support more wiki URL patterns in linking bot. This can be done by including checks for more URL patterns and fetching APIs of those wikis for their interwiki chains. I should come up with a good way to know (and even remember) wiki URLs somewhere, because it might be silly to ask, say, Google for /api.php a hundred times.
  2. Harder one: update the current code to use /api.php at the end of the string as a way to validate wiki URLs rather than /wiki/$1. That way, the bot will ask the API and get and remember the article path from there. I didn’t hear any requests before asking about this problem, but it will be a good thing to do. All the old values with /wiki/$1 will need to be deprecated and updated in the configs.
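For the second item, the siteinfo response (action=query&meta=siteinfo) already contains what is needed: its general section has server and articlepath values. The Python sketch below combines them and remembers the result per API URL, so a given endpoint is asked only once. It is an illustration, not the bot's C# code; fetch_general is a hypothetical callable that returns the decoded "general" object, and the class and function names are made up.

```python
from urllib.parse import urljoin


def link_pattern(general, api_url):
    """Combine siteinfo 'server' and 'articlepath' into a $1 link pattern."""
    # 'server' may be protocol-relative (//host); urljoin borrows the
    # scheme from the API URL in that case.
    server = urljoin(api_url, general["server"])
    return server + general["articlepath"]


class ArticlePathCache:
    """Remember the link pattern for each API URL after the first lookup."""

    def __init__(self, fetch_general):
        self._fetch_general = fetch_general
        self._patterns = {}

    def pattern_for(self, api_url):
        if api_url not in self._patterns:
            general = self._fetch_general(api_url)
            self._patterns[api_url] = link_pattern(general, api_url)
        return self._patterns[api_url]
```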

Removing support for the deprecated old URLs will introduce a new major version (v.N.0.0) of the bot.
