Wikipedia:Bots/Requests for approval/DeadbeefBot II
Operator: Dbeef (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 02:10, Friday, May 23, 2025 (UTC)
Automatic, Supervised, or Manual: automatic
Programming language(s): Rust
Source code available: https://github.com/fee1-dead/usync
Function overview: Sync userscripts from Git(Hub/Lab) to Wikipedia.
Links to relevant discussions (where appropriate): Wikipedia:Bots/Requests for approval/DeltaQuadBot 9, User:Novem Linguae/Essays/Linking GitHub to MediaWiki
Edit period(s): Continuous
Estimated number of pages affected: All pages that link to User:Dbeef/usync
Exclusion compliant (Yes/No): No
Already has a bot flag (Yes/No): No
Function details: The bot scans and parses the list of user scripts at Special:WhatLinksHere/User:Dbeef/usync. Each script must start with the following header format:
// [[User:Dbeef/usync]]: LINK_TO_REPO REF FILE_PATH
so for example:
// [[User:Dbeef/usync]]: https://github.com/fee1-dead/usync refs/heads/main test.js
The bot will then sync from the Git file to the on-wiki script.
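A minimal sketch (in Rust, the bot's language) of how a header in this format could be parsed. The function name and return shape are illustrative, not taken from the actual usync source:

```rust
/// Parse a usync header of the form:
/// `// [[User:Dbeef/usync]]: REPO_URL REF FILE_PATH`
/// Returns (repo, ref, path) if the first line matches exactly.
fn parse_header(first_line: &str) -> Option<(String, String, String)> {
    let rest = first_line
        .strip_prefix("// [[User:Dbeef/usync]]:")?
        .trim();
    let mut parts = rest.split_whitespace();
    let repo = parts.next()?.to_string();
    let git_ref = parts.next()?.to_string();
    let path = parts.next()?.to_string();
    // Reject trailing garbage so malformed headers are not silently accepted.
    if parts.next().is_some() {
        return None;
    }
    Some((repo, git_ref, path))
}
```

A strict parse like this (three whitespace-separated fields, nothing more) keeps ambiguity out of an authorization-bearing line.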
Any user script author intending to use the bot must (1) insert the header both on-wiki and in the Git file themselves, serving as authorization for the bot to operate, and (2) create an application/json
webhook in their Git repository pointing to https://deadbeefbot-two.toolforge.org/webhook (URL does not work yet; the service has not been deployed) to notify the bot of new commits that have occurred on the file.
The bot will then make edits using the commit message and author information to update the user scripts.
Currently, it only supports js files in the User namespace, but its scope could be trivially expanded to cover more formats (CSS/plain wikitext) depending on usage.
This is an improvement upon the previous DeltaQuadBot task: Auditability is achieved through linking on-wiki edits to GitHub/GitLab URLs that tell you who made what changes. Webhooks are used instead of a periodic sync. Authorization must be given on-wiki to allow syncs to happen.
The code is currently a working demo. I'm planning to expand its functionality to support Wikimedia GitLab webhooks, and to actually deploy it. I will also apply for interface administrator permissions, as this bot requires them, and will request 2FA on the bot account when I get to it.
Discussion
- Just so we are aware of the alternatives here: bd808 suggested on Discord an alternative solution to this problem which does not involve an IntAdmin bot, where script developers create OAuth tokens and submit them to a Toolforge service, and the Toolforge service uses those OAuth tokens to make edits as the script author (0xDeadbeef/GeneralNotability/etc.) instead of having the edits come from a single bot account. There are different trade-offs. I think if we're okay with a bot having IA permissions, then this solution is more convenient to set up, as the OAuth one requires going through the extra steps of creating a token. This bot also makes those edits in a centralized place for when people want to inspect which scripts are maintained this way. beef [talk] 02:34, 23 May 2025 (UTC)
I see a risk here in having a bot blindly copy from GitHub without any human verification. Interface editor rights are restricted for very good reason, as editing the site's JS would be very valuable to a potential attacker. By introducing this bot, we now also have to be concerned about the security of the GitHub repos the bot is copying from, something which is external to Wikipedia. We have no control over who might be granted access to those repos, and what they might do.
In fact, it may actually hinder development of tools/scripts. Currently, as a maintainer, one can be fairly liberal in whom one adds to a GitHub repo, knowing that any changes can be reviewed when manually moving them from GitHub to on-wiki. With this change, anyone you add to the repo realistically should be someone the community would trust with interface admin rights. --Chris 09:49, 23 May 2025 (UTC)
- I think the bot task is more aimed at user scripts than gadgets. You don't need to be an interface admin to edit your own scripts. Being an opt-in system, script maintainers who don't wish to take on the risk can choose not to use the system. As for security, it should be the responsibility of the script author to ensure that they, and others who have been added to the repo, have taken adequate measures (like enabling 2FA) to secure their github/gitlab accounts. – SD0001 (talk) 10:14, 23 May 2025 (UTC)
- For what it's worth, there are already people doing this kind of thing with their own user scripts, such as User:GeneralNotability/spihelper-dev.js. However, it was never done with a bot, because the bot would need to be an interface admin. So they just store BotPasswords/OAuth tokens in GitHub and write a CI job that uses them to edit on-wiki.
- As someone with a fair bit of experience with the open-source process, I don't see why someone who wants to personally review all changes themselves would choose to add people liberally to the GitHub repo and then use this bot if it gets approved. They should either move the development/approval cycle onto GitHub, appropriately using pull requests and protected branches, or just keep doing what they are doing. beef [talk] 10:22, 23 May 2025 (UTC)
- Script maintainers might be happy to take the risk of automatically copying scripts from an external site to become active client-side scripts at Wikipedia, and they might be happy with the increased vulnerability surface area. The question here is whether the Wikipedia community thinks the risk–benefit ratio means the procedure should be adopted. Johnuniq (talk) 10:36, 23 May 2025 (UTC)
- User scripts are "install at your own risk" already, so feel free to avoid installing user scripts that do any automatic syncing. If the community doesn't like a bot that does this for whatever reason, I would also be fine with a "store OAuth tokens that give a Toolforge service access to my account" approach, which requires no community approval and no bots to run, just slightly less convenient to set up.
- All I am saying is that the
increased vulnerability surface area
remains to be proven. WP:ULTRAVIOLET and WP:REDWARN have been doing this for years. Whether approval for code occurs on-wiki or off-wiki shouldn't matter. beef [talk] 11:00, 23 May 2025 (UTC)
- The bot as proposed crosses a pretty major security boundary by taking arbitrary untrusted user input into something that can theoretically change common.js for all users on Wikipedia.
- Has anyone looked at the security of the bot itself? Chess (talk) (please mention me on reply) 01:44, 9 June 2025 (UTC)
- @Chess:
theoretically change common.js for all users on Wikipedia
- no, only common.js pages that link to or transclude the specified page would be in scope for the bot. dbeef [talk] 01:47, 9 June 2025 (UTC)
- @Dbeef: I understand what's in scope, but is the authorization token actually that granular? If there's a vulnerability in the bot, I could exploit that to edit anything. Chess (talk) (please mention me on reply) 02:04, 9 June 2025 (UTC)
- @Chess: I'm not sure what you mean.
- I had thought about the security implications long before this BRFA:
- The only public-facing API of the bot is a webhook endpoint. While anyone can send in data that looks plausible, the bot will only update based on source code returned from api.github.com. So malicious actors would have to be able to modify the contents of api.github.com to mount an attack.
- The credentials are stored on Toolforge, as is standard for a majority of Wikipedia bots. Root access is only given to highly trusted users, and I don't think it will be abused to obtain the bot's OAuth credentials. If you think otherwise, I can move the bot deployment to my personal server provided by Oracle.
- The public-facing part uses Actix Web, a popular and well-tested web framework. Toolforge provides the reverse proxy. I don't think there's anything exploitable to get RCE.
- The bot always checks the original page for the template with the configured parameters before editing. If the sync template is removed by the original owner or any interface administrator, the bot will not edit the page.
- dbeef [talk] 04:51, 9 June 2025 (UTC)
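The last safety point above (re-checking the on-wiki page before editing) could look roughly like this in Rust; this is a hedged sketch against the header format described in the function details, not the bot's actual code:

```rust
/// Before saving, re-fetch the on-wiki page and confirm the usync header is
/// still present and still points at the same repo/ref/path. If the owner or
/// an interface administrator removed it, authorization is withdrawn and the
/// bot must not edit. Function and parameter names are illustrative.
fn still_authorized(current_wikitext: &str, repo: &str, git_ref: &str, path: &str) -> bool {
    let expected = format!("// [[User:Dbeef/usync]]: {} {} {}", repo, git_ref, path);
    // The header must be the first line of the current on-wiki revision.
    current_wikitext.lines().next() == Some(expected.as_str())
}
```

Checking the live revision at edit time, rather than a cached copy, is what makes header removal an effective kill switch.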
- @Dbeef: To answer Chess about BotPasswords, there is just one checkbox for "Edit sitewide and user CSS/JS" that encompasses both. ~ Amory (u • t • c) 01:06, 10 June 2025 (UTC)
While anyone can send in data that looks plausible, the bot will only update based on source code returned from api.github.com. So malicious actors have to be able to modify the contents of api.github.com to attack that.
How does the bot verify that the contents_url field in a request made to the webhook is hosted on api.github.com in the same repository as the .js file it is syncing to?
- I'd be reassured by OAuth, mainly because it avoids taking untrusted user input into a bot with the permissions to edit MediaWiki:Common.js on one of the top ten most visited websites on Earth. Chess (talk) (please mention me on reply) 01:58, 10 June 2025 (UTC)
How does the bot verify that the contents_url field in a request made to the webhook is hosted on api.github.com in the same repository as the .js file it is syncing to?
That's a really good point. I need to fix that. dbeef [talk] 02:10, 10 June 2025 (UTC)
- @Dbeef: I'm uncomfortable with interface admin being granted to a bot that hasn't had anyone else do a serious code review.
- Not verifying contents_url would've allowed me to modify any of the scripts managed by dbeef onwiki, to give an example.
- OAuth limits the impact of any flaws to just making edits under certain user accounts. Chess (talk) (please mention me on reply) 14:34, 13 June 2025 (UTC)
- @Chess: That is a valid concern and an oversight. It was originally not there when I queried raw.githubusercontent.com, but I noticed that it updated slowly. I then switched to api.github.com but hadn't realized contents_url was user input.
- That was quickly fixed two days ago.
- I won't be of much help reviewing my own code, but maybe other people can take a look as well? Maybe we can ping some Rust developers. dbeef [talk] 15:17, 13 June 2025 (UTC)
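For readers following the contents_url discussion, the fix being described amounts to not trusting the URL from the webhook payload at all, and instead checking it against the repository configured in the on-wiki header. A hedged sketch (the `owner/name` slug parameter is illustrative, not the bot's actual interface):

```rust
/// Only fetch content from api.github.com, and only from the repository the
/// script's on-wiki header authorized. Anything else in the webhook payload
/// is attacker-controllable and must be rejected.
fn contents_url_is_trusted(contents_url: &str, expected_repo: &str) -> bool {
    // expected_repo is the "owner/name" slug from the configured repo URL.
    let expected_prefix = format!("https://api.github.com/repos/{}/contents/", expected_repo);
    contents_url.starts_with(&expected_prefix)
}
```

An even stricter variant would ignore contents_url entirely and construct the API URL from the configured repo, ref, and path.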
- I'm a C++ developer unfortunately. I know nothing about Rust and can't even compile the bot right now. Chess (talk) (please mention me on reply) 04:37, 14 June 2025 (UTC)
Has there been a discussion establishing community consensus for this task, per WP:ADMINBOT? I don't see one linked here, nor one from Wikipedia:Bots/Requests for approval/DeltaQuadBot 9. The community might also decide whether the OAuth route is preferable to the interface-admin route. Anomie⚔ 11:13, 23 May 2025 (UTC)
- Good idea, I'll post a summary to WP:VPT soon. beef [talk] 11:16, 23 May 2025 (UTC)
- See Wikipedia:Village pump (technical)#Syncing user scripts from an external Git repository to Wikipedia beef [talk] 12:18, 23 May 2025 (UTC)
- {{BotOnHold}} This is just until the discussion concludes (feel free to comment out when it has). Primefac (talk) 23:48, 25 May 2025 (UTC)
- The discussion was archived at Wikipedia:Village pump (technical)/Archive 220#Syncing user scripts from an external Git repository to Wikipedia with a rough consensus to implement the bot. dbeef [talk] 03:31, 8 June 2025 (UTC)
Approved for trial (30 edits or 30 days, whichever happens first). Please provide a link to the relevant contributions and/or diffs when the trial is complete. I will be cross-posting this to both WP:AN and WP:BN for more eyes. Primefac (talk) 13:23, 8 June 2025 (UTC)
- I will be deploying the bot in a few days and do some deliberate test edits to get this started. If any user script authors are willing to try this for trial please let me know :) dbeef [talk] 13:37, 8 June 2025 (UTC)
- The linked discussion seemed to settle pretty quickly on using OAuth rather than interface editor permissions. Is that still the plan? Anomie⚔ 03:07, 9 June 2025 (UTC)
- That's not how I read it. It was explored as an alternative but to me it looks like more editors expressed support for the interface editor bot. dbeef [talk] 03:37, 9 June 2025 (UTC)
- On reviewing again, it looks like I misremembered and misread. The subdiscussion that concluded in OAuth was about the possible alternative to interface editor. OTOH I'm not seeing much support for the conclusion that interface editor was preferred over (normal) OAuth either; the few supporting statements may have been considering only interface editor versus password sharing. Anomie⚔ 11:31, 9 June 2025 (UTC)
- It isn't necessarily an either/or thing. Both solutions can co-exist. If some people prefer the OAuth-based approach, they can of course implement that – it doesn't even need a BRFA. What's relevant is whether the discussion had a consensus against the interface editor approach – I don't think it does. – SD0001 (talk) 11:39, 9 June 2025 (UTC)
What's relevant is whether the discussion had a consensus against the interface editor approach – I don't think it does.
As I said, I misremembered and misread. OTOH, dbeef claimed
but to me it looks like more editors expressed support for the interface editor bot
which I don't unambiguously see in the discussion either.
If some people prefer the OAuth-based approach, they can of course implement that – it doesn't even need a BRFA.
I don't see any exception in WP:BOTPOL for fully automated bots using OAuth from the requirement for a BRFA. WP:BOTEXEMPT applies to the owner's userspace, not anyone who authorizes the bot via OAuth. WP:ASSISTED requires human interaction for each edit. WP:BOTMULTIOP does not contain any exemption from a BRFA. Anomie⚔ 12:00, 9 June 2025 (UTC)
- That's a fair observation. I do see support for an interface admin bot, and I believe there are no substantial concerns that would be a blocker. I continue to think of an interface admin bot as the easier solution, but I am not opposed to figuring out the OAuth piece at a later time. It is just that I don't have truckloads of time to focus on something that seems on its surface a bit redundant. dbeef [talk] 12:46, 9 June 2025 (UTC)
- With OAuth, the edits would be from the users' own accounts. No bot account is involved as edits are WP:SEMIAUTOMATED with each push/merge to the external repo being the required human interaction. – SD0001 (talk) 13:43, 9 June 2025 (UTC)
- I look at WP:SEMIAUTOMATED as having the user approve the actual edit, not just do something external to Wikipedia that results in an edit that they've not looked at. But this discussion is getting offtopic for this BRFA; if you think this is worth pursuing, WP:BON or WT:BOTPOL would probably be better places. Anomie⚔ 12:01, 10 June 2025 (UTC)
- Instead of requiring the page to be linked and followed by some text in an arbitrary sequence, I'd suggest using a transclusion for clarity, like:
{{Wikipedia:AutoScriptSync|repo=<>|branch=<>|path=<>}}
(perhaps it would also be better to put the page in project space). – SD0001 (talk) 15:52, 8 June 2025 (UTC)
- That's a little harder to parse, but I suppose not too hard to implement, if Parsoid can do it (hand-parsing is an option too). I'll take a look in the next few days. dbeef [talk] 16:02, 8 June 2025 (UTC)
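For a template with fixed, known parameters, the hand-parsing option mentioned above is straightforward. A sketch using SD0001's proposed template and parameter names (which are not final):

```rust
/// Hand-parse a transclusion of the form
/// `{{Wikipedia:AutoScriptSync|repo=...|branch=...|path=...}}`.
/// Template and parameter names follow the example above; they may change.
fn parse_sync_template(line: &str) -> Option<(String, String, String)> {
    let inner = line
        .trim()
        .strip_prefix("{{Wikipedia:AutoScriptSync|")?
        .strip_suffix("}}")?;
    let (mut repo, mut branch, mut path) = (None, None, None);
    for field in inner.split('|') {
        let (key, value) = field.split_once('=')?;
        match key.trim() {
            "repo" => repo = Some(value.trim().to_string()),
            "branch" => branch = Some(value.trim().to_string()),
            "path" => path = Some(value.trim().to_string()),
            // Unknown parameter: refuse to guess rather than ignore.
            _ => return None,
        }
    }
    Some((repo?, branch?, path?))
}
```

Parsoid would be more robust against wikitext quirks (nested templates, comments), but a strict string parse fails closed, which is arguably the right behavior for an authorization marker.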
After reading comments here, I'm unsure. (1) Why do we need a bot for this? Is there a need to perform this task repeatedly over a significant period of time? (Probably this is answered in the VPT discussion linked above, but it's more technical than I can understand.) (2) Imagine that a normal bot copies content from GitHub to a normal userspace page, and then a human moves it to the appropriate page, e.g. first the bot puts a script at User:Nyttend/pagefordumping, and then I move it to User:Nyttend/script. This should avoid the security issue, since there's no need for the bot to have any rights beyond autoconfirmed. Would this work, or is this bot's point to avoid the work involved in all those pagemoves? (3) On the other hand, before interface admin rights were created, when normal admins could handle this kind of thing, do we know of any adminbots that worked with scripts of any sort, and if so, how did the security process work out? Nyttend (talk) 10:17, 9 June 2025 (UTC)
- (1) Yes, because platforms like GitHub provide a better experience for developing user scripts, instead of having people copy from their local code editor and paste to Wikipedia each time. This includes CI and support for transpiled languages such as TypeScript. (2)
is this bot's point to avoid the work involved in all those pagemoves
- Yeah. (3) I don't think there was any bot that did this. dbeef [talk] 10:34, 9 June 2025 (UTC)
- How are you handling licensing? When you, via your bot, publish a revision here you are doing so under CCBYSA4 and GFDL. What are you doing to ensure that the source content you are publishing is available under those licenses? — xaosflux Talk 10:07, 13 June 2025 (UTC)
- I think I could put a section on WP:USync that says "by inserting the header you assert that any code you submit through the Git repository is licensed under CCBYSA4/GFDL or another compatible license", but that's the best I can do.
- Would you want me to parse SPDX licenses or something? I think the responsibility is largely on the people who use the bot and not the bot itself when it comes to introducing potential copyvios. dbeef [talk] 15:09, 13 June 2025 (UTC)
- Is a compatible license even common on that upstream? You can't delegate authority, whoever publishes a revision is the one issuing the license on the derivative work. — xaosflux Talk 18:31, 13 June 2025 (UTC)
- It appears this may end up whitewashing licenses. Anyone who reads any page from our project should be able to confidently trust the CC BY-SA license we present, including required elements such as the list of authors. — xaosflux Talk 00:56, 14 June 2025 (UTC)
- SPDX is an industry standard and is meant for automatically verifying the licence of a source file. Would that be inappropriate here? Chess (talk) (please mention me on reply) 04:36, 14 June 2025 (UTC)
- I was just wondering how exactly we should be doing it.
- For example, we can require that one must use something like
{{Wikipedia:USync|authors=Foo|license=MIT}}
, with license being a manually approved list. dbeef [talk] 04:39, 14 June 2025 (UTC)
- Including it in a template in the userscript makes sense, since then the list of authors' preferred attribution can be maintained on the repo instead of onwiki, while still being replicated onwiki.
- The "license" field should probably be SPDX if that makes it easier to parse.
- Specifically, the "licence" field should contain
CC-BY-SA-4.0 OR GFDL-1.3-or-later
since that matches the requirements for contributing to Wikipedia, which is that all content must be available under both licences. I don't think allowing MIT-only (or other arbitrary permissive licences) makes sense right now under the assumption it's compatible with CC BY-SA/GFDL. We might have to maintain the MIT licence text, and the only people using this bot would be those writing userscripts specifically for Wikipedia. Multiply that by the many variants of licences that exist.
- I think it's a good idea to keep the amount of parsing in the bot as small as possible given its permissions and impact. Chess (talk) (please mention me on reply) 02:30, 15 June 2025 (UTC)
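Keeping the parsing minimal, as suggested above, could mean rejecting everything except an exact allowlisted SPDX expression rather than implementing a full SPDX parser. A hedged sketch of that idea:

```rust
/// Accept only exact, allowlisted SPDX expressions for the licence field.
/// Per the discussion, the allowlist currently holds the single dual-licence
/// expression matching Wikipedia's contribution terms; extending it would be
/// a deliberate, manually reviewed change. Illustrative, not the bot's code.
fn licence_is_acceptable(field: &str) -> bool {
    const ALLOWED: &[&str] = &["CC-BY-SA-4.0 OR GFDL-1.3-or-later"];
    ALLOWED.contains(&field.trim())
}
```

Exact-match comparison sidesteps the operator-precedence and case-normalization rules of full SPDX expression parsing, at the cost of flexibility the bot does not currently need.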
- If it matters, I can vouch that CI/CD is a basic requirement now for much of software development, so I'm generally supportive of the intent of this proposal. It's better because it creates a single source of truth for what is currently deployed to the Wiki. Chess (talk) (please mention me on reply) 16:35, 13 June 2025 (UTC)