This is an archive of past discussions on Wikipedia:Database reports. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
For some reason, tonight's Empty Categories list has maintenance categories listed on it. They are typically omitted as they would overwhelm the content categories and because they don't stay empty for long and they do not get tagged for speedy deletion, CSD C1. I'm not sure if this talk page is monitored so I'll just ping Jonesey95 and see if they know what has happened. LizRead!Talk!01:11, 15 September 2022 (UTC)
They've gone and changed the schema on us. See quarry:query/67346; what's now in lt_namespace and lt_title in the new linktargets table used to be in tl_namespace and tl_title (which now seem to always be 0 and ''?), and the database report still assumes they're there. —Cryptic01:34, 15 September 2022 (UTC)
Liz, you can ping me any time. I noticed that Wikipedia:Database reports/Transclusions of non-existent templates had been blanked by the bot this morning and figured that something screwy was happening with a database or one of the servers, so I restored the previous report and figured I'd give things a day to sort themselves out. The above wikitech-l posting is gibberish to me, but maybe Fastily will know if and what things need to change in that report. If this change affects a bunch of reports, I expect that we'll see a thread on VPT in the next day or two. – Jonesey95 (talk) 02:57, 15 September 2022 (UTC)
This is a pretty succinct statement of how to update queries that read from templatelinks. For the non-existent template report, for example, you'd need to change this to this. —Cryptic03:57, 15 September 2022 (UTC)
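For readers following along, the shape of that change can be sketched against a toy SQLite database in Python. The column names `tl_target_id` and `lt_id` follow the MediaWiki schema as I understand it; the sample rows and the report query itself are assumptions made up for illustration, not the bot's actual code.

```python
import sqlite3

# Toy stand-ins for the replica tables; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE linktarget (lt_id INTEGER PRIMARY KEY, lt_namespace INTEGER, lt_title TEXT);
CREATE TABLE templatelinks (tl_from INTEGER, tl_target_id INTEGER);
INSERT INTO page VALUES (1, 0, 'Example_article'), (2, 10, 'Infobox');
INSERT INTO linktarget VALUES (100, 10, 'Infobox'), (101, 10, 'Missing_template');
INSERT INTO templatelinks VALUES (1, 100), (1, 101);
""")

# The old style read tl_namespace/tl_title straight from templatelinks.
# The new style joins through linktarget to get the namespace and title.
QUERY = """
SELECT lt_title, COUNT(*) AS transclusions
FROM templatelinks
JOIN linktarget ON tl_target_id = lt_id
LEFT JOIN page ON page_namespace = lt_namespace AND page_title = lt_title
WHERE lt_namespace = 10 AND page_id IS NULL
GROUP BY lt_title
ORDER BY transclusions DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only the join through `linktarget` is the point here; everything else in the query is a simplified guess at what a "transclusions of non-existent templates" report needs.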
This is all like reading a Greek textbook to me but I have enormous confidence in your abilities to get to the bottom of this, Cryptic and Jonesey95. Thank you for looking into this. There are only one or two of us that utilize this database report but it's one I check daily and helps us keep on top of the category clutter that comes out of deleting articles at AFD and categories at CFD. It also helps us notice if a new editor (they are almost always new editors) goes on a tear, creating dozens of unused categories. And lately a very experienced editor has been working on a major job recategorizing pages that left hundreds of empty categories to tag and delete.
Now that I think about it, when there are problems with this list, I usually go directly to the bot operator, MZMcBride so I will ping him to this discussion in case he can follow all of this. I appreciate your help! LizRead!Talk!07:23, 15 September 2022 (UTC)
That was my error, not yours - I was fooled by there still being enough rows in templatelinks with tl_title not the empty string that the results looked right, when I hadn't found even a single non-empty instance before that. —Cryptic17:40, 19 September 2022 (UTC)
Cryptic and Jonesey95, it happened again on tonight's Wikipedia:Database reports/Empty categories. Looks like they are maintenance categories involving files and Proposed deletions. There are plenty of empty clean-up categories that aren't appearing on this list, it's the daily, not monthly maintenance. If you tell me that this situation will be lasting a while, then I'll stop pinging you every time it happens. Just thought I'd let you know. LizRead!Talk!01:13, 16 September 2022 (UTC)
Sorry, I fixed another tool of mine (ours even), forgot about these. I'm traveling tomorrow, so it might not be until Saturday that I have time to fix the reports. Legoktm (talk) 05:51, 16 September 2022 (UTC)
I think I fixed most of them, hopefully the next runs of the reports are better. If there's a monthly report that's off let me know and I can kick it manually. Legoktm (talk) 23:19, 17 September 2022 (UTC)
Ughhh, I have no clue why and I'm mostly offline tomorrow, if it's still wrong after tomorrow's update I'll start poking at it again... Legoktm (talk) 08:38, 22 September 2022 (UTC)
Is there any particular reason why Wikipedia:Database reports/Polluted categories only runs once a month? Given the importance of cleaning polluted categories out, and the fact that running it only once a month means that there are typically hundreds of categories to deal with by the time it actually updates (thus making it an onerous task that people become significantly less likely to bother with at all), once a month isn't often enough. Bearcat (talk) 15:41, 22 November 2022 (UTC)
Weekly would be best, if possible, but every two weeks would also be okay if there's a reason why weekly isn't feasible. Bearcat (talk) 16:03, 22 November 2022 (UTC)
{{Database report}} template can now be used to set up one-off or periodically updating reports in userspace or project namespace, given an SQL query. The template doc lists the supported formatting options. Feel free to give it a try and let me know if you face any issues. – SD0001 (talk) 15:43, 28 October 2022 (UTC)
Hi SD0001. I played around with the wiki template-based approach the other day on bizarrely subnested userpages (configuration) and I quite enjoyed it. The wikilinks formatting options of {{database report}} are neat and it's cool that it supports on-demand updates by clicking a link. Nice job.
If you haven't seen the news yet, BernsteinBot has been disabled. HaleBot will take over most of the tasks that it used to do. There are a lot of scattered reports in various places, if you notice something isn't updating, please leave a note here and ping me.
Legoktm, thanks to you and the bot approval team for your swift action. And thanks to all of our bot operators, like MZMcBride, of past and present. The tools you create make our editing lives so much easier. LizRead!Talk!02:00, 13 October 2022 (UTC)
I tried to get Unused templates working last night but messed up with the subst:#time calls, will fix that tonight. I found the code for the (filtered) report, I'll set that up tonight too. Uncategorized templates should be set to go on the regular schedule. Legoktm (talk) 15:43, 14 October 2022 (UTC)
I saw the update and figured you were working on it. Did you notice that there were undesirable underscores, and that links with parens in them were not quite right, e.g. 1910s_in_music_ (in code here to make sure that the underscores show)? Maybe that's all tied up in the subst work. – Jonesey95 (talk) 17:00, 14 October 2022 (UTC)
I did not notice that, it was me being lazy by using the pipe trick. Should be fixed now, though the last page of the report is missing because of an edit filter I just fixed. The (filtered) report is running daily now too. Legoktm (talk) 02:48, 15 October 2022 (UTC)
Nice work. It's good to have the reports running again. Please see this discussion for suggestions about how the filtered report could benefit from a few more filters. It should be able to fit on one page pretty easily. – Jonesey95 (talk) 05:32, 15 October 2022 (UTC)
After cleaning up this report somewhat, I have noticed that redirects and soft redirects are included, but those pages are valid when an editor's username has been changed. Ideally, those pages would be excluded from the report. Soft redirects are in Category:User soft redirects. Regular redirects are often, but maybe not always, in Category:Redirects from moves. – Jonesey95 (talk) 19:00, 8 December 2022 (UTC)
@Jonesey95: the report is running daily now, I can adjust the frequency to monthly if you want. Redirects should be skipped, I'll get around to soft redirects shortly. Legoktm (talk) 07:22, 12 December 2022 (UTC)
Yes but there is clearly an issue since it's coming out blank which is definitely not correct, there are certain articles that have a userspace link for legitimate reasons (Jimmy Wales for example), if the report was running correctly these would be listed--Jac16888Talk15:53, 16 December 2022 (UTC)
Would it be possible to have a database report listing WP:featured articles by word count or readable prose size (not wikitext size) or is there a better way to produce such a list? If it's to be a database report, it wouldn't need to be run more than once a month. Thanks! HJ Mitchell | Penny for your thoughts?18:33, 11 February 2023 (UTC)
Hi @HJ Mitchell, Is there a definition for how "word count" or "readable prose size" should be measured? Specifically, is there a list of what elements should be excluded or not? Legoktm (talk) 00:21, 13 February 2023 (UTC)
@Legoktm and MZMcBride:Wikipedia:Prosesize is able to produce prose size in both characters (bytes) and words; could that definition be used? I'm aware of FA/By length but as you say MZM, it only measures the total page size, which doesn't necessarily bear any resemblance to the amount of prose. Articles that cite lots of sources (especially web sources), for example, use more markup and hence have more wikitext than those that cite fewer sources more heavily (eg books). Hence Taylor Swift (10111 words) tops that list and Douglas MacArthur (18679 words) is at #34. Thanks, HJ Mitchell | Penny for your thoughts?14:13, 13 February 2023 (UTC)
I guess we could...could you just link to the report from the category instead? It would be simpler than having to maintain a special case just for this report. Legoktm (talk) 23:35, 24 February 2023 (UTC)
I suppose we could make the bot only update part of the page? We could use some kind of a marker to determine which part needs to be overwritten and which parts don't. 0xDeadbeef→∞ (talk to me) 04:53, 25 February 2023 (UTC)
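A minimal sketch of that marker idea in Python, for illustration only. The marker strings and the function name are hypothetical; nothing here reflects how HaleBot is actually written.

```python
import re

# Hypothetical markers; a real bot would need to agree on these with editors.
START = "<!-- report start -->"
END = "<!-- report end -->"


def replace_between_markers(page_text: str, new_body: str) -> str:
    """Overwrite only the text between START and END, keeping the rest."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(page_text):
        # Refuse to guess: without markers the whole page is left alone.
        raise ValueError("markers not found; refusing to overwrite the page")
    replacement = f"{START}\n{new_body}\n{END}"
    # A function replacement avoids backslash interpretation in new_body.
    return pattern.sub(lambda _match: replacement, page_text, count=1)


page = (
    "Intro kept by humans.\n"
    "<!-- report start -->\nold rows\n<!-- report end -->\n"
    "Footer kept too."
)
updated = replace_between_markers(page, "new rows")
print(updated)
```

Raising an error when the markers are missing is a design choice: it seems safer for a bot to skip an update than to overwrite hand-written text.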
HaleBot is sleeping
HaleBot's been quiet for the past couple days. Hasn't made any edits. Does it have a new, more relaxed and human-like editing schedule? Just runs its reports whenever it's in the mood for that? wbm1058 (talk) 11:57, 28 April 2023 (UTC)
Please add Category:Templates for deletion to the list of categories that are excluded. Some templates (such as Template:S-line/RB-SN right/31) still appear on the list after being sent to TfD which makes it harder to see what new templates were added to the report.
Hi everyone. I think that the list of most watched pages should be updated, since the last time it was run was 6 years ago. Can this be done without too much difficulty? Thanks! Trawle (talk) 13:32, 7 June 2023 (UTC)
Hi! On May 14th, 2022, AnomieBOT began maintaining CAT:COIREQ. Is it possible to obtain the edit history for this bot's predecessor? Specifically I'm looking for the equivalent to this page, so that I may view the Xtools page which includes historical graphs of the edit request levels going back 5 years which AnomieBOT, of course, doesn't have. This data would be much appreciated. Thank you so much for any help! Regards, Spintendo06:22, 5 July 2023 (UTC)
I’m curious if the configuration of the Untagged stubs report can be changed to remove articles whose titles start with “Lists_of”. My argument here is that a decent number of these currently appear on this report (polluting it a bit), and these pages cannot be reasonably expanded without further lists being created. Since set indices and disambiguation pages are already excluded, I’d see the argument for excluding these similar “list of lists” pages. I may have already asked for this, so sorry if this is a repeat request! Michaelwallace22 (talk) 21:40, 12 July 2023 (UTC)
@Legoktm@0xDeadbeef, it's not immediately clear to readers why some bots are linked and others not at Wikipedia:List of bots by number of edits. It looks like it's based on activity, but I think it would be an improvement to link all of them and perhaps just add a "Currently active?" column if we want to communicate that info. Cheers, {{u|Sdkb}}talk23:26, 10 August 2023 (UTC)
Ah, I see. In that case, I think both lists could use a change — lots of people might be curious why e.g. a top-100 editor has stopped editing and want to check out their userpage, and linking in black makes that harder. (I did add a note of what the black-linking means to the bot list, but I hope that's just an interim step.) {{u|Sdkb}}talk15:24, 12 August 2023 (UTC)
I've modified the way Template:Attached KML is registered a while back and since then most of the sub templates have been added to articles. Can this exclusion be removed from the report so it can now be tracked? Gonnym (talk) 18:12, 31 July 2023 (UTC)
Not sure why Halebot is adding a colon to the front end of the Lalbijo2020 file's link on this page, but doing so is a syntax error that has effectively been culled from Wikipedia, and it would be nice for various gnomes to not need to fix it weekly. If the bot or that entry could be modified to keep this from occurring each update, that would be great. Thanks. Zinnober9 (talk) 18:26, 18 August 2023 (UTC)
Dicklyon recently removed the R avoided double redirect rcat from Novint falcon as it was listed in the linked miscapitalizations report. I reverted that edit, as it seems like this is the situation where {{R avoided double redirect}} is supposed to be used; if the correctly-capitalized Novint Falcon redirect were expanded into a full article, the other one would need to be changed to a link to that new article.
Special:WhatLinksHere/Novint falcon shows Novint falcon as transcluding itself, which I assume comes from {{R avoided double redirect}} (Module:R avoided double redirect verifies that the current article's redirect destination matches the specified article's redirect destination, which I guess must be listed as transcluding). (The edit also removed the parameter to {{R from miscapitalisation}} but that one was a link to the correctly capitalized form so I don't think it was the cause.)
I think that all that would be needed to fix this is to change the query on both linked miscapitalizations and linked misspellings to include a p1 != p2 check; if a redirect is linking or transcluding itself it's probably fine. However, I'm not sure how to actually make this change or if there's another aspect to this I'm not aware of, or even who's responsible for maintaining the code that updates these reports. --Pokechu22 (talk) 21:50, 9 September 2023 (UTC)
BlackcurrantTea, hmm. I assume those must come from other templates too then. I notice that Wikipedia:Example of a redirect has a self-transclusion, but 48 hours and 48 hours to life don't. Another interesting set of examples is 2. Divisjon and Talk:2. Divisjon; only the talkspace one has a self-transclusion (but the mainspace one is also transcluded by the talkspace one). So I guess that means that this is a more general issue.
@0xDeadbeef: Thanks! I think the same change also needs to be made on the linked misspellings report too since it has similar logic, though I'm not 100% sure of this. --Pokechu22 (talk) 18:24, 11 September 2023 (UTC)
Another case of unneeded listing comes about via redirect tags such as {{Redirect|Cityrail|the former New Zealand rail operator|Tranz Metro}} at CityRail. Can that be fixed to take Cityrail out of the report? Dicklyon (talk) 17:38, 12 September 2023 (UTC)
That would probably need us to remove "transclusion" type links with SQL queries, which needs some investigation on what needs to be done. I'm quite busy right now, so feel free to put up a pull request if you are able to implement this. 0xDeadbeef→∞ (talk to me) 11:53, 13 September 2023 (UTC)
At Wikipedia:Database reports/Linked miscapitalizations, the inclusion of the "number" column makes it very hard to see what the diffs are, since the diff tends to align numbers rather than names. It would be equivalent, I think, to be able to sort on article name, rather than the number. Is there any reason to not just do away with the number column? Dicklyon (talk) 16:39, 21 August 2023 (UTC)
@Gonnym: Yeah, MZ said the same a while back. If someone wants to send a PR enabling it for this report that would be appreciated. (Or, if you're feeling courageous, flipping the default.) Otherwise I'll get to it...later. Legoktm (talk) 19:25, 22 August 2023 (UTC)
@HaleBot: Maybe just do away with the number column? And add a summary at the top with number of articles and total number of links, which would be good for progress tracking? Dicklyon (talk) 05:56, 4 October 2023 (UTC)
You know what, I just deployed my change to the codebase that would make all eligible reports use static row numbers. I was holding off waiting for Legoktm to review it first, but since he's on a wikibreak.. I haven't thoroughly tested it, so this might break some reports, let me know and I'll fix. 0xDeadbeef→∞ (talk to me) 12:27, 4 October 2023 (UTC)
If you look at the list Wikipedia:Database reports/Linked miscapitalizations sorted by number of links, you typically see a whole lot with just one link, and then a bunch with 10 or more. That's because I'm focusing on the ones with 2 to 9 links. The ones with just 1 link accumulate as an indication of what's happening recently. The ones with a lot of links need someone with AWB or JWB to handle efficiently. For the ones with a few links, editing the linking articles in tabs is efficient enough. Dicklyon (talk) 03:52, 7 October 2023 (UTC)
Top new article reviewers report code needs to be updated
There was a recent change to PageTriage, where the logging of reviews is split based on whether the target is an article or a redirect. This is causing the Wikipedia:Database reports/Top new article reviewers report to give wrong results. Please change any queries in the code to replace instances of log_action = 'reviewed' with log_action in ('reviewed', 'reviewed-article', 'reviewed-redirect') This should fix the problem. -MPGuy2824 (talk) 03:30, 9 November 2023 (UTC)
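The effect of that replacement can be sketched with a toy `logging` table in SQLite via Python. The three `log_action` values come from the request above; the `log_type` value, the sample rows, and the grouping by `log_actor` are assumptions for illustration rather than the report's real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logging (
    log_id INTEGER PRIMARY KEY,
    log_type TEXT,
    log_action TEXT,
    log_actor INTEGER
);
INSERT INTO logging (log_type, log_action, log_actor) VALUES
    ('pagetriage-curation', 'reviewed', 1),          -- old-style entry
    ('pagetriage-curation', 'reviewed-article', 1),  -- new-style entries
    ('pagetriage-curation', 'reviewed-redirect', 1),
    ('pagetriage-curation', 'reviewed-article', 2),
    ('pagetriage-curation', 'unreviewed', 2);
""")

OLD = """
SELECT log_actor, COUNT(*) FROM logging
WHERE log_action = 'reviewed'
GROUP BY log_actor
"""
NEW = """
SELECT log_actor, COUNT(*) FROM logging
WHERE log_action IN ('reviewed', 'reviewed-article', 'reviewed-redirect')
GROUP BY log_actor
"""

old_counts = dict(conn.execute(OLD).fetchall())
new_counts = dict(conn.execute(NEW).fetchall())
print(old_counts, new_counts)
```

In this toy data the old filter misses every review logged after the split, which is the kind of undercount described above.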
@MPGuy2824 @Novem Linguae Thanks for the ping! Partially fixed with 80b6552, but I think the counting of redirects is still sort of broken. My understanding (please correct me if I'm wrong): For historical data, we need to still go by page.page_is_redirect, but for where data is available, we should sum where log_action = 'reviewed-redirect'. Is that correct? — MusikAnimaltalk15:49, 10 November 2023 (UTC)
Yes, that is correct. Since this report calculates data over the previous 365 days, we can remove the code that takes care of historical data only after that time. I've set a reminder for myself via W-Ping. Thanks for the quick fix, btw. -MPGuy2824 (talk) 03:11, 11 November 2023 (UTC)
/Blocked users in user group
Good evening fellow Wikipedians, so the database report above is no longer updated since October 6 of last year. The bot who was updating it, BernsteinBot (talk·contribs), hasn't edited since October 12, 2022. Should we archive the report or get another bot to take over the updating? Toadette(Happy Thanksgiving!)18:34, 22 November 2023 (UTC)
I suppose I can finally get around to looking into how {{database report}} works (no interest in running a bot ever again after the way mine was treated). The query in the configuration is severely out of date - besides the schema changes, it doesn't cull the extendedconfirmed group, currently at 5867 blocks, and I'm sure it was close to that when BernsteinBot was still running - but that's easy enough to fix. —Cryptic19:07, 22 November 2023 (UTC)
Database reports from searches
I don't know if there's any way to do this efficiently, but there are a couple searches I have devised that reliably turn up a lot of busted formatting. They are not obtained by querying the database directly, but is there any way to get them on a page such as these? Here are a couple examples:
[2], which is insource:/\[1\]\[2\]/ in mainspace, i.e. the string "[1][2]" appearing in the page's source. This almost always means that someone has messed something up and copypasted a sentence from their browser into the edit window, destroying references.
insource:/\<sup\>\{\{.itation needed/ in mainspace. This detects when someone has used {{citation needed}} in superscript tags.
The big daddy of them all: insource:"citation needed" -insource:"needed|date" -insource:"needed|reason" -insource:/\{\{.itation .eeded\}\}/ -insource:"needed span" -insource:"needed lead" -insource:"needed paragraph" -insource:"needed section" -insource:/on-ne/ -insource:/ded \(Wi/ in mainspace. This gives busted {{cn}} attempts, where somebody just typed "[citation needed]" or "(citation needed)" etc into an article instead of invoking the template. I have a huge regex to fix a few dozen of the most common types of this error in my JWB settings.
Et cetera, et cetera. Usually I fix these myself from JWB but I feel like others would enjoy helping with this as well. Is there a way to set up a bot to do search reports for stuff like these? jp×g🗯️22:08, 29 October 2023 (UTC)
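As a rough local illustration of the first two patterns, here is a sketch using Python's `re` module. Python's regex dialect is not the same as the one behind `insource:`, so this only approximates the searches; the function name and the sample strings are made up for the example.

```python
import re

# Approximations of the first two insource: patterns listed above.
COPIED_REF_MARKERS = re.compile(r"\[1\]\[2\]")
SUPERSCRIPTED_CN = re.compile(r"<sup>\{\{.itation needed")


def flag_problems(wikitext: str) -> list[str]:
    """Return a label for each pattern found in the given page source."""
    problems = []
    if COPIED_REF_MARKERS.search(wikitext):
        problems.append("copied reference markers")
    if SUPERSCRIPTED_CN.search(wikitext):
        problems.append("citation needed inside <sup>")
    return problems


print(flag_problems("The sky is blue.[1][2] More text."))
print(flag_problems("Claim.<sup>{{Citation needed|date=May 2023}}</sup>"))
print(flag_problems("Clean text with {{citation needed}}."))
```

A bot-maintained report could in principle run checks like these over search results, though whether that beats a plain list of search links is exactly the open question in this thread.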
Hi @JPxG: I think we can just have a page that is a collection of these search links and maybe have a bot that updates the hit count daily (to track the approximate number of pages)? The search function gives instant results, which is probably preferable over a page updated periodically by bots. 0xDeadbeef→∞ (talk to me) 10:09, 30 October 2023 (UTC)
Oooh, given that elasticsearch replicas exist (TIL!) it would be nice if we can make use of the extra features. Though if the web UI search is sufficient in some cases I still don't think bot reports would be necessarily beneficial? 0xDeadbeef→∞ (talk to me) 11:37, 30 October 2023 (UTC)
Rethinking this, it's actually probably quite beneficial to have a community maintained list of search queries where a bot would come by and update periodically. It's better at tracking stuff and makes it better for editors to navigate. 0xDeadbeef→∞ (talk to me) 08:59, 23 November 2023 (UTC)
Sorry, this is my fault. Should be fixed now and I just kicked off a run. I'll be back online in like 6 hours in case it didn't work to debug further. Legoktm (talk) 21:06, 5 December 2023 (UTC)
Ok thanks. I can be patient:) I only mentioned it because the one discussed above, Wikipedia:Database reports/Uncategorized templates, updated several hours ago. While the others have not yet updated. DB1729talk01:47, 6 December 2023 (UTC)
OK, I think all the reports are up to date, except the article streak ones. If anything did not get an update, please let me know and I can look again when I wake up in a few hours. Legoktm (talk) 07:31, 6 December 2023 (UTC)
Ugh, that's wild. It might be a few days before I can look in depth. I wonder if one of the DB replicas is out of sync with the others...or maybe something changed and our query is just busted now. Legoktm (talk) 07:29, 6 December 2023 (UTC)
OK so I tracked down phab:T354089, which seems to be that the replica has fallen out of sync with production, causing some weirdness, but there's more to the story, I'm still debugging. Legoktm (talk) 04:45, 29 December 2023 (UTC)
There are 1,675 templates listed on the report at this writing, which is probably about the right number. We'll see if it fluctuates into the 200–300 range, as it has been doing, or if it stays relatively stable. Thanks for continuing to track down this strange problem. It's challenging to debug a problem when you are not convinced that you have found the actual cause of the problem. – Jonesey95 (talk) 14:13, 29 December 2023 (UTC)
I'm not panicking yet, but HaleBot has not edited for a couple of days. Over 48 hours, if my math is right. It averages about 45 edits per day, so a two-day break is unusual. – Jonesey95 (talk) 05:01, 22 February 2024 (UTC)
See T358175. It's trivial to restart, but I've left it in a broken state in case it makes it easier for Toolforge admins to diagnose the underlying root cause. Legoktm (talk) 05:23, 22 February 2024 (UTC)
Wikipedia:Database reports/Unused templates (filtered) update related to Module:Pagetype
Looking for a template with no transclusions is much easier than just looking for one that happens to be a self-transclusion...I'm thinking of how to restructure the SQL query to accommodate this, if anyone wants to propose a better query that handles this, please do. Legoktm (talk) 04:02, 16 February 2024 (UTC)
I don't see why it would need it? Just add a clause to the templatelinks join; you already have the template page's page_id. quarry:query/80586. Also note the backslashes in the LIKEs; underscore is a metacharacter. —Cryptic06:17, 22 February 2024 (UTC)
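Both points can be sketched on toy data with SQLite via Python. This is not the query at quarry:query/80586; the table contents and template titles are invented, and the queries are a simplified guess at the idea. One difference to keep in mind: MySQL and MariaDB treat backslash as the default LIKE escape character, while SQLite needs an explicit ESCAPE clause, which is why one appears below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE linktarget (lt_id INTEGER PRIMARY KEY, lt_namespace INTEGER, lt_title TEXT);
CREATE TABLE templatelinks (tl_from INTEGER, tl_target_id INTEGER);
INSERT INTO page VALUES
    (1, 0, 'Some_article'),
    (2, 10, 'Used_template'),
    (3, 10, 'Self_only_template'),
    (4, 10, 'Sandbox_other'),
    (5, 10, 'Sandboxother');
INSERT INTO linktarget VALUES (20, 10, 'Used_template'), (30, 10, 'Self_only_template');
-- Page 1 transcludes Used_template; page 3 transcludes only itself.
INSERT INTO templatelinks VALUES (1, 20), (3, 30);
""")

# Extra join clause: a transclusion only counts if it comes from another page.
UNUSED = """
SELECT page_title FROM page
LEFT JOIN linktarget ON lt_namespace = page_namespace AND lt_title = page_title
LEFT JOIN templatelinks ON tl_target_id = lt_id AND tl_from != page_id
WHERE page_namespace = 10 AND tl_from IS NULL
ORDER BY page_title
"""
unused = [row[0] for row in conn.execute(UNUSED)]

# Underscore is a single-character wildcard in LIKE unless it is escaped.
LOOSE = "SELECT page_title FROM page WHERE page_title LIKE 'Sandbox_%' ORDER BY page_title"
STRICT = r"SELECT page_title FROM page WHERE page_title LIKE 'Sandbox\_%' ESCAPE '\' ORDER BY page_title"
loose = [row[0] for row in conn.execute(LOOSE)]
strict = [row[0] for row in conn.execute(STRICT)]
print(unused, loose, strict)
```

With the extra clause, a template whose only transclusion is itself is still reported as unused, and the escaped pattern no longer matches titles that merely have some character where the underscore should be.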
@Cryptic: awesome, I'm glad you're better at SQL than me :) Would you like to submit a PR with your improved query? Otherwise I'll get to it shortly. Legoktm (talk) 05:11, 23 February 2024 (UTC)
We have another issue which could be related to this change. Template:Anarchism US shows a transclusion at its talk page but it's not used there. So the updated code should also check if the self transclusion is from its own talk page. Gonnym (talk) 07:49, 13 March 2024 (UTC)
That's a strange one. I'm guessing that one of the "new pages" lists causes this check somehow. I wonder if the problem will resolve itself after the new template page (created March 11) falls off of the list eight days after its creation. – Jonesey95 (talk) 16:29, 14 March 2024 (UTC)
Polluted categories
I wanted to ask if it's possible to generate an earlier-than-usual update on a report. I hadn't personally done a runthrough on Wikipedia:Database reports/Polluted categories in about a month or two while assuming that other people were staying on top of it, but it turns out they weren't — so when I went back to it this morning there were 1,000 categories on it, which is its generation limit, and that limit had only gotten it to the letter P, meaning that there are potentially dozens or hundreds more categories hiding on the other side of the wall.
So I've trudged my way through cleaning up what was there (pity me), but wanted to ask if it's possible to run an early update to catch the post-1,000 stuff instead of having to wait three more days for the regular weekly update. Bearcat (talk) 16:32, 26 March 2024 (UTC)
Weekly potential U5s database report not updated for nearly a year
I'm looking at the query and I don't see how it ever worked. (Besides doing things very inefficiently, it can't see user pages created after late July 2018 - intentionally, though I can't fathom why - nor users who have any deleted edits, which is probably accidental.) I'll see if I can't come up with something that does what I think it was trying to. —Cryptic19:50, 14 April 2024 (UTC)
Now sorting by whether it's a redirect first then by page length, which, while not as good as user creation time, is more likely to be useful than alphabetical. (Sorting by redirect is needed to make it reasonably fast, and there's only a handful of user page redirects that meet the other criteria. And they're likely all problematic anyway.) @Legoktm: This query can be dropped into /dbreps2/src/enwiki/webhostpages.rs without other changes, or I can take over this report with SDZeroBot's {{database report}} if you prefer. —Cryptic20:40, 14 April 2024 (UTC)
I have been quite busy these weeks. Feel free to open a GitHub issue/pull request, or ping me here again on the weekends to nudge me.. 0xDeadbeef→∞ (talk to me) 13:57, 18 April 2024 (UTC)
Based on what I encountered, the deleted edit thing was probably supposed to make it not list user pages of users whose only contributions are creations of deleted pages and who have warnings and stuff on their User: instead of User talk: for some reason (example: User:ISpeakTruth). Doesn't seem to be that many of those, and these situations can no longer occur, so it's probably not needed. Flounder fillet (talk) 21:40, 14 April 2024 (UTC)
That's possible, but I really don't think it is. The way it was programmed makes it look like it was accidental - it checks the current total of non-deleted edits in user: and user talk: and compares them against the user_editcount field. If it were intentional, directly checking for deleted edits in the archive table would be a more natural way to do it, more accurate, (much) faster, and could be made to only exclude users with deleted edits in non-user/usertalk namespaces besides. —Cryptic21:52, 14 April 2024 (UTC)
I wrote some versions of this query. Of course the query worked previously, the proof is in the page history. My off-hand guess for why it broke is that some query planner got worse or some index got changed and the query is now taking too much time or CPU to generate, but who knows. Sometimes it's a database field that's been renamed, sometimes it's something else entirely.
Cryptic, you should have access to look at the logs yourself, but if you don't for some reason, that seems like the real issue here. I don't know why you'd need to ping Legoktm and others, that seems very silly.
This query made some heuristic choices for finding these types of potentially problematic user pages. These choices obviously have trade-offs. In particular, I happened to be focused on older and longer pages, which is why I added restrictions on page.page_len and page.page_id. I personally also wanted to only find cases where the user had only edited in two specific namespaces, at least to start. However, there are lots of cases that won't be included as a result of making these choices. If a user made a single spam edit to a real article as well as spamming their user page, they wouldn't be included in this report as-written. In cases where user.user_editcount is wrong, this report could omit some pages. In cases where the page length is 498 bytes and still promotional spam that should be deleted, it wouldn't be included here. And so on.
Improvements to this and any other database report are always welcome. I thought the archive table was no longer available in database replicas, but I may be mistaken. Let's see you all do better. Please. :-) --MZMcBride (talk) 07:48, 15 April 2024 (UTC)
Quarry says my show tables; query against enwiki_p has been queued for 21 minutes now, but I was able to run this query against a database in a different cluster and archive and friends are still available. I guess I was thinking of something else. I'm doubtful it will be efficient or quick to use the archive_userindex table or similar, but I'm very interested to see what you all come up with to uncover more pages to be reviewed and potentially deleted. --MZMcBride (talk) 08:08, 15 April 2024 (UTC)
Hi, I just wanted to note that the report linked above didn't run on Thursday like it usually does. Just thought I'd bring it to the attention of whomever needs to know. Thanks for all your work maintaining these reports. LEPRICAVARK (talk) 13:46, 18 May 2024 (UTC)
There's a link to source on its user page. Most reports are a single query, which makes picking them out easy even if you don't speak Rust. A few have significant post-processing or followup queries, though I haven't found one yet that couldn't be done - perhaps a bit less easily, granted - in a single query. —Cryptic15:27, 22 May 2024 (UTC)
So the discussion is here because User talk:HaleBot was retargeted to here. User:HaleBot lists its two maintainers. The owner, who only reluctantly takes these on after their creators abandon them and nobody else steps up, hasn't edited in 2 months and his recruited assistant isn't healthy. The problem is probably a hiccup on Toolforge of some sort. You can't just put Toolforge jobs on autopilot and expect them to run forever.
"See T358175. It's trivial to restart, but I've left it in a broken state in case it makes it easier for Toolforge admins to diagnose the underlying root cause. Legoktm (talk) 05:23, 22 February 2024 (UTC)"
Done, though it didn't make any difference to the results. Asking at WP:RAQ might get more sets of experienced eyes on such questions; I don't know how many of the regulars there also watch this page. —Cryptic08:39, 23 May 2024 (UTC)
Usurping these reports in-place still needs cooperation from the bot's maintainers, or else it'll eventually overwrite the migrated (and possibly improved) queries. —Cryptic08:51, 23 May 2024 (UTC)
Thanks. The bot seems happy for now. Want to describe the bug for us? Are you willing to take on some mods as I was suggesting above? Dicklyon (talk) 04:32, 25 May 2024 (UTC)
It is fixed here. As for your suggestion, I'm not sure what you mean by a piped link. As in, a link whose text is not the same as the target? 0xDeadbeef→∞ (talk to me) 11:15, 25 May 2024 (UTC)
Thanks! Yes, WP:Piped links display different text, so sometimes a link to a miscapitalized redirect isn't in need of a fix. Often, though, the displayed text is also over-capitalized. It would be awesome to have different reports or different counts of these things. Dicklyon (talk) 23:37, 26 May 2024 (UTC)
In this edit, the bot updated the date, but didn't otherwise update the report, which I know should have a ton of changes reflecting my hundreds of case-fixing edits yesterday. Never seen that before... Dicklyon (talk) 15:41, 4 August 2024 (UTC)
Much of what shows up in Wikipedia:Database reports/Linked miscapitalizations is due to piped links that have no effect on the article appearance, and I spend a lot of time fixing them so that I can get down to what matters in the report. And I take a certain amount of flak for fixing things that don't affect the article appearance. If those piped links were simply skipped, the report might be a more useful list of what to fix.
On the other hand, quite a few of those piped links also have miscapitalized link text in the article, so are still worth looking at sometimes. Maybe we could have reports both ways? Or separate counts of piped and not? Other ideas? Dicklyon (talk) 17:07, 4 May 2024 (UTC)
The problem is that the text of the piped links isn't stored in the database, just the links themselves. I'll think a bit about this; maybe a separate tool that did further processing would do the trick. Legoktm (talk) 17:23, 30 August 2024 (UTC)
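As an illustration of the kind of further processing mentioned above, here is a minimal sketch, not the bot's actual code. It assumes the page wikitext has already been fetched by some other means, and the names `WIKILINK`, `normalize`, and `classify_links` are hypothetical. It separates links where the reader sees the miscapitalized title from piped links whose displayed text differs from the target.

```python
import re

# Matches [[Target]] and [[Target|label]]; nested brackets are ignored.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def normalize(title: str) -> str:
    """Approximate title comparison: drop any #section, treat underscores
    as spaces, and ignore the case of the first letter only."""
    title = title.split("#", 1)[0].replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def classify_links(wikitext: str, miscapitalized: str):
    """Return (visible, hidden) lists of links that point at `miscapitalized`.

    visible: unpiped links, or piped links whose label equals the target,
             so the miscapitalized form is shown to the reader.
    hidden:  piped links whose displayed label differs from the target.
    """
    wanted = normalize(miscapitalized)
    visible, hidden = [], []
    for match in WIKILINK.finditer(wikitext):
        target, label = match.group(1), match.group(2)
        if normalize(target) != wanted:
            continue  # link points somewhere else
        if label is None or normalize(label) == wanted:
            visible.append(match.group(0))
        else:
            hidden.append(match.group(0))
    return visible, hidden
```

A report built this way could list the two groups separately or give a count of each. Note that a "hidden" link can still display over-capitalized text that is merely different from the target, so that group would still need occasional human review.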
Longest short description
A short description is usually seen in the search bar, and gets cut off after around 40 characters. But it's not hard to find SDs about twice that long,[3][4] and perhaps even longer than that. I wonder what the longest short description is. Wizmut (talk) 06:53, 9 May 2024 (UTC)
After some digging and learning I found that it is possible to do a Quarry search for these.[5] But it might still be nice to have a page dedicated to these cases. Wizmut (talk) 10:14, 9 May 2024 (UTC)
The category is populated by short descriptions with length greater than 100; the report is apparently greater than or equal to 100. The only page that would have appeared in both was Cuaderno, whose description has been shortened since the report was last generated. —Cryptic00:08, 1 September 2024 (UTC)
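The boundary difference described above can be shown with a small sketch. The function names are hypothetical, and measuring length in characters (rather than bytes) is an assumption; neither the category's nor the report's real logic is reproduced here.

```python
LIMIT = 100  # threshold mentioned in the discussion


def in_category(description: str) -> bool:
    """Category membership as described: strictly longer than the limit."""
    return len(description) > LIMIT


def in_report(description: str) -> bool:
    """Report membership as described: at or above the limit."""
    return len(description) >= LIMIT


# A description of exactly 100 characters is the only length on which
# the two tests disagree.
exactly_100 = "x" * 100
over_100 = "x" * 101
```

With these definitions, a description of exactly 100 characters is listed by the report but is not in the category, while anything longer satisfies both tests.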
I've been doing some file cleanup lately and have come across a significant number of files which are tagged with a non-free use rationale (such as with {{Non-free use rationale}}) but which are missing an actual file copyright tag (such as those listed at Wikipedia:File copyright tags/All). Per the section on that page regarding non-free content, non-free files are required to have both a rationale and a copyright tag. Currently, the report at Wikipedia:Database reports/Files without a license tag does not seem to include files which do have a rationale but which do not have an actual license/copyright tag; for instance, File:Esther Applin 1944.jpg (which should presumably get tagged with {{non-free biog-pic}} unless it is found to be public domain or similar), among many others I've seen in the course of my recent cleanup work. Would it be possible to add files in this situation to the report, or else create a separate report for them? 🔹Blue (talk/contribs) 21:53, 1 September 2024 (UTC)
The untagged stubs report is backlogged with a lot of soft redirects to Wikiquote, Wikispecies, etc., and lists of lists. It would be useful if these were ignored. The majority of the list is currently false positives, which prevents new entries from being added. CFA💬19:30, 13 July 2024 (UTC) (please mention me on reply)
Hi @CFA, I've done the first part by excluding more soft redirects. Can you give an example of the lists of lists you're seeing? The report should already exclude articles that start with "List of" and "Lists of". Legoktm (talk) 06:45, 26 August 2024 (UTC)
You're welcome! And, thanks for the examples, I've added some more filtering conditions that should exclude those in the next run. Legoktm (talk) 04:39, 29 August 2024 (UTC)
It might be useful, if we're going to change that page, to convert it to use the format at User:Jonesey95/self-transcluded-templates so that regular editors (or template editors, if we want to protect the report a bit) can make changes like the above after discussion. That way, nobody has to mess with off-wiki code. – Jonesey95 (talk) 00:19, 10 September 2024 (UTC)
Dusty Articles should exclude soft redirects and potentially set indexes
@Legoktm since the report already excludes hard redirects, it should also exclude soft redirects, which it currently does not do; for example, various Wiktionary redirects like Technical tap. This could probably be resolved by excluding pages found in Category:Wikipedia soft redirects and its subcategories.
Another issue is set index articles, which are functionally another kind of disambiguation page and often do not need edits for long periods of time, such as some surnames, geographic details, etc. I'd propose excluding these from Dusty Articles and perhaps having a sub-report exclusive to set indexes, maybe for particular categories of set indexes, as sometimes they can be genuinely overlooked (i.e. an article was created for a person with an obscure surname that does indeed already have an SIA). Anyway, I understand this to likely be a much more nuanced issue to resolve than the soft redirects, so more discussion is likely needed on that matter. Akaibu (talk) 21:40, 6 October 2024 (UTC)
TV articles with "was"
Could we get a database report on TV show articles that have "was" in the first sentence (i.e., Name of Show was...)? MOS:TV has dictated use of "is" since forever, but I'm still finding "was"es all over the place. Ten Pound Hammer • (What did I screw up now?)20:56, 27 October 2024 (UTC)
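One possible way to approximate the check requested above is sketched below. It is only an illustration: it assumes the plain text of the lead has already been extracted from the article by some other tool, and the name `lead_uses_was` and the sample show title are hypothetical. Rather than flagging any "was" in the first sentence, it looks at whether the first "is" or "was" in the lead is "was", which avoids flagging leads such as "... is a sitcom that was broadcast on ...".

```python
import re

# First lowercase "is" or "was" as a whole word; a capitalised "Was" or "Is"
# inside a show title is deliberately not matched.
COPULA = re.compile(r"\b(is|was)\b")


def lead_uses_was(lead_text: str) -> bool:
    """Return True if the first 'is'/'was' found in the lead text is 'was'."""
    match = COPULA.search(lead_text)
    return match is not None and match.group(1) == "was"
```

For example, `lead_uses_was("Example Show was an American sitcom that aired on NBC.")` returns `True`, while `lead_uses_was("Example Show is an American sitcom that was broadcast on NBC.")` returns `False`. Titles that themselves contain a lowercase "is" or "was" would still confuse this heuristic, so any resulting list would need manual review.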