CC-BY-SA files have a requirement for attribution by URL. Lots of people forget this when they use |link=|. The query below no longer works since the usability team decided to merge the file list into WhatLinksHere, although a dump report is still possible.
That's true, of course, but we should know about any that do appear so that they can be populated or deleted. 'Non-article' has almost wholly been superseded by 'NA-Class', so there is no good reason for empty 'non-article' categories to remain undisturbed. -- Black Falcon(talk)18:49, 13 January 2012 (UTC)[reply]
The first order of business is fixing this report altogether: it's supposed to be updating daily but hasn't for over 9 days. Removing that exception seems reasonable to me, however. VegaDark (talk) 18:57, 13 January 2012 (UTC)[reply]
Over at Bot Requests, Baa made a proposal about something that could find articles with the fewest edits in X years. They suggested I take it over here since the toolserver can apparently handle it better. Ten Pound Hammer • (What did I screw up now?)01:43, 8 February 2012 (UTC)[reply]
Hello. Some of you might have run into me before...I am doing a research project on bots, bot operators, and technical tools on WP and WM projects. I'm wondering if anyone wants to tackle this problem, which would help me out tremendously. I am looking for stats and data on bots, especially over time. Things like:
(#) of bot accounts registered over time (by month would be fine) (on English WP)
(#) of bot edits over time (on English WP)
(#) of BRFA approved and not approved over time (on English WP only, obviously)
same trends for bot use on other language versions (which would be a bonus)
I've found some info on these things spread around WP, but nothing that is both up-to-date and reasonably accurate/reliable. If you're interested in investigating this with me, I'd really appreciate the help (I don't have the technical knowledge base personally to find this data). Please let me know here or on my talk page.
And if you're a bot operator or Wikimedia developer (or someone who deals with the technical infrastructure of WP) and you'd like to be interviewed, please see my call here.
I think you might also want to know how many bot accounts are active (i.e. being used) and how active they are (edit count) over time. The count of bot accounts will just rise and rise, as it's unusual for a bot account to have the flag taken away. Josh Parris22:42, 25 February 2012 (UTC)[reply]
Hi Josh Parris and Tim1357...Thank you both for your comments and your work. Josh...yes, stats on active versus inactive bots would help. Tim...this looks great (thank you!) Can you briefly explain to me how you got these numbers? For the second bullet point, I meant # of bot edits per month and year, though knowing the total # of bot edits to date would also be helpful (and in turn, what percentage of all edits that represents). Are you also able to do this for other language versions? Thank you again for the help! UOJComm (talk) 22:29, 27 February 2012 (UTC)[reply]
The categories Images made obsolete by a PNG version and Wikipedia images available as SVG contain files which have been superseded. Many of these images have already been replaced in articles by their newer versions, but there are a lot of articles which still need to be updated. I would like to have a list of files from each category that are still used in main space, preferably sorted by the number of articles the file is used in. I can't seem to find a way to do this with automated tools such as AWB or CatScan, but if there is a way I'd be happy to do it myself if possible. These lists would help me with the running of my bot (DanhashBot). Can anybody write and run database queries for these two lists, or explain how I can use an automated tool to compile such a list myself? Thanks! —danhash (talk) 00:57, 1 April 2012 (UTC)[reply]
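For locally hosted files, a query along these lines might produce one of the requested lists (a sketch only, assuming the standard MediaWiki replica schema; the category name is taken from the request above, and files hosted on Commons have no local description page row, so they would need separate handling):

SELECT il_to AS file, COUNT(*) AS uses
FROM imagelinks
JOIN page AS article ON article.page_id = il_from         -- the page using the file
JOIN page AS filepage ON filepage.page_namespace = 6      -- the file's local description page
    AND filepage.page_title = il_to
JOIN categorylinks ON cl_from = filepage.page_id
WHERE article.page_namespace = 0                          -- main-space uses only
  AND cl_to = 'Images_made_obsolete_by_a_PNG_version'
GROUP BY il_to
ORDER BY uses DESC;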
Dom says that https://toolserver.org/~magnus/glamorous.php can do what you want to do, but it only works for Commons currently. You'll need to poke Magnus to add a database selection option.
It would be much easier to have this information in the form of a database report. Glamorous seems to be able to provide (all or some of) the information I'm looking for; however, it lists every language Wikipedia and is in a format that is very hard to use for this task. Any possibility of a new database report being created for images in use on the English Wikipedia? —danhash (talk) 18:04, 5 April 2012 (UTC)[reply]
So what I'm imagining is a report with four sections. It would look something like this:
Superseded files used in articles; data as of ...
== Images made obsolete by a PNG version ==
| No. | File | Uses
== Wikipedia images available as SVG ==
| No. | File | Uses
== PNG version available ==
| No. | File | Uses
== Vector version available ==
| No. | File | Uses
How many of those 47,000 are used in articles? I'd think that the list wouldn't be so long that reasonable pagination would be a problem, but if so the report could be limited to the top 300 or so files (or any other number that seems reasonable). —danhash (talk) 19:57, 11 April 2012 (UTC)[reply]
I have the original two lists that I got from jira:DBQ in a sandbox, but I can't sort them by number of uses. Sorting is not technically necessary, but for images with a small number of uses (especially those with just one or two) it is easier to update articles manually, as it's not worth setting up AWB. I want to start with the files with the highest number of uses, and with the current list it's not easy to find those files. There are other solutions, I'm sure (for example, I could ask Hoo man to re-run the query and sort the output; should have thought of that when I first asked him); it just seems like a database report would be the easiest to use and the simplest to update, but if it's too difficult or time-consuming to do I can certainly find another way. —danhash (talk) 14:16, 17 April 2012 (UTC)[reply]
Ummm apparently I made a mistake. When you said you missed the "in articles" bit, I thought you meant that you listed the images in each category, but forgot to limit the list to images that were used in articles, so that the database report was a copy of the images in each category. But that was a total misunderstanding on my part and I'm really not sure how I got confused like that or how I didn't figure it out sooner; sorry about that. When you said you missed the "in articles" bit, did you mean that the uses column counts all uses, not just article-space uses? —danhash (talk) 13:22, 18 April 2012 (UTC)[reply]
Yes, sorry. The current "uses" column accounts for links from any namespace. I'll try to fix this tomorrow.
The Commons component made this report significantly more complex, because now we're querying Commons for the file names but have to limit the scope to English Wikipedia uses; otherwise you end up looking at usage on Commons (or Wikimedia-wide file usage). --MZMcBride (talk) 03:20, 19 April 2012 (UTC)[reply]
I can provide a list of the 10,000 most-linked pages unwatched by active users. However, given watchlist bankruptcies and the fatigue of veteran editors, I doubt we'll get 500 useful active watchers. — Dispenser21:01, 13 April 2012 (UTC)[reply]
The query ran for about 50 minutes. We have 507,131 articles with no watchers and another 382,684 articles with no active watchers. Active users are those whose user_touched is within the last 30 days, same as $wgRCMaxAge (it was more accurate before the automatic log-out time was increased from 30 to 180 days). Per Toolserver Rules I can't give this list away. — Dispenser05:25, 15 April 2012 (UTC)[reply]
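A sketch of the kind of query involved, assuming access to the private watchlist table (which is exactly what the Toolserver rules restrict sharing from):

SELECT COUNT(*)
FROM page
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND NOT EXISTS (
      SELECT 1
      FROM watchlist
      JOIN user ON user_id = wl_user
      WHERE wl_namespace = page_namespace
        AND wl_title = page_title
        -- 'active' as defined above: user_touched within the last 30 days
        AND user_touched >= DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%S')
  );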
Thanks! So, as I understand it, that means 900,000/3,900,000 are not being monitored? Can you think of any way to prioritize those 900,000 into a smaller sample (top 1,000 or 10,000 most-viewed pages) and then another way to get the TS to give permission to give the list (say via email) to trusted users who could add subsets to their watchlists? I'd personally be willing to drop a 500 page chunk on my watchlist. MBisanztalk17:21, 15 April 2012 (UTC)[reply]
Assuming a median network latency of 0.3 sec × 900,000 pages = 3 days to run the report. While OK for a one-time job, it's not viable for the long term. I had originally sorted by the number of incoming links (there's a correlation with view count), but I found the pages rather uninteresting. I even built a tool to select from WikiProjects. Still nothing interesting, although it was amusing to see WikiProject Fictional characters with a lower unwatched rate (8%) than WikiProject Biography, and WikiProject Poland (77% unwatched) beating WikiProject Antarctica (70% unwatched). So many stubs, orphans, and obscure incomplete list articles.
Anyway, following from the discussion I hacked up recent changes to only show one-fifth of articles, tools:~dispenser/cgi-bin/unwatched_changes.py. It's clear to me now that we need to rethink RC patrol: AJAXified with "X new changes, update now" atop, articles scoped to your interests, inline diffs, reverts automatically hidden, user karma/hours, diff verification with multiple levels, ClueBot vandalism score, improved automatic edit summaries, and more. If only people cared enough. — Dispenser05:42, 1 May 2012 (UTC)[reply]
SELECT
  page_title
FROM page
LEFT JOIN pagelinks                    -- each matching row is a link pointing at this page
  ON pl_title = page_title
  AND pl_namespace = page_namespace
  AND pl_namespace = 0                 -- target must be in the article namespace
WHERE pl_namespace IS NULL             -- no incoming link matched: an orphan
  AND page_namespace = 0               -- articles only
  AND page_is_redirect = 0             -- skip redirects
LIMIT 100;
Exclude redirects, disambiguation pages, and pages tagged with {{orphaned}}.
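A sketch of how those extra exclusions might bolt onto the query above (the template names here are illustrative; in practice disambiguation and orphan tags reach pages through several redirects and wrapper templates):

SELECT page_title
FROM page
LEFT JOIN pagelinks ON pl_title = page_title
    AND pl_namespace = page_namespace
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND pl_namespace IS NULL
  AND NOT EXISTS (
      SELECT 1
      FROM templatelinks
      WHERE tl_from = page_id
        AND tl_namespace = 10
        AND tl_title IN ('Disambiguation', 'Orphan')   -- illustrative names only
  )
LIMIT 100;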
The segfault was due to excessive recursion in PCRE, triggered by a regex in PageTriage. Offhand, it looks like it would be triggered on pages where there are more than ~18k characters between a '{{' and its matching '}}'.
We're fixing it by setting the PCRE recursion limit to 1k; apparently the default value (which it was set to previously) is 100k, which is way too high.
This refers to the edits by 0:0:0:0:0:0:0:1, 216.38.130.164 and 208.80.152.165 on May 11th, 2012 between 01:01 and 02:02 UTC. Those were caused by me chasing down an error that was occurring when people edited this page: you'd see an error page with ERR_ZERO_SIZE_OBJECT but your edit would go through. This pointed to a segmentation fault in the Apache process handling the request, and tracking it down was difficult (and added another bogus edit to the page history every time I tried), but I got there in the end as explained in my quote. --Catrope (talk) 03:01, 11 May 2012 (UTC)[reply]
Templates linking to other templates' edit pages
I would like, if it's possible, to have a list of all cases where:
Would it be possible to get a list of SUL accounts without any attached local accounts? This is a random glich that can happen when the last local account is renamed or when a user's registration breaks. MBisanztalk18:36, 14 May 2012 (UTC)[reply]
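A sketch of the query, assuming the CentralAuth extension's schema as replicated in centralauth_p (globaluser holds the SUL accounts, localuser holds their per-wiki attachments):

SELECT gu_name
FROM globaluser
LEFT JOIN localuser ON lu_name = gu_name   -- any attachment on any wiki
WHERE lu_name IS NULL;                     -- global account with no local accounts at all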
Per Wikipedia:Manual of Style/Music samples and non-free content criterion 3, non-free music samples will rarely need to be more than 64kbps. It should be easy to create a database report which lists all non-free audio files of higher than 64kbps, which could then be flagged as needing attention/fixed. (It may also be worth not including any files currently tagged with {{non-free reduce}}, as these have already been flagged as needing attention.) A weekly update would probably be fine. Thanks, J Milburn (talk) 21:04, 25 June 2012 (UTC)[reply]
Category sort key and category main articles
During a discussion on a proposed category page MOS at Wikipedia_talk:Manual_of_Style/Category_pages#Cat_main, the issue was raised of missing and incorrectly sorted main articles in categories. As an example, science should be (and in this case is) in Category:Science with a sort key of a single space. This important bit of categorisation is often missing, either because it was forgotten when the category was created or because it was removed by vandalism. Can a report be generated that gives the instances where a category lacks its associated article, and another report giving the cases where the space sort key is absent? See WP:SORTKEY for info. -- Alan Liefting (talk - contribs) 02:45, 27 June 2012 (UTC)[reply]
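Two sketches of what such reports might query, assuming cl_sortkey_prefix holds the raw sort key as supplied in the wikitext:

-- eponymous articles present in their category but sorted without the space key
SELECT cat.page_title
FROM page AS cat
JOIN page AS art ON art.page_namespace = 0
    AND art.page_title = cat.page_title
JOIN categorylinks ON cl_from = art.page_id
    AND cl_to = cat.page_title
WHERE cat.page_namespace = 14
  AND cl_sortkey_prefix != ' ';

-- categories whose eponymous article is not in the category at all
SELECT cat.page_title
FROM page AS cat
WHERE cat.page_namespace = 14
  AND NOT EXISTS (
      SELECT 1
      FROM categorylinks
      JOIN page AS art ON art.page_id = cl_from
      WHERE cl_to = cat.page_title
        AND art.page_namespace = 0
        AND art.page_title = cat.page_title
  );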
Broken section links
I noticed that this page, which lists broken section links, was last updated over a month ago and is now out of date. Can the page be updated more frequently (perhaps once a day or once a week?) I'm trying to fix broken section links on Wikipedia. Jarble (talk) 18:13, 2 July 2012 (UTC)
It appears that the page was updated daily until April 20, 2012. Jarble (talk) 18:15, 2 July 2012 (UTC)[reply]
Images without Fair Use rationale
A large number of the long entries showing up in this report do in fact have an NFUR meeting NFCC. I've been marking these up with {{has-NFUR}}. Could this be added to the list of 'templates' to look for when removing an item from the report?
Sfan00 IMG (talk) 09:12, 6 July 2012 (UTC)[reply]
Please add "AND page_title NOT LIKE 'Lists_of_%'" to the query above. This should be categorized under "Stub" reports as a companion to the Long stubs report. The long stubs report lists articles that have stub tags but probably shouldn't (based solely on length of article). This report indicates articles that do not have stub tags, but probably should. Dawynn (talk) 11:33, 31 July 2012 (UTC)[reply]
Many of the reports aren't being updated
Presumably the Toolserver issues over the course of the last few weeks (months?) have played into this. There are probably threads on WP:VPT about this issue. Killiondude (talk) 18:34, 20 July 2012 (UTC)[reply]
I just posted this someplace else, but it looks like toolserver database updates for the s1 cluster (which hosts this wiki) are backlogged over 8 days, and the backlog is growing hourly. I'm adding a table that updates hourly. Vegaswikian (talk) 19:50, 25 July 2012 (UTC)[reply]
Yeah, there are a few issues. Some of the reports have been abandoned (easy enough to pick those up and start running them again). Others are simply broken. Some need to be rewritten (moderately painful, but shouldn't be too bad in most cases); some need to be disabled (at least temporarily). I'll work on fixing up some of this during this week and next. It really has gotten out of hand. --MZMcBride (talk) 00:24, 4 October 2012 (UTC)[reply]
It's always possible to do a full report. The question is whether it makes sense to do so. I don't know how many uncategorized templates there were in 2009 or how many there are now, but I imagine it'd be a lot of subpages if I switched the report to a paginated output. Do you need the full list for some reason? If you categorize the templates, wait for a refresh of the report, categorize the templates, etc., eventually you'll get less than 1,000 entries at Uncategorized templates (configuration). What's the problem? --MZMcBride (talk) 05:55, 24 July 2012 (UTC)[reply]
Sometimes when processing Uncategorized templates (configuration), rather than just progressing through the list, I prefer to scan the list looking for "easy targets" (e.g. it's probably going to be pretty easy to find a suitable category for a template named "Australian_<something>"). Having more templates listed in the report would increase the number of "easy targets" available to address. I also think it would be useful to know how many uncategorized templates there were each time the report was run, so we could see whether the number was static, reducing or increasing over time. Perhaps if a full report is impractical, the number could be increased to 2000 or 3000? Just an idea. DH85868993 (talk) 09:13, 24 July 2012 (UTC)[reply]
As of (mumble mumble) 5-ish days ago, there were approximately 118,000 pages in namespace 10 with no categorylinks. - TB (talk) 18:10, 24 July 2012 (UTC)[reply]
Did you account for redirects? I imagine about half are template redirects (which are usually uncategorized, naturally). I'm trying to pull the figure now, but the Toolserver is taking its sweet-ass time. --MZMcBride (talk) 22:25, 24 July 2012 (UTC)[reply]
MZMcBride, please don't invest too much effort trying to determine an accurate figure. Even if half of the 118,000 identified by TB are template redirects, that still leaves way too many uncategorised templates to make generating a full report every time practical. Thank you both for your efforts. But maybe consider my suggestion of increasing the report to 2000 or 3000. DH85868993 (talk) 02:41, 25 July 2012 (UTC)[reply]
The toolserver was too grumpy to list for me entities that are non-redirects in namespace 10 that are transcluded into namespace 0. I'd concur with the 50% estimate above; a full report would include around 50,000 items. Best might be to list the 1000 'most interesting' uncategorised templates - the most transcluded or if this is too expensive, perhaps the least dusty. - TB (talk) 07:11, 25 July 2012 (UTC)[reply]
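For reference, a sketch of the underlying count (the 118,000 figure above would come from the version without the redirect filter; restricting further to templates transcluded into namespace 0 would add a templatelinks join, which is presumably what the Toolserver balked at):

SELECT COUNT(*)
FROM page
LEFT JOIN categorylinks ON cl_from = page_id   -- no matching row = uncategorized
WHERE page_namespace = 10                      -- Template:
  AND page_is_redirect = 0                     -- skip template redirects
  AND cl_from IS NULL;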
Popular/duplicated external links
I would like to be able to track down groups of external links which are duplicated across many articles.
In particular I would like to be able to locate popular DOI and PMID links, those which are found in many articles, which are ripe for sharing using templates like {{cite doi}} and {{cite pmid}}.
Currently my only method for finding such links is repeated use of the Linksearch which I doubt is the most efficient method!
Is there an existing report which relates to the popularity of External Links, at least by site, and hopefully per site?
TIA HAND —Phil | Talk13:45, 2 August 2012 (UTC)[reply]
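A sketch of a query for one of the two sites, using the reversed-hostname el_index column (swapping the pattern for www.ncbi.nlm.nih.gov/pubmed would give the PMID list):

SELECT el_to, COUNT(DISTINCT el_from) AS articles
FROM externalinks
JOIN page ON page_id = el_from
WHERE page_namespace = 0
  AND el_index LIKE 'http://org.doi.dx./%'   -- dx.doi.org, host reversed
GROUP BY el_to
HAVING COUNT(DISTINCT el_from) > 1           -- only links shared across articles
ORDER BY articles DESC
LIMIT 500;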
I've popped up an initial listing of common external links to the two sites listed above for you at User:Phil_Boswell/common_els. Be aware that the toolserver (on which this report was generated) is struggling a bit at the moment; changes made to Wikipedia in the last 12 days or so will not be reflected in the report. - TB (talk) 20:30, 2 August 2012 (UTC)[reply]
Indeed, although it's actually http://dx.doi.org for DOI links. But there's something not quite right about the results anyway: there are not nearly enough of them. If you take a look at Special:linksearch/www.ncbi.nlm.nih.gov/pubmed/16381836, there are 79 articles linking to that PMID alone: it doesn't appear on your list, and I'm pretty sure those links were created before the Toolserver got backed up. I suspect if you remove the "http://" from the search criteria, you might see a difference: it does occasionally change the results of the manual search; I don't know why, because allegedly it shouldn't!
On another note, you'll see there's a template on that list, a subpage of {{cite doi}}: it looks easy enough to extend the search to the Template namespace, is that correct? That would be helpful to determine which of those links already have a template suitable for sharing.
I'd like a report for bulleted lists of at least 4 bullets, each with bold words and each with at least 10 non-bold words. Such bulleted lists should be converted to a table. For example:
*'''Crossing files''' are half round on two sides with one side having a larger radius than the other. Tapered in width and thickness. For filing interior curved surfaces. The double radius makes possible filing at the junction of two curved surfaces or a straight and curved surface.
*'''Crochet files''' are tapered in width and gradually tapered in thickness, with two flats and radiused edges, cut all around. Used in filing junctions between flat and curved surface, and slots with rounded edges.
*'''Knife files''' are tapered in width and thickness, but the knife edge has the same thickness the whole length, with the knife edge having an arc to it. Used for slotting or wedging operations.
*'''Pippin files''' are tapered in width and thickness, generally of a teardrop cross section and having the edge of a knife file. Used for filing the junction of two curved surfaces and making V-shaped slots.
would be better as:
{| class="wikitable"
|+ File Types and Uses
! Name !! Image !! Description
|-
| Crossing files || || Crossing files are half round on two sides with one side having a larger radius than the other. Tapered in width and thickness. For filing interior curved surfaces. The double radius makes possible filing at the junction of two curved surfaces or a straight and curved surface.
|-
| Crochet files || || Crochet files are tapered in width and gradually tapered in thickness, with two flats and radiused edges, cut all around. Used in filing junctions between flat and curved surface, and slots with rounded edges.
|-
| Knife files || || Knife files are tapered in width and thickness, but the knife edge has the same thickness the whole length, with the knife edge having an arc to it. Used for slotting or wedging operations.
|-
| Pippin files || || Pippin files are tapered in width and thickness, generally of a teardrop cross section and having the edge of a knife file. Used for filing the junction of two curved surfaces and making V-shaped slots.
|}
There are two bots which remove file links. ImageRemovalBot removes Wikipedia files after they have been deleted from Wikipedia. CommonsDelinker removes Commons files after they have been deleted from Commons. Both bots seem to be operating properly. --Stefan2 (talk) 17:53, 6 September 2012 (UTC)[reply]
We pick up and fix quite a few typos in filenames over at the Red Link Recovery project. If there are particular problems that crop up often, post me a few examples and I'll look into it. - TB (talk) 20:58, 6 September 2012 (UTC)[reply]
I personally don't think this is an appropriate bot task. I've done a few hundred of these image removals (of deleted images) and the number of edge cases and other weirdness simply can't be accounted for programmatically. Consider, for example, a <gallery> inside its own section of the page with a single deleted image:
== Gallery ==
<gallery>
File:The pretty image that once was.jpg|Caption text here
</gallery>
A human would know to remove the entire section. A bot would just comment out the single image, leaving a blank and very awkward section. This is obviously only one specific edge case, but there are hundreds (if not thousands) of other edge cases. For example, another issue that comes up frequently is the use of an "image" template parameter accompanied by a "caption" parameter. Nearly any bot will ignore the caption parameter, but a human usually can look at the page and see the relationship between the deleted image and the caption and know to remove both.
The entire practice of commenting out images rather than outright removal has also never made much sense to me. If I were the king of the world, CommonsDelinker and ImageRemovalBot wouldn't be allowed to operate. Fundamentally I don't believe this task is fit for bots at this time. --MZMcBride (talk) 13:57, 7 September 2012 (UTC)[reply]
I agree that the bots should remove the files completely rather than commenting them out. The edit summary can be used to mention the filename so others could possibly chase it up. -- Alan Liefting (talk - contribs) 21:06, 7 September 2012 (UTC)[reply]
P.S. I love the name "shadows" for these files, but I may like "eclipsed" even more. Such nice imagery in both. This should be trivial to create as a regular database report; I may get to it this week (or this year, who knows).
Template categories containing articles report
Can anyone identify which content of Category:Image with comment templates is causing it to appear in Wikipedia:Database reports/Template categories containing articles? I can see numerous non-templates (i.e. user pages, "Wikipedia" pages and a redirect) in the category, but I can't see any articles. I recognise that there may have been an article in the category when the report was generated which is not there now, but the category has been listed in the report the last 4 times it has been generated and each time I've checked it (which is usually within an hour of the report being generated), I haven't seen any articles. Thanks. DH85868993 (talk) 06:11, 18 September 2012 (UTC)[reply]
Is it possible to get some of these reports for other wikis as well? E.g. I'm interested in the "User preferences" report for the Portuguese Wikipedia, and if it is not difficult to do, I imagine some users would find other reports useful as well. Helder03:28, 21 September 2012 (UTC)[reply]
I'd suggest getting a Toolserver account or finding someone with a Toolserver account who will volunteer to run these reports. Anyone with a basic understanding of Python can figure this out in an hour or less.
Hopefully that's somewhat helpful. If you have specific questions about how to set up database reports, feel free to leave me a note on my talk page or on this page. Eventually someone will get back to you. --MZMcBride (talk) 17:21, 28 October 2012 (UTC)[reply]
Indefinitely semi-protected articles with no prior protection history
Could a new database report be articles that have been indefinitely semi-protected (or protected for a year or longer, maybe) that had no prior protection history? Thanks, David1217What I've done02:15, 23 September 2012 (UTC)[reply]
We hear you. The logging database table in which the information you need resides is large and ugly, making (for purely technical reasons) your request a bit trickier to fulfil than you might expect. I'm sure myself or one of the other volunteers here will do this eventually. - TB (talk) 07:25, 6 November 2012 (UTC)[reply]
Yeah, I'll take this one. It shouldn't be too bad to do, it'll just need to be done in separate queries. Basically you get a list of every indefinitely semi-protected article and then iterate through the list looking for any previous protection entries.
How often would you like the report to be updated? And will "Indefinitely semi-protected articles with no prior protection history" work as a report title? --MZMcBride (talk) 15:25, 6 November 2012 (UTC)[reply]
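A sketch of the two steps described above:

-- step 1: every indefinitely semi-protected article
SELECT page_id, page_title
FROM page
JOIN page_restrictions ON pr_page = page_id
WHERE page_namespace = 0
  AND pr_type = 'edit'
  AND pr_level = 'autoconfirmed'
  AND pr_expiry = 'infinity';

-- step 2, run per title from step 1 ('Example' is a placeholder):
-- prior protection entries in the log
SELECT COUNT(*)
FROM logging
WHERE log_type = 'protect'
  AND log_namespace = 0
  AND log_title = 'Example';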
Eh, you don't need to update it that often—monthly is fine. I'm good with any name you choose. Thanks for taking this, MZ. I've got just one request: if this initial report goes well, could it be expanded to articles that have been indefinitely semi-protected with few (two or fewer) prior protections? Thanks. David1217What I've done03:37, 7 November 2012 (UTC)[reply]
Hmm. There's already an Indefinitely semi-protected articles (configuration) report. I wonder if adding a "Prior protections" column that contains the number of prior protection entries makes sense. That would seem to make more sense than making a separate report.
I also just noticed that the pagination on that particular report is currently a bit funky. Assuming the pagination could be fixed up (the current report divides redirects and non-redirects into sections), would the addition of a "Prior protections" column be sufficient here? --MZMcBride (talk) 04:58, 7 November 2012 (UTC)[reply]
Back in mid-July, there was a request on this talk page to schedule an "Untagged stub articles" report. Code was even provided for the report. What protocol is needed to establish this as a scheduled monthly report? Dawynn (talk) 00:21, 7 October 2012 (UTC)[reply]
The real answer to your question is to get yourself a Toolserver account and learn Python. The database reports code is all hosted on GitHub and released into the public domain. :-)
Now, I'm in a constant battle to remember that this isn't a fair answer. Not everyone can be expected to want to spend the time necessary to get a Toolserver account, figure out how to use it, figure out Python, and then contribute code. In this particular case, because Legoktm did all of that, I was happy to review his code (and even rewrote parts of it for him).
In your case, there have been two comments on this talk page from you, but I'm still completely unclear what you even want. There are a lot of database reports (well over 100 at this point, I think). Some are broken, it's true. Some actually break for a few weeks and then magically fix themselves. So when I look at old comments about how "this report isn't updating", a lot of the time the report has resolved itself. Sometimes not. It depends on the specifics (which we unfortunately lack here!).
Focusing more narrowly on the productive and the constructive: if you can tell me "I'm trying to improve Wikipedia by doing X, and your report Y is broken and is making this more difficult", that would be very helpful. That way, I could actually take a look at Y and figure out how to make it work properly so that you can do X. This isn't to say that having a clear bug report will suddenly give me more time or inclination to put more work into these database reports, but it will go a very long way in getting your issue resolved both in this case and in future cases. You also have to take care when dealing with developers, but particularly volunteer developers, to not throw an entire elephant at them at once. Find a discrete issue and focus energy on getting that discrete issue fixed; don't make a giant list that just scares everyone away. ;-) --MZMcBride (talk) 18:57, 3 November 2012 (UTC)[reply]
Thanks for your reply, MZMcBride! Maybe someday I'll learn Python, get an account, and work on debugging existing reports or creating new reports. Until then, I just depend on the kindness of strangers. I didn't make the giant list of reports - it was here before I discovered this page. Having said that, your advice is reasonable, and I'll start a new thread using that approach. Thanks! GoingBatty (talk) 02:12, 4 November 2012 (UTC)[reply]
P.S. The last time I looked at the list of reports, there were many that said they were to run daily/weekly/monthly but had not been updated according to the published schedule. The list looks much better today, so thank you to everyone who has worked to get the reports back on track! GoingBatty (talk) 02:22, 4 November 2012 (UTC)[reply]
Can someone run a database report for articles without talk pages?
I'm assuming that would be a very long list; I don't need all of the articles (unless it is just as easy or easier to get all of them). PleaseStand mentioned that the SQL might be something like "SELECT article.page_title FROM page article LEFT JOIN page talk ON talk.page_namespace = 1 AND article.page_title = talk.page_title WHERE article.page_namespace = 0 AND talk.page_id IS NULL;". Thanks, RyanVesey03:39, 8 November 2012 (UTC)[reply]
Thank you very much. I don't know if you are allowed to leave it up there indefinitely, but I'll be putting it into a text file tonight. I noticed that a number of the pages (at least the recent pages) had been deleted. Is there a way to filter those out? Otherwise I'll plug it into AWB. RyanVesey19:35, 8 November 2012 (UTC)[reply]
I think MZMcBride was planning on leaving it there (he can keep it there as long as he wants to). Are you referring to where the article page has been recently deleted? Or the talk page? Legoktm (talk) 20:11, 8 November 2012 (UTC)[reply]
You'll need to filter out deleted pages yourself. At any given time, there are dozens (or hundreds) of pages up for deletion (speedy, prod, deletion discussions, etc.). So when a list is generated, it'll quickly become out of date. Such is the nature of a wiki that sees somewhere around 200,000 changes per day. :-) --MZMcBride (talk) 04:43, 9 November 2012 (UTC)[reply]
It was nice to have this back again late last month, but it appears that it didn't run the following week, any chance this is easy to get repaired? It was quite effective at digging up stuff for WP:URBLP to put into its queue to work on... Cheers, --j⚛e deckertalk17:26, 5 December 2012 (UTC)[reply]
Huh. Bizarre. I wonder what's going on there. These kinds of intermittent issues are annoying, as now I'll look at the report (in the index) and think it's working fine because it last updated November 23. When in reality, something is terribly broken somewhere. Hmm. --MZMcBride (talk) 04:42, 6 December 2012 (UTC)[reply]
I'm really working through the Polluted Categories backlog; I did 400 or so in two days. I was wondering if the page could be updated twice a week, so that I don't run out of pages.
Hi Drilnoth - I'm using my bot to remove the categories from the user pages based on this report. However, this doesn't fix categories that are populated using templates. You may want to look at Wikipedia talk:Database reports/Polluted categories and see which categories should be populated with {{polluted category}} (so they won't appear on the report) or which templates should be updated so the category only appears for articles in mainspace. Thanks! GoingBatty (talk) 04:57, 13 December 2012 (UTC)[reply]
Hmm, odd. I hadn't seen that bot going around at all. Well, I'll see what I can do... get my template editing skills back up to par. Thanks! –Drilnoth (T/C) 05:22, 13 December 2012 (UTC)[reply]
I've been looking at this report almost every day, using my bot to remove the categories from user pages and manually adding {{polluted category}} to maintenance categories. Now that the easy stuff is done and the report is down under 800 entries, a lot more analysis will be needed to see which templates need to be changed so they only categorize in mainspace. Therefore, I reverted the changes that MZMcBride made on Wikipedia:Database reports and Wikipedia:Database reports/Polluted categories/Configuration to get this back to weekly. Thanks! GoingBatty (talk) 18:02, 2 January 2013 (UTC)[reply]
OK, so my changes didn't stop this report from running again today. MZMcBride, could you please let me know if there's something else that needs to be done? Thanks! GoingBatty (talk) 04:38, 3 January 2013 (UTC)[reply]
Right, the configuration subpages are just for reference, they don't actually get read by any scripts. That'd be awfully dangerous. Occasionally I'll update my crontab when I notice someone editing a configuration subpage, just to make them feel special. ;-)
In this case, there was a discrepancy between the configuration subpage and the index. The index said (says) "Weekly" while the configuration subpage was reading "2,6" (which is twice a week). I've changed the configuration subpage to be proper (once) weekly syntax and synced the change to the live crontab. --MZMcBride (talk) 05:17, 3 January 2013 (UTC)[reply]
The replicated copies of the English Wikipedia are on two hosts: thyme and rosemary. Both are corrupt, last I heard. Demonstration below. --MZMcBride (talk) 04:04, 19 December 2012 (UTC)[reply]
mzmcbride@willow:~$ mysql -h thyme -e "select page_namespace, page_title from page join categorylinks on page_id = cl_from where cl_to = 'American_murder_victims' and page_namespace = 2;" enwiki_p
mzmcbride@willow:~$ mysql -h rosemary -e "select page_namespace, page_title from page join categorylinks on page_id = cl_from where cl_to = 'American_murder_victims' and page_namespace = 2;" enwiki_p
+----------------+-------------------+
| page_namespace | page_title        |
+----------------+-------------------+
|              2 | Alexdrudi/sandbox |
+----------------+-------------------+
I have noticed that quite a few pages in this report shouldn't be there. For example the most recent report, generated today, includes Category talk:Indonesian military-related lists: it isn't orphaned and it hasn't been edited since October, so it can't have been fixed after the report was generated. There are quite a few occurrences like this. Is this a bug or am I missing something? -- Patchy109:09, 21 December 2012 (UTC)[reply]
Completely unreferenced biographies of living people (newest) & (oldest)
Unresolved
I'm no expert, but I'm guessing these reports look for external links and include an article when they can't find any; would that be right? A lot of the articles in the reports are not completely unreferenced, and I have noticed that most of them use either {{Infobox NFL player}} or {{Infobox gridiron football person}} (there are probably others). Both of these infoboxes use a parameter where the ID of the football player is given and a link to a football website is automatically created. Like I said, I'm not an expert, but are these being included in the report because there isn't actually an external link in the article code itself? And if this is the case, can it be fixed to exclude them? Let me know if I'm making stuff up... -- Patchy109:30, 21 December 2012 (UTC)[reply]
The code looks for a further reading/bibliography/references/external links/sources section, any <ref tags, any urls which have the form http(s)://, and "isbn", so any auto-links created by an infobox won't be noticed. It probably would be nice if it also checked the externalinks table as a backup (maybe even at the same time as the initial query?). Legoktm (talk) 09:45, 21 December 2012 (UTC)[reply]
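A sketch of that backup check: restricting the report to pages with no rows at all in externalinks would let infobox-generated links count as references (Living_people is used here as a stand-in for however the report actually selects BLPs):

SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
LEFT JOIN externalinks ON el_from = page_id
WHERE page_namespace = 0
  AND cl_to = 'Living_people'
  AND el_from IS NULL;    -- no external links recorded anywhere on the page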
A one-time report or a regularly updated report? The only way I can see to do this these days (given the size of enwiki's revision table) is to do individual queries for every talk page. At the moment, that's about 4,343,871 queries (many more if we include redirects), which is technically doable, but a little tedious and not inexpensive.
It's clearer now what you want, but it remains unclear why. That is, if you can explain further what you're trying to do with this report, there may be alternate ideas to solve the underlying task. Unless, of course, you're simply interested in the data and nothing more. --MZMcBride (talk) 18:21, 31 December 2012 (UTC)[reply]
Both: regularly updated and one-time report. The report would show which Article Talk pages were edited most each month (first 5,000). It is simply for statistical and historical purposes so that we know which articles are being and were discussed most on Wikipedia each month. I'm pretty sure there must be an easier way to ask the database: "give me the top 5,000 Article talk pages that were edited most on November 2012" and "give me the top 5,000 Article talk pages that are being edited most this month". This is akin to a "hot discussions" list and will allow editors to set an eye and contribute to discussions which are undergoing heavy influx—which, in turn, will make Wikipedia more collaborative while increasing the quality of its articles. —Ahnoneemoos (talk) 18:42, 31 December 2012 (UTC)[reply]
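The single-query version of that question is easy to write down; the catch, per MZMcBride's point about the size of enwiki's revision table, is that it scans essentially the whole table (a sketch):

SELECT page_title, COUNT(*) AS edits
FROM revision
JOIN page ON page_id = rev_page
WHERE page_namespace = 1                -- article talk pages
  AND rev_timestamp LIKE '201211%'      -- November 2012
GROUP BY rev_page
ORDER BY edits DESC
LIMIT 5000;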
This report has been very helpful to find misspellings that tools such as AWB and WPCleaner are programmed to ignore. However, with 1,000 entries, it's a daunting task to clean these up. In order to help prioritize the entries on this report, could someone please temporarily update the report to exclude piped links, so we can focus on those misspellings that are presented to the reader? For example, it's more important to fix [[Tennesee]] than it is to fix [[Tennesee|TN]]. I realize that this might exclude some piped links that present misspellings to the user, such as [[Millersberg, Michigan|Millersberg]] (instead of Millersburg), which is why I'm requesting this change to be temporary. Thanks! GoingBatty (talk) 02:41, 1 March 2013 (UTC)[reply]
The way the report is set up right now that isn't possible. It simply checks against the link tables, which only store the link target (what the user clicks on), not what they see. The only way to check for piped links would be checking against page text, aka a dump. Legoktm (talk) 02:44, 1 March 2013 (UTC)[reply]
Hello, I would just like to make a few requests here, partly because I am inspired by Cracker Barrel, and partly in response to a question I asked on the mathematics reference desk some time ago.
Semi-protected biographies of living persons
Semi-protected stubs
Semi-protected pages without interwiki links (meaning semi-protected articles without an article on any other Wikipedia)
Featured articles without interwiki links (meaning featured articles without an article on any other Wikipedia, like Cracker Barrel)
Thanks, although I would like these to be updated weekly, and for the semi-protected without interwiki links one to list the first 1000 pages. Also, I would like to request a few more similar to the ones above (more specific)
Is it possible to have this report include a column with a timestamp of the most recent edit? That would make it easy to find legitimate discussions that have just been started incorrectly. Also, I intend to do a lot of work with this report, so would it be possible to increase the frequency of runs to, say, every week instead of every month? -- Patchy109:03, 12 January 2013 (UTC)[reply]
The following would be ideal if someone was going to sit down and do it, but I know how much work it takes, so I know this might just be a wishlist. -- Patchy110:35, 13 January 2013 (UTC)[reply]
No.
AfD page
Article
Most recent revision
No. of entries in deletion log of article
Fetching the most recent edit would require an additional query to the revision table, and the number of entries in the log one would require a query to the logging table. It should be easy to get the article (simple string parsing) as another column. Legoktm (talk) 09:07, 18 January 2013 (UTC)[reply]
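Sketches of those two per-row lookups ('Example' titles are placeholders):

-- most recent revision of one AfD subpage (namespace 4 is Wikipedia:)
SELECT MAX(rev_timestamp)
FROM revision
JOIN page ON page_id = rev_page
WHERE page_namespace = 4
  AND page_title = 'Articles_for_deletion/Example';

-- number of deletion log entries for the article itself
SELECT COUNT(*)
FROM logging
WHERE log_type = 'delete'
  AND log_action = 'delete'
  AND log_namespace = 0
  AND log_title = 'Example';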
AWB sort
Could I have Wikipedia:AutoWikiBrowser/CheckPage split into two lists in my userspace? One of all users on that list who have not made at least 30 edits in the last three years or are currently blocked and another of all those who have made at least 30 edits in the last three years and are unblocked. One time run. Thanks. MBisanztalk05:15, 23 January 2013 (UTC)[reply]
Okay, I'm speaking nonsense - we've loads of featured articles lacking even a single solitary interwiki link ;) Report revised and reposted at the same location. - TB (talk) 19:22, 8 February 2013 (UTC)[reply]
New page "survival curves"?
Could someone analyse the pages (Article namespace or otherwise) created in the first half of 2012, say; compare their dates of creation to their dates of deletion (if any); and produce from this a survival curve of new pages? I.e. a graph that would show what percentage of pages were still around a week after their creation, what percentage were still around after two weeks, etc.? I would ask that on the graph you have one line per namespace, plus a line for all pages combined. This would be a one-off report.
I ask because I am thinking of proposing that, once Wikidata is more integrated into Wikipedia and new articles/project pages get added to it by bots as a matter of course, said bots be made to wait until the page is a certain number of days old on this project, so that editors won't have to delete entries from multiple wikis if e.g. the page gets speedied over here. If we were able to see what proportion of articles are deleted within a week, or what proportion of Template pages that survive a week go on to survive at least another three months, or whatever, we would be in a far better position to make an informed decision on how long to make bots hold off for (i.e. what to set that waiting period to). It Is Me Heret / c21:53, 16 February 2013 (UTC)[reply]
I'd be curious to see those numbers evolve as maybe a weekly thing, if it's not a particularly painful query. I can't say that it would have practical maintenance value, but I do think it would be interesting to watch the transition. *shrug* --j⚛e deckertalk00:39, 21 February 2013 (UTC)[reply]
I'm working on setting up an rrd, but the problem is that even if the langlinks are removed locally it won't make a difference, since links from Wikidata will show up in the same table. Legoktm (talk) 00:42, 21 February 2013 (UTC)[reply]
Well I'd rather not archive a request that is feasible and no one has just gotten to it yet, but I'll take a look and see which ones we can archive. Legoktm (talk) 01:08, 1 March 2013 (UTC)[reply]
Seeing as Wikidata is now live, I was wondering if there is a list of articles with interwiki links that point to sections (e.g. have a # in the link). They could be ordered by the number of interwiki links they have. Del♉sion23(talk)15:49, 7 March 2013 (UTC)[reply]
I am sure me and lego can come up with something for this. Also just a note: when I get around to it I should have a lovely db-generated webpage showing all IW links Addbot has left on any page globally, as well as the date of its last check etc. :) ·Add§hore·Talk To Me!15:56, 7 March 2013 (UTC)[reply]
The English Wikipedia has a culture against red links. At WP:FAC they are often viewed as a negative for "completeness", and the recommended action is to create stubs or delete them. And unlike stubs, it's fairly hard to add metadata to them, although the Video game reference library has been doing a pretty good job at it. See also: WP:RLR. — Dispenser17:04, 13 March 2013 (UTC)[reply]
non-free file size reduction requests - order by size
Would it be possible to simply get User:DASHBot to resume automatic reductions of images tagged for reduction? That way, the category would usually be more or less empty. I'm not sure why the operator stopped running the bot every day. --Stefan2 (talk) 23:40, 20 March 2013 (UTC)[reply]
I found an initial concern about automatically reducing images. I kind of share that sentiment. I've combed through the category of non-free file size reduction requests and found generally a small minority of those images are really large (>1000x1000 pixels), the rest being technically too large but not alarmingly so (maybe 400x500). If the sizes could be listed and if the list could be sorted, the largest offenders could be managed manually without much of a problem. I might also add that some images I've seen that are technically too large contain fine print or details and resizing them would ruin the image. Those I'm happy to pass on and I'd hate to see a bot come by and mangle them. – JBarta (talk) 00:14, 21 March 2013 (UTC)[reply]
It would be good if some of these reports were updated in light of the addition of the Module namespace for Lua code. I would be particularly interested in having an update for the most transcluded list that also included Modules. Dragons flight (talk) 18:56, 20 May 2013 (UTC)[reply]
Redirected templates
Is it possible to have a report which identifies templates which have been redirected and where the former usage isn't that high, typically around 20 uses? Sfan00 IMG (talk) 08:29, 25 April 2013 (UTC)[reply]
Wikipedia indefinitely semi-protected articles and BLPs without interwiki links
Hello, it's been a while since my last DBR request. Actually, I have two for now: indefinitely semi-protected articles (excluding BLPs) without interwiki links, and indefinitely semi-protected BLPs without interwiki links (that's two separate requests). So basically, the contents of Category:Wikipedia indefinitely semi-protected pages (excluding sub-categories) and Category:Wikipedia indefinitely semi-protected biographies of living people that do not have even a single interwiki link. The reason I'm asking for separate reports is that the maximum number of articles that can be returned is 100, so I would want to see a separate list for non-BLPs and BLPs, because otherwise they would be combined. Thank you. Narutolovehinata5tccsdnew04:45, 24 August 2013 (UTC)[reply]
I think it would be useful to have a list of fully-protected templates and modules that have few transclusions. I would set the limit at fewer than 10,000 transclusions (my usual rule of thumb for applying full protection to templates), and have the report run fortnightly. It would be a useful counterpart to Wikipedia:Database reports/Templates transcluded on the most pages - at the moment admins can patrol that report to find templates that need protection, but there isn't any report that admins can patrol to find what pages need their protection reducing. This has resulted in a steady increase in the number of protected templates over time (although some of that must also be attributed to the general growth of Wikipedia). — Mr. Stradivarius♪ talk ♪11:12, 13 September 2013 (UTC)[reply]
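A sketch of such a report (namespace 828 is Module):

SELECT p.page_namespace, p.page_title, COUNT(tl_from) AS transclusions
FROM page AS p
JOIN page_restrictions ON pr_page = p.page_id
LEFT JOIN templatelinks ON tl_namespace = p.page_namespace
    AND tl_title = p.page_title
WHERE p.page_namespace IN (10, 828)
  AND pr_type = 'edit'
  AND pr_level = 'sysop'          -- fully protected
GROUP BY p.page_id
HAVING COUNT(tl_from) < 10000
ORDER BY transclusions ASC;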
D'oh, I obviously should have looked harder... I missed it because it was listed under "protections" rather than under "templates". It could do with an update, though, as it was last run in February. — Mr. Stradivarius♪ talk ♪23:47, 20 September 2013 (UTC)[reply]
Biographical articles missing a DEFAULTSORT template - weekly?
There are a substantial number of biographical articles misplaced within categories due to no DEFAULTSORT template. Finding these in a large category is an art form. Many of them have arisen from quick take-on of articles without sufficient controls. Yes, I appreciate there are certain countries / languages where DEFAULTSORT is not required, but the benefit of this report would outweigh these exceptions. Itc editor2 (talk) 12:15, 9 November 2013 (UTC)[reply]
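A sketch, assuming the parser stores DEFAULTSORT in page_props under the 'defaultsort' property (and using Living_people as a convenient proxy for "biographical"):

SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
LEFT JOIN page_props ON pp_page = page_id
    AND pp_propname = 'defaultsort'
WHERE page_namespace = 0
  AND cl_to = 'Living_people'
  AND pp_page IS NULL            -- no DEFAULTSORT set
LIMIT 1000;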
Wikipedia indefinitely semi-protected articles and BLPs without interwiki links (again)
I'd like to have a report listing the articles with AFTv5 enabled which have unreviewed feedback older than, say, 2 weeks. On those pages, enablers failed to respect the requirements for usage of the tool set by the associated RFC, so something needs doing. --Nemo09:30, 23 November 2013 (UTC)[reply]
Please exclude anything with template-level transclusion (editable by template editors and admins, such as {{Navbar}}) - this is explicitly intended for high risk templates. (This is a newer setting/feature on Wikipedia, which is the reason it wasn't included here originally.)
Yes. It was down at the moment due to Tool Labs webserver issues, but it should be on-line now. As for adding reports, please see https://github.com/valhallasw/tsreports/tree/master/reports for how to add one. I've added you to tsreports-dev (please test your query there!), which should also give you access to tsreports. I should write some docs on writing & debugging queries, though... a few pointers: manual query testing can be done with python/QueryWorker.py, testing the web part should be done on tsreports-dev, and can be done with ./deploy (should show up at http://tools.wmflabs.org/tsreports-dev/ - run 'webservice start' if it has a 'no webservice' notice). If it works, push to github (for which I should also add you as maintainer...) and deploy to tsreports with deploy_to_production. Valhallasw (talk) 11:05, 30 June 2014 (UTC)[reply]
Request for version of Polluted categories report for Draft namespace
I have been using Wikipedia:Database reports/Polluted categories to have my bot remove article categories from user pages. I am now also having my bot remove article categories from pages in the Draft namespace. Could someone please create a similar report for article categories that are in the Draft namespace? Thanks! GoingBatty (talk) 14:14, 27 August 2014 (UTC)[reply]
Navboxes with wrong name parameters
I suggest a weekly report of cases where these are all satisfied:
Template:A links to Template:B's edit page (A ≠ B)
Template:A does not transclude Template:B, or if it does then Template:B is a redirect
Explanation: I think that nearly all navboxes satisfy condition 1. If not then more templates may be added to the list there. The name parameter in a navbox must be the template name to make V T E links to the correct page. This example makes a wrong name and would be caught by the report. The suggestion is similar to #Templates linking to other templates' edit pages, but I think that version would give too many false positives when a template transcludes a navbox from another template. The only goal is to find wrong name parameters. If other methods, for example checking the name parameter directly in the source code, are more efficient then just do that instead. PrimeHunter (talk) 01:15, 10 December 2013 (UTC)[reply]
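A rough approximation of conditions 1 and 2 (a sketch only: the v·t·e edit link itself lands in externalinks rather than pagelinks, so this uses the accompanying "view" wikilink as a proxy, and it ignores the redirect subtlety in condition 2):

SELECT a.page_title AS navbox, pl_title AS named_template
FROM page AS a
JOIN pagelinks ON pl_from = a.page_id
LEFT JOIN templatelinks ON tl_from = a.page_id
    AND tl_namespace = pl_namespace
    AND tl_title = pl_title
WHERE a.page_namespace = 10
  AND pl_namespace = 10
  AND pl_title != a.page_title
  AND tl_from IS NULL;           -- linked but not transcluded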
Good point about sandboxes. I suspected, and still suspect, the first report will reveal issues requiring tweaking. I think sandboxes and all other template subpages should just be skipped. If a bad name ends up in the main navbox then it will be picked up in the next report. I don't know whether we have cases where the main navbox itself is on a subpage. PrimeHunter (talk) 14:41, 10 December 2013 (UTC)[reply]
Template protected pages outside template or module namespace
Template protection is intended for use on modules or templates, or on rare cases of highly transcluded pages outside those namespaces. Occasional mistakes are made in selecting the protection level, for example applying template protection instead of semi protection, so such a report would be useful. Cenarium (talk) 04:05, 13 November 2014 (UTC)[reply]
Spurious entries in Broken WikiProject templates report
For the past couple of weeks, Wikipedia:Database reports/Broken WikiProject templates has contained spurious entries for WikiProject_Architecure, WikiProject_University_of_Connectict, WikiProject_military_History and WikiprojectBannerShell (by "spurious entries" I mean that the templates are listed in the report but no articles actually link to them). Three(?) weeks ago these were valid entries in the report, and I fixed the offending articles, but the templates have remained in the report for each of the next two weeks, even though no articles link to them. I wasn't concerned when they were still listed last week (I thought maybe there was some sort of database lag), but now that they're still listed again, I'm wondering whether there might be some problem with the generation of the report. DH85868993 (talk) 23:56, 20 November 2014 (UTC)[reply]
Broken links on Wikipedia:Database reports/Polluted categories
Thank you for continuing to update Wikipedia:Database reports/Polluted categories. I use this report to remove article categories from user pages. Today I tried the "main" and "user" links and noticed they still point to the toolserver. When I click on a link, it redirects to Tool Labs but gives a "404 - Not Found" error. Is this something that could be fixed? Thanks! GoingBatty (talk) 14:44, 6 July 2014 (UTC)[reply]
Templates containing links to disambiguation pages report
Am I right in thinking that the Templates containing links to disambiguation pages report only lists templates starting with numbers or the letters A-C? If so, would it be possible to have a report listing all the templates containing links to disambiguation pages? Also, most of the templates listed in the report contain "legitimate" links to disambiguation pages and don't actually need to be "fixed" - would it be possible to generate a variation on the report which excludes a small number of templates which contain many "legitimate" links (e.g. {{African topic}}, {{Asian topic}}) and a small number of commonly occurring target links (e.g. Article, Example, Sandbox, Test)? Thanks. DH85868993 (talk) 06:17, 18 April 2015 (UTC)[reply]
Hmm .. tricky. I can compile some raw stats, but they may be hard to interpret, as filters act in a confounding manner, altering the behaviour of the agent they act against. Picking a recent example at random, IP editor User:86.27.119.250 attempted to vandalise the article Grenada. From the abuse filter logs, we can see 10 attempted edits were blocked, variously triggering 4 different abuse filters a total of 17 times. A further two edits made it through, but were automatically reverted by User:ClueBot NG within a few minutes.
Arguably, we could say that the edit filter 'blocked' 10 edits here, but in reality had they not been present the editor would probably have only submitted two or three edits - we have no way of knowing. If you have a more specific ideas in mind, please let me know. - TB (talk) 11:28, 18 February 2013 (UTC)[reply]
Thanks, I see the problem. I'm looking into the arguments as to whether the declining number of edits is evidence that the community is in decline. My suspicion is that if we allow for the edit filters we may find that total attempted editing has been stable or rising since 2009, and that the perceived decline in editing is simply down to the edit filters taking out much of the vandalism. All I want is enough to confirm that the edit filters have significantly reduced actual editing by preventing some vandalism, and doing so without edits to revert that vandalism. I can see you won't be able to give me a neat "but for the edit filter we would have an extra x thousand vandalisms, plus for almost all of them a vandalism reversion and for a large proportion a talkpage warning". But could you give me something like "in a typical month we have x thousand IPs and new editors trying to make y thousand edits that the edit filter rejects; z thousand of them go on to make an edit that gets reverted"? Obviously the edit filters have reduced the amount of vandalism that gets through to the pedia, but are we talking less than 100,000 vandalisms a month or is it over a million? ϢereSpielChequers14:57, 19 February 2013 (UTC)[reply]
@Ϣere: If you are still interested in abuse filter stats, continue reading.
Total number of disallowed save attempts: 1,918,132
Disallowed save attempts in the last 30 days: 22,317
Total number of disallowed actions: 2,040,820 (these include edit, move, createaccount, autocreateaccount, delete and upload)
Technically abuse filters also prevent edits when they are set to warn, but the user does get a message when they try to save the page, and unlike with disallowed edits, a warning does not stop the user from re-submitting the same edit.--Snaevar (talk) 11:41, 14 July 2015 (UTC)[reply]
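For anyone wanting to reproduce or extend these numbers, here is a minimal sketch against the database replicas, assuming the abuse_filter_log schema of the time (afl_actions holds the comma-separated list of actions the filter took; afl_action is the action the user attempted):

 SELECT COUNT(*) AS total_disallowed
 FROM abuse_filter_log
 WHERE afl_actions LIKE '%disallow%';

 -- the same count restricted to the last 30 days
 SELECT COUNT(*) AS recent_disallowed
 FROM abuse_filter_log
 WHERE afl_actions LIKE '%disallow%'
   AND afl_timestamp >= DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY),
                                    '%Y%m%d%H%i%s');

 -- breakdown of disallowed actions by type (edit, move, createaccount, ...)
 SELECT afl_action, COUNT(*) AS disallowed
 FROM abuse_filter_log
 WHERE afl_actions LIKE '%disallow%'
 GROUP BY afl_action;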
Thanks Snaevar, that's much less than I'd expected, but I would assume that the warnings deter a lot of bad edits. Is it possible to get figures on the number of times a warning is issued and the warned editor then doesn't save the same edit? ϢereSpielChequers11:51, 14 July 2015 (UTC)[reply]
Container categories are currently defined as categories which should only contain subcategories. I have been doing some work on diffusing container categories that directly contain pages/files. It would be helpful to have a report of container categories which directly contain pages/files, perhaps weekly. Slivicon (talk) 03:20, 7 August 2015 (UTC)[reply]
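A report of this shape can be driven by a single query. Here is a sketch, assuming container categories are identifiable by membership in Category:Container categories (the tracking category populated by the {{container category}} template):

 SELECT cat.page_title AS container, COUNT(*) AS direct_members
 FROM page AS cat
 JOIN categorylinks AS tag
   ON tag.cl_from = cat.page_id
  AND tag.cl_to = 'Container_categories'
 JOIN categorylinks AS member
   ON member.cl_to = cat.page_title
 JOIN page AS mp
   ON mp.page_id = member.cl_from
  AND mp.page_namespace IN (0, 6)        -- pages and files only, not subcategories
 WHERE cat.page_namespace = 14
 GROUP BY cat.page_title
 ORDER BY direct_members DESC;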
Thank you, @TB for providing the SQL query for the report. I'll create a new report under Database Reports and set it for weekly updating. Cheers! -- NKohli (WMF) (talk) 04:30, 17 October 2015 (UTC)[reply]
User pages of nonexistent users
It would be useful to have a report that lists user pages, user talk pages and subpages that are associated with an account that doesn't exist. These pages are eligible for deletion under CSD U2. They are usually created in error, by trolls, or as a result of a bad page move. And it is often seen that these pages exist for months before they're deleted or moved to the correct location. 103.6.159.72 (talk) 15:24, 23 October 2015 (UTC)[reply]
I'll have a look at this; it's not entirely straightforward to check whether an account with a given name exists (because of Extension:CentralAuth mediated 'global accounts'). Something must be possible though. - TB (talk) 19:18, 24 October 2015 (UTC)[reply]
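As a first pass, one could anti-join namespace-2 base titles against the local user table. This is a sketch only; as noted above, CentralAuth global accounts (and IP talk pages, if namespace 3 were included) need separate handling:

 SELECT page_namespace, page_title
 FROM page
 LEFT JOIN user
   ON user_name = REPLACE(SUBSTRING_INDEX(page_title, '/', 1), '_', ' ')
 WHERE page_namespace = 2                 -- User: pages and their subpages
   AND user_id IS NULL;                   -- no local account of that name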
Righty. We have 49,289 pages in namespace 2 for which no local account exists. The majority of these exist with good reason.
Is there any way we could have the "linked misspellings" report run 2-3 times a week instead of once? Daily would be ideal, but I don't know how severe of a load it puts on the servers. Thanks. Faceless Enemy (talk) 02:57, 15 December 2015 (UTC)[reply]
I would like to see a database report for pages with ambiguous dates in cite templates for |accessdate=. An ambiguous citation date would be something of the form 01/02/03, which could be either January 2, 2013 or February 1, 2013. If one of the day/month numbers is greater than 12 it is not ambiguous, e.g. with 13/01/03 it is clear what the date must be.
It should be run every week because some of these dates can be fixed (by a user or a bot) if they are caught soon enough; for instance, if a date was found today (15 April 2014) reading 05/04/14, it must mean April 5, 2014, since the other interpretation, May 4, 2014, has not passed yet.
Jamesmcmahon0 (talk) 15:30, 15 April 2014 (UTC)[reply]
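Setting aside how candidate dates would be harvested from the wikitext, the ambiguity test itself is small. A sketch, assuming the two numeric parts a and b and a two-digit year yy have already been extracted from |accessdate=a/b/yy into a hypothetical candidate_dates table:

 SELECT a, b, yy
 FROM candidate_dates
 WHERE a BETWEEN 1 AND 12                 -- both parts could name a month...
   AND b BETWEEN 1 AND 12
   AND a <> b                             -- ...and the two readings differ
   -- ...and neither reading can be ruled out as a not-yet-reached access date
   AND STR_TO_DATE(CONCAT(yy, '-', a, '-', b), '%y-%m-%d') <= CURDATE()
   AND STR_TO_DATE(CONCAT(yy, '-', b, '-', a), '%y-%m-%d') <= CURDATE();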
01/02/03 cannot be either January 2, 2013 or February 1, 2013. It might be: 1 February 2003; January 2, 2003; or 2001-02-03 (2001 February 3). For some people, it might even be 2001, 2 March; but however you read it, the year isn't 2013. --Redrose64 (talk) 18:33, 16 April 2014 (UTC)[reply]
Oops, I meant the date 01/02/13, i.e. 1 February 2013 or January 2, 2013. You are of course right, that was a silly typo on my part. I've never seen the date formats YY/MM/DD or YY/DD/MM - are they common? I was assuming the ambiguity would only come from DD/MM/YY vs MM/DD/YY; if people write the year first it could be a harder task... Jamesmcmahon0 (talk) 20:54, 16 April 2014 (UTC)[reply]
See ISO 8601#Truncated representations. My father (who was a bit wacky) tended to write dates as YY/MM/DD, this began when he was creating computer databases (this was in the 1970s, no MySQL back then) and needed to sort records by date - so he programmed it to store dates as strings formatted YY/MM/DD. Then he started using the same format when writing dates down in everyday life...
Interesting concept - if the report could quickly identify dates such as "07/04/2014", we could assume that this is 7 April 2014 and fix it before July 4 when it would then appear to be ambiguous. GoingBatty (talk) 03:19, 17 April 2014 (UTC)[reply]
This is a great idea. If Category:CS1 errors: dates were already empty instead of having 80,000+ articles in it, it would be easy to watch it for new errors and correct them quickly, as we do with a dozen other CS1 error categories that have been emptied by diligent gnomes. We could also deploy ReferenceBot to notify editors when they insert an ambiguous date. Both of those require the category to be empty first, and it's going to be quite a while before that happens. Jamesmcmahon0's idea is a good one that could reduce the addition of new articles to the category while we work to empty it of old errors. – Jonesey95 (talk) 04:16, 17 April 2014 (UTC)[reply]
I thought I was really hammering the Red-linked categories with incoming links list, I've done several hundred of late. But now I've found that it seems to be missing a bunch of categories which are needed but not appearing in the report. For instance, Category:1994 in Fijian rugby union (and 1995/6/7/8) should have been in there but weren't, along with 1990/2001/2/3/6/7 which were in the report. The cats had been on the articles since creation back in August 2013. Another one was the article 1997–98 Coca-Cola Triangular Series which had been tagged (incorrectly as it happens) with both Category:1998 in Bangladeshi cricket and Category:1998 in Kenyan cricket since creation 10 days ago. The former was in the report, but not the latter. I'm not sure what's going on, it feels like being on an article when it's first created could be part of the "problem" but that doesn't explain the cricket example unless it's something like "the report only sees the first category in a new article"?!?!? I leave it to others to work out what's going on, but I thought I'd give a heads-up. Ah well, back to category creation - I've nearly done all the xxxx in Yyyyy categories on that report now, the next time it runs we might even get some non-year categories appearing!!!! Le Deluge (talk) 22:55, 26 February 2016 (UTC)[reply]
Months ago, I was creating dozens and dozens of these categories. It's nice to see I wasn't the only one taking on these mundane activities. LizRead!Talk!23:28, 26 February 2016 (UTC)[reply]
I'd noticed that over the last 6 months #1000 on that report had moved from the 1920s to the end of the 1980s, so someone had obviously been working hard! My Wiki time tends to be sporadic so I'm not much good for day-to-day stuff, but I tend to get the bit between my teeth and attack a backlog now and then. In this case I set myself a target of taking the end of that report past 2016 and into the non-year categories. I did about 250 off the last report and have so far done 400 off the current one; I've very nearly killed everything that isn't a XXXX(s) (dis)establishment. I'll probably do a few disestablishments next, but I won't be doing many establishments until the next run of the report, if you fancy something to do.... <g> PS I've found another "miss" - the report only showed 1999 out of Category:Track racing by year but most of the 90s were needed. Each had a single Polish speedway category created as a single edit in May 2013 - but there's no obvious difference between 1999 and the others. Le Deluge (talk) 15:13, 27 February 2016 (UTC)[reply]
Looking at the example given (Category:1994 in Fijian rugby union) I see that it has an incoming category link (from 1994_Wales_rugby_union_tour) but no page links. Either the query underlying the report is looking at the pagelinks table (rather than categorylinks) in the database, or it is suppressing entries with no incoming links from namespace 0. - TB (talk) 12:02, 1 March 2016 (UTC)[reply]
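A rough sketch of the categorylinks-based check TB describes - red-linked categories together with a count of their members from any namespace - for comparison with whatever the report currently does:

 SELECT cl_to AS red_linked_category, COUNT(*) AS members
 FROM categorylinks
 LEFT JOIN page
   ON page_namespace = 14
  AND page_title = cl_to
 WHERE page_id IS NULL                    -- the category page does not exist
 GROUP BY cl_to
 ORDER BY members DESC;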
@Topbanana - I don't think that's it, because I think 1990 is similar and that was in the report. I think it's tied up with the thing I've discovered in the section above: when a new category has only a single edit, it doesn't register in any categories it "should" belong to - at least, at the time the report first picks up on it; it may get more complicated once the bot starts caching its results.... To be honest I haven't fully characterized the problem, but single-edit categories seem to be a big part of it. A null edit should fix it, as with the false positives above, but the trouble with false negatives is that you don't know they're out there! We've got thousands of new categories to create (I'm just shy of 500 from this week's version of this report alone), so I guess there's no hurry, but it's annoying knowing that the report is incomplete. Le Deluge (talk) 22:06, 1 March 2016 (UTC)[reply]
Request to expand Broken WikiProject templates report
Thanks - although you'll notice from the lack of question marks I wasn't actually asking a question here; I was merely providing an update on an existing conversation for the benefit of others here who may have code that looks for Category:Wikipedia category redirects. Don't worry, any detailed SQL queries will get asked over on Quarry - as you pointed out, it looks like my problem was a lack of underscores, even though adding the underscores takes my code over the Quarry timeout. Le Deluge (talk) 16:50, 24 February 2016 (UTC)[reply]
(merge threads)
I didn't even know there was a talk page here that folks read. But now that I do, there has been something that has bugged me for a while, and that is that the Empty Categories list shows quite a few soft-redirect categories. This occurs when an admin moves a category (which only happens as a result of a CFD request) and leaves a redirect at the old category that points to the new one. There are hundreds of these soft-redirect categories that are empty, but it seems like ones that have been recently emptied appear day after day on the Empty Categories list.
I don't know if there is some page purge that needs to be done or some feature reset or a bug that needs to be fixed but could these categories (which are intentionally and perpetually empty) stop appearing on this list? Thanks for any help you can offer. LizRead!Talk!23:26, 26 February 2016 (UTC)[reply]
I just noticed that Le Deluge has already mentioned this problem (above). Well, I guess you can consider this confirmation that it's still a problem. LizRead!Talk!23:30, 26 February 2016 (UTC)[reply]
@Liz - I've been having a bit of a play with SQL over at query 7554 and although I haven't completely defined the problem, I think I've worked out a workaround. The report looks for categories containing no articles, and then filters out a subset based on certain criteria. We're interested in the criterion where it filters out empty categories that are a member of Category:Wikipedia soft redirected categories. However, there seems to be a bit of a bug when the empty category only has one edit, i.e. it was added to Category:Wikipedia soft redirected categories when it was created and there have been no further edits. In this case MediaWiki can't "see" the category until there's been another edit. That can include a WP:NULLEDIT (at least, adding a space at the end works, but not making no change whatsoever), so I suggest you throw the list of categories in that report at AWB and get it to do a null edit on each one. I've done it to the XXXX in Comoros articles on that report and they dropped out of my SQL query, so hopefully they will also be missing from the next run of that report.
It's also something to be a little bit wary of with empty categories: they may appear to be empty because a daughter category/article isn't registering with MediaWiki. For instance, I created Category:1724 establishments in Spain before creating Category:1724 establishments in the Spanish Empire as a daughter. Because the daughter was a new single-edit category, its categorization as a member of the parent was not registering in MediaWiki, so the parent appeared empty on screen and was showing up in a SQL query looking for empty categories. A null edit to the daughter category made it show up in the parent. Now that I know this can happen, I'll try not to create parent before daughter in future, but it's worth being aware of. I suspect something similar is also causing the problem below. Le Deluge (talk) 22:06, 1 March 2016 (UTC)[reply]
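The filtering logic being described has roughly this shape (a sketch, not necessarily the exact text of query 7554):

 SELECT cat.page_title
 FROM page AS cat
 LEFT JOIN categorylinks AS member
   ON member.cl_to = cat.page_title       -- anything still inside the category
 LEFT JOIN categorylinks AS soft
   ON soft.cl_from = cat.page_id
  AND soft.cl_to = 'Wikipedia_soft_redirected_categories'
 WHERE cat.page_namespace = 14
   AND member.cl_from IS NULL             -- empty...
   AND soft.cl_from IS NULL;              -- ...and not a known soft redirect

The second anti-join is exactly what fails when a single-edit soft redirect's own categorylinks row was never written, which would explain why those categories leak back into the report until a null edit forces the row to be created.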
Per WP:NULLEDIT "Adding new blank lines only to the end of the page is also usually a null edit" and adding a space at the end seems to count likewise, it doesn't get recorded as an edit. In my limited experience, a "pure" null edit doesn't always manage to purge the cache but it seems to work better if you make the system do a little bit of work in trimming a space off the end.Le Deluge (talk) 16:59, 2 March 2016 (UTC)[reply]
I got curious and ended up filing phabricator:T128701 about the MediaWiki categorylinks table not updating for en.wikipedia.org. This issue probably affects other Wikimedia wikis, too. In general, you shouldn't ever need to manually purge or null edit a page. Users have these tools to force pages to update, but they are hacks. If users are regularly feeling the need to manually purge or null edit pages, the software is broken and needs to be fixed. --MZMcBride (talk) 04:53, 3 March 2016 (UTC)[reply]
@MZMcBride: So where are we with this? My impression is that it's still happening - possibly not quite as much as it was, but definitely still happening. For instance, my name comes up 7 times on the latest uncategorised categories report. Aside from one where I was just a bit too quick to revert an IP blank without checking there were cats on the previous version <cough>, there are 6 recent examples of categories with cats not being recognised by the report. The Red-linked categories with incoming links report has two even more difficult cases: Category:Orphaned non-free use Wikipedia files as of 7 March 2016 and Category:Proposed deletion as of 8 March 2016 are stuck in limbo because, having been deleted, they can't be null-edited. I suspect the only way to clear them from the system is to create them and then delete them again, but for the time being they're probably more useful being preserved for techies to play with. Still, RLCWIL is looking a lot happier now - it's now (just!) under 300 to go, and I've got some ideas on how to clear out the User cats that make up much of the remainder.... Le Deluge (talk) 23:23, 23 March 2016 (UTC)[reply]
Wikipedia:Database reports/Editors eligible for Autopatrol privilege not running
Hi, Wikipedia:Database reports/Editors eligible for Autopatrol privilege hasn't run for a year now; I think we lost it in the move to Labs. I asked the original writer a few months ago, but they don't seem to be around much and haven't responded to my request on their talkpage. There is bound to be a whole crop of editors out there who merit being made autopatrollers, but we really need a fresh list to trawl for them. There are several advantages to appointing a bunch of autopatrollers, not least that it lightens the workload at newpage patrol. I think the criteria were something like:
Not an Admin, Bureaucrat or Autopatroller
Has created an article other than a redirect in the last month
Has created over 50 articles other than redirects in total
Has not had the Autopatroller right revoked in the last 12 months
If the report could also exclude anyone with a copyvio block from the last 12 months that would be great, but I suspect that sort of thing would best be done by admins trawling the list. ϢereSpielChequers10:43, 29 June 2015 (UTC)[reply]
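A sketch of how the first three criteria might translate into SQL against the replicas; the internal group name for autopatrolled is assumed to be 'autoreviewer', and the revocation and copyvio-block checks are left to the human trawl as suggested above:

 SELECT user_name,
        COUNT(*)           AS articles_created,
        MAX(rev_timestamp) AS latest_creation
 FROM revision
 JOIN page ON page_id = rev_page
 JOIN user ON user_id = rev_user
 LEFT JOIN user_groups
   ON ug_user = user_id
  AND ug_group IN ('sysop', 'bureaucrat', 'bot', 'autoreviewer')
 WHERE rev_parent_id = 0                  -- page-creating revisions only
   AND page_namespace = 0                 -- articles
   AND page_is_redirect = 0               -- skip pages that are currently redirects
   AND ug_user IS NULL                    -- not admin/bureaucrat/bot/autopatrolled
 GROUP BY user_name
 HAVING COUNT(*) > 50
    AND MAX(rev_timestamp) >= DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY),
                                          '%Y%m%d%H%i%s');

Note that page_is_redirect reflects a page's current state rather than its state at creation, so the redirect test here is only approximate, and a full scan of the revision table is expensive as written; a production report would need to be more careful on both counts.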
Hi, I'm working on fixing these broken reports. Could you confirm whether the criteria you've stated above are correct? Or could you link me to where the criteria for granting the autopatrol privilege are stated? I'd be able to get it back up very soon, hopefully. NiharikaKohli (talk) 17:55, 27 August 2015 (UTC)[reply]
Wikipedia:Autopatrolled mentions the criteria for getting the autopatrolled right. As per that page, users can get the right when they have created 25 articles, excluding both redirect pages and disambiguations. That page mentions that administrators do not need this right as they already have the "autopatrol" right. The page does not however mention bots, but it would make sense to exclude them, because bots have the "autopatrol" right as well (see Special:UserGroupRights). Excluding autopatrolled users also makes sense, as obviously they already have the right too.--Snaevar (talk) 15:13, 31 August 2015 (UTC)[reply]
Hi, this is fixed now but we do see some "bots" in the results. These are bots operating without a bot flag. Suggestions for improvement are welcome! NiharikaKohli (talk) 16:58, 11 September 2015 (UTC)[reply]
I'm not worried about the bots, but there are loads of accounts there that should fail the test "Has created an article other than a redirect in the last month". Far too many for the report to be worth manually going through as a source of potential autopatrollers. If we can get that part of the query working then the former bots will disappear along with former admins and the other retired accounts and the report will become useful. ϢereSpielChequers14:25, 17 October 2015 (UTC)[reply]
Would it be possible to split this into multiple reports, by namespace (some could be grouped)? The problem is that this is almost always hitting the select limit due to certain namespaces having legit pages in this classification (or at least pages that need to be dealt with in a different manner). — xaosfluxTalk14:44, 6 March 2016 (UTC)[reply]
I'm not sure about multiple reports. But a single report with namespace sections or a single report with an extra "namespace ID" table column would probably solve this request. The advantage to doing an extra table column is that you could then sort by page namespace ID and you could continue to sort by total overall size. Sorting by overall size across namespaces wouldn't be possible with sections. Thoughts? --MZMcBride (talk) 22:58, 6 March 2016 (UTC)[reply]
MZMcBride The main issue is that the 2000-row fetch limit is getting consumed by pages that are "ok" to be there - possibly obscuring other cleanup that may be needed. If this would result in 2000 of EACH namespace, it would fix that. — xaosfluxTalk01:57, 15 March 2016 (UTC)[reply]
For some reason, I thought this report had a page size column when I wrote the previous reply. I'm now realizing yet again that blank pages will all have a page length of 0 bytes, of course.
Looks like if you just sort by namespace, that would at least knock all the User talks down to the bottom of the list, which would hopefully allow the first 2000 to be more interesting pages? As an aside, any chance of reviving Wikipedia:Database_reports/Categories_categorized_in_red-linked_categories - it might encourage a few people to have a go at some of them. Looking at a direct database query, it's down to just over 5000 now; I plan to do some once I'm finished with the red cats with incoming links, which should look a lot healthier when the next report runs tomorrow. <g> Le Deluge (talk) 15:52, 15 March 2016 (UTC)[reply]
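If the report in question is essentially "pages with zero length" (as the 0-byte remark above suggests), the namespace-first ordering is a one-line change. A sketch, assuming that definition and the 2000-row cap mentioned above:

 SELECT page_namespace, page_title
 FROM page
 WHERE page_len = 0                       -- blank pages
   AND page_is_redirect = 0
 ORDER BY page_namespace, page_title
 LIMIT 2000;

For "2000 of EACH namespace", the same query would instead be run once per namespace with an added page_namespace = N condition.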
@Thparkth: I noticed for the past few runs there has been a lot of false positives. Can you check the bot to see why it is picking up so many false positives? Thanks. -- Gogo Dodo (talk) 07:01, 1 May 2016 (UTC)[reply]
Technically, shouldn't this report exclude from its counts entries where the sole link is a file inclusion on the report page itself? Sfan00 IMG (talk) 12:22, 28 May 2016 (UTC)[reply]
MediaWiki extension for database reports
I have started a discussion on Phabricator around the idea of having database reports built into Wikipedia's software, rather than rely on the work of bots. There are no firm plans yet; I'd like to gather people's thoughts on the idea. Please comment on the Phabricator ticket. (If you are not familiar with Phabricator, you can log in using your Wikipedia account information if you click the Login or Register: MediaWiki button.) Harej (talk) 12:14, 5 June 2016 (UTC)[reply]
Typically, categories that are a) involved in a CFD discussion, b) redirect categories, or c) tagged with {{empty category}} are not included in this list. But this isn't the case any more. There are a lot of categories that have {{empty category}} tags that are suddenly now included on this list, and all three of these types of categories should be excluded. Can this be fixed? Thanks. LizRead!Talk!13:15, 17 May 2016 (UTC)[reply]