This is an archive of past discussions on Wikipedia:Database reports. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
In other words, files that are currently tagged as having both a 'free' and 'non-free' license, or which have a 'free' license but also have a 'fair-use' rationale present.
Is there some way to identify images using a free license? Is there a master category of free license templates somewhere? I know all non-free files go into Category:All non-free media, but there doesn't seem to be a corresponding category for free media. --MZMcBride (talk) 17:55, 4 January 2010 (UTC)
Category:Free_images and its subcategories is a possibility, but it's subcategories, not just one great big one...
Further to the last message, someone called erwin on #wikimedia-toolserver came up with the following:
SELECT page_title
FROM page
JOIN categorylinks AS c1
ON c1.cl_from = page_id
JOIN categorylinks AS c2
ON c2.cl_from = page_id
WHERE page_namespace = 6
AND c1.cl_to = 'All_free_media'
AND c2.cl_to = 'All_non-free_media'
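A similar shape should work for the second case, free files that also carry a fair-use rationale (a sketch, untested; 'Non-free_use_rationale' is only a stand-in for whichever rationale templates are actually of interest):
SELECT page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
JOIN templatelinks
ON tl_from = page_id
WHERE page_namespace = 6
AND cl_to = 'All_free_media'      -- tagged free
AND tl_namespace = 10
AND tl_title = 'Non-free_use_rationale'  -- but transcluding a rationale template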
Could I prod you into running the indef semi prot one? There's an ongoing WP:AN about indef semis and I'd like to see the current data. –xeno talk 18:42, 23 March 2010 (UTC)
I would like to generate a report on editors who are eligible for autoreview privileges (and don't already have them):
Search for all editors with more than 50 created non-redirect articles.
Remove entries which already have administrator or autoreview privileges.
I suppose this report could be run quarterly—once every three months or so. I asked around to see if anyone would be willing to make a bot to do this, and I was pointed here by Tim1357. —Preceding unsigned comment added by Cryptic C62 (talk • contribs)
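For the created-articles criterion, a first cut might look something like this (a sketch, untested and likely slow; the group names 'sysop' and 'autoreviewer' are assumed):
-- editors whose count of first revisions of non-redirect articles exceeds 50,
-- excluding admins and existing autoreviewers
SELECT rev_user_text AS editor,
       COUNT(*) AS articles_created
FROM page
JOIN revision ON rev_page = page_id
WHERE page_namespace = 0
AND page_is_redirect = 0
AND rev_timestamp = (SELECT MIN(r2.rev_timestamp)
                     FROM revision AS r2
                     WHERE r2.rev_page = page_id)
AND rev_user NOT IN (SELECT ug_user
                     FROM user_groups
                     WHERE ug_group IN ('sysop', 'autoreviewer'))
GROUP BY rev_user_text
HAVING COUNT(*) > 50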
Yay! I am more than slightly clueless, so bear with me here: Is there a way to use DBQ-87 to generate a new report every few months? Or is it just a static report? In the latter case, I think it would be helpful to make the report here. On the other hand, this query generated over 900 results, so it may not even be necessary to run it again any time soon considering how long it would take to process that many candidates. Your call. --Cryptic C62 · Talk 04:01, 17 March 2010 (UTC)
The DBQ is static. The ticket has been resolved and isn't set to re-run in the future. A report here would be set to update every X weeks or months or whatever. (This is actually how most queries got run before I set up "Database reports" with automation.) If/when you need the query re-run, just leave a note here and I'll use that code that Mauro wrote to make something that updates however often you'd like. --MZMcBride (talk) 04:15, 17 March 2010 (UTC)
That sounds like an excellent plan to me, thanks mate! I'll come back in a few months when this batch is done. In the meantime, here's something to think about: Of the candidates that these queries generate, some will probably not be granted autoreview privileges. In order to avoid having these users show up in every subsequent report, I could maintain a subpage that lists the rejected candidates. I assume it would be fairly straightforward for the automated report to perform an additional check to see if the user is on that list, correct? --Cryptic C62 · Talk 04:21, 17 March 2010 (UTC)
Simple enough to program. Something similar is already done with the edit count list (currently living at User:MZMcBride/Sandbox 3), which has both an opt out list (for unflagged bots) and an anonymize list. You may run into political issues making a list of users, though. Sometimes people get testy with such things (look up "User:Kelly Martin/B" sometime if you're bored). Just let me know sometime in the future and we'll figure out a decent solution then. --MZMcBride (talk) 04:29, 17 March 2010 (UTC)
Yikes, there's a problem! It seems that the line of code r2.rev_timestamp > '20091231235959' isn't working. Many of the editors on the list are no longer editing. For now I can just check their most recent contributions, but what can be done to rectify this permanently? --Cryptic C62 · Talk 02:18, 18 March 2010 (UTC)
So... I guess I didn't get beat out after all? Can you give me an example user that's wrongly listed in the current list and an example user that's correctly listed? I should be able to run this report properly this week. Thanks. --MZMcBride (talk) 02:22, 18 March 2010 (UTC)
Certainly. 2fort5r, 45ossington, and 4meter4 were all correctly listed and approved, as they meet all the requirements and have edited in 2010. 40fifw0, 6afraidof7, and 9Nak were all incorrectly listed, as they meet the requirements but have not edited in 2010. On a side-note, it may be useful to add a check that the user is not currently blocked. I assume this would also be easy to implement? --Cryptic C62 · Talk 02:33, 18 March 2010 (UTC)
The static report is becoming less useful over time. In the last batch of 6 candidates, 2 had already been given the autoreview privilege since the report was first compiled, plus there's always the problem of inactive users. I've started an exclusion list at User:Cryptic C62/Exclusion; any chance y'all could run the report again using the new criteria? --Cryptic C62 · Talk 16:05, 23 April 2010 (UTC)
If a user is not suitable for the autoreview privilege but might become suitable later (for example, if they haven't edited/created articles in a while), then I don't put them on the exclusion list. If a user is not suitable for the privilege and they are unlikely to become suitable later (for example, if they are indefinitely blocked or have a number of article warnings on their talk page), then I do put them on the exclusion page. Also, if I nominate a user but an admin rejects the nomination, then I will put that user on the exclusion list. Basically, I just want to avoid having to process the same users over and over again if there's no chance that the later nominations will be successful. The ideal report would produce users with all of the following attributes:
(edit conflict) Do you have any example pages? I don't think this is possible to (easily) query, but I won't know for sure until I try it out on a few example pages. Also, is this something you'd like a regularly updated report for or just a one-time thing? What's the ultimate goal of this report (there might be a better way to find similar data)? --MZMcBride (talk) 17:21, 26 March 2010 (UTC)
This is not wikiproject-specific. I will try to find examples; I was thinking that the hard part is knowing when an article was last rated, which doesn't need examples; am I wrong? A one-time report is enough, I think. Sole Soul (talk) 19:29, 26 March 2010 (UTC)
If it's not WikiProject-specific, the report is going to be huge - and who is going to go through it? Most projects already have trouble keeping their unassessed articles out of backlog - let alone re-rating existing ones... –xeno talk 19:39, 26 March 2010 (UTC)
This is only a report for Start-class pages that have doubled in size since they were last rated; I doubt that it is gonna be huge. If that is the case then we can start with the first 100. An outdated rating is less useful than no rating. Sole Soul (talk) 19:54, 26 March 2010 (UTC)
I'm no programmer, but maybe this will help:
Step 1: Find all talk pages that contain class=start.
Step 2: Search through revisions backwards to find the timestamp of the earliest revision with class=start.
Step 3: Search through revisions of the actual article to find the revision with the largest timestamp that is older than the aforementioned talk page timestamp. (This is the revision of the article that was reviewed when class=start was added.)
Step 4: Find prose size of that article revision.
Step 5: Find prose size of current revision.
Step 6: If current prose ≥ 2x old prose, add to report.
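Steps 3–5 can at least be approximated against the revision table once the step-2 timestamp is known (a sketch; @article_id and @rated are placeholders for the values found in steps 1–2, and rev_len is the raw byte size of the revision rather than prose size):
-- latest article revision older than the talk page rating timestamp (step 3),
-- returning its size (step 4); compare against the current page_len (steps 5-6)
SELECT rev_id, rev_len
FROM revision
WHERE rev_page = @article_id
AND rev_timestamp < @rated
ORDER BY rev_timestamp DESC
LIMIT 1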
It could be useful to have a report listing user subpages, and maybe also user talk subpages, of indefinitely blocked users blocked over a year ago, as they often contain inappropriate content. A report on main userpages and user talk pages would need plenty of exceptions to become usable, but that's probably not the case for subpages. Cenarium (talk) 22:57, 11 April 2010 (UTC)
Very old user subpages of long-term inactive users
Per discussion at Wikipedia:Village_pump_(development)#Bot_to_delete_very_old_userspace_pages, would it be possible to get a report listing very old user subpages of long-term inactive users? We could start with the criteria (on an AND filter basis) i) subpage older than 5 years ii) user's last edit (to any page) more than 5 years ago. It's quite possible that this listing would contain nothing at all, in which case we could try it with 3 years instead of 5. PS Providing a diff and date of the last edit of the subpage could be helpful. Rd232 talk 12:45, 12 April 2010 (UTC)
It's running now. When it finishes it will be saved here. The SQL:
Thanks, but that's a list of old userpages. That may be of interest as well, but less so. I'm after user subpages. Also (and this must be a perennial question) can't the report provide clickable links to the pages in question? Copying and pasting will waste a lot of time here. Rd232 talk 07:32, 17 April 2010 (UTC)
Yeah, that list from Tim1357 is useless to you; it doesn't contain any subpages. I'll look at making a proper report. --MZMcBride (talk) 17:34, 17 April 2010 (UTC)
Wow, I fail at reading. I didn't take anybody else's SQL, so maybe just great minds think alike? I'll do another query for subpages and post a link when I am done. Tim1357 (talk) 17:59, 17 April 2010 (UTC)
Great minds read documentation, I think you mean. I wasn't accusing you of anything, but the REPLACE(), DATE_FORMAT(), and SELECT MAX(rev_timestamp) bits all look familiar. Anyway, your problem is that you're JOINing page ON user, which will inherently eliminate all subpages. I wouldn't worry about this one, I'm about to write it. --MZMcBride (talk) 18:04, 17 April 2010 (UTC)
I already did. Running, and it will save here when done. New SQL:
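(For illustration only, not the actual query linked above: a rough shape for listing stale user subpages, with a fixed five-year cutoff. The second criterion, that the user's own last edit anywhere is more than five years old, would still need a separate check against their contributions.)
SELECT CONCAT('User:', page_title) AS subpage,
       MAX(rev_timestamp) AS last_edited
FROM page
JOIN revision ON rev_page = page_id
WHERE page_namespace = 2
AND page_title LIKE '%/%'      -- subpages only
GROUP BY page_id
HAVING MAX(rev_timestamp) < '20050417000000'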
Thanks... but is there really no way for the reports to be linked to the pages in question, so they can be seen with one click? Rd232 talk 22:12, 19 April 2010 (UTC)
Pages with four or more WikiProject banners, no shell, and at least one heading
I would like a list of talk pages with four or more WikiProject banners, no shell, and at least one ==heading==. –xeno talk 18:38, 14 April 2010 (UTC)
I imagine the first iteration of this will be rather large, so you can just store it somewhere before making it into an auto-refreshing report. –xeno talk 18:39, 14 April 2010 (UTC)
I was applying for a toolserver account and ran into a snag (couldn't log into JIRA). Emailed the people who were responsible and never got a reply. =\ –xeno talk 19:23, 14 April 2010 (UTC)
Try again? I don't know of any known issues regarding account creation in JIRA. If you hit a wall, post the error message (here or on my talk page) and I'll see that it gets looked at. --MZMcBride (talk) 19:25, 14 April 2010 (UTC)
Why are you suggesting regex on the page text here? You're interested in pages not using Template:WikiProjectBannerShell that contain 5 or more instances of templates that start with "WP" or "Wikiproject"? That seems mostly like database work (the database will be far more accurate than your shitty regex, at least). The additional == header requirement isn't really too difficult to implement in a database report as well. Much as I loathe dealing with page text, some of the current reports do so. Would the implementation I suggested work, and if so, do you have a report title in mind? --MZMcBride (talk) 19:29, 14 April 2010 (UTC)
It would probably be more robust to find pages transcluding 4 or more members of Category:WikiProject banners (recursive), not transcluding {{WikiProjectBannerShell}}, but I am not a wizard like you. The == part can be left off if it's easier, but I'll probably still be ignoring pages without at least a header (no need to save space when there's nothing below to see). The above is the hack I came up with =)
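For the first two criteria, something along these lines might do (a sketch, untested; it only counts direct members of Category:WikiProject banners rather than recursing into subcategories):
SELECT talk.page_title,
       COUNT(DISTINCT tl_title) AS banners
FROM page AS talk
JOIN templatelinks ON tl_from = talk.page_id
JOIN page AS tpl ON tpl.page_namespace = 10
                AND tpl.page_title = tl_title
JOIN categorylinks ON cl_from = tpl.page_id
WHERE talk.page_namespace = 1
AND tl_namespace = 10
AND cl_to = 'WikiProject_banners'
AND NOT EXISTS (SELECT 1
                FROM templatelinks AS t2
                WHERE t2.tl_from = talk.page_id
                AND t2.tl_namespace = 10
                AND t2.tl_title = 'WikiProjectBannerShell')
GROUP BY talk.page_id
HAVING COUNT(DISTINCT tl_title) >= 4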
Xeno, I'm running a query that matches the first 2 criteria on your list. My guess is that parsing for a section header won't be too hard for you. When the query is done, the output will save here and I'll make it pretty after that. Tim1357 (talk) 16:40, 21 April 2010 (UTC)
Thanks! Don't worry about making it pretty as long as it's an importable list. And yes, I can look for section headers easily enough. –xeno talk 16:59, 21 April 2010 (UTC)
Grr it got stuck last night. I slapped a LIMIT 500 on there for now, and restarted it. MZMcBride, can you find anything that I am doing wrong? Tim1357 (talk) 22:59, 22 April 2010 (UTC)
Templates that contain inline citations occasionally cause cite errors if transcluded in articles that do not contain {{Reflist}} or if the template was placed below a {{Reflist}}. AWB does not fix this error. The conditions are:
Templates that contain inline citations, AND
Transcluded in mainspace articles that display "Cite error"
Not done Sorry, but it is impossible to use the database to find this information, as MZMcBride and I cannot do queries on the page text. I would suggest requesting a dump scan instead. Thanks! Tim1357 (talk) 13:15, 21 April 2010 (UTC)
Cool, looks like I was wrong: I was not aware that there were maintenance categories that list pages with broken refs. I thought you wanted us to parse the text for broken refs, which we of course cannot do. Tim1357 (talk) 16:22, 21 April 2010 (UTC)
I have no intention of making this a regularly updated report as I don't understand Tim's output. I don't understand why there are two columns (article title and template title) and why there are results being listed multiple times. It's fairly trivial to obtain a list of pages in namespace 0 that are in Category:Pages with missing references list, if by no other means than by using yanker. --MZMcBride (talk) 05:42, 25 April 2010 (UTC)
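That trivial query being, more or less:
SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 0
AND cl_to = 'Pages_with_missing_references_list'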
Request
Sorry, but I tried to use the scripts on the Italian Wikipedia; I have an account on the Toolserver! What can I do to use these scripts? :( --.snoopy. 10:22, 23 April 2010 (UTC)
Sorry, but I tried to use one of your scripts, brokenredirects.py, and inserted itwiki_p where I found enwiki_p, but in .my.cnf I found only my username dario_vet and the password; I don't find any configuration for my .snoopybot. --.snoopy. 07:59, 26 April 2010 (UTC)
You need to rename "settings.sample" to "settings.py" and then modify it appropriately (example file here). Be sure to run "chmod 600 settings.py" and "chmod 600 settings.pyc" after modifying the file to prevent others from being able to read it. --MZMcBride (talk) 08:02, 26 April 2010 (UTC)
Do you think it is ok? --.snoopy. 08:41, 26 April 2010 (UTC)
You speak about settings.pyc, but I don't have any settings.pyc. Thanks for your help anyway. --.snoopy. 08:55, 26 April 2010 (UTC)
The file looks fine, except the "host" line and the "rootpage" line. The host line needs to be 'sql-s2'. The rootpage line needs to end with "/". When you run brokenredirects.py (or any of the other scripts), it will create the settings.pyc file. --MZMcBride (talk) 12:07, 26 April 2010 (UTC)
It seems to be ok (link) if I copy it line by line into a Python shell on the Toolserver, but if I run python2.5 -c "import wikitool" wiki/brokenredirect.py it doesn't work; I must write "import wikitool" in the command. I hope it's ok :| --.snoopy. 12:39, 26 April 2010 (UTC)
OK, now it's ok, but the script runs only under the home directory, not in ~/block; I think that is a problem with the export. Now my configuration for export is:
You could try setting your PYTHONPATH to "/home/dario_vet". Mine is "/home/mzmcbride/scripts" because my wikitools/ directory is inside the scripts folder. Your wikitools directory isn't. Hope that helps. --MZMcBride (talk) 15:26, 26 April 2010 (UTC)
I moved all the files to /home/dario_vet/wikitools/block but it doesn't work; hmm, I don't see where I made an error :| --.snoopy. 15:55, 26 April 2010 (UTC)
You want "settings.py" to be in the same directory as whatever file you're running is in. It should not be in the wikitools/ directory unless your other scripts are also in that directory. So, for example, "broken.py" is in /home/dario_vet/scripts. "settings.py" should also be in /home/dario_vet/scripts. Then it should import properly. --MZMcBride (talk) 06:36, 27 April 2010 (UTC)
Not sure if this belongs here or WT:DBR. I imagine the initial run can be taken from a dump, with future cases being caught in a database report. If possible, please prioritize over my above requests. –xeno talk 15:24, 4 May 2010 (UTC)
This is trivial. I'll try to get to it tonight. You just want pages in ns:0? Do you have a report title in mind and an update frequency? --MZMcBride (talk) 22:19, 4 May 2010 (UTC)
You say "ugh," but someone has to come up with a report title. This isn't WP:DDR, after all! And I'm already doing all the heavy lifting! --MZMcBride (talk) 22:27, 4 May 2010 (UTC)
Thanks again. This is a huge help. AWB was choking trying to compare a 200,000 list to a 356,000 one =) [I wonder if JIRA is fixed yet...] –xeno talk 02:05, 5 May 2010 (UTC)
List all media in the File: namespace having a 'Fair Use' tag but which is not indicated as actually being used in an article.
Basic Query suggested by carl-m on the toolserver IRC channel:
SELECT CONCAT('File:', page_title)
FROM page
JOIN categorylinks ON page_id = cl_from
WHERE page_namespace = 6
AND cl_to = 'All_non-free_media'
AND NOT EXISTS (SELECT 1
                FROM imagelinks
                WHERE il_to = page_title)
LIMIT 4;
My bot, DASHBot, does this every night. I could have the list be saved somewhere, but I see no point because the bot tags them immediately. Tim1357 talk 16:11, 9 May 2010 (UTC)
Do you realize that all pagelinks count and that there's no way to separate meta-data from legitimate page content? For example, the {{disambiguation}} template alone will +1 all counts (due to [[Help:Disambiguation|disambiguation]]). --MZMcBride (talk) 02:18, 3 June 2010 (UTC)
I limited the count of the page_links to only links to the main namespace. Do you want a regular report? Tim1357 talk 03:55, 3 June 2010 (UTC)
Something is broken with that report. The pasted results seem fine, so you must've used a different query for the automated report. The results are simply wrong in the automated report.
I've been collecting MOS:DAB issues, numbered according to Josh Parris's list, when people use the Dab solver. The file is TSV formatted, and best worked with using filtered tables in Excel and then converted to wiki code in WikEd. The revision id (which increases with time, hint) is included so, in principle, we can prune updated pages. However, I will need to redo it to gather the number of occurrences of each issue; this allows identifying pages which are in a particularly bad state. — Dispenser 05:59, 4 June 2010 (UTC)
Reviewers
It seems the flagged protection trial is planned for implementation next week, so it will be useful to have a list of candidates for the reviewer usergroup. So I ask for an update of Wikipedia:Database reports/Potential reviewer candidates (with links). Since it's likely to be a lot of users, we should probably use the 'most obvious first' strategy, by using higher standards for the first few reports. I suggest 2,500 edits, their first edit more than a year ago, and their latest edit within the past month for now. It can be one-time for now, then daily or every 2-3 days when we start granting the rights. Cenarium (talk) 15:10, 9 June 2010 (UTC)
Easy enough to update the report. Personally, I am curious about this, though. I thought RobLa, et al. had declared that all autoconfirmed users were getting review rights. Did that change or am I mistaken? --MZMcBride (talk) 16:33, 9 June 2010 (UTC)
I'm running the updated report now. It may fail due to such a large page size (things like the motherfucking AbuseFilter like to kill large edits, even if they're completely legitimate). If it fails, I can add pagination easily enough or we can limit output, whichever. --MZMcBride (talk) 16:38, 9 June 2010 (UTC)
One way to reduce the size would be to exclude rollbackers, account creators, and autoreviewers; these people are likely trusted enough by virtue of those rights to receive the right implicitly or explicitly. –xeno talk 16:43, 9 June 2010 (UTC)
Cenarium: Do you have a preference between pagination or limitation? Putting all the results on one page simply isn't feasible. --MZMcBride (talk) 20:50, 10 June 2010 (UTC)
Ok, we probably need to do several reports. The reports should of course exclude admins and users in the reviewer usergroup, when it goes live. A possibility is
a report listing users in the rollbacker, account creator or autoreviewer usergroups whose latest edit is within the past month
a report listing users with X edits, their first edit more than Y ago, and their latest edit within the past month, and who are not in any of the above usergroups; where X and Y are gradually downgraded as we grant the rights. We can start with X=2,500, Y= 1 year, and if there are still too many, then paginate. Cenarium (talk) 23:57, 10 June 2010 (UTC)
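A sketch of what report 2 might look like as a query (untested; user_editcount is only an approximation of "X edits", and the MIN/MAX subqueries over revision will be slow):
SELECT user_name
FROM user
WHERE user_editcount >= 2500
AND (SELECT MIN(rev_timestamp)
     FROM revision
     WHERE rev_user = user_id) < DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 YEAR), '%Y%m%d%H%i%s')
AND (SELECT MAX(rev_timestamp)
     FROM revision
     WHERE rev_user = user_id) > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 MONTH), '%Y%m%d%H%i%s')
AND NOT EXISTS (SELECT 1
                FROM user_groups
                WHERE ug_user = user_id
                AND ug_group IN ('sysop', 'reviewer', 'rollbacker', 'accountcreator', 'autoreviewer'))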
Ah, using the lists as strong suggestions is of course fine. I should have actually read your initial post, sorry about that. Amalthea 16:01, 11 June 2010 (UTC)
I think one report is fine as the proposed report 1 is naturally a subset of the proposed report 2. Adding a "Groups" column that lists the user's current user groups might be nice, though. --MZMcBride (talk) 20:17, 11 June 2010 (UTC)
I'll also be implementing an "Exceptions" subpage for users who wish to not be listed (and therefore not be in the "reviewer" user group), like me. --MZMcBride (talk) 20:31, 11 June 2010 (UTC)
Indef-blocked editors are currently on the list; it would of course be best if they can be excluded. But what's happening now? If you need me to mass-add people to the group, say the word. Personally I'd just add everyone from the lists and deal with hypothetical false positives once they are reported, but I'd certainly go ahead and add all rollbackers, autoreviewers, and account-creators since they all previously had someone look at them before. Amalthea 09:32, 13 June 2010 (UTC)
The vast majority of users probably meet the (not yet established) criteria. There was a suggestion that users should be recently sufficiently active, so more than one edit in the last month, with how many being subject to interpretation, and recent blocks should also be considered. Maybe users who had not requested the right should be given some explanatory notice, so either a talk page notice, or we wait until the trial begins when we'll have the sitenotice. Cenarium (talk) 04:09, 14 June 2010 (UTC)
Hmm, if people are still not happy with any kind of fixed criteria then I'll hold still for now (won't get around to doing any sanity checks myself the next few days). But as indicated, feel free to give me a list of usernames and I'll semi-automatically put them into the group. Amalthea 09:48, 14 June 2010 (UTC)
Would it be possible to generate a weekly report of English Wikipedia files whose names follow, more or less, the regex /^[0-9]{7,11} [0-9a-f]{10}.*\.jpe?g$/ (don't count the slashes, case-sensitive is OK, and convert to SQL format as appropriate)? This would be very useful to identify both files that need renaming and also copyvios (you can prepend http://www.flickr.com/photo.gne?id= to the first space-separated part to get the original description page including licensing information). Thanks, PleaseStand (talk) 05:22, 12 June 2010 (UTC)
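In SQL terms that would presumably translate to something like this (a sketch; page_title stores spaces as underscores, and RLIKE against the binary title column is case-sensitive anyway):
SELECT page_title
FROM page
WHERE page_namespace = 6
AND page_title RLIKE '^[0-9]{7,11}_[0-9a-f]{10}.*\\.jpe?g$'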
Simple enough to do; this is pretty much the same methodology that User categories (configuration) uses. Do you want to query Commons as well? Or avoid Commons images? How often do you want the report updated? And do you have a good report title in mind? For the link bit, I'll probably just use a wrapper template that you can customize to meet your needs. --MZMcBride (talk) 05:51, 12 June 2010 (UTC)
Commons images should be in a separate section. Both reports should be run weekly. A good report title would be "Files with Flickr-generated filenames". PleaseStand (talk) 06:16, 12 June 2010 (UTC)
To clarify: this large backlog (much from 2006 and 2007) remains from before such filenames were title-blacklisted, so the backlog should not be growing. I am not sure how many such filenames exist on Commons, but I know many exist here. PleaseStand (talk) 06:29, 12 June 2010 (UTC)
Shit, sorry, I just noticed it's been over two weeks since I looked at this. The month flew by. Silly life, always getting in the way. --MZMcBride (talk) 14:12, 4 July 2010 (UTC)
Thank you! I never knew that there was an article title grep program on the toolserver. It is good that the list is manageably short. PleaseStand (talk) 14:31, 4 July 2010 (UTC)
Edit filter warning effectiveness
I'm not sure if this is doable. A report that lists edit filters with "warn" as an action (excluding filters that have "disallow" in addition). For each of the filters: 1) the number of edits that produced a warning in the last 30 days; 2) the portion of those edits that were not saved after the warning; 3) the percentage. Thank you. Sole Soul (talk) 23:57, 21 June 2010 (UTC)
I suppose you want the number of edits (not users) that produced a warning, right? Also, I'm not sure how to properly decide whether an edit actually followed a warning. But I think something like “the warned user made some edit to the page he was warned about within one minute after that warning” should be pretty accurate. Svick (talk) 00:26, 22 June 2010 (UTC)
Yes, that's what I meant, corrected. I think the warning filters log the edit initially with the action "warn", and a second time when saved with action "none" or "tag" etc. See this for example. Sole Soul (talk) 00:44, 22 June 2010 (UTC)
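Based on that logging behaviour, a rough first pass over the last 30 days could be (a sketch, untested; it assumes any row whose afl_actions contains 'warn' is the warning hit and every other row for that filter is an edit that was saved anyway):
SELECT afl_filter,
       SUM(afl_actions LIKE '%warn%') AS warnings,
       SUM(afl_actions NOT LIKE '%warn%') AS saved_anyway
FROM abuse_filter_log
WHERE afl_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 30 DAY), '%Y%m%d%H%i%s')
GROUP BY afl_filter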
Wow, that was fast. Thank you very much. The filters that warn and disallow, like 9 and 10, show 100% effectiveness, which is inaccurate. Apparently these filters warn users and then disallow them if they choose to save (which really doesn't make sense; either warn the user or disallow him). Thank you again. Sole Soul (talk) 05:27, 22 June 2010 (UTC)
I removed the filters that warn and disallow; like you said, it didn't provide any useful data. BTW, it's usually not a good idea to edit a page that is generated in some automatic way – the next time it is generated, your changes will be lost (I added the new column to the program that generates the table). Also, if you wanted, I could update the table periodically or on demand. Svick (talk) 13:53, 22 June 2010 (UTC)
Is it possible to exclude images that have the following wording:
"It is held that the use of copyrighted images of individuals that are no longer living, are inherently unrepeatable, and where no new free-use image is available is permitted under US copyright law governing the fair use of copyrighted properties, and Wikipedia fair use policies. Its use is solely for the purpose of illustration and in no way limits the copyright holders rights."
This would assist greatly in finding the images that are actually without rationales (as opposed to having non-standard/non-templated ones.)
Sfan00 IMG (talk) 10:36, 6 July 2010 (UTC)
Yes, it's technically non-free, but it is treated as free – we are allowed to use it in the userspace and rationales are not required. J Milburn (talk) 12:03, 14 July 2010 (UTC)
If you search in your browser for "fair_use_strings", you can see the strings that the bot looks for to determine whether a file description page of a non-free file uses a proper rationale. --MZMcBride (talk) 06:10, 13 July 2010 (UTC)
Unprotected Templates by Number of Transclusions
Hello, it would be nice to see a list of templates that are not protected, sorted by the number of transclusions they have. That way some sysops could go through and protect the higher-risk templates. Tim1357 (talk) 06:54, 17 January 2010 (UTC)
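A sketch of the sort of query involved (untested; page_restrictions only has rows for protected pages, so the LEFT JOIN / IS NULL picks out templates with no edit protection at all, semi- or full):
SELECT tl_title,
       COUNT(*) AS transclusions
FROM templatelinks
JOIN page ON page_namespace = 10
         AND page_title = tl_title
LEFT JOIN page_restrictions ON pr_page = page_id
                           AND pr_type = 'edit'
WHERE tl_namespace = 10
AND pr_page IS NULL
GROUP BY tl_title
ORDER BY COUNT(*) DESC
LIMIT 1000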
A column would probably be ok, but if you sorted by protection status would it sub-sort by number of transclusions? –xeno talk 14:51, 23 March 2010 (UTC)
Yeah, there may actually be an advantage to creating a separate report. Not the sorting issue, necessarily, but it would drastically cut down on the noise of the new report and allow one to be purely statistical (the current report) while the other could be more production-oriented (the new report). Though I'm not a huge fan of using template protection, so I'd want to avoid any mass protections simply due to being listed.... In the meantime, Indefinitely protected templates without many transclusions (configuration) desperately needs some love. --MZMcBride (talk) 15:06, 23 March 2010 (UTC)
Gah! Can you exclude subpages of Template:Convert? Some of these are also protected because they are "frequently substituted"... Maybe a filter for that might be worthwhile as well. –xeno talk 15:22, 23 March 2010 (UTC)
Yes, I can, but I'm not really inclined to. They're sorted alphabetically, just skip over them if you'd like. Or unprotect them, as they're not widely used and their protection is silly. --MZMcBride (talk) 05:24, 25 April 2010 (UTC)
Need to pick a report name for this and come up with a limit for number of transclusions. I guess it only matters if editing is sysop-only. Nobody should care about move protection, surely. --MZMcBride (talk) 05:24, 25 April 2010 (UTC)
While I don't necessarily object to this report, it may be sensible to change the cut-off point given the mess it has inadvertently caused. Looking at the lower end of the report, do 501 transclusions really constitute "many"? PC78 (talk) 13:29, 13 September 2010 (UTC)
1000? Even that may be too low, but it would be enough to cut the report (as it is now) in half. Ultimately it depends on what this report is to be used for. I'll raise the question over at the WP:ANI discussion. PC78 (talk) 00:10, 14 September 2010 (UTC)
Someone called 'erwin' on the #Wikimedia-toolserver IRC channel suggested the following via a pastebin:
SELECT page_title
FROM page
JOIN categorylinks
ON cl_from = page_id
WHERE page_namespace = 6
AND cl_to = 'All_free_media'
AND NOT EXISTS (SELECT *
                FROM templatelinks
                WHERE tl_from = page_id
                AND tl_namespace = 10
                AND tl_title IN ('Nocommons', 'Not commons')
               )
LIMIT 10
but they pointed out that it's only a basic starting point and would need a LOT of tweaking before it's usable for the intended purpose of finding Commons candidate material that is not currently tagged as such. It would need to exclude files already in Category:Copy to Wikimedia Commons, as well as images with PUI, no-license or no-source tags, etc.; something like the sketch below.
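One way to bolt those exclusions onto erwin's query (a sketch; the exact tracking category names for PUI/no-license/no-source files would need checking):
SELECT page_title
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 6
AND cl_to = 'All_free_media'
AND NOT EXISTS (SELECT 1
                FROM categorylinks AS c2
                WHERE c2.cl_from = page_id
                AND c2.cl_to IN ('Copy_to_Wikimedia_Commons',
                                 'Possibly_unfree_files',
                                 'Wikipedia_files_with_unknown_source',
                                 'Wikipedia_files_with_unknown_copyright_status'))
AND NOT EXISTS (SELECT 1
                FROM templatelinks
                WHERE tl_from = page_id
                AND tl_namespace = 10
                AND tl_title IN ('Nocommons', 'Not_commons'))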
Yar, it got to be too big for a single page and paginating it is an issue as it divides the content into redirects and non-redirects. This isn't the only broken report, see below. --MZMcBride (talk) 19:33, 9 March 2010 (UTC)
That might be an option, though it has two issues: (1) it would make the semi-protected report inconsistent with the fully protected report (though I guess that report could use some pagination too, heh), and (2) it would mean that the non-redirect report would use something clunky like "Indefinitely semi-protected non-redirects" or "Indefinitely semi-protected articles", neither of which are very nice. --MZMcBride (talk) 19:43, 9 March 2010 (UTC)
Good god, you're obsessive-compulsive like me, aren't you? ;p The two reports (Indefinitely semi-protected (articles|redirects)), when listed side-by-side, should be self-explanatory. Alternatively the landing page could disambiguate the viewer. –xeno talk 19:57, 9 March 2010 (UTC)
I made the above request at WP:BOTREQ a few weeks ago, but nothing happened. This is probably a better venue.
I have chosen to only do unreferenced BLPs at the moment, but the report could be expanded to do all articles. It depends on how large the output will be. Also, articles in Category:Category needed are already in a queue and do not need to be listed on this report. Rettetast (talk) 11:30, 15 March 2010 (UTC)
Thanks for the examples. I spent about an hour working on this, but I think I need to re-think how to do this. --MZMcBride (talk) 23:40, 15 March 2010 (UTC)
Thank you. Once a week is enough. I'll leave a note here if I need the report to be expanded to all articles and not just the unreferenced BLPs. Rettetast (talk) 22:19, 18 March 2010 (UTC)
I'm not sure expanding this report would be accurate. If you use Category:Living people instead of Category:All unreferenced BLPs, the report title would be "Uncategorized biographies of living people," not "Uncategorized and unreferenced biographies of living people." Presumably the biographies not in Category:All unreferenced BLPs are referenced. Would a separate report work?
The broken reports are noted above. I just filed another bug in Wikimedia's bug tracker (bugzilla:22883) to accompany the Toolserver bug report (jira:TS-549). I could probably rewrite these reports to use a different method of comparing against Commons, but I'd really rather see this bug resolved properly (setting the data types as varchar or varbinary consistently). I'm not sure if this is likely to happen in the near future, though, so I may be forced to rewrite these reports in the next few weeks.
I think we could use an automatically generated list of recently created (30 days would make sense, based on Special:NewPages) pages, whose author isn't an admin, which have a deletion log entry and no undeletion entry. Pages like this are likely recreations of the original pages, which would frequently be speedy deletable (if the original was a speedy deletion, then under the same CSD; if the original was deleted per XFD, then CSD G4). Weekly seems like a good frequency for this one. עוד מישהו Od Mishehu 09:36, 14 April 2010 (UTC)
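A sketch of how that might be pulled from the database (untested; recentchanges only covers roughly the last 30 days, which matches the window wanted):
SELECT DISTINCT rc_namespace, rc_title
FROM recentchanges
JOIN logging ON log_namespace = rc_namespace
            AND log_title = rc_title
            AND log_type = 'delete'
            AND log_action = 'delete'
WHERE rc_new = 1
AND rc_user NOT IN (SELECT ug_user
                    FROM user_groups
                    WHERE ug_group = 'sysop')   -- author isn't an admin
AND NOT EXISTS (SELECT 1
                FROM logging AS l2
                WHERE l2.log_namespace = rc_namespace
                AND l2.log_title = rc_title
                AND l2.log_type = 'delete'
                AND l2.log_action = 'restore')  -- no undeletion entry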
I think this would be a good idea, except I wouldn't necessarily limit it to those with no undeletion history. Some pages have a history of deletion/restoration only to be finally settled by an XfD resulting in delete. If such a category were recreated, filtering out ones with a restoration in history wouldn't catch this. An example of one that would fit Od Mishehu's original suggestion would be Category:Wikipedian cyclists, and an example of one that wouldn't be caught would be Category:Socialist Wikipedians. VegaDark (talk) 16:30, 14 April 2010 (UTC)
I looked at pages that were new (in the recentchanges table, which goes back about 30 days) and had a deletion entry. There were about 5,000 results total. I put 100 results below for analysis. These are only articles, but there are IP user talk pages, file pages, and other types of pages in the results as well. --MZMcBride (talk) 23:42, 14 April 2010 (UTC)
Potential biographies of living people (5) (configuration) — the intention was to split off pages with 0 categories from the "(4)" report; (4) is no longer listing pages with 0 categories, but "(5)" still hasn't been created
With regard to the second report, QCoder said here that there were some false positives, but he seems to enjoy the list nonetheless. I pointed him over here and asked if he could produce some common keywords for the false positives to make the report better. Killiondude (talk) 18:09, 3 May 2010 (UTC)
I think we could use a weekly report for categories linking to deleted categories. These are frequently the result of back-links to renamed categories not being fixed. עוד מישהו Od Mishehu 12:39, 23 May 2010 (UTC)
Ah, so you're talking about Categories containing deleted red-linked categories, pretty much. (There's a bit of ambiguity in that report title, but oh well.) I posted a list here. The results look pretty noisy. Some results appear to be related to {{#ifexist:}} parser magic voodoo. Or perhaps just bad rows in the pagelinks table, though that's less likely. Let me know what you think of the results. --MZMcBride (talk) 05:21, 24 May 2010 (UTC)
I think we should exclude any case where the deletion reason contains the string "C1" (the CSD for empty categories), which seems to be filling up the report. עוד מישהו Od Mishehu 10:51, 26 May 2010 (UTC)
This does contain some usable information (see this edit, where I fixed a link thanks to your report). To make this more useful, I think we should remove categories transcluding {{Progress box}} (these are maintenance categories, where the links don't need to be updated) and categories transcluding {{cat class}} (these are WikiProject categories, where some of the corresponding levels have been deleted). עוד מישהו Od Mishehu 08:18, 27 May 2010 (UTC)
Sorry, I seem to have missed your replies here (or I forgot about them). Are you still interested in getting an updated report with the new exclusions? --MZMcBride (talk) 05:01, 13 July 2010 (UTC)
I'm confused now. This most recent request seems to have veered off of the original request. We haven't been looking at category redirects (or pseudo-redirects, as the case may be with {{category redirect}}). I'm not sure how I can modify the current query to accommodate the new request. It seems like it would have to go into a new list (with the exception of point 2, I suppose). --MZMcBride (talk) 01:37, 30 July 2010 (UTC)
Category redirects are almost always categories which have been renamed, but at least one user thinks that the old name is one people might be looking for. As such, from a human perspective, it's similar to the main thing I'm looking for: links to categories which have been renamed, where the links haven't been updated. If you need to do it in a separate report, feel free to do so.
Points 2 and 3 are consistent with my earlier request: point 2 removes a practically useless part of the list; point 3 removes a class of categories which, while they technically fit the original request, don't need anything done with them. עוד מישהו Od Mishehu 07:31, 30 July 2010 (UTC)
I need to chart this, because I don't remember what the original request was or what I wrote and rewrote.
The current report lists categories linking to deleted categories, where the deleted categories' deletion reason isn't like ".*([Cc]1|[Ee]mpty|Datadump).*"
Now you want to:
exclude pages beginning with a digit
add {{Articles by Quality}} to the list of templates used to exclude pages
These two pieces are easy enough. The part about category redirects is confusing the shit out of me, so you'll need to copy and paste from this list and re-explain what exactly you're after with those. It'll have to be in a separate report, I think. --MZMcBride (talk) 02:54, 31 July 2010 (UTC)
On further thought, I think that links to category redirects may be a different report. What I want there is any case where:
Category:A links to Category:B
Category:B is a category redirect (defined as containing a {{Category redirect}} template) or a category disambiguation (defined as containing a {{Category ambiguous}} template).
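In query form, roughly (a sketch, untested):
SELECT p1.page_title AS linking_category,
       p2.page_title AS target_category
FROM pagelinks
JOIN page AS p1 ON p1.page_id = pl_from
JOIN page AS p2 ON p2.page_namespace = 14
               AND p2.page_title = pl_title
JOIN templatelinks ON tl_from = p2.page_id
WHERE p1.page_namespace = 14
AND pl_namespace = 14
AND tl_namespace = 10
AND tl_title IN ('Category_redirect', 'Category_ambiguous')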
Actually a red-link in a progress box means that the category should be created, so there's still an action required. However I seem to remember using AWB's red-links function to try and pick these up, and getting the empty red-linked categories which {{Progress box}} knows about but is not showing, as well as the ones I wanted. RichFarmbrough, 09:22, 15 September 2010 (UTC).