Research talk:Quarry

Latest comment: 2 months ago by Prototyperspective in topic Query request

Page categories

edit

Should this page be categorized in Category:Research? --Bamyers99 (talk) 20:49, 10 February 2015 (UTC)Reply

Execution time

edit

Hi, is there any limit in the execution time of a query? If not, is it somehow possible to trigger a long-running query, close the browser window and later download the result? Thanks, --Aschroet (talk) 21:09, 15 October 2015 (UTC)Reply

Hi Aschroet. Right now, there's a 20 minute execution time. You can start a query, close the browser and expect that the query will continue to run. --EpochFail (talk) 21:13, 15 October 2015 (UTC)Reply
I don't know why, but i just got a result of a query that ran over night with leaving the browser open. How can i know the run_number when i start the query to get the result later? --Aschroet (talk) 03:51, 16 October 2015 (UTC)Reply
How do you need to get the result? Are you running a script which uses the result? The run number is given in the HTML source code of the query page. wget -O - http://quarry.wmflabs.org/run/`wget -O - http://quarry.wmflabs.org/query/nnnn | grep 'qrun_id' | sed 's/^.*qrun_id\": \([^,]*\),.*$/\1/g'`/output/0/tsv?download=true will send the query result set to the standard output on UNIX systems, if you replace nnnn with the query number. --Stefan2 (talk) 10:05, 16 October 2015 (UTC)Reply

Why not run queries in parallel?

edit

Today I experienced my queries running slowly or being queued. Why can't we have multiple queries in parallel, as it will help users to not wait for any trouble some queries of others to complete/timeout.Example query which might be causing my queries to wait. http://quarry.wmflabs.org/query/6185 . I found this query running for more than one hour.--Arjunaraoc (talk) 10:02, 27 March 2016 (UTC)Reply

Hello Arjunaraoc, there are two worker servers that process each one multiple queries at the same time. The problem you describe should no longer be present, fell free to ping me if it's re-appear. Thanks --Framawiki (talk) 17:32, 8 June 2018 (UTC)Reply

Help with query

edit

I'm trying to make a query that list of all "ns 0" articles (not disambigs or redirects) with 0 incoming links (alernativly 0 incoming "ns 0" links).

https://quarry.wmflabs.org/query/9103 is getting killed for taking too long time. Anyone able to help me optimize this, or perhaps run it on PAWS or similar? Josve05a (talk) 22:27, 10 May 2016 (UTC)Reply

mw:Talk:Quarry is generally better for asking for help.
Try using cl_to IN ('All_set_index_articles', 'All_disambiguation_pages') instead of using the page_props table. That might speed up the query a bit. --Stefan2 (talk) 22:45, 10 May 2016 (UTC)Reply

Tool labs' user databases?

edit

Is it possible to query tools' user databases? For example, the ToolTranslate tools' database s53069__tooltranslate_p? (And thanks for a brilliant thing; Quarry is great.) — Sam Wilson ( TalkContribs ) … 00:42, 27 July 2016 (UTC)Reply

Hello Sam Wilson, it's two different database servers, it should not be possible actually. It's phab:T151158. --Framawiki (talk) 17:32, 8 June 2018 (UTC)Reply

How long do results lag behind live data?

edit

Sorry if I've missed this, but it would be helpful if this page of documentation indicated how much time lag there is of the replicas being queried behind the live data. (I'm querying the recent_changes table in wikidatawiki_p.) Mhl20 (talk) 12:25, 9 May 2018 (UTC)Reply

Hello Mhl20, it's phab:T60841. In the meanwhile you can use https://tools.wmflabs.org/replag/ as indicator. --Framawiki (talk) 17:32, 8 June 2018 (UTC)Reply

Unable to log in

edit

I have been unable to log in to Quarry. Even I permit to access with OAuth, this message "Internal Server Error The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application." was displayed. Is it caused by my wrong? Please help me.--Miraburu (talk) 07:44, 1 May 2019 (UTC)Reply

Running a query in multiple wikis

edit

Is there a convenient way to run the same query in multiple languages?

Or do I have to change the "use xxwiki_p" line at the top and rerun it for every wiki? --Amir E. Aharoni (talk) 14:04, 5 December 2019 (UTC)Reply

Trying to ping: User:Zhuyifei1999, User:Framawiki. --Amir E. Aharoni (talk) 07:36, 12 December 2019 (UTC)Reply
Not that I am aware of :( --Zhuyifei1999 (talk) 15:18, 12 December 2019 (UTC)Reply
Not yet possible. That's phab:T95582. One day(TM) --Framawiki (talk) 18:04, 12 December 2019 (UTC)Reply
Heh, a bug I reported by myself and forgot :)
Thanks Framawiki! --Amir E. Aharoni (talk) 11:42, 17 December 2019 (UTC)Reply
As I understand, still no update on this problem? Would like to run same query on local wiki and meta dbs but not possible... Zygimantus (talk) 09:14, 22 March 2024 (UTC)Reply

Delete past query

edit

Hello, is there anyway I can delete my past queries? I have several drafts piling up now and would like to clean it while a few are concerning privacy. RXerself (talk) 06:04, 5 May 2021 (UTC)Reply

RXerself, you could overwrite them? https://quarry.wmflabs.org/query/21660 - Cabayi (talk) 14:54, 6 May 2021 (UTC)Reply

Global edits

edit

I don't know much about SQL, but I managed to scrape some code together that allowed me to get the top editors on my home wiki by edit count, but I was wondering if there is way to do this for global edits. – Ilovemydoodle (talk) 07:53, 4 September 2022 (UTC)Reply

Global edit count is stored in a separate table global_edit_count, but it is apparently not available in public replicas and it probaly wouldn't useful on its own either because 1) the edit count is not indexed, 2) the referenced table is not available, too. --Matěj Suchánek (talk) 09:57, 4 September 2022 (UTC)Reply

MediaWiki plugin QueryViz and Quarry ?

edit

Hello Quarry team,
I'm one of the maintainer of Lingualibre.org (a Wikimedia tool)'s Mediawiki pluging QueryViz. It currently works with Wikibase endpoints : switch to the right endpoint, POST the query, receive the (json) response, build and inject the results as a ikitable into the wikipage. We recently added Wikidata SPARQL endpoint successfully. Commons SPARQL endpoints being private, our xhr requests on it fail. I wonder if :

  • Is it possible to xhr POST crossdomain queries from https://lingualibre.org to quarry.wmcloud.org successfully ?
  • Will you attend the 2023 Wikimedia Hackathon in Athens ? Where we could meet.

Yug (talk) 18:57, 11 April 2023 (UTC)Reply

got error.

edit

when i run below code i got error: SELECT command denied to user 's52788'@'10.64.151.2' for table 'page'

use enwiki_p; SELECT p.page_title, pp.pp_value, count(l.ll_lang) FROM page as p JOIN langlinks l on l.ll_from = p.page_id join templatelinks tl on tl.tl_from=p.page_id and tl.tl_title="Infobox_Indian_constituency" and tl.tl_namespace=10 and tl.tl_from_namespace=0 JOIN page_props as pp ON pp.pp_page = p.page_id AND pp.pp_propname = 'wikibase_item' JOIN wikidatawiki_p.page as wdp ON wdp.page_title = pp.pp_value AND wdp.page_namespace = 0 LEFT JOIN wikidatawiki_p.pagelinks wdpl ON wdpl.pl_from = wdp.page_id AND wdpl.pl_title = 'P2043' AND wdpl.pl_namespace = 120 WHERE p.page_namespace = 0 AND wdpl.pl_from IS NULL group by l.ll_from order by count(l.ll_lang) desc

- IJohnKennady (talk) 18:15, 20 January 2024 (UTC)Reply

The query is too old to work with the current database. You reference removed columns and attempt to join tables from different databases which is not possible anymore. --Matěj Suchánek (talk) 15:48, 21 January 2024 (UTC)Reply

Check existence

edit

Hello. I have a list, like one that usually used in IN operators, ('X', 'Y', 'Z'). Is there a way to write a query that all it does is to return those from these names, that are not in some table? For example: "find pages that do not exist from this list". I've tryed something like

select a from ('X', 'Y', 'Z')
where not exists
 (select * from page
  where page_title = a
  and not page_namespace)

but, of course, it doesn't compile. Thank you. IKhitron (talk) 11:27, 9 May 2024 (UTC)Reply

@IKhitron: I think I found possible solution: https://stackoverflow.com/a/25806882.
Alternatively, create a page in your userspace that links to all the pages. Then, do a query with pagelinks, linktarget and LEFT JOIN page. --Matěj Suchánek (talk) 20:00, 12 May 2024 (UTC)Reply
Thank you. The first one did not help, because you can't refer to the result. The second one is posdibly helpful. IKhitron (talk) 20:07, 12 May 2024 (UTC)Reply

Query request

edit

Is there a place to request queries? I think it would be very useful to bring people looking for a query for some application and people experienced with the tool and SQL (including how tables are named etc) together. I'd like to request the following:

Commons categories with only redcategories

redcats = categories that don't yet exist. These are essentially uncategorized categories not part of the current category tree but are missing from c:Special:UncategorizedCategories. Example

I've already requested it here and I think it was VP/T but nothing was done so far and it seems like it would need a Quarry query.

Commons categories with only Wikidata Infobox categories

Wikidata Infobox categories are cats like "c:Category:Uses of Wikidata Infobox" (cats automatically set by c:Template:Wikidata Infobox). Example Example.

I think a query that can be used for this already exists and was used for two similar reports introduced here – the query seems to be quarry:history/85733/923437/896130.

It would be great if somebody could implement this.
Prototyperspective (talk) 23:42, 24 September 2024 (UTC)Reply

People usually ask for a query or troubleshooting on mw:Talk:Quarry. Some wikis also have a dedicated "request a query" page.
I will try to compile the queries later. --Matěj Suchánek (talk) 19:10, 25 September 2024 (UTC)Reply
Seems like these are only on Wikipedia (in 4 languages) if what you're referring to is Wikipedia:Request a query. That would be great, thanks! Prototyperspective (talk) 20:35, 25 September 2024 (UTC)Reply

Commons categories with only redcategories

SELECT page_title FROM page AS P1
WHERE page_namespace = 14 AND page_is_redirect = 0
AND NOT EXISTS (
  SELECT 1 FROM categorylinks
  JOIN page AS P2 ON P2.page_namespace = 14 AND cl_to = P2.page_title
  WHERE cl_from = P1.page_id
)
AND EXISTS (SELECT 1 FROM categorylinks WHERE cl_from = page_id)

Note that there is no way to prevent scanning all categories on Commons. Unless you are able to run the query unlimited without risking a time out, you will need to LIMIT it and may never get all results. The last line ensures you'll get only categories with at least one (red) category. Otherwise, the results would include the uncategorized categories, too. If you are interested in union of these lists, just remove the line.

Commons categories with only Wikidata Infobox categories

SELECT page_title FROM page AS P1
LEFT JOIN categorylinks ON cl_from = page_id AND cl_to NOT LIKE 'Uses\_of\_Wikidata\_Infobox%'
WHERE page_namespace = 14 AND page_is_redirect = 0 AND cl_to IS NULL
AND EXISTS (SELECT 1 FROM categorylinks WHERE cl_from = page_id)

Again, a LIMIT might be necessary, this time also due to many expected results. I didn't find any list of "cats automatically set by c:Template:Wikidata Infobox", so the criterion is "cats starting with 'Uses of Wikidata Infobox'". --Matěj Suchánek (talk) 09:16, 26 September 2024 (UTC)Reply

Awesome, thank you very much. See c:Commons:Village pump#New reports: categories with only redcats & cats with only infobox cats. Prototyperspective (talk) 17:13, 7 October 2024 (UTC)Reply
From the mw talk page it does seem like this page is a better place to ask for queries. What about creating a subpage for query requests and then adding a "Request a query" button to the page?
Two more queries would be useful and I haven't figured it out, would also be great if you or somebody else could create them / one of them:
  • a query to show all Commons categories that are empty that have no category set (one could combine query Commons categories with only redcategories that are empty with Commons Uncategorized Categories)
  • a query to show all Commons categories that are empty older than 2 years (in case this one isn't too difficult; empty categories are to be deleted on WMC but all empty cats are too many and it makes sense to often give people lots of time to notice the empty cat and populate it)
The two queries above have been very helpful so far. If these backlogs are solved, I think the only key thing missing to get Commons cats to an optimal shape would be some tool to make suggestions that sync Commons cats with ENWP cats (not sure if or to which Quarry could be used for that). There already is work to resolve self-categorizations. However, it currently only checks for direct self-categorizations sooner or later it would be very useful and maybe needed to have a way to also check several layers of subcategories whether it contains the (grand)parent category to fix the categorization loops.
  • I don't know if this can be achieved with Quarry queries but if so it would be really great if somebody could create a fork of Quarry:history/87030/938798/910988 to also check subcategories (e.g. B is subcat of cat A: cat A->B->C->A). By the way, such self-categorizations seem to currently break the deepcategory search operator.
Prototyperspective (talk) 11:05, 13 October 2024 (UTC)Reply
Return to "Quarry" page.