Talk:Web2Cit

Latest comment: 3 months ago by Omnilaika02 in topic Keep defaulting to fallback

Please leave any message about the project below.

Source code?


Where does the source code for the service (https://web2cit.toolforge.org/) live? Cheers. Mvolz (talk) 13:29, 14 June 2022 (UTC)

Hi, Marielle! Sorry for the delay. For some reason I wasn't notified of this message. The source code is hosted on Wikimedia's GitLab: https://gitlab.wikimedia.org/diegodlh/w2c-server. It started as something temporary, but, as usually happens, it continued to grow, so it's still a bit disorganized. The Phabricator project is https://phabricator.wikimedia.org/tag/web2cit-server/ Diegodlh (talk) 14:23, 21 June 2022 (UTC)

Archive translated page


Dear @Pppery. Further elaborating on what was raised here, I would like to archive this version of the Web2Cit home page for future reference (maybe move it to Web2Cit/Archive/Home) to leave room for moving the page from User:Diegodlh/Web2Cit over to here. Unfortunately I cannot move it myself because it's a page marked for translation. Could you please do that? Thank you! Diegodlh (talk) 18:12, 7 September 2022 (UTC)

Happy to do this, but why Web2Cit/Archive/Home rather than Web2Cit/Archive? * Pppery * it has begun 18:13, 7 September 2022 (UTC)
Web2Cit/Archive sounds better, yeah. Thanks! Diegodlh (talk) 18:19, 7 September 2022 (UTC)
  Done * Pppery * it has begun 18:37, 7 September 2022 (UTC)
Wow!! I'm impressed how fast that was!! Thank you!!! The new page is already on site. This time we will wait before requesting translation, but this one should be more stable already. Thanks again! Diegodlh (talk) 18:39, 7 September 2022 (UTC)

Default on azwiki


Hello, just wanted to let you know that the generator has been activated by default on Azerbaijani Wikipedia =) Toghrul R (talk) 10:52, 21 May 2023 (UTC)

I broke it straight away ;)


Sorry Diego! I was trying out Web2Cit for the NZGeo website following along with the demo you did at Wikimania, so that if and when we have a fix for the no title problem for PapersPast, I am ready to configure it. I had it working for NZGeo at least to recognise the correct item type, but then when I tried to go back and add translation for author name and date, I stuffed it up and it doesn't find the translation template, I think. I'm not sure how to fix it! DrThneed (talk) 01:03, 23 August 2023 (UTC)


Hello Web2Cit maintainers, contributors, and fans! I wanted to let you know that I highlighted the Web2Cit documentation as a shining example in the new Tool Docs guide that I just published. Thank you for creating lovely tool documentation that can serve as an example to help others create and improve tool docs :-) This guide was created as part of the Doc Your Tool project for the upcoming 2024 Hackathon. If you're interested, please join that project to work on or talk about tool documentation during the hackathon! TBurmeister (WMF) (talk) 16:57, 16 April 2024 (UTC)

Wow, @TBurmeister (WMF)! Thank you so much! I wasn't getting notifications for changes on this page, even though it is in my watchlist; I haven't looked into why this was happening. I've just subscribed to new topics, though. Hopefully this will fix it! Diegodlh (talk) 19:20, 9 August 2024 (UTC)

Author name


Hi! I have been working on making translation templates for some Dutch news websites, and I've decided that splitting author names properly is hard, so I've been putting the full name in the last/full names field. However, when I then try to add a reference on the Dutch wiki, it puts the full name in the last name field. Is there some way to make it use the full name field if the first name field is not filled in? Alternatively, could you maybe explain how to properly split names, especially in edge cases where there are multiple authors and/or people have multiple first names or last names? Lwgph (talk) 10:46, 22 August 2024 (UTC)

Hi @Lwgph! Thanks for your message :)
Yeah, splitting author names into first and last names is a complicated issue that goes well beyond Web2Cit. There are not only technical considerations, but also cultural ones.
In Web2Cit we have decided to merge the author last-name and full-name fields because that's the behavior of Citoid (the default service that provides automatic citations for Wikipedia), and Web2Cit is meant to simply patch or augment Citoid's responses.
Based on the Citoid/Web2Cit response, the Wikipedia Editor chooses a citation template and populates its fields according to the configuration specified in the citation template's TemplateData. You can read more about this here: https://www.mediawiki.org/wiki/Citoid/Maps_TemplateData
For example, for the "Citeer nieuws" citation template in the Dutch Wikipedia, the Citoid configuration in its TemplateData (https://nl.wikipedia.org/w/index.php?title=Sjabloon:Citeer_nieuws&action=edit&templatedata=edit) indicates the following:
"author": [ [ "voornaam", "achternaam" ] ]
This means that, for each author, Citoid's first field (Web2Cit's authorFirst) will be mapped to the template's "voornaam" field, and Citoid's last field (Web2Cit's authorLast) will be mapped to the template's "achternaam" field. This is standard behavior.
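To make the mapping concrete, here is a minimal Python sketch of how a Citoid author list could be paired with the parameter pairs declared in a TemplateData "author" map. It assumes each Citoid author is a [first, last] pair and that authors are matched to parameter pairs by position; the function name map_authors and the example names are hypothetical, and this is not the actual Citoid or editor code.

```python
from typing import Dict, List


def map_authors(
    citoid_authors: List[List[str]],
    template_map: List[List[str]],
) -> Dict[str, str]:
    """Pair Citoid [first, last] authors with TemplateData [first, last] params by position."""
    params: Dict[str, str] = {}
    # zip() stops at the shorter list, so in this model any author beyond
    # the declared parameter pairs is simply not mapped.
    for (first, last), (first_param, last_param) in zip(citoid_authors, template_map):
        # Empty values are left out rather than written as empty parameters.
        if first:
            params[first_param] = first
        if last:
            params[last_param] = last
    return params


# The "author" map quoted above for "Citeer nieuws".
citeer_nieuws_map = [["voornaam", "achternaam"]]

# A full name stored in the last-name field (authorLast in Web2Cit).
print(map_authors([["", "Jan de Vries"]], citeer_nieuws_map))
# {'achternaam': 'Jan de Vries'}
```

Under these assumptions, a full name placed in authorLast ends up in "achternaam", which matches what was observed on the Dutch Wikipedia.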
I don't know what the case is for citation templates in the Dutch Wikipedia, but in the English Wikipedia the "last" and "author" (i.e. full name) fields of a citation template are simply aliases, as mentioned here: https://en.wikipedia.org/wiki/Template:Citation_Style_documentation/author. That is, as far as I understand, their values should be formatted exactly the same; it would be just in the wikitext that one would notice the difference.
Your suggestion that, for each author, the second Citoid field may be mapped to the corresponding "full name" field of the citation template (if the first Citoid field is empty) sounds interesting, but I don't think this is supported. Maybe @Mvolz (WMF) knows better?
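Purely to illustrate the suggestion (and not anything Citoid supports), here is a hypothetical Python sketch of a fallback mapping. It assumes an extended map entry with a third, full-name parameter, which real TemplateData maps do not have; the parameter name "auteur" and the function name are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL map entry: (first-name param, last-name param, full-name param).
# Citoid's TemplateData maps have no such third element; this only models the idea.
AuthorParams = Tuple[str, str, str]


def map_authors_with_fallback(
    citoid_authors: List[List[str]],
    template_map: List[AuthorParams],
) -> Dict[str, str]:
    params: Dict[str, str] = {}
    for (first, last), (first_param, last_param, full_param) in zip(citoid_authors, template_map):
        if first:
            # A first name is present: keep the usual split-name mapping.
            params[first_param] = first
            if last:
                params[last_param] = last
        elif last:
            # No first name: treat the "last" value as a full name instead.
            params[full_param] = last
    return params


# "auteur" is an ASSUMED full-name parameter name, used only for illustration.
hypothetical_map = [("voornaam", "achternaam", "auteur")]
print(map_authors_with_fallback([["", "Jan de Vries"]], hypothetical_map))
# {'auteur': 'Jan de Vries'}
```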
Alternatively, it could be possible to change the citation template's configuration to always use the corresponding "full name" field (as mentioned here), but this would mean losing the information for split first and last names in all cases, which is probably undesired. Diegodlh (talk) 01:37, 23 August 2024 (UTC)
Thanks for your answer! Good to know that at least I'm not missing anything obvious. I'll just put the names in the correct field to the best of my ability, and it might not always be right, but it's still better than not getting any names at all!
On a somewhat unrelated note, has the server been having some problems recently? I've noticed that sometimes over the past few days when I try to add a reference, the cite tool will just break unless I uncheck Web2Cit, and when I tried the website, it also couldn't translate the url I put in, even when trying urls that worked before. Lwgph (talk) 21:13, 25 August 2024 (UTC)

Keep defaulting to fallback


Hi,

Sorry, I don't know if I'm completely stupid, but I've been trying for three hours without results: why does this URL keep defaulting to fallback? — Omnilaika02 (talk) 19:22, 28 August 2024 (UTC)

Hi, @Omnilaika02! Web2Cit can be quite challenging, especially at the beginning. But you were really close! Just a very small fix was needed.
First, in cases like this I recommend clicking "Enable debugging" at the bottom of the results page. This detailed translation output can be a bit difficult to understand at first, but provides useful information such as which templates were tried, whether they were found applicable or not, what the output for each selection and transformation step was, etc. More (technical) information about this debug output here: Web2Cit/Docs/Server#Debugging
In this case I could see that the translation template you had created was not even being tried. This usually happens when one of the mandatory fields (itemType and title) has not been included in the template. In this case it was the title field that was missing. Well, actually it was there, only under the wrong (and duplicate) name itemType :). So I just changed the (duplicate) field name from "itemType" to "title", and the template was no longer ignored!
However, translation was still defaulting to fallback. As seen in the debugging output, the reason was that the template was marked as not applicable because the output of the "title" field (a required field) was empty, and therefore invalid. Checking the debugging output again, it was empty because the XPath selection step was slightly misconfigured: there was no element matching the path //meta[@property="og:title"]/@content. The correct path is slightly different: //meta[@name="og:title"]/@content. Anyway, because Citoid was getting the title right, I changed the field to use the Citoid output for title instead.
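The rule described here can be pictured with a minimal Python sketch, assuming a translation template is reduced to a list of field names. The functions is_tried and duplicate_fields and the example field lists are hypothetical; they only mirror the rule stated above, not Web2Cit's actual code.

```python
from collections import Counter
from typing import List

# Fields that must be present for a template to be tried at all.
MANDATORY_FIELDS = ("itemType", "title")


def is_tried(field_names: List[str]) -> bool:
    """A template is only tried if every mandatory field is present."""
    return all(name in field_names for name in MANDATORY_FIELDS)


def duplicate_fields(field_names: List[str]) -> List[str]:
    """List field names that appear more than once in a template."""
    return [name for name, count in Counter(field_names).items() if count > 1]


# Hypothetical field lists: "title" mistakenly saved as a second "itemType".
broken = ["itemType", "itemType", "authorLast", "publishedIn"]
fixed = ["itemType", "title", "authorLast", "publishedIn"]
print(is_tried(broken), duplicate_fields(broken))  # False ['itemType']
print(is_tried(fixed), duplicate_fields(fixed))    # True []
```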
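The difference between the two XPath expressions comes down to which attribute the page actually uses. The Python sketch below (standard library only) emulates //meta[@ATTR="VALUE"]/@content by returning the first matching content value; the helper names and the sample HTML are hypothetical, and real XPath returns every match rather than just the first.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.metas: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "meta":
            self.metas.append(dict(attrs))


def meta_content(html: str, attr: str, value: str) -> Optional[str]:
    """Rough stand-in for //meta[@ATTR="VALUE"]/@content: first match or None."""
    parser = MetaCollector()
    parser.feed(html)
    for meta in parser.metas:
        if meta.get(attr) == value:
            return meta.get("content")
    return None


# Hypothetical page head that uses name=, not property=, for og:title.
page = '<html><head><meta name="og:title" content="Example headline"></head></html>'
print(meta_content(page, "property", "og:title"))  # None
print(meta_content(page, "name", "og:title"))      # Example headline
```

The same reasoning applies to the parsely-author expression that follows.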
These two changes were enough to get the template working. But still there was no output for the "authorLast" field (note that in this case this didn't make the template not applicable because the field was configured as non-required). The reason here was the same as before: there was no element matching the XPath provided: //meta[@property="parsely-author"]/@content. Changed it very slightly to //meta[@name="parsely-author"]/@content.
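The difference between required and non-required fields can be sketched in Python as follows. This is a hypothetical model, assuming each field's output is a list of strings; FieldOutput, is_applicable and the field values are made up for illustration and are not Web2Cit's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldOutput:
    name: str
    required: bool
    values: List[str]  # what the field's selection/transformation steps produced


def has_output(field: FieldOutput) -> bool:
    # Treat a field as empty if it produced nothing or only blank strings.
    return any(value.strip() for value in field.values)


def is_applicable(fields: List[FieldOutput]) -> bool:
    """A template applies only if every required field has a non-empty output."""
    return all(has_output(field) for field in fields if field.required)


def empty_optional_fields(fields: List[FieldOutput]) -> List[str]:
    """Non-required fields with no output: worth checking, but they do not block the template."""
    return [field.name for field in fields if not field.required and not has_output(field)]


# Before the fixes: the title XPath matched nothing.
before = [
    FieldOutput("itemType", True, ["newspaperArticle"]),
    FieldOutput("title", True, []),
    FieldOutput("authorLast", False, []),
]
# After switching title to the Citoid output (author XPath still broken).
after = [
    FieldOutput("itemType", True, ["newspaperArticle"]),
    FieldOutput("title", True, ["Example headline"]),
    FieldOutput("authorLast", False, []),
]
print(is_applicable(before))         # False
print(is_applicable(after))          # True
print(empty_optional_fields(after))  # ['authorLast']
```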
Also there was no output in the "language" field, but this was just because it wasn't configured in the template. Added this field to the template and set it to use the Citoid output for language, which already matched the expected output.
In addition, I also removed two extra unnecessary selection steps in the "publishedIn" field. These are added by default to get this information from any of the three Citoid fields that may provide it, but are not needed if not using the Citoid output.
Finally, I took the liberty of adding a date field to both the templates and tests files. You should get this data from Web2Cit now too :)
Please let me know if you have any other questions. I'm happy to help! Diegodlh (talk) 22:57, 3 September 2024 (UTC)
Hi @Diegodlh, thank you SO MUCH for your very detailed explanation and your help. I was able to understand exactly what I was doing wrong and correct it, and managed to do another URL without problems! I'm sure this will help others too.
Could you help me with blick.ch? There is a French version (blick.ch/fr) and a German version (blick.ch), but I cannot work out the patterns... Thanks again, Omnilaika02 (talk) 06:29, 4 September 2024 (UTC)
Sure, my pleasure! To help you with this, could you please start by defining a couple test cases (expected outputs), one for German-version path, and another for a French-version path? I've seen you've configured a translation template already for path "/fr/news/suisse/ludc-a-de-la-peine-a-y-croire-le-president-du-plr-thierry-burkart-durcit-le-ton-sur-lasile-id20101854.html". I recommend you use this path too for the French-version test case.
Let me know when you are done and I can help you with the patterns and templates :) Diegodlh (talk) 15:40, 4 September 2024 (UTC)
Thank you :) It's done: French version and German version. Omnilaika02 (talk) 17:01, 4 September 2024 (UTC)
Thanks, @Omnilaika02! Based on the expected outputs you specified for both the German and the French examples, I noticed that there don't seem to be any differences in the way we may get these values from either version. That is, the item type and "published in" are the same for both versions, the title and author name can be found using the same XPath expressions, and even the language (which differs between versions) can be retrieved using XPath instead of a fixed value. Therefore, I see no need to use patterns here.
There is a problem with this site, however: it seems to have blocked us, both Citoid and Web2Cit. I haven't looked into the details, but probably the same thing is happening as with some other sites, as described here: https://phabricator.wikimedia.org/T362379
This causes both Citoid and Web2Cit to fail because they can't retrieve the webpage to parse it. Unfortunately, as far as I know, we don't have a definitive solution for these cases yet. A couple solutions have been proposed, such as registering Citoid as a friendly bot with CDNs (see T370118) or having Citoid process webpages client-side (see T368980), and some of these may eventually be implemented in Web2Cit too; but we are not there yet.
The only workaround there is right now is having a template that does not include any selection step which relies on actually fetching the webpage. That is, only fixed-value selection steps. This is probably quite useless though, of course, but at least may give users a better citation to start with, rather than no citation at all. A URL selection step may be useful here to offer a better title guess, but it hasn't been implemented (see T304326).
I have done this. Note that because in this case we cannot infer the language from the HTML (since we cannot fetch the webpage) it does make sense to use URL patterns and have two separate templates, one for German and another for French pages. Please try this, check the configuration and let me know if you have any questions or comments!
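The idea of choosing between a French and a German template by URL path, each with a fixed language value, can be sketched in Python as below. The pattern strings, the language codes "fr" and "de", and the German example path are assumptions, and Python's fnmatch is only a stand-in that may not match Web2Cit's own pattern syntax (its "*" also matches "/").

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# HYPOTHETICAL patterns, tried in order; the first match wins.
PATTERNS: List[Tuple[str, str]] = [
    ("/fr/*", "fr"),  # French-language pages
    ("/*", "de"),     # everything else: German-language pages
]


def fixed_language(path: str) -> Optional[str]:
    """Return the fixed language value for the first pattern that matches the path."""
    for pattern, language in PATTERNS:
        if fnmatchcase(path, pattern):
            return language
    return None


french_path = (
    "/fr/news/suisse/ludc-a-de-la-peine-a-y-croire-le-president-du-plr-"
    "thierry-burkart-durcit-le-ton-sur-lasile-id20101854.html"
)
print(fixed_language(french_path))                  # fr
print(fixed_language("/schweiz/example-id1.html"))  # de
```

Order matters in this sketch: a French path also matches the catch-all "/*", so the more specific "/fr/*" pattern has to come first.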
Finally, we are aware that in cases like this understanding what's going on is definitely not clear at all. Task T317448 describes a possible solution, but it hasn't been addressed yet. Diegodlh (talk) 17:40, 5 September 2024 (UTC)
Oh, by the way! We have recently created a Web2Cit userbox that Web2Cit contributors can add to their meta-wiki user page to advertise their Web2Cit skills and contributions. This userbox also automatically adds the user page to the Web2Cit contributors category, making it easier for Web2Cit users and other contributors to find and help one another.
If you think adding this userbox to your user page makes sense, you can do so by just adding {{User Web2Cit}} to it :) Diegodlh (talk) 17:56, 5 September 2024 (UTC)
Thank you for your help. As I see, we are quite limited with Blick... I put the userbox on my profile, and will continue to create templates for news sources I use and promote the tool on frwiki! Omnilaika02 (talk) 08:30, 6 September 2024 (UTC)