Dalba
Kew POWO citations format
For Kew Plants of the World citations, can the format be changed from this:
<ref name="Plants of the World Online k345">{{cite web | title=Melocactus estevesii P.J.Braun | website=Plants of the World Online | url=https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:938363-1 | access-date=2024-04-29}}</ref>
to this:
<ref name="Plants of the World Online k345">{{BioRef|powo | title=''Melocactus estevesii'' P.J.Braun | id=938363-1 | access-date=2024-04-29}}</ref>
One of the users complained on my talk page about the cites. - Cs california (talk) 05:34, 4 May 2024 (UTC)
- That's a great suggestion about italics in the title. Unfortunately, italicizing the scientific name within the title field is currently difficult. Plants of the World Online doesn't provide distinct metadata for the scientific name and author.
- The BioRef template offers a cleaner format, but it's not widely adopted across Wikipedias.
- Continuing with the 'cite web' template ensures compatibility with most other wikis.
- And, honestly, the main issue for me right now is that maintaining additional code for alternative citation formats can be challenging. However, I'll certainly keep this feedback in mind for future development if resources allow. Dalba 15:02, 5 May 2024 (UTC)
- Can we get the ref name shortened? There's no need for it to be that long. "POWO" would be sufficient, or "POWO k345" if there needs to be the distinguisher, though it is cryptic and therefore no better than ":2". - UtherSRG (talk) 10:23, 7 May 2024 (UTC)
- Sure, but the algorithm needs to be general. Using a website acronym does not work in general, since many citations don't have a site name. One should also try to choose unique ref names ... I'm going to change it once again to just a random string. (The last time I changed it was nearly 8 months ago; see [1] for the related discussion.) Dalba 16:20, 7 May 2024 (UTC)
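For the format change requested at the top of this thread, a minimal sketch of the conversion might look like the following. It is not Citer's code: `powoIdFromUrl` and `toBioRef` are hypothetical helpers, and the BioRef parameter names are copied from the example in the request, with nothing else assumed about the template. It does not italicize the scientific name, since POWO does not separate the name from the author.

```javascript
// Hypothetical helper: pull the IPNI identifier out of a POWO taxon URL.
function powoIdFromUrl(url) {
  const match = /urn:lsid:ipni\.org:names:([0-9]+-[0-9]+)$/.exec(url);
  return match ? match[1] : null;
}

// Hypothetical helper: build the BioRef wikitext shown in the request.
// Parameter names (powo, title, id, access-date) come from that example only.
function toBioRef({ title, url, accessDate }) {
  const id = powoIdFromUrl(url);
  if (id === null) return null; // not a POWO taxon URL, leave the citation alone
  return `{{BioRef|powo | title=${title} | id=${id} | access-date=${accessDate}}}`;
}
```

Returning `null` for anything that is not a POWO taxon URL keeps the existing cite web output as the fallback.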
HTTPError
My assumption is that you would rather hear about issues than not. The changes you made to present PDF citations in partial form have been a terrific help. I just need to add the title and the author. However, the following URL
https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
produces: HTTPError
How popular is Citer? Do you keep track of how many uses per day it is getting? Best regards. Swood100 (talk) 19:28, 10 December 2023 (UTC)
- I do, thank you. I wish I had more time to work on parsing PDF files. It might be possible to extract more information from them; I'm just concerned about the performance. Anyway, the problem with this particular URL is that it is behind some Cloudflare restriction mechanism. I'm not actually sure why, but I cannot download the file from the command line either:
$ wget https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
--2023-12-15 14:58:52-- https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
Resolving www.icj-cij.org (www.icj-cij.org)... 104.22.41.99, 172.67.26.159, 104.22.40.99, ...
Connecting to www.icj-cij.org (www.icj-cij.org)|104.22.41.99|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-12-15 14:58:52 ERROR 403: Forbidden.
- Citer cannot access the URL through HTTP protocol and hence the HTTPError. I guess, the result can be improved by returning a partial cite web template instead, but it may take a while before I can get to it.
- Regarding popularity, I really don't know, and I regularly clear the limited logs that toolforge provides. But since you asked, I just looked, and for the past 6 hours there have been around 324 requests processed. Not sure how many of them are unique though; the logs are anonymized.
- Dalba 15:23, 15 December 2023 (UTC)
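The "partial cite web template" fallback mentioned above could be sketched roughly as follows. This is only an illustration, not what Citer does: `partialCiteWeb` is a hypothetical name, and the choice of fields (an empty title, the hostname as website) is an assumption.

```javascript
// Hypothetical fallback: when the page cannot be fetched (e.g. 403 Forbidden),
// emit a cite web template filled only with what is known from the URL itself.
function partialCiteWeb(url, accessDate) {
  const host = new URL(url).hostname; // throws on an invalid URL
  return `{{cite web | title= | website=${host} | url=${url} | access-date=${accessDate}}}`;
}
```

The title and author would still have to be filled in by hand, as with the partial PDF citations.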
HTTPStatusError
Hi again,
This link:
https://www.jpeds.com/article/S0022-3476(22)00185-8/fulltext
produces the above error, though supplying the DOI listed on that page works fine:
doi.org/10.1016/j.jpeds.2022.03.005
Best regards, Swood100 (talk) 15:20, 27 December 2023 (UTC)
- Unfortunately the website has blocked toolforge's IP address. :( Dalba 09:53, 28 December 2023 (UTC)
- Seems to be fixed using curl-impersonate. Dalba 07:27, 25 February 2024 (UTC)
ConnectError
Hi again, when I ran this URL I got the above message:
https://web.archive.org/web/20161105162350/https:/thejungsoul.com/guidance-for-parents-of-teens-with-rapid-onset-gender-dysphoria/
However, when I switched at random to this different saved version it worked fine:
https://web.archive.org/web/20171106084816/http://thejungsoul.com/guidance-for-parents-of-teens-with-rapid-onset-gender-dysphoria
I see what the problem is. In the first one the second https: is only followed by a single '/' instead of two. Looks like a screwball error from the page I got this URL from, because I got another URL from that page:
https://web.archive.org/web/20161209083621/http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16
This one also has a single '/' after the http: but it results in a ref that retains the error in two locations:
<ref name="Arnold 2016 i520">{{cite web | last=Arnold | first=James | title=The Weekly Digest: 8-24-16 | website=web.archive.org | date=24 August 2016 | url=http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 | archive-url=https://web.archive.org/web/20161209083621/http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 | archive-date=9 December 2016 | url-status=dead | access-date=29 December 2023}}</ref>
This results in a "{{cite web}}: Check |url= value (help)" red error message, a reference to this page, and a tooltip when I hover over the link:
Arnold, James (24 August 2016). "The Weekly Digest: 8-24-16". web.archive.org. Archived from [http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 the original] on 9 December 2016. Retrieved 29 December 2023. {{cite web}}: Check |url= value (help)
When I add another '/' to the http: in the "url" param in the produced ref the error goes away. I suppose it is asking too much for Citer to correct errors in the URLs it is supplied.
Swood100 (talk) 04:28, 29 December 2023 (UTC)
- Hi there! For me, none of the URLs work. I believe this is another case of toolforge's IP address being blocked by a third party server. Unfortunately, there is not much I can do in these cases. There might be some workarounds, but it will take me a while to implement and test. Dalba 04:07, 31 December 2023 (UTC)
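For the single-slash problem described in this thread, a user-side cleanup before submitting the URL could look like the sketch below. It is not part of Citer, and `fixSchemeSlashes` is a hypothetical name; it would not help when the server blocks toolforge's IP address.

```javascript
// Hypothetical cleanup: turn "http:/host" or "https:/host" into "http://host"
// wherever the scheme is followed by exactly one slash, including inside
// archive URLs that embed the original address.
function fixSchemeSlashes(url) {
  return url.replace(/\b(https?):\/(?!\/)/g, "$1://");
}
```

A correctly formed `https://` prefix is left untouched because the lookahead rejects a scheme that is already followed by two slashes.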
HTTPStatusError
Hi again,
This URL:
https://www.reuters.com/world/middle-east/iraq-pays-last-chunk-524-billion-gulf-war-reparations-un-2022-02-09/
Results in the above error. Another website blocking toolforge's IP address? Why do they do that? Is it always rate-limiting? Best regards, Swood100 (talk) 20:25, 6 January 2024 (UTC)
- Hi. Yes, reuters.com has blocked the IP address of toolforge. It's completely blocked as far as I can tell, no rate limiting here. I can only guess, but I believe after the recent OpenAI and New York Times confrontation, websites have become more stringent about who can access their contents. Toolforge, being the host of several citation-generating tools, is sending more requests than usual, and therefore websites have started blocking its IP address. Dalba 08:17, 12 January 2024 (UTC)
- This seems to be fixed now that citer is using curl-impersonate. Dalba 07:25, 25 February 2024 (UTC)
Allowing citer requests from en.wikipedia.org
Hi Dalba, I'm writing a citation script for myself on en.wikipedia.org and encountered a CORS error when trying to use citer.toolforge.org. Would it be possible to enable CORS by setting the "Access-Control-Allow-Origin" header appropriately on the citer web server? This page has more information. Your tool is awesome, by the way. Thanks. Daniel Quinlan (talk) 08:40, 6 February 2024 (UTC)
- Hi there! Done. Just note that since I'm not maintaining a stable API yet, the response format might change in the future without any deprecation period. (I have had some thoughts about using Citoid response format, but it's unlikely I'll be able to implement it anytime soon.) Dalba 17:27, 6 February 2024 (UTC)
- Thank you so much! One thing that might help scripts a bit would be adding a parameter to get a raw text response (if you have to choose, just the latter format). I haven't really used Citoid because it doesn't seem to extract enough information to make it worthwhile. Daniel Quinlan (talk) 13:43, 7 February 2024 (UTC)
- Not sure how you are using it right now, but if you send a POST request instead of a GET request and send the `user_input` in the body of the request, then citer will return a json response which I guess might be more easily digestible by scripts. Something like `await (await fetch('https://citer.toolforge.org/', {'method': 'POST', 'body': 'https://example.com/somepath.html' })).json()` should work. Dalba 07:39, 8 February 2024 (UTC)
- I've barely started, but I was doing a GET request and parsing the document. JSON is so much better. For easier updates in the future, you might consider returning a JSON dictionary with named keys like "sfn", "cite", and "ref-name". Also, can the date format be included in the POST request? Thanks again. Daniel Quinlan (talk) 13:49, 8 February 2024 (UTC)
- All parameters of a GET request also work on a POST request if they remain in the URL. The only difference between GET and POST is that `user_input` value should be the body and not in the URL. My previous example with a `date_format` parameter would become: `await (await fetch('https://citer.toolforge.org/?date_format=%Y-%m-%d', {'method': 'POST', 'body': 'https://citer.toolforge.org/' })).json()`. You are right about returning a dictionary, it's more flexible and easier to understand. I will probably change it in the future. Dalba 14:06, 8 February 2024 (UTC)
- Thanks! Daniel Quinlan (talk) 07:28, 9 February 2024 (UTC)
Citing via archive links
Hello again Dalba. I've been having some issues trying to use citer with archive.org links. It is frequently returning a 500 code with "ConnectError" in the JSON almost immediately. archive.org can be exceptionally slow retrieving archives, it often takes 15 to 30 seconds and sometimes is probably even more than that. It's also possible citer is just being rate limited by archive.org and my limited testing might be enough to drive it from bad to worse. Any ideas?
I've also tried using archive.today links like https://archive.today/N3fQ (they also use archive.is and archive.ph, and probably a few more aliases) and that always seems to result in a ReadTimeout error from citer. Would it be possible to support archive.today archive links?
By the way, I did reach out to archive.org to request that they enable CORS for *.wikipedia.org. If they do that, it's possible that clients could make the request to archive.org and then POST the archive link and the entire web page result to citer for data extraction. That might help if rate limits are the issue. Anyhow, I'll let you know if my request goes anywhere. Regards. Daniel Quinlan (talk) 07:13, 13 February 2024 (UTC)
- The more I look at it, the more archive.today is starting to look like a good addition for dead links. They do comment out scripts including `application/ld+json`, but that's easy to work around. I'm not sure how aggressive the server is about blocking non-interactive clients, but the maintainer has been willing to whitelist IP addresses in the past. Daniel Quinlan (talk) 18:48, 13 February 2024 (UTC)
- Hi!
- archive.org: I currently cannot reproduce. It's probably a rate limit. Citer is set to wait for 10 seconds before aborting the request, if you are getting the response immediately then it is not a timeout, perhaps the server has declined the request sooner or some other issue. There might be some clues in the logs, I might need to dig into them. Let me know if they enable CORS for wikipedia, I'll implement a way to submit HTML content to citer.
- archive.today: I would love to add support, but apparently the server does not reply to toolforge requests, no matter the timeout. Here is the verbose output of a curl call:
:$ time curl -I https://archive.today/N3fQ --connect-timeout 300 -v
:* Trying 51.38.69.52...
:* TCP_NODELAY set
:* Connected to archive.today (51.38.69.52) port 443 (#0)
:* ALPN, offering h2
:* ALPN, offering http/1.1
:* successfully set certificate verify locations:
:* CAfile: none
: CApath: /etc/ssl/certs
:* TLSv1.3 (OUT), TLS handshake, Client hello (1):
:* TLSv1.3 (IN), TLS handshake, Server hello (2):
:* TLSv1.2 (IN), TLS handshake, Certificate (11):
:* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
:* TLSv1.2 (IN), TLS handshake, Server finished (14):
:* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
:* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
:* TLSv1.2 (OUT), TLS handshake, Finished (20):
:* TLSv1.2 (IN), TLS handshake, Finished (20):
:* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
:* ALPN, server accepted to use h2
:* Server certificate:
:* subject: CN=archive.today
:* start date: Feb 4 02:20:57 2024 GMT
:* expire date: May 4 02:20:56 2024 GMT
:* subjectAltName: host "archive.today" matched cert's "archive.today"
:* issuer: C=US; O=Let's Encrypt; CN=R3
:* SSL certificate verify ok.
:* Using HTTP2, server supports multi-use
:* Connection state changed (HTTP/2 confirmed)
:* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
:* Using Stream ID: 1 (easy handle 0x5645340fd110)
:> HEAD /N3fQ HTTP/2
:> Host: archive.today
:> User-Agent: curl/7.64.0
:> Accept: */*
:>
:* TLSv1.2 (IN), TLS alert, close notify (256):
:* Empty reply from server
:* Connection #0 to host archive.today left intact
:curl: (52) Empty reply from server
:real 1m0.404s
:user 0m0.030s
:sys 0m0.009s
- Copying `User-Agent` and other headers from browser did not help either. I suspect they have blacklisted toolforge. Dalba 06:26, 22 February 2024 (UTC)
- I suspect archive.today has done something to block non-interactive requests. It might be necessary to use something like Selenium. As an alternative, would it be possible for Citer to support submitting the web page content in a POST request along with the original link and the archive link (if the content is from an archive server)? That would help with sites blocking tools like curl and it might help with rate limits and timeouts too.
- Also, archive.today responded positively to two of my requests: CORS requests now work and they also added back some `<meta>` tags as `<old-meta>`. The `application/ld+json` data is available as well (it's commented out, but easy to extract). Daniel Quinlan (talk) 07:00, 22 February 2024 (UTC)
- They are using SSL handshake fingerprinting to detect non-browser requests. I was able to access the website using https://github.com/lwthiker/curl-impersonate . I might be able to embed that into citer, it just might take me some time.
- The POST request idea is also possible and I do plan to implement it. Dalba 13:11, 22 February 2024 (UTC)
- OK, archive.today URLs are now expected to work (not tested thoroughly though).
- Also, you can now submit HTML using a POST request. In order to implement this I had to change the POST submit format. Now all parameters should be submitted within the body of the request in JSON format. To submit HTML forms, "input_type" should be set to "html" and "user_input" should be an object containing two keys: `{"html": "<HTML string of the page>", "url": "<URL>"}`. Dalba 17:00, 23 February 2024 (UTC)
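A script could assemble the new POST body along these lines. This is a minimal sketch based only on the description above: `buildHtmlSubmission` is a hypothetical helper, the key names `input_type` and `user_input` come from that description, and since there is no stable API the format may change.

```javascript
// Hypothetical helper: build the fetch() options for submitting page HTML
// to citer, with all parameters carried as JSON in the request body.
function buildHtmlSubmission(url, html) {
  return {
    method: "POST",
    body: JSON.stringify({
      input_type: "html",
      user_input: { html: html, url: url },
    }),
  };
}
```

It would then be used like the earlier examples in this thread, for instance `await (await fetch('https://citer.toolforge.org/', buildHtmlSubmission(pageUrl, pageHtml))).json()`, where `pageUrl` and `pageHtml` are whatever the script has already retrieved.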
RequestsError
Hi again,
I copied a DOI address from a web page. It was split over two lines which resulted in a space being placed in the middle:
https://doi.org/10.1371/%20journal.pgph.0000245
This resulted in Citer returning the message: "RequestsError". When I removed the '%20' from the string I got the right result. If it is true that a space is never appropriate in the middle of a DOI string, then stripping any such spaces before running the query might result in more satisfied and less confused users (or in the alternative, substitute the message, "You did not enter a valid DOI. Please check your source."). Swood100 (talk) 20:45, 15 March 2024 (UTC)
- Hi. Thanks for the suggestion. I had to refer to DOI handbook to see if space is a valid character or not. According to section 3.2.1 GENERAL CHARACTERISTICS OF THE DOI SYNTAX: "The DOI name is case-insensitive and can incorporate any printable characters from the legal graphic characters of Unicode." Apparently, space is considered both a graphic character and printable character. That being said, I have not seen any DOI containing the space character.
- Currently citer does not consider the space a valid DOI character, but https://doi.org/10.1371/%20journal.pgph.0000245 is still a valid URL and citer tries to connect to its server, but it fails with `RequestsError` because the server responds with 404 error code.
- It is possible to add a separate input type for DOIs. That way citer would not confuse a DOI for a URL. However I believe a separate input type would be a little less convenient for users. For now I'm going to leave citer as it is but might reconsider if other users report similar issues. Dalba 08:33, 22 March 2024 (UTC)
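Since citer is being left as it is, the space stripping suggested in this thread could be done on the user's side before pasting. The sketch below is only that: `stripSpaces` is a hypothetical name, and it assumes the input is a doi.org link in which any space is a copy-and-paste accident, which the DOI handbook does not strictly guarantee.

```javascript
// Hypothetical cleanup: drop literal whitespace and percent-encoded spaces
// (%20) that crept into a DOI link when it was copied across a line break.
function stripSpaces(input) {
  return input.replace(/%20|\s/g, "");
}
```

This should not be applied to arbitrary URLs, where `%20` can be a legitimate part of the path.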
Twin ISSN generated by Citer in cite journal
In quite a few cases, Citer generates a twin ISSN in the form issn=<ISSN1>, <ISSN2> in the {{Cite journal}}. The magazines now routinely declare twin ISSNs, one for Internet, one for print. Is it possible to channel the second ISSN into eissn= ? Thank you in advance! Викидим (talk) 19:29, 23 April 2024 (UTC)
- Could you provide an example input that has this issue? Dalba 05:14, 26 April 2024 (UTC)
- For example, https://www.jstor.org/stable/1687467 produces "issn=00368075, 10959203" that does not work with cite templates. The first ISSN is print, the second - online. Викидим (talk) 18:19, 27 April 2024 (UTC)
- Fixed. AFAICT, JSTOR does not provide any info about which ISSN is the electronic one. I decided to ignore the second one and use the first as `|issn=`. Dalba 18:03, 2 May 2024 (UTC)
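The fix described above (keep the first ISSN, ignore the second) can be illustrated with a short sketch. It is not Citer's code: `splitIssns` is a hypothetical name, and the hyphenated NNNN-NNNN output is the standard ISSN form rather than something confirmed for Citer's output.

```javascript
// Hypothetical helper: split a JSTOR-style value such as
// "00368075, 10959203" into separate, hyphenated ISSNs.
function splitIssns(raw) {
  return raw
    .split(",")
    .map((part) => part.replace(/[^0-9Xx]/g, "").toUpperCase())
    .filter((digits) => digits.length === 8) // an ISSN has eight characters
    .map((digits) => `${digits.slice(0, 4)}-${digits.slice(4)}`);
}
```

With this, `splitIssns(raw)[0]` would be the value used for `|issn=`, and the remaining entry is simply dropped because the source does not say which one is electronic.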
DOI 10.1109/5992.805138
With input 10.1109/5992.805138, the result is unexpected: the submit button stays grayed out, and I have to close the window to continue. There is no result either. While at it, this is a truly great tool! Thank you! Викидим (talk) 18:25, 27 April 2024 (UTC)
- Thank you! Should be fixed now. Dalba 17:58, 2 May 2024 (UTC)
Ref names
Hi Dalba, thanks again for this amazing tool. I had a question about the ref names that are generated by the tool. I noticed that, until a week ago, the tool would include the author's last name and the publication date in the reference name, e.g.:
<ref name="Valenti 2024 n238">{{cite web | last=Valenti | first=John | title=60 years ago, the World's Fair showcased dazzling inventions and international cultures | website=Newsday | date=April 20, 2024 | url=https://www.newsday.com/news/new-york/worlds-fair-60th-anniversary-v7xgi3gr | access-date=May 12, 2024}}</ref>
Recently, however, it appears the last name and publication date are not included in the reference name at all, so the references come out like this:
<ref name="n238">{{cite web | last=Valenti | first=John | title=60 years ago, the World's Fair showcased dazzling inventions and international cultures | website=Newsday | date=April 20, 2024 | url=https://www.newsday.com/news/new-york/worlds-fair-60th-anniversary-v7xgi3gr | access-date=May 12, 2024}}</ref>
Is this an intentional change? I am not sure about other projects, but on English Wikipedia, Help:Footnotes says that the reference names "should have semantic value, so that they can be more easily distinguished from each other by human editors who are looking at the wikitext". I am concerned that the current reference names might not be doing that. Epicgenius (talk) 18:00, 12 May 2024 (UTC)
- Hi! You're right, I did change it (again!) after another user complained that the generated names can sometimes be too long.[2] I'm aware of the guideline, but in practice, how much do you rely on the semantic meaning of the reference name? Personally, I don't find the reference name that important; using the browser's "find in page" function or a page preview works fine for me. That being said, I'm happy to revert the change (again!!) if you think the older method was better. I'm undecided on this one. Dalba 15:27, 14 May 2024 (UTC)
- Thanks for the response. I mainly rely on the author's last name (or the name of the publication, if there's no author). I could see why someone may think "Plants of the World Online" is too long, but for that particular case, spelling out the whole name may also be useful to people who wouldn't know what "POWO" stands for. I personally am not too bothered if you leave the names as is, since I primarily use Citer in conjunction with VisualEditor, which allows editors to reuse references without actually knowing the ref name. However, for those who use the wikitext editor, the reference names might be more helpful to them. Epicgenius (talk) 23:49, 14 May 2024 (UTC)
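The older, semantic naming scheme discussed here can be approximated as below. This is a guess reconstructed from the examples on this page ("Valenti 2024 n238", "Plants of the World Online k345"), not Citer's actual algorithm; `makeRefName` is a hypothetical helper, and the suffix is passed in rather than generated.

```javascript
// Hypothetical reconstruction: author's last name (or website when there is
// no author), then the publication year if one can be found, then the suffix.
function makeRefName({ last, website, date }, suffix) {
  const yearMatch = /\b\d{4}\b/.exec(date || "");
  const year = yearMatch ? yearMatch[0] : "";
  return [last || website, year, suffix].filter(Boolean).join(" ");
}
```

Dropping the first two parts gives the current short form, so a setting that switches between the two schemes would only need to skip them.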
Citer / vauthors
Citer's great, and it would be cool to have a vauthors option to generate author names in Vancouver style, since this is what (on en.wikipedia.org anyway) is usual for biomedical citations. Bon courage (talk) 07:14, 1 August 2024 (UTC)
- Thanks for the suggestion! I believe the main (if not only) difference is that authors should be added to the vauthors= parameter. If that's the case, it shouldn't be too difficult to implement. However, I'll need to rework some parts of the code, which might take some time. No promises, but I'll definitely look into it. Dalba 14:52, 2 August 2024 (UTC)
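As an illustration of what the vauthors= value looks like, here is a minimal sketch that turns last/first name pairs into Vancouver style (surname followed by initials without punctuation). `toVauthors` is a hypothetical helper, not part of Citer; limiting to two initials reflects my understanding of what the English Wikipedia templates accept, and corporate authors and name suffixes are not handled.

```javascript
// Hypothetical helper: [{last: "Bach", first: "Johann Sebastian"}] -> "Bach JS"
function toVauthors(authors) {
  return authors
    .map(({ last, first }) => {
      const initials = (first || "")
        .split(/[\s.-]+/) // split given names on spaces, periods and hyphens
        .filter(Boolean)
        .map((part) => part[0].toUpperCase())
        .join("")
        .slice(0, 2); // keep at most two initials
      return initials ? `${last} ${initials}` : last;
    })
    .join(", ");
}
```

The result would go into a single `|vauthors=` parameter in place of the separate `last`/`first` pairs.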
ISBN for bio Kamala's Way
I tried
ISBN-13 : 978-1398504851 (hardcover - Amazon)
and
ISBN 13: 9781982175771 (paperback - AbeBooks)
for Dan Morain's biography of Kamala Harris => Kamala's Way
Neither worked. Turtlens (talk) 06:42, 4 August 2024 (UTC)
- Works now. Turtlens (talk) 15:03, 4 August 2024 (UTC)
- Most likely a temporary issue with the Google Books server. There might be a rate limit in place. It's difficult to say for sure. Dalba 15:26, 4 August 2024 (UTC)
An error occurred in citer
Hello, https://citer.toolforge.org/ cannot be used, what is displayed is "Webservice request timed out. The tool responsible for the URL you have requested, https://citer.toolforge.org/citer.fcgi?input_type=url-doi-isbn&user_input=https%3A%2F%2Ft.m.youth.cn%2Ftransfer%2Findex% 2Furl%2Fnews.youth.cn%2Fzc%2F202408%2Ft20240805_15427560.htm&dateformat=%25Y-%25m-%25d, is taking too long to respond."--日期20220626 (talk) 13:49, 7 August 2024 (UTC)
- I have restarted the webservice and it should be back, but that particular URL (t.m.youth.cn) looks to be inaccessible. Dalba 14:08, 7 August 2024 (UTC)
Another error in generating cites
Hi Dalba. Today I ran into an error while generating citations with your tool. For example, when I try to auto-generate a cite like this or this, the tool waits about 15 seconds, then displays the output "RequestsError" in the "Shortened footnote and citation" box. As far as I know, this only started happening a few hours ago; the tool was working fine yesterday, and it previously didn't give me any issues with websites such as Deadline and NBC News. Thanks in advance for your help, and thank you again for this awesome tool. Epicgenius (talk) 19:14, 25 August 2024 (UTC)
- Actually, this seems to be working again. Thanks again. Epicgenius (talk) 13:19, 26 August 2024 (UTC)
OCLC of CD's
`<ref name="q189"></ref>` is always shown as result for OCLC's of CD's.
Examples:
<ref name="q189">{{cite | last=Bach | first=Johann Sebastian | last2=Bach | first2=Johann Christoph | title=Boulder Bach Festival | publisher=Dorian Sono Luminus | publication-place=Boyce, VA | year=2023 | oclc=1394022373 | language=de | page=}}</ref>
<ref name="q189">{{cite | last=Beethoven | first=Ludwig van | author2=Berliner Philharmoniker | title=Beethoven | publisher=Warner Classics | year=2022 | oclc=1355503732 | language=de | page=}}</ref>
Is this intended? Grimes2 (talk) 21:15, 8 September 2024 (UTC)