Perlwikipedia/Bugs

This page is a makeshift list of bugs in Perlwikipedia for those who don't have Google Code accounts or don't want to create them.

New

Description: get_text does not work when on a non-English wiki (example bug)
- Summary: When using get_text on a non-English wiki, the function will error out with a 404 (I hope to God this isn't true).
- List any relevant steps to reproduce the bug to help the developers, or a nice *nix-style patch if you've got one. Shadow1 (talk) 19:14, 30 May 2007 (UTC)

Open

Closed

Description: When running under ActivePerl on Windows, the get_text method hangs in an infinite loop.
- Summary: The loop seems to occur because the condition on line 295 is never met: $res->content contains garbled text. (This looks like an encoding problem.)
- This occurs on my computer running ActivePerl on Windows, with the latest versions of all modules. – Quadell (talk) (random) 16:36, 6 June 2007 (UTC)
- A workaround has been found! Shadow1 suggested that I go through Perlwikipedia.pm and change every instance of ->content to ->decoded_content. This fixes it. I'm not sure whether a more seamless solution should be developed before closing this bug, though. – Quadell (talk) (random) 19:55, 6 June 2007 (UTC)
Fixed in SVN. Shadow1 (talk) 15:56, 7 June 2007 (UTC)
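For reference, the difference between the two accessors can be reproduced without any network traffic. The sketch below assumes that $res in Perlwikipedia.pm is an HTTP::Response from libwww-perl (which the ->content/->decoded_content calls suggest) and uses an invented UTF-8 body; it is an illustration, not the code of the fix committed to SVN.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Invented response body: the title "Kashō" encoded as UTF-8 bytes.
my $body = encode('UTF-8', "Kash\x{014D}");

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    $body,
);

# ->content returns the raw octets exactly as received (6 bytes here).
my $raw = $res->content;

# ->decoded_content removes any Content-Encoding (such as gzip) and decodes
# the charset named in Content-Type, giving Perl characters (5 here).
my $text = $res->decoded_content;

printf "content: %d bytes, decoded_content: %d characters\n",
    length $raw, length $text;
```

Code that compares the raw bytes against character strings (for example, a loop waiting for a particular marker) can therefore never match when the body is compressed or multi-byte, which is consistent with the hang described above.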
The code at http://perlwikipedia.googlecode.com/svn/trunk/Perlwikipedia.pm has a bug: in the _put subroutine, the variable $res is declared twice. That is easily fixed, and I can do it since I have access to the repository, but I am not sure whether the Google Code version of the code is the most recent one. Oleg Alexandrov (talk) 15:53, 20 June 2007 (UTC)
I fixed it myself. Oleg Alexandrov (talk) 02:07, 23 June 2007 (UTC)

3. Description: get_text fails on certain UTF-8 characters

Summary: If you attempt to retrieve the text of a page such as Š, the following error is produced:

 Can't escape \x{0160}, try uri_escape_utf8() instead at {path}/perlwikipedia/Perlwikipedia.pm line 64

Test Case: The following code segment demonstrates the problem.

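 use strict;
 use warnings;
 use Perlwikipedia;
 
 my $bot = Perlwikipedia->new;    # bot object with default settings
 my @results = $bot->what_links_here("Caron");
 for my $result (@results) {
   my $page = $result->{title};
   print "Getting $page\n";
   my $text = $bot->get_text($page);    # dies on titles such as "Š"
 }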

Resolution: I patched my copy of Perlwikipedia.pm by doing exactly what the error message suggests, replacing uri_escape() with uri_escape_utf8(). I don't know if this is the best approach, but it works.

 $ svn diff
 Index: Perlwikipedia.pm
 ===================================================================
 --- Perlwikipedia.pm    (revision 88)
 +++ Perlwikipedia.pm    (working copy)
 @@ -7,6 +7,7 @@
  use XML::Simple;
  use Carp;
  use Encode;
 +use URI::Escape qw(uri_escape_utf8);
 
  our $VERSION = '0.90';
 
 @@ -61,7 +62,7 @@
      my $extra     = shift;
      my $no_escape = shift || 0;
 
 -    $page = uri_escape($page) unless $no_escape;
 +    $page = uri_escape_utf8($page) unless $no_escape;
      $page =~ s/\&/%26/g; # escape the ampersand
 
      my $url =

Thanks. -- JLaTondre 12:00, 18 July 2007 (UTC)

I applied and tested the patch. Thanks! The new revision is available in the Google Code repository for Perlwikipedia. Oleg Alexandrov (talk) 03:28, 19 July 2007 (UTC)
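The error in this bug comes from URI::Escape itself: uri_escape() only handles characters up to \xFF, while uri_escape_utf8() encodes the string as UTF-8 bytes before escaping them. A minimal standalone sketch, using Š (U+0160) as in the report:

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);

my $title = "\x{0160}";    # "Š" (U+0160), wider than the \xFF limit of uri_escape()

# uri_escape() escapes characters up to \xFF and croaks on anything wider,
# so trap the error with eval instead of letting the script die.
my $err;
eval { uri_escape($title); 1 } or $err = $@;
print "uri_escape failed: $err" if defined $err;

# uri_escape_utf8() encodes the string to UTF-8 bytes, then escapes each byte.
my $escaped = uri_escape_utf8($title);
print "uri_escape_utf8: $escaped\n";    # prints "uri_escape_utf8: %C5%A0"
```

The first message is the same "Can't escape \x{0160}, try uri_escape_utf8() instead" error quoted in the summary.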

4. Description: get_pages_in_category() does not return images in the category

Summary: Can this be changed to include images as well? – Quadell (talk) (random) 13:38, 7 June 2007 (UTC)

Patch written, tested, and committed. Shadow1 (talk) 13:10, 25 August 2007 (UTC)

5. Description: get_history fails on articles with UTF-8 characters in the name

Summary: get_history fails for articles whose names contain UTF-8 characters, such as Kashō. The query returns no results because the UTF-8 characters in the title must be escaped first. I added $pagename = uri_escape_utf8($pagename); to the start of get_history, which fixed the problem. The same problem will occur in any other function that uses _get_api. It cannot be fixed simply by escaping $query within _get_api, because that would also escape characters that must stay as they are (e.g. the & in &action).
The following is the diff for the change I made. Thanks. -- JLaTondre 00:14, 27 August 2007 (UTC)

 236a237,238
 >     $pagename = uri_escape_utf8($pagename);
 >

Committed to SVN, along with fixes to some other functions that had the same bug. This should be rolled out in version 1.01 soon. Shadow1 (talk) 01:05, 27 August 2007 (UTC)
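The reason $query cannot simply be escaped inside _get_api is that escaping has to be applied to each parameter value, not to the assembled query string. The sketch below illustrates this with a hypothetical build_query helper (it is not part of Perlwikipedia, and the parameters are only sample MediaWiki API parameters) and compares it with escaping the whole string at once.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

# Hypothetical helper, not part of Perlwikipedia: escape each value
# separately, then join the key=value pairs with literal & separators.
sub build_query {
    my @pairs = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, "$key=" . uri_escape_utf8($value);
    }
    return join '&', @parts;
}

my $pagename = "Kash\x{014D}";    # "Kashō"; ō is U+014D

my $good = build_query(action => 'query', prop => 'revisions', titles => $pagename);

# Escaping the assembled string instead also escapes the = and & separators.
my $bad = uri_escape_utf8("action=query&prop=revisions&titles=$pagename");

print "$good\n";    # action=query&prop=revisions&titles=Kash%C5%8D
print "$bad\n";     # action%3Dquery%26prop%3Drevisions%26titles%3DKash%C5%8D
```

Because only the values pass through uri_escape_utf8(), the separators stay literal; the per-function fix above gets the same effect by escaping $pagename before it is inserted into the query.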