PrePAN

Profile

olafalders@twitter

Twitter: @olafalders
GitHub: oalders
PAUSE ID: OALDERS

User's Modules

LWP::ConsoleLogger Easy way to get pretty debugging output to your console

I really like the logging output of Catalyst and, by extension, https://metacpan.org/pod/Plack::Middleware::DebugLogging. I'd like to see something like this easily available for LWP::UserAgent as well as the modules which subclass it (like the WWW::Mechanize family).

Does something like this already exist? If not, I'd like to get some feedback on the name and how useful people might find it. Using something like Wireshark or setting up a proxy server is not always feasible when debugging a mech script.

Something like this would be fairly trivial to write, since a lot of it could be stolen from Plack::Middleware::DebugLogging.

The content_regex param is a regex to run on the page content which will strip the header and footer from the page. It will then remove all HTML markup and display the raw content of the page to you. This makes it easier to sanity-check page content (is this actually the page I want to be on?). I could add an option to preserve markup and/or just dump the entire page contents.

The idea is that if you're watching a mech script crawl X pages, seeing the changing content per page will give you a better idea of what is going on.
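To make the content_regex idea concrete, here is a minimal sketch in plain Perl. The helper name visible_text and the convention that the regex captures the interesting region in $1 are assumptions made for illustration, not a description of how the module would actually be implemented.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any released module.
# $content_regex is assumed to capture the interesting part of the page in $1.
sub visible_text {
    my ( $html, $content_regex ) = @_;

    # Keep only the captured region; fall back to the whole page on no match.
    my $content = ( $html =~ $content_regex ) ? $1 : $html;

    $content =~ s{<(script|style)\b.*?</\1>}{}gis;    # drop script/style blocks
    $content =~ s{<[^>]+>}{ }g;                       # strip remaining tags
    $content =~ s{\s+}{ }g;                           # collapse whitespace
    $content =~ s{^\s+|\s+$}{}g;                      # trim both ends

    return $content;
}

my $html = <<'HTML';
<html><body>
<div id="header">Site nav</div>
<div id="main"><h1>Search results</h1><p>Page 2 of 10</p></div>
<div id="footer">Copyright</div>
</body></html>
HTML

my $text = visible_text( $html,
    qr{<div id="main">(.*?)</div>\s*<div id="footer">}s );
print "$text\n";
```

A regex-based tag stripper like this is crude, but for a quick "which page am I on?" check while a script crawls, that is probably good enough.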

olafalders@twitter 1 comment

Dist::Zilla::Plugin::PrePAN Set PrePAN URLs in your meta files

This is based on Module::Install::PrePAN. We'd like to have PrePAN links on MetaCPAN, and something like this would make it easier for authors using Dist::Zilla to automatically include links back to PrePAN for module discussion.

In your META.json, for example, you'd see something like this:

"resources" : {
  "X_prepan_author" : "http://prepan.org/user/3Yz7PYrBzQ",
  "X_prepan_module" : "http://prepan.org/module/429En4oFdi",
  "bugtracker" : {
     "web" : "http://github.com/oalders/HTTP-CookieMonster/issues"
  },
  "homepage" : "https://github.com/oalders/http-cookiemonster",
  "repository" : {
     "type" : "git",
     "url" : "https://github.com/oalders/http-cookiemonster.git",
     "web" : "https://github.com/oalders/http-cookiemonster"
  }
},

olafalders@twitter 5 comments

HTTP::CookieMonster Easily Read and Update your Jar of HTTP::Cookies

HTTP::Cookies gets the job done, but it's a bit weird in some ways. For instance, instead of returning you a list of cookies, you have to use a callback:

$cookie_jar->scan( \&callback )

The callback will be invoked with 11 positional parameters:

0 version
1 key
2 val
3 path
4 domain
5 port
6 path_spec
7 secure
8 expires
9 discard
10 hash

That's a lot to remember and it doesn't make for very readable code.
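One way to tame the positional list is to map it onto named fields as soon as the callback fires. The sketch below uses the field order given above; the helper name cookie_from_args is hypothetical, and the callback is fed sample values so the snippet runs without a real cookie jar.

```perl
use strict;
use warnings;

# Field order of the scan() callback arguments, as listed above.
my @FIELDS = qw(
    version key val path domain port
    path_spec secure expires discard hash
);

# Hypothetical helper: turn the positional list into a hashref.
sub cookie_from_args {
    my %cookie;
    @cookie{@FIELDS} = @_;
    return \%cookie;
}

# With a real jar this would look something like:
#   my @cookies;
#   $cookie_jar->scan( sub { push @cookies, cookie_from_args(@_) } );
# Sample values are used here so the sketch stands on its own.
my $cookie = cookie_from_args(
    0, 'PREF', 'ID=abc', '/', '.google.ca', undef,
    1, undef, 1407698600, undef, {},
);

printf "%s=%s (domain %s)\n", @{$cookie}{qw( key val domain )};
```

This only solves the reading side; writing a cookie back still means passing the values to set_cookie in the right order.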

Now, let's say you want to save or update a cookie. You're back to the many positional params yet again:

$cookie_jar->set_cookie( $version, $key, $val, $path, $domain, $port, $path_spec, $secure, $maxage, $discard, \%rest )

Also not readable. Unless you have an amazing memory, you may find yourself checking the docs regularly to see if you did, in fact, get all those params in the correct order etc.

HTTP::CookieMonster gives you a simple interface for getting and setting cookies. You can fetch all of your cookies at once:

my @cookies = $monster->all_cookies;
foreach my $cookie ( @cookies ) {
    print $cookie->key;
    print $cookie->val;
    print $cookie->secure;
    print $cookie->domain;
    # etc
}

Or, if you know for a fact exactly what will be in your cookie jar, you can fetch a cookie by name.

my $cookie = $monster->feeling_lucky( 'plack_session' );

This gives you fast access to a cookie without a callback or iterating over a list. It's good for quick hacks, and you can dump the cookie quite easily to inspect its contents in a highly readable way:

HTTP::CookieMonster::Cookie  { 
    Parents       Moo::Object 
    Linear @ISA   HTTP::CookieMonster::Cookie, Moo::Object 
    public methods (12) : discard, domain, expires, hash, key, new, path, path_spec, port, secure, val, version 
    private methods (0) 
    internals: { 
        discard   undef, 
        domain   ".google.ca", 
        expires   1407698600, 
        hash   {}, 
        key   "PREF", 
        path   "/", 
        path_spec   1, 
        port   undef, 
        secure   undef, 
        val   "ID=38ab0cca20346b6f:FF=0:TM=1344626600:LM=1344626600:S=B_24d7BggBJEkhwi", 
        version   0 
    } 
}

If you want to mangle the cookie before the next request, that's easy too.

$cookie->val('woohoo');
$monster->set_cookie( $cookie );
$mech->get( $url );

Or, add an entirely new cookie to the jar:

# You can add an entirely new cookie to the jar via this method
use HTTP::CookieMonster::Cookie;
my $cookie = HTTP::CookieMonster::Cookie->new(
    key       => 'cookie-name',
    val       => 'cookie-val',
    path      => '/',
    domain    => '.somedomain.org',
    path_spec => 1,
    secure    => 0,
    expires   => 1376081877
);

$monster->set_cookie( $cookie );
$mech->get( $url );

olafalders@twitter 5 comments

HTTP::RoboDetect Detect bots based on UserAgent String and/or hostname/IP

Bots can, to some extent, be sniffed out based on UserAgent strings, but they can also be flagged with some reliability based on IP/hostname. This module will use HTTP::BrowserDetect to check UserAgent strings and will use some basic rules to flag traffic which comes from Rackspace and other cloud offerings as likely bots.
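As a rough illustration of combining the two signals, here is a self-contained sketch. It uses plain regexes as a stand-in for HTTP::BrowserDetect so that it runs with core Perl only; the function name looks_like_bot, the UserAgent patterns, and the placeholder hostname rule are all assumptions, not the module's actual rule set.

```perl
use strict;
use warnings;

# Illustrative patterns only; a real rule set would need careful curating.
my @UA_PATTERNS   = ( qr/bot\b/i, qr/crawler/i, qr/spider/i );
my @HOST_PATTERNS = ( qr/\.cloud\.example\.com$/i );    # placeholder domain

# Hypothetical check: flag a request if either signal matches.
sub looks_like_bot {
    my (%args) = @_;
    my $ua   = defined $args{user_agent} ? $args{user_agent} : '';
    my $host = defined $args{hostname}   ? $args{hostname}   : '';

    return 1 if grep { $ua   =~ $_ } @UA_PATTERNS;
    return 1 if grep { $host =~ $_ } @HOST_PATTERNS;
    return 0;
}

my $browser = 'Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0';

print looks_like_bot(
    user_agent => 'Mozilla/5.0 (compatible; Googlebot/2.1)',
    hostname   => 'crawl.example.org',
), "\n";
print looks_like_bot(
    user_agent => $browser,
    hostname   => 'vm42.cloud.example.com',
), "\n";
print looks_like_bot(
    user_agent => $browser,
    hostname   => 'dsl-1-2-3-4.isp.example.net',
), "\n";
```

Hostname rules will produce false positives for people browsing through a cloud-hosted VPN or proxy, so a result like this is better treated as "likely bot" than as a hard verdict.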

olafalders@twitter 11 comments