Mike Schilli's Friendly Neighborhood Perl Shop

Mike's Script Archive: POEFetch.pm

POEFetch - Module for bulk-fetching URLs


NAME

    POEFetch - Module for bulk-fetching URLs


DOWNLOAD

POEFetch.pm


SYNOPSIS

    use POEFetch;
    use HTTP::Request::Common;
    my $f = POEFetch->new( max_kids => 5 );
    $f->register(GET 'http://www.yahoo.com');
    $f->register(GET 'http://www.amazon.com');
    $f->register(GET('http://www.protected.com'), "user", "passwd");
    my @results = $f->process();
    for my $resp (@results) {
        if($resp->is_success()) {
            print $resp->request->uri(), ": ", 
                  $resp->content(), "\n";
        } else {
            print $resp->request->uri(), ": ", 
                  $resp->code(), "\n";
        }
    }


DESCRIPTION

POEFetch uses the smart/crazy POE framework to reel in URLs en masse. POE is a nice alternative to pre-forking or multi-threading. It relies on cooperative multitasking and runs its tasks in a single process and a single thread, taking advantage of the processor cycles that are free while a task waits for a slower operation (such as a network response).

POEFetch works with both plain HTTP and SSL. Basic authentication is supported as well.
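
To illustrate the cooperative multitasking described above, here is a minimal sketch using POE directly. It is not taken from POEFetch's source; the session layout, the state name step and the @log array are assumptions made for this example. Two sessions each count to 3 and hand control back to the kernel after every step, so their steps interleave within one process and one thread.

```perl
use strict;
use warnings;
use POE;    # loads POE::Kernel and POE::Session, exports KERNEL, HEAP, ...

my @log;    # records which task ran which step

# Create two sessions (tasks). Each counts to 3 and yields after every step.
for my $name (qw(A B)) {
    POE::Session->create(
        inline_states => {
            _start => sub {
                $_[HEAP]{count} = 0;
                $_[KERNEL]->yield('step');    # queue the first step
            },
            step => sub {
                my ($kernel, $heap) = @_[KERNEL, HEAP];
                push @log, $name . ++$heap->{count};
                # Re-queue ourselves instead of looping, so other
                # sessions get a turn in between.
                $kernel->yield('step') if $heap->{count} < 3;
            },
        },
    );
}

POE::Kernel->run();    # returns once no session has work left
print join(' ', @log), "\n";
```

No handler blocks for long: each one does a small piece of work and returns to the kernel, which is what lets a single process keep many slow network requests going at once.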


METHODS

my $fetcher = POEFetch->new( [max_kids => $nof_kids] );
Constructor. Creates the fetcher object, which can register and then reel in an arbitrary number of HTTP::Request objects. It accepts optional parameter settings; currently, the number of requests processed in parallel can be set with max_kids:
    my $fetcher = POEFetch->new( max_kids => 10 );

causes POEFetch to run 10 requests in parallel. The default is 5. The maximum throughput has to be determined empirically; values between 5 and 10 for max_kids have been found to be reasonable.
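
The effect of a max_kids style limit can be sketched without any network access. The following is a simulation in plain Perl, not POEFetch's actual implementation: run_windowed, the ticks field and the example.com URLs are hypothetical names made up for illustration. Each job takes a fixed number of simulated time ticks, and at most $max_kids jobs are in flight at once.

```perl
use strict;
use warnings;

# Hypothetical helper: run jobs with at most $max_kids in flight at a time.
# Returns the peak number of concurrent jobs, followed by the results.
sub run_windowed {
    my ($max_kids, @jobs) = @_;
    my @results;                   # filled by index, keeps registration order
    my @pending = (0 .. $#jobs);   # indexes of jobs not yet started
    my @active;                    # [ job index, remaining ticks ]
    my $peak = 0;

    while (@pending or @active) {
        # Top up the window with waiting jobs.
        while (@active < $max_kids and @pending) {
            my $i = shift @pending;
            push @active, [ $i, $jobs[$i]{ticks} ];
        }
        $peak = @active if @active > $peak;

        # Advance simulated time by one tick and retire finished jobs.
        my @still_running;
        for my $slot (@active) {
            my $i = $slot->[0];
            if (--$slot->[1] <= 0) {
                $results[$i] = 'done:' . $jobs[$i]{url};
            } else {
                push @still_running, $slot;
            }
        }
        @active = @still_running;
    }
    return ($peak, @results);
}

my @jobs = map { +{ url => "http://example.com/$_", ticks => 1 + $_ % 3 } } 1 .. 7;
my ($peak, @results) = run_windowed(3, @jobs);

print "peak in flight: $peak\n";
print "$_\n" for @results;
```

Storing each result under the index of its job keeps the output in registration order even though the jobs finish at different times.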

$f->register( $req, [$user, $pass] );
Register a request with the fetcher. $req is of type HTTP::Request and is easily created with the GET and POST functions of the HTTP::Request::Common module.

$user and $pass are optional and are used for basic authentication if provided.
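
For reference, this is roughly what such request objects look like when built with HTTP::Request::Common. The last part shows how basic authentication credentials can be attached to a request by hand with the authorization_basic method from HTTP::Headers. Whether POEFetch does it this way internally is an assumption; the sketch only shows what the resulting header contains. The example.com URL and the form field q are made up.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET POST);

# Plain GET and a form POST, both returning HTTP::Request objects.
my $get  = GET 'http://www.yahoo.com';
my $post = POST 'http://www.example.com/search', [ q => 'perl' ];

# Basic authentication set by hand: adds an Authorization header
# containing the Base64-encoded "user:passwd" pair.
my $protected = GET 'http://www.protected.com';
$protected->authorization_basic('user', 'passwd');

print $get->method,  ' ', $get->uri,      "\n";
print $post->method, ' ', $post->content, "\n";
print 'Authorization: ', $protected->header('Authorization'), "\n";
```

Basic authentication only encodes the credentials; it does not encrypt them, so it is best combined with SSL.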

my @results = $f->process();
Fires up all registered requests in a quasi-parallel way, but never more than max_kids (see new()) at a time. The function blocks until the last URL has been reeled in. Every element of the result array is of type HTTP::Response and can be queried for success or failure with its standard is_success/is_error methods.

Please note that the POE::Component::Client::HTTP module used internally won't follow 302s automatically but returns them as errors.

Results are stored in the same order in which the original requests were registered. In case you have lost track of the URLs requested in the first place, every HTTP::Response object provides the original URL in $resp->request()->uri().
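
Since redirects are not followed, a caller may want to pick them out of the results and register the new locations for a second pass. The sketch below shows one way to do that. It uses hand-built HTTP::Response objects as stand-ins for what process() returns, so it runs without POEFetch or a network; the classify helper and the example.com URLs are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET);
use HTTP::Response;

# Stand-ins for the responses returned by process().
my $ok = HTTP::Response->new(200, 'OK', [], 'hello');
$ok->request(GET 'http://www.yahoo.com');

my $moved = HTTP::Response->new(302, 'Found',
    [ Location => 'http://www.example.com/new' ]);
$moved->request(GET 'http://www.example.com/old');

# Hypothetical helper: describe one response in a single line.
sub classify {
    my ($resp) = @_;
    my $url = $resp->request->uri;
    return "$url: ok" if $resp->is_success;
    return "$url: redirect to " . $resp->header('Location')
        if $resp->is_redirect;
    return "$url: failed with " . $resp->code;
}

# Collect redirect targets that could be registered for a second pass.
my @retry;
for my $resp ($ok, $moved) {
    print classify($resp), "\n";
    push @retry, $resp->header('Location') if $resp->is_redirect;
}
```

For a 302, HTTP::Response reports is_success as false and is_redirect as true, so checking is_redirect before treating a response as a failure separates moved pages from genuine errors.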


LEGALESE

Copyright 2002 by Mike Schilli, all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


AUTHOR

2001, Mike Schilli <m@perlmeister.com>


Latest update: 20-Oct-2013