LWP::Parallel::UserAgent: Perl module, part of the CPAN distribution ParallelUserAgent 2.51.
LWP::Parallel::UserAgent - A class for parallel User Agents
require LWP::Parallel::UserAgent;
$ua = LWP::Parallel::UserAgent->new();
...
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
...
$ua->register ($request); # or
$ua->register ($request, '/tmp/sss'); # or
$ua->register ($request, \&callback, 4096);
...
$ua->wait ( $timeout );
...
sub callback { my($data, $response, $protocol) = @_; .... }
This class implements a user agent that accesses web resources in parallel.
Using an LWP::Parallel::UserAgent as your user agent, you typically start by
registering your requests, along with how you want the agent to process
the incoming results (see $ua->register).
Then you wait for the results by calling $ua->wait. This method returns
only when all requests have returned an answer or the agent has timed
out. Individual callback functions can also indicate that the
agent should stop waiting for requests and return (see $ua->register).
See the file LWP::Parallel for a set of simple examples.
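The register-then-wait cycle described above can be sketched as follows. This is a minimal illustration, not the module's official example: the fetch_all helper and the 10-second timeout are assumptions made here, and the URLs are taken from the command line so that nothing is fetched unless you pass some.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Hypothetical helper: fetch all URLs in parallel and return a
# hash reference mapping each URL to its HTTP status line.
sub fetch_all {
    my @urls = @_;

    my $ua = LWP::Parallel::UserAgent->new();
    $ua->timeout(10);    # inherited from LWP::UserAgent, in seconds

    for my $url (@urls) {
        my $request = HTTP::Request->new(GET => $url);

        # register() returns undef on success and an error object otherwise.
        if (my $error = $ua->register($request)) {
            warn "could not register $url: ", $error->message, "\n";
        }
    }

    # wait() returns once all requests are answered or the agent timed out.
    # Its return value is a hash reference of entry objects.
    my $entries = $ua->wait();

    my %status;
    for my $entry (values %$entries) {
        $status{ $entry->request->uri } = $entry->response->status_line;
    }
    return \%status;
}

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    my $status = fetch_all(@ARGV);
    print "$_: $status->{$_}\n" for sort keys %$status;
}
```

Run it as, for example, `perl fetch.pl http://www.example.com/ http://www.example.org/` (the script name is arbitrary).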
The LWP::Parallel::UserAgent is a sub-class of LWP::UserAgent, but not all
of its methods are available here. However, you can use its main
methods, $ua->simple_request and $ua->request, in order to simulate
singular access with this package. Of course, if a single request is all
you need, then you should probably use LWP::UserAgent in the first place,
since it will be faster than our emulation here.
For parallel access, you will need to use the new methods that come with
LWP::Parallel::UserAgent, called $ua->register and $ua->wait. See below
for more information on each method.
- $ua = LWP::Parallel::UserAgent->new();
-
Constructor for the parallel UserAgent. Returns a reference to a
LWP::Parallel::UserAgent object.
Optionally, you can give it an existing LWP::Parallel::UserAgent (or
even an LWP::UserAgent) as a first argument, and it will "clone" a
new one from this (This just copies the behavior of LWP::UserAgent.
I have never actually tried this, so let me know if this does not do
what you want).
- $ua->initialize;
-
Takes no arguments and initializes the UserAgent. It is automatically
called in LWP::Parallel::UserAgent::new, so usually there is no need to
call this explicitly.
However, if you want to re-use the same UserAgent object for a number
of "runs", you should call $ua->initialize after you have processed the
results of the previous call to $ua->wait, but before registering any
new requests.
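Re-using one agent for several runs might look like the sketch below. The run_batches helper and the idea of collecting only the status codes are assumptions made for illustration; the point is the ordering: register, wait, process, then initialize before the next batch.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Hypothetical helper: each argument is an array reference of URLs
# that forms one "run". Returns the status codes of all responses.
sub run_batches {
    my @batches = @_;

    my $ua = LWP::Parallel::UserAgent->new();
    my @codes;

    for my $batch (@batches) {
        $ua->register(HTTP::Request->new(GET => $_)) for @$batch;

        my $entries = $ua->wait();

        # Process the results of this run first ...
        push @codes, map { $_->response->code } values %$entries;

        # ... then reset the agent before registering new requests.
        $ua->initialize;
    }
    return @codes;
}
```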
- $ua->redirect ( $ok )
-
Changes the default value for permitting Parallel::UserAgent to follow
redirects and authentication requests. The standard value is 'true'.
See $ua->register for how to change the behaviour for particular
requests only.
- $ua->nonblock ( $ok )
-
By default, LWP::Parallel connects to a site using a blocking call. If
you want to speed this step up, you can try the new non-blocking version of
the connect call by setting $ua->nonblock to 'true'.
The standard value is 'false' (although this might change in the future if
non-blocking connects turn out to be stable enough).
- $ua->duplicates ( $ok )
-
Changes the default value for permitting Parallel::UserAgent to ignore
duplicate requests. The standard value is 'false'.
- $ua->in_order ( $ok )
-
Changes the default value for restricting Parallel::UserAgent to
connecting to the registered sites in the order they were registered. The
default value, false, allows Parallel::UserAgent to make the connections
in an apparently random order.
- $ua->remember_failures ( $yes )
-
If set to one, enables Parallel::UserAgent to ignore requests or
connections to sites that it failed to connect to earlier during this
"run". If set to zero (the default), Parallel::UserAgent will try to
connect to every single URL you registered, even if it constantly fails
to connect to a particular site.
- $ua->max_hosts ( $max )
-
Changes the maximum number of locations accessed in parallel. The
default value is 7.
Note: Although it says 'host', it really means 'netloc/server'! That
is, multiple servers on the same host (e.g. one server running on port
80, the other one on port 6060) will count as two 'hosts'.
- $ua->max_req ( $max )
-
Changes the maximum number of requests issued per host in
parallel. The default value is 5.
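The settings described so far are usually applied once, right after construction. The values below are arbitrary picks for illustration. With these numbers the agent would presumably hold at most 3 x 2 = 6 connections open at a time, assuming the two limits simply multiply.

```perl
use strict;
use warnings;
use LWP::Parallel::UserAgent;

my $ua = LWP::Parallel::UserAgent->new();

$ua->redirect(1);             # follow redirects (the documented default)
$ua->in_order(1);             # connect in the order of registration
$ua->remember_failures(1);    # skip sites that already failed in this run
$ua->max_hosts(3);            # at most 3 netlocs contacted in parallel
$ua->max_req(2);              # at most 2 parallel requests per netloc
```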
- $ua->register ( $request [, $arg [, $size [, $redirect_ok]]] )
-
Registers the given request with the user agent. If an error occurs,
an HTTP::Response object containing the HTML error message is
returned. Otherwise (that is, on success) it returns
undef.
The $request should be a reference to a HTTP::Request object
with values defined for at least the method() and url() attributes.
$size specifies the number of bytes Parallel::UserAgent should try
to read each time some new data arrives. Setting it to '0' or 'undef'
will make Parallel::UserAgent use the default. (8k)
Specifying $redirect_ok will alter the redirection behaviour for
this particular request only. '1' or any other true value will force
Parallel::UserAgent to follow redirects, even if the default is set to
'no_redirect' (see $ua->redirect). '0' or any other false value
should do the reverse. Please note that redirects of POST requests are
not followed, regardless of the $redirect_ok value!
If $arg is a scalar it is taken as a filename where the content of
the response is stored.
If $arg is a reference to a subroutine, then this routine is called
as chunks of the content are received. An optional $size argument
is taken as a hint for an appropriate chunk size. The callback
function is called with 3 arguments: the data received this time, a
reference to the response object and a reference to the protocol
object. The callback can use the predefined constants C_ENDCON,
C_LASTCON and C_ENDALL as a return value in order to influence pending
and active connections. C_ENDCON will end this connection immediately,
whereas C_LASTCON will indicate that no further connections should be
made. C_ENDALL will immediately end all requests and let the
Parallel::UserAgent return from $ua->wait().
If $arg is omitted, then the content is stored in the response
object itself.
If $arg is an LWP::Parallel::UserAgent::Entry object, then this
request will be registered as a follow-up request to this particular
entry. This will not create a new entry, but instead link the current
response (i.e. the reason for re-registering) as $response->previous
to the new response of this request. All other fields are either
re-initialized ($request, $fullpath, $proxy) or left untouched ($arg,
$size). (This should only be used internally.)
LWP::Parallel::UserAgent->request also allows the registration of
follow-up requests to existing requests that required redirection or
authentication. In order to do this, a Parallel::UserAgent::Entry
object will be passed as the second argument to the call. Usually,
this should not be used directly, but left to the internal
$ua->handle_response method!
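A callback that uses one of the return constants could look like the sketch below. Several things here are assumptions rather than facts taken from this page: the constants are imported with the :CALLBACK export tag as in the LWP::Parallel examples, a plain `return` is assumed to mean "keep reading", and the 64 KB cap and the helper names are invented for illustration.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent qw(:CALLBACK);

my $limit = 64 * 1024;    # hypothetical cap on bytes kept per response

# A callback is a plain function, not a method: no $self argument.
sub capped_callback {
    my ($data, $response, $protocol) = @_;

    # With a callback registered, storing the content is up to us.
    $response->add_content($data) if length $data;

    # End only this connection once enough data has arrived.
    return C_ENDCON if length(${ $response->content_ref }) >= $limit;

    return;    # assumed to let the agent continue reading
}

# Hypothetical helper: register every URL with the callback above,
# asking for chunks of roughly 4096 bytes.
sub fetch_capped {
    my @urls = @_;
    my $ua = LWP::Parallel::UserAgent->new();
    for my $url (@urls) {
        $ua->register(HTTP::Request->new(GET => $url),
                      \&capped_callback, 4096);
    }
    return $ua->wait();
}

# Only touch the network when URLs are given on the command line.
fetch_capped(@ARGV) if @ARGV;
```

Returning C_LASTCON or C_ENDALL instead would affect the other connections as described above.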
- $ua->on_connect ( $request, $response, $entry )
-
This method should be overridden in an (otherwise empty) subclass in
order to present customized messages for each connection attempted by
the User Agent.
- $ua->on_failure ( $request, $response, $entry )
-
This method should be overridden in an (otherwise empty) subclass in
order to present customized messages for each connection or
registration that failed.
- $ua->on_return ( $request, $response, $entry )
-
This method should be overridden in an (otherwise empty) subclass in
order to present customized messages for each request returned. If a
callback function was registered with this request, this callback
function is called before $pua->on_return.
Please note that while $pua->on_return is a method (which should be
overridden in a subclass), a callback function is NOT a method, and
does not have $self as its first parameter. (See more on callbacks
below)
The purpose of $pua->on_return is mainly to provide messages when a
request returns. However, you can also re-register follow-up requests
in case you need them.
If you need specialized follow-up requests depending on the request
that just returned, use a callback function instead (which can be
different for each request registered). Otherwise you might end up
writing a HUGE if..elsif..else.. branch in this global method.
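The three hooks above can be overridden together in one small subclass. The sketch below only prints progress messages. The package name MyUA is made up for this example, and the check for a defined $response in on_failure is a defensive assumption.

```perl
package MyUA;    # hypothetical subclass name
use strict;
use warnings;
use parent 'LWP::Parallel::UserAgent';

# Called for each connection the agent attempts.
sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    print "Connecting to ", $request->uri, "\n";
}

# Called for each connection or registration that failed.
sub on_failure {
    my ($self, $request, $response, $entry) = @_;
    print "Failed: ", $request->uri, "\n";
    print "  ", $response->code, " ", $response->message, "\n"
        if $response;
}

# Called for each request that returned, after any per-request callback.
sub on_return {
    my ($self, $request, $response, $entry) = @_;
    print "Returned: ", $request->uri, " ", $response->status_line, "\n";
    return;
}

package main;
use HTTP::Request;

my $ua = MyUA->new();

# Only touch the network when URLs are given on the command line.
if (@ARGV) {
    $ua->register(HTTP::Request->new(GET => $_)) for @ARGV;
    $ua->wait();
}
```

Unlike the callback functions passed to register, these are methods, so each one receives $self as its first argument.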
- $ua->discard_entry ( $entry )
-
Completely removes an entry from memory, in case its output is not
needed. Use this in callbacks such as on_return or on_failure if
you want to make sure an entry that you do not need does not occupy
valuable main memory.
- $ua->wait ( $timeout )
-
Waits for available sockets to write to or read from. Times out
after $timeout seconds. Blocks if a $timeout of 0 is specified. If
$timeout is omitted, the agent's default timeout value is used.
- $ua->handle_response($request, $arg [, $size])
-
Analyses results, handling redirects and security. This method may
actually register several different, additional requests.
This method should not be called directly. Instead, indicate for each
individual request registered with $ua->register() whether or not
you want Parallel::UserAgent to handle redirects and security, or
specify a default value for all requests in Parallel::UserAgent by
using $ua->redirect().
- $ua->simple_request($request, [$arg [, $size]])
-
This method simulates the behavior of LWP::UserAgent->simple_request.
It is actually kinda overkill to use this method in
Parallel::UserAgent, and it is mainly here for testing backward
compatibility with the original LWP::UserAgent. The following
description is taken directly from the corresponding libwww pod:
$ua->simple_request dispatches a single WWW request on behalf of a
user, and returns the response received. The $request should be a
reference to a HTTP::Request object with values defined for at
least the method() and url() attributes.
If $arg is a scalar it is taken as a filename where the content of
the response is stored.
If $arg is a reference to a subroutine, then this routine is called
as chunks of the content are received. An optional $size argument
is taken as a hint for an appropriate chunk size.
If $arg is omitted, then the content is stored in the response
object itself.
- $ua->request($request, $arg [, $size])
-
Included for compatibility testing with LWP::UserAgent. Everyday
use is deprecated! Here is what LWP::UserAgent has to say about it:
Process a request, including redirects and security. This method may
actually send several different simple requests.
The arguments are the same as for simple_request().
sub request {
    my $self = shift;
    my $ua = LWP::Parallel::UserAgent->new();
    $ua->agent($self->agent);
    $ua->from ($self->from);
    $ua->redirect(1);
    &_single_request($ua, @_);
}
- $ua->as_string
-
Returns text that describes the state of the UA. It would be useful
for debugging if it printed out anything important, but it does
not (at least not yet). Try using LWP::Debug...
- $ua->use_alarm([$boolean])
-
This function is not in use anymore and will display a warning when
called and warnings are enabled.
You can register a callback function. See LWP::UserAgent for details.
Bugs: probably lots! This was meant only as an interim release until this
functionality is incorporated into LWPng, the next generation libwww
module (though it has been this way for over 2 years now!).
Needs a lot more documentation on how callbacks work!
See also: LWP::UserAgent
Copyright 1997-2001 Marc Langheinrich <marclang@cs.washington.edu>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.