Skip to content

manakai/perl-web-resource

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

=head1 NAME

Web::Transport - Transport layer protocols for the Web

=head1 DESCRIPTION

These modules contain implementations of various protocols forming the
Web platform's transport layer.

Following protocols are supported by these modules and underlying
platform (partially or fully; see documentation of relevant modules):
TCP (for IPv4 and IPv6), UNIX domain sockets, TLS (including SNI, OCSP
stapling), HTTP (including Cookies,
C<application/x-www-form-urlencoded>, C<multipart/form-data>, Basic
authentication, OAuth 1.0, OAuth 2.0 Bearer, C<CONNECT>), SOCKS4,
SOCKS5, DNS, MIME types, MIME sniffing.

=head1 MODULES

There are following modules that expose public APIs:

=over 4

=item L<Web::Transport::BasicClient>

A basic HTTP client.

=item L<Web::Transport::WSClient>

A WebSocket client.

=item L<Web::Transport::PSGIServerConnection>

A PSGI interface of HTTP server.

=item L<Web::Transport::ProxyServerConnection>

A HTTP proxy server.

=item L<Web::Transport::PlatformResolver>

A name resolver using system's resolver.

=item L<Web::Transport::CachedResolver>

A DNS cache.

=item L<Web::Transport::ENVProxyManager>

An environment variable based proxy manager.

=item L<Web::MIME::Type::Parser>

MIME type parser.

=item L<Web::MIME::Sniffing>

MIME type sniffing.

=item L<Web::Transport::AWS>

AWS API signature calculation.

=item L<Web::Transport::Base64>

A Base64 API.

=item L<Web::Transport::DataURL::Parser>

A C<data:> URL processor.

=item L<Web::Transport::FindPort>

Find TCP ports for testing.

=back

The module L<Web::Transport::ConnectionClient> is deprecated.

=head1 REQUEST OPTIONS

For the C<request> method of a L<Web::Transport::BasicClient> object,
following key/value pairs can be used to specify the parameters of the
request to send:

=over 4

=item url => $url

Specify the request URL.  If specified, the value must be a
L<Web::URL> object.

The scheme of the URL must be one of C<http>, C<https>, C<ftp>, C<ws>,
or C<wss>.

Any username, password, or fragment of the URL is ignored.

Exactly one of C<url>, C<path>, and C<path_string> options must be
specified.

=item path => [$seg1, $seg2, ...]

Specify the path segments of the request URL.  If specified, the value
must be an array reference of zero or more character strings.  They
are encoded in UTF-8 and then concatenated with C</>.

If a "prefix" is defined by the context, the path segments given by
this option is appended to the prefix.

Exactly one of C<url>, C<path>, and C<path_string> options must be
specified.

=item path_string => $string

Specify the path of the request URL.  If specified, the value must be
a character string.  It is encoded in UTF-8.  If the first character
is not C</>, a leading C</> is prepended.

If a "prefix" is defined by the context, the path segments given by
this option is appended to the prefix.

Exactly one of C<url>, C<path>, and C<path_string> options must be
specified.

=item method => $string

Specify the HTTP request method.  Note that HTTP request methods are
case-sensitive.  If not specified, C<GET> is used.

=item headers => {$name => $value, ...}

=item headers => [[$name => $value], ...]

Specify HTTP headers.  The value must be a headers value (see
L</HEADERS>).

=item cookies => {$string => $string, ...}

Specify cookies, to be included in the C<Cookie:> header.  The value
must be a hash reference of zero or more key/value pairs, where keys
are cookie names and values are corresponding cookie values.  Any key
whose value is C<undef> is ignored.

Names and values must be strings of zero or more characters.  They are
encoded in UTF-8 and percent-encoded before included in the header
value.  If this behavior is inappropriate, use C<headers> option
instead.

This option also adds C<Cache-Control: no-store> and C<Pragma:
no-store> headers.

=item params => {$string => $string, ...}

Parameters to be included in the request.

The value must be a hash reference of zero or more key/value pairs,
where keys are parameter names and values are corresponding parameter
values.

Parameter names and values must be strings of zero or more characters.
Parameter values can be either a string or an array reference of zero
or more strings, representing parameter values sharing same parameter
name.  Any parameter whose value is C<undef> is ignored.

They are encoded in the C<application/x-www-form-urlencoded> format
(or C<multipart/form-data>, if C<files> option is also specified) in
UTF-8.  If the C<method> is C<POST> and there is no conflicting
options, they are put into the request body.  Otherwise, they are
appended to the request URL.

=item files => {$string => $file, ...}

Files to be included in the request.

The value must be a hash reference of zero or more key/value pairs,
where keys are parameter names and values are corresponding files.

Parameter names must be strings of zero or more characters.

Files can be either a hash reference or an array reference of zero or
more hash references, representing files sharing same parameter name.
Any pair whose value is C<undef> is ignored.

A file hash reference must contain a C<body_ref> key, whose value must
be a reference to a scalar containing zero or more bytes, i.e. the
file content.

A file hash reference can also contain C<mime_type> key, whose value
is a MIME type string (C<Content-Type:> header value) and
C<mime_filename> key, whose value is a file name string
(C<Content-Disposition:> header's C<filename> parameter value;
defaulted to a string C<file.dat> if missing).  Their defaults, used
when the key is omitted or the value is C<undef>, are
C<application/octet-stream> and the empty string, respectively.

They are encoded in the C<multipart/form-data> format in UTF-8.

=item basic_auth => [$userid, $password]

Specifies the credentials of the C<Basic> authentication.  The value
must be an array reference of two strings, which are used as user ID
and password.

=item bearer => $string

Specifies the credentials of the C<Bearer> authentication.

=item oauth1 => [$string, $string, $string, $string]

If a non-C<undef> value is specified, the request is to be signed
using OAuth 1.0 C<HMAC-SHA1> signature method.

The value must be an array reference of strings, which are consumer
key, consumer secret, access token, and access token secret.  They
must be specified even though they can be empty strings.

This option also adds C<Cache-Control: no-store> and C<Pragma:
no-store> headers.

=item oauth1_container => 'authorization' | 'query' | 'body'

Where to add OAuth 1.0 request parameters.  This option is ignored
unless the C<oauth1> option is also specified.

The value C<authorization> designates the HTTP C<Authorization:>
header.

The value C<query> designates the URL query component.

The value C<body> designates the request body.

If this option is not specified, or C<undef> is specified, parameters
are added to the HTTP C<Authorization:> header, unless there is
another C<Authorization:> header, in which case they are added to the
same slot as C<params>.

=item oauth_verifier => $string

The OAuth 1.0 C<oauth_verifier> request parameter value.  If a
non-C<undef> value is specified and the C<oauth1> option is also
specified, this value is taken into account as a request parameter.
Otherwise this option is ignored.

=item oauth_callback => $string

The OAuth 1.0 C<oauth_callback> request parameter value.  If a
non-C<undef> value is specified and the C<oauth1> option is also
specified, this value is taken into account as a request parameter.
Otherwise this option is ignored.

=item aws4 => [$access_key_id, $secret_access_key, $region, $service]

If specified, signature of the request is attached using AWS Signature
Version 4 and the L<Authorization:> header.

The value must be an array reference of four values.  The zeroth item
must be the access key ID.  The first item must be the secret access
key.  The second item must be the AWS region, such as C<us-east-1>.
The third item must be the AWS service, such as C<s3>.

=item aws4_signed_headers => {$name1 => 1, $name2 => 1, ...}

Specifies additionally signed headers.  This option is ignored unless
C<aws4> option is also specified.

If specified, the value must be a hash reference, whose keys are
header names (ASCII case-insensitive) and values are true values.

=item ws_protocols => [$name1, $name2, ...]

Specifies WebSocket protocol names.  The value must be an array
reference of zero or more byte strings.  Only applicable to WebSocket
requests.  To establish a WebSocket connection with WebSocket protocol
names as a WebSocket client, this option must be used (rather than
directly specifying the C<Sec-WebSocket-Proto:> header value) as the
protocol names are directly handled by the client module as part of
the protocol handshake.

=item superreload => $boolean

If true, C<Cache-Control: no-cache> and C<Pragma: no-store> headers
are added.

=item body => $bytes

The request body.  If a non-C<undef> value is specified, it must be a
string of zero or more bytes.

This option is not allowed when C<body_stream> is specified.

=item body_stream => $readable_stream

The request body.  If a non-C<undef> value is specified, it must be a
L<ReadableStream> with type C<bytes> (i.e. a readable byte stream)
which is not locked.

If this option has non-C<undef> value, the C<length> option must also
be specified.  This option is not allowed when C<body> is specified.

=item length => $byte_length

The byte length of the request body.  If a non-C<undef> value is
specified, it must be equal to the number of bytes contained by the
C<body_stream> readable byte stream.

If this option has non-C<undef> value, the C<body_stream> option must
also be specified.  This option is not allowed when C<body> is
specified.

=item stream => $boolean

Whether the result of the operation should contain a L<ReadableStream>
object (true) or a byte string (false) for the response body.  See
L<Web::Transport::BasicClient> for more information.

=item forwarding => $boolean (for proxy)

Whether this client is a component of a proxy, forwarding a request to
the upstream.  This option is only applicable to the C<request> hash
reference of the argument (or return value) to the C<handle_request>
callback of L<Web::Transport::ProxyServerConnection>.

=back

Different subsets of these options are also supported by relevant
methods of deprecated modules L<Web::Transport::ConnectionClient> and
L<Web::Transport::WSClient>.

=head2 Relevant specifications

Fetch Standard <https://fetch.spec.whatwg.org/>.

URL Standard <https://url.spec.whatwg.org/>.

Encoding Standard <https://encoding.spec.whatwg.org/>.

When a text is encoded in UTF-8, the UTF-8 encode steps of the
Encoding Standard MUST be used.

RFC 6265, HTTP State Management Mechanism
<https://tools.ietf.org/html/rfc6265>.

RFC 5849, The OAuth 1.0 Protocol
<https://tools.ietf.org/html/rfc5849>.

HTML Standard <https://html.spec.whatwg.org/>.

Signature Calculations for the Authorization Header: Transferring
Payload in a Single Chunk (AWS Signature Version 4) - Amazon Simple
Storage Service
<https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html>.

=head1 HEADERS

A B<headers value> is either a headers hash reference or a headers
array reference.

A B<headers hash reference> is a reference to a hash whose keys are
header names and its values are corresponding header values.  If a
value is an array reference, that array's items are used as the header
values (i.e. multiple headers with same name are generated).

A B<headers array reference> is a reference to an array which contain
zero or more array references which contains header name (index 0) and
header value (index 1).  A B<canonical headers array reference> is a
headers array reference whose items contain third item (index 2),
which is the ASCII lowercased variant of the header name.

Header names and values must be byte strings.  Header names and values
cannot contain 0x0D or 0x0A bytes.  Header names are case-insensitive.
Header names cannot be empty and cannot contain certain bytes
(e.g. 0x3A).

=head1 PROXY MANAGERS

A B<proxy manager> is used to retrieve proxy configurations used for
fetching.  It is an object with the C<get_proxies_for_url> method.

The C<get_proxies_for_url> method is expected to return the list of
the proxies for the specified URL.  It has a required argument, which
must be a URL record (L<Web::URL>), for which a proxy configuration is
requested.

Additionally, following named argument chould be specified:

=over 4

=item signal => $signal

The abort signal (L<AbortSignal>) for the operation.  If the abort
controller (L<AbortController>) associated with the signal is aborted,
the operation is expected to be aborted and the method's promise is
rejected with an object whose C<name> is C<AbortError> whenever
possible.

=back

The method is expected to return a promise (L<Promise>), which is to
be resolved with a proxy configuration list (see L</"PROXY
CONFIGURATIONS">), or to be rejected with an error.

There are following proxy managers in this repository:
L<Web::Transport::ENVProxyManager> and
L<Web::Transport::ConstProxyManager>.  Any other object can be used as
well.

=head1 PROXY CONFIGURATIONS

A B<proxy configuration list> is ...

XXX

host => ...

XXX

For a proxy configuration returned by a proxy manager, the value must
be a L<Web::Host> object.  For a proxy configuration used as part of
an input to the L<Web::Transport::ConstProxyManager>'s constructor,
the value may be a string which is a valid host string.

debug => $mode

Specify the debug mode.  The default value is the C<WEBUA_DEBUG>
environment variable's value.  See C<WEBUA_DEBUG> section for
available mode values.

This option is only applicable to C<http> and C<https> protocols.


=head1 RESOLVERS

XXX

The C<resolve> method is expected to return a promise.  XXX If
aborted, or the resolver is unable to resolve (note that this is
B<different> from the resolver is ready but the result is "not
found"), the promise is expected to be rejected with the exception
describing the failure.

There are following resolvers in this repository:
L<Web::Transport::PlatformResolver>,
L<Web::Transport::CachedResolver>.

=head1 CERTIFICATE MANAGER

A B<certificate manager> is an object from which TLS certifciates can
be retrieved.

It is expected to have following methods:

=over 4

=item $cm->prepare ($name => $value, ...)->then (sub { ... })

Run preparation steps for the certificate manager.  The concrete steps
are implementation specific.  The method must return a promise, which
is fulfilled or rejected when the certificate manager is ready to
return relevant certificates.

It receives zero or more options as name/value pair arguments.  The
C<server> option has a boolean value, representing server (true) or
client (false).  If the C<server> value is true but the certificate
manager is not ready to return the server certificates (including CA
certificates), the method must reject the promise.

=item {$name => $value, ...} = $cm->to_anyevent_tls_args_sync

Return a hash reference containing zero or more arguments to the
C<new> method of the L<AnyEvent::TLS> class.  It should only contain
certificate-related options.

If the certificate manager's C<prepare> method has been invoked with
C<server> option set to true and its promise has been fulfilled, the
hash reference must contain the server certificates and all other
relevant options.  Otherwise, if the certificate manager's C<prepare>
method has been invoked and its promise has been fulfilled, the hash
reference must contain all relevant options.

=item {$name => $value, ...} = $cm->to_anyevent_tls_args_for_host_sync ($host)

Same as C<to_anyevent_tls_args_sync> but returns argument for a
host-specific certificate.

The method is invoked with an argument of type L<Web::Host>,
representing the domain name given in the SNI extension's field
received from the client.

The method can return C<undef> if there is no host-specific
certificate (such that the default certificate returned by
C<to_anyevent_tls_args_sync> should be used instead).

=back

There is following module in this repository:
L<Web::Transport::DefaultCertificateManager>.

=head1 UNDERLYING PLATFORM INFORMATION OBJECT

An B<underlying platform information object> represents the underlying
platform on which the application is running, used to retrive
platform-dependent configurations.

It is expected to have following methods:

=over 4

=item $string = $info->user_agent

Return the default C<User-Agent> value.

=item $string = $info->accept_language

Return the appropriate value for the HTTP C<Accept-Language> header.

=back

There is following module in this repository:
L<Web::Transport::PlatformInfo>.

=head1 ENVIRONMENT VARIABLES

The C<WEBUA_DEBUG> and C<WEBSERVER_DEBUG> environment variables can be
used to enable the debug mode of client and server modules,
respectively.  If a true value is specified, debug output, such as
some of network input and output, are printed to the standard error
output.  If its value is greater than C<1>, more verbose messages are
printed.

=head1 DEPENDENCY

These module requires Perl 5.14 or later.  They require several core
modules such as L<Digest::SHA> and L<Math::BigInt>.

They require L<AnyEvent> and L<Net::SSLeay>.  The L<Net::SSLeay>
module requires OpenSSL or equivalent (e.g. LibreSSL).  For Web
compatibility and security, OpenSSL version must be latest enough.

They require following modules (which are submodules of this Git
repository):

=over 4

=item modules from perl-promise repository <https://github.com/wakaba/perl-promise>, e.g. L<Promise> and L<AbortController>

=item modules from perl-web-url repository <https://github.com/manakai/perl-web-url>, e.g. L<Web::URL>, L<Web::Host>, and L<Web::Origin>

=item modules from perl-web-encodings package <https://github.com/manakai/perl-web-encodings>, e.g. L<Web::Encoding>

=item modules from perl-web-datetime package <https://github.com/manakai/perl-web-datetime>, e.g. L<Web::DateTime::Parser>

=back

Additionally, modules L<Web::Transport::TCPStream>,
L<Web::Transport::UnixStream>, L<Web::Transport::TLSStream>,
L<Web::Transport::SOCKS4Stream>, L<Web::Transport::SOCKS5Stream>,
L<Web::Transport::H1CONNECTStream>, L<Web::Transport::BasicClient>,
L<Web::Transport::ProxyServerConnection>, and
L<Web::Transport::PSGIServerConnection> require following modules
(which are also part of submodule of this Git repository):

=over 4

=item modules from perl-streams package <https://github.com/manakai/perl-streams>, e.g. L<ArrayBuffer> and L<ReadableStream>

=back

=head1 AUTHOR

Wakaba <wakaba@suikawiki.org>.

=head1 ACKNOWLEDGEMENTS

Some of modules derived from various earlier effort on these areas.
See documentations of modules and comments in source codes for more
information.

=head1 LICENSE

Copyright 2009-2013 Hatena <https://www.hatena.ne.jp/>.

Copyright 2014-2022 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

The license of the C<tls-certs.pem> file is:

  This Source Code Form is subject to the terms of the Mozilla Public
  License, v. 2.0. If a copy of the MPL was not distributed with this
  file, You can obtain one at <http://mozilla.org/MPL/2.0/>.

=cut