Blog Home  Home Feed your aggregator (RSS 2.0)  
kevin Mocha - Summary of HTTP Developer’s Handbook (part1)
Bookmarks collected from web.
 
 Friday, March 05, 2010

 

image image image

Server: new a socket –> bind to listen port –> Accept a connection –> send/receive –>close
Client: new a socket –> connect –>send/receive –> close

http://myname:mypass@httphandbook.org:80/mydir/myfile.html?myvar=myvalue#myfrag

image

HTTP is often referred to as a stateless protocol. Although this is accurate, it does little to explain the nature of the Web. All this means, however, is that each transaction is atomic, and there is nothing required by HTTP that associates one request with another. A transaction refers to a single HTTP request and the corresponding HTTP response.

When I speak of a connection in HTTP, I refer to a TCP connection.

A single connection can support multiple HTTP transactions. In many cases, multiple HTTP transactions are required to properly render a URL in a Web browser due to images and other associated content.

image

Get and Post

GET and POST basically allow information to be sent back to the webserver from a browser (or other HTTP client for that matter).

Imagine that you have a form on a HTML page and clicking the "submit" button sends the data in the form back to the server, as "name=value" pairs.

Choosing GET as the "method" will append all of the data to the URL and it will show up in the URL bar of your browser. The amount of information you can send back using a GET is restricted as URLs can only be 1024 characters.

A POST on the other hand will (typically) send the information through a socket back to the webserver and it won't show up in the URL bar. You can send much more information to the server this way - and it's not restricted to textual data either. It is possible to send files and even binary data such as serialized Java objects!

A PUT allows you to "put" (upload) a resource (file) on to a webserver so that it be found under a specified URI. DELETE allows you to delete a resource (file). These are both additions to HTTP/1.1 and are not usually used. HEAD returns just the HTTP headers for a resource. TRACE and OPTIONS are also HTTP/1.1 additions and also rarely used.

 

Although client-side data validation can add to user convenience by avoiding unnecessary HTTP transactions, you should never depend on this technique to ensure the data is valid.

GET /search?hl=en&q=HTTP&btnG=Google+Search HTTP/1.1 
Host: www.google.com 
User-Agent: Mozilla/5.0 Galeon/1.2.0 (X11; Linux i686; U;) Gecko/20020326 
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9, 
        text/plain;q=0.8, video/x-mng,image/png,image/jpeg,image/gif;q=0.2, 
        text/css,*/*;q=0.1 
Accept-Language: en 
Accept-Encoding: gzip, deflate, compress;q=0.9 
Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66 
Keep-Alive: 300 
Connection: keep-alive 

Broken down, the request line is

 

An HTTP request, which is the message sent from a Web client to a Web server, is comprised of three basic elements:

  • Request line

  • HTTP headers

  • Content

The first line of an HTTP request is always the request line. The request line specifies the request method, the location of the resource, and the version of HTTP being used. These three elements are delimited by spaces. For example:

GET / HTTP/1.1 

This example specifies the GET request method, the resource located at / (document root), and HTTP/1.1 as the version of protocol used.

 

The second section of an HTTP request is the headers. HTTP headers include supporting information that can help to explain the Web client's request more clearly. There are three types of HTTP headers that can appear in a request:

  • General headers

  • Request headers

  • Entity headers

There is no requirement pertaining to the order of the headers. Also, because entity headers specify information about the content, they are rarely present in HTTP requests.

 

In general, it is fairly easy to discern which category a header belongs to. Request headers specifically relate to something unique to an HTTP request, such as the User-Agent header which identifies the client software being used. General headers are common headers that can (at least theoretically) be used in either an HTTP request or an HTTP response. Entity headers relay information about the content itself (the entity). As this request has no content, it also lacks entity headers.

 

There are eight request methods in HTTP/1.1: GET, POST, PUT, DELETE, HEAD, TRACE, OPTIONS, and CONNECT. HTTP/1.0 specifies three methods (GET, HEAD, and POST), although four others are implemented by some servers and clients claiming to be HTTP/1.0. The support for these four other methods (PUT, DELETE, LINK, and UNLINK) is inconsistent and mostly undefined, although they are each briefly referenced in Appendix D of RFC 1945, the HTTP/1.0 specification.

 

A GET request is basically a request to receive the content located at a specific URL. Obtaining a URL using the GET method allows users to bookmark the URL, create a link to the URL, email the URL to a friend, and the like. There is a limited amount of data that can be sent from the Web client using get, and this limit is very inconsistently implemented.

 

The POST method is commonly supported by browsers as a method of submitting form data.
As with the query string of a URL, the data in a POST consists of name/value pairs separated by the & character. Special characters are URL encoded, and the Content-Type header references this fact.

 

For many forms, the POST method is preferable.

 

The PUT method is not nearly as common as GET or POST. However, it is useful in certain situations because it allows the Web client to send content that will be stored on the Web server.
It should be noted that the PUT method is very rarely implemented in Web clients. A common misconception is that the PUT method is required for uploading files. However, this capability is actually an enhancement to the POST method as identified in RFC 1867, "Form-based File Upload in HTML".

 

HEAD is a very useful request method for people who are interested in finding out more information about the way a certain transaction behaves. The HEAD method is supposed to behave exactly like GET, except that the content is not present. Thus, HEAD is like a normal GET request with all of the HTML stripped away.

 

TRACE is another diagnostic request method. This method allows the client to gain more perspective into any intermediary proxies that lie between the client and the server. As each proxy forwards the TRACE request on route to the destination Web server, it will add itself to the Via header, with the first proxy being responsible for adding the Via header. When the response is given, the content is actually the final request including the Via header.

 

Sometimes it is helpful to simply identify the capabilities of the Web server you want to interact with prior to actually making a request. For this purpose, HTTP provides the OPTIONS request method.

 

The CONNECT request method is reserved explicitly for use by intermediary servers to create a tunnel to the destination server. The intermediary, not the HTTP client, issues the CONNECT request to the destination server.
The most common use of the CONNECT method is by a Web client that must use a proxy to request a secure resource using SSL (Secure Sockets Layer) or TLS (Transport Layer Security). The client will tunnel the request through the proxy so that the proxy will simply route the HTTP messages to and from the Web server without trying to examine or interpret them.

 

Accept Header

Authorization Header
Once the browser has successfully authenticated with a Web server in this way, it will appear to a user as if all further requests do not require reauthentication. However, due to the stateless nature of the Web, every request must include the Authorization header, otherwise the server will respond with a 401 Unauthorized response. The convenient behavior of most modern Web browsers involves the browser storing the access credentials and sending the Authorization header with all HTTP requests for a URL within a domain previously discovered to be protected. Because this utilizes the browser's memory, this convenience lasts as long as the browser (at least one instance of the browser) remains active, and the user will be unaware that this authentication takes place in subsequent requests. This can be a very important factor when debugging HTTP authentication, because if you receive a 401 Unauthorized response without being prompted for a username and password, this suggests that the browser is using incorrect credentials in the Authorization header. Restarting the browser will resolve this situation.

Friday, March 05, 2010 9:58:27 PM UTC  #    Comments [0]    |  Trackback
Copyright © 2010 Kevin Mocha. All rights reserved.
DasBlog 'Portal' theme by Johnny Hughes.
Pick a theme: