Guide to HTTP

HTTP is a request/response protocol used for transmitting hypermedia documents such as HTML. An exchange starts when a client sends a request to a server over a connection. The request consists of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possibly body content.

The request method, URI, and protocol version make up the request line. It is followed by the request headers, which define the parameters of the HTTP transaction. Together, the request line and the headers form the request message header. Last come a blank line and the optional request message body.

image of HTTP request
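
As a rough illustration of that layout, the sketch below assembles a request message by hand. The helper name build_request, the host, the path, and the header values are all assumptions made up for this example; a real client library would normally do this for you.

```python
# Build a raw HTTP/1.1 request message by hand to show its parts.
# build_request and all header values are hypothetical, for illustration only.

def build_request(method, target, headers, body=b""):
    """Return the bytes of an HTTP/1.1 request message."""
    # Request line: method, URI (request target), protocol version.
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line separates the header from the body.
    head = "\r\n".join([request_line, *header_lines]) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request(
    "GET",
    "/index.html",
    {
        "Host": "www.example.com",
        "User-Agent": "example-client/0.1",
        "Accept": "text/html",
    },
)
print(raw.decode("ascii"))
```

The GET request here has no body, so the message ends right after the blank line.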

The server responds with a status line that includes the message's protocol version and a response code indicating success or error. It is followed by a MIME-like message containing server information and entity meta-information (the response headers) and possibly entity-body content (the response message body).

image of HTTP response
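
To make the response layout concrete, here is a minimal sketch that splits a response into status line, headers, and body. The response bytes are a made-up example rather than captured traffic, and parse_response is a hypothetical helper that ignores details such as repeated headers and chunked bodies.

```python
# Split a raw HTTP response into status line, headers, and body.
# The response below is invented for illustration.

def parse_response(raw):
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, numeric status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)
print(headers["content-type"], len(body))
```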

HTTP is also used as a generic protocol for communication between user agents (browsers, web crawlers) and proxies/gateways to other Internet systems, including those supported by the SMTP, FTP, Gopher, and WAIS protocols. In this way, HTTP allows basic hypermedia access to resources available from diverse applications.

TCP

Hypertext Transfer Protocol (HTTP) is an application-layer protocol. We covered application-layer protocols in the OSI model blog post. HTTP was designed to run over TCP at the transport layer, but it can be used with other transports such as UDP.

An HTTP client begins by opening a TCP session to port 80 of the server, the default port for HTTP. The server's TCP stack keeps a Transmission Control Block (TCB) for each distinct connection. The session is established with a 3-way handshake:

  1. The client sends a SYN packet, requesting that the server establish a session.
  2. The server responds with a SYN-ACK, acknowledging the client's request and stating that it wishes to synchronize with the client.
  3. Finally, the client responds with an ACK to acknowledge the server's synchronize request.

image of 3 way handshake for TCP connection
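
Application code never sends SYN or ACK packets itself; the operating system's TCP stack performs the handshake when a program connects. The sketch below shows where that happens from a program's point of view. It uses a throwaway listener on the loopback interface (an assumption made so the example runs without network access) instead of a real web server on port 80.

```python
import socket

# Listen on an ephemeral loopback port so the example needs no network access.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# create_connection() returns once the kernel has completed the
# SYN, SYN-ACK, ACK exchange with the listening socket.
client = socket.create_connection((host, port), timeout=5)
client_addr = client.getsockname()

# accept() hands the already established connection to the server program.
conn, peer = server.accept()

# Data can flow only after the handshake has finished.
client.sendall(b"ping")
data = conn.recv(4)
print(peer, data)

# Closing the sockets starts the TCP teardown.
for sock in (conn, client, server):
    sock.close()
```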

Data transmission then begins over the established session, and TCP closes the session once the transfer is complete.

Sessions

HTTP is a stateless protocol that runs on top of TCP, which is stateful, and TCP in turn relies on IP, which is also stateless. A stateless protocol treats each request as an independent transaction, unrelated to any previous request. This statelessness is one of the constraints of the REST architectural style, whereas TCP maintains state such as sequence numbers and the window size for data transmission. Because HTTP itself is stateless, a web client can hold information such as a session ID or authentication token in cookies and send it with each request, which keeps a session going while the protocol remains stateless.
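
A small sketch of how a cookie can carry a session across otherwise independent requests. It uses Python's standard http.cookies module; the cookie name session_id, the ID value, and the in-memory sessions store are assumptions invented for this example.

```python
from http.cookies import SimpleCookie

# Server side: session state lives in a store keyed by session ID (hypothetical data).
sessions = {"abc123": {"user": "alice"}}

# After login, the server's response would carry a Set-Cookie header like this one.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["httponly"] = True
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Client side: the browser stores the cookie and sends it back on every later
# request. The request stays self-contained because it carries the ID with it.
cookie_header = "session_id=abc123"

# Server side again: look up the session from the Cookie header alone.
incoming = SimpleCookie()
incoming.load(cookie_header)
session = sessions.get(incoming["session_id"].value)
print(session)
```

The protocol stays stateless because the server learns who is calling only from what each request carries, not from any memory of the connection.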

Encryption

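HTTPS is the protocol for secure communication over the network. It is HTTP encrypted with Transport Layer Security (TLS), the successor to the now-deprecated Secure Sockets Layer (SSL). TLS runs on top of TCP and beneath HTTP, so it does not map cleanly onto a single OSI layer; it is often placed at the presentation layer or described as part of the application layer.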
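
The layering can be sketched with Python's standard ssl module: a TCP socket is opened first, then wrapped so that the HTTP bytes travel encrypted. The function name fetch_head_over_tls and the example host are assumptions; the call is left commented out because it needs network access.

```python
import socket
import ssl

# The default context verifies the server's certificate and hostname.
context = ssl.create_default_context()


def fetch_head_over_tls(host, port=443):
    """Send a HEAD request over TLS; return the TLS version and the status line."""
    # Step 1: an ordinary TCP connection (443 is the default HTTPS port).
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # Step 2: the TLS handshake runs on top of the TCP connection.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Step 3: plain HTTP, now carried inside the encrypted channel.
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            reply = tls_sock.recv(4096)
            return tls_sock.version(), reply.split(b"\r\n", 1)[0]


# Example call (requires network access):
# print(fetch_head_over_tls("www.example.com"))
```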

Request methods

HTTP defines methods to indicate the desired action to be performed on the identified resource. These methods are as follows:

Method    Function
GET       Requests a representation of the specified resource.
HEAD      Retrieves meta-information from the response headers. Similar to GET, but the server responds with no body.
POST      Requests that the server accept the entity enclosed in the request as a new subordinate of the resource identified by the URI.
PUT       Requests that the enclosed entity be stored under the supplied URI.
DELETE    Deletes the specified resource.
TRACE     Echoes the received request, so the client can see whether intermediate servers made any changes.
OPTIONS   Returns the HTTP methods supported for the specified URL. Used to check functionality.
CONNECT   Converts the connection to a TCP/IP tunnel. Used for encrypted communication through an unencrypted proxy.
PATCH     Applies partial modifications to a resource.
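
To see a few of these methods in action without touching the network, the sketch below starts a throwaway server on the loopback interface using Python's standard http.server module and sends it GET, HEAD, and DELETE requests. DemoHandler and PAGE are names made up for this example; the handler implements only GET and HEAD, so DELETE is rejected.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>hello</h1>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        # Same headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

results = {}
for method in ("GET", "HEAD", "DELETE"):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request(method, "/")
    response = conn.getresponse()
    results[method] = (response.status, response.read())
    conn.close()

server.shutdown()
server.server_close()

for method, (status, body) in results.items():
    print(method, status, len(body), "body bytes")
```

GET and HEAD return the same status and headers, but only GET carries a body. The base handler class answers methods it has no do_ function for with 501 Not Implemented.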

URI

A URL is a type of URI. HTTP uses a URL such as http://www.example.com/ to address its requests. A URL starts with a scheme that names the access mechanism: http in that example, or ftp in ftp://example.com.

A URN is also a type of URI. It is simpler in that it only names a resource, for example urn:isbn:0451450523, without saying where the resource is or how to retrieve it. As you can see, no application-layer protocol is associated with it.

Now that we have an overview of the different types of URIs, we can say that a URI is a string of characters used to identify a resource.
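
The difference shows up when the strings are taken apart. This short sketch uses urlsplit from Python's standard urllib.parse module on the example URIs from this section.

```python
from urllib.parse import urlsplit

# The two URLs name an access mechanism and a host; the URN only names a resource.
for uri in ("http://www.example.com/", "ftp://example.com", "urn:isbn:0451450523"):
    parts = urlsplit(uri)
    print(f"{uri}: scheme={parts.scheme!r} host={parts.netloc!r} path={parts.path!r}")
```

For the URN, the host part comes back empty: there is no location to connect to, only a name.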

Protocol Versions

HTTP went through different versions starting from 1991, when HTTP/0.9 was first documented.

HTTP/0.9

A simple protocol for raw data transfer. The connection is closed after a single request/response pair.

HTTP/1.0

Improved the protocol by allowing messages to be in a MIME-like format, containing meta-information about the data transferred and modifiers on the request/response semantics. The connection is still closed after a single request/response pair.

HTTP/1.1

Can reuse a connection to download multiple resources, such as images, scripts, and stylesheets, after the page has been delivered. Persistent (keep-alive) connections became the default, allowing a connection to be used for more than one request. This greatly reduces latency, because the client does not have to repeat the 3-way handshake for every request.
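
Connection reuse can be observed with a small local experiment. The sketch below is a minimal setup assuming Python's standard http.server and http.client modules; KeepAliveHandler and the counter connections_opened are names invented for the example. The server counts accepted TCP connections while a single client fetches three resources.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

connections_opened = 0


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        global connections_opened
        connections_opened += 1
        super().setup()

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # which is what lets the connection be reused afterwards.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One client connection, three request/response pairs.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
bodies = []
for path in ("/index.html", "/style.css", "/app.js"):
    conn.request("GET", path)
    bodies.append(conn.getresponse().read())
conn.close()

server.shutdown()
server.server_close()
print(len(bodies), "responses over", connections_opened, "TCP connection(s)")
```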

HTTP/2

The goal of HTTP/2 is to improve performance by reducing latency and the number of TCP connections. It multiplexes multiple requests and responses over a single TCP connection, compresses headers to avoid sending the same headers repeatedly, uses server push to deliver resources a page depends on before the client asks for them, and uses resource prioritization to indicate what the user will need first.

MIME

MIME (Multipurpose Internet Mail Extensions) was designed for SMTP email. It became important for HTTP as well, because it lets browsers display or otherwise handle files that are not in HTML format.
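
In HTTP, the MIME type of a body is carried in the Content-Type header. As a small illustration, Python's standard mimetypes module guesses the type from a file extension, much as a simple static file server might; the filenames are arbitrary examples.

```python
import mimetypes

# Guess the MIME type that a server might put in the Content-Type header.
for filename in ("index.html", "logo.png", "report.pdf", "style.css"):
    content_type, _encoding = mimetypes.guess_type(filename)
    print(f"{filename}: Content-Type: {content_type}")
```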

Response Codes

The status line of an HTTP response includes a numeric status code and a textual reason phrase. Below is a small sample; servers can also return custom codes, including proprietary codes from vendors.

Code  Status                 Description
100   Continue               For requests with a large body: the server has received the headers and is ready for the request body.
200   OK                     Standard response for successful HTTP requests.
201   Created                The request has been fulfilled and a new resource has been created.
202   Accepted               The request has been accepted for processing, but processing has not been completed.
301   Moved Permanently      This and all future requests should be directed to the given URI.
400   Bad Request            The server cannot process the request due to a client error.
401   Unauthorized           Authentication is required and has failed or has not been provided.
403   Forbidden              The user does not have the proper permission for the request.
404   Not Found              The requested resource could not be found.
500   Internal Server Error  Generic error; no more specific information is available.
502   Bad Gateway            The server was acting as a proxy/gateway and received an invalid response from the upstream server.
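
The first digit of a status code gives its class: 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error. The sketch below looks up reason phrases with Python's standard http.HTTPStatus enum; describe is a hypothetical helper, and 799 stands in for a custom code that the library does not know.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase [category]' for codes known to HTTPStatus."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Custom or proprietary codes are not part of the enum.
        return f"{code} (not a status code known to this library)"
    return f"{status.value} {status.phrase} [{CATEGORIES[code // 100]}]"


for code in (100, 200, 301, 404, 502, 799):
    print(describe(code))
```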