Almighty Bus Error

HTTP Proxy: Concept introduction

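An HTTP proxy is a program that sits between the browser and web servers and forwards HTTP traffic between them. It acts as a server towards the browser and as a client towards the webserver, and it is seamless: under normal circumstances the user does not notice it exists.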

A problem for people creating a proxy for learning purposes is that most browsers nowadays use HTTP/1.1, which is far more complex to implement than HTTP/1.0. For this reason, our proxy will convert every HTTP/1.1 request to HTTP/1.0.

For a browser using HTTP/1.0 to fetch a webpage, it first needs to establish a connection with the webserver and send a “GET” request for the index page (or for another page, if one is specified). Once it has received and processed that page, the browser identifies every resource the page uses (be it images, scripts or Cascading Style Sheets). For each of these resources, the browser opens a new connection and sends a new request, and the webserver closes the connection after it finishes sending the resource.
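A minimal Python sketch of a single HTTP/1.0 transaction, assuming an already-connected TCP socket (the function name `fetch_http10` is hypothetical). Because the server closes the connection once the resource has been sent, the client can simply read until the stream ends. The `Host` header is not part of HTTP/1.0 itself, but many servers that host several sites on one address expect it, so the sketch sends it.

```python
import socket


def fetch_http10(sock: socket.socket, host: str, path: str = "/") -> bytes:
    """Send one HTTP/1.0 GET on an already-connected socket; return the raw response."""
    # Assumption: no request body and no extra headers beyond Host.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    sock.sendall(request.encode("ascii"))

    # HTTP/1.0: the server closes the connection after the resource is sent,
    # so recv() returning b"" marks the end of the response.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```

Against a real server, the socket could come from `socket.create_connection(("almightybuserror.com", 80))`; fetching each image, script or stylesheet would then mean one such connection per resource.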

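The proxy will work as follows: it accepts a request from the browser, converts it to HTTP/1.0, forwards it to the webserver, and relays the response back to the browser.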

A browser using HTTP/1.1, however, uses persistent connections and can use pipelining: after the server sends the requested webpage it does not close the connection, and it keeps accepting requests from the browser for other resources on the same server over that connection until one side closes it. Browsers usually make this explicit by sending the Connection header with the value keep-alive.
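On a persistent connection, the end of the stream no longer marks the end of a message, so whoever reads it has to find the message boundaries itself. The hypothetical helper below sketches this for pipelined requests, assuming none of them carries a body (each one then ends at the first blank line, `\r\n\r\n`); requests with a body would also need `Content-Length` or chunked-encoding handling. This extra bookkeeping is part of what the HTTP/1.0 conversion avoids.

```python
def split_pipelined(stream: bytes) -> tuple[list[bytes], bytes]:
    """Split back-to-back HTTP/1.1 requests read from one persistent connection.

    Returns (complete_requests, leftover_bytes). Assumption: no request has a
    body, so each one ends at the first blank line (CRLF CRLF).
    """
    requests = []
    while True:
        end = stream.find(b"\r\n\r\n")
        if end == -1:
            # Incomplete request: keep the bytes until more data arrives.
            return requests, stream
        requests.append(stream[: end + 4])
        stream = stream[end + 4 :]
```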

To convert a request to HTTP/1.0, the proxy has to read and modify its headers, removing some of the fields of the HTTP/1.1 request.

For our purposes, we will remove the “Connection” header and the optional “Keep-Alive” and “Proxy-Connection” headers, since they manage persistent connections, which our HTTP/1.0 proxy does not support, and we will change the protocol version in the request line to HTTP/1.0. Every line of an HTTP request or response header ends with the two special characters “\r\n” (carriage return, line feed).
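A minimal Python sketch of this conversion, assuming the proxy has already read the complete request headers as bytes and that the request has no body. `convert_request` and `DROPPED_HEADERS` are hypothetical names; `urllib.parse.urlsplit` is from the Python standard library. Besides dropping the three headers and changing the version, it replaces the absolute URL that a browser sends to a proxy with just the path (and query string), as in the example below.

```python
from urllib.parse import urlsplit

# The headers this post removes. HTTP header names are case-insensitive,
# so they are compared in lowercase.
DROPPED_HEADERS = {"connection", "keep-alive", "proxy-connection"}


def convert_request(raw: bytes) -> bytes:
    """Rewrite one HTTP/1.1 proxy request (headers only, no body) as HTTP/1.0."""
    # Assumption: raw holds the headers; anything after the blank line is ignored.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")

    method, target, _version = request_line.split(" ", 2)
    # "http://host/path?q" becomes "/path?q"; a bare "http://host" becomes "/".
    parts = urlsplit(target)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    kept = [f"{method} {path} HTTP/1.0"]
    for line in header_lines:
        name = line.split(":", 1)[0].strip().lower()
        if name not in DROPPED_HEADERS:
            kept.append(line)

    # Each line ends with \r\n, plus one extra \r\n to end the headers.
    return ("\r\n".join(kept) + "\r\n\r\n").encode("iso-8859-1")
```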

Example:

GET http://almightybuserror.com/ HTTP/1.1
Connection: keep-alive

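Because the browser sends the full URL to the proxy, the proxy uses the host part (almightybuserror.com) to decide which server to connect to and keeps only the path (/) in the request line. The converted request sent to that server is: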

GET / HTTP/1.0
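The host part of the URL tells the proxy where to connect. A small sketch using the standard library's `urllib.parse.urlsplit`, assuming plain `http://` URLs and port 80 when none is given (`target_address` is a hypothetical name):

```python
from urllib.parse import urlsplit


def target_address(url: str) -> tuple[str, int]:
    """Return the (host, port) the proxy should connect to for an absolute URL."""
    parts = urlsplit(url)
    # Assumption: only plain HTTP is handled; HTTPS (CONNECT) is out of scope.
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not an absolute http URL: {url!r}")
    return parts.hostname, parts.port or 80
```

The returned pair can be passed directly to `socket.create_connection`.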

Note: the request must end with an extra “\r\n” (an empty line), since that is how the server knows the headers have ended; for a GET request with no body, this is also the end of the request.
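On the proxy side, the same blank line tells it when to stop reading the browser's request. A hedged sketch, assuming a GET-style request with no body; `read_request_head` is hypothetical, and the 64 KiB limit is an arbitrary guard against clients that never finish their headers. Any bytes after the blank line (for example, a pipelined second request) are returned separately rather than silently discarded.

```python
import socket


def read_request_head(sock: socket.socket, limit: int = 65536) -> tuple[bytes, bytes]:
    """Read until the blank line that ends the request headers.

    Returns (headers_including_blank_line, bytes_received_after_it).
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        if len(buf) > limit:
            # Assumption: arbitrary cap so a client cannot make us buffer forever.
            raise ValueError("request headers too large")
        data = sock.recv(4096)
        if not data:
            raise ConnectionError("connection closed before the headers ended")
        buf += data
    end = buf.index(b"\r\n\r\n") + 4
    return buf[:end], buf[end:]
```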

Thanks for reading. Comments are welcome.