Originally, the web was mostly just a system for sending and receiving HTTP requests. A browser would ask to be sent a page with an HTTP request and the server would send the page to the browser.
The page itself might include things like references to images that the browser would ask for with more HTTP requests. It was all very simple. But it didn’t allow for processing to be done by the server. As a result, the Common Gateway Interface (CGI) was developed.
With CGI, a browser could send a request with inputs to the server, and a CGI program would send back a web page processed based on the sent inputs. Take an early example: a CGI program that returned information about chemical compounds.
The browser would send a request to the CGI program with the compound the user wanted data about, and the program would send back a page filled with information about that compound.
Eventually, dedicated server-side options such as PHP and Python web frameworks became common, but in those days, CGI was all there was.
Still, CGI had a unique ability: it was language independent. If the server could run the program, CGI could handle it. So it could be a compiled C++ program or an interpreted Perl script or just about anything else.
Today, CGI programs are mostly legacy. But there are times when CGI is still the best way to solve a problem. Let’s take a closer look at the environment variables that are the backbone of the system.
If you are considering CGI programming, the following variables are essential for handling requests and processing form data.
To access these variables, you look them up by name in the process environment. In Perl, for example, the environment is exposed through the %ENV hash, and a single value is read with an expression of the form $ENV{'env_var'}.
Here, env_var stands for an environment variable key, which is a string such as SERVER_NAME. Other programming languages have their own systems for managing environment variables. Check the reference for your particular language.
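As a point of comparison, here is a minimal sketch of the same lookup in Python, where the environment is available through the standard `os.environ` mapping. The helper names `cgi_var` and `render_report` are hypothetical and chosen only for illustration.

```python
import os

def cgi_var(name, default="", environ=os.environ):
    """Return the CGI variable `name`, or `default` if the server did not set it."""
    return environ.get(name, default)

def render_report(environ=os.environ):
    """Build a plain-text CGI response listing a few variables."""
    # A CGI response starts with headers, then a blank line, then the body.
    lines = ["Content-Type: text/plain", ""]
    for key in ("SERVER_NAME", "REQUEST_METHOD", "QUERY_STRING"):
        lines.append(f"{key}={cgi_var(key, '(not set)', environ)}")
    return "\n".join(lines)
```

A script would typically end with `print(render_report())`. Passing the mapping as a parameter is a design choice that lets the functions be exercised with a plain dictionary outside a web server.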
Here are the CGI variables and what they do:
Some web servers protect access to CGI scripts with authentication. The AUTH_TYPE variable names the authentication scheme that the server used to verify the user.
For example, a possible value for this variable is Basic, referring to HTTP Basic authentication. Note that the variable is set only when the request was authenticated, and not every server is configured to require authentication.
CONTENT_LENGTH gives the length, in bytes, of the body sent with the request (for example, the form data of a POST). If the request has no body, the variable is normally left unset or empty, so a script should not assume it always holds a number.
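A CGI program receives the request body on standard input, and CONTENT_LENGTH tells it how many bytes to read. The sketch below shows one cautious way to do that in Python. The function name `read_body` and its `stream` parameter are hypothetical, and treating a missing, non-numeric, or non-positive length as "no body" is an assumption made here for safety.

```python
import io
import os
import sys

def read_body(environ=os.environ, stream=None):
    """Read exactly CONTENT_LENGTH bytes of request body, or b'' if there is none."""
    raw = environ.get("CONTENT_LENGTH", "")
    try:
        length = int(raw)
    except ValueError:
        length = 0  # unset, empty, or malformed: assume there is no body
    if length <= 0:
        return b""
    if stream is None:
        stream = sys.stdin.buffer  # the server supplies the body on stdin
    return stream.read(length)

# Usage with an in-memory stream standing in for stdin:
sample = read_body({"CONTENT_LENGTH": "5"}, io.BytesIO(b"hello world"))
```

Reading only the declared number of bytes matters because the server is not required to close standard input after the body.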
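The CONTENT_TYPE variable contains the MIME type of the data sent with the request, not the type of the page the script returns.
For example, when a browser submits an HTML form with POST, CONTENT_TYPE is typically set to application/x-www-form-urlencoded. If the request has no body, the variable is usually not set.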
If you want to know what version of the CGI specification the server implements, you can query the GATEWAY_INTERFACE variable. A typical value is CGI/1.1. Checking it helps ensure that you rely only on features defined in that version of the specification.
Just as CONTENT_TYPE gives the MIME type of the data that was sent, the HTTP_ACCEPT variable lists the MIME types that the client making the request is willing to accept in the response. The types in the list are separated by commas.
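Because the list is comma-separated, it can be split into individual types with a few lines of code. The following Python sketch does so and drops any parameters after a semicolon (such as `q=0.9` quality values that browsers commonly attach). The function name `accepted_types` is hypothetical, and the sketch ignores preference ordering.

```python
def accepted_types(accept_header):
    """Split an HTTP_ACCEPT value into a list of bare MIME types."""
    types = []
    for item in accept_header.split(","):
        # Keep only the part before any ';' parameter, e.g. 'q=0.9'.
        media_type = item.split(";", 1)[0].strip()
        if media_type:
            types.append(media_type)
    return types

# Example with a typical browser-style value:
example = accepted_types("text/html, application/xml;q=0.9, */*;q=0.8")
```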
The HTTP_USER_AGENT variable identifies the program that the client used to send the request.
For example, if a user runs a CGI script from Mozilla Firefox, HTTP_USER_AGENT will contain a string identifying Firefox as the browser that made the request.
The PATH_INFO variable contains any additional path information that appears in the URL after the CGI script name.
For example, if you request www.placeholder.com/cgi-bin/hello.pl/index.html, then PATH_INFO holds the characters that come after the CGI script name, which is /index.html in this example.
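Scripts often use this extra path as a simple routing key. A minimal Python sketch, with the hypothetical helper name `path_segments`, splits the value into its non-empty parts:

```python
def path_segments(path_info):
    """Split a PATH_INFO value such as '/a/b' into ['a', 'b']."""
    return [segment for segment in path_info.split("/") if segment]

# For the example URL above, PATH_INFO is '/index.html':
segments = path_segments("/index.html")
```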
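When you type the address of a CGI script into a web browser, you are typing a virtual path, which the server maps to a physical location on disk. The PATH_TRANSLATED variable applies that same mapping to the extra path held in PATH_INFO.
For example, if you go to http://www.somewebsite.com/cgi-bin/index.cgi/docs/page.html, PATH_INFO is /docs/page.html, and PATH_TRANSLATED is the physical path the server maps it to. If you are on a shared Unix server, that might be /home/placeholder/public_html/docs/page.html. When the URL has no extra path after the script name, PATH_TRANSLATED is normally not set.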
It is common to see query information appended to a URL after a question mark. For the URL http://www.placeholder.com/cgi-bin/hello.cgi?name=Leroy&exclamation=true, reading QUERY_STRING returns name=Leroy&exclamation=true.
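The raw string still has to be decoded into names and values. In Python, the standard library function `urllib.parse.parse_qs` handles this, including percent-decoding. The wrapper name `query_params` below is hypothetical, and keeping blank values is a choice made for this sketch.

```python
from urllib.parse import parse_qs

def query_params(query_string):
    """Decode a QUERY_STRING into a dict mapping each name to a list of values."""
    # Each value is a list because a name may appear more than once.
    return parse_qs(query_string, keep_blank_values=True)

# Using the query string from the example URL:
params = query_params("name=Leroy&exclamation=true")
```

Here `params["name"]` is the list `["Leroy"]` and `params["exclamation"]` is `["true"]`.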
The REMOTE_ADDR variable gives the IP address of the client computer making the request. It is the numeric counterpart of REMOTE_HOST: the server always knows the address, whereas the hostname has to be looked up from it.
Web servers constantly accept connections and requests from clients. The REMOTE_HOST variable holds the hostname of the client that made the request.
For example, if your web host accepts a request from webhost2.com, then REMOTE_HOST would be populated with webhost2.com. Many servers skip the hostname lookup for performance reasons, in which case REMOTE_HOST may be unset or may simply repeat the IP address.
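Since a server may not always resolve the client's hostname, a script that logs the client usually falls back to the IP address. A minimal Python sketch, assuming the hostname may be missing or empty and using the hypothetical name `client_label`:

```python
def client_label(environ):
    """Prefer the client's hostname; fall back to its IP address."""
    # REMOTE_HOST may be absent or empty if the server skipped the lookup.
    return environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "unknown")
```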
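The REMOTE_IDENT variable stores the identity of the remote user as reported by the ident protocol. It is set only if the server performs ident lookups and an ident service is running on the client's machine. An ident response contains not only the user ID but also the name of the operating system on that machine. Because the value comes from the client side, it should not be trusted for authentication.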
Querying the REMOTE_USER variable gives the username of the entity making the request. It is set only if the script is protected by authentication and the user has been authenticated.
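A script that personalizes its output can check both authentication variables before trusting the username. The Python sketch below uses the hypothetical name `authenticated_user` and assumes that both AUTH_TYPE and REMOTE_USER are non-empty whenever the server has authenticated the request.

```python
def authenticated_user(environ):
    """Return the authenticated username, or None for anonymous requests."""
    if environ.get("AUTH_TYPE") and environ.get("REMOTE_USER"):
        return environ["REMOTE_USER"]
    return None
```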
The REQUEST_METHOD variable gives the HTTP method used for the request, with values such as GET, POST, and PUT.
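A common pattern is to branch on the method, for instance showing a form on GET and processing it on POST. The following Python sketch illustrates the idea. The function name `handle_request`, the response texts, and the choice to answer other methods with 405 are assumptions for illustration only.

```python
import os

def handle_request(environ=os.environ):
    """Return a (status, message) pair chosen by the request method."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "GET":
        return "200 OK", "showing the form"
    if method == "POST":
        return "200 OK", "processing the submission"
    # Any other method is rejected in this sketch.
    return "405 Method Not Allowed", f"{method} is not supported"
```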
To get the virtual path of the script being executed, you can query the SCRIPT_NAME variable.
For example, if you run the script http://www.placeholder.com/cgi-bin/ping.sh and retrieve SCRIPT_NAME, you will get the virtual path of the script, which is /cgi-bin/ping.sh.
The SERVER_NAME variable gives the hostname of your server.
For example, if you query this variable, the result will be the website’s domain name, something like www.placeholder.com.
Any server running on the web has both an address and a port. The server uses the port to accept connections and listen for requests. The standard port for HTTP is 80 (443 for HTTPS), but it can be another number, particularly for specialized applications. Querying the SERVER_PORT variable returns the port on which the request was received.
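SERVER_NAME, SERVER_PORT, and SCRIPT_NAME together are enough to rebuild the URL of the running script, which is useful for form actions and redirects that point back at the script. The Python sketch below uses the hypothetical name `script_url`. The `scheme` parameter is an assumption, since none of the variables described here states whether the request used HTTP or HTTPS.

```python
def script_url(environ, scheme="http"):
    """Rebuild the script's own URL from the server variables."""
    host = environ.get("SERVER_NAME", "localhost")
    port = environ.get("SERVER_PORT", "80")
    # Omit the port when it is the default for the scheme.
    default_port = "443" if scheme == "https" else "80"
    netloc = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{netloc}{environ.get('SCRIPT_NAME', '')}"

url = script_url({
    "SERVER_NAME": "www.placeholder.com",
    "SERVER_PORT": "80",
    "SCRIPT_NAME": "/cgi-bin/ping.sh",
})
```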
You can find out what protocol the server is using to handle the request by querying the SERVER_PROTOCOL variable.
For example, if the server you are working with uses the HTTP protocol, the variable will hold a string like “HTTP/1.1”, which means that the server is using HTTP version 1.1. The string is always in the format protocol/version.
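Because the value follows the protocol/version format, it can be split at the slash. A small Python sketch, with the hypothetical name `parse_protocol`:

```python
def parse_protocol(server_protocol):
    """Split a SERVER_PROTOCOL value such as 'HTTP/1.1' into (name, version)."""
    name, _, version = server_protocol.partition("/")
    return name, version

protocol = parse_protocol("HTTP/1.1")
```

If the value contains no slash, `version` is simply an empty string.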
The SERVER_SOFTWARE environment variable contains the name and version of the software running on the web server.
For example, if you output the value of this variable and you are running a version of Apache, you may get a string in the form name/version that begins with Apache/ followed by the version number.
One of the first steps you can take to understand CGI or the HTTP protocol is to familiarize yourself with the underlying variables and syntax. This includes the environment variables just outlined.
Though CGI is rarely used today, many current web development languages expose the same variables. PHP, for example, makes them available through its $_SERVER array. As a result, learning them will also help you write robust programs in current web development languages.