Mod_rewrite Introduction and Cheat Sheet
mod_rewrite is part of the Apache web server software that runs on your web host's computer. It allows a URL to be dynamically changed, or "rewritten."
You'll see mod_rewrite in action if you use WordPress with "pretty" permalinks switched on. By default, all WordPress pages are loaded using a URL that contains an ID number, and possibly other variables that tell WordPress what to load. But when pretty permalinks are enabled, mod_rewrite rules change those ugly variables into words. We sometimes call the resulting URLs "clean."
Rewriting with mod_rewrite is essentially a process of translating between a clean URL and the underlying dirty one on the fly. The visitor never sees the URL change in the address bar, since the processing is handled by Apache before anything is sent to the visitor's browser.
It's perfectly possible to run a website without using mod_rewrite, or clean URLs. But here's why most people use it:
- To make URLs easier for humans to read. "Pretty" URLs are more memorable, easier to type, and easier to read aloud.
- To make URLs easier for search engines to interpret. A pretty URL tells the search engine what the page is about, but a jumble of ID numbers and variables doesn't. Search engines love semantics, so rewriting the URL helps to ensure that the page is categorized and indexed correctly.
- To create temporary or permanent redirects from one file or path to another.
Composing Mod_rewrite Rules
If you use Apache, you should ask your web host to enable mod_rewrite in the server's httpd.conf file. In many cases, mod_rewrite is enabled by default.
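If you manage the server yourself, enabling the module might look roughly like the sketch below. The module path and the /var/www/html directory are assumptions that vary between installations, and on shared hosting you usually cannot edit this file at all.

```apache
# httpd.conf (main server configuration, not .htaccess)
# Load the rewrite module; the .so path differs between installations.
LoadModule rewrite_module modules/mod_rewrite.so

# Allow .htaccess files under the (assumed) document root to use
# mod_rewrite directives, which belong to the FileInfo override class.
<Directory "/var/www/html">
    AllowOverride FileInfo
</Directory>
```

Apache needs to be restarted or reloaded before a change to httpd.conf takes effect.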
Next, you'll need to create an .htaccess file in the root directory of your hosting account. The .htaccess file controls the root directory and all of the files underneath it, unless you override it with another .htaccess file lower down the folder tree. If you already have an .htaccess file, make a backup copy before proceeding.
Rewrite rules have two parts. The first part switches on the rewrite engine in Apache. (Lines beginning with # are comments, which are ignored.) The second part tells the rewrite engine how the URL should be transformed. So we state the condition, and then we tell Apache what to do. Here's an example of a rule that rewrites a request for one file to another.

    # Switch on Rewriting
    RewriteEngine on
    # Transform oldname.html into newname.html
    # (the dot is escaped so that it matches a literal dot only)
    RewriteRule ^oldname\.html$ newname.html
This rule checks whether the requested path is oldname.html (the condition). If it finds a match, Apache serves newname.html instead. The end user sees the content of newname.html, while oldname.html stays in their browser URL bar. If the request didn't match oldname.html, the rule would be ignored.
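For comparison, here is a minimal sketch of an external redirect, where the browser is told to request the new address and the URL bar does change. It reuses the file names from the example above. The status codes are an assumption about what you want to signal.

```apache
RewriteEngine on

# Temporary redirect: the browser is sent to /newname.html
# and the address bar updates to the new URL.
RewriteRule ^oldname\.html$ /newname.html [R=302,L]

# For a permanent move, use 301 instead (browsers may cache it):
# RewriteRule ^oldname\.html$ /newname.html [R=301,L]
```

It is usually safer to test with 302 and switch to 301 once the rule behaves as expected, because a cached permanent redirect is hard to undo.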
Looking For Patterns
Creating one rule for every URL would take a lot of time, so we need to use regular expressions. Regular expressions look for patterns and replace one chunk of a string with another. Take this example:
    # Switch on Rewriting
    RewriteEngine on
    # Turn author/NAME/ into author.php?id=NAME
    RewriteRule ^author/([a-z]+)/?$ author.php?id=$1 [L]
Breaking it down, here's how the condition, ^author/([a-z]+)/?$, works:
^ marks the beginning of the condition.
author/ is the string we're looking for in the original URL.
([a-z]+) is a wildcard: it captures one or more lowercase letters that appear after author/.
/? allows an optional slash at the end of the URL.
$ marks the end of the condition.
And here's how the action, author.php?id=$1, works:
author.php?id= is the string we want to write.
$1 is a placeholder for the words we found in the condition, above.
At the end of the rule, [L] is a flag that tells Apache to stop applying any more rules if this one is processed.
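The same idea extends to more than one captured group. The sketch below assumes a hypothetical page parameter that author.php would have to understand; it is not part of the original example.

```apache
RewriteEngine on

# author/smith/page/2  ->  author.php?id=smith&page=2
# $1 is the first parenthesized group, $2 is the second.
RewriteRule ^author/([a-z]+)/page/([0-9]+)/?$ author.php?id=$1&page=$2 [L]
```

With the original single-group rule, author/smith and author/smith/ both match, while author/Smith and author/smith2 do not, because [a-z]+ accepts lowercase letters only.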
This is just a very basic example of what mod_rewrite can do. Instead of writing [a-z], we could use [xyz] to find the letters x, y or z, or (y|n) to find y or n. You can find more complete references to regular expression syntax in the resources below.
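As a rough illustration of those two alternatives, the rules below use hypothetical paths and script names (answer.php, grade.php) chosen only for this sketch.

```apache
RewriteEngine on

# Matches answer/y or answer/n only.
RewriteRule ^answer/(y|n)/?$ answer.php?choice=$1 [L]

# Matches grade/x, grade/y or grade/z (exactly one letter from the set).
RewriteRule ^grade/([xyz])/?$ grade.php?letter=$1 [L]
```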
In the example above, we added a flag, [L], at the end of the rule. Flags are optional. They must be included in a single set of square brackets at the end of the line, and if you want to use multiple flags, you must place commas between them.
Flags can be written out in full, or as a shortened version. While the short flags are easier to type, using long flags is a good idea if you want to make your code easy to read.
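To make the two spellings concrete, here is the author rule from earlier written both ways, with a case-insensitivity flag added for illustration. In a real file you would keep only one of the two rules.

```apache
RewriteEngine on

# Short form: case-insensitive match, then stop processing rules.
RewriteRule ^author/([a-z]+)/?$ author.php?id=$1 [NC,L]

# The same rule written with long flag names.
RewriteRule ^author/([a-z]+)/?$ author.php?id=$1 [nocase,last]
```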
B: escapes non-alphanumeric characters; may require
chain: chains the rule to the next rule in your .htaccess file; the second is only executed if the first results in a match.
cookie: create a cookie when a rule matches; requires additional attributes.
discardpath: discards PATH_INFO in the rewritten URL.
env: sets an environment variable.
END: similar to L, this stops any more rewrite processing.
forbidden: returns a 403 Forbidden status with the response to the rule.
gone: returns a 410 Gone status with the response to the rule.
handler: forces the rule to use a particular handler, specified as a variable.
last: stops rule processing.
next: starts the current set of rules again, using the result of the rule as the input.
nocase: switches off case sensitivity for the rule.
noescape: prevents mod_rewrite from converting special characters in the result into their equivalent hex codes.
nosubreq: stops rewrites from being applied to subrequests.
proxy: pushes the result of the rule to mod_proxy, and ignores any remaining rules.
passthrough: treats the result of the rule as a URL rather than a file path, so that it can be processed further by directives such as Alias.
qsappend: combines the query string with a new one.
qsdiscard: discards the old query string and replaces it.
qslast: splits a query string at the last question mark in the string.
redirect: issue an HTTP redirect.
skip: skips a number of rules; must be defined as [S=n], where n is the number of rules to skip.
type: sets the MIME type of the rule response.
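A few of the flags above in use, as a minimal sketch. The paths and script names (search.php, private/, retired.html) are hypothetical and only serve to show the syntax.

```apache
RewriteEngine on

# qsappend: keep any existing query string.
# search/cats?page=2  ->  search.php?term=cats&page=2
RewriteRule ^search/([a-z]+)/?$ search.php?term=$1 [QSA,L]

# forbidden: refuse access to anything under private/ with 403 Forbidden.
# A dash as the target means "do not rewrite the URL".
RewriteRule ^private/ - [F]

# gone: tell clients that a removed page is gone for good (410 Gone).
RewriteRule ^retired\.html$ - [G]
```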
The list below contains many server variables, some of which you will be able to use with mod_rewrite. Not all variables are supported by all servers, and equally, you may be able to use server variables not listed here.
If you already know about HTTP headers, many of the server variables in this list will be familiar to you, but there are a few that are specifically provided for use with mod_rewrite.
API_VERSION: the date of the API version.
AUTH_TYPE: the authtype; returns NONE, BASIC, DIGEST or FORM.
CONN_REMOTE_ADDR: the peer IP address.
CONTEXT_DOCUMENT_ROOT: information about the directory mapping in Apache.
CONTEXT_PREFIX: information about the directory mapping in Apache.
DOCUMENT_ROOT: the absolute path for the document.
HANDLER: the handler name.
HTTP_ACCEPT: the HTTP accept header, if present in the HTTP request header.
HTTP_COOKIE: the cookie, if present in the HTTP request header.
HTTP_FORWARDED: the actual path, if present in the HTTP request header.
HTTP_HOST: the current server, if present in the HTTP request header.
HTTP_PROXY_CONNECTION: the HTTP proxy path, if present in the HTTP request header.
HTTP_REFERER: the referring page URL.
HTTP_USER_AGENT: the user agent that was used to access the page.
HTTP2: whether the connection is using HTTP2; returns ON or OFF.
HTTPS: whether the connection is using HTTPS; returns ON or OFF.
IPV6: whether the connection is using IPv6; returns ON or OFF.
IS_SUBREQ: whether the request is a subrequest; true or false.
PATH_INFO: path data that follows a filename.
QUERY_STRING: the characters in a URL, after the question mark.
REMOTE_ADDR: the user's IP.
REMOTE_HOST: the user's fully qualified domain name.
REMOTE_USER: the authenticated user's username.
REMOTE_IDENT: the authenticated user's username, returned by identd.
REQUEST_FILENAME: the local path to the file or script in the request.
REQUEST_METHOD: the request method; HEAD, PUT, GET or POST.
REQUEST_SCHEME: the scheme in the request URI.
REQUEST_URI: the request URI, as a path.
SCRIPT_FILENAME: the absolute path for the script.
SCRIPT_GROUP: the script group name.
SCRIPT_USER: the user that owns the script.
SERVER_ADDR: the server IP that the .htaccess file is stored on.
SERVER_ADMIN: the server administrator, as configured in Apache.
SERVER_NAME: the server name, as configured in Apache.
SERVER_PORT: the port number that the request was sent to.
SERVER_PROTOCOL: the protocol and revision of the request.
SERVER_SIGNATURE: the server version and host name.
SERVER_SOFTWARE: the ID string for the server.
THE_REQUEST: the request in its entirety.
TIME: the date and time in the format YYYYMMDDHHMMSS.
TIME_DAY: the current day.
TIME_HOUR: the current hour.
TIME_MIN: the current minute.
TIME_MON: the current month.
TIME_SEC: the current second.
TIME_WDAY: the current day, returned as a number (starting with 0 for Sunday).
TIME_YEAR: the current year.
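Server variables are written as %{NAME}, most often in a RewriteCond line placed directly before the rule it applies to. RewriteCond is not covered in this article, so treat the following as a minimal sketch of how the variables are typically used. It sends plain HTTP requests to the HTTPS version of the same address.

```apache
RewriteEngine on

# Only apply the next rule when the connection is not using HTTPS.
RewriteCond %{HTTPS} off

# Redirect to the same host and path over HTTPS.
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```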
- A Beginner's Guide to Mod_rewrite: this guide is from 2004, but provides a good grounding in the principles of mod_rewrite.
- URL Rewriting for Beginners: a comprehensive guide for beginners and intermediate users.
- Apache Rewrite Cheatsheet: an HTML version of a cheat sheet originally published on iLoveJackDaniels.com.
- Introduction to Advanced Regular Expressions: develop your knowledge of regular expressions with this guide.
- RegEx Pal: check your regular expression syntax before deploying it on your site.
mod_rewrite is a useful and powerful way to control Apache behavior. You can do many things with mod_rewrite that we haven't covered here. These include redirection, preventing image hotlinking, banning particular visitors from your site, and more. The best way to leverage mod_rewrite is to learn about regular expressions.
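As a taste of two of those uses, the sketch below blocks one visitor by IP address and refuses image requests that come from other sites. The address 203.0.113.42 and the domain example.com are placeholders, and RewriteCond is a directive this article does not otherwise cover.

```apache
RewriteEngine on

# Ban a particular visitor: return 403 Forbidden for one IP address.
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.42$
RewriteRule ^ - [F]

# Prevent image hotlinking: allow empty referrers and our own site,
# and refuse image requests referred from anywhere else.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Each RewriteCond applies only to the RewriteRule that follows it, which is why the two blocks are independent.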
Further Reading and Resources
We have more guides, tutorials, and infographics related to web development:
- PHP Introduction and Resources: learn all about the most popular backend language in use on the web.
- Network Programming with Internet Sockets: learn all about networking on the internet.
- MySQL Introduction and Resources: MySQL is one of the most popular databases on the internet.