Stata Programming

Stata is an application designed to support statistical analysis. It was developed by StataCorp, and released in 1985. Its name is derived from "statistics" and "data," and it's used primarily in data analysis and specialist research.

Despite being more than 30 years old, Stata is still in common usage. It allows every analysis to be fully documented, and it can produce graphics, simulations, and charts.

There are four different versions of the application, ranging from a student version through to a version for very large datasets. Stata can be installed on Mac, Windows, and Unix computers. The most common version is Stata/IC (IC stands for "Intercooled").

Getting Started With Stata

Stata has its own built-in data editor, which looks similar to a spreadsheet. At the bottom of the application is the Command window, where you type commands. Every command entered during a session is logged in a separate history window (named Review in older versions), and results are shown in the central Results window.

When a dataset is loaded, Stata shows the variables and labels within it in the Variables and Properties windows.

If you want to play around with Stata without creating your own data, Stata comes with a range of example datasets, and an additional library of manual datasets that can be downloaded from the internet. List the installed example datasets with the sysuse dir command, and load one by typing sysuse followed by its name. Alternatively, choose File > Example datasets, then click the use link next to a file name to load it, or click the describe link to find out more about it.
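As a minimal sketch of that workflow, the commands below list the installed example datasets, load one of them, and inspect it. The auto dataset ships with Stata; nlswork is one of the manual datasets, so the webuse line assumes a working internet connection.

```stata
* List the example datasets that were installed with Stata
sysuse dir

* Load the auto example dataset, replacing any data already in memory
sysuse auto, clear

* Show the variables, their storage types, and their labels
describe

* Download one of the manual datasets (needs an internet connection)
webuse nlswork, clear
```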

Bringing Commands and Data Into Stata

Stata can be driven from the command line, using the Command window mentioned above. Once you've used a command, you can reuse it by pressing PgUp until the command reappears in the window.

The application can also be controlled through a graphical user interface, or by running a Do file (also called a syntax file), which is a series of pre-defined commands that are run as a script.

Seasoned Stata users usually recommend avoiding the graphical interface, but it provides an easy way to learn Stata's command language. Every time you run a command by pointing and clicking, the corresponding command is echoed in the Results window, so you can see what Stata is doing in the background.

The datasets you use can be imported into Stata from a CSV file or loaded from a Stata file. In recent versions of Stata, you can also import data directly from Excel using the import excel command.
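The following sketch shows one way to read a CSV file and a Stata file. The file names survey.csv and survey.dta are hypothetical placeholders, and import delimited is assumed to be available (it is part of the same family of import commands as import excel in recent versions).

```stata
* Read a comma-separated file, treating the first row as variable names
* (survey.csv is a hypothetical file name)
import delimited using "survey.csv", varnames(1) clear

* Save the data in Stata's own format, overwriting any existing file
save "survey.dta", replace

* Load a Stata-format file, replacing the data in memory
use "survey.dta", clear
```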

Basic Stata Commands

Stata can perform many types of calculation and analysis, so it helps to have a basic working knowledge of its commands. Commands are case sensitive, although many of them can be abbreviated.

In the section above, we mentioned the import excel command. This is a simple example of a Stata command in action:

import excel using filename.xls, ///
    sheet("Sheet1") cellrange(A1:D20) clear

This command specifies the sheet and the cells to import using the sheet and cellrange options. If a single cell is specified as the cellrange, all of the data from that cell onward will be imported. The clear option replaces any data already in memory.
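As a sketch of the single-cell form, the commands below first list the sheets in the workbook and then import everything from cell A3 onward. The file name is the same placeholder used above, and the firstrow option (treat the first imported row as variable names) is an addition that assumes the data has a header row at A3.

```stata
* List the worksheets in the workbook and the cell range each one occupies
import excel using filename.xls, describe

* Import from cell A3 to the end of the sheet;
* firstrow assumes row 3 holds the variable names
import excel using filename.xls, ///
    sheet("Sheet1") cellrange(A3) firstrow clear
```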

You will come across many other commands as you start working with Stata. Some of the basics are good to know:

  • display shows the result of a calculation
  • summarize displays summary statistics for the dataset in memory (follow it with the variables you want to examine)
  • help shows the help for a command or function (use it alone, or follow it with the name of the command you need help with)
  • if missing() is one of the many ways you can filter the data Stata returns when you query a dataset
  • graph draws a graph of the data in the dataset; it must be followed by the type of graph and the variables to plot (for a scatter plot, the y-axis variable comes first, then the x-axis variable)
  • describe displays information about the dataset in memory, or about a file on disk
  • nonew is an option of help that shows the help file in the Viewer window that is already open, instead of opening a new one each time
  • snapshot save creates an undo point for the data in memory, and snapshot restore returns to it (remember: Stata has no built-in undo command)
  • clean is an option of list that prints the listing without table borders
  • clear removes all data from memory; used as an option on a loading command, it replaces the data in memory with the new data. This is important, because Stata loads all of its data into RAM unless otherwise instructed. When working with large datasets, this can cause the computer to slow down or crash
  • findit searches for Stata extensions, or plug-ins, that can enhance its functionality
  • /// tells Stata that the command continues on the next line; you can comment after the slashes if you wish, and the comments will be ignored providing they are on the same line
  • ; tells Stata the command is finished, but only after #delimit ; has switched the command delimiter from the line break to the semicolon
  • exit closes the application; this is the equivalent of clicking File -> Exit with your mouse
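A short session that exercises several of the commands above might look like the following sketch. It assumes the auto example dataset that ships with Stata; the snapshot label is an arbitrary illustration.

```stata
* Load an example dataset, replacing any data in memory
sysuse auto, clear

* display evaluates an expression and prints the result
display 2 + 2

* Summary statistics for two variables
summarize price mpg

* List only the observations where rep78 is missing, without table borders
list make rep78 if missing(rep78), clean

* Scatter plot: y-axis variable (price) first, then x-axis variable (mpg)
graph twoway scatter price mpg

* Create an undo point and remember its number
snapshot save, label("before dropping observations")
local snap = r(snapshot)

* Make a destructive change, then roll it back
drop if missing(rep78)
snapshot restore `snap'
```

Saving the snapshot number from r(snapshot) avoids guessing which number Stata assigned if earlier snapshots exist in the same session.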

Once you get used to working in Stata, you can save commands to a Do file by using the Do-file Editor window. You can also save a text file with the .do extension, and then run your Do file in Stata using the do command, followed by the filename. Stata uses the same commenting methods as C++ and other languages: a double slash // comments out everything up to the end of the line, while /* and */ mark the beginning and end of a commented-out section.
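To illustrate, here is a sketch of a small Do file that uses each comment style, the /// continuation marker, and the semicolon delimiter. The file name analysis.do is hypothetical, and the auto dataset is used only because it ships with Stata.

```stata
// analysis.do: a hypothetical Do file
* A line that starts with an asterisk is also treated as a comment

sysuse auto, clear

/* Everything between these two markers
   is ignored, however many lines it spans */

summarize price mpg   // a comment at the end of a command

* /// continues the command on the next line
summarize price mpg weight, ///  text after the slashes is ignored
    detail

* Switch the delimiter to a semicolon, then back to the line break
#delimit ;
summarize price
    mpg ;
#delimit cr
```

With the file saved in the current working directory, typing do analysis.do in the Command window runs it from top to bottom. The #delimit command is meant for Do files rather than interactive use.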

Keeping Track of Your Work

One of Stata's biggest strengths is its ability to log commands and their output, which makes it invaluable for researchers who need to show how they reached certain conclusions. To make use of logging, there are a few steps to follow.

  1. Create a directory for your project. Otherwise, Stata works in its default directory (C:\DATA on older Windows installations), so creating a separate directory keeps things neat.
  2. Turn logging on. Use the log using command, following it with the filename you want to use.
  3. Always save commands in a Do file. While this isn't strictly necessary, it's helpful when you want to reproduce a result or backtrack over your commands.
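The steps above can be sketched as follows. The directory path and log file name are hypothetical, and the directory is assumed to exist already.

```stata
* Move to the project directory (hypothetical path)
cd "C:\projects\stata-intro"

* Start a plain-text log, overwriting any earlier log with the same name
log using "session.log", text replace

* Commands and their output are now recorded in the log
sysuse auto, clear
summarize price mpg

* Stop logging and close the file
log close
```

Placing these lines in a Do file covers the third step as well, since the same script can be rerun to reproduce both the results and the log.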

Further reading

Summary

Stata is an older programming language and development environment designed for solving statistical problems, but it is still widely used by an active community. If you do serious statistical work, Stata is a good language to know. With this introduction and our recommended resources, you should be on your way.