Getting Started with Stata Tutorial #6: How Stata Code Works

If you’ve tried coding in Stata, you may have found it strange. The syntax rules are straightforward, but different from what I expected.

I had experience coding in Java and R before I ever used Stata. Because of this, I expected a command’s inputs to be wrapped in parentheses, which makes the code’s structure easy to read.

Stata does not work this way.

An Example of how Stata Code Works

To see the way Stata handles a linear regression, go to the command line and type

h reg or help regress

You will see a help page pop up, with this Syntax line near the top:

regress depvar [indepvars] [if] [in] [weight] [, options]

(If you need a refresher on getting help in Stata, watch this video by Jeff Meyer.)

This is typical of how Stata code looks.

We start each command with a command word, in this case regress.

Then there is usually at least one required input, in this case the dependent variable. We can tell it’s required because it is not surrounded by brackets.

Adding Options

Since all other inputs are in brackets, we could run the command without any of them; type “regress” and the name of your dependent variable to run an empty model. If we wanted to run an intercept-only model for mpg, we would type:

regress mpg

After the dependent variable we enter any number of independent variables, one after the other, without commas. To use price and turn to predict mpg, this would look like:

regress mpg price turn

If we wanted to only use certain observations, we could use “if” or “in” to select them. If we wanted to only consider the first 30 cars, and only use observations with length less than 200, we could type:

regress mpg price turn in 1/30 if length < 200
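
One way to check what “if” and “in” are selecting is to run the same qualifiers on a simpler command first. This is a minimal sketch, assuming the auto data that ships with Stata has been loaded with sysuse:

```stata
sysuse auto, clear
* Count the observations that satisfy both restrictions
count in 1/30 if length < 200
* The same qualifiers work on other commands, such as summarize
summarize mpg in 1/30 if length < 200
```

The number reported by count should match the number of observations the regression above ends up using.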

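And if we wanted to run a weighted regression, we could add a weight expression where the Syntax line shows “[weight]”. Unlike the other brackets in the Syntax line, these square brackets are typed literally, for example [aweight=varname]. Typing the bare word “weight” would not work; in the auto data it would simply add the variable weight as another predictor.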
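
Here is a minimal sketch of a weight expression, where the square brackets are typed literally. The auto data has no natural weight variable, so rep78 is used only as a stand-in to show the syntax (an assumption for illustration, not a sensible modeling choice):

```stata
sysuse auto, clear
* aweight is one of the weight types regress accepts
* rep78 is a placeholder weight, chosen only to make the example run
regress mpg price turn [aweight = rep78]
```

Observations with a missing weight are left out of the estimation, so the reported number of observations can be smaller than the full dataset.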

After all the main inputs to a command, we will see “[, options]”. This tells us that once we’ve put a comma in the command, we can enter any options we need.

You can see the list of options for a command right below the Syntax header in the help page.

Each command and option has some part of its name underlined. In Stata you can abbreviate a command or option to any leading portion of its name, as long as you type at least the underlined part and no more than the full name.

For example, if you wanted to use the option mse1, you could type “ms”, or “mse”, or “mse1”, but if you typed “m” or “mse12”, it wouldn’t work.
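
As a quick sketch of abbreviation in practice, the lines below use two shortenings I am confident appear underlined in the regress help page: “reg” for the command and “nocons” for the noconstant option. Check the underlining in your own version of the help file before relying on any other abbreviation.

```stata
sysuse auto, clear
* "reg" is the underlined (minimum) abbreviation of regress
reg mpg price turn
* The option noconstant can be shortened to nocons
regress mpg price turn, nocons
```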

Because Stata code lacks some of the structure I would expect (more parentheses and commas), Stata commands will often feel like a jumble of words that can be difficult to parse.

Looking at the help file for a command to see what goes where can help sort out that mess.

Running Stata Commands

Let’s put it all together using the auto data. We can use the command line for this import.

Type sysuse auto, clear into the command line (if you’re curious about this command, type h sysuse!).

Imagine we want to run a regression of mpg on weight and length. We only want observations with a gear ratio less than 3, and we want beta coefficients reported.

It’s not too hard with the help file to assist! Try to figure out the right command to use and enter it into the command line. Scroll down below the image when you’re ready to see what I did to get this output.

[Image: regression output]

We got this by typing

regress mpg weight length if gear_ratio < 3, beta

One last important thing to remember with Stata syntax is that it’s all case-sensitive. If you capitalized any of the previous commands, you’d get an error.
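
To see case sensitivity in action without stopping a do-file, you can wrap the failing commands in capture noisily and then look at the return code. This is a small sketch using the auto data; a nonzero return code means the command failed.

```stata
sysuse auto, clear
* Capitalizing the command name fails: Stata does not know "Regress"
capture noisily Regress mpg weight
display _rc

* Variable names are case-sensitive too: there is no variable called MPG
capture noisily regress MPG weight
display _rc
```

Both commands run fine once everything is typed in lowercase, as in regress mpg weight.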

Making Stata Comments

If you’re unfamiliar with how to run do-files in Stata, we have a blog post to get you started on that subject.

An important part of do-file syntax is knowing how to make comments. This can be done in three ways:

  • Use // at the beginning of a comment
  • Use * at the beginning of a line
  • Surround the comment with /* and */

When Stata encounters a comment, it skips right over it and doesn’t try to run it as code.
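
The three comment styles listed above can be sketched in a short do-file like this one. It is meant to be run from the do-file editor, since as far as I know only the asterisk style is accepted when typing directly into the command line.

```stata
* A full-line comment starts with an asterisk
sysuse auto, clear

summarize mpg   // two slashes start a comment after a command

/* A block comment can span
   several lines */
regress mpg weight /* or sit in the middle of a command */ length
```

Note the space before the two slashes; without it Stata will not read them as the start of a comment.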

Comments are important for two main reasons:

  1. Explaining what your code is doing to other people and your future self
  2. Turning off code you don’t want to run or that isn’t working, but that you don’t want to lose

Three slashes can also be used as a “comment” that allows you to combine two lines into one line of code.

Imagine I had some long command like the following:

graph bar (mean) headroom (mean) mpg in 1/60, over(trunk) stack title(`"Bar chart"') legend(off)

I might not want so much all on one line, since it could be hard to read.

To fix this, after any of the words in the command I can put a space and three slashes, then go to the next line.

The “///” and other examples of comments being used are shown below.

Note that the third and fourth lines are one command – joined together by “///”.
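
As a sketch of the same idea in text form, here is the long graph bar command from earlier split across several lines of a do-file. The layout is my own reconstruction, so the line positions are not meant to match the screenshot.

```stata
sysuse auto, clear
* One command spread over three lines; each /// continues onto the next line
graph bar (mean) headroom (mean) mpg in 1/60, ///
    over(trunk) stack                         ///
    title(`"Bar chart"') legend(off)
```

Stata reads the three lines as a single command, so the result is the same as typing it all on one line. Like “//”, the three slashes need a space in front of them and belong in a do-file rather than the command line.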

With comment and command syntax down, you should be ready to make your own Stata code!

by James Harrod


About the Author:

James Harrod interned at The Analysis Factor in the summer of 2023. He plans to continue into a career as an actuary, and hopes to continue finding interesting ways of educating people about statistics. James is well-versed in R and Stata programming and enjoys teaching the intuition behind common statistical methods.  James is a 2023 graduate of the University of Rochester with bachelor’s degrees in Statistics and Economics.
