Tips for using Stata
This document collects tips to help you use Stata more efficiently. We add a new tip to the top of our home page each month, so it is worth visiting regularly. When a new tip goes onto the home page, the previous one moves to the bottom of this page.
Our bookshop has several publications to assist in learning Stata data management and analyses.
Options
One of the strengths of Stata is its system of options, which are typed after a comma. If you list your data, you see the value labels of the variables:
    list var1 var7-var10 var3 if gender==1
To see the underlying value codes rather than the labels, add the nolabel option after a comma:
    list var1 var7-var10 var3 if gender==1, nolabel
Note that you can list the variables in any order you choose.
BTW, the comma is a toggle: used a second time, it turns the options off. We could have written the above command as:
    list var1 var7-var10 var3, nolabel, if gender==1
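The effect of nolabel is easy to see on a dataset that ships with Stata. The following is a minimal sketch that assumes the auto dataset (not the var1, var7-var10 data used above), in which foreign carries the value label origin.

```stata
sysuse auto, clear
list make foreign in 1/3             // foreign is shown as its value label
list make foreign in 1/3, nolabel    // foreign is shown as its numeric code
```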
Edit and browse
You know that you can use the Editor button to invoke a spreadsheet format for entering or changing data. However, if you type the command edit you can limit what you see to a few variables, in any order you choose:
    edit var1 var7-var10 var3 if gender==1
To see the value codes rather than the labels, add the nolabel option after a comma:
    edit var1 var7-var10 var3 if gender==1, nolabel
The browse command takes the same syntax but does not let you change the data.
When you leave the Editor, Stata asks whether you want to keep the changes you made.
Do Editor
The Do-file Editor is very handy. Invoke it from its toolbar button or by typing doedit. You can enter several lines or insert another file, such as one of your earlier *.do files. You can select a line in the Do Editor and Do only that line, Do from that line to the end of the set of instructions, or select several lines and Do them. At the end you can save the contents of the Do Editor as a *.do file.
Folders
Type adopath or sysdir to see the location of the various Stata folders for your main files, updates, STB files, personal ado files, etc.
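As a short sketch of what these commands give you (the paths shown will depend on your own installation):

```stata
sysdir                            // lists the STATA, BASE, SITE, PLUS, PERSONAL and OLDPLACE folders
display "`c(sysdir_personal)'"    // the path of the PERSONAL folder only
adopath                           // the order in which Stata searches these folders for ado-files
```

The c(sysdir_personal) value is convenient inside do-files, where you want the path in a macro rather than on screen.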
Statalist
There is a Stata list server, Statalist, with useful advice about Stata, including new programs to help with special problems. If you subscribe to Statalist you get messages throughout the day. If you subscribe to Statalist-digest you get a single file with all the messages once each day. See http://www.stata.com/support/statalist/faq
The Stata Journal and Stata Technical Bulletin (STB)
These publications contain supplementary information about Stata commands and the use of Stata in research. The first issue of The Stata Journal was released at the end of 2001. The last issue of the Stata Technical Bulletin (#61) was in May 2001.
From within Stata you can see what STB procedures are available and download the ones that interest you. From the Help menu, select STB and User-written Programs. Then follow the hypertext links (clickable blue words in the help window) for the Stata site, and click on stb.
For a complete list of all STB articles, see http://www.stata.com/info/products/stb/stbvols.html
Useful links
The Stata resources web page is worth a look. It has links to free downloadable tutorials and other material.
http://www.stata.com/links/resources1.html
The UCLA page of graphics examples using Stata may also be of interest:
http://www.ats.ucla.edu/stat/stata/Library/GraphExamples/default.htm
Setting up docked windows (great for learning to set up the Stata 9 windows):
http://www.ats.ucla.edu/stat/stata/faq/stata9gui/dockfloatpin.html
Tutorials for learning Stata:
http://data.princeton.edu/stata/
http://dss.princeton.edu/online_help/stats_packages/stata/
Working with HILDA data - PanelWhiz:
http://www.panelwhiz.eu
Stata Programming
http://www.stata.com/meeting/11uk/baum.pdf
Tips from our home page
The following tips were initially presented on our home page. The current monthly tip can be found on the home page.
Stata Tips
List of past tips
contract - the table (May 2014)
The Stata cond() function (April 2014)
Combinations of variables (March 2014)
Determining the value of PI by simulation in Stata (February 2014)
Using Mata to produce an Excel table (January 2014)
Do Editor - Additional Toolbars (December 2013)
Group renaming of variables (November 2013)
Put your own font in Stata graphs (October 2013)
Getting to the function quickly (September 2013)
Value Labels - using labels in an expression (August 2013)
The Label command - save option (July 2013)
Breaking up dates (June 2013)
Stata's separate command (May 2013)
Getting Stata to automatically open a web page (April 2013)
Adding to Stata and User written programs (March 2013)
Stata Editor - selecting columns (February 2013)
adoupdate (January 2013)
compress (December 2012)
Reshaping long and attaching variable labels afterwards (November 2012)
Graph marker labels (October 2012)
Stata on YouTube (September 2012)
Gathering prefixes for the reshape command (August 2012)
Fuzzy merge - based on time - an alternative (July 2012)
Fuzzy merge - based on time (June 2012)
Getting the data right for Stata's Excel command (May 2012)
Using Stata's file command (April 2012)
Cleaning data - consistent naming - using soundex() (March 2012)
Cleaning data - consistent naming - manually (February 2012)
Working with Dates 3 (January 2012)
Working with Dates 2 (December 2011)
Working with Dates 1 (November 2011)
Doing things by levels of a variable (October 2011)
Speeding up Stata - the if statement (September 2011)
Stata 12's new Excel command (August 2011)
Stata 12 PDF files of logs and graphs (July 2011)
Using value labels for bar graph labels (June 2011)
Automatically sending emails from Stata - Windows platform (May 2011)
Generating a dataset (April 2011)
Printing log files (March 2011)
Working with dates (February 2011)
Producing Multiple graphs (January 2011)
Stata 11 PDF (December 2010)
Regular Expressions (November 2010)
Stata's profile.do command (October 2010)
Using the system variables _n and _N (September 2010)
Producing an edited log file (August 2010)
The input command (July 2010)
Splitting the Do Editor (June 2010)
Factor variables and lincom to produce a table (May 2010)
Stata graphs (April 2010)
Tabdisp (March 2010)
Tables to spreadsheet (February 2010)
Tables to spreadsheet (January 2010)
Point Estimates for a Regression (December 2009)
Doing things quietly in Stata (November 2009)
Graphing functions (October 2009)
Stata 11 - Variable manager (September 2009)
Getting a Subset of a large dataset into Stata (August 2009)
Capture (July 2009)
Transparent Graphs (June 2009)
Getting Stata's Graph editor commands into Stata graphs (May 2009)
Weaving Stata results into a Word Report (April 2009)
Stopping Stata during the running of a do file (March 2009)
Putting Greek symbols in graphs (February 2009)
Doing things by levels of a variable (January 2009)
Automation of Tables in Stata (December 2008)
Memory usage in Stata (November 2008)
Stata Comment (October 2008)
Stata user written graphs (September 2008)
Stata tables (August 2008)
Sending Command(s) to the Stata Do Editor from the Stata Review Window (July 2008)
Creating a Stata dataset from multiple Excel worksheets (June 2008)
catplot (May 2008)
Stata Users' Group Meeting Proceedings (April 2008)
Programming Stata - learning by examples (March 2008)
Mata - learning by examples (February 2008)
Stata's display command (January 2008)
Creating a binary variable from a continuous variable (December 2007)
New subcommand for listing user written commands (November 2007)
Undocumented commands (October 2007)
User written program - examples (September 2007)
Settings for Stata (August 2007)
Copy as picture - Copying from the results windows to Word and Excel (July 2007)
Estout - Stata Regression Tables (June 2007)
Adoupdate (May 2007)
Nested Do file (April 2007)
Personal help file (March 2007)
spmap - Visualization of Spatial Data (February 2007)
stcmd - Using Stat/Transfer within Stata (January 2007)
encode (December 2006)
contract - the table
Stata's contract command is one that every Stata user should be aware of. It reduces the dataset in memory to the variables that you specify and the frequency of each combination of their values. This may be what you wish to achieve. However, if you only wish to see these frequencies in table form, the Stata Editor will show them with just a few additional commands and without dropping any observations.
Example 1
    sysuse auto, clear
    bysort for mpg : gen index=1 if _n==1
    bysort for mpg : gen freq=_N
    edit for mpg freq if index==1

    //Or reducing this further to:
    sysuse auto, clear
    bysort for mpg : gen freq=_N if _n==1
    edit for mpg freq if !missing(freq)
The result:
    foreign    mpg   freq
    Domestic    12      2
    Domestic    14      5
    Domestic    15      2
    Domestic    16      4
    Domestic    17      2
    Domestic    18      7
    Domestic    19      8
    Domestic    20      3
    Domestic    21      3
    Domestic    22      5
    Domestic    24      3
    Domestic    25      1
    Domestic    26      2
    Domestic    28      2
    Domestic    29      1
    Domestic    30      1
    Domestic    34      1
    Foreign     14      1
    Foreign     17      2
    Foreign     18      2
    Foreign     21      2
    Foreign     23      3
    Foreign     24      1
    Foreign     25      4
    Foreign     26      1
    Foreign     28      1
    Foreign     30      1
    Foreign     31      1
    Foreign     35      2
    Foreign     41      1
The equivalent contract command:
contract mpg for
A two-way table can even be produced in the editor.
Example 2
    sysuse auto, clear
    set more off
    bysort mpg for: gen freq=_N
    separate freq, by(for)
    bysort mpg (freq0): replace freq0=freq0[1]
    bysort mpg (freq1): replace freq1=freq1[1]
    bysort mpg : gen index1=1 if _n==1
    rename freq0 domestic_cars
    rename freq1 foreign_cars
    edit mpg domestic_cars foreign_cars if index1==1
The result:
    mpg   domestic   foreign
     12      2          .
     14      5          1
     15      2          .
     16      4          .
     17      2          2
     18      7          2
     19      8          .
     20      3          .
     21      3          2
     22      5          .
     23      .          3
     24      3          1
     25      1          4
     26      2          1
     28      2          1
     29      1          .
     30      1          1
     31      .          1
     34      1          .
     35      .          2
     41      .          1
The equivalent table command:
tabulate mpg for
For further help:
help contract
help bysort
help edit
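For comparison, here is a minimal sketch of contract itself on the same data. The name freq passed to the freq() option is our own choice (contract uses _freq by default). Unlike the editor approach, this replaces the data in memory, so use preserve and restore around it if you still need the original dataset.

```stata
sysuse auto, clear
contract foreign mpg, freq(freq)    // one observation per foreign/mpg combination
list in 1/5, noobs
```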
The Stata cond() function
The cond() function lets you decide the contents of a variable based on a criterion that you specify.
From the online help ( help cond() ):
cond(x,a,b,c) or cond(x,a,b)
Description: returns a if x is true and nonmissing, b if x is false, and c if x is missing. Returns a if c is not specified and x evaluates to missing.
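The rules in the description can be checked directly with display. This is a small sketch with made-up constants; nothing here depends on a dataset.

```stata
display cond(3 > 2, "yes", "no")    // x is true, so a is returned
display cond(0, 1, 2, 3)            // x is false, so b is returned
display cond(., 1, 2, 3)            // x is missing, so c is returned
display cond(., 1, 2)               // x is missing and c is omitted, so a is returned
```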
The following are some examples and comments.
Example 1
You wish to fill a variable with 5, except where the variable a is greater than 5, in which case it takes the value of a.
    //generate data
    clear
    set obs 15
    generate a=_n
    generate b=cond(a>5,a,5)
    list
Example 2
There are of course many other ways of doing this:
    //(1)
    generate c=5
    replace c=a if a>5
    list
    //(2)
    generate c1=(a>5)*a
    replace c1=5 if a<6
    list
    //(3)
    generate c2=5
    replace c2=a if inrange(a,5,.)
    list
    //(4)
    // This option, while doing it in one line, is not as easy to understand as
    // the one using the cond() function
    gen c3=(a>5)*a + (a<=5)*5
    list
    //(5)
    // This can also be done with the max() function, e.g.
    generate c4=max(a,5)
    list
Example 3
However, where the variable a contains missing values, the results from cond() differ from those of max():
    // generate data
    clear
    set obs 11
    generate a=_n if _n<10
    generate b=cond(a>5,a,5)
    // max() ignores missing arguments, so the if qualifier is needed to match cond()
    generate c=max(a,5) if !missing(a)
    list
    exit
Example 4
Using cond() as a spell checker
    //generating the data
    clear
    input str10 a
    "thsi "
    "that "
    "the"
    "tree"
    "these"
    end
    list
    gen b=cond(trim(a)=="thsi","this",trim(a))
    //The alternative is to use:
    gen c=a
    replace c="this" if trim(a)=="thsi"
    list
    exit
Example 5
Using cond() to categorise a variable into negative, zero and positive.
    //generate the data
    clear
    input a
    -2
    -1
    0
    1
    2
    end
    list
    generate b=cond(a>=0, a!=0, -1)
    list
    //or using nested cond() functions
    generate c= ///
        cond(a>0, 1,      /// greater than zero
        cond(a==0, 0,     /// equal to zero
        cond(a<0, -1, .   /// less than zero
        )))
    list
Example 6
An interesting way of dealing with missing values (as seen on Statalist):
    generate avprice = (total - cond(missing(price), 0, price)) / cond(missing(price), n, n - 1)
Example 7
Another example seen on Statalist, this time for dropping duplicates:
    quiet bysort hhid hhsize: gen dupobs=cond(_N==1,0,_n)
    drop if dupobs>1
    // Probably a better way of dealing with this is:
    quiet bysort hhid hhsize: keep if _n==1
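Stata's duplicates command offers a third route to the same result. The sketch below uses a tiny made-up dataset; the variable names hhid and hhsize are borrowed from the example above and the values are hypothetical.

```stata
clear
input hhid hhsize
1 3
1 3
2 4
end
duplicates report hhid hhsize        // how many copies of each combination exist
duplicates drop hhid hhsize, force   // keep the first observation of each combination
list
```

The force option is needed here because the duplicates are defined on a subset of variables rather than on whole observations.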
Example 8
From the Stata Press book "An Introduction to Stata Programming" by Christopher F. Baum, Section 3.3.2:
    generate netmarr2x=cond(marr/divr>2.0, 1, 2)
    // The above is OK but could also be replaced with:
    gen netmarr2x=(marr/divr<=2.0)+1
Example 9
From Statalist:
    clear
    set obs 10
    generate x=_n if _n<8
    list
    generate z=cond(x>5,1,0,.)
    list
    // The above follows Stata's rules: a missing value is always greater than
    // any number, so z is 1 where x is missing. However,
    // you may wish to have missing where a value in x is missing.
    // http://www.stata.com/statalist/archive/2008-02/msg01204.html
    // This puts missing values in where they occur in the x variable
    generate z1 = cond(missing(x), ., x > 5)
    list
    // An alternative to the above is to use the 2nd syntax of the cond() function. Note the 3rd term
    // in the function does nothing here, so instead of "." any number would have been OK.
    generate z2=cond(x,x>5,.,.)
    list
Example 10
The Stata Press book "Data Analysis Using Stata" by Ulrich Kohler and Frauke Kreuter (p. 460) shows how the cond() function can be used to provide a default title for a graph:
    local title = cond(`"`title'"' == `""', `"`varlist' by `by'"', `"`title'"')
    graph twoway connected `yvars' xhelp, title(`title') .….
For further help:
help cond()
Stata Journal Vol 5 No 3
Combinations of variables
From time to time I'm asked how to write a program that uses all combinations of a subset of variables in an estimation command.
There are many ways to approach this. This is one way.
This approach:
(1) Produces all combinations as numbers
(2) Uses value labels to attach the variable names
(3) Decodes to produce a variable in the dataset
I have used this approach because of the limited storage in macros.
    program combin
        syntax , NUMTot(integer) NUMPick(integer)
        tempvar cat bbb q con con1
        tempfile temf temf1
        quiet {
            save temf, replace    //save the dataset currently in Stata
            labels1 `numtot'      //program
            clear
            numlist "1/`numtot'"
            set obs `numtot'
            foreach i of numlist `=r(numlist)' {
                egen a`i'=fill( 1/`numtot' )
            }
            //all permutations of the data
            fillin a1-a`numtot'
            egen `cat'=concat(a*), p(" ")
            gen `bbb'=""
            gen `q'=.
            // all combinations of the data
            forvalues i=1/`=_N' {
                local b=`cat'[`i']
                replace `bbb'="`: list sort b'" in `i'
                local q1 : list uniq b
                replace `q'=`:list sizeof q1' in `i'
            }
            //keep only the number of variables (items) required
            local z=1
            foreach i of varlist * {
                if `z'>`numpick' {
                    drop `i'
                }
                local ++z
            }
            egen `con'=concat(*) , punct(" ")
            generate `con1'="."
            forvalues i=1/`=_N'{
                local a "`=`con'[`i']'"
                local aa :list dups a
                replace `con1'= "`aa'" in `i'
                local aaa :list sort a
                replace `con'= "`aaa'" in `i'
            }
            drop if !missing(`con1')
            duplicates drop `con', force
            labels2               //program
            save temf1, replace
            merge 1:1 _n using temf, nogen
        } //quiet
    end
    //store labels in do file
    program labels1
        args max_vars
        describe, replace
        keep name
        keep in 1/`max_vars'
        encode name, gen(name1)
        label save name1 using filename , replace
    end

    //attaching value labels and decoding etc.
    program labels2
        do filename
        label value a* name1
        foreach i of varlist a* {
            decode `i', gen(z`i')
        }
        egen levels=concat(z*), punct(" ")
        keep levels
    end
Input
numtot(#): the number of variables from which the numpick() variables are chosen. The order of the variables is important, as numtot() starts at the first variable in the current order.
numpick(#): the number of variables out of numtot() that you select.
    sysuse auto, clear
    order make, last
    combin, numtot(6) numpick(2)
    forvalues i = 1/6 {
        di "`i'"
        regress turn `=levels[`i']'
        estimates store a`i'
    }
    estimates table a* , stats(r2)
    exit
The resulting variable that contains the combinations:
         levels
      1. headroom mpg
      2. headroom price
      3. headroom rep78
      4. headroom trunk
      5. headroom weight
      6. mpg price
      7. mpg rep78
      8. mpg trunk
      9. mpg weight
     10. price rep78
     11. price trunk
     12. price weight
     13. rep78 trunk
     14. rep78 weight
     15. trunk weight
     16.
For further help:
help program
help macro
help egen
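When only pairs of variables are needed and the list is short enough to fit in a macro, nested loops over a local macro are a lighter alternative. This is a sketch under those assumptions, not a replacement for the combin program above. The three variables and the dependent variable turn are chosen only for illustration.

```stata
sysuse auto, clear
local vars headroom mpg price          // candidate regressors (illustrative choice)
local k : word count `vars'
local npairs = 0
forvalues i = 1/`=`k'-1' {
    forvalues j = `=`i'+1'/`k' {
        local v1 : word `i' of `vars'
        local v2 : word `j' of `vars'
        quietly regress turn `v1' `v2'
        display "`v1' `v2': R-squared = " %5.3f e(r2)
        local ++npairs
    }
}
display "`npairs' pairs fitted"
```

Starting the inner loop at i+1 is what turns permutations into combinations: each unordered pair is visited exactly once.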
Determining the value of PI by simulation in Stata
Time for some fun with Stata! If you randomly generate x and y values between 0 and 1 and plot them as scatter points, some points will fall inside the quarter circle of unit radius and others outside it. The quarter circle has area pi/4 and the unit square has area 1, so the proportion of points that fall inside the quarter circle, times 4 (4 quadrants), gives pi. The more points, the more accurate the result.
Solving for pi yields: pi = 4 * (scatter points in quadrant area)/(scatter points in square area, i.e. the 1*1 square)
The code for the graph:
    clear
    set obs 1000
    gen x=runiform()
    gen y=runiform()
    gen height=sqrt(1 - (x)^2)
    twoway (function y = sqrt(1 - (x)^2), ///
            range(0 1) lwidth(thick) lcolor(red)) ///
        (area height x , sort) ///
        (scatter x y if y>height, mcolor(blue)) ///
        (scatter x y if y<=height, mcolor(green)) ///
        ,aspect(1) legend(off)
The program that simulate calls:
    clear all
    set more off
    program a, rclass                        //1
        args n                               //2
        set obs `n'
        local z=0                            //3
        local zz=0                           //3
        forvalues i=1/100 {                  //4
            local x=runiform()
            local y=runiform()
            local dis=sqrt(`x'*`x'+`y'*`y')
            if `dis'<=1 local ++z            //5
            local ++zz
        }                                    //end loop
        return scalar stuff=4*(`z')/`zz'     //6
    end
The simulate command that calls the above program "a":
    simulate mean=r(stuff) , reps(10000): a 100    //7
    summarize mean    //the mean should be close to 3.1415...
Notes:
1 - The program called by the simulate command. We called the program "a". Note the class is r, i.e. it returns r() values.
2 - The args command. When we call the "a" program we also pass it the number of observations required. The term passed is mapped onto the first name given to args, i.e. n.
3 - Defining local macros.
4 - The forvalues loop command.
5 - An if statement that increments the local macro z.
6 - The statement that defines a return value, i.e. stuff.
7 - The simulate command, which runs program "a" 10,000 times and stores r(stuff) from each run in the variable mean.
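The same estimate can be obtained without a program or a loop by working on whole variables. This is a minimal sketch: the seed and the number of points are arbitrary choices, and inside is a name introduced here for the indicator variable.

```stata
clear
set seed 12345
set obs 100000
generate x = runiform()
generate y = runiform()
generate inside = (x^2 + y^2 <= 1)     // 1 if the point lies in the quarter circle
summarize inside, meanonly
display "Estimate of pi: " 4*r(mean)
```

With 100,000 points the standard error of the estimate is roughly 0.005, so the first two decimal places are usually right.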
For further help:
help return
help simulate
help macro
help runiform()
help summarize
Using Mata to produce an Excel table
(January 2014) If you are producing tables and wish to put them into an Excel spreadsheet, you could use the following.
The number of commands could be reduced by using the returned results from preceding commands directly, or you could even turn this into a program.
    sysuse auto, clear
    cd c:/                                        // 1
    capture erase Results.xls                     // 2
    regress price weight length if foreign        // 3
    return list                                   // 4
    matrix list r(table)                          // 5
    matrix a1=r(table)                            // 6
    matrix list a1                                // 7
    regress price weight length if !foreign
    matrix a0=r(table)
    matrix list a0
    mata                                          // 8
    b=xl()
    b.create_book("Results","Sheet1")             // 9
    b.put_string(1,1,"Variable")                  // 10
    b.put_string(1,2,"Coefficientforeign=1")
    b.put_string(1,3,"Coefficientforeign=0")
    b.put_string(2,1,"weight")
    b.put_string(3,1,"length")
    b.put_number(2,2,st_matrix("a1")[1,1])        // 11
    b.put_number(3,2,st_matrix("a1")[1,2])
    b.put_number(2,3,st_matrix("a0")[1,1])
    b.put_number(3,3,st_matrix("a0")[1,2])
    end                                           // 12
    display "{browse results.xls : results}"      // 13
Notes:
1 - Change directory; this is where the Excel file is to be saved.
2 - Erase any previous Excel file saved with the same name in the folder.
3 - The regress command, the results of which we wish to tabulate.
4 - Not required to run the above, but lets us see what return results Stata produces for this command.
5 - Not required to run the above, but lets us see the results that Stata saves in the matrix.
6 - Save the Stata matrix called r(table) into a Stata matrix we will call a1.
7 - Not required to run the above, but lets us see the contents of the matrix we have just saved.
8 - Start Mata.
9 - Mata command to name the file in which we wish to save the results. We call this file "Results".
10 - Mata command to put a name into a cell.
11 - Mata command to put a result, obtained from the Stata matrix "a1", into a cell; here the cell in row 2 and column 2 [2,2].
12 - End Mata.
13 - Hyperlink the name of the file.
For further help:
help return
help matrix
help mata
help m5_intro
help mf_xl
Do Editor  Additional Toolbars
(December 2013) The Stata 13 Do-file Editor allows you to add your own toolbars. It is easy to do:
Right click on the toolbar section of the Do-file Editor
Select Customize
Click on the New button
Name your new toolbar
Then click on the Commands tab and select a command
Drag the command to your new toolbar and you're ready to run.
Example of additional Tool bars on Do Editor
For further help:
help doedit
Group renaming of variables
(November 2013) The rename command will be familiar to most Stata users, e.g.
    rename var1 var2
which renames variable "var1" to "var2".
However, the group rename may have been overlooked by some users. Detailed examples and syntax can be found at:
help rename group
This extends the functionality of the rename command to include renaming groups of variable names.
Example 1
Say that you have imported your data set and all the variable names come capitalised; you prefer lower case.
The following can be done:
    //using foreach to loop over the variables to make the change
    foreach i of varlist * {
        rename `i' `=lower("`i'")'
    }
But the rename command makes this even easier, e.g.
    rename * , lower
Some further examples
First generate some variable names:
    clear
    forvalues i=1/15 {
        gen var`=`i'^2' = .
    }
    save data, replace
Example 2
Add a suffix to the variable names var1 to var100 (based on the current variable name order).
    use data, clear
    rename (var1-var100) =A
Example 3
As Example 2, but with the variable names in no particular order.
    use data, clear
    order var225 var169
    rename (var(#) var(##) var100) (var(#)A var(##)A var100A)
Example 4
Adding the string "four" to the variable name whenever the name contains the number "4".
    use data, clear
    rename (var*4*) (fourvar*4*)
Example 5
Swapping the prefix and suffix where they both contain a character.
    use data, clear
    rename * *A
    rename ?ar#? ?[3]ar#[2]?[1]
Example 6
Swapping the year to a suffix.
    clear
    input currentliabilities2000total currentliabilities2001total currentliabilities2002total
    1 1 1
    end
    rename currentliabilities#total currentliabilitiestotal#
For further help:
help rename
help rename group
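A quick way to try group renaming on real data is the auto dataset that ships with Stata. In this sketch the new names cost and economy are invented for illustration.

```stata
sysuse auto, clear
rename (price mpg) (cost economy)    // rename a list of variables in one command
rename *, upper                      // every variable name to upper case
rename *, lower                      // and back to lower case
describe, simple
```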
Put your own font in Stata graphs
(October 2013) From Statalist, Allan Reese, 1/11/2013.
Installation of new fonts may vary between operating systems. These instructions are for Windows XP.
A font can be obtained from: (http://www.fonts4free.net/.html).
The font we are looking at today is from: (http://www.fonts4free.net/femaleandmalesymfont.html).
Copy the font file to the disk (a temp directory is suggested)
Extract the file from the zip file to a suitable directory
Go to the Control Panel and click on Fonts
Then File > Install New Font...
The new font "Female and male symbols" has now been installed
To see a graph with the new font:
    //make up some data
    set seed 1
    clear
    set obs 50
    generate a=runiform()*1000
    generate b=runiform()*10
    generate s=runiform()<.5
    label define sexlab 1 `"{fontface "Female and male symbols":M }"' 0 ///
        `"{fontface "Female and male symbols":F }"'
    label values s sexlab
    scatter a b , ms(i) mlab(s) mlabpos(c) ///
        mlabsize(*2) title( "Example" `"{fontface "Female and male symbols":M F}"')
For further help:
help smcl
Getting to the function quickly
(September 2013) Stata has lots of functions, so many that at times you may need to look up the syntax. However, accessing the online help for this is a bit tedious, e.g.:
help functions
or the pull-down menu:
Data > Other utilities > Calculator, then pressing Create and then Functions
To speed up access, a keyboard function key can be assigned to bring up this page.
To do this:
    global F4 "help functions;"
The trailing semicolon acts as pressing Enter. This global macro statement is put into your profile.do:
    ************profile.do**************
    global F4 "help functions;"
    //other profile settings
    ***********************************
For further help:
help profile
or see the previous tip on profile.do
Value Labels - using labels in an expression
(August 2013) The actual value can be used in an expression.
Example
    sysuse auto, clear
    list if foreign==1
The above is fine if you know what the number means. This could be looked up with:
label list origin
A safer way of specifying this is:
    sysuse auto, clear
    list if foreign=="Foreign":origin
For further help:
help label
Also see Stata 13 Users Guide 13.10
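The "label":labelname syntax works anywhere an expression is allowed, not only with list. A brief sketch, again assuming the auto dataset and its value label origin:

```stata
sysuse auto, clear
count if foreign == "Foreign":origin     // the number of foreign cars
display r(N)
summarize price if foreign == "Domestic":origin
```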
The Label command  save option
(July 2013) The save option allows the label definitions to be saved to a do-file. This can subsequently be imported into Stata, manipulated and output as a do-file. The do-file can then be executed, labelling a variable in the data set.
In this example we are using the label command with the save option.
    //setting up the data for the example
    clear
    set more off
    input a b
    1 1
    2 2
    3 3
    end
    label define a 1 "take bus to work" 2 walk 3 bike
    label define b 1 "drive alone" 2 "drive with 1 passenger" 3 "some times drive alone"
    label list
    label values a a
    label values b b
    list
    //finished setting up the data set
    label save using c:/label1.do, replace                            // (1)
    type c:/label1.do                                                 // (2)
    filefilter c:/label1.do c:/label2.do , from("`") to("") replace   // (3)
    filefilter c:/label2.do c:/label3.do , from("'") to("") replace
    preserve                                                          // (4)
    infile str100 (a b c d e f) using c:/label3.do, clear             // (5)
    list
    replace d="9"+d if c=="b"                                         // (6)
    replace c="a"                                                     // (6)
    replace f="" in 1                                                 // (6)
    replace e=char(34)+e+char(34)                                     // (6)
    egen t1=concat(a-f), punct(" ")                                   // (7)
    replace t1=subinword(t1,"modify", ",modify",1)                    // (6)
    keep t1                                                           // (8)
    outfile using "c:/try.do", noquote replace wide                   // (9)
    type c:/try.do
    restore                                                           // (10)
    decode a, gen(a1)                                                 // (11)
    decode b, gen(b1)
    stack a1 b1, into(c1)                                             // (12)
    type c:/try.do
    do c:/try                                                         // (13)
    label list
    encode c1, gen(d1) label(a)                                       // (14)
    keep d1
    list
    list, nolab
Notes on the above:
(1) Using the label command with the save option.
(2) Using the type command to view what was saved in the do-file.
(3) Using filefilter to remove the single quote characters from the do-file.
(4) preserve writes a copy of the current data in Stata memory to the hard drive.
(5) Inputting the contents of the do-file into Stata.
(6) Modifying the label definition as required.
(7) Using one of egen's many handy functions to concatenate the strings.
(8) Keep only the t1 variable.
(9) Output the modified label definitions as a new do-file.
(10) restore reads the data previously stored temporarily on the hard drive back into Stata.
(11) decode generates a new variable (a1) that contains the string value labels of the variable (a).
(12) Combining the 2 variables with the stack command.
(13) Execute the do-file that now contains the label definitions so that they are part of the data set.
(14) Encode the string variable using the label definition previously loaded.
For further help:
help label
help type
help filefilter
help egen
help infile
help outfile
help stack
help decode
help encode
Breaking up dates
(June 2013) Sometimes a long time span may need to be broken up into years or months etc., because, say, a particular year in the data is of interest or the dataset is otherwise too big (wide) for Stata.
In this example we break up a time span into years.
Use the following:
    clear
    set more off
    input ///                                             //(1)
    str30 date_in str30 date_out ward
    "7/22/2009 22:59" "7/24/2011 10:12" 1
    "7/22/2011 12:05" "8/25/2011 21:07" 2
    "8/27/2011 10:46" "8/28/2017 19:45" 1
    "8/28/2011 15:34" "8/28/2011 16:43" 2
    "8/28/2011 23:24" "8/29/2011 13:43" 1
    "8/27/2011 14:32" "8/28/2011 15:15" 2
    "8/28/2011 09:43" "8/28/2011 17:49" 1
    "8/28/2011 01:33" "8/28/2011 02:32" 2
    "8/28/2011 04:43" "8/29/2011 05:53" 1
    "8/31/2011 07:30" "8/31/2011 08:11" 2
    end
    list
    generate double date_in2a=date(date_in,"MDY hm")      // (2)
    format date_in2a %td                                  // (3)
    summarize date_in2a                                   // (4)
    local mint=r(min)                                     // (5)
    generate double date_out2a=date(date_out,"MDY hm")
    format date_out2a %td
    summarize date_out2a
    local maxt=r(max)
    generate flag=0                                       //(6)
    forvalues i= `=year(`mint')'/`=year(`maxt')' {        //(7)
        replace flag=inrange(`i', year(date_in2a),year(date_out2a))    //(8)
        //start date
        generate y`i'_s1=date_in2a if flag & year(date_in2a)==`i'      // (9)
        replace y`i'_s1=td("1Jan`i'") if flag & missing(y`i'_s1)       // (10)
        //end date
        generate y`i'_f1=date_out2a if flag & year(date_out2a)==`i'
        replace y`i'_f1=td("31Dec`i'") if flag & missing(y`i'_f1)
    }
    format y* %td
    list
    exit
Notes on the above:
(1) Using the input command to produce a data set.
(2) Converting the date in string format to elapsed time (a number) with the date() function, i.e. the number of days from 1 Jan 1960.
(3) Formatting the newly created elapsed date to make it easier to read when checking that the conversion went correctly.
(4) Using the summarize command to get the earliest date. This is stored in the return scalar r(min). All the return values from the summarize command can be seen by typing return list.
(5) Storing the min value in a local macro.
(6) Generating a flag variable to indicate whether a particular year is within the date span of the observation.
(7) Looping through all the years in the data.
(8) Making flag equal to 1 if the year is in range.
(9) If the date span for the observation contains the year in the looping index and this is the year of the starting date, then put the starting date in the variable.
(10) If the starting date is earlier than the start of the year in the looping index, put in 1 Jan of that year.
For further help:
help input
help summarize
help generate
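If you prefer to parse the time stamp explicitly, one alternative sketch is to read the string with clock() and convert the result to a daily date with dofc(). The local macro names stamp and d are introduced here for illustration; the string is the first date_in value from the example above.

```stata
local stamp "7/22/2009 22:59"
local d = dofc(clock("`stamp'", "MDY hm"))    // datetime in milliseconds, then days since 01jan1960
display %td `d'                               // the daily date
display year(`d')                             // the calendar year, as used in the loop above
```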
Stata's separate command
(May 2013) The separate command is useful for splitting a variable into several variables based on the levels of another variable. These new variables can then be used to produce a graph.
Example. To produce a scatter graph of weight against mpg with a separate series for each level of rep78, use the following:
    sysuse auto, clear
    separate weight , by(rep78)
    twoway scatter weight1-weight5 mpg , name(a3) ytitle(Weight (lbs.))
For further help:
help separate
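To see what separate has created before graphing, the new variables can be inspected directly. A small sketch on the same data:

```stata
sysuse auto, clear
separate weight, by(rep78)
describe weight1-weight5, simple       // one new variable per level of rep78
list rep78 weight weight3 in 1/5       // weight3 is filled only where rep78==3
```

Each new variable holds weight for one level of rep78 and is missing everywhere else, which is why twoway draws them as separate series.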
Getting Stata to automatically open a web page
(April 2013) Sometimes there are web pages that you would like to access every now and then, for example Stata's forthcoming web page (you can sign up for email updates but writing a program is more fun) or Stata blogs etc.
Below is a way that this might be done. You could turn this into an ado file or just put it into your profile.do.
The time interval (macro t), in days, is specified at comment 8 below.
    clear mata
    local a : sysdir PERSONAL                           //1
    cd "`a'"
    mata:                                               //2
    if(!fileexists("mymatrix.myfile")){                 //3
        v=st_global("c(current_date)")                  //4
        X=date(v,"DMY")                                 //5
        fh = fopen("mymatrix.myfile", "rw")             //6
        fputmatrix(fh, X)
        fclose(fh)
    }
    fh = fopen("mymatrix.myfile", "rw")                 //7
    X = fgetmatrix(fh)
    fclose(fh)
    st_local("date",strofreal(X))
    end                                                 //end mata
    local t=1    //interval                             //8
    if `date'+`t'< date(c(current_date),"DMY") {        //9
        shell "C:\Program Files\Mozilla Firefox\firefox.exe" ///
            "http://www.stata-press.com/forthcoming/"   //10
    }
    else {
        display "Not required to check web page"        //11
    }
    exit
Notes on the above:
(1) Extended macro function saving the path of Stata's PERSONAL location in the local macro a. PERSONAL is on Stata's adopath.
(2) Using a Mata matrix to store the date that the web page was last accessed. You could store the information in other forms, but a Mata matrix seemed a handy way of doing this.
(3) The first time that this is run there is no Mata matrix file, so check whether one needs to be created. If so, enter the block.
(4) Save the current date in a string scalar called v.
(5) Convert the current date to Stata elapsed time using the date() function.
(6) Save the Mata matrix to a file.
(7) Read the saved Mata file.
(8) Set the interval (in days) at which you wish to display the web page. In this case, every day.
(9) If the time since the web page was last accessed is greater than the specified interval, enter the block.
(10) The web page that you wish to see.
(11) Message indicating that the program is working but that the web page does not need to be accessed.
For further help:
help clear
help adopath
help extended_fcn
help mata
help comments
help macro
help date functions
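The shell call above depends on where Firefox is installed. A more portable sketch uses Stata's view browse command, which hands the address to the default browser. Here the last-visit date is a hypothetical constant rather than a value read from the Mata file, and the view browse line is left commented out so the snippet can be run without opening a browser.

```stata
local last  = date("01apr2013", "DMY")        // hypothetical date of the last visit
local today = date(c(current_date), "DMY")    // today's date as a Stata daily date
local t = 1                                   // interval in days
if `last' + `t' < `today' {
    display "Interval elapsed; the page would be opened here"
    // view browse "http://www.stata.com"     // uncomment to open the default browser
}
else {
    display "Not required to check web page"
}
```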
Adding to Stata and user-written programs (March 2013)
Sometimes official Stata commands or downloaded user-written commands may not supply all the information that you require, or the information may not be in the form that you require. One way of addressing this is to write your own command that includes the additional feature.
For example, the official Stata command levelsof only returns the levels of the variable specified. Often the number of levels is also required. This is easily included in your own command, as follows:
program s_levelsof, rclass                 //(1)
    levelsof `0'                           //(2)
    local b: word count `r(levels)'        //(3)
    return scalar a=`b'                    //(4)
    return local levels=r(levels)
end                                        //(5)
The above program is run as follows:
sysuse auto, clear                         //(6)
s_levelsof mpg, local(z)                   //(7)
display "`z'"
return list
Notes on the above:
(1) program command with a new name for the command. Never overwrite existing commands; create a new name. A common prefix will allow you to easily identify the new command. The rclass option is used where the program is required to return some values.
(2) The levelsof command with `0'. The macro `0' contains what was passed to the new command, eg mpg,local(z)
(3) Using the extended macro function word count to count the number of levels
(4) The return value as a scalar
(5) end of the program
(6) Loading the Stata data set
(7) Calling the new program: s_levelsof
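As a hedged variation on the same idea, the wrapper can parse its arguments with syntax instead of passing `0' straight through. The program name s_levelsof2 is hypothetical, and this is a sketch rather than a replacement for the program above.

```stata
capture program drop s_levelsof2          // hypothetical name, for illustration only
program s_levelsof2, rclass
    syntax varname [if] [in] [, *]
    levelsof `varlist' `if' `in', `options'
    local lv `"`r(levels)'"'
    * number of distinct levels, counted with the word count extended function
    return scalar a = `: word count `lv''
    return local levels `"`lv'"'
end

sysuse auto, clear
s_levelsof2 rep78
display r(a)          // rep78 has 5 distinct nonmissing values
return list
```

The results are picked up from r(a) and r(levels) after the call, which avoids relying on a local macro being set inside the wrapper.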
For further help:
help levelsof
help program
help extended_fcn
Stata Editor - selecting columns (February 2013)
Stata 12's do editor has many great features, including selecting columns. The following is an example where this can be used.
Say you wish to clean up the following do file by putting all the /// into a column. You could adjust each triple forward slash individually, but if there are many this would take time and be boring. Using Stata's column select this is easier:
(/// is used to continue a command onto the next line)
Highlight the column of ///. Adjust the end /// so that it is the closest to the right-hand side. Place the cursor at the top left-hand side of the column. Then press the Ctrl and Alt keys on the keyboard simultaneously and select the required column with the mouse.
At the top of the column, drag the column to the left-hand side. Then, using the do editor pull-down menu Edit>Find>Replace, tick the regular expression box, type in the regular expression shown and execute.
Then select the column of /// (press the Ctrl and Alt keys simultaneously, then select the required column with the mouse). At the top of the column, drag right to the required position.
For further help:
findit regular expressions
help comments
adoupdate (January 2013)
The Stata command adoupdate is used to update user-written packages obtained from ssc. However, not all of your user-written packages may have been obtained from ssc, and those will not be updated with adoupdate.
For example:
You have just read an article in the Stata Journal titled "Error-correction-based cointegration tests for panel data". It sounds interesting, so you download the program using:
findit xtwest
The following comes up:
Search of official help files, FAQs, Examples, SJs, and STBs

SJ-8-2  st0146 . . Error-correction-based cointegration tests for panel data
        (help xtwest if installed) . . . . . . . D. Persyn and J. Westerlund
        Q2/08   SJ 8(2):232--241
        implements the four error-correction-based panel cointegration tests developed by Westerlund

You click on the hyperlink and it downloads (confirmed by ado dir or by using the pull-down menu: Help>SJ and user written programs and then previously installed packages>list).
To get the version of this program you type which on the Stata command line.
. which xtwest
c:\ado\plus\x\xtwest.ado
*! xtwest 1.1 1Apr2008
*! Damiaan Persyn, LICOS centre for Development and Economic Performance www.econ.kuleuven.be/licos
*! Copyright Damiaan Persyn 2007-2008.
adoupdate does not update this because it was not loaded from ssc, eg
. adoupdate xtwest, update
(note: adoupdate updates user-written files; type -update- to check for updates to official Stata)
(no packages match "xtwest")
Now getting the package from ssc:
. ssc install xtwest, replace
checking xtwest consistency and verifying not already installed...
the following files will be replaced:
    c:\ado\plus\x\xtwest.ado
    c:\ado\plus\x\xtwest.hlp
installing into c:\ado\plus\...
installation complete.
. which xtwest
c:\ado\plus\x\xtwest.ado
*! xtwest 1.5 1Jul2010
*! Damiaan Persyn, LICOS centre for Development and Economic Performance www.econ.kuleuven.be/licos
*! Copyright Damiaan Persyn 2007-2008.
Now the version number is 1.5 (previously 1.1). The later version includes some bug fixes. Therefore care should be taken to see that the version of a user package is the one that you require.
Another case where adoupdate does not update user packages is where the author of a user-written package hosts it on a personal web page rather than on ssc, eg
Spost
http://www.indiana.edu/~jslsoc/web_spost/sp_install.htm
some programs written by Eric Booth at:
https://sites.google.com/site/ericabooth/Home/software
and others.
In these cases you need to go to the author's site and download the package as per the author's instructions.
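A quick way to see which version of a command is currently installed is to loop over the commands you care about with which. The following is a minimal sketch; the command names in the list are only examples and should be replaced with your own.

```stata
* Sketch only: the names in this list are examples
foreach cmd in xtwest regress {
    capture which `cmd'
    if _rc == 0 {
        display "`cmd' is installed:"
        which `cmd'
    }
    else {
        display "`cmd' was not found on the adopath"
    }
}
```

Using capture in front of which stops the loop from halting with an error when a command is not installed.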
Compress (December 2012)
Stata's compress command can be used to achieve two things:
1) Store your data more efficiently in Stata's memory
2) Reduce the variable width so that data for this variable can be seen in an efficient format
This is often handy after importing data from a spreadsheet, where footnotes or comments in the variable column can create a string length that is far in excess of that required.
An example
clear
input str200 country
"Note: the details for this are.."
1
2
end
edit
notes : TS country[1]
drop in 1
sleep 3000
compress
notes
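The effect of compress on a string variable can be checked with the type extended macro function. This is a small sketch using made-up country names; the longest value here has 9 characters.

```stata
clear
input str200 country
"Australia"
"Fiji"
end
local before : type country
compress
local after : type country
display "storage type before: `before', after: `after'"
```

After compress the variable is stored only as wide as its longest value.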
Reshaping long and attaching variable labels afterwards (November 2012)
For small numbers of variables, reshaping long and attaching variable labels afterwards can be done by hand, but with more than, say, 10 variable stubs this becomes boring, error prone and time consuming, so it is advisable to automate it.
The following is an example of how this can be done.
The code
//creating the data
clear
input y id x2007 x2008 x2009 z2007 z2008 z2009
18 1 12 16 18 20 21 19
10 2 11 17 17 33 32 19
12 3 10 10 22 19 17 18
end
l
// Labeling variables
foreach v of varlist x* z* {
    label variable `v' "`=substr("`v'",1,1)' factor(`=substr("`v'",length("`v'")-3,4)')"
}
describe
save data, replace
//getting variable names and variable labels
describe, replace clear                                                     //1
generate var=regexs(1) if regexm(name,"([a-zA-Z]+)([0-9][0-9][0-9][0-9])")  //2
levelsof var, local(stub) clean                                             //3
drop if missing(var)                                                        //4
duplicates drop var, force                                                  //5
keep var varlab
save varinfo, replace
//the reshape
use data, clear
reshape long `stub', i(id) j(Year)
//attaching the labels to the dataset
merge 1:1 _n using varinfo, nogen                                           //6
count if !missing(var)
foreach i of varlist * {                      //loops variables
    forvalues i1=1/`=r(N)' {                  //loops observations
        if "`i'"=="`=var[`i1']'" {            //7
            label var `i' "`=varlab[`i1']'"   //8
        }
    }
}
drop varlab var
describe
list
exit
Going through the above code:
"//getting variable names and variable labels"
The lines of code under this title get the variable stubs and their associated labels.
1 Using the replace and clear options of the describe command the variable names and labels replace the existing data in Stata.
2 generate a new variable that contains the variable stubs
3 get a list of stubs; these are saved in a local macro
4 drop observations where var is missing
5 As only one of each stub is required the duplicates command is used to remove duplicates
6 merge the variable name and label with the data set
7 Check to see if variable name in the dataset is the same as merged data variable name
8 If they are the same, the variable is given the label stored in varlab
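For a handful of stubs, a lighter alternative is to hold the labels in local macros across the reshape, using the variable label extended macro function. The following is a sketch under that assumption, with a cut-down version of the data and made-up labels.

```stata
clear
input id x2007 x2008 z2007 z2008
1 12 16 20 21
2 11 17 33 32
end
label variable x2007 "x factor"
label variable z2007 "z factor"

* store one label per stub before reshaping
foreach s in x z {
    local lab_`s' : variable label `s'2007
}

reshape long x z, i(id) j(Year)

* attach the stored labels to the long variables
foreach s in x z {
    label variable `s' "`lab_`s''"
}
describe
```

The macros lab_x and lab_z exist only while the do file runs, so this approach suits a single do file rather than work spread over several sessions.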
Graph marker labels (October 2012)
Graph marker labels will frequently overlap on graphs, making them difficult to read. One solution is to write an algorithm that reduces the overlap to a minimum. However, this is a significant amount of work. An alternative is to separate the markers based on a qreg and then angle the labels based on the distance from the adjacent markers.
The following is a graph with markers as Stata presents them.
The following is a graph where we start with a specific observation, in this case observation one of the sort order that Stata supplies, then determine the closest adjacent observation; this observation is then used to determine the next closest observation, and so on.
Then a qreg is used to split the data, with the quantile option filled in with a value determined by the user.
From there we select the number options: the number of angular rotations of the label, the starting angle and the angle of rotation.
These are accumulated and finally the graph is drawn.
The code
clear all
sysuse auto, clear
generate order=.
generate index= _n
//scale
summarize mpg
local a1_max=r(max)
local a1_min=r(min)
local a1_diff=`a1_max'-`a1_min'
summarize weight
local a2_max=r(max)
local a2_min=r(min)
local a2_diff=`a2_max'-`a2_min'
generate dis_hor=.
generate dis_vert=.
generate hyp=.
generate kk=.
local z=1
forvalues i=1/74 {
    if `z'==1 {
        replace order=1 in 1
        local z=0
    }
    else {
        gsort -order
        replace dis_hor=(mpg[1]-mpg)/`a1_diff'
        replace dis_vert=(weight[1]-weight)/`a2_diff'
        replace hyp=sqrt(dis_hor^2 +dis_vert^2)
        sort hyp
        replace kk=sum(missing(order))
        replace kk=. if !missing(order)
        replace order=`i' if kk==1
    }
}
qreg mpg weight, quantile(50) //quantile can be changed
predict a
generate up_down= a < mpg
tab up_down
bysort up_down order: gen order1=_n
//values to change
local angle =10
local s_ang=55
local no_ang= 3
bysort up_down (order1):gen aa=mod(_n,`=`no_ang'+1')
//normal marker labels
scatter mpg weight , mlab(make) mlabangle(45) yline(22) xline(2930) name(kk1)
local z=0
forvalues i=0/`=`no_ang'' {
    if `z'==0 {
        local aa2 =`"(scatter mpg weight if aa==`i' & up_down==0 , mlabcolor(blue) "' + ///
            `" mlab(make) mlabpos(3) mlabangle(`=`s_ang'-`angle'*`i'') ) "' + ///
            `"(scatter mpg weight if aa==`i' & up_down==1 , mlab(make) mlabcolor(red) "' + ///
            `" mlabpos(3) mlabangle(`=`s_ang'+(`angle'*`i')') ) "'
        local z=1
    }
    else {
        local aa1= `" (scatter mpg weight if aa==`i' & up_down==0, mlabcolor(blue) "' + ///
            `" mlab(make) mlabpos(3) mlabangle(`=`s_ang'-`angle'*`i'') ) "' + ///
            `"(scatter mpg weight if aa==`i' & up_down==1, mlab(make) mlabcolor(red) "' + ///
            `"mlabpos(3) mlabangle(`=`s_ang'+`angle'*`i'') )"'
    }
    local aa2 `aa2' `aa1'
}
twoway ///
    `aa2' , ///
    yline(22) xline(2930) name(kk2) legend(off)
exit
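To see which label angles the three "values to change" produce, the arithmetic inside mlabangle() can be displayed on its own. This sketch only repeats the angle calculation from the code above; it does not draw a graph.

```stata
local angle  = 10    // angle of rotation
local s_ang  = 55    // starting angle
local no_ang = 3     // number of rotations

forvalues i = 0/`no_ang' {
    * up_down==0 markers rotate down from the starting angle, up_down==1 rotate up
    display "aa==`i': up_down==0 angle " `s_ang' - `angle'*`i' ///
        ", up_down==1 angle " `s_ang' + `angle'*`i'
}
```

With these settings the labels for up_down==0 run from 55 down to 25 degrees and those for up_down==1 from 55 up to 85 degrees.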
Stata on YouTube (September 2012)
Stata has just announced some YouTube videos:
http://www.youtube.com/user/statacorp
See Stata's YouTube channel with a basic tour of Stata for new users and 23 short tutorials that describe how to perform basic statistical analyses and create simple graphs in Stata. There's also a bonus video that shows how to use Stata's SEM builder to quickly and easily build structural equation models.
The videos are best viewed in 1080p HD. To change the resolution of a video, click on the icon labeled Change quality and select 1080p HD.
Gathering prefixes for the reshape command (August 2012)
Stata's reshape command requires the prefixes of variables to be stated. If there are many variables to be reshaped, then rather than type in their prefixes, let Stata do the work.
First generate a pretend data set. In reality there will be many more variables.
clear
input ///                                              //1
id a2001 a2002 a2003 b2001 b2002 b2003 c2001 c2002 c2003
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
end
list
preserve                                               //2
describe, replace clear                                //3
generate prefix=regexs(1) if ///
    regexm(trim(name), "([A-Za-z]+)([0-9]+$)")         //4
contract prefix, nomiss                                //5
local a                                                //6
forvalues i=1/`=_N' {                                  //7
    local a `a' `=prefix[`i']'                         //8
}
restore                                                //9
reshape long `a', i(id) j(item)                        //10
list
exit
Going through the above:
(1)  The Stata command input is useful for inputting a small data set. 
(2)  The Stata command preserve is used here because the existing data will be cleared and a copy of the data is required. preserve copies the current data to the computer hard drive.
(3)  The Stata command describe; one of the options of describe is replace, ie replacing the existing data with the contents of the describe command. The command is used to get a list of variable names.
(4)  The Stata functions regexs and regexm are regular expression functions and are used to generate a new variable containing the variable prefixes. For more information see our previous tip on regular expressions (November 2010).
(5)  The Stata command contract contracts a data set to a set of values and frequencies, similar to a one-way frequency table. This command is used to remove duplicates from the data. An alternative to this command is duplicates, but in this case contract is easier to use.
(6)  local initializes the value of the local macro, which we have called a. This will be used later to accumulate variable prefixes.
(7)  The Stata command forvalues is a loop command which goes through all the variable prefixes in the data.
(8)  Using the local macro a, prefixes are accumulated.
(9)  The Stata command restore replaces the data currently in Stata with the data set previously saved by the preserve command.
(10)  The Stata command reshape is used to change the data format from wide to long. The reshape command uses the prefixes stored in macro a.
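The prefixes can also be gathered without replacing the data in memory, by looping over the variable names and stripping the trailing digits. This is a sketch of that alternative; the macro names stubs and s are arbitrary.

```stata
clear
input id a2001 a2002 b2001 b2002
1 1 1 1 1
2 2 2 2 2
end

local stubs
foreach v of varlist _all {
    * remove trailing digits; a name without trailing digits is left unchanged
    local s = regexr("`v'", "[0-9]+$", "")
    if "`s'" != "`v'" {
        local stubs : list stubs | s      // union keeps each prefix once
    }
}
display "`stubs'"
reshape long `stubs', i(id) j(item)
list
```

Because the data are never cleared, preserve and restore are not needed in this version.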
For help on specific commands type:
help
and then the specific command eg.
help input
help list
help preserve
help forvalues
help describe
help macro
Fuzzy merge - based on time - an alternative (July 2012)
Referring to last month's fuzzy merge tip, the Stata commands tsset and tsfill were used to get all possible times. Sometimes the number of observations can exceed Stata's limit. When this happens an alternative method is to append the 2 datasets and then determine which events are to be treated as the same event.
cd c:/                                          //(1)
clear
input time                                      //(2)
1
7
10
11
15
16
21
25
30
end
generate date1=1                                //(3)
generate stuff=runiform()                       //(4)
save date1, replace
clear
input time                                      //(5)
8
19
30
end
generate date2=1                                //(6)
save date2, replace
append using date1                              //(7)
duplicates tag time, gen(same)                  //(8)
drop if same==1 & date2==1                      //(9)
merge 1:1 time using date2                      //(10)
sort time
list                                            //(11)
drop if missing(date1)& missing(date2)          //(12)
generate min=cond(time[_n+1]-time>time-time[_n-1], ///   //(13)
    -1*(time-time[_n-1]), ///                            //(13)
    time[_n+1]-time ) if date2==1                        //(13)
list
replace stuff=stuff[_n+sign(min)*1] if !missing(min)& abs(min)<=2 //(14)
replace date1=date1[_n+sign(min)*1] if !missing(min)& abs(min)<=2
drop if (min[_n+1]<0 & abs(min[_n+1])<=2 ) | (min[_n-1]>0 & abs(min[_n-1])<=2 ) //(15)
drop min _merge
egen total_matches=rowtotal( date2 same)        //(16)
order time stuff date1 date2 same total_matches //(17)
list
exit
Going through the above:
(1)  The Stata command cd changes the working directory to that specified eg. c:/ 
(2)  The Stata command input is useful for inputting a small data set. In merge jargon the data set created here is called the "using" data set; because it is called into Stata by the merge command. 
(3)  Creates a variable to indicate the "using" data set. 
(4)  Creates a variable containing random values which range from 0 to 1. The function runiform() does this. This is done to create some "play" data. 
(5)  Inputting the second data set. This is the data set that is in Stata's memory when we merge. In merge jargon this is called the "master". 
(6)  Create a variable that identifies this data set. 
(7)  Joins the 2 data sets vertically together; matching variable names. 
(8)  Using Stata's duplicates command with the tag option generates a new variable called same. This indicates where both data sets match on the specified variable(s).
(9)  Where the times match perfectly only one observation is required, therefore the duplicate observation is dropped.
(10)  Merge the data based on a one to one (1:1) relationship (1:1 meaning only 1 observation from the using data set and one observation from the master data set allowed) between the key variable (time) in the using and master data sets
(11)  list the data to see if it looks as expected.
(12)  drops the observation if both date1 and date2 are missing
(13)  generates an indicator variable that tells us which observation is closest in terms of time. A negative sign indicates that this is before the merge observation and a positive sign indicates that it is after. To do this the cond() function is used. The first term in the brackets, ie time[_n+1]-time>time-time[_n-1], evaluates to either true or false. If true, the second term is used, ie -1*(time-time[_n-1]); if false, the third term is used, ie time[_n+1]-time. The 2nd or 3rd terms are only generated on the observations where date2 equals 1, ie date2==1. Terms like time[_n+1] make Stata work at the observation level. Observations to be used are defined by the contents of the square brackets. The variable outside of the square brackets is the variable name whose observations we are using. Inside the square brackets _n is the current observation, hence the term _n+1 is the current observation plus 1 (one).
(14)  Replaces the contents of stuff with stuff[_n+sign(min)*1] only if the expression !missing(min)& abs(min)<=2 is true. In other words, if the time between the merge observation and the next closest time is less than or equal to 2, the merge observation is filled in with this value.
(15)  Lastly the observation that comes closest to the merge is deleted. Because this can be above or below the merge observation, 2 conditions are specified, separated by an OR symbol, ie |
(16)  Creates an indicator variable total_matches using the egen command that indicates where the data now matches 
(17)  change the order of the variables to make for easier reading/checking 
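The signed gap described in note (13) can be examined on its own with a tiny data set. In this sketch the helper variables gap_next and gap_prev are introduced only for illustration; they are not used in the code above.

```stata
clear
input time
7
8
10
end
generate gap_next = time[_n+1] - time      // distance to the following observation
generate gap_prev = time - time[_n-1]      // distance to the preceding observation
* negative result: the closest observation is before; positive: it is after
generate min = cond(gap_next > gap_prev, -1*gap_prev, gap_next)
list
```

Note that the first and last observations have a missing gap on one side. Stata treats missing as larger than any number, so the comparison still picks the side that exists.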
For help on specific commands type:
help
and then the specific command eg.
help input
help list
help merge
help drop
help replace
Fuzzy merge - based on time (June 2012)
Merging 2 data sets in Stata on a key variable requires the key variable to match exactly. However, if the key variable is time, small discrepancies (milliseconds) will result in a non-match even if the 2 observations relate to the same event. To merge data sets like this, a range of time can be used for a match. In this example we will specify a time range + and - for an acceptable match to occur.
cd c:/                                          //(1)
clear
input time                                      //(2)
1
7
10
11
15
16
21
25
30
end
generate date1=1                                //(3)
generate stuff=runiform()                       //(4)
tsset time                                      //(5)
tsfill, full                                    //(6)
save date1, replace
clear
input time                                      //(7)
8
19
30
end
generate date2=1                                //(8)
merge 1:1 time using date1                      //(9)
sort time
list                                            //(10)
drop if missing(date1)& missing(date2)          //(11)
generate min=cond(time[_n+1]-time>time-time[_n-1], ///   //(12)
    -1*(time-time[_n-1]), ///                            //(12)
    time[_n+1]-time ) if date2==1                        //(12)
list
replace stuff=stuff[_n+sign(min)*1] if !missing(min)& abs(min)<=2 //(13)
replace date1=date1[_n+sign(min)*1] if !missing(min)& abs(min)<=2
drop if (min[_n+1]<0 & abs(min[_n+1])<=2 ) | (min[_n-1]>0 & abs(min[_n-1])<=2 ) //(14)
drop min _merge
list
exit
Going through the above:
(1)  The Stata command cd changes the working directory to that specified eg. c:/ 
(2)  The Stata command input is useful for inputting a small data set. In merge jargon the data set created here is called the "using" data set, because it is called into Stata by the merge command.
(3)  Creates a variable to indicate the "using" data set. 
(4)  Creates a variable containing random values from 0 to 1. The function runiform() does this. This is done to create some "play" data. 
(5)  tsset is Stata's command that sets the data for time series. In this case we set the variable "time" as the variable that contains time for time series. The only reason we tsset the data is so that we can use the next command ie tsfill 
(6)  tsfill is Stata's time series command that fills gaps in the time variable 
(7)  Inputting the second data set. This is the data set that is in Stata's memory when we merge. In merge jargon this is called the "master". 
(8)  Create a variable that identifies this data set. 
(9)  merge the data based on a one to one (1:1) relationship between the key variable (time) in the using and master data sets 
(10)  list the data to see if it looks as expected 
(11)  drops the observation if both date1 and date2 are missing
(12)  generates an indicator variable that tells us which observation is closest in terms of time. A negative sign indicates that this is before the merge observation and a positive sign indicates that it is after. To do this the cond() function is used. The first term in the brackets, ie time[_n+1]-time>time-time[_n-1], evaluates to either true or false. If true, the second term is used, ie -1*(time-time[_n-1]); if false, the third term is used, ie time[_n+1]-time. The 2nd or 3rd terms are only generated on the observations where date2 equals 1, ie date2==1. Terms like time[_n+1] make Stata work at the observation level. Observations to be used are defined by the contents of the square brackets. The variable outside of the square brackets is the variable name whose observations we are using. Inside the square brackets _n is the current observation, hence the term _n+1 is the current observation plus 1 (one).
(13)  Replaces the contents of stuff with stuff[_n+sign(min)*1] only if the expression !missing(min)& abs(min)<=2 is true. In other words, if the time between the merge observation and the next closest time is less than or equal to 2, the merge observation is filled in with this value.
(14)  Lastly the observation that comes closest to the merge is deleted. Because this can be above or below the merge observation, 2 conditions are specified, separated by an OR symbol, ie |
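What tsfill does in steps (5) and (6) is easiest to see on a very small example. This sketch uses plain tsfill, without the full option used above, on three made-up times.

```stata
clear
input time
1
7
10
end
generate date1 = 1
tsset time
tsfill            // adds observations for the missing times 2-6 and 8-9
list
```

The added observations have date1 missing, which is what lets step (11) drop the times that appear in neither data set.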
For help on specific commands type:
help
and then the specific command eg.
help input
help list
help merge
help drop
help replace
Getting the data right for Stata's Excel command (May 2012)
With the release of Stata 12 the loading of Excel spreadsheets became even easier (see our previous tip).
However, the loading of a spreadsheet may not always go as planned/hoped. The following problems can occur:
(1) Stata will not load the variable names in the first row because some of these do not conform to Stata's variable naming convention
(2) Variables come in as strings when they should be numeric due to:
(a) empty cells (not Stata's missing symbol, ie .) in some of the spreadsheet
(b) non-numeric characters in the variable.
Example:
It is good practice to do all the fix up work in a Stata do file, rather than doing repairs on the spreadsheet.
Addressing problem 1
import excel "C:\excel.xls", sheet("Sheet1") clear
foreach i of varlist A-D {
    capture confirm number `=`i'[1]'
    if _rc==0 {
        rename `i' Y`=`i'[1]'
    }
    else rename `i' `=`i'[1]'
}
drop in 1
The above imports the spreadsheet and renames the problem variables. The command confirm number determines whether the first row of the data set is a number or a string. capture in front of the confirm command suppresses any error and stores the outcome in the return code _rc, which takes a value of 7 if the value is a string or 0 if it is a number. The return code is then used to determine how the variable is to be renamed.
Having a closer look at: `=`i'[1]'
`i' is the macro substitution of the looping index i. In this case the variable name
[1] when put adjacent to a variable name (without a space) the bracketed number indicates the observation number. In this case the first time around the loop it would equal the first observation of variable A. This is known as explicit subscripting.
`= ' the symbols around `i'[1] tell Stata to evaluate the expression; see: Stata 12 User's Guide 18.3.8 page 201
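The combination of explicit subscripting and `= ' can be tried interactively on any data set. This short sketch uses the auto data shipped with Stata; the macro name first is arbitrary.

```stata
sysuse auto, clear
* value of make in the first observation, evaluated in place
display "`=make[1]'"
* the same construction can supply a value to another command
local first `=mpg[1]'
display "mpg in observation 1 is `first'"
* _N (the number of observations) can be evaluated the same way
display "number of observations: `=_N'"
```

In each case Stata evaluates the expression between `= and ' before the rest of the command is run.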
foreach i of varlist Y* {
    capture confirm numeric variable `i'
    if _rc!=0 {
        replace `i'="." if `i'==""
        destring `i', replace
    }
}
describe
The above once again uses the confirm command, but this time with the variable option. Each variable with a Y prefix (the character that was previously added) is put through a loop where firstly empty cells are filled in with Stata's missing value and then the string variable is changed to a numeric variable with the destring command.
If the output of the describe command indicates that the variable is still a string then this may be due to non-numeric characters in the variable. One way of looking for the observation that contains this is:
forvalues i=1/`=_N' {
    capture confirm number `=Y2001[`i']'
    if _rc!=0 {
        display "Potential problem observation: "`i'
    }
}
The output of the above indicates a problem with observations 1 and 2 of the Y2001 variable.
Another option for finding problem observations is to use hexdump.
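A further option, sketched here with made-up values, is to use the real() function, which returns missing when a string cannot be read as a number. The variable names problem and Y2001_num are assumptions for this illustration.

```stata
clear
input str10 Y2001
"12"
"n/a"
"15"
""
end
* flag cells that are not empty but cannot be read as a number
generate byte problem = missing(real(Y2001)) & Y2001 != ""
list if problem
* force turns the non-numeric cells into missing values
destring Y2001, generate(Y2001_num) force
list
```

Using generate() rather than replace keeps the original string variable, so the flagged cells can still be inspected after the conversion.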
For help on specific commands type:
help
and then the specific command eg.
help capture
help confirm
help forvalues
help foreach
help destring
help rename
help drop
Using Stata's file command (April 2012)
Stata's file command can be useful for:
(1) cleaning up data before importing it into Stata
(2) handling a text variable where the text exceeds 244 characters and only a portion of the data needs to be imported.
An example of doing No 2 is below:
An example of the text file follows; assume that the text is longer than 244 characters (Stata's maximum string length). The data can be created in a text editor, eg Stata's do editor, and saved with a .txt extension.
"c:/file_try.txt"
one:two:three:four one1:two1
For this example assume that we require only the last 2 words in a new file (the words are delimited with a ":")
An example of a Stata program that will create a new file with only the last 2 words of each line is:
program ltype                                             //(1)
    version 12.1
    syntax , Current(string) New(string) P(string)        //(2)
    tempname fh hdl                                       //(3)
    file open `fh' using `"`current'"', read              //(4)
    file open `hdl' using `"`new'"', replace write text   //(5)
    file read `fh' line                                   //(6)
    while r(eof)==0 {                                     //(7)
        local kk=reverse("`line'")                        //(8)
        tokenize `kk',p("`p'")                            //(9)
        local No1 "`=reverse("`1'")'"                     //(10)
        local No3 "`=reverse("`3'")'"                     //(10)
        file write `hdl' %st10 ("`No1'") %st10 (":") %st10 ("`No3'") _newline //(11)
        file read `fh' line                               //(12)
    }
    file close _all                                       //(13)
end
Going through the above:
(1) program The program starts with program and the name of the program: ltype. The program finishes with end.
(2) The syntax command passes information to the program via local macros, ie current is the macro containing the file name of the initial file (raw data), new is the local macro that contains the file name for the information obtained by the program (last 2 words of each line), and p is the local macro holding the delimiter.
(3) Creating temporary names
(4) Opening the file with the initial information
(5) Opening a file for the required data to be entered
(6) Read the first line of the file and store this in the local macro line
(7) While loop; continues until end of file is reached
(8) Reverse the order of the contents of the line macro so the last word becomes the first
(9) tokenize the line based on the parse character : ; this breaks the string into local macros with names 1, 2 etc.
(10) make the contents of the local macros No1 and No3 the reverse of the last and 2nd last words (: is treated as a word)
(11) writes the macro contents to the new file, separating them with ":"
(12) read new line of file
(13) closes both files
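Steps (8) to (10) can be tried on a single line without opening any files. This is a minimal sketch using the first line of the example file; the macro names last and second_last are chosen for this illustration.

```stata
local line "one:two:three:four"
local kk = reverse("`line'")            // "ruof:eerht:owt:eno"
tokenize "`kk'", parse(":")             // `1' = ruof, `2' = :, `3' = eerht
local last        = reverse("`1'")
local second_last = reverse("`3'")
display "`second_last':`last'"
```

Reversing the line first means the wanted words are always in `1' and `3', however many words the line holds.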
Then the above program can be run with the following:
type "c:/file_try.txt"
ltype ,current("c:/file_try.txt") new( "c:/file_try1.txt") p(":")
type "c:/file_try1.txt"
infile str10 a using "c:/file_try1.txt", clear //loading file into Stata
list
Cleaning data - consistent naming - using soundex() (March 2012)
There are 2 ways that data can be cleaned in Stata: manually or using a rule based system. Below is one way that messy data can be cleaned with the assistance of Stata's soundex() function and some manual cleaning.
//creating the data for this example
clear
input str20 w1
"Microsoft"
"MicroSoft"
"Micro Soft"
"MicroSoft"
"Microsoft Inc."
"Microsoft Inc"
"MicrosoftInc"
"MicrosoftAA"
"Microaa"
"MSFT"
"MS"
"M$"
"STATA"
"StataCorp"
"StataCorp LP"
"staCorp"
"Linux"
"linux"
end
list, clean noobs
save c:/a, replace
generate kk=soundex(w1) // <1
generate New_w1=""
//Linux
foreach i in L520 { // <2
    replace New_w1="Linux" if kk=="`i'" // <3
}
//Microsoft
foreach i in M262 M000 M200 M213 {
    replace New_w1="Microsoft" if kk=="`i'"
}
//Stata
foreach i in S330 S326 S332 {
    replace New_w1="StataCorp" if kk=="`i'"
}
replace New_w1="Micro AA" if kk=="M260"
sort kk
list, sepby(New_w1)
exit
Going through the above:
(1) soundex() The soundex code consists of a letter followed by three numbers: the letter is the first letter of the name and the numbers encode the remaining consonants. Similar sounding consonants are encoded by the same number.
(2) Stata loop for each of the soundex code(s)
(3) replace command that replaces the existing contents with the name that maps to the soundex code.
The resulting data set
       w1                kk      New_w1

  1.   Linux             L520    Linux
  2.   linux             L520    Linux

  3.   M$                M000    Microsoft
  4.   MS                M200    Microsoft
  5.   MSFT              M213    Microsoft

  6.   Microaa           M260    Micro AA

  7.   Micro Soft        M262    Microsoft
  8.   MicroSoft         M262    Microsoft
  9.   Microsoft Inc.    M262    Microsoft
 10.   MicroSoft         M262    Microsoft
 11.   MicrosoftInc      M262    Microsoft
 12.   MicrosoftAA       M262    Microsoft
 13.   Microsoft         M262    Microsoft
 14.   Microsoft Inc     M262    Microsoft

 15.   staCorp           S326    StataCorp
 16.   STATA             S330    StataCorp
 17.   StataCorp LP      S332    StataCorp
 18.   StataCorp         S332    StataCorp
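Individual codes can be checked from the command line before writing the foreach lists. The values in this short sketch are taken from the table above.

```stata
display soundex("Microsoft")     // M262
display soundex("MSFT")          // M213
display soundex("linux")         // L520
* names with the same code are treated as the same sound
display soundex("Linux") == soundex("linux")
```

Abbreviations such as MSFT get a different code from the full name, which is why several codes are listed for Microsoft in the loop.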
For help on specific commands type:
help
and then the specific command eg.
help soundex()
help generate
help replace
help use
Cleaning data - consistent naming - manually (January 2012)
There are 2 ways that data can be cleaned in Stata: manually or using a rule based system. Below is one way that messy data can be cleaned manually so that names are consistent.
//creating the data for this example
clear
input str20 w1
"Microsoft"
"MicroSoft"
"Micro Soft"
"MicroSoft"
"Microsoft Inc."
"Microsoft Inc"
"MicrosoftInc"
"MSFT"
"MS"
"M$"
"STATA"
"StataCorp"
"StataCorp LP"
"staCorp"
"Linux"
"linux"
end
list, clean noobs
save c:/a, replace
contract w1 //>1
edit //>2
//get by hand all the different forms of the one name.
// In this case variations on Microsoft
clear
input str20 w1
"Microsoft"
"MicroSoft"
"Micro Soft"
"MicroSoft"
"Microsoft Inc."
"Microsoft Inc"
"MicrosoftInc"
"MSFT"
"MS"
"M$"
end
generate x2="1" //>3
save c:/a1, replace
//merging and replacing with the correct name
use c:/a, clear //>4
merge 1:m w1 using c:/a1 , nogenerate //>5
replace x2="Microsoft" if x2=="1" //>6
save c:/a, replace
contract w1 if x2==""
edit
//get by hand all the different forms of the one name.
// In this case variations on Stata
clear
input str20 w1
"STATA"
"StataCorp"
"StataCorp LP"
"staCorp"
end
generate x3="1"
save c:/a1, replace
//merging
use c:/a, clear
merge 1:m w1 using c:/a1, nogenerate
replace x2="Stata" if x3=="1"
drop x3
//Etc until all the names are consistent
Going through the above:
(1) contract the dataset to a list of names and frequencies
(2) open the Stata editor so that the various names can be copied. If the list is large and the required names are spread throughout it, a new variable can be created and a 1 (one) put in adjacent to the names. The variable with the 1's can be sorted and then the variations on the required name can be copied into the do file.
(3) generate a new variable, to be used after the merge command, that indicates the names to be changed
(4) Load the original dataset
(5) Merge the original dataset with the list of names dataset
(6) Replace the "1" in the previously generated variable (3) with the official name "Microsoft"
Repeat the process again until all names are as required.
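What contract leaves in memory at step (1) can be seen on a three-line example. The names below are a cut-down, made-up version of the data above.

```stata
clear
input str20 w1
"Microsoft"
"MS"
"Microsoft"
end
contract w1        // one observation per distinct name, with its frequency in _freq
list
```

The _freq variable shows how often each spelling occurs, which helps in deciding which variations to deal with first.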
For help on specific commands type:
help
and then the specific command eg.
help contract
help generate
help merge
help use
help edit
Working with Dates 3 (January 2012)
A problem that comes up from time to time is where you have clustered dates, eg going to the doctor for treatment and subsequent follow up(s). You may wish to group each issue (initial treatment and follow ups) into a group. Without detailed records as to the ailment treated on which date, you can attempt to do this by making an assumption as to how long the treatment is likely to last. For example you may have the following data:
Assume that each treatment and follow up is no longer than 30 days.
clear
set more off
input ///
str6 id str20 date
01003 07Nov2008
01003 07Nov2008
01003 11Nov2008
01007 22Dec2008
01007 05Dec2008
01007 13Nov2007
01007 14Nov2007
01007 22Jul2006
01007 22Jul2006
01007 22Jul2006
01007 11Sep2006
01009 13Oct2005
01009 17May2006
01009 17May2006
01009 13Jan2010
01009 06Jun2010
01008 08Nov2007
01008 08Nov2007
01008 08Nov2007
01008 15Jul2009
01008 15Jul2009
01008 15Jul2009
01008 27May2010
01008 28May2010
01008 28May2010
01008 28May2010
end
l
generate date1=date(date, "DMY")                    //1
generate cluster=.                                  //2
list, sepby(id)
tempvar max                                         //3
bysort id (date1): gen `max'=_N                     //4
summarize `max'                                     //5
forvalues i=1/`r(max)' {                            //6
    bysort id (date1): replace cluster=`i' if ///   //7
        date1<=(date1[sum(cluster!=.)+1]+30) & cluster==.
    list, sepby(id) // list command to show what is happening
                    // can be removed
}
list, sepby(id)
exit
Going through the above:
(1) Generate a new variable (date1) that takes the date held as a string in date and converts it into elapsed time (a numeric value).
(2) Generate a new variable called cluster, with all values equal to missing (.).
(3) Assign the name max to a temporary variable.
(4) Using the bysort prefix, the temporary variable max is generated for every level of id and filled with the value of _N. Note the macro substitution quotes (`max') around the temporary variable name. _N stands for the total number of observations. In this case, because of bysort, it is the number of observations within each level of id.
(5) The summarize command is used to obtain the largest number of observations found in any level of id. The summarize command has a handy returned result that stores this eg. r(max). To see the other values returned by this command type: return list (after the summarize command)
(6) Loop over the code in the curly brackets using a forvalues loop.
(7) Using the bysort prefix, replace the value of cluster with the looping index value (this will be the group number) if the qualifier is true ie. the observation has not yet been assigned to a cluster and its date is within 30 days of the start of the new cluster.
Breaking the qualifier down: date1<=(date1[sum(cluster!=.)+1]+30) & cluster==.
cluster!=. : a logical expression that is either true or false ie. if cluster does not equal (!=) a missing value (.) the expression is true and equals 1 (one), otherwise it equals 0 (zero)
sum(cluster!=.) : the running sum of the results of cluster!=.
+1 : add 1 (one) to the result of sum(cluster!=.)
date1[sum(cluster!=.)+1] : inside the square brackets (explicit subscripting) Stata has calculated the observation number of date1 that we require eg. date1[obs no]. Stata gets the value of date1 (a date) for that observation and adds 30 days to it. Stata then tests whether the current observation of date1 is <= the value calculated by date1[sum(cluster!=.)+1]+30 and also whether the current value of cluster is missing (.). If the statement is true the looping index value (`i') replaces the current value of cluster for that observation.
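To see the qualifier at work on a small scale, the following sketch applies the same replace statement to one made-up series of elapsed dates and runs two passes by hand instead of using the forvalues loop. The values 0, 10, 25, 40 and 100 are illustrative assumptions, not data from the example above.

```stata
* Illustrative data only: five elapsed dates for a single subject
clear
input date1
0
10
25
40
100
end
generate cluster = .

* Pass 1: nothing is assigned yet, so sum(cluster!=.) is 0 and the
* anchor is date1[1]; every date up to date1[1]+30 joins cluster 1
replace cluster = 1 if date1 <= (date1[sum(cluster!=.)+1] + 30) & cluster == .

* Pass 2: three dates are now assigned, so the anchor moves to date1[4]
replace cluster = 2 if date1 <= (date1[sum(cluster!=.)+1] + 30) & cluster == .

list
```

After the second pass the first three dates are in cluster 1, the fourth is in cluster 2, and the last date (more than 30 days after the fourth) is still missing; a third pass would place it in cluster 3.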
For help on specific commands type:
help
and then the specific command eg.
help input
help generate
help summarize
(the saved results from the summarize command can be seen by typing: return list after the summarize command)
help macro
help forvalues
help sum()
help tempvar
help list
Working with Dates 2
(December 2011) A problem that comes up from time to time is where, say, hospital wards (could also be hospital beds, cars, hotel rooms,
machines etc.) are occupied by a patient for a number of minutes/hours/days and management wishes to know at the end
of the month how many minutes each ward was used for.
An example:
clear
input ///
str30 date_in str30 date_out ward
"7/22/2011 22:59" "7/27/2011 10:12" 1
"8/27/2011 12:05" "8/27/2011 21:07" 2
"8/27/2011 10:46" "8/28/2011 19:45" 1
"8/28/2011 15:34" "8/28/2011 16:43" 2
"8/28/2011 23:24" "8/29/2011 13:43" 1
"8/27/2011 14:32" "8/28/2011 15:15" 2
"8/28/2011 09:43" "8/28/2011 17:49" 1
"8/28/2011 01:33" "8/28/2011 02:32" 2
"8/28/2011 04:43" "8/29/2011 05:53" 1
"8/31/2011 07:30" "8/31/2011 08:11" 2
end
list
set more off
split date_in, gen(kk) // (1)
split date_out, gen(zz)
generate double date_in2=date(date_in,"MDY hm") // (2)
format date_in2 %td
generate double date_out2=date(date_out,"MDY hm")
format date_out2 %td
summarize date_in2 // (3)
local d1=r(min) // (3)
summarize date_out2 // (3)
local d2=r(max) // (3)
local range=`d2'-`d1' // (4)
forvalues i=0/`range' { // (5)
    local kk=`d1'+`i' // (6)
    generate datea`kk'=1 if inrange(`d1'+`i', date_in2,date_out2) // (7)
    label var datea`kk' `="`=day(`d1'+`i')'"+ "_"+ /// // (8)
        "`=month(`d1'+`i')'"+ "_"+"`=year(`d1'+`i')'"'
}
generate id=_n // (9)
reshape long datea, i(id) j(datekk) string // (10)
bysort id:gen double datea1=sum(datea) if !missing(datea)
bysort id:replace datea=sum(datea) if !missing(datea)
levelsof id, local(id1)
foreach i of local id1 {
    summarize datea if id==`i'
    replace datea1=24*60 if datea!=`r(min)' & datea!=`r(max)' ///
        & id==`i' & !missing(datea)
    replace datea1=24*60-(clock(kk2,"hm")/(1000*60)) if ///
        datea==`r(min)' & id==`i' & !missing(datea)
    replace datea1=clock(zz2,"hm")/(1000*60) if ///
        datea==`r(max)' & id==`i' & !missing(datea)
    //enter and discharge the same day
    replace datea1=(clock(zz2,"hm")-clock(kk2,"hm"))/(1000*60) if ///
        datea==`r(max)' & datea==`r(min)' & id==`i' & !missing(datea)
}
collapse (sum) datea1, by(ward datekk) // (11)
destring datekk, gen(date) // (12)
format date %td
rename datea1 time // (13)
label var time "time in minutes" // (14)
list, sepby(ward) // (15)
Going through the above:
(1) Split the string dates into a day part and a time part.
(2) The date/times, input as strings, are converted to elapsed dates (number of days from a datum); the times of day are taken later from the variables created by split.
(3) The minimum and maximum dates are obtained with the summarize command
and saved in local macros.
(4) The range is calculated and saved in a local macro.
(5) Using the forvalues command the days of the range are looped through.
(6) The date, in elapsed days, is calculated.
(7) A new variable for each day is created and a 1 entered in the observations where
the loop date is in the range indicated by the inrange() function.
(8) The newly created variable is given a label, which is the loop date.
(9) Generate a unique id value to be used by the reshape command.
(10) Reshape the data from wide to long data format.
(11) Collapse the data to give the required results.
(12) Use the destring command to convert a string variable to a numeric variable.
(13) Rename variable.
(14) Include a variable label.
(15) Finally, list the results.
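The minute arithmetic inside the foreach loop above relies on clock() returning milliseconds, so that dividing by 1000*60 gives minutes since midnight. A minimal sketch of that one step, using the first admission time in the data (22:59); the local macro name in_min is an assumption made for illustration:

```stata
* clock() with an "hm" mask returns milliseconds since midnight;
* dividing by 1000*60 converts milliseconds to minutes
local in_min = clock("22:59", "hm")/(1000*60)
display "minutes since midnight: " `in_min'

* minutes of ward use on the day of admission, following the
* pattern 24*60-(clock(kk2,"hm")/(1000*60)) used in the loop
display "minutes used on admission day: " 24*60 - `in_min'
```

22:59 is 22*60+59 = 1379 minutes after midnight, which leaves 61 minutes of use on the admission day.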
For help on specific commands type:
help
and then the specific command eg.
help input
help generate
help summarize
(the saved results from the summarize command can be seen by typing: return list after the summarize command)
help macro
help forvalues
help collapse
help destring
help rename
help label
help list
Working with Dates
(November 2011) A problem that comes up from time to time is where, say, hotel rooms (could also be hospital beds, cars,
machines etc.) are booked for a number of days by the one person and management wishes to know at the end
of the month how many rooms for each day were occupied.
An example:
clear
set more off
input ///
str30 date_in str30 date_out
"7/22/2011" "8/27/2011"
"8/27/2011" "8/27/2011"
"8/27/2011" "8/28/2011"
"8/28/2011" "8/28/2011"
"8/28/2011" "8/29/2011"
"8/27/2011" "8/28/2011"
"8/28/2011" "8/28/2011"
"8/28/2011" "8/28/2011"
"8/28/2011" "8/29/2011"
"8/31/2011" "8/31/2011"
"8/31/2011" "8/31/2011"
"8/31/2011" "9/4/2011"
"8/23/2011" "8/23/2011"
"8/23/2011" "8/24/2011"
"8/24/2011" "9/15/2011"
"8/4/2011" "8/4/2011"
"8/4/2011" "8/8/2011"
"8/10/2011" "8/10/2011"
"8/10/2011" "8/17/2011"
end
list
generate date_in1=date(date_in,"MDY") // see (1) below
generate date_out1=date(date_out,"MDY") // (1)
format date_in1 date_out1 %td // (2)
summarize date_in1 // (3)
local d1=r(min) // (3)
summarize date_out1 // (3)
local d2=r(max) // (3)
local range=`d2'-`d1' // (4)
forvalues i=0/`range' { // (5)
    local kk=`d1'+`i' // (6)
    gen datea`kk'=1 if inrange(`d1'+`i', date_in1,date_out1) // (7)
    label var datea`kk' `="`=day(`d1'+`i')'"+ "_"+ /// // (8)
        "`=month(`d1'+`i')'"+ "_"+"`=year(`d1'+`i')'"'
}
generate id=_n // (9)
reshape long datea, i(id) j(datekk) string // (10)
collapse (sum) datea, by(datekk) // (11)
destring datekk, replace // (12)
format datekk %td
rename datea rooms_oc // (13)
label var rooms_oc "rooms occupied" // (14)
list, sep(0) // (15)
Going through the above:
(1) Dates, input as strings, are converted to elapsed time (number of days from a datum).
(2) Dates are formatted.
(3) The minimum and maximum dates are obtained with the summarize command
and saved in local macros.
(4) The range is calculated and saved in a local macro.
(5) Using the forvalues command the days of the range are looped through.
(6) The date, in elapsed days, is calculated.
(7) A new variable for each day is created and a 1 entered in the observations where
the loop date is in the range indicated by the inrange() function.
(8) The newly created variable is given a label, which is the loop date.
(9) Generate a unique id value to be used by the reshape command.
(10) Reshape the data from wide to long data format.
(11) Collapse the data to give the required results.
(12) Use the destring command to convert a string variable to a numeric variable.
(13) Rename variable.
(14) Include a variable label.
(15) Finally, list the results.
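As an alternative sketch, not the method used above, the same daily counts can be obtained without creating one variable per day: expand each booking into one observation per occupied day and then count the observations for each day with contract. The three bookings are taken from the example data; the variable names d_in, d_out and day are assumptions made for illustration.

```stata
clear
input str10 date_in str10 date_out
"8/27/2011" "8/28/2011"
"8/28/2011" "8/28/2011"
"8/28/2011" "8/29/2011"
end
generate d_in  = date(date_in, "MDY")
generate d_out = date(date_out, "MDY")
generate id = _n

* one observation per occupied day of each booking
expand d_out - d_in + 1
bysort id: generate day = d_in + _n - 1
format day %td

* number of bookings (rooms) covering each day
contract day, freq(rooms_oc)
list, sep(0)
```

With these three bookings the result has one row per day from 27 to 29 August 2011, with 3 rooms occupied on 28 August and 1 on each of the other two days.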
For help on specific commands type:
help
and then the specific command eg.
help input
help generate
help summarize
(the saved results from the summarize command can be seen by typing: return list after the summarize command)
help macro
help forvalues
help collapse
help destring
help rename
help label
help list
Doing things by levels of a variable
(October 2011)
levelsof is a useful Stata command for doing
something by levels of a variable, for example producing a
histogram of mpg for each level of the variable foreign eg.
clear
sysuse auto, clear
levelsof foreign, local(level)
foreach i of local level {
    histogram mpg if foreign==`i', name(a`i')
}
However, levelsof fails when there are many levels, as can be seen from this snippet of code:
clear
set more off
set obs 100000
gen a=_n
levelsof a, local(aa)
The levelsof help file states that this command is best used if the number of levels is modest.
What to do if the number of levels exceeds the limit?
The following are 2 methods:
Method 1
This method contracts the variable that the levels are required for and then merges it with the dataset, hence the levels are contained in the Stata dataset:
Example:
sysuse auto, clear
expand 10000
graph drop _all
preserve
contract mpg
rename mpg levels
save c:/kk, replace
restore
merge 1:1 _n using c:/kk
drop _freq
drop _merge
sum levels
forvalues i=1/`=r(N)' {
    // levels[`i'] holds the i-th distinct value of mpg
    scatter price weight if mpg==levels[`i'], name(a`i')
}
exit
Method 2
Using Mata to get the levels of a variable
sysuse auto, clear
graph drop _all
expand 10000
set more off
mata:
a=uniqrows(st_data(.,"mpg"))
a
for(i=1;i<=rows(a);++i){
    st_local("i1",strofreal(a[i]))
    stata("scatter price weight if mpg=="+st_local("i1")+", name(a" + st_local("i1")+")")
}
end
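A further possibility, offered as a sketch rather than as part of the two methods above, is to number the levels with egen's group() function and loop over those numbers with forvalues, so that no long macro has to be built. The variable name g and the macro name nlevels are assumptions; the loop displays a count for each level instead of drawing a graph, to keep the example light.

```stata
sysuse auto, clear
* g numbers the distinct values of mpg 1, 2, 3, ...
egen g = group(mpg)
summarize g, meanonly
local nlevels = r(max)
forvalues i = 1/`nlevels' {
    quietly summarize mpg if g == `i', meanonly
    display "level `i': mpg = " r(mean) " (" r(N) " cars)"
}
```

Replacing the two lines inside the loop with a graph command that uses the qualifier if g==`i' gives one graph per level, as in Methods 1 and 2.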
For help on specific commands type:
help
and then the specific command eg.
help levelsof
help contract
help mata
help mata st_local()
help mata stata()
help mata uniqrows()
Speeding up Stata - the if statement
(September 2011)
Stata is fast but it can be sped up by taking a close
look at the way your Stata commands have been coded. In this
tip we will look at the if qualifier.
The if qualifier is computationally intensive
and adds considerable time to the running of a command that
includes it. However there are certain circumstances where
it can be replaced and hence Stata's running time reduced.
Example 1
This shows how you would normally
run a number of regressions ie. just adding the qualifier
after the regress command.
// creating a data set
clear
timer clear
//create data
set obs 10000000
gen a=uniform()
gen b=uniform()
gen c=uniform()
save c:/exp1, replace
clear
//Example 1
//running regressions
timer on 1
use c:/exp1
regress a b c if c<.5
regress a b c if c<.5
regress a b c if c<.5
timer off 1
timer list
//the timer gives the following results:
. timer list
1: 20.25 / 1 = 20.2500
Example 2
The above example's commands have been modified to bring in only the required observations
(the ones that satisfy the qualifier). To do this we use the 2nd syntax of the use command.
clear
timer on 2
use if c<.5 & b<.5 using c:/exp1
regress a b c
regress a b c
regress a b c
//use c:/exp1
timer off 2
timer list
//the timer gives the following results:
. timer list
1: 20.25 / 1 = 20.2500
2: 9.75 / 1 = 9.7500
As you can see, example 2 runs considerably faster than example 1.
Example 3
Another way of speeding up Stata is to create a variable where 1 equals
the observations that are to be included in the regression and then use a
less complex if statement.
clear
timer on 3
use c:/exp1
mark a1 if c<.5 & b<.5
regress a b c if a1
regress a b c if a1
regress a b c if a1
timer off 3
timer list
//the timer gives the following results:
1: 20.25 / 1 = 20.2500
2: 9.75 / 1 = 9.7500
3: 17.28 / 1 = 17.2820
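The comparison can be tried on a smaller scale without saving a file. The sketch below is a variation on Example 2 that uses keep if on data already in memory rather than the use if ... using syntax; the number of observations, the seed and the way a is generated are assumptions chosen only so that it runs quickly, and the timings will differ from machine to machine.

```stata
clear
timer clear
set seed 12345
set obs 100000
generate b = runiform()
generate c = runiform()
generate a = b + c + runiform()

* qualifier evaluated inside each command
timer on 1
quietly regress a b c if c < .5 & b < .5
quietly regress a b c if c < .5 & b < .5
timer off 1

* unwanted observations dropped once, then no qualifier is needed
timer on 2
quietly keep if c < .5 & b < .5
quietly regress a b c
quietly regress a b c
timer off 2

timer list
```

Note that keep if discards the other observations from memory, so this only suits situations where they are not needed later in the session.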
For help on specific commands type:
help
and then the specific command eg.
help use
help mark
Stata 12's new Excel command
(August 2011) With Stata 12 there are some new commands that make getting tables into an Excel spreadsheet easier.
Stata 12 returns a matrix of the regression table in r(table). To see this, run a regression and type:
matrix list r(table)
Stata 12 has a command for exporting data to an Excel file eg. export excel. This command can be accessed via the GUI eg. File>Export>Excel spreadsheet or via the command line. To see the syntax type:
help import_excel
The following is an example of getting regression results into an Excel spreadsheet.
clear all
sysuse auto, clear
set more off
ds, not(type string)
capture erase "c:\stuff.xls"
local z=1
foreach i of varlist `r(varlist)' {
    sysuse auto, clear
    if "`i'"=="length" {
        continue
    }
    regress length `i'
    matrix a1=r(table)
    matrix a2=a1[1..6,1..2]'
    matrix list a2
    clear
    svmat a2, names(matcol)
    generate name="`i'" in 1
    replace name="_cons" in 2
    if `z'==1 {
        export excel using "c:\stuff.xls", sheetmodify cell(a`z') firstrow(variables)
    }
    else {
        export excel using "c:\stuff.xls", sheetmodify cell(a`=((`z'-1)*2)+2')
    }
    local ++z
} //loop
For help on specific commands type:
help
and then the specific command eg.
help import
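The matrix step in the Excel example above can be looked at on its own. This sketch runs a single regression (length on weight, chosen here only for illustration) and shows what the subscripting and transposing of r(table) produce.

```stata
sysuse auto, clear
quietly regress length weight
matrix a1 = r(table)

* keep the first six rows of r(table) (b, se, t, pvalue, ll, ul) for
* the regressor and the constant, then transpose so that each
* coefficient becomes a row
matrix a2 = a1[1..6, 1..2]'
matrix list a2
display rowsof(a2) " rows by " colsof(a2) " columns"
```

The result has 2 rows (weight and _cons) and 6 columns, which is the layout that svmat then turns into variables for export excel.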
Stata 12 PDF files of logs and graphs
(July 2011)
In Stata 12 log files
are still output as either SMCL or text. However, in
Stata 12 these log files can be converted into PDF files. This
can be easily done with the Stata translate command for
example:
log using c:/log1, replace
sysuse auto, clear
tab rep78 foreign
log close
translate c:/log1.smcl c:/log1.pdf , translator(smcl2pdf)
Also, in Stata 12 you can produce a PDF of a graph from within Stata. Example:
sysuse auto, clear
scatter mpg weight //, name(g1)
graph export c:/graph.pdf //name(windowname)
For help on specific commands type:
help
and then the specific command eg.
help translate
help graph export
Using value labels for bar graph labels
(June 2011)
It is sometimes more
convenient to use value labels rather than the graph relabel
options to change graph bar labels. In the example below using
value labels also allows the legend to be spread over the width
of the graph.
An example:
clear all
sysuse auto
label define origin 0 "Europe de l`=char(146)'Ouest" ///
1 "Asie de l`=char(146)'Est", modify
graph hbar mpg trunk turn, over(foreign) ///
legend(row(1) span) stack name(two,replace)
For help on specific commands type: help and then the specific command eg. help label
Automatically sending emails from Stata - Windows platform
(May 2011)
If you are running a large model and wish to know how Stata is
progressing, or would like a log file emailed to you or others
when Stata has finished a do file, or would like Stata to send
out emails based on a program that you write, then the
following can be used.
To do this a program called CommandLineEmailer
must be downloaded. (This is not a Stata program.) Instructions
for downloading it are in note 8 of the Notes for Option 1 below. To run CommandLineEmailer
a small text file is written in Stata.
Option 1
Getting Stata to automatically send an email to indicate
progress in the running of a do file:
capture erase kk2.txt
log using c:/kklog,text replace
set more off
forvalues i=1/2000 {
    //data to run the email program
    if mod(`i',100)==0 { //<1
        tempname fh
        file open `fh' using kk2.txt, write //<2
        file write `fh' "smtpserver = mail.whatever.com.au" _n //<3
        file write `fh' "from = myeamail@whatever.com.au" _n //<4
        file write `fh' "to = reciever@whatever.com.au" _n //<5
        file write `fh' "subject = Test Message" _n //<6
        file write `fh' "body = `i' Test Message" _n //<7
        file close `fh'
        !CommandLineEmailer /p:kk2.txt //<8
        erase kk2.txt //<9
    }
}
log close
exit
Notes for Option 1:
1. mod(`i',100)==0 determines when an email is to be sent. Other methods can be used.
2. Using Stata's file command you create a text file that contains the instructions to run CommandLineEmailer. The text file created here is called kk2.txt
3. The address "mail.whatever.com.au" must be changed to the address of your own outgoing mail server. To find this out with Windows Live:
Open Windows Live
Using pulldown menu: Tools>Accounts
Click on: Mail
Click on: Properties
Click on: the "Servers" tab
Find the address at: Outgoing Mail [SMTP]
The "_n" indicates newline.
4. Change the from email address to that required.
5. Change the to email address to that required.
6. Change the Subject title to that required.
7. Change the message in the body of the email to that required. In the above we have included `i' to indicate the number of loops that have been completed. Other data can be included.
8. Calls the program that sends the email, using the text file written above.
! sends commands to your operating system, see: help shell
CommandLineEmailer is the file that must first be downloaded. This can be obtained from:
http://www.codeproject.com/KB/IP/cpcommandlineemailer.aspx
eg. Download compiled utility - 6.05 Kb
(You must log in to download; this is free and easy to do)
9. Erase text file so a new one can be written.
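The file-writing part of Option 1 can be tested on its own before CommandLineEmailer is installed. This sketch writes two of the parameter lines to a temporary file (used here instead of kk2.txt, an assumption made so that nothing is left behind) and then displays the file so you can check what would be passed to the emailer.

```stata
* write a two-line parameter file and display its contents
tempname fh
tempfile params
file open `fh' using "`params'", write text
file write `fh' "subject = Test Message" _n
file write `fh' "body = 100 Test Message" _n
file close `fh'
type "`params'"
```

Each file write ends with _n, so every setting appears on its own line in the file, as in the kk2.txt file built above.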
Option 2
If you require the log file to be emailed to you (or others) when the analysis has been completed, the following can be done:
capture erase kk2.txt
log using c:/kklog,text replace
set more off
forvalues i=1/2000 {
    display "Looping index: `i'"
}
log close
//text file to run CommandLineEmailer
tempname fh
file open `fh' using kk2.txt, write
file write `fh' "smtpserver = mail.tpg.com.au" _n
file write `fh' "from = myemail@whatever.com.au" _n
file write `fh' "to = email@whatever.com.au" _n
file write `fh' "subject = Test Message" _n
file write `fh' "body = log sent: `c(current_date)' `c(current_time)'" _n
file write `fh' "attachment = c:\kklog.log" _n //<10
file close `fh'
!CommandLineEmailer /p:kk2.txt
exit
Note for Option 2:
10. Attaches the log file to the email.
See Option 1 notes above for other details
For help on specific commands type:
help and then the specific command eg. help obs
Generating a dataset
(April 2011) Sometimes researchers expect a large dataset at some time in the future and wish to make sure that their version of Stata can handle the dataset (within the limits of their version of Stata). Also, they may wish to check that their version does the analysis in a timely manner and that their computer is set up to handle the data; otherwise there may be a need to upgrade the computer and/or upgrade the flavour of Stata eg. to Stata/MP.
To see the limits of your existing flavour of Stata type: help limits
The following examples generate sample data sets to experiment with.
(1) Generates a data set with a specified number of continuous variables and observations.
clear all
set memory 300m //< allocates 300 megabytes of memory to Stata
set obs 1000 //<No. of observations
gen y=uniform()*10
forvalues i=1/100 { //<No of variables
    gen a`i'=uniform()*100 //<cont. variables
}
summarize
(2) Generates a binary variable and continuous variables.
clear all
set memory 300m
set obs 1000 //<No of observations
generate y=uniform()<.5 //<binary variables
forvalues i=1/100 { //<No. of cont. variables
    generate a`i'=uniform() //<cont. variables
}
tabulate y
(3) Generates a categorical variable and continuous variables.
clear all
set memory 300m
set more off
set obs 1000 //<No of observations
generate y=mod(_n,4)+1 //<cat. variable
forvalues i=1/10 { //<No of cont. variables
    generate a`i'=uniform() //<cont. variables
}
tabulate y
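When experimenting with different sizes it can help to keep the number of observations and variables in local macros and to set the random-number seed so that the same data are produced on every run. The following is a sketch along the lines of example (2); the macro names nobs and nvars, the seed value and the use of runiform() (the newer name for uniform()) are choices made here for illustration.

```stata
clear all
local nobs  1000    // number of observations (assumed value)
local nvars 10      // number of continuous variables (assumed value)
set seed 2011       // makes the generated data reproducible
set obs `nobs'
generate y = runiform() < .5
forvalues i = 1/`nvars' {
    generate a`i' = runiform()
}
describe, short
```

Changing the two macros at the top is then enough to scale the test dataset up or down.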
For help on specific commands type: help and then the specific command eg. help obs
Printing log files
(March 2011) Recently a few people have inquired about the printing of log files. People have had problems with the right hand side of the log file being truncated.
Stata has a few settings that allow control over the way a log is printed.
Option 1
Stata has various system settings. These can be seen by typing: query
To set the width of the text across the page use:
set linesize #
Example:
set linesize 85
The above example sets the linesize of the Results window, and hence of the log, to 85 characters.
(Note: not all commands are affected by the linesize setting, see the Stata 11 manual for more details)
Note: The linesize setting must be made before the log is opened.
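A minimal sketch of the order of operations, assuming you want to return to your usual line size after the log has been written (the macro name old is an assumption):

```stata
* remember the current setting, then narrow the output for the log
local old = c(linesize)
set linesize 85
display "linesize is now " c(linesize)

* ... open the log and run the analysis here ...

* restore the earlier setting afterwards
set linesize `old'
```

The current value is held in c(linesize), so it can be checked at any time with display c(linesize).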
Option 2
When printing you can also control the print font size. To change this load the log file into a Viewer window and :
Option 3
Print using the print command and include overrides eg.
print c:/experiment.smcl, header(off) fontsize(6) logo(off) lmargin(3)
The overrides for the translators can be found by typing "translator query" and then the name of the translator. For example:
translator query smcl2prn
For further help on the above code, type the following on the Stata command line:
help log
help query
help viewer
help translator
Working with dates
(February 2011) Stata has a considerable collection of time and date functions. These can be found by typing:
help date()
Often you wish to limit a command to observations before, after or between particular dates. This is easily done using the td() date pseudofunction or, if the dataset has been set for time series, the tin() function.
Example using a Pseudofunction
Find the number of observations greater than a specified date.
clear
input str20 starts_d a
"20jan1980" 1
"20jan1981" 2
"20jan1982" 3
"20jan1983" 4
"20jan1984" 5
"20jan1985" 6
"20jan1986" 7
"20jan1987" 8
"20jan1988" 9
end
list
generate date1=date(starts_d,"DMY") //<Note 1
summarize a if date1>td(25April1985) //<Note 2
Note 1: Generates a new variable (date1) which is the elapsed time in days from a date datum (1 Jan 1960). This variable is numeric.
Note 2: Summarizes a subset of the data, the subset being determined by the pseudofunction td(). The number of observations in the subset is shown under Obs.
Example using the tin() function
Find the number of observations up to a specified date.
clear
input str20 starts_d a
"20jan1980" 1
"20jan1981" 2
"20jan1982" 3
"20jan1983" 4
"20jan1984" 5
"20jan1985" 6
"20jan1986" 7
"20jan1987" 8
"20jan1988" 9
end
list
generate date1=date(starts_d,"DMY")
format date1 %td
tsset date1 //<Note 3
list if tin(,25Apr1985) //<Note 4
Note 3: tsset is the command to set the data for time series
Note 4: tin() determines the subset of the data. This function allows a lower and an upper limit to be specified, the lower limit being on the left and the upper on the right. If the left hand limit is omitted Stata assumes that the subset starts at the beginning of the data; conversely, if the right hand limit is omitted Stata assumes that it runs to the end of the dataset.
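For the "between two dates" case when the data have not been tsset, one possibility is to combine td() with the inrange() function. This is a sketch using the same data as above; the two limits are chosen only for illustration.

```stata
clear
input str20 starts_d a
"20jan1980" 1
"20jan1981" 2
"20jan1982" 3
"20jan1983" 4
"20jan1984" 5
"20jan1985" 6
"20jan1986" 7
"20jan1987" 8
"20jan1988" 9
end
generate date1 = date(starts_d, "DMY")
format date1 %td

* td() is a convenient way of typing the elapsed-day number
assert td(25apr1985) == date("25apr1985", "DMY")

* observations from 1 Jan 1982 to 31 Dec 1985 inclusive
count if inrange(date1, td(1jan1982), td(31dec1985))
```

Here count reports 4, the observations dated 20 January in each of the years 1982 to 1985.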
For further help on the above code type the following on the Stata command line:
help date()
help tsset
Producing Multiple graphs
(January 2011) Multiple graphs can be produced in Stata 11 with loops. If all the numeric variables are required to be graphed as histograms the following can be used:
sysuse auto, clear
foreach i of varlist _all {
    capture confirm numeric variable `i'
    if _rc==0 {
        histogram `i', name("`i'")
    }
}
The Stata "confirm" command checks whether the variable is numeric. If it is, the "capture" prefix sets the return code _rc to 0; if not, some other value is returned. The return code _rc is then checked with the "if" command: if true the histogram is drawn, if false the "foreach" loop moves on to the next variable.
If you do not wish to run all the variables in the dataset the following can be used:
sysuse auto, clear
graph drop _all // drop existing graphs
local a "mpg turn"
foreach i of local a {
    capture confirm numeric variable `i'
    if _rc==0 {
        histogram `i', name("`i'")
    }
}
exit
Alternatively using the Stata's ds command:
sysuse auto, clear
graph drop _all // drop existing graphs
ds , has(type int)
return list
foreach i of varlist `r(varlist)' {
    histogram `i', name("`i'")
}
exit
Or
sysuse auto, clear
graph drop _all // drop existing graphs
ds make , not
return list
foreach i of varlist `r(varlist)' {
    histogram `i', name("`i'")
}
exit
This last version still uses the ds command, but excludes the variables that you do not wish to graph by means of the not option.
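Before starting a loop that draws many graphs it can be worth checking which variables ds has selected. The sketch below uses has(type numeric), which picks up every numeric storage type rather than only int; the macro name numvars is an assumption.

```stata
sysuse auto, clear
ds, has(type numeric)
local numvars `r(varlist)'
display "numeric variables: `numvars'"
display "number found: " wordcount("`numvars'")
```

In the auto data every variable except make is numeric, so 11 variables are listed; the macro can then be used in place of `r(varlist)' in the foreach loop.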
The default display for multiple graphs is to show each graph in a separate graphics window. To show all the graphs in the one window (tabbed graphs) the Stata setting autotabgraphs can be set to on eg.
set autotabgraphs on
Also, when displaying graph in the one graphics window the display can be altered by pulling the tab into the desired part of the window. An example:
For further help on the above code, type the following on the Stata command line:
help capture
help ds
help foreach
Stata 11 PDF
(December 2010) Stata 11 includes the manuals as PDF files; all 8000+ pages! The manuals include detailed examples of Stata commands, technical details, references and the maths for each command. While Stata's online help is handy for those who are already familiar with a command, the manuals are very useful for learning about new commands. There are various ways of accessing the PDF manuals. These are:
1. To access the entire set of PDF manuals you can use the pulldown menu: Help>PDF Documentation
2. For a specific entry, open a Stata online help page (eg. help regress ) and then click on the hyperlink
3. Creating a hyperlink in the Results window. This is particularly helpful for Stata courses or for emailing a reference to a fellow Stata user.
display in smcl "{manpage GSM 141} {hline 2} starting MAC"
display in smcl "{manlink R regress} {hline 2} Linear regression"
OR
Creating your own ado file with PDF hyperlinks eg.
*******pdf.ado************
program pdf
display in smcl "{manpage GSM 141} {hline 2} starting MAC"
display in smcl "{manlink R regress} {hline 2} Linear regression"
end
**************************
save the above file as pdf.ado and put it in the adopath (suggest c:/ado/personal)
then to bring up the hyperlink type pdf on the Stata command line.
For further help on the above code, type the following on the Stata command line:
help adopath
help display
help smcl
Regular Expressions
(November 2010) Stata has regular expressions that allow you to work with simple or complex text.
Regular expressions are listed under string functions.
One application of regular expressions is working with address data. The following shows how to (in most cases) separate the postcode and state from an address.
clear
input ///
str100 address
"1234 West St Blackburn 3000 Vic"
"West St 1234 Blackburn 3000 vic"
"West St 1234 Blackburn Vic 3000"
"West St 1234 Blackburn sa 3000"
"12 West St Backburner 2001 nsw"
end
list
//getting postcode
generate postcode1=regexs(2) if regexm(address,"(^.*)([0-9][0-9][0-9][0-9])") //comment: reg1
//get state
generate state=regexs(0) if regexm(address,"([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa])") //comment: reg2
//You could verify the first number of the postcode matches the state
generate check=1 if lower(state)=="vic" & regexm(postcode1,"[0-9]") & regexs(0)=="3" //comment: reg3
list
Notes:
reg1: (^.*) means get any character "." zero or more times "*", starting at the beginning of the string "^"; the brackets around this indicate a subsection of the string, in this case subsection 1.
Subsection 1 continues up to the last 4 digit number, which is captured as subsection 2 as indicated by: ([0-9][0-9][0-9][0-9])
reg2: ([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa]) first requires a match of 3 characters, the first character being either V or v, the second character being either I or i etc. If those 3 characters have not been found then it continues to look for a match with the next group of characters. The "|" symbol is a logical OR.
reg3: looks for a match of state: lower(state)=="vic"; the lower() function makes sure that we are comparing the states in lower case. regexm(postcode1,"[0-9]") matches the first digit of the postcode and regexs(0)=="3" checks that digit against the number 3, the correct start of a Vic postcode.
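To see what each subsection of reg1 captures, the following sketch applies the same pattern to one of the addresses and stores regexs(0), regexs(1) and regexs(2) in separate variables. The variable names whole, part1 and part2 are assumptions made for illustration.

```stata
clear
set obs 1
generate str40 address = "West St 1234 Blackburn 3000 vic"

* regexs(0) is the whole match; regexs(1) and regexs(2) are the
* first and second bracketed subsections of the pattern
generate whole = regexs(0) if regexm(address, "(^.*)([0-9][0-9][0-9][0-9])")
generate part1 = regexs(1) if regexm(address, "(^.*)([0-9][0-9][0-9][0-9])")
generate part2 = regexs(2) if regexm(address, "(^.*)([0-9][0-9][0-9][0-9])")
list whole part1 part2
```

Because ".*" takes as much text as it can, part1 runs past the street number 1234 and part2 holds the last 4 digit number in the string, 3000.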
Now assume that the postcode has been incorrectly coded with alpha characters included and needs to be cleaned up. The following is one way of doing this.
clear
input ///
str100 address
" 3a00c1 West St Blackburn 3a00c0 Vic"
"West St 123 Blackburn 3Re00c1 vic"
"West St 123 Blackburn Vic 3f000"
"West St 123 Blackburn sa 30jj00"
"12 West St Backburner 2001 nsw"
end
list
tempvar a1 a2 a3
gen `a1'=""
gen `a2'=""
gen `a3'=""
local aa "[A-Za-z]"
//assume that the post code is in the second half of the string
replace `a1'=regexs(0) if regexm( substr(address,strlen(address)/2,.)," ([3])(`aa'|[0-9])*") //comment: reg4
replace `a2'=regexs(3) if regexm(`a1', " ([3])(`aa'*)([0-9]*)")
replace `a3'=regexs(5) if regexm(`a1', " ([3])(`aa'*)([0-9]*)(`aa'*)([0-9]*)")
generate code="3"+`a2'+`a3' if `a1'!=""
list
Notes:
reg4: substr(address,strlen(address)/2,.) limits the search to the second half of the string. The space in " ([3]) between the " and ( indicates that a space is required, and ([3]) indicates that the match must start with the number 3. The second subsection, (`aa'|[0-9])*, looks for a lower or uppercase character OR, as indicated by the OR symbol "|", a number. The "*" at the end of the second subsection indicates zero or more times.
The following is a problem that requires separating the days, months and years into separate variables.
clear
input ///
str40 dpr
"2 yrs 5months 26 days"
"3 yrs 2 months"
"1yr 9 months"
"1 yr 8 months"
"1 yr 11 months 28 days"
"1 yr 12 days"
"3 yrs 3 months12 days"
"3yrs 4 months 26 days"
"1 yr 9mnths 8 days"
end
list
generate year=regexs(1) if regexm(dpr, "^([0-9])([years ])")
// [0-9]+ so that two-digit months such as "11 months" are captured in full
generate months=trim(regexs(1)) if regexm(dpr, "([0-9]+[ ]?)m")
generate days=regexs(1) if regexm(dpr, "([0-9]+[ ]?)d")
list
For further help on the above code, type the following on the Stata command line:
findit regular expressions
Stata's profile.do command
(October 2010)
A useful addition to your Stata setup is a profile.do file. This is a do
file that Stata looks for and runs when starting a Stata session.
To create a profile.do file, click on the "New Do-file Editor" button or type
doedit on the Stata command line and then type in the commands that you wish
to have executed when Stata starts up. Then save this file where Stata can
find it ie. on the adopath.
Included in the profile.do file can be:
Stata settings eg.
set memory 30m
set matsize 800
Setting the default directory:
cd c:/data
Defining quick keys ie.
global F4 "summarize;"
Pressing F4 now executes the summarize command.
global F5 "sysuse auto, clear;"
Pressing F5 loads the auto data set that comes with Stata.
global F6 " display in smcl _newline(60);"
Pressing F6 creates 60 new lines so the Results window looks clean.
The profile.do file can also be used to load dialogue
boxes into the User pulldown menu. For an example see:
http://www.statajournal.com/sjpdf.html?articlenum=pr0012
(this shows how to include meta-analysis dialogue boxes)
When including a profile.do make sure that it is
on the adopath, so Stata can find it. To see the adopath
type: adopath
For further help on the above code see:
Stata 11 Getting Started manual
help adopath
help profile.do
Using the system variables _n and _N
(September 2010)
Stata's system variables _n and _N can be used to do a large number of
otherwise difficult tasks. In this tip we will illustrate some
of the things that they can be used for.
Definition:
_n : Current observation
_N : Total number of observations in the data set currently in memory
**Example 1
Generating a variable that is a sequence of numbers equal to the Stata
observation number. The resulting variable: number
Generating a variable equal to the last observation
number. The resulting variable: number_T
clear all
set obs 10
generate number=_n
generate number_T=_N
list
The result of running the above is:
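     +-------------------+
     | number   number_T |
     |-------------------|
  1. |      1         10 |
  2. |      2         10 |
  3. |      3         10 |
  4. |      4         10 |
  5. |      5         10 |
     |-------------------|
  6. |      6         10 |
  7. |      7         10 |
  8. |      8         10 |
  9. |      9         10 |
 10. |     10         10 |
     +-------------------+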
**Example 2
Reversing the data so that the _N (last) observation becomes the first. This is done for a particular variable.
clear
set obs 10
generate number=_n
generate rev_number=number[_N-_n+1]
list
The result of running the above is:
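     +-------------------+
     | number   rev_nu~r |
     |-------------------|
  1. |      1         10 |
  2. |      2          9 |
  3. |      3          8 |
  4. |      4          7 |
  5. |      5          6 |
     |-------------------|
  6. |      6          5 |
  7. |      7          4 |
  8. |      8          3 |
  9. |      9          2 |
 10. |     10          1 |
     +-------------------+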
**Example 3
Using _N with the bysort prefix to generate a variable that holds the total number of children in each family.
clear
input ///
famid child
1 1
2 1
2 2
2 3
3 1
3 2
3 3
3 4
end
bysort famid: generate number=_N
list, sepby(famid)
The result of running the above is:
     +------------------------+
     | famid   child   number |
     |------------------------|
  1. |     1       1        1 |
     |------------------------|
  2. |     2       1        3 |
  3. |     2       2        3 |
  4. |     2       3        3 |
     |------------------------|
  5. |     3       1        4 |
  6. |     3       2        4 |
  7. |     3       3        4 |
  8. |     3       4        4 |
     +------------------------+
**Example 4
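_n and _N can also be used in an if qualifier. This example marks, for each family, the child who has the greatest income. The income variable is in brackets, which tells Stata to sort by income within each family without using income to define the groups. Once sorted, the last observation (_N) in each family is the one with the greatest income for that family.
clear
input ///
famid child income
1 1 100
2 1 150
2 2 200
2 3 250
3 1 10
3 2 100
3 3 500
3 4 250
end
bysort famid (income): generate number=1 if _n==_N
l, sepby(famid)
The result of running the above is:
     +---------------------------------+
     | famid   child   income   number |
     |---------------------------------|
  1. |     1       1      100        1 |
     |---------------------------------|
  2. |     2       1      150        . |
  3. |     2       2      200        . |
  4. |     2       3      250        1 |
     |---------------------------------|
  5. |     3       1       10        . |
  6. |     3       2      100        . |
  7. |     3       4      250        . |
  8. |     3       3      500        1 |
     +---------------------------------+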
**Example 5
Generating lags and leads in the data.
clear
input ///
time sales
1 100
2 150
3 200
4 250
5 10
6 100
7 500
8 250
end
generate lead=sales[_n+1]
generate lag=sales[_n-1]
generate lags=(sales[_n-1]+sales[_n-2])/2
list
The result of running the above is:
     +-----------------------------------+
     | time   sales   lead   lag   lags  |
     |-----------------------------------|
  1. |    1     100    150     .      .  |
  2. |    2     150    200   100      .  |
  3. |    3     200    250   150    125  |
  4. |    4     250     10   200    175  |
  5. |    5      10    100   250    225  |
     |-----------------------------------|
  6. |    6     100    500    10    130  |
  7. |    7     500    250   100     55  |
  8. |    8     250      .   500    300  |
     +-----------------------------------+
For further help on the above code see:
Users guide: [U] 13.4 System variables (_variables)
help bysort
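As a further small sketch of the same ideas, _n and _N can be combined within bysort groups to number the observations in each family and to flag the last one listed. The data and the variable names order, last_listed and only_child are made up for illustration.

```stata
clear
input famid child
1 1
2 1
2 2
2 3
3 1
3 2
end
* position of each child within its family (1, 2, 3, ...)
bysort famid (child): generate order = _n
* 1 for the last child listed in each family, 0 otherwise
bysort famid (child): generate byte last_listed = (_n == _N)
* 1 if the family has exactly one child, 0 otherwise
bysort famid: generate byte only_child = (_N == 1)
list, sepby(famid)
```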
Producing an edited log file
(August 2010)
Stata's log file
reproduces what you see in the Results window. Often there is
a lot of material that is not needed for a final report and
this material needs to be edited out before presenting a report to
others. Stata's log file can be edited from the do file as it
is written. Just write a do file as is normally done and then
decide what is required to be included.
The example below has 2 ways of controlling the final log
file output:
1. Turning the log on and off so only the
material that you wish to see is added
To do this write a
few local macros at the start of the do file and include these
where required between Stata commands.
2. Removing any text as required
Using filefilter
to remove the unnecessary text.
//set macros
local new "capture log using out1, text replace"
local on "capture log using out1, text append"
local off "capture log close"
sysuse auto, clear
`new'
*this is a comment
`off' //off
regress mpg weight
`on' //on
display "`e(rss)'"
`off' //off
generate gpm=1/mpg
`on' //on
*this is GPM
summarize gpm
`off' //off
type out1.log //displays log file before filefilter
filefilter out1.log out2.log, from("off'") to(" ") replace
filefilter out2.log out3.log, from("`") to(" ") replace
filefilter out3.log out4.log, from(" //off") to("") replace
filefilter out4.log out5.log, from(".") to("") replace
type out5.log //displays log file
For further help on the above code see:
help macro
help filefilter
help type
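Stata also has log off and log on commands, which suspend and resume an open log without closing it. A minimal sketch of the same idea using these is given below; the file name out_demo is made up, and the log off and log on command lines themselves may still be echoed in the log, so some filtering may still be wanted.

```stata
sysuse auto, clear
capture log close            // make sure no log is already open
log using out_demo, text replace
log off                      // suspend logging
regress mpg weight           // this output is not written to the log
log on                       // resume logging
display e(rss)
log close
type out_demo.log            // displays the resulting log file
```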
The input command (July
2010)
When you're working with a
data management or statistical command in Stata that you have
not previously used, you may not be confident that you are
doing this correctly. So rather than work with the complete
data set it's often useful to make up a small data set that
contains the critical points and run this to see if it is doing
what you had anticipated. Once satisfied you can run this on
the complete data set. For example if I wished to identify the
observations that included the current date and up to 4 days in
advance the following could be used:
clear
input ///
str15 dates
"12/7/2010"
"13/7/2010"
"14/7/2010"
"15/7/2010"
"16/7/2010"
"17/7/2010"
"18/7/2010"
"19/7/2010"
end
list
gen date1=1 if inrange(date(dates, "DMY"), date(c(current_date),"DMY"), date(c(current_date),"DMY")+4)
list
exit
After running the above we see the result
. list
     +-------------------+
     |     dates   date1 |
     |-------------------|
  1. | 12/7/2010       . |
  2. | 13/7/2010       . |
  3. | 14/7/2010       1 |
  4. | 15/7/2010       1 |
  5. | 16/7/2010       1 |
     |-------------------|
  6. | 17/7/2010       1 |
  7. | 18/7/2010       1 |
  8. | 19/7/2010       . |
     +-------------------+
. exit
Errors in logic can now more easily be spotted and you have saved time by not running the complete data set. When this has run satisfactorily it can be included in the main do file.
For further information on this command see:
help input
For further help on the above code see:
help comments
help date
help dates
help inrange()
help creturn list
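Because c(current_date) changes every day, a small test data set like the one above gives different results each time it is run. The sketch below fixes the reference date with mdy() so the check is repeatable and adds an assert; the date 14 July 2010 and the macro name today are assumptions for illustration.

```stata
clear
input str15 dates
"12/7/2010"
"13/7/2010"
"14/7/2010"
"15/7/2010"
"16/7/2010"
"17/7/2010"
"18/7/2010"
"19/7/2010"
end
* fixed reference date instead of c(current_date), so results do not change
local today = mdy(7, 14, 2010)
generate date1 = 1 if inrange(date(dates, "DMY"), `today', `today' + 4)
* the window covers 14 to 18 July inclusive, so 5 rows should be flagged
quietly count if date1 == 1
assert r(N) == 5
list
```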
Splitting the Do Editor
(June 2010)
In Stata 11 the Do Editor can be
split, making it easier to do some types of work. To do this
there must be at least two tabs in your Do Editor. Pull one of
these to the middle of the editor. When a selection box appears
select one of the positions, and two tabbed Do Editor windows appear.
Pulling a tab to the centre
Factor variables and lincom
to produce a table (May 2010)
Stata 11's factor variable can be combined with lincom to
quickly produce tables.
In this example we look at the table on P226 of
"Statistical Modeling for Biomedical Researchers: A Simple
Introduction to the Analysis of Complex Data, 2nd Edition by
William D. Dupont" (See our bookshop to order).
The data set can be downloaded:
http://biostat.mc.vanderbilt.edu/dupontwd/wddtext/index.html
set more off
cd "C:\data\dupont"
//if the data is stored in a different directory change this
//to where it has been stored
use "5.5.EsophagealCa.dta", clear
recode tobacco 3=2 4=3, g(smoke)
label define q_smoke 1 "0-9" 2 "10-29" 3 ">=30"
label value smoke q_smoke
logistic cancer i.alcohol i.smoke i.age [fw=patients]
forvalues i=1/4 { //alcohol
forvalues j=1/3 { //smoke
qui: lincom `i'.alcohol + `j'.smoke, or
local a`i'`j'=r(estimate)
}
}
local a11=1
decode alcohol, gen(a)
contract a
keep a
matrix aa=( `a11', `a12', `a13' \ `a21', `a22', `a23' \ `a31', `a32', `a33' \ `a41', `a42', `a43')
svmat aa
rename aa1 Tobacco_0_9
rename aa2 Tobacco_10_29
rename aa3 Tobacco_30
list
exit
In the above, the forvalues loops run through the levels of alcohol and smoke. These are applied to the factor variables in the lincom command. The values returned by lincom are stored in local macros, one at a time. After going through all the combinations of alcohol and smoke, the macros are collected into a Stata matrix, which is then put into the data set and some variable names applied.
For more information on the specific commands type help and then the command eg. help lincom
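If the Dupont data set is not to hand, the way lincom combines factor-variable levels can be tried on the auto data that comes with Stata. The sketch below is only an illustration with a linear regression (not the logistic model above); it shows that the lincom estimate is the sum of the two coefficients.

```stata
sysuse auto, clear
regress price i.foreign i.rep78
* linear combination of two factor-variable levels
lincom 1.foreign + 3.rep78
display r(estimate)
* the same number built by hand from the stored coefficients
display _b[1.foreign] + _b[3.rep78]
```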
Stata
Graphs (April 2010)
From time
to time Stata is used to produce nonstandard/interesting
graphs. I have compiled some of these graphs. These have mainly
been presented on the Statalist. To see these graphs click here .
This page will be updated from time to time.
To see some of the User written graph commands click here .
(from a previous tip)
Tabdisp (March
2010)
tabdisp is a Stata command that allows you to
display Stata tables. This command allows a lot of control over the way
that the elements are displayed.
If cell percentages are
required the following can be used:
sysuse auto, clear
contract for rep78
list
summarize _freq
generate percentage=(_freq/r(sum))*100
tabdisp for rep78, cell(percentage) cellwidth(7)
Or if the % symbol is also required:
sysuse auto, clear
contract for rep78
list
summarize _freq
generate percentage=(_freq/r(sum))*100
gen freq=string(percentage, "%5.2f")
replace freq=freq + "%"
tabdisp for rep78, cell(freq) cellwidth(7)
If the above is what is required then the user written programs tab2way or tab3way could be used instead
There are many other ways to display your data eg. including the words max and min in the table cells
sysuse auto, clear
contract for rep78
list
sort _freq
tostring _freq, gen(freq)
replace freq=freq+ " Max" in `=_N' //after sorting, the last observation has the largest frequency
replace freq=freq+ " Min" in 1 //and the first observation has the smallest
tabdisp for rep78, cell(freq) cellwidth(7)
For help on the individual commands type help and then the command name.
To download the user written command tab2way or tab3way , type: ssc install tab2way or ssc install tab3way
Tables to spreadsheet
(February 2010)
The tabulate command allows the
values of the table to be saved as matrices eg. options for the
tabulate command are: matcell(), matrow() and matcol(). These
matrices can be put into a spreadsheet. The table command however
does not have these matrix options. However, there are
workarounds that make it easy to put the results that the table
command would have given into a spreadsheet. This tip explores a
number of ways that this can be done:
This is the command that we wish to use; we then want to get the
resulting table out of Stata and into a spreadsheet
sysuse auto, clear
//table in official Stata
table for rep78 , c(mean price)
The following gives us what we want but does not allow the output to be put into a spreadsheet
sysuse auto, clear
collapse (mean) price, by(foreign rep78)
list
tabdisp foreign rep78 , c(price)
This time getting the table into a Stata data set so it can be exported to a spreadsheet
This method has the advantage that the column and row labels are also included
sysuse auto, clear
collapse (mean) price, by(foreign rep78)
list
drop if rep78==.
reshape wide price, i(foreign) j(rep78)
//because the data is in long form it can be reshaped
//into the required table
list
outsheet using c:/table, replace
//outputting the table to a form that can be read with a spreadsheet
This time using Mata to manipulate the initial data
sysuse auto, clear
collapse (mean) price, by(foreign rep78)
fillin foreign rep78
drop if rep78==.
sort for rep78
list
mata: //start of Mata
a=st_data(.,.)
a
s=J(2,6,.)
s
for(i=1; i<=10; i++) {
r=a[i,2]
c=a[i,1]
s[r+1,c]=a[i,3]
}
names = st_varname((1..3))
names
b2=st_varvaluelabel(names[1,1])
b2
if(b2!="") {
zy2=uniqrows(a[.,1])
b3=st_vlmap(b2, zy2)
b3
} else {
b3=strofreal(uniqrows(a[.,1]))
b3
}
b2a=st_varvaluelabel(names[1,2])
b2a
if(b2a!="") {
zy2a=uniqrows(a[.,2])
b3a=st_vlmap(b2a, zy2a)
b3a
} else {
b3a=strofreal(uniqrows(a[.,2]))
b3a
}
table=(""\b3a) ,(b3',"."\strofreal(s))
table
mm_outsheet( "c:/table1" ,table, mode="r") //user written program output to an Excel readable file
end
As you can see there are a number of different ways of getting table information out of Stata.
For help on the individual commands type help and then the command name. To download the user written command mm_outsheet, type: ssc install moremata
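For comparison, the matcell() route mentioned at the start of this tip can be sketched as follows for a table of frequencies. The matrix names freq, rows and cols and the output file name table_freq.csv are made up for illustration.

```stata
sysuse auto, clear
* save the cell frequencies and the row and column values as matrices
tabulate foreign rep78, matcell(freq) matrow(rows) matcol(cols)
matrix list freq
* turn the frequency matrix into variables freq1, freq2, ... and export it
clear
svmat freq
list
outsheet using table_freq.csv, comma replace
```

Note that this gives the cell counts only; the row and column values are in the rows and cols matrices and would need to be added if labels are wanted in the spreadsheet.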
Tables to spreadsheet
(January 2010)
When a large number of tables are
required to be put into a spreadsheet and no user written program
is available to easily do this the following method can be used:
Write a program for the particular table (or any output)
that you require. If there are a number of different tables then
write a program for each type of table.
The program starts a log file and then runs the table
command. It then closes the log file. The log file is then put
through a file filter to remove any unwanted text.
The partially cleaned up log file is then imported into
Stata using the insheet command and then further cleaned
up; removing any unwanted text and then the columns in the table
are split into Stata columns. The extent of the clean up depends
on the desired output.
Having finished the cleaning up, this is either saved or
appended to, using the required program option.
Then you go on to append the next table to the file.
When finished the file containing the tables can be opened in a
spreadsheet
clear programs  //Clears the previous program to allow for modfications. 
//This can be removed when you are happy with the program  
program tables  
version 11.0  
syntax varlist(max=2 min=2), gen(string) [append]  
tokenize `varlist'  //split varlist 
capture log using a, text replace  
label var `1' `=strtoname("`:var lab `1'' " )'  //combining the label into one word 
label var `2' `=strtoname("`:var lab `2'' " )'  
table `1' `2', stubwidth(40)  
log close  
filefilter a.log a1.log , from("-") to("") replace  //deleting the table border characters (-, | and +) from the log file 
filefilter a1.log a2.log , from("|") to("") replace  
filefilter a2.log a3.log , from("+") to("") replace  
insheet using a3.log, clear  //brings the modified log file into Stata 
drop in -4/-1  //get rid of other material in the last four lines 
drop if strpos(v1,"log")  
drop if strpos(v1,"pause")  
drop if strpos(v1,"resumed")  
drop if strpos(v1,"unnamed")  
drop if strpos(v1,":")  
capture drop v2  
split v1  
drop v1  
//additional cleaning up if required  
replace v11=subinstr(v11,"_", " ",. ) in 1/2  
quietly: d  
local a1=round(`r(k)'/2)  
replace v1`a1'=v11[1] in 1  
replace v11="" in 1  
set obs `=_N+2'  //two line space between tables 
if "`append'"!="append" {  
save `gen', replace  
}  
if "`append'"=="append" {  
append using `gen'  
save `gen' , replace  //saves file to hard disc 
}  
end  //end of program 
*********************************************
**the commands that call the above program
*********************************************

sysuse auto, clear  
set more off  
cd c:/  
tables for rep78, gen(aa)  //1st table 
sysuse auto, clear  //2nd table 
tables rep78 for, append gen(aa)  
sysuse auto, clear  //3rd table 
tables for rep78, append gen(aa)  
list ,noheader sep(0) noobs  
outsheet using c:/aa.csv, comma nonames replace  //saving to disk this can be opened in a spreadsheet 
exit 
For more information see:
see help for the specific command
User written table output commands include:
tabout
logout
esttab
Point Estimates for a
Regression (December 2009)
After
a regression, point estimates can be obtained in several ways.
Examples:
sysuse auto, clear
regress mpg weight
display _b[weight]*3000+_b[_cons]
OR
sysuse auto, clear
regress mpg weight
lincom weight*3000+_cons
OR
sysuse auto, clear
regress mpg weight
//then open the data editor and add an additional observation for the weight variable eg. 3000
//then run the following
predict a
//then display the point estimate with the following
display a[_N]
This then displays the point estimate. The last method is
useful when a number of estimates need to be made.
For more information see:
help predict
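The last method can also be run without opening the data editor, by adding the extra observation from the do file. This is only a sketch of one way to do it; the value 3000 is the same illustrative weight used above.

```stata
sysuse auto, clear
regress mpg weight
* add one empty observation and fill in the weight of interest
set obs `=_N+1'
replace weight = 3000 in `=_N'
* predict fills in the fitted value for the new observation as well
predict a
display a[_N]
```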
Doing things quietly in Stata
(November 2009)
Stata's quietly
command allows commands to be run without output being sent to the
Results window. This is useful if you only require the returned
results (eg. r(mean) etc.; see help return list ) and not the
actual output.
Example:
sysuse auto, clear
quietly summarize mpg, detail
or
quietly: summarize mpg, detail
You can also make a whole block quiet:
sysuse auto, clear
quietly {
summarize mpg, detail
local a=r(mean)
summarize price, detail
local b=r(mean)
}
If you wish to see specific output in a quiet block you can add noisily to the command
Example:
sysuse auto, clear
quietly {
summarize mpg, detail
local a=r(mean)
noisily summarize price, detail
local b=r(mean)
}
For more information see:
help quietly
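A common pattern is to run everything quietly and then show only a one-line summary built from the returned results. A minimal sketch, with made-up macro names m and p:

```stata
sysuse auto, clear
quietly {
    summarize mpg
    local m = r(mean)
    summarize price
    local p = r(mean)
    * noisily lets this single line through to the Results window
    noisily display "mean mpg = " %5.2f `m' ", mean price = " %8.2f `p'
}
```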
Graphing functions
(October 2009)
The graph histogram command
allows a normal distribution option to be included in the graph.
The twoway graph however does not have this option. However, this
can easily be done by adding a function graph, as shown in
the following example:
sysuse auto, clear
quietly summarize mpg
twoway (histogram mpg, bin(10)) ///
(function y=normalden(x, `r(mean)', `r(sd)'), range(4 44) xlabel(#10))
Lots of other functions can be drawn eg.
twoway function t=tden(1, x), range(-5 5) xsize(4) ysize(2) color(blue) ///
lstyle(p1solid) xlabel(-5(1)5) recast(area) || ///
function z=normalden(x), range(-5 5) color(maroon) lwidth(thick)
twoway function c=chi2(1,x), range(0 5) xsize(4) ysize(3) yline(.5)
twoway function c=Fden(5, 10, x), range(0 5) xsize(4) ysize(3) yline(.3)
Stata 11  Variable manager
(September 2009)
Getting variable names into
a do file:
The Stata 11 variable manager makes this easy.
Just highlight the variable name(s) in the variable manager,
right click and then click onto "copy variable list" . Go to the
do Editor and paste where required.
Filtering variable names
On the top left hand
side of the variable manager is the variable filter. Start
typing any part of the variable name in the filter and the
variables that include this text remain in the variable manager
list; the others disappear. This is a great feature for looking
for a particular variable in a large dataset.
For more information:
help varmanage (Access
Stata's PDF manual by clicking on the online help hyperlink: [D]
varmanage )
Getting a Subset of a large
dataset into Stata (August 2009)
The various flavours of Stata have limits on various commands,
label lengths, macro lengths etc. One of the limits is the
maximum number of variables that can be loaded into Stata.
In Stata/IC 11 the limit is set at 2,047 variables
To see the limits of the various flavours of Stata see: help
limits
If your data set contains more than 2047
variables and you do not need all of these in Stata then the
second syntax of Stata's use command can be used to get a
subset of this data set into Stata
help use
use
[varlist] [if] [in] using filename [, clear nolabel]
example:
use mpg using "c:/program files/stata11/auto", clear
This loads only the mpg
variable into Stata.
If you wish to inspect a dataset on
disk (to see variable names etc.) without loading it into memory you can use the second syntax
of Stata's describe command
describe
[varlist] using filename [, file_options]
example:
describe using "c:/program files/stata11/auto", varlist
return list
Also see:
help memory
Capture
(July 2009)
Controlling the unknown
Stata commands that result in an error issue a non-zero return code (_rc). In Stata 10 and Stata 11 the return codes can be seen in the Review window (you may need to expand the Review window to see the _rc column)
If a command in a do file produces an error the do file will stop. This can be prevented by prefixing the command with the capture command eg.
log close //example 1
capture log close //example 2
In the above example 1, a do file/program would stop running if there was no log file open. Stata requires a log file to be open before it can be closed and no other log file open before it can open a log file.
In example 2, a do file/program would continue to run even if there was no log file open. The capture command allows errors to be ignored.
Apart from preventing a do file/program from stopping, the capture command can also capture a command's return code in _rc. The return code (_rc) can then be used to make a decision in your do file/program.
Example
sysuse auto, clear
tostring mpg, replace //for the purposes of the example convert mpg to a string variable
describe
foreach v of varlist price-foreign {
capture confirm numeric variable `v'
display _rc //allows you to see the return code
if _rc { //if _rc is not 0 (zero) the condition is true and Stata runs the block
destring `v',replace
describe `v'
}
}
Also see:
http://www.stata.com/statalist/archive/200906/msg00623.html (An example of how to use a return code to set up the default directory in Stata.)
help confirm
help capture
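The same pattern works for files. The sketch below tests for a file and branches on the return code; the file name no_such_file.dta is made up, and 601 is the return code Stata gives when a file is not found.

```stata
capture confirm file "no_such_file.dta"
local rc = _rc
if `rc' == 0 {
    display "file exists"
}
else if `rc' == 601 {
    display "file not found, carrying on without it"
}
else {
    display "unexpected return code " `rc'
    exit `rc'
}
```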
Transparent Graphs
(June 2009)
Stata graphs can be made transparent in MS Word and other
software. For example the following graph was produced in Stata
and then made transparent in Word.
sysuse auto, clear
twoway ///
(histogram mpg if rep78==3, fcolor(green)) ///
(histogram mpg if rep78==4, fcolor(blue))
graph export c:/hist.wmf, replace
Then in Word 2003
Insert>Picture>from file and then c:/hist.wmf
Click on graph
Edit picture
Right Click on a bar that you wish to make transparent
Format AutoShape>Color and lines tab>Fill section and then move the transparency slider to 50% and press OK
Continue to edit all the bars this way. The legend can also be modified as per above
Save
Also see:
http://www.stata.com/statalist/archive/200904/msg00574.html
http://www.stata.com/statalist/archive/200904/msg00612.html
Getting Stata's Graph editor
commands into Stata graphs (May 2009)
Stata has a great graph editor. However, after you have modified
your graph the editor will not produce the normal Stata code for
this graph. However, it is possible to retrieve the editing
commands if they have been recorded using the Stata graph
editor recorder, adding gr_edit at the start of each
editor line and then adding this to the initial graph code. Now
you have the code to reproduce the graph.
Example:
Assume that you have run the following
sysuse auto, clear
histogram mpg
Then you clicked on the Start Graph Editor icon and pressed the Start
Recording icon, altered the color of the histogram bins,
stopped the recorder and saved the recording on the hard disk with
a suitable name and path. You then opened the recording (just saved) in
Stata's Do Editor.
The line:
plotregion1.plot1.style.editstyle area(shadestyle(color(gs7))) editcopy
was retrieved and gr_edit
added to the front of it.
The complete file would
look like:
sysuse auto, clear
histogram mpg
gr_edit plotregion1.plot1.style.editstyle area(shadestyle(color(gs7))) editcopy
This is run and will produce the original graph complete
with the edit.
Alternatively you could save the recording and include it
as follows:
sysuse auto, clear
histogram mpg, play(hist1) //hist1 is the name of the recording
Also see:
http://www.stata.com/statalist/archive/200807/msg00932.html
help graph play
Weaving Stata results into a
Word Report (April 2009)
It is possible
to put results from Stata into a word document by first obtaining
your data in Stata and then using mail merge to get this into
Word.
For example, if you wish to automate your
report writing and require the maximum and minimum mpg in a Word
report (using the auto.dta data set that comes with Stata), this
can be done with the following do file. The user written package
moremata is used, so this must first be installed. To install it, type
the following on the Stata command line:
ssc install moremata
Once installed, run the following Stata do file:
********************weaving do file*********************************
sysuse auto, clear
*determine max and min mpg
quietly: sum mpg
local max_mpg =r(max)
local min_mpg =r(min)
di `max_mpg' //only if required to see results in Stata
di `min_mpg' //only if required to see results in Stata
mata
a="max_mpg"\st_local("max_mpg")
a1="min_mpg"\st_local("min_mpg")
a2=a,a1
a2
mm_outsheet("c:/tips.txt", a2, mode="r")
end
********************weaving do file*********************************
After running
the above the text file tips.txt is produced (in C:/ drive)
Then in your MS Word report include the following:
The maximum value of mpg is: {MERGEFIELD "max_mpg"}
The
minimum value of mpg is: {MERGEFIELD "min_mpg"}
Open
the data source in Word and then run Mail Merge
After running mail merge your report should look like:
The maximum value of mpg is: 41
The minimum value of mpg
is: 12
Using this method you can include
tables, graphs etc. into your Word document.
References:
http://ideas.repec.org/p/boc/asug05/14.html
Also look at:
findit texdoc
findit esttab
findit estout
Stopping Stata during the
running of a do file (March 2009)
When
running a do file you may wish to inspect the data at various
points. Stata has a number of ways of doing this. For example:
Option 1:
Using the edit command.
Opens the data editor and allows you to inspect the data. When
the editor is closed the do file continues to run. (Instead of edit
you could have used browse to open the data browser)
sysuse auto, clear
regress mpg weight
edit //stops Stata and opens the data edit window
summarize
exit
Option 2:
Stopping Stata by using the more
command
sysuse auto, clear
regress mpg weight
more
summarize
exit
Option 3:
sleep stops Stata for a
specified number of milliseconds
sysuse auto, clear
regress mpg weight
sleep 1000 //sleep specifies the number of milliseconds to wait
beep //used to wake you up if the sleep is too long
summarize
exit
Option 4:
exit stops a do file. To
run more of the do file move the exit command down the do
file and run again.
sysuse auto, clear
regress mpg weight
local a 1
exit //the do file stops at this point; move exit to a later line and run again
display `a'
For more information
see:
help edit
help browse
help more
help exit
Putting Greek symbols in graphs
(February 2009)
Greek symbols (or other
symbols) can be added to Stata graphs. To add these you must
first set up your computer for this eg.
In Windows
XP
Click on the start button (bottom left hand side
of screen)
Click on the Control Panel
Click on
Regional and language option
Click on the Advanced
tab
Select Greek (or another language if you
require this)
then click Apply and then OK
(the computer will then be required to be restarted )
then in Stata:
using the pull down menu:
Edit>Preferences>Graph
Preferences
Then font select Arial Greek
To see the numbers used in the extended code you can use
the Nick Cox written graph:
asciiplot
(this is
a user written command and must first be downloaded)
To
download asciiplot type the following on the Stata command line
ssc install asciiplot
then for example, type:
scatter weight mpg, title( Example of Greek characters in a Graph `=char(238)' `=char(243)' `=char(236)' )
Or the Stata graphics Editor can be used to include Greek symbols
For more information see:
Data Management Manual: char(n)
For an article on char()
See http://www.statajournal.com/sjpdf.html?articlenum=dm0006
Doing things by levels of a
variable (January 2009)
Using Stata's bysort
prefix command
bysort is a Stata prefix command that allows you to
execute commands by levels/groups of the variable(s) that you
specify
Example:
If you wanted to generate a
new variable with a 1 at the first occurrence of each level of mpg
the following can be used (using the auto data set that comes
with Stata):
sysuse auto, clear //load the auto data set into Stata
bysort mpg: gen first=1 if _n==1
If you wanted to generate a new variable with
a 1 at the last occurrence of a level of mpg the following
can be used:
sysuse auto, clear //load the auto data set into Stata
bysort mpg: gen last=1 if _n==_N
Sorting within the group eg. if you wanted
the car with the smallest weight within each level of mpg
the following can be used:
bysort mpg (weight): gen first_low_weight=1 if _n==1
Note that the brackets around the weight variable
name indicate to Stata that this is not to be used as the
level/group criterion but that weight is to be sorted within
each level of mpg
For more information see:
Stata 10 Data Management manual
Online help
bysort
Online help for other prefix commands: help
prefix
help _n
help _N
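A simple way to check that a flag like this behaves as intended is to compare the number of flagged observations with the number of distinct levels of the grouping variable. A minimal sketch, with a made-up macro name flagged:

```stata
sysuse auto, clear
bysort mpg (weight): generate first_low_weight = 1 if _n == 1
* number of observations that were flagged
quietly count if first_low_weight == 1
local flagged = r(N)
* tabulate returns the number of distinct values of mpg in r(r)
quietly tabulate mpg
assert `flagged' == r(r)
```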
Automation of Tables in Stata
(December 2008)
A tutorial showing different options for the automatic
production of tables can be obtained by the following commands:
ssc install tabletutorial
to install and then
help tabletutorial
Memory usage in Stata (November
2008)
Stata generally stores all of the dataset that it is
working with, in the computer's memory. Therefore, the computer
should have sufficient RAM to load all of the data. Storing data
in memory allows fast access to the data. If the computer has
insufficient memory and the operating system allows, the data is
stored on the computer hard disk, however this can be very slow
ie. Stata uses virtual memory where the operating system allows
Stata assigns an amount of memory for itself so that it
can store the data in RAM, so whatever this is set to must be
sufficient to store the entire data set. The memory settings in
Stata can be changed to allow sufficient memory for the data set.
What is sufficient memory?
To determine this
the online calculator can be used.
A quick way
to determine the average width of the variables (in bytes) is as
follows:
(type the following on the command line or into a
do file:)
describe
display r(width)/r(k)
Then put this number (average variable width) into the online
calculator. The result from the online calculator is the minimum
memory required, so allow 30-50% more than this for additional
variables etc.
Then set the memory using the set
memory command eg. set memory 50m
Other
useful Stata memory commands are:
compress
memory
References
http://www.stata.com/statalist/archive/200507/msg00348.html
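The results returned by describe can also give a rough size for the data currently in memory, as sketched below. This is only an approximation under the assumption that size is about observations times bytes per observation; it ignores any overhead Stata adds.

```stata
sysuse auto, clear
quietly describe
display "observations:            " r(N)
display "bytes per observation:   " r(width)
display "average variable width:  " r(width)/r(k)
display "approx. data size (KB):  " r(N)*r(width)/1024
```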
Stata Comment (October
2008)
Stata has a number of ways of adding comments to
Stata code. Some of these are:
*
The star at the start of a line tells
Stata to ignore what follows eg.
*this is ignored
/* */
The /* */ are used to add
comments between code eg.
regress mpg /* weight is the independent variable */ weight
or /* */ can be
used to join two lines of code eg.
twoway (scatter mpg weight) /*
*/ (lfitci mpg weight)
///
Stata ignores what is after /// and
continues on the next line eg.
regress mpg /// dependent variable
weight
The #delimit ; command is useful in a number
of ways. One use is to comment out blocks of code/text eg.
#delimit ;
*
display "this is a comment"
display "this is a comment"
display "this is a comment"
*;
#delimit cr
di "this is the end"
exit
The lines between #delimit ; and #delimit cr
are ignored
For further information on comments
type help comments
For further information on
#delimit type help delimit
Stata user written graphs
(September 2008)
New graphs have been added.
Apart from the official Stata graphs many users have
written special graph commands and have made these available for
download.
To see just some of these click here.
More user written graphs will be added in the future.
To see how
these have been written (the code) use Stata's viewsource
command.
Stata tables (August
2008)
If you required a table of cell percentages you could:
sysuse auto, clear
svyset rep78
svy: tab rep78 for, per
An easier way is to use Philip
Ryan's user written command tab2way:
tab2way rep78 for, cellpct
This command also has lots of other options. To download
it type the following on the Stata commandline:
ssc install tab2way
Stata users have written many
commands for tables. To see a list of some of them type the
following on Stata command line (when online):
findit
tab table
Then to download click on the hyperlink and
follow the instructions
Sending Command(s) to the Stata Do
Editor from the Stata Review Window (July 2008)
While running Stata interactively, either with dialogue
boxes or from the command line the command(s) that you issue to
Stata are recorded in the Review Window. These commands can be
put directly into the Do Editor for rerunning a session of Stata
again, modifying the commands and rerunning or as a record of the
analysis.
Putting the contents or some of the contents of
the Review Window into the Do Editor can be done as follows:
In the Review Windows selecting the command(s) that you
wish to go into the Do editor by:
 Clicking on the command; if a single command is required
 If more than one command is required. Holding down the shift key and select the commands
 If all the commands that are currently in the Review Window are required then right click and press select all
Then: Right click the mouse button and select send to dofile editor
The Do Editor will then open with the highlighted command(s) in it. To run this, use the Do Editor pulldown menu and select: Tools>Do, or use the icon (in Stata 10 this is the icon on the far right), or save this file and run it from the command line eg. save this as c:/dofile and run it by typing do c:/dofile on the Stata commandline.
For more details on the Stata command type the following on the Stata commandline:
help do
Creating a Stata dataset from multiple
Excel worksheets (June 2008)
There are a
number of ways of doing this:
 odbc
 Stat/transfer
 Stata with the append command
In this example the Excel file is called book2 and is in c:/ drive. The file has two work sheets: kk1 and kk2
odbc
clear
tempfile kka
odbc load, dsn("Excel Files;DBQ=c:\book2.xls") table("kk1$")
save `kka'
list
clear
odbc load, dsn("Excel Files;DBQ=c:\book2.xls") table("kk2$")
list
append using `kka'
list
exit
Also see:
http://www.ats.ucla.edu:80/stat/stata/faq/odbc.htm
Using Stat/Transfer 9
With Stat/Transfer this would be done as follows:
Open tab: option 3
And then tick "concatenate worksheet pages"
Stata with the append command
Save each Excel worksheet as a csv in Excel. In this example c:/book2_kk1.csv and c:/book2_kk2.csv are the two files created
insheet using c:/book2_kk1.csv, clear
save c:/book2_kk1
list
clear
insheet using c:/book2_kk2.csv, clear
list
append using c:/book2_kk1
list
For more details on these Stata commands, type the following on the Stata command line:
help append
help insheet
catplot (May 2008)
Nick Cox has written a useful graph command (catplot) that graphs
categorical variables. This user written program can be
downloaded for free.
To download this:
On the
Stata commandline window type:
findit catplot
then click on the hyperlink
catplot from
http://fmwww.bc.edu/RePEc/bocode/c
and then follow
instructions
If the catplot command didn't
exist and you wanted to produce a bar plot of the frequencies of
the categories of rep78 then you would have to do something like:
sysuse auto, clear
tab rep78, g(z)
graph hbar (sum) z* , bargap(13) asc ///
yvaroptions(relabel(1 "1" 2 "2" 3 "3" 4 "4" 5 "5"))
With catplot this is made easier with the
following:
sysuse auto, clear
catplot hbar rep78
For more details on catplot see the online help: help catplot (once installed)
For other graphs that Nick Cox has written see:
http://www.ats.ucla.edu/stat/Stata/faq/graph/njcplot.htm
Stata Users' Group Meeting Proceedings
(April 2008)
Material documenting the
Stata Users' group meetings is worth looking through. It contains
articles on a large number of topics.
The list of
Stata Users' Group meetings can be found at:
http://www.stata.com/meeting/proceedings.html
For example if you're not sure what regular expressions
are, then have a look at:
http://ideas.repec.org/s/boc/wsug07.html
Or the
following may be of interest:
Panel data methods for
microeconometrics using Stata
http://repec.org/wcsug2007/cameronwcsug.pdf
Powerful new tools for time series analysis may be of
interest:
http://repec.org/nasug2007/StataTS07.beamer.7727.pdf
Interested in Stata and genetics? Then have a look at:
A brief introduction to genetic epidemiology using Stata
http://repec.org/usug2007/slides_nshephard.pdf
There is lots more.
Programming Stata - learning by
examples (March 2008)
Below are a number of do files that can be run in Stata, allowing you to see how
Stata programming works. By seeing the input and the output you
can learn some of the basics of the Stata programming language
(for the finer points refer to the Stata manuals). With
some programming, Stata can be used more efficiently, eg. by using
a macro rather than typing in the same thing again and again. To
use the tutorial:
Tutorial 1  do files
Tutorial 2  macros
Tutorial 3  loops
Tutorial 4  if statement
Tutorial 5  incrementing, _n, and _N
Tutorial 6  local extended macros
More tutorials will be added in the following weeks
Also see:
Stata 10 Users Guide
Stata 10 Programming Manual
The Stata Journal (2005) Nicholas J. Cox "Suggestions on Stata programming style" 5, Number 4, pp. 560-566
The Stata Journal (2002) Nicholas J. Cox "How to face lists with fortitude" 2, Number 2, pp. 202-222
Stata Netcourse NC151 "Introduction to Stata programming"
Stata Netcourse NC152 "Advanced Stata programming"
(Back issues of the Stata Journal can be purchased from Survey Design and Analysis - contact details below)
(To enroll in a Stata Netcourse please contact us)
Mata - learning by
examples (February 2008)
Mata is Stata's matrix programming language. The advantages of
Mata are that it is fast and that some problems are easier to
solve in Mata.
The Mata manuals are very useful
for learning Mata. To complement the manuals, some
Mata tutorials are attached.
The tutorials are a series of examples in a
do file. To use the tutorial:
Tutorial 1 Getting Data in Mata
Tutorial 2 Looping, If statement and examples
Tutorial 3 Subscripting matrices
Tutorial 4 string and numerical matrices, getting a mata matrix into Stata
Tutorial 5 Mata functions
Tutorial 6 Mata pointers and Mata optimize
Tutorial 7 Mata matrix maths and Solving simultaneous equation
Also see:
Stata 10 Mata manuals (the entire Mata manual can be found in Stata's online help for Mata, eg. help mata)
The Stata Journal (2007) William Gould "Mata Matters: Structures", 7, Number 4, pp. 556 – 570
The Stata Journal (2007) William Gould "Mata Matters: Subscripting", 7, Number 1, pp. 106 – 116
The Stata Journal (2006) William Gould "Mata Matters: Precision", 6, Number 4, pp. 550 – 560
The Stata Journal (2006) William Gould "Mata Matters: Interactive use", 6, Number 3, pp. 387 – 396
The Stata Journal (2006) William Gould "Mata Matters: Creating new variables–sounds boring, isn't", 6, Number 1, pp. 112 – 123
The Stata Journal (2005) William Gould "Mata Matters: Using views onto the data", 5, Number 4, pp. 567 – 573
The Stata Journal (2005) William Gould "Mata Matters: Translating Fortran", 5, Number 3, pp. 421 – 441
(Back issues of the Stata Journal can be purchased from Survey Design and Analysis - contact details below)
Stata's display
command (January 2008)
Stata's display command is useful for writing to the Stata
results window or for use as an online calculator.
The display command has features that allow various types of
output, along with tools to format and enhance them.
Controlling the color of the output
Example:
display as text "green" as error " red" as result " yellow" as input " white"
(text, error,
result and input are styles)
Controlling
where the text is placed
Example:
display _column(50) "column"
Including smcl (smcl is Stata's mark up and
control language)
Examples:
display "{center: this}"
display "{hline}"
Formatting
Example:
display %9.5f 9
Stata's system values (type creturn list to see
these)
Example:
display ("$S_DATE")
display c(current_date)
Link to Stata 11 PDF manual (New for Stata
11)
Handy for passing on a reference to a specific topic in the PDF
manuals
Input the following commands and then click on the
hyperlink in the Results windows.
Examples:
display in smcl "{manpage GSM 141} {hline 2} starting MAC"
display in smcl "{manlink R regress} {hline 2} Linear regression"
Also see:
Stata 10 programming manual display
Stata 10 programming manual smcl
Stata Journal Ryan,
Philip (2004) "Stata tip 4: Using display as an online
calculator", 4:1 Page 93.
In Mata: display()
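As a small sketch of the calculator use mentioned above (the numbers are arbitrary illustrations, not taken from the original tip), display evaluates an expression and prints the result, optionally with a format or mixed with text:

```stata
* display evaluates the expression and prints the result
display 2 + 3*4
* a numeric format can be placed before the expression
display %9.2f sqrt(2)
* strings and expressions can be mixed in one command
display "The mean of 3, 4 and 8 is " (3 + 4 + 8)/3
```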
Creating a binary
variable from a continuous variable
(December 2007)
One way of creating a binary
variable is to generate a new variable containing 0 and then
replace the contents of the variable with 1 based on a qualifier
eg.
generate dummy1=0
replace dummy1=1 if mpg <=25
The same variable can be created in one line:
generate dummy1= mpg <=25
This works because mpg <=25 is either true or false. A Stata expression evaluates to 1 if true and 0 if false.
If the variable in the expression contains missing values, add the condition if !missing() so that missing values stay missing rather than being coded 0, eg.
generate dummy1= mpg <=25 if !missing(mpg)
Other ways of creating dummy variables can be found at:Stata FAQ
Also see: What is true and false in Stata?
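The effect of missing values is easiest to see with a variable that has some. The sketch below is meant to be run from a do file and uses rep78 from the auto data, which has missing values; the variable names good_a and good_b are made up for illustration. Stata treats missing as larger than any number, so without the if condition the missing values are coded 1 here.

```stata
sysuse auto, clear
* missing is larger than any number, so missing rep78 is coded 1
generate byte good_a = rep78 >= 4
* with the if condition, missing rep78 stays missing
generate byte good_b = rep78 >= 4 if !missing(rep78)
* compare the two codings, including the missing category
tabulate good_a good_b, missing
```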
New subcommand for
listing user written commands (November
2007)
Many user written commands are stored in the
SSC (Statistical Software Components) archive. In the latest ado
update for Stata 10 a new subcommand has been added to ssc:
ssc whatshot
The syntax is: ssc whatshot [, n(#) author(name)]
Examples:
ssc whatshot
ssc whatshot, author(cox)
To get these commands you
need to update Stata. To do this with the pull down menu:
Help>Official Updates and then click
on www.stata.com. Then follow
instructions.
For more information on SSC type help
ssc on the Stata command line
Undocumented commands
(October 2007)
In addition to the commands found in the Stata manual there are also
undocumented commands that you may find useful. To see these type
help undocumented.
A command that you may find useful is: twoway__histogram_gen
This command generates the coordinates of the bars in a histogram. An example of
how it works is:
sysuse auto, clear
twoway__histogram_gen mpg , fraction gen(h x)
l h x in 1/20
twoway (scatter h x) (histogram mpg, fraction)
tab x h
exit
Another command that you may find useful is matalabel
This command generates 3 matrices in
Mata, one for each of: value label name, value and label
An example of how it works is:
sysuse auto, clear
matalabel , generate("a" "b" "c")
mata
a
b
c
mata describe
vallab=(a,c)
vallab
b
end
User written program 
examples (September 2007)
When learning new commands in Stata it is often useful to
have examples of how the syntax is applied. Stata's documentation
includes many examples and allows you to download the data sets for
these (File/Example datasets), so that you can reproduce the
results. Stata's online help also includes many examples.
Another useful source of examples is Nick Cox's user written program examples
An example of some of what you get by typing examples egen:
Setup
. sysuse auto, clear
Create highrep78 containing the value of rep78 if rep78 is equal to 3, 4, or 5, otherwise highrep78 contains missing (.)
. egen highrep78 = anyvalue(rep78), v(3/5)
List the result
. list rep78 highrep78
To see a description of examples type the following on the Stata command line when online
ssc describe examples
To install examples, type the following on the Stata command line when online
ssc install examples
Settings for Stata (August 2007)
Various
features of Stata can be set to individual preferences or changed
to meet the requirements for a particular analysis.
To see what can be set, type query on the Stata command line
Amongst the things that can be set (in Stata 10) is
whether you would like graphs tabbed in the graph window
or each open graph in a separate graph window.
The syntax for this command is:
set autotabgraphs on, permanently
Other set commands that you are likely to use are:
set more off
set memory
For more information see the Stata 10
reference manuals
[R] query - Display system parameters
[R] set - Overview of system parameters
Copy as picture 
Copying from the results windows to Word and Excel (July 2007)
Stata 10 has a
copy feature that allows you to copy highlighted parts of the
results windows to Word, Excel and other packages, as a picture.
To use this, highlight what you want copied in the results
window, right click the mouse button and click on to "Copy as
Picture". Then paste into another package. In the other package
this can usually be cropped and edited in the normal way.
Estout - Stata
Regression Tables (June 2007)
Estout is a useful user written command for outputting
regression results in various forms. For more information
see the estout web site.
Adoupdate (May 2007)
The commands
under update are useful for keeping Stata's executable
file and the official Stata ado files up to date (see help
update). However, these do not keep user written ado files
(user written programs that you have downloaded) up to date. To
ensure that you are working with the latest version of a user
written ado file, type adoupdate on the Stata command line
or use the pulldown menu help/SJ and user written programs and
then click on update. (You must be online to use this command.)
For more information on the adoupdate command see help
adoupdate.
Also see help update
Nested Do file (April 2007)
Stata allows
you to break up your analysis into logical sections, each part
being a separate do file, with all the parts of the analysis
called from one do file, eg.
**master*************
* the do files below are called from a do file that you have named
* master.do (it can be given any other name)
.
.
.
do projA_data
do projA_error_checking
run projA_data_man
if M1==2 {
do projA_A1 // projA_A1 exits, finishes analysis
}
do projA_results
exit
**master*************
In fact Stata allows nesting up to a depth of 64.
eg. a do file calls another do file which calls another do file;
up to 64 times.
Nesting do files has some
advantages:
 Being able to reuse do files (that you have previously
used and know to have no bugs) for other projects
 Stata doesn't have a "goto line X" command. However, if
you break down your analysis into do files the same thing can be
achieved.
 Allows a quick overall view of the analysis
 Smaller do files are easier to debug than large do files
 Some do files can be executed with run (no output to the
screen) and others with do (output to the screen).
This is easier than using Stata's quietly command
Disadvantages:
 More files to manage
To learn more see: Stata 9 Users guide [U] 16.2 and [U] 16.6.2
For the maximum depth of nested do files, type help limits in Stata
Personal help file (March 2007)
Stata comes
with help files for its commands. However, you may wish to
compile a list of frequently used but hard to remember commands
in your personal help file.
Your own help file is
saved as a file with the hlp extension, eg. me.hlp, on the adopath
An example of a help file is as follows:
{smcl}
{* 03may2005}{...}
{cmd:help Joe Blow } {right:updated 1 March 2007}
{hline}
{title:Wildcards and symbols}
{p2col :{helpb comments:comments} *, ?, ///, etc.}{p_end}
{hi:Wildcard zero or more} * or ~
{hi:Wildcard one character} ?
{hi:Continue onto next line} ///
{hi:Comment out line} * at beginning or /// midline
To learn more
about smcl see the Stata Users Guide or look at a Stata help file
(.hlp extension) in the do file editor.
spmap  Visualization
of Spatial Data (February 2007)
spmap is a user written command that can be
downloaded for free.
To download:
Make sure
that you are online.
Type findit spmap
Then
click on the hyperlink.
Once installed type help
spmap to see the help file. At the bottom of the help file there
are examples of what can be done. Click the hyperlink to see the
graphs.
stcmd  Using
Stat/Transfer within Stata (January
2007)
The user written command stcmd can be
used within Stata to change the data format of data sets stored
on disk. stcmd uses Stat/Transfer to do this.
To use this command you must first have Stat/Transfer and stcmd
installed.
To get Stat/Transfer contact Survey Design and
Analysis Services (details below).
To get stcmd type
findit stcmd in the Stata Command window and follow the
instructions to install the program.
Examples
Using stcmd to convert a Stata data set to Excel
stcmd "c:\Program Files\Stata9\auto.dta" c:/auto.xls, replace
Using stcmd to convert a Stata data set to SPSS
stcmd "c:\Program Files\Stata9\auto.dta" c:/auto.sav
Using stcmd to convert many files from Excel to Stata
stcmd mat*.xls *.dta
For more
information see help stcmd (stcmd must first be installed)
Also see fdasave for another way of converting Stata
data to a SAS format
encode (December 2006)
encode
is a useful command for converting strings to numbers. encode
does this in alphabetical order, eg.
With the
following dataset
var1
a
b
c
encode var1, gen(var1a)
gives
Var1   Var1a
a      1
b      2
c      3
(Note: when Stata encodes it produces a value label: to see this type label list )
If this is not the encoding that you require, a way around this is to define a value label first and then use the label option of encode.
Eg.
If you have:
var1
a
b
c
But would like var1a encoded as:
var1a
3
1
2
You would first define the value label eg.
label define preference1 3 a 1 b 2 c
And then apply this using the encode command
encode var1, label(preference1) gen(var1a)
Resulting in:
Var1   Var1a
a      3
b      1
c      2
The code to run the above:
clear
input str1 var1
a
b
c
end
label define preference1 3 a 1 b 2 c
encode var1, label(preference1) gen(var1a)
label list
list, nolab
For more information see:
Stata 9 Data Management manual
kdensity (November 2006)
One of the
problems with combining a number of histograms is that, generally
where there are more than 3, the graph becomes unreadable. kdensity
may be a solution to this problem.
sysuse auto, clear
twoway (kdensity mpg if rep78==1, color(red)) ///
(kdensity mpg if rep78==2, color(blue)) ///
(kdensity mpg if rep78==3, color(black)) ///
(kdensity mpg if rep78==4, width(1) color(green)) ///
(kdensity mpg if rep78==5, color(purple)), ///
legend(label(1 "mpg at rep78=1") ///
label(2 "mpg at rep78=2") ///
label(3 "mpg at rep78=3") ///
label(4 "mpg at rep78=4") ///
label(5 "mpg at rep78=5"))
For more
information see:
Stata 9 graphics manual
A Visual Guide to Stata Graphics by Michael Mitchell
Stata Journal
Vol 3 No. 2
Intermediate graph
commands (October 2006)
Graphs cannot always be combined, even with the addplot
option. However, you can still get combined graphs by using the
twoway pci, twoway scatteri and twoway pcarrow commands. For example, if
you wished to add a box plot to a scatter plot, this could be
achieved with the aid of the pci command and a twoway scatter.
sysuse auto, clear
qui sum mpg, detail
local a=r(p25)
local b=r(p75)
local c=r(p50)
local uav=`b'-1.5*(`a'-`b')
local lav=`a'+1.5*(`a'-`b')
twoway (scatter mpg weight) ///
(pci `a' 3000 `b' 3000, lcolor(red)) ///
(pci `a' 3400 `b' 3400, lcolor(red)) ///
(pci `a' 3000 `a' 3400, lcolor(red)) ///
(pci `b' 3000 `b' 3400, lcolor(red)) ///
(pci `c' 3000 `c' 3400, lcolor(red)) ///
(pci `uav' 3200 `b' 3200, lcolor(red)) ///
(pci `uav' 3150 `uav' 3250, lcolor(red)) ///
(pci `lav' 3200 `a' 3200, lcolor(red)) ///
(pci `lav' 3150 `lav' 3250, lcolor(red)) ///
(scatteri `c' 3450 "Median", mlabangle(45) mlabsize(8)) ///
, legend(off)
For more information see:
Stata tip 21 SJ 5 No. 2 pp. 282-284
Stata 9 graphics manual
MATA (September 2006)
Mata is
Stata 9's new matrix programming language.
If you haven't
had a look at Mata yet, then here are some examples of what you
can use it for:
Example 1
Sorting rows
in alphabetical order (statalist-digest V4 #2451)
(the user
written program moremata must first be installed)
clear
input str20 x1 str20 x2 str20 x3
"massagli,mark" "wood,j." "dessent,harold"
"beletz,elaine" "carter,annie" "curtis,barbara"
"bradshaw,joe" "brown,arnold" "dunaway,lowell"
"schneider,mark" "mullins,bobby" "sump,lawrence"
end
list
tempfile foo
mata
C = J(3,1,"") // creates a new vector
A = st_sdata(.,.)' // a transposed copy of the data in Stata
for (i = 1; i <= cols(A); i++) {
A = sort(A,i)
C = C,A[.,i]
}
C = C[.,(2::cols(A)+1)]'
mm_outsheet("`foo'",C, mode="r")
end
insheet using `foo', clear tab
list
Example 2
xpose using mata (statalist-digest
V4 #2328)
(the user written program moremata must first be
installed)
clear
tempfile tmp1
input str16 v1 str2 v2 str2 v3 str2 v4
"Sex" M M M
"Age" 47 66 56
"Left eye"
"Right eye" Y Y Y
"Lower eyelid" Y Y Y
"Upper eyelid"
"Lateral canthus"
"Medial canthus" Y Y
"Recurrent lesion"
"Primary lesion" Y Y Y
end
list
mata
A = st_sdata(.,.)'
mm_outsheet("`tmp1'",A, mode="r")
end
insheet using "`tmp1'", clear
l, ab(15) noobs
Mata can do much more. To learn
more see:
Translating Fortran, SJ 5(3), 3rd quarter 2005, 421-441
Using views onto data, SJ 5(4), 4th quarter 2005, 567-573
Creating new variables (Sounds boring, isn't), SJ 6(1), 1st quarter 2006, 112-123
Interactive use, SJ 6(3), 3rd quarter 2006, 387-396
Various responses on Statalist
Stata 9 Mata Reference Manual
numlabel (August 2006)
numlabel
is a command that prefixes numeric values to value labels.
numlabel , add // adds the numeric prefix to the value labels
For more information see help numlabel and the Stata
9 Data Management Manual
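A short before-and-after sketch, assuming the auto data shipped with Stata (where the value label attached to foreign is called origin) and run from a do file:

```stata
sysuse auto, clear
* categories are shown as Domestic and Foreign
tabulate foreign
* prefix the numeric values to the labels in the value label origin
numlabel origin, add
* categories are now shown as 0. Domestic and 1. Foreign
tabulate foreign
* take the prefixes off again
numlabel origin, remove
```

Typing numlabel, add without naming a value label applies the prefix to every value label in the dataset.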
viewsource (July 2006)
viewsource is
a command that allows a file located on the adopath to be viewed
in the Stata viewer.
Example: To view the code for the t test, type viewsource ttest.ado
For
more information see help viewsource and the Stata 9
Programming Manual
datasignature  Determine
whether data have changed (June
2006)
If you have updated Stata 9 to the latest
update (17 May 2006) you will find that a new Stata command has
been added: datasignature. (To find out what has been added with
the update, use the pulldown menu Help>what's new or type
whatsnew on the Stata command line.)
datasignature gives a signature based on the following:
1. The number of observations and number of variables in the data.
2. The values of the variables.
3. The names of the variables.
4. The order in which the variables occur in the dataset if
varlist is not specified, or in varlist if it is.
5. The storage formats of the individual variables.
Examples of interactive use
1. Checking against a previous datasignature to see if the data have changed.
2. Checking that you are working with the same dataset as your colleagues.
For more information see help datasignature
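The first interactive use can be sketched as below. This assumes the signature is returned in r(datasignature), as it is in recent versions of Stata, and the local macro name before is made up for illustration; run it from a do file.

```stata
sysuse auto, clear
* compute the signature and keep a copy of it
datasignature
local before "`r(datasignature)'"
* change one value, then compute the signature again
replace mpg = mpg + 1 in 1
datasignature
* a different signature means the data are no longer the same
if "`r(datasignature)'" != "`before'" {
    display as text "the data have changed since the first signature"
}
```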
Simple Thematic Mapping(May
2006)
tmap is a user written Stata program that
allows you to map your data.
For more information on using
tmap see:
FAQ
Stata Journal Vol 4 No 4
Some shape data sources for Australia:
AEC
VDS Technologies
Maps based on postcode can be purchased.
I mapped the Victoria electoral map using the following
for actual population. Other maps can be generated by adding your
own data and then mapping this.
*start do file
clear
cd "C:\ASTATA INFO\learning\tmap" // where the data has been downloaded to
set matsize 3000
mif2dta VIC20030129_elb, genid(id)
use VIC20030129_elb-Database
describe
tmap choropleth actual, id(id) map("VIC20030129_elb-Coordinates.dta") palette(Reds)
exit
data downloaded from
http://www.aec.gov.au/_content/Who/profiles/gis/gis_datadownload.htm
*end do file
Separate
(April 2006)
A useful Stata command for creating
separate new variables based on either an expression or a
variable is separate
Eg.
separate mpg, by(mpg>20)
will create 2 new variables, one holding mpg where
mpg<=20 and the other holding mpg where mpg>20
Other examples are:
separate mpg, by(mpg)
separate mpg, by(mpg) gen(MPG)
For more information on the
separate command see the Stata 9 Data Management manual or online
by typing help separate.
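A common use of separate is to give each group its own marker in a graph. The sketch below assumes the auto data; separating mpg by foreign (coded 0 and 1) creates the variables mpg0 and mpg1.

```stata
sysuse auto, clear
* creates mpg0 (domestic cars) and mpg1 (foreign cars)
separate mpg, by(foreign)
describe mpg0 mpg1
* each new variable is plotted as its own series
twoway (scatter mpg0 weight) (scatter mpg1 weight)
```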
Macro Expressions (March 2006)
Rather than
typing
local a=r(N)
forvalues i = 1/`a' {
}
the above code can be reduced to
forvalues i = 1/`= r(N)' {
}
For more information on macro expressions see the Stata 9 Users
guide [U] 18.3.8
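A runnable sketch of the same idea, assuming the auto data. The min() call is only there to keep the output short; the expression inside `= ' is evaluated once, when the forvalues line is read.

```stata
sysuse auto, clear
* summarize leaves the number of observations in r(N)
quietly summarize mpg
* loop over the first 3 observations (or fewer if the data are smaller)
forvalues i = 1/`=min(r(N), 3)' {
    display "observation `i' has mpg = " mpg[`i']
}
```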
inlist() & inrange()
functions (February 2006)
Stata has many functions that make using Stata easier. Eg.
count if mpg==22 | mpg==25 | mpg==34 | mpg==45
can be written as:
count if inlist(mpg,22,25,34,45)
Another function is inrange() eg.
count if inrange(mpg,23,34)
Instead of
count if mpg>=23 & mpg<=34
These functions can be used after the if
qualifier with commands such as generate, list, summarize etc.,
or after assert.
Examples:
assert inlist(mpg,22,25,34,45)
generate mpg1=mpg if inlist(mpg,22,25,34,45)
list mpg if inlist(mpg,22,25,34,45)
list mpg if inlist(mpg,22,25,34,45) | inlist(mpg,15,26,35,55) // use 2
inlist functions when the list exceeds the max. allowed for 1
function
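That the function forms and the long forms select the same observations can be checked directly. This sketch assumes the auto data; the local macro names n1 and n2 are made up for illustration.

```stata
sysuse auto, clear
* inlist() against the equivalent chain of == and |
count if inlist(rep78, 1, 2)
local n1 = r(N)
count if rep78 == 1 | rep78 == 2
assert r(N) == `n1'
* inrange() against the equivalent pair of inequalities
count if inrange(mpg, 20, 25)
local n2 = r(N)
count if mpg >= 20 & mpg <= 25
assert r(N) == `n2'
* inlist() also accepts string arguments
list make if inlist(make, "Volvo 260", "VW Rabbit")
```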
set trace on (January 2006)
Placing set trace on before a command and set trace off after it shows how Stata expands and executes the command, eg.
set trace on
local all `"`all' `"`=`v'[`i']'"'"'
set trace off
with our data a section of the trace will look like this:
- local all `"`all' `"`=`v'[`i']'"'"'
= local all `" `"Volvo 260"' `"11995"' `"17"' `"5"'"'
- set trace off
The first line is the line being executed. It has a - in front of it to indicate it is being executed. The second line is after macro substitution has occurred. It has a = in front of it to indicate that substitution has occurred.
See also the user written command tr. To install this command: ssc install tr
For more information on trace see the Stata 9 programming manual. Also see the Stata command pause.
Dofile Editor(December 2005)
When typing
commands in the Stata Do Editor, individual commands or a
selection of commands can be run by highlighting the section that
you would like to run and then pressing the do icon. This allows
you to try out your file section by section.
Regular Expressions(November 2005)
Regular expressions allow the matching of complex text patterns.
Regular expression commands have been included in Stata 9 with
the commands:
regexm - regular expression match
regexs - return nth subexpression from match
regexr - replace match expression with new string
For example
In the following example you wish to have
the day as a separate variable in the following data set:
clear
input str25 date
"12jan2003"
"1april1995"
"17may1977"
"02september2000"
end
list
The following could be used:
gen day=regexs(1) if regexm(date, "(^[0-9]+)")
breaking this down:
regexm: match expression
^: start at the beginning (LHS) of the string
[0-9]: the first character to be any number 0 to 9
+: one or more of the previous ie. characters between 0 and 9 (stops when a letter comes up eg. j of jan)
( ): the brackets indicate the subexpression. In this case there is only one group hence regexs uses 1
regexs(): returns subexpression ie. first subexpression
The output from the regular expression:
. gen day=regexs(1) if regexm(date, "(^[0-9]+)")
. list
     +-----------------------+
     |            date   day |
     |-----------------------|
  1. |       12jan2003    12 |
  2. |      1april1995     1 |
  3. |       17may1977    17 |
  4. | 02september2000    02 |
     +-----------------------+
Another example:
We have some text that includes citations. We wish to create a new variable that contains the text of the last citation. In this case the last citation is not at the end of the text so it is useful to reverse the text and then look for the desired pattern.
clear
input id str200 cit_1
1 "EP696218-A  WO9215370-A SUND _SUNDIndividual_"
2 "WO9425112-A  GB298635-A"
3 "EP578126-A  CH180906-A AGE_OK"
4 "EP562128-A  DE1684639-A"
5 "WO9318277-A  DK137935-B"
6 "US4434855-A SEC OF NAVY _USNA_"
end
list
gen kk1=reverse(regexs(1)) if regexm(reverse(cit_1), "([A-Z][-][0-9]+[A-Z]+)")
list
A FAQ on regular expressions is also available.
Text Editors(October 2005)
The text editor that
comes with Stata is fine for small programs. However, as the size
of the program increases other text editors are often used to
make programming easier. For a discussion on various text editors
go here
The function sum() (September 2005)
sum(x) returns the running sum of x. A basic use of sum() would be: generate running_tot=sum(1)
Another example of the use of sum() is: given the data
below you need to create a new
variable that starts with
zero, goes back to zero for changes in id and increases by 1 for
changes in var2.
id var2
1 7
1 7
1 7
1 7
1 7
1 7
1 8
1 8
2 8
2 8
2 1
bysort id: gen running_tot=sum(var2[_n]!=var2[_n-1] & _n>1)
Further information can be found by typing help sum() on the Stata command line
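The counter described above can be tried on a cut-down version of the data. In this sketch the variable names order and changes are made up for illustration; order records the original row order so that bysort keeps it within each id, and the _n > 1 condition keeps the first observation of each id at zero.

```stata
clear
input id var2
1 7
1 7
1 8
2 8
2 1
end
* remember the original row order so that the sort within id is fixed
generate long order = _n
* add 1 each time var2 differs from the previous row within the same id
bysort id (order): generate changes = sum(var2 != var2[_n-1] & _n > 1)
* changes is 0, 0, 1 for id 1 and 0, 1 for id 2
list, sepby(id)
```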
clonevar (July 2005)
Stata 9 has a useful command that generates an exact copy of an existing variable.
eg. clonevar MPG=mpg
For more information see help clonevar
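The difference from generate shows up in describe. A sketch assuming the auto data, with the made-up names origin2 and origin3: the clone keeps the storage type, display format, value label and variable label of foreign, while the generated copy holds only the values.

```stata
sysuse auto, clear
* exact copy: storage type, format, value label and variable label
clonevar origin2 = foreign
* values only: default storage type, no labels
generate origin3 = foreign
describe foreign origin2 origin3
```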
Docking Stata 9 Windows (June 2005)
The UCLA site has much
useful information on using Stata. If you are new to Stata 9 then
the movie on docking windows will be useful. To see this go here
Getting the path and file
name onto the Stata command line (March
2005)
Stata 8 has a handy way of getting file names
complete with the path onto the command line. Rather than typing
folders, sub folders and file name, use the pull down menu
File/Filename and click on the required file; the path and file
name will be shown on the command line, enclosed in quotation
marks. This is particularly handy when the path consists of many
sub directories with long names. You can then add commands such
as cd or use to the command line.
Tabout - a user
written command (February 2005)
tabout produces publication quality tables from Stata, with the
output exported to a text file. It can be exported as
tab-delimited, html code or LaTeX/TeX code. tabout provides
extensive user control over formatting of data and labels and
generates table headers automatically. To install it, type:
ssc install tabout
(or: ssc install tabout, replace).
To make learning the syntax easy, an example file which can be used as a tutorial is available here
window command (February 2005)
The window command can be useful for adding your frequently used commands to the pull down menu, pushing commands to the Review window, displaying the current file name in the top left hand corner of the Stata window, and a lot more.
To have your current file name displayed on the Stata window you can add the following to your do file:
window manage maintitle "`c(filename)'"
See your programming manual for further details on the window
command
ds - Describing Variables and Saving Results (January 2004)
ds lists the variable names of the dataset currently in
memory in a compact form. The command is useful if you require a
list of variables that satisfies certain criteria. The list that
results is saved in r(varlist) which can be used in other
commands eg.
(Using the auto dataset supplied with Stata)
use "c:/stata8/auto.dta", clear
ds, not(vall origin)
list `r(varlist)'
ds m*
list `r(varlist)'
See describe in the Stata reference manual for more
details.
Also see statalist-digest V4 #1701 & #1607
WORKING IN ROWS (December 2004)
The egen command has a number of functions that make it easier to work with data in rows. Rather than using xpose or reshape to convert the data to columns these commands may be able to be used.
Egen's row functions:
rfirst(varlist)
may not be combined with by. It gives the first nonmissing value in varlist for each observation (row). If all values in varlist are missing for an observation, newvar is set to missing.
rlast(varlist)
may not be combined with by. It gives the last nonmissing value in varlist for each observation (row). If all values in varlist are missing for an observation, newvar is set to missing.
rmax(varlist)
may not be combined with by. It gives the maximum value (ignoring missing values) in varlist for each observation (row). If all values in varlist are missing for an observation, newvar is set to missing.
rmean(varlist)
may not be combined with by. It creates the (row) means of the variables in varlist, ignoring missing values; for example, if three variables are specified and, in some observations, one of the variables is missing, in those observations newvar will contain the mean of the two variables that do exist. Other observations will contain the mean of all three variables. Where none of the variables exist, newvar is set to missing.
rmin(varlist)
may not be combined with by. It gives the minimum value in varlist for each observation (row). If all values in varlist are missing for an observation, newvar is set to missing.
rmiss(varlist)
may not be combined with by. It gives the number of missing variables in varlist for each observation (row). String variables, if specified, are counted as containing missing when their value is ""; numeric variables are counted as containing missing when their value is system missing (.) or extended missing (.a, ..., .z).
robs(varlist) [, strok]
may not be combined with by. It gives the number of nonmissing values in varlist for each observation (row); this is the value used by rmean() for the denominator in the mean calculation. String variables may not be specified unless option strok is also specified. If strok is specified, string variables will be counted as containing missing values when they contain ""; numeric variables will be counted as containing missing when their value is ., as usual.
rsd(varlist)
may not be combined with by. It creates the (row) standard deviations of the variables in varlist, ignoring missing values. Also see rmean().
rsum(varlist)
may not be combined with by. It creates the (row) sum of the variables in varlist, treating missing as 0.
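A small sketch of three of these functions on made-up data. From Stata 9 onwards the functions are named rowmean(), rownonmiss() and rowtotal() (the new names for rmean(), robs() and rsum()), and those names are used here; the variable names x1 to x3, xmean, xnobs and xsum are illustrative only.

```stata
clear
input x1 x2 x3
1 2 3
4 . 6
. . .
end
* row mean, ignoring missing values
egen xmean = rowmean(x1 x2 x3)
* number of nonmissing values in each row
egen xnobs = rownonmiss(x1 x2 x3)
* row sum, treating missing as 0
egen xsum = rowtotal(x1 x2 x3)
* row 2: mean 5, 2 nonmissing, sum 10; row 3: mean missing, 0 nonmissing, sum 0
list
```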
A REMINDER TO START A LOG (November 2004)
Would you like to be reminded to start a log each time that you start Stata?
One way of doing this is to include the command below in your profile.do file
db log
For information on profile see the GETTING STARTED MANUAL - More on starting and stopping Stata
Version Control (October 2004)
PROBLEM: Stata is continually being improved, meaning programs and dofiles written for older versions might stop working.
SOLUTION: Specify the version of Stata you are using at the top of programs and dofiles that you write:
----- myprog.do -----
version 8.2
use mydata, clear
regress ......
----- myprog.do -----
----- example.ado -----
program myprog
version 8.2
...
end
----- example.ado -----
For further information see the Stata programming manual
Assert (September 2004)
Assert is a useful command for verifying your data, e.g.
assert sex=="Male" | sex=="Female"
assert mpg<50 & mpg>10
Also see the Stata reference manual for further information.
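In a do file a failed assert stops execution. One way to carry on and look at the offending observations, sketched here on the auto data with an arbitrary cut-off of 30, is to put capture in front of assert and check the return code _rc afterwards:

```stata
sysuse auto, clear
* passes silently: foreign is always 0 or 1
assert inlist(foreign, 0, 1)
* capture suppresses the error and leaves the return code in _rc
capture assert mpg < 30
if _rc {
    display as error "some cars have mpg of 30 or more"
    list make mpg if !(mpg < 30)
}
```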