Tips for using Stata

This document describes some tips to help you use Stata more efficiently. We will keep adding tips to the top of our home page to encourage you to visit it each month! We will move the monthly tips to the bottom of this page when we place new tips on our home page. Our bookshop has several publications to assist in learning Stata data management and analyses.

Options
Edit and browse
Do Editor
Folders
Statalist
The Stata Journal and Stata Technical Bulletin (STB)
Useful links

The Stata resources web page is worth a look. It has links to free downloadable tutorials etc.
http://www.stata.com/links/resources1.html

The UCLA graphics page using Stata may also be of interest:
http://www.ats.ucla.edu/stat/stata/Library/GraphExamples/default.htm

Setting up docked windows (great for learning to set up Stata 9 windows):
http://www.ats.ucla.edu/stat/stata/faq/stata9gui/dockfloatpin.html

These are tutorials for learning Stata:
http://data.princeton.edu/stata/
http://dss.princeton.edu/online_help/stats_packages/stata/

Working with HILDA data (PanelWhiz):
http://www.panelwhiz.eu

Stata programming:
http://www.stata.com/meeting/11uk/baum.pdf

Tips from our home page
The following tips were initially presented on our home page.
Stata Tips
contract - the table

    sysuse auto, clear
    bysort for mpg : gen index=1 if _n==1
    bysort for mpg : gen freq=_N
    edit for mpg freq if index==1

    //Or reducing this further to:
    sysuse auto, clear
    bysort for mpg : gen freq=_N if _n==1
    edit for mpg freq if !missing(freq)

The result:

    Foreign     mpg   freq
    Domestic     12      2
    Domestic     14      5
    Domestic     15      2
    Domestic     16      4
    Domestic     17      2
    Domestic     18      7
    Domestic     19      8
    Domestic     20      3
    Domestic     21      3
    Domestic     22      5
    Domestic     24      3
    Domestic     25      1
    Domestic     26      2
    Domestic     28      2
    Domestic     29      1
    Domestic     30      1
    Domestic     34      1
    Foreign      14      1
    Foreign      17      2
    Foreign      18      2
    Foreign      21      2
    Foreign      23      3
    Foreign      24      1
    Foreign      25      4
    Foreign      26      1
    Foreign      28      1
    Foreign      30      1
    Foreign      31      1
    Foreign      35      2
    Foreign      41      1

The equivalent contract command:

    contract mpg for

A two-way table can even be produced in the editor.

Example 2

    sysuse auto, clear
    set more off
    bysort mpg for: gen freq=_N
    separate freq, by(for)
    bysort mpg (freq0): replace freq0=freq0[1]
    bysort mpg (freq1): replace freq1=freq1[1]
    bysort mpg : gen index1=1 if _n==1
    rename freq0 domestic_cars
    rename freq1 foreign_cars
    edit mpg domestic_cars foreign_cars if index1==1

The result:

    mpg   domestic   foreign
     12       2
     14       5         1
     15       2
     16       4
     17       2         2
     18       7         2
     19       8
     20       3
     21       3         2
     22       5
     23                 3
     24       3         1
     25       1         4
     26       2         1
     28       2         1
     29       1
     30       1         1
     31                 1
     34       1
     35                 2
     41                 1

The equivalent table command:

    tabulate mpg for

For further help: help contract, help bysort, help edit
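As a quick check that the by-hand counts and contract agree, here is a minimal sketch. It assumes the auto dataset shipped with Stata; the temporary file name counts is only illustrative. contract stores its count in a variable named _freq by default, so the two can be compared after merging the contracted table back onto the full data.

```stata
sysuse auto, clear
// frequency of each foreign-by-mpg combination, computed by hand
bysort foreign mpg: generate freq = _N

// contract reduces the data to one row per combination, with the count in _freq
preserve
contract foreign mpg
tempfile counts
save `counts'
restore

// attach the contract counts to every observation and compare
merge m:1 foreign mpg using `counts', nogenerate
assert freq == _freq
```

If the assert passes silently, the two approaches give the same frequencies.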
The Stata cond() function
    //generate data
    clear
    set obs 15
    generate a=_n
    generate b=cond(a>5,a,5)
    list

Example 2
There are of course many other ways of doing this:

    //(1)
    generate c=5
    replace c=a if a>5
    list
    //(2)
    generate c1=(a>5)*a
    replace c1=5 if a<6
    list
    //(3)
    generate c2=5
    replace c2=a if inrange(a,5,.)
    list
    //(4)
    // This option does the job in one line but is not as easy to understand as
    // the version using the cond() function
    gen c3=(a>5)*a + (a<=5)*5
    list
    //(5)
    // This can also be done with the max() function, e.g.
    generate c4=max(a,5)
    list

Example 3
However, where the variable a contains missing values, the results from cond() are different from max():

    // generate data
    clear
    set obs 11
    generate a=_n if _n<10
    generate b=cond(a>5,a,5)
    generate c=max(a,5) if !missing(a)
    list
    exit

Example 4
Using cond() as a spell checker

    //generating the data
    clear
    input str10 a
    "thsi "
    "that "
    "the"
    "tree"
    "these"
    end
    list
    gen b=cond(trim(a)=="thsi","this",trim(a))
    //The alternative is to use:
    gen c=a
    replace c="this" if trim(a)=="thsi"
    list
    exit

Example 5
Using cond() to categorise a variable into negative, zero and positive.

    //generate the data
    clear
    input a
    -2
    -1
    0
    1
    2
    end
    list
    generate b=cond(a>=0, a!=0, -1)
    list
    //or using nested cond() functions
    generate c= ///
        cond(a>0, 1,    /// greater than zero
        cond(a==0, 0,   /// equal to zero
        cond(a<0, -1, . /// less than zero
        )))
    list

Example 6
An interesting way of dealing with missing values (as seen on Statalist)

    generate avprice = (total - cond(missing(price), 0, price)) / cond(missing(price), n, n - 1)

Example 7
Another example seen on Statalist, this time for dropping duplicates

    quiet bysort hhid hhsize: gen dupobs=cond(_N==1,0,_n)
    drop if dupobs>1
    // Probably a better way of dealing with this is:
    quiet bysort hhid hhsize: keep if _n==1

Example 8
From the Stata Press book "An Introduction to Stata Programming" by Christopher F. Baum, Section 3.3.2

    generate netmarr2x=cond(marr/divr>2.0, 1, 2)
    // The above is OK but could also be replaced with:
    gen netmarr2x=(marr/divr<=2.0)+1

Example 9
From Statalist

    clear
    set obs 10
    generate x=_n if _n<8
    list
    generate z=cond(x>5,1,0,.)
    list
    // the above gives 1 where x is missing, because a missing value is always
    // greater than any number. However, you may wish to have missing where a
    // value in x is missing.
    // http://www.stata.com/statalist/archive/200802/msg01204.html
    // This puts missing values in where they occur in the x variable
    generate z1 = cond(missing(x), ., x > 5)
    list
    // An alternative to the above is to use the 2nd syntax of the cond() function.
    // Note the 3rd term in the function does nothing here, so instead of "."
    // any number would have been OK.
    generate z2=cond(x,x>5,.,.)
    list

Example 10
The Stata Press book "Data Analysis Using Stata" by Ulrich Kohler and Frauke Kreuter (p. 460) shows how the cond() function can be used to provide a default title for a graph:

    local title = cond( `"`title'"' == `""', `"`varlist' by `by'"', `"`title'"')
    graph twoway connected `yvars' xhelp, title(`title') .….

For further help: help cond(), Stata Journal Vol 5 No 3
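The missing-value behaviour discussed in Example 9 can be seen side by side in a small sketch. The variable names z and z1 follow the example above; the data are made up for illustration.

```stata
clear
set obs 10
generate x = _n if _n < 8          // observations 8 to 10 are missing

// plain comparison: a missing x counts as "greater than 5"
generate z  = cond(x > 5, 1, 0)

// explicit guard: a missing x stays missing in the result
generate z1 = cond(missing(x), ., x > 5)

list x z z1
```

The guard in z1 is the safer habit whenever the variable being tested can contain missing values.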
Combinations of variables
From time to time I'm asked how to write a program that uses all combinations of a subset of variables in an estimation command.

    program combin
        syntax , NUMTot(integer) NUMPick(integer)
        tempvar cat bbb q con con1
        tempfile temf temf1
        quiet {
            save temf, replace          //save the dataset currently in Stata
            labels1 `numtot'            //program
            clear
            numlist "1/`numtot'"
            set obs `numtot'
            foreach i of numlist `=r(numlist)' {
                egen a`i'=fill( 1/`numtot' )
            }
            //all permutations of the data
            fillin a1-a`numtot'
            egen `cat'=concat(a*), p(" ")
            gen `bbb'=""
            gen `q'=.
            // all combinations of the data
            forvalues i=1/`=_N' {
                local b=`cat'[`i']
                replace `bbb'="`: list sort b'" in `i'
                local q1 : list uniq b
                replace `q'=`:list sizeof q1' in `i'
            }
            //keep only the number of variables (items) required
            local z=1
            foreach i of varlist * {
                if `z'>`numpick' {
                    drop `i'
                }
                local ++z
            }
            egen `con'=concat(*) , punct(" ")
            generate `con1'="."
            forvalues i=1/`=_N'{
                local a "`=`con'[`i']'"
                local aa :list dups a
                replace `con1'= "`aa'" in `i'
                local aaa :list sort a
                replace `con'= "`aaa'" in `i'
            }
            drop if !missing(`con1')
            duplicates drop `con', force
            labels2                     //program
            save temf1, replace
            merge 1:1 _n using temf, nogen
        } //quiet
    end

    //store labels in do file
    program labels1
        args max_vars
        describe, replace
        keep name
        keep in 1/`max_vars'
        encode name, gen(name1)
        label save name1 using filename , replace
    end

    //attaching value labels and decoding etc.
    program labels2
        do filename
        label value a* name1
        foreach i of varlist a* {
            decode `i', gen(z`i')
        }
        egen levels=concat(z*), punct(" ")
        keep levels
    end

Input:
numtot(#): the number of variables from which the "numpick" variables are taken. The order of the variables is important, as numtot starts at the first variable in the current order.
numpick(#): the number of variables out of numtot() that you select.

    sysuse auto, clear
    order make, last
    combin, numtot(6) numpick(2)
    forvalues i = 1/6 {
        di "`i'"
        regress turn `=levels[`i']'
        estimates store a`i'
    }
    estimates table a* , stats(r2)
    exit

The resulting variable, levels, contains the combinations.

For further help: help program, help macro, help egen
Determining the value of PI by simulation in Stata
Time for some fun with Stata! If you randomly generate x and y values between 0 and 1 and plot them as scatter points, some points will fall inside the quarter circle of unit radius and others outside it. The proportion of points that fall inside the quarter circle, times 4 (for the 4 quadrants), gives pi. The more points, the more accurate the result. Solving for pi yields:

    pi = 4 * (scatter points in quadrant area)/(scatter points in square area, i.e. the 1*1 square)

The code for the graph:

    clear
    set obs 1000
    gen x=runiform()
    gen y=runiform()
    gen height=sqrt(1 - (x)^2)
    twoway (function y = sqrt(1 - (x)^2), ///
            range(0 1) lwidth(thick) lcolor(red)) ///
        (area height x , sort) ///
        (scatter x y if y>height, mcolor(blue)) ///
        (scatter x y if y<=height, mcolor(green)) ///
        ,aspect(1) legend(off)

The program that simulate calls:

    clear all
    set more off
    program a, rclass                       //1
        args n                              //2
        set obs `n'
        local z=0                           //3
        local zz=0                          //3
        forvalues i=1/100 {                 //4
            local x=runiform()
            local y=runiform()
            local dis=sqrt(`x'*`x'+`y'*`y')
            if `dis'<=1 local ++z           //5
            local ++zz
        } //end loop
        return scalar stuff=4*(`z')/`zz'    //6
    end

The simulate command that calls the above program "a":

    simulate mean=r(stuff) , reps(10000): a 100    //7
    summarize mean    //the mean should be approximately 3.14

Notes:
1 - Program called by the simulate command. We called the program "a". Note the class is r, i.e. it returns r() values.
2 - The args command. When we call the "a" program we also pass to it the number of observations required. The term passed is mapped onto the first name given to args, e.g. n.
3 - Defining local macros.
4 - forvalues loop command.
5 - if statement that increments the local macro z.
6 - Statement that defines a return value, i.e. stuff.

For further help: help return, help simulate, help macro, help runiform(), help summarize
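The same estimate can also be obtained without a program or loop, by generating all the points as observations and taking the mean of an indicator variable. This is only a sketch of the idea, not a replacement for the simulate example: the seed, the number of observations, and the names inside and pi_hat are assumptions chosen for illustration.

```stata
clear
set seed 12345
set obs 100000

// random points in the unit square
generate double x = runiform()
generate double y = runiform()

// 1 if the point lies inside the quarter circle of radius 1, 0 otherwise
generate byte inside = (x^2 + y^2 <= 1)

// the share of points inside estimates pi/4
summarize inside, meanonly
scalar pi_hat = 4*r(mean)
display "Estimated pi = " pi_hat
```

Increasing the number of observations narrows the spread of the estimate around pi.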
Using Mata to produce an Excel table (January 2014)

    sysuse auto, clear
    cd c:/                                          // 1
    capture erase Results.xls                       // 2
    regress price weight length if foreign          // 3
    return list                                     // 4
    matrix list r(table)                            // 5
    matrix a1=r(table)                              // 6
    matrix list a1                                  // 7
    regress price weight length if !foreign
    matrix a0=r(table)
    matrix list a0
    mata                                            // 8
    b=xl()
    b.create_book("Results","Sheet1")               // 9
    b.put_string(1,1,"Variable")                    // 10
    b.put_string(1,2,"Coefficient-foreign=1")
    b.put_string(1,3,"Coefficient-foreign=0")
    b.put_string(2,1,"weight")
    b.put_string(3,1,"length")
    b.put_number(2,2,st_matrix("a1")[1,1])          // 11
    b.put_number(3,2,st_matrix("a1")[1,2])
    b.put_number(2,3,st_matrix("a0")[1,1])
    b.put_number(3,3,st_matrix("a0")[1,2])
    end                                             // 12
    display "{browse results.xls : results}"        // 13

Notes:
1 - Change directory; this is where the Excel file is to be saved.
2 - Erase any previous Excel file saved with the same name in the folder.
3 - The regress command, the results of which we wish to tabulate.
4 - Not required to run the above, but lets us see what return results Stata produces for this command.
5 - Not required to run the above, but lets us see the results that Stata saves in the matrix.
6 - Save the Stata matrix called r(table) into a Stata matrix we will call a1.
7 - Not required to run the above, but lets us see the contents of the matrix we have just saved.
8 - Start Mata.
9 - Mata command to name the file in which we wish to save the results. We call this file "Results".
10 - Mata command to put a name into a cell.
11 - Mata command to put a result, obtained from the Stata matrix "a1", into a cell; here the cell in row 2 and column 2 [2,2].
12 - End Mata.
13 - Hyperlink the name of the file.

For further help: help return, help matrix, help mata, help m5_intro, help mf_xl
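The cell references such as st_matrix("a1")[1,1] rely on knowing that row 1 of r(table) holds the coefficients and that the columns follow the order of the regressors. A hedged sketch of how to look the positions up by name instead, using the rownumb() and colnumb() matrix functions (the scalar name b_weight is only illustrative):

```stata
sysuse auto, clear
regress price weight length if foreign
matrix a1 = r(table)

// look up the coefficient row ("b") and the weight column by name
scalar b_weight = a1[rownumb(a1, "b"), colnumb(a1, "weight")]

// the value should match the coefficient Stata reports as _b[weight]
display b_weight
display _b[weight]
```

Looking cells up by name avoids silent errors if the order of the regressors is later changed.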
Do Editor - Additional Toolbars (December 2013)
Group renaming of variables (November 2013)
    //using foreach to loop over the variables to make the change
    foreach i of varlist * {
        rename `i' `=lower("`i'")'
    }

But the rename command makes this even easier, e.g.

    rename * , lower

Some further examples
First generating some variable names:

    clear
    forvalues i=1/15 {
        gen var`=`i'^2' = .
    }
    save data, replace

Example 2
Add a suffix to the variable names var1 to var100 (based on the current variable name order)

    use data, clear
    rename (var1-var100) =A

Example 3
Example 2 but with the variable names in no particular order

    use data, clear
    order var225 var169
    rename (var(#) var(##) var100) (var(#)A var(##)A var100A)

Example 4
Adding the string "four" to the variable name whenever the name contains the number "4"

    use data, clear
    rename (var*4*) (fourvar*4*)

Example 5
Swapping the prefix and suffix where they both contain a character.

    use data, clear
    rename * *A
    rename ?ar#? ?[3]ar#[2]?[1]

Example 6
Swapping the year to a suffix.

    clear
    input currentliabilities2000total currentliabilities2001total currentliabilities2002total
    1 1 1
    end
    rename currentliabilities#total currentliabilitiestotal#

For further help: help rename, help rename group
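A small self-contained sketch of two of the group operations above, using made-up variable names (Alpha, BETA and the suffix _old are assumptions for illustration). confirm variable exits with an error if the expected names do not exist, which makes it a handy check after a group rename.

```stata
clear
set obs 1
generate Alpha = 1
generate BETA  = 2

// change every variable name to lowercase
rename *, lower

// add a common suffix to every variable name
rename * *_old

// errors out unless both renamed variables exist
confirm variable alpha_old beta_old
describe
```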
Put your own font in Stata graphs (October 2013)
    //make up some data
    set seed 1
    clear
    set obs 50
    generate a=runiform()*1000
    generate b=runiform()*10
    generate s=runiform()<.5
    label define sexlab 1 `"{fontface "Female and male symbols":M }"' 0 ///
        `"{fontface "Female and male symbols":F }"'
    label values s sexlab
    scatter a b , ms(i) mlab(s) mlabpos(c) ///
        mlabsize(*2) title( "Example" `"{fontface "Female and male symbols":M F}"')

For further help: help smcl
Getting to the function quickly (September 2013)
    ************profile.do**************
    global F4 "help functions;"
    //other profile settings
    ***********************************

For further help: help profile, or see the previous tip on profile.do
Value Labels - using labels in an expression (August 2013)

    sysuse auto, clear
    list if foreign==1

The above is fine if you know what the number means. This could be looked up with:

    label list origin

A safer way of specifying this is:

    sysuse auto, clear
    list if foreign=="Foreign":origin

For further help: help label. Also see the Stata 13 User's Guide, section 13.10.
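To see what the "label":labelname syntax does, the following sketch (assuming the auto dataset and its value label origin) displays the number behind the label and counts the matching observations both ways.

```stata
sysuse auto, clear

// "Foreign":origin evaluates to the numeric value that origin labels "Foreign"
display "Foreign":origin

// the two conditions select the same observations
count if foreign == "Foreign":origin
count if foreign == 1
```

The first form keeps working even if the coding of the variable is later changed, as long as the value label is kept up to date.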
The Label command - save option (July 2013)

    //setting up the data for the example
    clear
    set more off
    input a b
    1 1
    2 2
    3 3
    end
    label define a 1 "take bus to work" 2 walk 3 bike
    label define b 1 "drive alone" 2 "drive with 1 passenger" 3 "some times drive alone"
    label list
    label values a a
    label values b b
    list
    //finished setting up the data set
    label list
    label save using c:/label1, replace                                 // (1)
    type c:/label1.do                                                   // (2)
    filefilter c:/label1.do c:/label2.do , from("`") to("") replace     // (3)
    filefilter c:/label2.do c:/label3.do , from("'") to("") replace
    preserve                                                            // (4)
    infile str100 (a b c d e f) using c:/label3.do, clear               // (5)
    list
    replace d="9"+d if c=="b"                                           // (6)
    replace c="a"                                                       // (6)
    replace f="" in 1                                                   // (6)
    replace e=char(34)+e+char(34)                                       // (6)
    egen t1=concat(a-f), punct(" ")                                     // (7)
    replace t1=subinword(t1,"modify", ",modify",1)                      // (6)
    keep t1                                                             // (8)
    outfile using "c:/try.do", noquote replace wide                     // (9)
    type c:/try.do
    restore                                                             // (10)
    decode a, gen(a1)                                                   // (11)
    decode b, gen(b1)
    stack a1 b1, into(c1)                                               // (12)
    type c:/try.do
    do c:/try                                                           // (13)
    label list
    encode c1, gen(d1) label(a)                                         // (14)
    keep d1
    list
    list, nolab

Notes on the above:
(1) Using the label command with the save option.
(2) Using the type command to view what was saved in the do-file.
(3) Using filefilter to remove the single quote characters from the do-file.
(4) preserve writes a copy of the current data in Stata memory to the hard drive.
(5) Inputting the contents of the do-file into Stata.
(6) Modifying the label definition as required.
(7) Using one of egen's many handy functions to concatenate the strings.
(8) Keep only the t1 variable.
(9) Output the contents of t1 to a do-file with outfile.
(10) Restoring into Stata the data previously stored temporarily on the hard drive.
(11) decode generates a new variable (a1) that contains the value labels of the variable (a) as strings.
(12) Combining the 2 variables with the stack command.
(13) Execute the do-file that now contains the label definitions so that they are part of the data set.
(14) encode the string variable using the label definition previously loaded.

For further help: help label, help type, help filefilter, help egen, help infile, help outfile, help stack, help decode, help encode
Breaking up dates (June 2013)
    clear
    set more off
    input ///                                                           //(1)
        str30 date_in str30 date_out ward
    "7/22/2009 22:59" "7/24/2011 10:12" 1
    "7/22/2011 12:05" "8/25/2011 21:07" 2
    "8/27/2011 10:46" "8/28/2017 19:45" 1
    "8/28/2011 15:34" "8/28/2011 16:43" 2
    "8/28/2011 23:24" "8/29/2011 13:43" 1
    "8/27/2011 14:32" "8/28/2011 15:15" 2
    "8/28/2011 09:43" "8/28/2011 17:49" 1
    "8/28/2011 01:33" "8/28/2011 02:32" 2
    "8/28/2011 04:43" "8/29/2011 05:53" 1
    "8/31/2011 07:30" "8/31/2011 08:11" 2
    end
    list
    generate double date_in2a=date(date_in,"MDY hm")                    // (2)
    format date_in2a %td                                                // (3)
    summarize date_in2a                                                 // (4)
    local mint=r(min)                                                   // (5)
    generate double date_out2a=date(date_out,"MDY hm")
    format date_out2a %td
    summarize date_out2a
    local maxt=r(max)
    generate flag=0                                                     //(6)
    forvalues i= `=year(`mint')'/`=year(`maxt')' {                      //(7)
        replace flag=inrange(`i', year(date_in2a),year(date_out2a))     //(8)
        //start date
        generate y`i'_s1=date_in2a if flag & year(date_in2a)==`i'       // (9)
        replace y`i'_s1=td("1Jan`i'") if flag & missing(y`i'_s1)        // (10)
        //end date
        generate y`i'_f1=date_out2a if flag & year(date_out2a)==`i'
        replace y`i'_f1=td("31Dec`i'") if flag & missing(y`i'_f1)
    }
    format y* %td
    list
    exit

Notes on the above:
(1) Using the input command to produce a data set.
(2) Converting the date in string format to elapsed time (a number) with the date() function, i.e. the number of days from 1 Jan 1960.
(3) Formatting the newly created elapsed date to make it easier to read when checking that the conversion went correctly.
(4) Using the summarize command to get the earliest date. This is stored in the return scalar r(min). All the return values from the summarize command can be seen by typing return list.
(5) Storing the min value in a local macro.
(6) Generating a flag variable to indicate if a particular year is within the date span of the observation.
(7) Looping through all the years in the data.
(8) Making flag equal to 1 if the year is in range.
(9) If the date span for the observation contains the year in the looping index and this is the year of the starting date, then put the starting date in the variable.
(10) If the starting date is earlier than the start of the year in the looping index, put in 1 Jan for that year.

For further help: help input, help summarize, help generate
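When the time of day matters as well as the date, one alternative to date() is to read the string as a date-time with clock() and then derive the daily date with dofc(). This is a sketch under that assumption, with one made-up observation and illustrative variable names t and d; it is not part of the original tip.

```stata
clear
input str30 date_in
"7/22/2009 22:59"
end

// date-time in milliseconds since 1 Jan 1960; must be stored as a double
generate double t = clock(date_in, "MDY hm")
format t %tc

// daily date (days since 1 Jan 1960) derived from the date-time
generate d = dofc(t)
format d %td

list
display year(d[1]) " " month(d[1]) " " day(d[1])
```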
Stata's separate command (May 2013)
Use the following:

    sysuse auto, clear
    separate weight , by(rep78)
    twoway scatter weight1-weight5 mpg , name(a3) ytitle(Weight (lbs.))

For further help: help separate
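To see what separate creates before graphing, the following sketch (again using the auto dataset) checks that each new variable holds weight only for its own rep78 category and is missing elsewhere.

```stata
sysuse auto, clear
separate weight, by(rep78)

// weight3 equals weight for cars with rep78==3 ...
assert weight3 == weight if rep78 == 3
// ... and is missing for every other car, including those with missing rep78
assert missing(weight3) if rep78 != 3

describe weight1-weight5
```

Because each category is its own variable, twoway gives each one its own marker style and legend entry.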
Getting Stata to automatically open a web page (April 2013)
    clear mata
    local a : sysdir PERSONAL                           //1
    cd `a'
    mata:                                               //2
    if(!fileexists("mymatrix.myfile")){                 //3
        v=st_global("c(current_date)")                  //4
        X=date(v,"DMY")                                 //5
        fh = fopen("mymatrix.myfile", "rw")             //6
        fputmatrix(fh, X)
        fclose(fh)
    }
    fh = fopen("mymatrix.myfile", "rw")                 //7
    X = fgetmatrix(fh)
    fclose(fh)
    st_local("date",strofreal(X))
    end //end mata
    local t=1 //interval                                //8
    if `date'+`t'< date(c(current_date),"DMY") {        //9
        shell "C:\Program Files\Mozilla Firefox\firefox.exe" ///
            "http://www.stata-press.com/forthcoming/"   //10
    }
    else {
        display "Not required to check web page"        //11
    }
    exit

Notes on the above:
(1) Extended macro function saving the path of Stata's PERSONAL location in the local macro a. PERSONAL is on Stata's adopath.
(2) Using a Mata matrix to store the date that the web page was last accessed. You could store the information in other forms, but a Mata matrix seemed a handy way of doing this.
(3) The first time that this is run there is no Mata file, so check whether one needs to be created. If it does not exist, jump into the block.
(4) Save the current date in a string scalar called v.
(5) Convert the current date to Stata elapsed time using the date() function.
(6) Save the Mata matrix to a file.
(7) Read the saved Mata file.
(8) Set the interval (in days) at which you wish to display the web page. In this case, every day.
(9) If the time since the web page was last accessed is greater than the specified interval and it has not been accessed today, then jump into the block.
(10) The web page that you wish to see.
(11) Message indicating that the program is working but that accessing the web page is not required.

For further help: help clear, help adopath, help extended_fcn, help mata, help comments, help macro, help date functions
Adding to Stata and User written programs (March 2013)
    program s_levelsof , rclass             //(1)
        levelsof `0'                        //(2)
        local b: word count `r(levels)'     //(3)
        return scalar a=`b'                 //(4)
        return local levels=r(levels)
    end                                     //(5)

The above program is run as follows:

    sysuse auto, clear                      //(6)
    s_levelsof mpg,local(z)                 //(7)
    display "`z'"
    return list

Notes on the above:
(1) The program command with a new name for the command. Never overwrite existing commands; create a new name. A common prefix will allow you to easily identify the new command. The rclass option is used where the program is required to return some values.
(2) The levelsof command with `0'. The macro `0' contains what was passed to the new command, e.g. mpg,local(z)
(3) Using the macro extended function word count to count the number of levels.
(4) The return value as a scalar.
(5) End of the program.
(6) Loading the Stata data set.
(7) Calling the new program: s_levelsof

For further help: help levelsof, help program, help extended_fcn
Stata Editor - selecting columns (February 2013)
Highlight the column of ///. Adjust the last /// so that it is the one closest to the right-hand side. Place the cursor at the top left-hand side of the column. Then, on the keyboard, press the Ctrl and Alt keys simultaneously and, with the mouse, select the column required. At the top of the column, drag the column to the left-hand side. Then, using the Do-file Editor pull-down menu Edit>Find>Replace, tick the regular expression box, type in the regular expression shown, and execute. Then select the column of /// again (press the Ctrl and Alt keys simultaneously, then select the column required with the mouse). At the top of the column, drag right to the required position.

For further help: findit regular expressions, help comments
adoupdate (January 2013)
    Search of official help files, FAQs, Examples, SJs, and STBs

    SJ-8-2  st0146 . .  Error-correction-based cointegration tests for panel data
            (help xtwest if installed) . . . . . . . D. Persyn and J. Westerlund
            Q2/08   SJ 8(2):232--241
            implements the four error-correction-based panel cointegration
            tests developed by Westerlund

You click on the hyperlink and it downloads (confirmed by ado dir, or by using the pull-down menu: Help>SJ and user written programs and then previously installed packages>list). To get the version of this program you type which on the Stata command line.

    . which xtwest
    c:\ado\plus\x\xtwest.ado
    *! xtwest 1.1 1Apr2008
    *! Damiaan Persyn, LICOS centre for Development and Economic Performance www.econ.kuleuven.be/licos
    *! Copyright Damiaan Persyn 2007-2008.

adoupdate does not update this because it was not loaded from SSC, e.g.

    . adoupdate xtwest, update
    (note: adoupdate updates user-written files; type update to check for updates to official Stata)
    (no packages match "xtwest")

Now getting the package from SSC:

    . ssc install xtwest, replace
    checking xtwest consistency and verifying not already installed...
    the following files will be replaced:
        c:\ado\plus\x\xtwest.ado
        c:\ado\plus\x\xtwest.hlp
    installing into c:\ado\plus\...
    installation complete.

    . which xtwest
    c:\ado\plus\x\xtwest.ado
    *! xtwest 1.5 1Jul2010
    *! Damiaan Persyn, LICOS centre for Development and Economic Performance www.econ.kuleuven.be/licos
    *! Copyright Damiaan Persyn 2007-2008.

Now the version number is 1.5 (previously 1.1). The later version includes some bug fixes. Therefore care should be taken to see that the version of a user package is the one that you require.

Another case where adoupdate does not update user packages is where the author of a user-written package hosts it on a personal web page rather than on SSC, e.g.
SPost: http://www.indiana.edu/~jslsoc/web_spost/sp_install.htm
some programs written by Eric Booth at: https://sites.google.com/site/ericabooth/Home/software
and others.
In these cases you need to go to the author's site and download the package as per the author's instructions.
Compress (December 2012)
    clear
    input str200 country
    "Note: the details for this are.."
    1
    2
    end
    edit
    notes : TS country[1]
    drop in 1
    sleep 3000
    compress
    notes
Reshaping long and attaching variable labels afterwards (November 2012)
    //creating the data
    clear
    input y id x2007 x2008 x2009 z2007 z2008 z2009
    18 1 12 16 18 20 21 19
    10 2 11 17 17 33 32 19
    12 3 10 10 22 19 17 18
    end
    list
    // Labeling variables
    foreach v of varlist x* z* {
        label variable `v' "`=substr("`v'",1,1)' factor(`=substr("`v'",length("`v'")-3,4)')"
    }
    describe
    save data, replace

    //getting variable names and variable labels
    describe, replace clear                                             //1
    generate var=regexs(1) if regexm(name,"([a-zA-Z]+)([0-9][0-9][0-9][0-9])")   //2
    levelsof var, local(stub) clean                                     //3
    drop if missing(var)                                                //4
    duplicates drop var, force                                          //5
    keep var varlab
    save varinfo, replace

    //the reshape
    use data, clear
    reshape long "`stub'", i(id) j(Year)

    //attach the labels to the dataset
    merge 1:1 _n using varinfo, nogen                                   //6
    count if !missing(var)
    foreach i of varlist * {                    //loops variables
        forvalues i1=1/`=r(N)' {                //loops observations
            if "`i'"=="`=var[`i1']'" {                                  //7
                label var `i' "`=varlab[`i1']'"                         //8
            }
        }
    }
    drop varlab var
    describe
    list
    exit

Going through the above code:
"//getting variable names and variable labels": the lines of code under this title get the variable stubs and their associated labels.
1 Using the replace and clear options of the describe command, the variable names and labels replace the existing data in Stata.
2 Generate a new variable that contains the variable stubs.
3 Get a list of stubs; these are saved in a local macro.
4 Drop observations where var is missing.
5 As only one of each stub is required, the duplicates command is used to remove duplicates.
6 Merge the variable names and labels with the data set.
7 Check whether the variable name in the dataset is the same as the merged variable name.
8 If they are the same, the label stored in "varlab" is attached to this variable.
Graph marker labels (October 2012)
    clear all
    sysuse auto, clear
    generate order=.
    generate index= _n
    //scale
    summarize mpg
    local a1_max=r(max)
    local a1_min=r(min)
    local a1_diff=`a1_max'-`a1_min'
    summarize weight
    local a2_max=r(max)
    local a2_min=r(min)
    local a2_diff=`a2_max'-`a2_min'
    generate dis_hor=.
    generate dis_vert=.
    generate hyp=.
    generate kk=.
    local z=1
    forvalues i=1/74 {
        if `z'==1 {
            replace order=1 in 1
            local z=0
        }
        else {
            gsort order
            replace dis_hor=(mpg[1]-mpg)/`a1_diff'
            replace dis_vert=(weight[1]-weight)/`a2_diff'
            replace hyp=sqrt(dis_hor^2 +dis_vert^2)
            sort hyp
            replace kk=sum(missing(order))
            replace kk=. if !missing(order)
            replace order=`i' if kk==1
        }
    }
    qreg mpg weight, quantile(50)       //quantile can be changed
    predict a
    generate up_down= a < mpg
    tab up_down
    bysort up_down order: gen order1=_n
    //values to change
    local angle =10
    local s_ang=55
    local no_ang= 3
    bysort up_down (order1):gen aa=mod(_n,`=`no_ang'+1')
    //normal marker labels
    scatter mpg weight , mlab(make) mlabangle(45) yline(22) xline(2930) name(kk1)
    local z=0
    forvalues i=0/`=`no_ang'' {
        if `z'==0 {
            local aa2 =`"(scatter mpg weight if aa==`i' & up_down==0 , mlabcolor(blue) "' + ///
                `" mlab(make) mlabpos(3) mlabangle(`=`s_ang'-`angle'*`i'') ) "' + ///
                `"(scatter mpg weight if aa==`i' & up_down==1 , mlab(make) mlabcolor(red) "' + ///
                `" mlabpos(3) mlabangle(`=`s_ang'+(`angle'*`i')') ) "'
            local z=1
        }
        else {
            local aa1= `" (scatter mpg weight if aa==`i' & up_down==0, mlabcolor(blue) "' + ///
                `" mlab(make) mlabpos(3) mlabangle(`=`s_ang'-`angle'*`i'') ) "' + ///
                `"(scatter mpg weight if aa==`i' & up_down==1, mlab(make) mlabcolor(red) "' + ///
                `"mlabpos(3) mlabangle(`=`s_ang'+`angle'*`i'') )"'
        }
        local aa2 `aa2' `aa1'
    }
    twoway ///
        `aa2' , ///
        yline(22) xline(2930) name(kk2) legend(off)
    exit
Stata on Youtube (September 2012)
Gathering prefixes for the reshape command (August 2012)
    clear
    input ///                                                   //1
        id a2001 a2002 a2003 b2001 b2002 b2003 c2001 c2002 c2003
    1 1 1 1 1 1 1 1 1 1
    2 2 2 2 2 2 2 2 2 2
    end
    list
    preserve                                                    //2
    describe, replace clear                                     //3
    generate prefix=regexs(1) if ///
        regexm(trim(name), "([A-Za-z]+)([0-9]+$)")              //4
    contract prefix, nomiss                                     //5
    local a                                                     //6
    forvalues i=1/`=_N' {                                       //7
        local a `a' `=prefix[`i']'                              //8
    }
    restore                                                     //9
    reshape long `a', i(id) j(item)                             //10
    list
    exit

For help on specific commands type help and then the specific command, e.g. help input, help list, help preserve, help forvalues, help describe, help macro
Fuzzy merge - based on time - an alternative (July 2012)

    cd c:/                                                      //(1)
    clear
    input time                                                  //(2)
    1
    7
    10
    11
    15
    16
    21
    25
    30
    end
    generate date1=1                                            //(3)
    generate stuff=runiform()                                   //(4)
    save date1, replace
    clear
    input time                                                  //(5)
    8
    19
    30
    end
    generate date2=1                                            //(6)
    save date2, replace
    append using date1                                          //(7)
    duplicates tag time, gen(same)                              //(8)
    drop if same==1 & date2==1                                  //(9)
    merge 1:1 time using date2                                  //(10)
    sort time
    list                                                        //(11)
    drop if missing(date1)& missing(date2)                      //(12)
    generate min=cond(time[_n+1]-time>time-time[_n-1], ///     //(13)
        -1*(time-time[_n-1]), ///                               //(13)
        time[_n+1]-time ) if date2==1                           //(13)
    list
    replace stuff=stuff[_n+sign(min)*1] if !missing(min)& abs(min)<=2   //(14)
    replace date1=date1[_n+sign(min)*1] if !missing(min)& abs(min)<=2
    drop if (min[_n+1]<0 & abs(min[_n+1])<=2 ) | (min[_n-1]>0 & abs(min[_n-1])<=2 )   //(15)
    drop min _merge
    egen total_matches=rowtotal( date2 same)                    //(16)
    order time stuff date1 date2 same total_matches             //(17)
    list
    exit

For help on specific commands type help and then the specific command, e.g. help input, help list, help merge, help drop, help replace
Fuzzy merge - based on time (June 2012)

    cd c:/                                                      //(1)
    clear
    input time                                                  //(2)
    1
    7
    10
    11
    15
    16
    21
    25
    30
    end
    generate date1=1                                            //(3)
    generate stuff=runiform()                                   //(4)
    tsset time                                                  //(5)
    tsfill, full                                                //(6)
    save date1, replace
    clear
    input time                                                  //(7)
    8
    19
    30
    end
    generate date2=1                                            //(8)
    merge 1:1 time using date1                                  //(9)
    sort time
    list                                                        //(10)
    drop if missing(date1)& missing(date2)                      //(11)
    generate min=cond(time[_n+1]-time>time-time[_n-1], ///     //(12)
        -1*(time-time[_n-1]), ///                               //(12)
        time[_n+1]-time ) if date2==1                           //(12)
    list
    replace stuff=stuff[_n+sign(min)*1] if !missing(min)& abs(min)<=2   //(13)
    replace date1=date1[_n+sign(min)*1] if !missing(min)& abs(min)<=2
    drop if (min[_n+1]<0 & abs(min[_n+1])<=2 ) | (min[_n-1]>0 & abs(min[_n-1])<=2 )   //(14)
    drop min _merge
    list
    exit

For help on specific commands type help and then the specific command, e.g. help input, help list, help merge, help drop, help replace
Getting the data right for Stata's Excel command (May 2012)
import excel "C:\excel.xls", sheet("Sheet1") clear foreach i of varlist AD { capture confirm number `=`i'[1]' if _rc==0{ rename `i' Y`=`i'[1]' } else rename `i' `=`i'[1]' } drop in 1 The above imports the spreadsheet and renames the problem variables. The command confirm number determines if the first row of the data set is a number or a string. capture in front of the confirm command "captures" if the confirm statement is true or false, hence giving the return code of the capture ( _rc)command a value of 7 if a string or 0 if a number. The return code is then used to determine how the variable name is to be renamed. Having a closer look at: `=`i'[1]' `i' is the macro substitution of the looping index i. In this case the variable name [1] when put adjacent to a variable name (without a space) the bracketed number indicates the observation number. In this case the first time around the loop it would equal the first observation of variable A. This is know as explicit subscripting. `= ' the symbols around `i'[1] are to tell Stata to evaluate the expression see: Stata 12 Users Guide 18.3.8 page 201 foreach i of varlist Y* { capture confirm numeric variable `i' if _rc!=0 { replace `i'="." if `i'=="" destring `i', replace } } describe The above once again uses the confirm command but this time with the variable option. Each variable with a Y prefix (the character that was previously included) is put through a loop where firstly empty cells are filled in with the Stata missing values and then the string variable is changed to a numeric variable with the destring command. If the output of the descibecommand indicates that the variable is still a string then this may be due to a nonnumeric characters in the variable. 
One way of looking for the observation that contains the nonnumeric characters is:

    forvalues i=1/`=_N' {
        capture confirm number `=Y2001[`i']'
        if _rc!=0 {
            display "Potential problem observation: "`i'
        }
    }

The output of the above indicates a problem with observations 1 and 2 of the Y2001 variable. Another option for finding problem observations is to use hexdump.

For help on specific commands, type help followed by the command name, e.g.:
    help capture
    help confirm
    help forvalues
    help foreach
    help destring
    help rename
    help drop
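The observation-by-observation check above can also be written without a loop. The following is a minimal sketch, assuming a string variable named Y2001 as in the tip; the small data set and the variable name bad are made up for illustration. It relies on real() returning missing when a string cannot be read as a number.

```stata
clear
input str10 Y2001
"12.5"
"n/a"
"7"
"3,4"
"."
end
// flag cells that are not numbers (and are not simply empty or ".")
generate byte bad = missing(real(Y2001)) & !inlist(Y2001, "", ".")
list if bad
```

The flagged rows can then be inspected in the editor before destring is run.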
Using Stata's file command (April 2012)
    one:two:three:four
    one1:two1

For this example assume that we require only the last 2 words of each line in a new file (the words are delimited with a ":"). An example of a Stata program that creates such a file is:

    program ltype                                         //(1)
        version 12.1
        syntax , Current(string) New(string) P(string)    //(2)
        tempname fh hdl                                   //(3)
        file open `fh' using `"`current'"', read          //(4)
        file open `hdl' using `"`new'"', replace write text   //(5)
        file read `fh' line                               //(6)
        while r(eof)==0 {                                 //(7)
            local kk=reverse("`line'")                    //(8)
            tokenize `kk',p("`p'")                        //(9)
            local No1 "`=reverse("`1'")'"                 //(10)
            local No3 "`=reverse("`3'")'"                 //(10)
            file write `hdl' %st10 ("`No1'") %st10 (":") %st10 ("`No3'") _newline   //(11)
            file read `fh' line                           //(12)
        }
        file close _all                                   //(13)
    end

Going through the above:
(1) program: the program starts with program followed by the name of the program, ltype, and finishes with end.
(2) The syntax command passes information to the program via local macros: current is the macro containing the file name of the initial file (raw data); new is the local macro containing the name of the file that receives the output (the last 2 words of each line); p is the local macro holding the delimiter.
(3) Create temporary names.
(4) Open the file with the initial information.
(5) Open a file for the required data to be written to.
(6) Read the first line of the file and store it in the local macro line.
(7) While loop; continues until the end of the file is reached.
(8) Reverse the order of the contents of the line macro so that the last word becomes the first.
(9) tokenize the line based on the parse character ":"; this breaks the string into local macros named 1, 2, etc.
(10) Make the contents of the local macros No1 and No3 the reverse of the last and 2nd last words (the ":" is itself treated as a word, which is why macro 3 rather than macro 2 is used).
(11) Write the macro contents to the new file, separating them with ":".
(12) Read the next line of the file.
(13) Close both files.

The above program can then be run with the following:

    type "c:/file_try.txt"
    ltype ,current("c:/file_try.txt") new( "c:/file_try1.txt") p(":")
    type "c:/file_try1.txt"
    infile str10 a using "c:/file_try1.txt", clear   //loading file into Stata
    list
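If the lines are already loaded in a Stata dataset, the same two words can be picked out with string functions instead of the file command. This is only a sketch under stated assumptions: the variable names line, head, last and second_last are invented here, and strrpos() is assumed to be available (it is documented in recent Stata releases; check help strrpos on an older installation).

```stata
clear
input str40 line
"one:two:three:four"
"one1:two1"
end
// position of the last ":" in each line
generate pos_last = strrpos(line, ":")
generate str40 last = substr(line, pos_last + 1, .)
// everything before the last ":" and, within it, the word after its last ":"
generate str40 head = substr(line, 1, pos_last - 1)
generate str40 second_last = substr(head, strrpos(head, ":") + 1, .)
list line second_last last
```

When head contains no ":", strrpos() returns 0 and substr() starts at position 1, so the whole of head is taken as the second last word.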
Cleaning data - consistent naming - using soundex() (March 2012)
    //creating the data for this example
    clear
    input str20 w1
    "Microsoft"
    "MicroSoft"
    "Micro Soft"
    "MicroSoft"
    "Microsoft Inc."
    "Microsoft Inc"
    "MicrosoftInc"
    "MicrosoftAA"
    "Microaa"
    "MSFT"
    "MS"
    "M$"
    "STATA"
    "StataCorp"
    "StataCorp LP"
    "staCorp"
    "Linux"
    "linux"
    end
    list, clean noobs
    save c:/a, replace
    generate kk=soundex(w1)                 // <1
    generate New_w1=""
    //Linux
    foreach i in L520 {                     // <2
        replace New_w1="Linux" if kk=="`i'" // <3
    }
    //Microsoft
    foreach i in M262 M000 M200 M213 {
        replace New_w1="Microsoft" if kk=="`i'"
    }
    //Stata
    foreach i in S330 S326 S332 {
        replace New_w1="StataCorp" if kk=="`i'"
    }
    replace New_w1="Micro AA" if kk=="M260"
    sort kk
    list, sepby(New_w1)
    exit

Going through the above:
(1) soundex(): the soundex code consists of a letter followed by three numbers. The letter is the first letter of the name and the numbers encode the remaining consonants. Similar sounding consonants are encoded by the same number.
(2) Stata loop over each of the soundex code(s).
(3) replace command that replaces the existing contents with the name that maps to the soundex code.

The resulting data set:

         w1               kk     New_w1
     1.  Linux            L520   Linux
     2.  linux            L520   Linux

     3.  M$               M000   Microsoft
     4.  MS               M200   Microsoft
     5.  MSFT             M213   Microsoft

     6.  Microaa          M260   Micro AA

     7.  Micro Soft       M262   Microsoft
     8.  MicroSoft        M262   Microsoft
     9.  Microsoft Inc.   M262   Microsoft
    10.  MicroSoft        M262   Microsoft
    11.  MicrosoftInc     M262   Microsoft
    12.  MicrosoftAA      M262   Microsoft
    13.  Microsoft        M262   Microsoft
    14.  Microsoft Inc    M262   Microsoft

    15.  staCorp          S326   StataCorp
    16.  STATA            S330   StataCorp
    17.  StataCorp LP     S332   StataCorp
    18.  StataCorp        S332   StataCorp

For help on specific commands, type help followed by the command name, e.g.:
    help soundex()
    help generate
    help replace
    help use
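When several soundex codes map to one name, the foreach loops can be collapsed into a single replace per name with inlist(). A minimal sketch, reusing a few of the names and the soundex codes listed in the table above (the shortened data set is for illustration only):

```stata
clear
input str20 w1
"Microsoft"
"Micro Soft"
"MSFT"
"Linux"
"linux"
end
generate kk = soundex(w1)
generate New_w1 = ""
// one replace per official name; inlist() tests kk against each code in turn
replace New_w1 = "Linux"     if kk == "L520"
replace New_w1 = "Microsoft" if inlist(kk, "M262", "M000", "M200", "M213")
list, sepby(New_w1)
```

Any observation left with an empty New_w1 has a soundex code that has not yet been mapped.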
Cleaning data - consistent naming - manually (January 2012)

    //creating the data for this example
    clear
    input str20 w1
    "Microsoft"
    "MicroSoft"
    "Micro Soft"
    "MicroSoft"
    "Microsoft Inc."
    "Microsoft Inc"
    "MicrosoftInc"
    "MSFT"
    "MS"
    "M$"
    "STATA"
    "StataCorp"
    "StataCorp LP"
    "staCorp"
    "Linux"
    "linux"
    end
    list, clean noobs
    save c:/a, replace
    contract w1        //>1
    edit               //>2
    //get by hand all the different forms of the one name.
    // In this case variations on Microsoft
    clear
    input str20 w1
    "Microsoft"
    "MicroSoft"
    "Micro Soft"
    "MicroSoft"
    "Microsoft Inc."
    "Microsoft Inc"
    "MicrosoftInc"
    "MSFT"
    "MS"
    "M$"
    end
    generate x2="1"    //>3
    save c:/a1, replace
    //merging and replacing with the correct name
    use c:/a, clear    //>4
    merge 1:m w1 using c:/a1 , nogenerate   //>5
    replace x2="Microsoft" if x2=="1"       //>6
    save c:/a, replace
    contract w1 if x2==""
    edit
    //get by hand all the different forms of the one name.
    // In this case variations on Stata
    clear
    input str20 w1
    "STATA"
    "StataCorp"
    "StataCorp LP"
    "staCorp"
    end
    generate x3="1"
    save c:/a1, replace
    //merging
    use c:/a, clear
    merge 1:m w1 using c:/a1, nogenerate
    replace x2="Stata" if x3=="1"
    drop x3
    //Etc until all the names are consistent

Going through the above:
(1) contract the dataset to a list of names and frequencies.
(2) Open the Stata editor so that the various names can be copied. If the list is large and the required names are spread throughout it, a new variable can be created and a 1 (one) entered adjacent to the required names. The variable with the 1's can then be sorted and the variations on the required name copied into the do file.
(3) generate a new variable, to be used after the merge command, that indicates the names to be changed.
(4) Load the original dataset.
(5) Merge the original dataset with the list-of-names dataset.
(6) Replace the "1" in the previously generated variable (3) with the official name "Microsoft".

Repeat the process until all names are as required.

For help on specific commands, type help followed by the command name, e.g.:
    help contract
    help generate
    help merge
    help use
    help edit
Working with Dates 3 (January 2012)

    clear
    set more off
    input ///
    str6 id str20 date
    01003 07Nov2008
    01003 07Nov2008
    01003 11Nov2008
    01007 22Dec2008
    01007 05Dec2008
    01007 13Nov2007
    01007 14Nov2007
    01007 22Jul2006
    01007 22Jul2006
    01007 22Jul2006
    01007 11Sep2006
    01009 13Oct2005
    01009 17May2006
    01009 17May2006
    01009 13Jan2010
    01009 06Jun2010
    01008 08Nov2007
    01008 08Nov2007
    01008 08Nov2007
    01008 15Jul2009
    01008 15Jul2009
    01008 15Jul2009
    01008 27May2010
    01008 28May2010
    01008 28May2010
    01008 28May2010
    end
    l
    generate date1=date(date, "DMY")        //1
    generate cluster=.                      //2
    list, sepby(id)
    tempvar max                             //3
    bysort id (date1): gen `max'=_N         //4
    summarize `max'                         //5
    forvalues i=1/`r(max)' {                //6
        bysort id (date1): replace cluster=`i' if ///   7
            date1<=(date1[sum(cluster!=.)+1]+30) & cluster==.
        list, sepby(id)   // list command to show what is happening
                          // can be removed
    }
    list, sepby(id)
    exit

Going through the above:
(1) generate a new variable (date1) that takes the date in string format from date and converts it into elapsed time (a numeric value).
(2) generate a new variable called cluster; all values equal to missing (.).
(3) Assign a name to the temporary variable max.
(4) Using the bysort prefix, the temporary variable max is generated for every level of id and filled with the value of _N. Note the macro substitution quotes around the temporary variable name. _N stands for the number of observations; in this case, because of bysort, it is the number of observations for each level of id.
(5) The summarize command is used to obtain the largest value of max, that is, the largest number of observations in any level of id. The summarize command has a handy returned value that stores this, r(max). To see the other values returned by this command type return list (after the summarize command).
(6) Loop over the code in the curly brackets using a forvalues loop.
(7) Using the bysort prefix, replace the value of cluster with the looping index value (this will be the group number) if the qualifier is true, i.e. the observation is the start of the next cluster or within 30 days of the start of that cluster.

Breaking the qualifier down:

    date1<=(date1[sum(cluster!=.)+1]+30) & cluster==.

cluster!=. is a logical statement, either true or false: if cluster does not equal (!=) a missing value (.) the statement is true and equals 1 (one).
sum(cluster!=.) sums the results of cluster!=.
+1 adds 1 (one) to the result of sum(cluster!=.).
date1[sum(cluster!=.)+1]: inside the square brackets (explicit subscripting) Stata has calculated the observation number of date1 that we require, i.e. date1[obs no]. Stata gets the value of date1 (a date) for that observation and adds 30 days to it.
Stata then tests whether the current observation of date1 is <= the value calculated by date1[sum(cluster!=.)+1]+30 and also that the current value of cluster is missing (.). If the statement is true the looping index value (`i') replaces the current value of cluster for that observation.

For help on specific commands, type help followed by the command name, e.g.:
    help input
    help generate
    help summarize (the saved results from the summarize command can be seen by typing return list after the summarize command)
    help macro
    help forvalues
    help sum()
    help tempvar
    help list
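The role of sum(cluster!=.)+1 in the qualifier is easier to see on a toy data set. In this sketch the four observations and the variable names n_done and next_obs are invented for illustration; the first two observations are assumed to have been assigned to a cluster already.

```stata
clear
input cluster
1
1
.
.
end
// running count of observations that already belong to a cluster
generate n_done = sum(cluster != .)
// observation number of the first observation still waiting for a cluster
generate next_obs = n_done + 1
list
```

For every unassigned observation next_obs equals 3, the first observation without a cluster, which is the observation whose date starts the next cluster in the tip.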
Working with Dates 2 (December 2011)

    clear
    input ///
    str30 date_in str30 date_out ward
    "7/22/2011 22:59" "7/27/2011 10:12" 1
    "8/27/2011 12:05" "8/27/2011 21:07" 2
    "8/27/2011 10:46" "8/28/2011 19:45" 1
    "8/28/2011 15:34" "8/28/2011 16:43" 2
    "8/28/2011 23:24" "8/29/2011 13:43" 1
    "8/27/2011 14:32" "8/28/2011 15:15" 2
    "8/28/2011 09:43" "8/28/2011 17:49" 1
    "8/28/2011 01:33" "8/28/2011 02:32" 2
    "8/28/2011 04:43" "8/29/2011 05:53" 1
    "8/31/2011 07:30" "8/31/2011 08:11" 2
    end
    list
    set more off
    split date_in, gen(kk)                                   // (1)
    split date_out, gen(zz)
    generate double date_in2=date(date_in,"MDY hm")          // (2)
    format date_in2 %td
    generate double date_out2=date(date_out,"MDY hm")
    format date_out2 %td
    summarize date_in2                                       // (3)
    local d1=r(min)                                          // (3)
    summarize date_out2                                      // (3)
    local d2=r(max)                                          // (3)
    local range=`d2'-`d1'                                    // (4)
    forvalues i=0/`range' {                                  // (5)
        local kk=`d1'+`i'                                    // (6)
        generate datea`kk'=1 if inrange(`d1'+`i', date_in2,date_out2)   // (7)
        label var datea`kk' `="`=day(`d1'+`i')'"+ "_"+ ///      (8)
            "`=month(`d1'+`i')'"+ "_"+"`=year(`d1'+`i')'"'
    }
    generate id=_n                                           // (9)
    reshape long datea, i(id) j(datekk) string               // (10)
    bysort id:gen double datea1=sum(datea) if !missing(datea)
    bysort id:replace datea=sum(datea) if !missing(datea)
    levelsof id, local(id1)
    foreach i of local id1 {
        summarize datea if id==`i'
        replace datea1=24*60 if datea!=`r(min)' & datea!=`r(max)' ///
            & id==`i' & !missing(datea)
        replace datea1=24*60-(clock(kk2,"hm")/(1000*60)) if ///
            datea==`r(min)' & id==`i' & !missing(datea)
        replace datea1=clock(zz2,"hm")/(1000*60) if ///
            datea==`r(max)' & id==`i' & !missing(datea)
        //enter and discharge the same day
        replace datea1=(clock(zz2,"hm")-clock(kk2,"hm"))/(1000*60) if ///
            datea==`r(max)' & datea==`r(min)' & id==`i' & !missing(datea)
    }
    collapse (sum) datea1, by(ward datekk)                   // (11)
    destring datekk, gen(date)                               // (12)
    format date %td
    rename datea1 time                                       // (13)
    label var time "time in minutes"                         // (14)
    list, sepby(ward)                                        // (15)

Going through the above:
(1) Split the string dates into day and time.
(2) The date/times, input as strings, are converted to elapsed dates (number of days from a datum); the times of day are converted to minutes later on with clock().
(3) The minimum and maximum dates are obtained with the summarize command and saved in local macros.
(4) The range is calculated and saved in a local macro.
(5) Using the forvalues command the days of the range are looped through.
(6) The date, in elapsed days, is calculated.
(7) A new variable is created for each day, with a 1 in each observation where the loop date falls in the range tested by the inrange() function.
(8) The newly created variable is given a label, which is the loop date.
(9) Generate a unique id value to be used by the reshape command.
(10) Reshape the data from wide to long format.
(11) Collapse the data to give the required results.
(12) Use the destring command to convert a string variable to a numeric variable.
(13) Rename the variable.
(14) Include a variable label.
(15) Finally, list the results.

For help on specific commands, type help followed by the command name, e.g.:
    help input
    help generate
    help summarize (the saved results from the summarize command can be seen by typing return list after the summarize command)
    help macro
    help forvalues
    help collapse
    help destring
    help rename
    help label
    help list
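The minutes calculations in the loop rest on clock() returning milliseconds, so dividing by 1000*60 gives minutes. A small sketch with literal times taken from the data above (the display commands are only for illustration):

```stata
// minutes from midnight to 22:59
display clock("22:59", "hm") / (1000*60)
// minutes left in the day after an admission at 22:59
display 24*60 - clock("22:59", "hm") / (1000*60)
// minutes between admission at 12:05 and discharge at 21:07 on the same day
display (clock("21:07", "hm") - clock("12:05", "hm")) / (1000*60)
```

These correspond to the discharge-day, admission-day and same-day cases handled by the three replace commands that use clock().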
Working with Dates (November 2011)

    clear
    set more off
    input ///
    str30 date_in str30 date_out
    "7/22/2011" "8/27/2011"
    "8/27/2011" "8/27/2011"
    "8/27/2011" "8/28/2011"
    "8/28/2011" "8/28/2011"
    "8/28/2011" "8/29/2011"
    "8/27/2011" "8/28/2011"
    "8/28/2011" "8/28/2011"
    "8/28/2011" "8/28/2011"
    "8/28/2011" "8/29/2011"
    "8/31/2011" "8/31/2011"
    "8/31/2011" "8/31/2011"
    "8/31/2011" "9/4/2011"
    "8/23/2011" "8/23/2011"
    "8/23/2011" "8/24/2011"
    "8/24/2011" "9/15/2011"
    "8/4/2011" "8/4/2011"
    "8/4/2011" "8/8/2011"
    "8/10/2011" "8/10/2011"
    "8/10/2011" "8/17/2011"
    end
    list
    generate date_in1=date(date_in,"MDY")                    // see (1) below
    generate date_out1=date(date_out,"MDY")                  // (1)
    format date_in1 date_out1 %td                            // (2)
    summarize date_in1                                       // (3)
    local d1=r(min)                                          // (3)
    summarize date_out1                                      // (3)
    local d2=r(max)                                          // (3)
    local range=`d2'-`d1'                                    // (4)
    forvalues i=0/`range' {                                  // (5)
        local kk=`d1'+`i'                                    // (6)
        gen datea`kk'=1 if inrange(`d1'+`i', date_in1,date_out1)   // (7)
        label var datea`kk' `="`=day(`d1'+`i')'"+ "_"+ ///      (8)
            "`=month(`d1'+`i')'"+ "_"+"`=year(`d1'+`i')'"'
    }
    generate id=_n                                           // (9)
    reshape long datea, i(id) j(datekk) string               // (10)
    collapse (sum) datea, by(datekk)                         // (11)
    destring datekk, replace                                 // (12)
    format datekk %td
    rename datea rooms_oc                                    // (13)
    label var rooms_oc "rooms occupied"                      // (14)
    list, sep(0)                                             // (15)

Going through the above:
(1) The dates, input as strings, are converted to elapsed time (number of days from a datum).
(2) The dates are formatted.
(3) The minimum and maximum dates are obtained with the summarize command and saved in local macros.
(4) The range is calculated and saved in a local macro.
(5) Using the forvalues command the days of the range are looped through.
(6) The date, in elapsed days, is calculated.
(7) A new variable is created for each day, with a 1 in each observation where the loop date falls in the range tested by the inrange() function.
(8) The newly created variable is given a label, which is the loop date.
(9) Generate a unique id value to be used by the reshape command.
(10) Reshape the data from wide to long format.
(11) Collapse the data to give the required results.
(12) Use the destring command to convert a string variable to a numeric variable.
(13) Rename the variable.
(14) Include a variable label.
(15) Finally, list the results.

For help on specific commands, type help followed by the command name, e.g.:
    help input
    help generate
    help summarize (the saved results from the summarize command can be seen by typing return list after the summarize command)
    help macro
    help forvalues
    help collapse
    help destring
    help rename
    help label
    help list
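The same occupancy count can be reached without creating one variable per day. The sketch below is an alternative, not the method used in the tip: it uses expand to create one row per occupied day and contract to count rows per day. The three stays and the variable name day are invented for illustration.

```stata
clear
input str10 date_in str10 date_out
"8/27/2011" "8/27/2011"
"8/27/2011" "8/28/2011"
"8/28/2011" "8/29/2011"
end
generate date_in1  = date(date_in,  "MDY")
generate date_out1 = date(date_out, "MDY")
generate id = _n
// one row for every day of each stay
expand date_out1 - date_in1 + 1
bysort id: generate day = date_in1 + _n - 1
format day %td
// count the rows (occupied rooms) per day
contract day, freq(rooms_oc)
list, sep(0)
```

Days on which no room is occupied do not appear in the result, whereas the wide approach in the tip lists every day in the range.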
Doing things by levels of a variable (October 2011)

    clear
    sysuse auto, clear
    levelsof for, local(level)
    foreach i of local level {
        histogram mpg if for==`i', name(a`i')
    }

However, levelsof fails when there are many levels, as can be seen from this snippet of code:

    clear
    set more off
    set obs 100000
    gen a=_n
    levelsof a, local(aa)

The levelsof help file states that this command is best used if the number of levels is modest. What to do if the number of levels exceeds the limit? The following are 2 methods.

Method 1
This method contracts the variable whose levels are required and then merges the result with the dataset, so that the levels are held in the Stata dataset itself.

Example:

    sysuse auto, clear
    expand 10000
    graph drop _all
    preserve
    contract mpg
    rename mpg levels
    save c:/kk, replace
    restore
    merge 1:1 _n using c:/kk
    drop _freq
    drop _merge
    sum levels
    forvalues i=1/`=r(N)' {
        scatter price weight if mpg==levels[`i'], name(a`i')
    }
    exit

Method 2
Using Mata to get the levels of a variable:

    sysuse auto, clear
    graph drop _all
    expand 10000
    set more off
    mata:
    a=uniqrows(st_data(.,"mpg"))
    a
    for(i=1;i<=rows(a);++i){
        st_local("i1",strofreal(a[i]))
        stata("scatter price weight if mpg=="+st_local("i1")+", name(a" + st_local("i1")+")")
    }
    end

For help on specific commands, type help followed by the command name, e.g.:
    help levelsof
    help contract
    help mata
    help mata st_local()
    help mata stata()
    help mata uniqrows()
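A third possibility, offered here only as a sketch and not one of the two methods above, is to number the levels with egen's group() function and loop over the group numbers. The variable name g is invented, and count is used in place of a graph command to keep the example light.

```stata
sysuse auto, clear
// g runs from 1 to the number of distinct nonmissing values of rep78
egen g = group(rep78)
summarize g, meanonly
forvalues i = 1/`r(max)' {
    quietly count if g == `i'
    display "group `i': " r(N) " cars"
}
```

The loop bound `r(max)' is substituted once, before the loop starts, so it is not disturbed by the count command inside the loop.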
Speeding up Stata - the if statement (September 2011)

    // creating a data set
    clear
    timer clear
    //create data
    set obs 10000000
    gen a=uniform()
    gen b=uniform()
    gen c=uniform()
    save c:/exp1, replace
    clear

Example 1

    //running regressions
    timer on 1
    use c:/exp1
    regress a b c if c<.5
    regress a b c if c<.5
    regress a b c if c<.5
    timer off 1
    timer list

The timer gives the following results:

    . timer list
       1:     20.25 /        1 =      20.2500

Example 2
The commands of the above example have been modified to bring in only the required observations (the ones that satisfy the qualifier). To do this we use the 2nd syntax of the use command.

    clear
    timer on 2
    use if c<.5 & b<.5 using c:/exp1
    regress a b c
    regress a b c
    regress a b c
    //use c:/exp1
    timer off 2
    timer list

The timer gives the following results:

    . timer list
       1:     20.25 /        1 =      20.2500
       2:      9.75 /        1 =       9.7500

As you can see, example 2 runs considerably faster than example 1.

Example 3
Another way of speeding up Stata is to create a variable that equals 1 for the observations that are to be included in the regression and then use a less complex if statement.

    clear
    timer on 3
    use c:/exp1
    mark a1 if c<.5 & b<.5
    regress a b c if a1
    regress a b c if a1
    regress a b c if a1
    timer off 3
    timer list

The timer gives the following results:

       1:     20.25 /        1 =      20.2500
       2:      9.75 /        1 =       9.7500
       3:     17.28 /        1 =      17.2820

For help on specific commands, type help followed by the command name, e.g.:
    help use
    help mark
Stata 12's new Excel command (August 2011)

    clear all
    sysuse auto, clear
    set more off
    ds, not(type string)
    capture erase "c:\stuff.xls"
    local z=1
    foreach i of varlist `r(varlist)' {
        sysuse auto, clear
        if "`i'"=="length" {
            continue
        }
        regress length `i'
        matrix a1=r(table)
        matrix a2=a1[1..6,1..2]'
        matrix list a2
        clear
        svmat a2, names(matcol)
        generate name="`i'" in 1
        replace name="_cons" in 2
        if `z'==1 {
            export excel using "c:\stuff.xls", sheetmodify cell(a`z') firstrow(variables)
        }
        else {
            export excel using "c:\stuff.xls", sheetmodify cell(a`=((`z'-1)*2)+2')
        }
        local ++z
    } //loop

For help on specific commands, type help followed by the command name, e.g.:
    help import
Stata 12 PDF files of logs and graphs (July 2011)

    log using c:/log1, replace
    sysuse auto, clear
    tab rep78 foreign
    log close
    translate c:/log1.smcl c:/log1.pdf , translator(smcl2pdf)

Also, in Stata 12 you can produce a PDF of a graph from within Stata.

Example

    sysuse auto, clear
    scatter mpg weight        //, name(g1)
    graph export c:/graph.pdf //name(windowname)

For help on specific commands, type help followed by the command name, e.g.:
    help translate
    help graph export
Using value labels for bar graph labels (June 2011)
Automatically sending emails from Stata - Windows platform (May 2011)

    capture erase kk2.txt
    log using c:/kklog,text replace
    set more off
    forvalues i=1/2000 {
        //data to run the email program
        if mod(`i',100)==0 {                                            //<1
            tempname fh
            file open `fh' using kk2.txt, write                         //<2
            file write `fh' "smtpserver = mail.whatever.com.au" _n      //<3
            file write `fh' "from = myeamail@whatever.com.au" _n        //<4
            file write `fh' "to = reciever@whatever.com.au" _n          //<5
            file write `fh' "subject = Test Message" _n                 //<6
            file write `fh' "body = `i' Test Message" _n                //<7
            file close `fh'
            !CommandLineEmailer /p:kk2.txt                              //<8
            erase kk2.txt                                               //<9
        }
    }
    log close
    exit

Notes for Option 1:
1. mod(`i',100)==0 determines when an email is to be sent. Other methods can be used.
2. Using Stata's file command you create a text file that contains the instructions to run CommandLineEmailer. The text file created is called kk2.txt.
3. The address "mail.whatever.com.au" must be changed to your own server address. To find this out with Windows Live:
    Open Windows Live
    Using the pulldown menu: Tools>Accounts
    Click on: Mail
    Click on: Properties
    Click on: the "Servers" tab
    Find the address at: Outgoing Mail [SMTP]
   The "_n" indicates a newline.
4. Change the from email address to that required.
5. Change the to email address to that required.
6. Change the subject title to that required.
7. Change the message in the body of the email to that required. In the above we have included `i' to indicate the number of loops that have been completed. Other data can be included.
8. Calls the utility that sends the email using the text file written above. The ! sends commands to your operating system; see help shell. CommandLineEmailer is a utility that must first be downloaded. It can be obtained from: http://www.codeproject.com/KB/IP/cpcommandlineemailer.aspx (e.g. "Download compiled utility - 6.05 Kb"). You must log in to download; this is free and easy to do.
9. Erase the text file so that a new one can be written.
Option 2
If you require the log file to be emailed to you (or others) when the analysis has been completed, the following can be done:

    capture erase kk2.txt
    log using c:/kklog,text replace
    set more off
    forvalues i=1/2000 {
        display "Looping index: `i'"
    }
    log close
    //text file to run CommandLineEmailer
    tempname fh
    file open `fh' using kk2.txt, write
    file write `fh' "smtpserver = mail.tpg.com.au" _n
    file write `fh' "from = myemail@whatever.com.au" _n
    file write `fh' "to = email@whatever.com.au" _n
    file write `fh' "subject = Test Message" _n
    file write `fh' "body = log sent: `c(current_date)' `c(current_time)'" _n
    file write `fh' "attachment = c:\kklog.log" _n     //<10
    file close `fh'
    !CommandLineEmailer /p:kk2.txt
    exit

Note for Option 2:
10. Attaches the log file to the email. See the Option 1 notes above for other details.

For help on specific commands, type help followed by the command name, e.g.:
    help file
    help shell
Generating a dataset (April 2011)

    clear all
    set memory 300m                 //< allocates 300 megabytes of memory to Stata
    set obs 1000                    //< No. of observations
    gen y=uniform()*10
    forvalues i=1/100 {             //< No. of variables
        gen a`i'=uniform()*100      //< cont. variables
    }
    summarize

(2) Generates a binary variable and continuous variables.

    clear all
    set memory 300m
    set obs 1000                    //< No. of observations
    generate y=uniform()<.5         //< binary variable
    forvalues i=1/100 {             //< No. of cont. variables
        generate a`i'=uniform()     //< cont. variables
    }
    tabulate y

(3) Generates a categorical variable and continuous variables.

    clear all
    set memory 300m
    set more off
    set obs 1000                    //< No. of observations
    generate y=mod(_n,4)+1          //< cat. variable
    forvalues i=1/10 {              //< No. of cont. variables
        generate a`i'=uniform()     //< cont. variables
    }
    tabulate y

For help on specific commands, type help followed by the command name, e.g.:
    help obs
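If the generated data set needs to be the same every time the do file is run, a seed can be set before the random numbers are drawn. A minimal sketch, assuming a recent Stata in which runiform() is the current name for uniform() and set memory is no longer needed; the seed value 12345 is arbitrary.

```stata
clear all
set seed 12345                  // any fixed value gives a repeatable data set
set obs 1000
generate y = runiform() < .5    // binary variable
forvalues i = 1/5 {
    generate a`i' = runiform()  // continuous variables
}
tabulate y
```

Running the block twice produces identical values of y and a1 to a5.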
Printing log files (March 2011)
Working with dates (February 2011)

    clear
    input str20 starts_d a
    "20jan1980" 1
    "20jan1981" 2
    "20jan1982" 3
    "20jan1983" 4
    "20jan1984" 5
    "20jan1985" 6
    "20jan1986" 7
    "20jan1987" 8
    "20jan1988" 9
    end
    list
    generate date1=date(starts_d,"DMY")     //< Note 1
    summarize a if date1>td(25April1985)    //< Note 2

Note 1: Generates a new variable (date1), which is the elapsed time in days from a date datum (1 Jan 1960). This variable is numeric.
Note 2: Summarizes a subset of the data, the subset being determined by the pseudofunction td(). The number of observations in the subset is shown under Obs.

Example using the tin() function
Find the number of observations up to a specified date.

    clear
    input str20 starts_d a
    "20jan1980" 1
    "20jan1981" 2
    "20jan1982" 3
    "20jan1983" 4
    "20jan1984" 5
    "20jan1985" 6
    "20jan1986" 7
    "20jan1987" 8
    "20jan1988" 9
    end
    list
    generate date1=date(starts_d,"DMY")
    format date1 %td
    tsset date1                             //< Note 3
    list if tin(,25Apr1985)                 //< Note 4

Note 3: tsset is the command that declares the data to be time series.
Note 4: tin() determines the subset of the data. This function allows a lower and an upper limit to be specified, the lower limit on the left and the upper on the right. If the left hand limit is omitted Stata takes the lower limit to be the beginning of the data; conversely, if the right hand limit is omitted Stata takes the upper limit to be the end of the dataset.

For further help on the above code type the following on the Stata command line:
    help date()
    help tsset
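The td() pseudofunction can be combined with count or inrange() when only the number of observations in a date window is wanted. A short sketch using four of the dates from the example above; the window limits are chosen for illustration.

```stata
clear
input str20 starts_d a
"20jan1984" 5
"20jan1985" 6
"20jan1986" 7
"20jan1987" 8
end
generate date1 = date(starts_d, "DMY")
format date1 %td
// observations after 25 April 1985
count if date1 > td(25apr1985)
// observations from 1 January 1985 to 31 December 1986, inclusive
count if inrange(date1, td(1jan1985), td(31dec1986))
```

Unlike tin(), this does not require the data to be tsset first.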
Producing Multiple graphs (January 2011)

    sysuse auto, clear
    foreach i of varlist _all {
        capture confirm numeric variable `i'
        if _rc==0 {
            histogram `i', name("`i'")
        }
    }

The Stata confirm command checks whether the variable is a numeric variable. If it is, the capture prefix returns _rc as 0; if not, some other value is returned. The return code _rc is then checked with the if command: if true the histogram is drawn, if false the loop moves on to the next variable in the foreach loop.

If you do not wish to run all the variables in the dataset the following can be used:

    sysuse auto, clear
    graph drop _all   // drop existing graphs
    local a "mpg turn"
    foreach i of local a {
        capture confirm numeric variable `i'
        if _rc==0 {
            histogram `i', name("`i'")
        }
    }
    exit

Alternatively, using Stata's ds command:

    sysuse auto, clear
    graph drop _all   // drop existing graphs
    ds , has(type int)
    return list
    foreach i of varlist `r(varlist)' {
        histogram `i', name("`i'")
    }
    exit

Or

    sysuse auto, clear
    graph drop _all   // drop existing graphs
    ds make , not
    return list
    foreach i of varlist `r(varlist)' {
        histogram `i', name("`i'")
    }
    exit

This time the ds command is still used, but the variables that you do not wish to graph are excluded with the not option.

The default display for multiple graphs is to show each graph in a separate graphics window. To show all the graphs in the one window (tabbed graphs) the Stata setting autotabgraphs can be set to on, e.g.

    set autotabgraphs on

Also, when displaying graphs in the one graphics window the display can be altered by pulling a tab into the desired part of the window.

For further help on the above code, type the following on the Stata command line:
    help capture
    help ds
    help foreach
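The ds command can also select every numeric variable in one step, which replaces the capture confirm test inside the loop. In this sketch summarize stands in for the graph command so that it runs quickly; the loop variable v is an arbitrary name.

```stata
sysuse auto, clear
// r(varlist) holds the names of all numeric variables
ds, has(type numeric)
foreach v of varlist `r(varlist)' {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}
```

Replacing the two commands inside the loop with histogram `v', name("`v'") gives one graph per numeric variable.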
Stata 11 PDF (December 2010)
Regular Expressions (November 2010)

    clear
    input ///
    str100 address
    "1234 West St Blackburn 3000 Vic"
    "West St 1234 Blackburn 3000 vic"
    "West St 1234 Blackburn Vic 3000"
    "West St 1234 Blackburn sa 3000"
    "12 West St Backburner 2001 nsw"
    end
    list
    //getting postcode
    generate postcode1=regexs(2) if regexm(address,"(^.*)([0-9][0-9][0-9][0-9])")   //comment: reg1
    //get state
    generate state=regexs(0) if regexm(address,"([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa])")   //comment: reg2
    //You could verify that the first number of the postcode matches the state
    generate check=1 if lower(state)=="vic" & regexm(postcode1,"[0-9]") & regexs(0)=="3"   //comment: reg3
    list

Notes:
reg1: (^.*) means get any text "." zero or more times "*"; the brackets around this indicate a subsection of the string, in this case subsection 1. Subsection 1 continues until the last 4 digit number, as indicated by ([0-9][0-9][0-9][0-9]).
reg2: ([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa]) requires a match of 3 characters, the first character being either V or v, the second character being either I or i, etc. If the first group of characters is not found, the search continues with the next group of characters. The "|" symbol is a logical OR.
reg3: looks for a match of state: lower(state)=="vic"; the lower() function makes sure that we are comparing the states in lower case. regexs(0)=="3" checks the match of the previous statement against the number 3, the correct start of a Vic postcode.

Now assume that the postcode has been incorrectly coded with the inclusion of alpha characters and needs to be cleaned up. The following is one way of doing this.
    clear
    input ///
    str100 address
    " 3a00c1 West St Blackburn 3a00c0 Vic"
    "West St 123 Blackburn 3Re00c1 vic"
    "West St 123 Blackburn Vic 3f000"
    "West St 123 Blackburn sa 30jj00"
    "12 West St Backburner 2001 nsw"
    end
    list
    tempvar a1 a2 a3
    gen `a1'=""
    gen `a2'=""
    gen `a3'=""
    local aa "[A-Za-z]"
    //assume that the post code is in the second half of the string
    replace `a1'=regexs(0) if regexm( substr(address,strlen(address)/2,.)," ([3])(`aa'|[0-9])*")   //comment: reg4
    replace `a2'=regexs(3) if regexm(`a1', " ([3])(`aa'*)([0-9]*)")
    replace `a3'=regexs(5) if regexm(`a1', " ([3])(`aa'*)([0-9]*)(`aa'*)([0-9]*)")
    generate code="3"+`a2'+`a3' if `a1'!=""
    list

Notes:
reg4: substr(address,strlen(address)/2,.) limits the search to the second half of the string. The space in " ([3]) between the " and the ( indicates that a space is required, and ([3]) indicates that the match must start with the number 3. The second subsection, (`aa'|[0-9])*, looks for a lower or uppercase character OR (as indicated by the OR symbol "|") a number. The "*" at the end of the second subsection indicates zero or more times.

The following is a problem that requires the days, months and years to be separated into separate variables.

    clear
    input ///
    str40 dpr
    "2 yrs 5months 26 days"
    "3 yrs 2 months"
    "1yr 9 months"
    "1 yr 8 months"
    "1 yr 11 months 28 days"
    "1 yr 12 days"
    "3 yrs 3 months12 days"
    "3yrs 4 months 26 days"
    "1 yr 9mnths 8 days"
    end
    list
    generate year=regexs(1) if regexm(dpr, "^([0-9]+)([years ])")
    generate months=trim(regexs(1)) if regexm(dpr, "([0-9]+[ ]?)m")
    generate days=regexs(1) if regexm(dpr, "([0-9]+[ ]?)d")
    list

For further help on the above code, type the following on the Stata command line:
    findit regular expressions
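An alternative way of picking out the last four digit number is to anchor the pattern at the end of the string rather than relying on a greedy (^.*). This is a sketch, not the tip's own pattern; it assumes that negated character sets such as [^0-9] are supported by regexm(), and the variable names postcode and state are used for illustration.

```stata
clear
input str40 address
"1234 West St Blackburn 3000 Vic"
"West St 1234 Blackburn Vic 3000"
"12 West St Backburner 2001 nsw"
end
// four digits followed only by nondigits up to the end of the string
generate postcode = regexs(1) if regexm(address, "([0-9][0-9][0-9][0-9])[^0-9]*$")
// state abbreviation in any case, stored in upper case
generate state = upper(regexs(1)) if regexm(address, "([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa])")
list
```

Because the street number 1234 is always followed later by more digits, it cannot satisfy the end-of-string anchor, so only the postcode is returned.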
Stata's profile.do command (October 2010)
Use System variables _n and _N (September 2010)

    clear all
    set obs 10
    generate number=_n
    generate number_T=_N

The result of running the above is:

         number   number_T
     1.       1         10
     2.       2         10
     3.       3         10
     4.       4         10
     5.       5         10
     6.       6         10
     7.       7         10
     8.       8         10
     9.       9         10
    10.      10         10

**Example 2
Reversing the data so that the _N (last) observation becomes the first. This is done for a particular variable.

    clear
    set obs 10
    generate number=_n
    generate rev_number=number[_N-_n+1]
    list

The result of running the above is:

         number   rev_nu~r
     1.       1         10
     2.       2          9
     3.       3          8
     4.       4          7
     5.       5          6
     6.       6          5
     7.       7          4
     8.       8          3
     9.       9          2
    10.      10          1

**Example 3
Using _N with the bysort prefix to generate a variable that holds the total number of children in each family.

    clear
    input ///
    famid child
    1 1
    2 1
    2 2
    2 3
    3 1
    3 2
    3 3
    3 4
    end
    bysort famid: generate number=_N
    list, sepby(famid)

The result of running the above is:

         famid   child   number
     1.      1       1        1

     2.      2       1        3
     3.      2       2        3
     4.      2       3        3

     5.      3       1        4
     6.      3       2        4
     7.      3       3        4
     8.      3       4        4

**Example 4
_n and _N can also be used in a qualifier. This example marks, for each family, the child who has the greatest income. The income variable is in brackets, which tells Stata to sort the data by income within famid. Once sorted, the last observation (_N) in each family is the one with the greatest income for that family.

    clear
    input ///
    famid child income
    1 1 100
    2 1 150
    2 2 200
    2 3 250
    3 1 10
    3 2 100
    3 3 500
    3 4 250
    end
    bysort famid (income): generate number=1 if _n==_N
    l, sepby(famid)

The result of running the above is:

         famid   child   income   number
     1.      1       1      100        1

     2.      2       1      150        .
     3.      2       2      200        .
     4.      2       3      250        1

     5.      3       1       10        .
     6.      3       2      100        .
     7.      3       4      250        .
     8.      3       3      500        1

**Example 5
Generating lags and leads in the data.

    clear
    input ///
    time sales
    1 100
    2 150
    3 200
    4 250
    5 10
    6 100
    7 500
    8 250
    end
    generate lead=sales[_n+1]
    generate lag=sales[_n-1]
    generate lags=(sales[_n-1]+sales[_n-2])/2
    list

The result of running the above is:

         time   sales   lead   lag   lags
     1.     1     100    150     .      .
     2.     2     150    200   100      .
     3.     3     200    250   150    125
     4.     4     250     10   200    175
     5.     5      10    100   250    225
     6.     6     100    500    10    130
     7.     7     500    250   100     55
     8.     8     250      .   500    300

For further help on the above code see:
    User's Guide: [U] 13.4 System variables (_variables)
    help bysort
Producing an edited log file (August 2010)

    //set macros
    local new "capture log using out1, text replace"
    local on "capture log using out1,text append"
    local off "capture log close"
    sysuse auto, clear
    `new'
    *this is a comment
    `off' //off
    regress mpg weight
    `on' //on
    display "`e(rss)'"
    `off' //off
    generate gpm=1/mpg
    `on' //on
    *this is GPM
    summarize gpm
    `off' //off
    type out1.log //displays log file before filefilter
    filefilter out1.log out2.log, from("off'") to(" ") replace
    filefilter out2.log out3.log, from("`") to(" ") replace
    filefilter out3.log out4.log, from(" //off") to("") replace
    filefilter out4.log out5.log, from(".") to("") replace
    type out5.log //displays log file

For further help on the above code see:
    help macro
    help filefilter
    help type
The input command (July 2010)

    clear
    input ///
    str15 dates
    "12/7/2010"
    "13/7/2010"
    "14/7/2010"
    "15/7/2010"
    "16/7/2010"
    "17/7/2010"
    "18/7/2010"
    "19/7/2010"
    end
    list
    gen date1=1 if inrange(date(dates, "DMY"), date(c(current_date),"DMY"),date(c(current_date),"DMY")+4)
    list
    exit

After running the above we see the result:

    . list

         +-------------------+
         |     dates   date1 |
         |-------------------|
      1. | 12/7/2010       . |
      2. | 13/7/2010       . |
      3. | 14/7/2010       1 |
      4. | 15/7/2010       1 |
      5. | 16/7/2010       1 |
         |-------------------|
      6. | 17/7/2010       1 |
      7. | 18/7/2010       1 |
      8. | 19/7/2010       . |
         +-------------------+

    . exit

Errors in logic can now be spotted more easily, and you have saved time by not running the complete data set. Once this has run satisfactorily it can be included in the main do file.

For further information on this command see:
    help input
For further help on the above code see:
    help comments
    help date
    help dates
    help inrange()
    help creturn list
Splitting the Do Editor (June 2010)
Now there are two do editor windows

Factor variables and lincom to produce a table (May 2010)

    set more off
    cd "C:\data\dupont"
    //if the data is stored in a different directory change this
    //to where it has been stored
    use "5.5.EsophagealCa.dta", clear
    recode tobacco 3=2 4=3, g(smoke)
    label define q_smoke 1 "0-9" 2 "10-29" 3 ">=30"
    label value smoke q_smoke
    logistic cancer i.alcohol i.smoke i.age [fw=patients]
    forvalues i=1/4 { //alcohol
        forvalues j=1/3 { //smoke
            qui: lincom `i'.alcohol + `j'.smoke, or
            local a`i'`j'=r(estimate)
        }
    }
    local a11=1
    decode alcohol, gen(a)
    contract a
    keep a
    matrix aa=( `a11', `a12', `a13' \ `a21', `a22' ,`a23' \ `a31', `a32' ,`a33'\ `a41', `a42' ,`a43')
    svmat aa
    rename aa1 Tobacco_0_9
    rename aa2 Tobacco_10_29
    rename aa3 Tobacco_30
    list
    exit

In the above, the forvalues loops step through the levels of alcohol and smoke. These are then applied to the factor variables in the lincom command. The values returned by lincom are stored one at a time in local macros. After going through all the combinations of alcohol and smoke, the macros are collected into a Stata matrix, which is then put into the data set and the new variables are renamed.

For more information on the specific commands type help and then the command, e.g. help lincom
Stata Graphs (April 2010)
Tabdisp (March 2010)

    sysuse auto, clear
    contract for rep78
    list
    summarize _freq
    generate percentage=(_freq/r(sum))*100
    tabdisp for rep78, cell(percentage) cellwidth(7)

Or if the % symbol is also required:

    sysuse auto, clear
    contract for rep78
    list
    summarize _freq
    generate percentage=(_freq/r(sum))*100
    gen freq=string(percentage, "%5.2f")
    replace freq=freq + "%"
    tabdisp for rep78, cell(freq) cellwidth(7)

If the above is all that is required, the user-written programs tab2way or tab3way could be used instead.

There are many other ways to display your data, e.g. including the words Max and Min in the table cells:

    sysuse auto, clear
    contract for rep78
    list
    sort _freq
    tostring _freq, gen(freq)
    replace freq=freq+ " Max" in `=_N'
    replace freq=freq+ " Min" in 1
    tabdisp for rep78, cell(freq) cellwidth(7)

For help on the individual commands type help and then the command name.
To download the user-written command tab2way or tab3way, type:
    ssc install tab2way
or
    ssc install tab3way
Tables to spreadsheet (February 2010)

    sysuse auto, clear
    //table in official Stata
    table for rep78 , c(mean price)

The following gives us what we want but does not allow the output to be put into a spreadsheet:

    sysuse auto, clear
    collapse (mean) price, by(foreign rep78)
    list
    tabdisp foreign rep78 , c(price)

This time getting the table into a Stata data set so it can be exported to a spreadsheet. This method has the advantage that the column and row labels are also included:

    sysuse auto, clear
    collapse (mean) price, by(foreign rep78)
    list
    drop if rep78==.
    reshape wide price, i(foreign) j(rep78)
    //because the data is in long form it can be reshaped
    //into the required table
    list
    outsheet using c:/table, replace
    //outputting the table to a form that can be read with a spreadsheet

This time using Mata to manipulate the initial data:

    sysuse auto, clear
    collapse (mean) price, by(foreign rep78)
    fillin foreign rep78
    drop if rep78==.
    sort for rep78
    list
    mata: //start of Mata
    a=st_data(.,.)
    a
    s=J(2,6,.)
    s
    for(i=1; i<=10; i++) {
        r=a[i,2]
        c=a[i,1]
        s[r+1,c]=a[i,3]
    }
    end

As you can see there are a number of different ways of getting table information out of Stata.

For help on the individual commands type help and then the command name.
To download the user-written command mm_outsheet, type:
    ssc install moremata
Tables to spreadsheet (January 2010)
For more information see the help for the specific command.
User-written table output commands include:
    tabout
    logout
    esttab
Point Estimates for a Regression (December 2009)

Doing things quietly in Stata (November 2009)

    quietly {
        summarize mpg, detail
        local a=r(mean)
        summarize price, detail
        local a=r(mean)
    }

If you wish to see specific output from within a quiet block, prefix that command with noisily.
Example:

    sysuse auto, clear
    quietly {
        summarize mpg, detail
        local a=r(mean)
        noisily summarize price, detail
        local a=r(mean)
    }

For more information see: help quietly
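For a single command, quietly can also be used as a prefix rather than a block. The following is a minimal sketch of that idea; the macro name mean_mpg is chosen for illustration only and is not part of the original tip.

```stata
sysuse auto, clear
// run summarize without printing its table; the results stay in r()
quietly summarize mpg, detail
local mean_mpg = r(mean)
// show only the one number of interest
display "Mean mpg: " %6.2f `mean_mpg'
```

The prefix form is convenient inside loops, where only a few returned values are wanted from each pass.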
Graphing functions (October 2009)
Stata 11 - Variable manager (September 2009)
Getting a Subset of a large dataset into Stata (August 2009)
Capture (July 2009)
Transparent Graphs (June 2009)
    sysuse auto, clear
    twoway ///
    (histogram mpg if rep78==3, fcolor(green)) ///
    (histogram mpg if rep78==4, fcolor(blue))
    graph export c:/hist.wmf, replace

Then in Word 2003:
    Insert>Picture>From file and then c:/hist.wmf
    Click on the graph
    Edit picture
    Right click on a bar that you wish to make transparent
    Format AutoShape>Color and lines tab>Fill section, then move the transparency slider to 50% and press OK
    Continue to edit all the bars this way. The legend can also be modified as above
    Save

Also see:
    http://www.stata.com/statalist/archive/200904/msg00574.html
    http://www.stata.com/statalist/archive/200904/msg00612.html
Getting Stata's Graph editor commands into Stata graphs (May 2009)
Weaving Stata results into a Word Report (April 2009)
Stopping Stata during the running of a do file (March 2009)
Putting Greek symbols in graphs (February 2009)
    scatter weight mpg, title( Example of Greek characters in a Graph `=char(238)' `=char(243)' `=char(236)' )

Alternatively, the Stata Graph Editor can be used to include Greek symbols.
For more information see: Data Management Manual: char(n)
For an article on char() see http://www.statajournal.com/sjpdf.html?articlenum=dm0006
Doing things by levels of a variable (January 2009)
Automation of Tables in Stata (December 2008)
Memory usage in Stata (November 2008)
Stata Comment (October 2008)
Stata user written graphs (September 2008)
Stata tables (August 2008)
Sending Command(s) to the Stata Do Editor from the Stata Review Window (July 2008)
Then: right click the mouse button and select send to dofile editor. The Do Editor will then open with the highlighted command(s) in it. To run this, use the Do Editor pulldown menu and select Tools>Do, or use the icon (in Stata 10 this is the icon on the far right), or save this file and run it from the command line, e.g. save it as c:/dofile and run it by typing do c:/dofile on the Stata command line.
For more details on the Stata command, type the following on the Stata command line: help do
Creating a Stata dataset from multiple Excel worksheets (June 2008)
In this example the Excel file is called book2 and is on the c:/ drive. The file has two worksheets: kk1 and kk2

    odbc
    clear
    tempfile kka
    odbc load, dsn("Excel Files;DBQ=c:\book2.xls") table("kk1$")
    save `kka'
    list
    clear
    odbc load, dsn("Excel Files;DBQ=c:\book2.xls") table("kk2$")
    list
    append using `kka'
    list
    exit

Also see: http://www.ats.ucla.edu:80/stat/stata/faq/odbc.htm

Using Stat/Transfer 9
With Stat/Transfer this would be done as follows:
    Open tab: option 3
    And then tick "concatenate worksheet pages"

Stata with the append command
Save each Excel worksheet as a csv in Excel. In this example c:/book2_kk1.csv and c:/book2_kk2.csv are the two files created.

    insheet using c:/book2_kk1.csv, clear
    save c:/book2_kk1
    list
    clear
    insheet using c:/book2_kk2.csv, clear
    list
    append using c:/book2_kk1
    list

For more details on the Stata commands, type the following on the Stata command line:
    help append
    help insheet
catplot (May 2008)
Stata Users' Group Meeting Proceedings (April 2008)
Programming Stata - learning by examples (March 2008)
    Tutorial 1 - do files
    Tutorial 2 - macros
    Tutorial 3 - loops
    Tutorial 4 - if statement
    Tutorial 5 - incrementing, _n, and _N
    Tutorial 6 - local extended macros
More tutorials will be added in the following weeks.
Also see:
    Stata 10 User's Guide
    Stata 10 Programming Manual
    The Stata Journal (2005) Nicholas J. Cox "Suggestions on Stata programming style" 5, Number 4, pp. 560-566
    The Stata Journal (2002) Nicholas J. Cox "How to face lists with fortitude" 2, Number 2, pp. 202-222
    Stata Netcourse NC151 "Introduction to Stata programming"
    Stata Netcourse NC152 "Advanced Stata programming"
(Back issues of the Stata Journal can be purchased from Survey Design and Analysis - contact details below)
(To enroll in a Stata Netcourse please contact us)
Mata - learning by examples (February 2008)
    Tutorial 1 Getting Data in Mata
    Tutorial 2 Looping, If statement and examples
    Tutorial 3 Subscripting matrices
    Tutorial 4 String and numerical matrices, getting a Mata matrix into Stata
    Tutorial 5 Mata functions
    Tutorial 6 Mata pointers and Mata optimize
    Tutorial 7 Mata matrix maths and solving simultaneous equations
Also see:
    Stata 10 Mata manuals (the entire Mata manual can be found in Stata's online help for Mata, e.g. help mata)
    The Stata Journal (2007) William Gould "Mata Matters: Structures", 7, Number 4, pp. 556-570
    The Stata Journal (2007) William Gould "Mata Matters: Subscripting", 7, Number 1, pp. 106-116
    The Stata Journal (2006) William Gould "Mata Matters: Precision", 6, Number 4, pp. 550-560
    The Stata Journal (2006) William Gould "Mata Matters: Interactive use", 6, Number 3, pp. 387-396
    The Stata Journal (2006) William Gould "Mata Matters: Creating new variables-sounds boring, isn't", 6, Number 1, pp. 112-123
    The Stata Journal (2005) William Gould "Mata Matters: Using views onto the data", 5, Number 4, pp. 567-573
    The Stata Journal (2005) William Gould "Mata matters: Translating Fortran", 5, Number 3, pp. 421-441
(Back issues of the Stata Journal can be purchased from Survey Design and Analysis - contact details below)
Stata's display command (January 2008)
Creating a binary variable from a continuous variable (December 2007)
    generate dummy1=0
    generate dummy1= mpg <=25

This works because mpg <=25 is either true or false. Stata evaluates a logical expression to 1 if it is true and 0 if it is false. If the variable in the expression contains missing values, add the qualifier if !missing(), because Stata treats a missing value as larger than any number and would otherwise code it as 0, e.g.

    generate dummy1= mpg <=25 if !missing(mpg)

Other ways of creating dummy variables can be found in the Stata FAQ.
Also see: What is true and false in Stata?
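The effect of the missing-value qualifier can be seen on a small made-up data set. This is a sketch only; the four mpg values and the variable names dummy_all and dummy_safe are illustrative and not from the original tip.

```stata
clear
input mpg
20
25
30
.
end
// without the qualifier a missing mpg is coded 0, because missing
// is treated as larger than any number
generate dummy_all  = mpg <= 25
// with the qualifier a missing mpg stays missing in the new variable
generate dummy_safe = mpg <= 25 if !missing(mpg)
list
```

Comparing the two new variables in the last row shows why the qualifier matters when the dummy is later used in a tabulation or model.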
New subcommand for listing user written commands (November 2007)
Undocumented commands (October 2007)
User written program  examples (September 2007)
Setup
    . sysuse auto, clear
Create highrep78 containing the value of rep78 if rep78 is equal to 3, 4, or 5, otherwise highrep78 contains missing (.)
    . egen highrep78 = anyvalue(rep78), v(3/5)
List the result
    . list rep78 highrep78
To see a description of examples, type the following on the Stata command line when online:
    ssc describe examples
To install examples, type the following on the Stata command line when online:
    ssc install examples
Settings for Stata (August 2007)
Copy as picture - Copying from the results windows to Word and Excel (July 2007)
Estout - Stata Regression Tables (June 2007)
Adoupdate (May 2007)
Nested Do file (April 2007)
Disadvantages:
To learn more see: Stata 9 Users guide [U] 16.2 and [U]16.6.2 Max. depth of nested do files, in Stata type help limits
Personal help file (March 2007)
spmap - Visualization of Spatial Data (February 2007)
stcmd - Using Stat/Transfer within Stata (January 2007)
encode (December 2006)
(Note: when Stata encodes it produces a value label; to see this type label list)
If this is not the encoding that you require, a way around this is to define a value label first and then use the label() option of encode. E.g. if you have:
    var1
    a
    b
    c
but would like var1a encoded as:
    var1a
    3
    1
    2
you would first define the value label, e.g.
    label define preference1 3 a 1 b 2 c
and then apply it using the encode command:
    encode var1, label(preference1) gen(var1a)
so that var1a holds 3, 1 and 2 for a, b and c.

The code to run the above:

    clear
    input str1 var1
    a
    b
    c
    end
    label define preference1 3 a 1 b 2 c
    encode var1, label(preference1) gen(var1a)
    label list
    list, nolab

For more information see: Stata 9 Data Management manual
kdensity (November 2006)
Intermediate graph commands (October 2006)
MATA (September 2006)
numlabel (August 2006)
viewsource (July 2006)
datasignature - Determine whether data have changed (June 2006)
Simple Thematic Mapping (May 2006)
Separate (April 2006)
Macro Expressions (March 2006)
inlist() & inrange() functions (February 2006)
set trace on (January 2006)
With our data a section of the trace will look like this:
The first line is the line being executed. It has a - in front of it to indicate it is being executed. The second line is the same line after macro substitution has occurred. It has a = in front of it to indicate that substitution has occurred.
See also the user-written command tr. To install this command:
    ssc install tr
For more information on trace see the Stata 9 programming manual. Also see the Stata command pause.
Dofile Editor (December 2005)

Regular Expressions (November 2005)
Regular expressions allow the matching of complex text patterns. Stata 9 includes regular expression functions such as regexm() and regexs(), which are used in the examples below.
For example:

    clear
    input ///
    str25 date
    "12jan2003"
    "1april1995"
    "17may1977"
    "02september2000"
    end
    list

To extract the day, the following could be used:

    gen day=regexs(1) if regexm(date, "(^[0-9]+)")

Breaking this down:
    regexm: match expression
    ^: start at the beginning (LHS) of the string
    [0-9]: the first character must be a digit from 0 to 9
    +: one or more of the previous, i.e. characters between 0 and 9 (stops when a letter comes up, e.g. the j of jan)
    ( ): the brackets indicate a subexpression. In this case there is only one group, hence regexs uses 1
    regexs(): returns a subexpression, here the first subexpression

The output from the regular expression:

    . gen day=regexs(1) if regexm(date, "(^[0-9]+)")
    . list

         +-----------------------+
         |            date   day |
         |-----------------------|
      1. |       12jan2003    12 |
      2. |      1april1995     1 |
      3. |       17may1977    17 |
      4. | 02september2000    02 |
         +-----------------------+

Another example: we have some text that includes citations. We wish to create a new variable that contains the text of the last citation. In this case the last citation is not at the end of the text, so it is useful to reverse the text and then look for the desired pattern.

    clear
    input ///
    id str200 cit_1
    1 "EP696218A  WO9215370A SUND _SUNDIndividual_"
    2 "WO9425112A  GB298635A"
    3 "EP578126A  CH180906A AGE_OK"
    4 "EP562128A  DE1684639A"
    5 "WO9318277A  DK137935B"
    6 "US4434855A SEC OF NAVY _USNA_"
    end
    list
    gen kk1=reverse(regexs(1)) if regexm(reverse(cit_1), "([A-Z][0-9]+[A-Z]+)")
    list

See also the Stata FAQ on regular expressions.
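The same date strings can be used to show how more than one subexpression is picked up. The following is a sketch rather than part of the original tip: the pattern and the variable names month and year are assumptions made for illustration, and regexs(2) simply returns the second bracketed group.

```stata
clear
input str25 date
"12jan2003"
"1april1995"
"17may1977"
"02september2000"
end
// leading digits are the day (not captured here),
// group 1 = month name in lower-case letters, group 2 = trailing year digits
generate month = regexs(1) if regexm(date, "^[0-9]+([a-z]+)([0-9]+)$")
generate year  = regexs(2) if regexm(date, "^[0-9]+([a-z]+)([0-9]+)$")
list
```

Both new variables are strings; real() or destring can be applied to year if a numeric value is needed.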
Text Editors (October 2005)
The function sum() (September 2005)
sum(x) returns the running sum of x. A basic use of sum() would be:

    generate running_tot =sum(1)

Another example of the use of sum(): given the data below you need to create a new

    id var2
    1 7
    1 7
    1 7
    1 7
    1 7
    1 7
    1 8
    1 8
    2 8
    2 8
    2 1

    bysort id: gen running_tot=sum(var2[_n]!=var2[_n-1])

Further information can be found by typing help sum() on the Stata command line.
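A shortened version of this idea can be run as follows. This is a sketch under assumptions: the six rows are a made-up subset in the spirit of the data above, and the helper variable order is added only so that the rows keep their original sequence within each id.

```stata
clear
input id var2
1 7
1 7
1 8
2 8
2 8
2 1
end
// remember the original row order so that sorting by id cannot shuffle it
generate long order = _n
// the comparison is 1 whenever var2 differs from the previous row within id
// (the first row of each id is compared with a missing value, so it counts too)
bysort id (order): generate running_tot = sum(var2 != var2[_n-1])
list, sepby(id)
```

The counter therefore starts at 1 for each id and goes up by one every time var2 changes.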
clonevar (July 2005)
Stata 9 has a useful command that generates an exact copy of an existing variable, e.g.
    clonevar MPG=mpg
For more information see help clonevar
Docking Stata 9 Windows (June 2005)
Getting the path and file name onto the Stata command line (March 2005)
Tabout - a user written command (February 2005)
    ssc install tabout (or: ssc install tabout, replace)
To make learning the syntax easy, an example file, which can be used as a tutorial, is available.
window command (February 2005)
The window command can be useful for adding your frequently used commands to the pull down menu, pushing commands to the review window, displaying the current file in the top left hand corner of the Stata window, and a lot more. To have your current file name displayed on the Stata window you can add the following to your do file:
    window manage maintitle "`c(filename)'"
See your programming manual for further details on the window command.
ds - Describing Variables and Saving Results (January 2004)
ds lists the variable names of the dataset currently in memory in a compact form. The command is useful if you require a list of variables that satisfy certain criteria. The resulting list is saved in r(varlist), which can be used in other commands, e.g.
    use "c:/stata8/auto.dta", clear
See describe in the Stata reference manual for more details.
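As a sketch of how r(varlist) might be picked up by another command, the example below uses the has(type ...) option of ds on the auto data shipped with Stata. The macro name strvars is chosen for illustration; the original tip does not show which criteria it used.

```stata
sysuse auto, clear
// list only the string variables and keep the names in a local macro
ds, has(type string)
local strvars `r(varlist)'
display "String variables: `strvars'"
// list only the numeric variables and pass them straight to summarize
ds, has(type numeric)
summarize `r(varlist)'
```

Copying r(varlist) into a local macro straight away is a cautious habit, since later r-class commands overwrite the stored results.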
WORKING IN ROWS (December 2004)
The egen command has a number of functions that make it easier to work with data in rows. Rather than using xpose or reshape to convert the data to columns, these functions may be used instead.
Egen's row functions:
    rfirst(varlist)
    rlast(varlist)
    rmax(varlist)
    rmean(varlist)
    rmin(varlist)
    rmiss(varlist)
    robs(varlist) [, strok]
        String variables may not be specified unless option strok is also specified
    rsd(varlist)
    rsum(varlist)
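A short sketch of the row functions in use follows. It assumes Stata 9 or later, where these functions are spelled rowmean(), rowmiss() and rowtotal() rather than rmean(), rmiss() and rsum(); the three-row data set and the variable names are made up for illustration.

```stata
clear
input x1 x2 x3
1 2 3
4 . 6
. . .
end
// mean across the row, ignoring missing values
egen row_mean = rowmean(x1 x2 x3)
// number of missing values in the row
egen row_miss = rowmiss(x1 x2 x3)
// row total, treating missing values as zero
egen row_tot  = rowtotal(x1 x2 x3)
list
```

Note the different treatment of a row that is entirely missing: the mean is missing, while the total is zero.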
A REMINDER TO START A LOG (November 2004)
Would you like to be reminded to start a log each time that you start Stata? One way of doing this is to include the command below in your profile.do file:
    db log
For information on profile see the GETTING STARTED MANUAL - More on starting and stopping Stata
Version Control (October 2004) PROBLEM: Stata is continually being improved, meaning programs and dofiles written for older versions might stop working. SOLUTION: Specify the version of Stata you are using at the top of programs and dofiles that you write:  myprog.do  use mydata, clear
For further information see the Stata programming manual
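A do-file that follows this advice might look like the sketch below. The version number 9 is purely illustrative (use the version the file was written under), and the auto data stands in for your own data set so that the example can be run as is.

```stata
* myprog.do
// declare the Stata version this file was written for (illustrative value)
version 9
sysuse auto, clear
summarize mpg
// c(version) reports the version the interpreter is currently set to
display c(version)
```

Later releases of Stata then interpret the commands in the file as that version did.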
Assert (September 2004)
Assert is a useful command for verifying your data, e.g.
    assert sex=="Male" | sex=="Female"
    assert mpg<50 & mpg>10
Also see the Stata reference manual for further information.
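The sketch below shows assert on the auto data shipped with Stata, including one way of handling a check that is expected to fail. Combining assert with capture and _rc is an addition for illustration and is not part of the original tip.

```stata
sysuse auto, clear
// both checks hold for every observation, so nothing is displayed
assert mpg < 50 & mpg > 10
assert foreign == 0 | foreign == 1
// some cars cost more than 10,000, so this assertion is false;
// capture stops the do-file from halting and stores the return code
capture assert price < 10000
display "return code: " _rc
```

Without capture, a false assertion stops the do-file at that line, which is usually what is wanted when verifying data.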
