    Tips for using Stata

    This document describes some tips to help you use Stata more efficiently. We will keep adding tips to the top of our home page to encourage you to visit it each month! When we place new tips on our home page, we will move the earlier monthly tips to the bottom of this page.

    Our bookshop has several publications to assist in learning Stata data management and analyses.

    Options
    One of the strengths of Stata is the system of options, typed after a comma. If you list your data you see the value labels of the variables. If you add the nolabel option after a comma you will see the underlying code values.
    list var1 var7-var10 var3 if gender==1
    To see the underlying value codes rather than labels, add the nolabel option after a comma:
    list var1 var7-var10 var3 if gender==1, nolabel
    Note that you can list the variables in any order that you define.
    BTW, the comma is a toggle. If used a second time it turns off the options. We could have written the above command as:
    list var1 var7-var10 var3, nolabel, if gender==1
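
    A minimal sketch of the same idea that can be run as-is. It assumes the auto dataset shipped with Stata (sysuse auto) in place of the var1, var3 ... variables above; foreign is a labelled variable in that dataset.

```stata
* Sketch only: the auto dataset stands in for your own data
sysuse auto, clear

* value labels (Domestic/Foreign) are shown by default
list make price foreign in 1/5

* the nolabel option, typed after the comma, shows the underlying codes
list make price foreign in 1/5, nolabel
```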

    Edit and browse
    You know that you can use the Editor button to invoke a spreadsheet format for entering or changing data. However, if you type the command edit you can limit what you see to a few variables, in any order that you define:
    edit var1 var7-var10 var3 if gender==1
    To see the value codes rather than labels, add the nolabel option after a comma:
    edit var1 var7-var10 var3 if gender==1, nolabel
    When you want to leave the Editor, Stata checks that you want to preserve the changes you made.

    Do Editor
    The Do-file Editor is very handy, invoked from a menu button or by typing doedit. You can enter several lines or insert another file, such as one of your earlier *.do files. You can select a line in the Do Editor and Do only that line. Or Do from that line to the end of the set of instructions. Or select several lines and Do them. At the end you can save the contents of the Do Editor as a *.do file.

    Folders
    Type adopath or sysdir to see the location of various Stata folders for your main files, updates, STB files, personal ado files, etc.

    Statalist
    There is a Stata list server with useful advice about Stata, including new programs to help with special problems. If you subscribe to Statalist you get messages throughout the day. If you subscribe to Statalist-digest you get a single file with all the messages once each day. See http://www.stata.com/support/statalist/faq

    The Stata Journal and Stata Technical Bulletin (STB)
    These publications contain supplementary information about Stata commands and the use of Stata in research. The first issue of The Stata Journal was released at the end of 2001. The last issue of the Stata Technical Bulletin (#61) was in May 2001.
    From within Stata you can see what STB procedures are available and download the ones that interest you. From the Help menu, select STB and User-written Programs. After that, choose the hypertext links (clickable blue words in the help window) for the Stata site, then click on stb.
    For a complete list of all STB articles, see http://www.stata.com/info/products/stb/stbvols.html



    Useful links
    The Stata resources web page is worth a look. It has links to free downloadable tutorials etc.
    http://www.stata.com/links/resources1.html

    UCLA graphics page using Stata may also be of interest:
    http://www.ats.ucla.edu/stat/stata/Library/GraphExamples/default.htm

    Setting up docked windows (Great for learning to set up Stata 9 windows)
    http://www.ats.ucla.edu/stat/stata/faq/stata9gui/dockfloatpin.html

    These are tutorials for learning Stata
    http://data.princeton.edu/stata/

    http://dss.princeton.edu/online_help/stats_packages/stata/

    Stata Programming
    http://www.stata.com/meeting/11uk/baum.pdf


    Tips from our home page

    The following tips were initially presented on our home page, where the current monthly tip can be found.

     

    Stata Tips



    List of past tips

    Working with Dates 2 December 2011
    Working with Dates 1 November 2011
    Doing things by levels of a variable October 2011
    Speeding up Stata - the if statement September 2011
    Stata 12's new Excel command August 2011
    Stata 12 PDF files of logs and graphs July 2011
    Using value labels for bar graph labels June 2011
    Automatically sending emails from Stata - Windows platform May 2011
    Generating a dataset April 2011
    Printing log files March 2011
    Working with dates February 2011
    Producing Multiple graphs January 2011
    Stata 11 PDF December 2010
    Regular Expressions November 2010
    Stata's profile.do command October 2010
    Use System variables' _n and _N September 2010
    Producing an edited log file August 2010
    The input command July 2010
    Splitting the Do Editor June 2010
    Factor variables and lincom to produce a table May 2010
    Stata graphs April 2010
    Tabdisp March 2010
    Tables to spreadsheet February 2010
    Tables to spreadsheet January 2010
    Point Estimates for a Regression December 2009
    Doing things quietly in Stata November 2009
    Graphing functions October 2009
    Stata 11 - Variable manager September 2009
    Getting a Subset of a large dataset into Stata August 2009
    Capture July 2009
    Transparent Graphs June 2009
    Getting Stata's Graph editor commands into Stata graphs May 2009
    Weaving Stata results into a Word Report April 2009
    Stopping Stata during the running of a do file March 2009
    Putting Greek symbols in graphs February 2009
    Doing things by levels of a variable January 2009
    Automation of Tables in Stata December 2008
    Memory usage in Stata November 2008
    Stata Comment October 2008
    Stata user written graphs September 2008
    Stata tables August 2008
    Sending Command(s) to the Stata Do Editor from the Stata Review Window July 2008
    Creating a Stata dataset from multiple Excel worksheets June 2008
    catplot May 2008
    Stata Users' Group Meeting Proceedings April 2008
    Programming Stata - learning by examples March 2008
    Mata - learning by examples February 2008
    Stata's display command January 2008
    Creating a binary variable from a continuous variable December 2007
    New subcommand for listing user written commands November 2007
    Undocumented commands October 2007
    User written program - examples September 2007
    Settings for Stata August 2007
    Copy as picture - Copying from the results windows to Word and Excel July 2007
    Estout - Stata Regression Tables June 2007
    Adoupdate May 2007
    Nested Do file April 2007
    Personal help file March 2007
    spmap -- Visualization of Spatial Data February 2007
    stcmd - Using Stat/Transfer within Stata January 2007
    encode December 2006


    Working with Dates 2 (December 2011)

    A problem that comes up from time to time is where, say, hospital wards (could also be hospital beds, cars, hotel rooms,
    machines etc.) are each used by a patient for a number of minutes/hours/days, and management wishes to know at the end
    of the month how many minutes each ward was used for.

    An example:

    
    
    clear
    
    input ///
      str30 date_in     str30 date_out   ward
    "7/22/2011 22:59" "7/27/2011 10:12"   1 
    "8/27/2011 12:05" "8/27/2011 21:07"   2 
    "8/27/2011 10:46" "8/28/2011 19:45"   1 
    "8/28/2011 15:34" "8/28/2011 16:43"   2
    "8/28/2011 23:24" "8/29/2011 13:43"   1
    "8/27/2011 14:32" "8/28/2011 15:15"   2
    "8/28/2011 09:43" "8/28/2011 17:49"   1
    "8/28/2011 01:33" "8/28/2011 02:32"   2
    "8/28/2011 04:43" "8/29/2011 05:53"   1
    "8/31/2011 07:30" "8/31/2011 08:11"   2
    end
    
    list
    set more off
    
    split date_in, gen(kk)                       // (1)
    split date_out, gen(zz)
    
    generate double date_in2=date(date_in,"MDY hm")   // (2)
    format date_in2 %td  
    
    generate double date_out2=date(date_out,"MDY hm")
    format date_out2 %td  
    
    summarize date_in2                           // (3)
    local d1=r(min)                              // (3)
    
    summarize date_out2                          // (3)
    local d2=r(max)                              // (3)
    
    local range=`d2'-`d1'                        // (4) 
     
    forvalues i=0/`range' {                      // (5)
      local kk=`d1'+`i'                          // (6)
      generate datea`kk'=1 if inrange(`d1'+`i', date_in2,date_out2)  // (7)
      
      label var datea`kk' `="`=day(`d1'+`i')'"+ "_"+     ///         // (8)
      "`=month(`d1'+`i')'"+ "_"+"`=year(`d1'+`i')'"'
    }
    
    generate id=_n                                // (9)
    reshape long datea, i(id) j(datekk) string    // (10)
    
    bysort id:gen double datea1=sum(datea) if !missing(datea)
    bysort id:replace datea=sum(datea) if !missing(datea)
    
    levelsof id, local(id1)
    
    foreach i of local id1 {
      summarize  datea if id==`i'
      replace datea1=24*60 if datea!=`r(min)' & datea!=`r(max)'      ///
      & id==`i' & !missing(datea)
    
      replace  datea1=24*60-(clock(kk2,"hm")/(1000*60)) if           ///
      datea==`r(min)' & id==`i' & !missing(datea)
    
      replace  datea1=clock(zz2,"hm")/(1000*60) if                   ///
      datea==`r(max)' & id==`i' & !missing(datea)
    
    //enter and discharge the same day
      replace  datea1=(clock(zz2,"hm")-clock(kk2,"hm"))/(1000*60) if ///
      datea==`r(max)' & datea==`r(min)' & id==`i' & !missing(datea)
    }
    
    collapse (sum) datea1, by(ward datekk)          // (11)
    destring datekk, gen(date)                      // (12)
    format date %td
                
    rename datea1 time                              // (13)
    label var time "time in minutes"                // (14)
    
    list, sepby(ward)                               // (15)
    
    Going through the above:

    (1) Split the string dates into day and time.
    (2) The date/times, input as strings, are converted to elapsed dates (number of days from a datum).
    (3) The minimum and maximum dates are obtained with the summarize command
    and saved in local macros.
    (4) The range is calculated and saved in a local macro.
    (5) Using the forvalues command the days of the range are looped through.
    (6) The date, in elapsed days, is calculated.
    (7) A new variable is created for each day and set to 1 in the observations where
    the loop date is in the range indicated by the inrange() function.
    (8) The newly created variable is given a label, which is the loop date.
    (9) Generate a unique id value to be used by the reshape command.
    (10) Reshape the data from wide to long data format.
    (11) Collapse the data to give the required results.
    (12) Use the destring command to convert a string variable to a numeric variable.
    (13) Rename variable.
    (14) Include a variable label.
    (15) Finally, list the results.
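
    As a cross-check on the minutes calculation, the length of a single stay can be computed directly with clock(). This is only a sketch using one row of the example data; the variable names t_in, t_out, minutes and day_in are illustrative and do not appear in the code above.

```stata
* Sketch only: one stay from the example data
clear
input str30 date_in str30 date_out
"8/27/2011 12:05" "8/27/2011 21:07"
end

* clock() returns milliseconds since 01jan1960 00:00:00; store as double
generate double t_in  = clock(date_in,  "MDY hm")
generate double t_out = clock(date_out, "MDY hm")
format t_in t_out %tc

* 60,000 milliseconds per minute
generate double minutes = (t_out - t_in)/60000

* dofc() extracts the elapsed date (in days) from a date/time value
generate day_in = dofc(t_in)
format day_in %td

list
```

    For a stay that starts and ends on the same day, this difference is the figure the loop above assigns to that day.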


    For help on specific commands type:
    help
    and then the specific command eg.
    help input
    help generate
    help summarize
    (the saved results from the summarize command can be seen by typing: return list after the summarize command)
    help macro
    help forvalues
    help collapse
    help destring
    help rename
    help label
    help list





    Working with Dates (November 2011)

    A problem that comes up from time to time is where, say, hotel rooms (could also be hospital beds, cars,
    machines etc.) are each booked for a number of days by one person, and management wishes to know at the end
    of the month how many rooms were occupied on each day.

    An example:

    
    
    clear
    set more off
    
    input ///
    str30 date_in str30 date_out
    "7/22/2011" "8/27/2011" 
    "8/27/2011" "8/27/2011" 
    "8/27/2011" "8/28/2011" 
    "8/28/2011" "8/28/2011" 
    "8/28/2011" "8/29/2011" 
    "8/27/2011" "8/28/2011" 
    "8/28/2011" "8/28/2011" 
    "8/28/2011" "8/28/2011" 
    "8/28/2011" "8/29/2011" 
    "8/31/2011" "8/31/2011" 
    "8/31/2011" "8/31/2011" 
    "8/31/2011" "9/4/2011" 
    "8/23/2011" "8/23/2011" 
    "8/23/2011" "8/24/2011" 
    "8/24/2011" "9/15/2011" 
    "8/4/2011" "8/4/2011" 
    "8/4/2011" "8/8/2011" 
    "8/10/2011" "8/10/2011" 
    "8/10/2011" "8/17/2011" 
    end
    
    list
    
    generate date_in1=date(date_in,"MDY")        // see (1) below
    generate date_out1=date(date_out,"MDY")      // (1)
    
    format date_in1 date_out1 %td                // (2)
    
    summarize date_in1                           // (3)
    local d1=r(min)                              // (3)
    
    summarize date_out1                          // (3)
    local d2=r(max)                              // (3)
    
    local range=`d2'-`d1'                        // (4) 
     
    forvalues i=0/`range' {                      // (5)
      local kk=`d1'+`i'                          // (6)
      gen datea`kk'=1 if inrange(`d1'+`i', date_in1,date_out1)  // (7)
      
      label var datea`kk' `="`=day(`d1'+`i')'"+ "_"+     ///    // (8)
      "`=month(`d1'+`i')'"+ "_"+"`=year(`d1'+`i')'"'
    }
    
    generate id=_n                                // (9)
    reshape long datea, i(id) j(datekk) string    // (10)
    
    collapse (sum) datea, by(datekk)              // (11)
    destring datekk, replace                      // (12)
    format datekk %td
    rename datea rooms_oc                         // (13)
    label var rooms_oc "rooms occupied"           // (14)
    list, sep(0)                                  // (15) 
    
    Going through the above:

    (1) The dates, input as strings, are converted to elapsed dates (number of days from a datum).
    (2) The dates are formatted.
    (3) The minimum and maximum dates are obtained with the summarize command
    and saved in local macros.
    (4) The range is calculated and saved in a local macro.
    (5) Using the forvalues command the days of the range are looped through.
    (6) The date, in elapsed days, is calculated.
    (7) A new variable is created for each day and set to 1 in the observations where
    the loop date is in the range indicated by the inrange() function.
    (8) The newly created variable is given a label, which is the loop date.
    (9) Generate a unique id value to be used by the reshape command.
    (10) Reshape the data from wide to long data format.
    (11) Collapse the data to give the required results.
    (12) Use the destring command to convert a string variable to a numeric variable.
    (13) Rename variable.
    (14) Include a variable label.
    (15) Finally, list the results.
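
    A different route to the same counts, shown here only as a sketch and not as the method used above, is to expand each booking into one observation per day and then count observations by day. It uses three bookings from the example data; the names days, day and id are illustrative. Unlike the wide/reshape approach, days on which no room is occupied do not appear in the result.

```stata
* Sketch only: three bookings taken from the example data
clear
input str30 date_in str30 date_out
"8/27/2011" "8/28/2011"
"8/28/2011" "8/28/2011"
"8/28/2011" "8/29/2011"
end

generate date_in1  = date(date_in,  "MDY")
generate date_out1 = date(date_out, "MDY")

generate id   = _n
generate days = date_out1 - date_in1 + 1   // days covered by each booking

expand days                                // one observation per booking-day
bysort id: generate day = date_in1 + _n - 1
format day %td

contract day, freq(rooms_oc)               // count bookings covering each day
list, sep(0)
```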


    For help on specific commands type:
    help
    and then the specific command eg.
    help input
    help generate
    help summarize
    (the saved results from the summarize command can be seen by typing: return list after the summarize command)
    help macro
    help forvalues
    help collapse
    help destring
    help rename
    help label
    help list





    Doing things by levels of a variable (October 2011)

    levelsof is a useful Stata command for doing something by levels of a variable, for example producing a histogram of mpg for each level of the variable foreign:

    
    clear
    sysuse auto, clear
    levelsof for, local(level)
    
    foreach i of local level {
      histogram mpg if for==`i', name(a`i')
    }
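
    To see what levelsof actually stores, the local macro can be displayed and looped over. This is a minimal sketch using the same auto dataset; the macro names levels and lbl are illustrative.

```stata
sysuse auto, clear
levelsof foreign, local(levels)

* the local macro now holds the list of distinct values
display "`levels'"

foreach i of local levels {
    * look up the value label attached to foreign for this level
    local lbl : label (foreign) `i'
    display "level `i' is labelled `lbl'"
}
```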
    


    However, levelsof fails when there are many levels, as can be seen from this snippet of code:

    
    clear
    set more off
    set obs 100000
    gen a=_n
    levelsof a, local(aa) 
    


    The levelsof help file states that this command is best used if the number of levels is modest.

    What to do if the number of levels exceeds the limit?

    The following are 2 methods:

    Method 1

    This method contracts the variable for which the levels are required and then merges the result with the dataset, so that the levels are held in the Stata dataset:

    Example:
    
    
    sysuse auto, clear
    expand 10000
    graph drop _all
    
    preserve
    
    contract mpg
    rename mpg levels
    save c:/kk, replace
    
    restore
    
    merge 1:1 _n using c:/kk
    
    drop _freq
    drop _merge
    
    sum levels
    
    forvalues i=1/`=r(N)' {
        scatter price weight if mpg==levels[`i'], name(a`i')   // levels[`i'] is the i-th distinct value of mpg
    }
    
    exit
    


    Method 2
    Using Mata to get the levels of a variable

    
    sysuse auto, clear
    graph drop _all
    expand 10000
    set more off
    
    mata:
      a=uniqrows(st_data(.,"mpg"))
      a
        for(i=1;i<=rows(a);++i){
           st_local("i1",strofreal(a[i]))
           stata("scatter price weight if mpg=="+st_local("i1")+", name(a" + st_local("i1")+")")
        }
    end
    


    For help on specific commands type:
    help
    and then the specific command eg.
    help levelsof
    help contract
    help mata
    help mata st_local()
    help mata stata
    help mata uniqrows()





    Speeding up Stata - the if statement (September 2011)

    Stata is fast but it can be sped up by taking a close look at the way your Stata commands have been coded. In this tip we will look at the if qualifier.

    The if qualifier is computationally intensive and adds considerable time to the running of a command that includes it. However, there are certain circumstances where it can be replaced and hence Stata's running time reduced.

    Example 1
    This shows how you would normally run a number of regressions, i.e. by just adding the qualifier after the regress command.

    
    // creating a data set
    clear
    timer clear   
    //create data
    set obs 10000000
    gen a=uniform()
    gen b=uniform()
    gen c=uniform()
    save c:/exp1, replace
    clear
    
    //Example 1
    //running regressions
    timer on 1
    use c:/exp1
    regress a b c if c<.5 
    regress a b c if c<.5 
    regress a b c if c<.5 
    timer off 1
    timer list
    
    //the timer gives the following results:
    . timer list
       1:     20.25 /        1 =      20.2500
    

    Example 2
    The above example's commands have been modified to bring in only the required observations
    (the ones that satisfy the qualifier). To do this we use the 2nd syntax of the use command.
    
    clear
    timer on 2
    use if c<.5 & b<.5 using c:/exp1
    regress a b c  
    regress a b c 
    regress a b c  
    //use c:/exp1
    timer off 2
    
    timer list
    //the timer gives the following results:
    . timer list
       1:     20.25 /        1 =      20.2500
       2:      9.75 /        1 =       9.7500
    
    As you can see, example 2 runs considerably faster than example 1.


    Example 3
    Another way of speeding up Stata is to create a marker variable that equals 1 for
    the observations that are to be included in the regression and then use a
    less complex if qualifier.

    
    clear
    timer on 3
    use c:/exp1
    mark  a1 if c<.5 & b<.5
    
    regress a b c  if a1
    regress a b c  if a1
    regress a b c  if a1
    timer off 3
    timer list
    
    //the timer gives the following results:
       1:     20.25 /        1 =      20.2500
       2:      9.75 /        1 =       9.7500
       3:     17.28 /        1 =      17.2820
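
    The comparison can be tried on any machine with a smaller dataset. The sketch below is an assumption-laden variant, not the code that produced the timings above: it uses 100,000 observations, a temporary file instead of c:/exp1, runiform() instead of uniform(), and the same condition in both runs so that the two regressions use identical observations. Timings will differ from machine to machine.

```stata
* Sketch only: small dataset saved to a temporary file
clear
set obs 100000
generate a = runiform()
generate b = runiform()
generate c = runiform()
tempfile exp1
save "`exp1'"

timer clear

* (1) load everything, then apply the if qualifier in the command
timer on 1
use "`exp1'", clear
regress a b c if c<.5 & b<.5
local n1 = e(N)
timer off 1

* (2) load only the qualifying observations
timer on 2
use if c<.5 & b<.5 using "`exp1'", clear
regress a b c
local n2 = e(N)
timer off 2

* both regressions should be based on the same number of observations
assert `n1' == `n2'
timer list
```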
    
    


    For help on specific commands type:
    help
    and then the specific command eg.
    help use
    help mark





    Stata 12's new Excel command (August 2011)

    With Stata 12 there are some new commands that make getting tables into an Excel spreadsheet easier.

    Stata 12 returns a matrix of the regression table in r(table). To see this, run a regression and type:
    matrix list r(table)
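
    Individual numbers can be picked out of r(table) by row and column name. The following is a minimal sketch, assuming Stata 12 or later and the auto dataset; the matrix name t and the locals r_b, r_se and c_w are illustrative.

```stata
* Sketch only: pick single numbers out of r(table)
sysuse auto, clear
regress length weight

matrix t = r(table)      // copy it before another command overwrites r()
matrix list t

* locate the rows named b and se and the column for weight
local r_b  = rownumb(t, "b")
local r_se = rownumb(t, "se")
local c_w  = colnumb(t, "weight")

display "coefficient on weight: " t[`r_b', `c_w']
display "standard error:        " t[`r_se', `c_w']
```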

    Stata 12 has a command for exporting data to an Excel file: export excel. This command can be accessed via the GUI (File>Export>Excel spreadsheet) or via the command line. To see the syntax type:
    help import_excel


    The following is an example of getting regression results into an Excel spreadsheet.

    
    clear all
    
    sysuse auto, clear
    set more off
    
    ds, not(type string)
    
    capture erase "c:\stuff.xls"
    
    local z=1
    
    foreach i of varlist `r(varlist)' {
      sysuse auto, clear
        if "`i'"=="length" {
          continue
         }
    
    regress length `i'
    
    matrix a1=r(table)
    matrix a2=a1[1..6,1..2]'
    matrix list a2
      clear
       svmat a2, names(matcol)
       generate name="`i'" in 1
        replace name="_cons" in 2
    
    if `z'==1 {
      export excel using "c:\stuff.xls", sheetmodify cell(a`z') firstrow(variables)
    }
    else {
      export excel using "c:\stuff.xls", sheetmodify cell(a`=((`z'-1)*2)+2') 
    }
    local ++z
    } //loop
    
    For help on specific commands type:
    help
    and then the specific command eg.
    help import





    Stata 12 PDF files of logs and graphs (July 2011)
    In Stata 12 log files are still output as either SMCL or text. However, in Stata 12 these log files can be converted into PDF files. This can be easily done with the Stata translate command for example:

     
    
    log using c:/log1, replace
    sysuse auto, clear
    tab rep78 foreign
    log close
    
    translate c:/log1.smcl  c:/log1.pdf , translator(smcl2pdf)
    
    
    Also, in Stata 12 you can produce a PDF of a graph from within Stata. Example:
     
    
    sysuse auto, clear
    scatter mpg weight  //, name(g1)
    graph export c:/graph.pdf  //name(windowname) 
    
    
    For help on specific commands type:
    help
    and then the specific command eg.
    help translate
    help graph export





    Using value labels for bar graph labels (June 2011)
    It is sometimes more convenient to use value labels rather than the graph relabel options to change graph bar labels. In the example below, using value labels also allows the legend to be spread over the width of the graph.

    An example:

    clear all
    sysuse auto

    label define origin 0 "Europe de l`=char(146)'Ouest" ///
    1 "Asie de l`=char(146)'Est", modify

    graph hbar mpg trunk turn, over(foreign) ///
    legend(row(1) span) stack name(two,replace)
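
    The effect of label define with the modify option can be checked without drawing a graph. This is a minimal sketch using the same auto dataset, where the value label origin is attached to foreign; the replacement label texts are illustrative only.

```stata
sysuse auto, clear
describe foreign          // shows that value label origin is attached
label list origin         // the labels shipped with the dataset

* replacement texts below are illustrative only
label define origin 0 "Home market" 1 "Imported", modify
label list origin
tabulate foreign
```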



    For help on specific commands type: help and then the specific command eg. help label


    Automatically sending emails from Stata - Windows platform (May 2011)

    If you are running a large model and wish to know how Stata is progressing or would like a log file emailed to you or others when Stata has finished a do file or would like Stata to send out emails based on a program that you write, then the following can be used.

    To do this a program called CommandLineEmailer must be downloaded. (This is not a Stata program.) Instructions for downloading it are in note 8 of the Notes for Option 1 below. To run CommandLineEmailer a small text file is written from Stata.

    Option 1

    Getting Stata to automatically send an email to indicate progress in the running of a do file:

     
    
    capture erase kk2.txt
    
    log using c:/kklog,text replace
    set more off
    
    forvalues i=1/2000 {
    
    //data to run the email program
    
    if mod(`i',100)==0 {                                               //<--1
      tempname fh
      file open   `fh' using kk2.txt, write                            //<--2
      file write  `fh' "smtpserver = mail.whatever.com.au" _n          //<--3
      file write  `fh' "from       = myeamail@whatever.com.au" _n      //<--4
      file write  `fh' "to         = reciever@whatever.com.au" _n      //<--5
      file write  `fh' "subject    = Test Message" _n                  //<--6
      file write  `fh' "body       = `i' Test Message" _n              //<--7
      file close  `fh'
      !CommandLineEmailer /p:kk2.txt                                   //<--8
      erase kk2.txt                                                    //<--9
    }
    }  // closes the forvalues loop
    
    log close
    
    exit
    
    
    


    Notes for Option 1:

    1. mod(`i',100)==0 determines when an email is to be sent. Other methods can be used.

    2. Using Stata's file command you create a text file that contains the instructions to run CommandLineEmailer. The text file created is called kk2.txt

    3. The address "mail.whatever.com.au" must be changed to your address. To find this out with Windows Live:
    Open Windows Live
    Using pulldown menu: Tools>Accounts
    Click on: Mail
    Click on: Properties
    Click on: the "Servers" tab
    Find the address at: Outgoing Mail [SMTP]

    The "_n" indicates newline.

    4. Change the from email address to that required.

    5. Change the to email address to that required.

    6. Change the Subject title to that required.

    7. Change the message in the body of the email to that required. In the above we have included `i' to indicate the number of loops that have been completed. Other data can be included.

    8. Runs CommandLineEmailer with the text file written above.
    ! sends commands to your operating system, see: help shell
    CommandLineEmailer is the file that must first be downloaded. This can be obtained from:
    http://www.codeproject.com/KB/IP/cpcommandlineemailer.aspx
    eg. Download compiled utility - 6.05 Kb
    (You must log in to download - free and easy to do)

    9. Erase text file so a new one can be written.
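
    The file-writing part can be tested on its own before any email is sent. The sketch below only writes and displays a parameter file; it does not call CommandLineEmailer. It assumes a temporary file instead of kk2.txt, and the subject and body texts are placeholders.

```stata
* Sketch only: write a small parameter file and show its contents
tempname fh
tempfile params

file open  `fh' using "`params'", write text replace
file write `fh' "subject    = Test Message" _n
file write `fh' "body       = finished `c(current_date)' `c(current_time)'" _n
file close `fh'

* type displays the file so the layout can be checked
type "`params'"
```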

    Option 2

    If you require that the log file be emailed to you (or others) when the analysis has been completed, the following can be done:

     
    
    capture erase kk2.txt
    
    log using c:/kklog,text replace
    set more off
    
    forvalues i=1/2000 {
      display "Looping index: `i'"
    }
    
    log close
    
    //text file to run CommandLineEmailer
    tempname fh
    file open   `fh' using kk2.txt, write
    file write  `fh' "smtpserver = mail.tpg.com.au" _n
    file write  `fh' "from       = myemail@whatever.com.au" _n   
    file write  `fh' "to         = email@whatever.com.au" _n     
    file write  `fh' "subject    = Test Message" _n
    file write  `fh' "body       = log sent: `c(current_date)' `c(current_time)'" _n
    
    file write  `fh'  "attachment = c:\kklog.log" _n                          //<--10                               
    file close `fh'
    
    !CommandLineEmailer /p:kk2.txt  
    
    exit
    
    


    Note for Option 2:

    10. Attaches the log file to the email.


    See Option 1 notes above for other details

    For help on specific commands type:
    help and then the specific command eg. help obs

    Generating a dataset (April 2011)

    Sometimes researchers expect a large dataset at some time in the future and wish to make sure that their version of Stata can handle the dataset (within the limits of their version of Stata). Also, they may wish to check that their version does the analysis in a timely manner and their computer is set up to handle the data; otherwise there may be a need to upgrade the computer and/or upgrade the flavour of Stata eg. to Stata/MP.

    To see the limits of your existing flavour of Stata type: help limits

    The following examples generate sample data sets to experiment with.

    (1) Generates a dataset with a specified number of continuous variables and observations.

     
    
    clear all
    set memory 300m         //<-- allocates 300 megabytes of memory to Stata
    set obs 1000            //<--No. of observations
    gen y=uniform()*10
    forvalues i=1/100 {     //<--No of variables
     gen a`i'=uniform()*100 //<--cont. variables
    }
    summarize 
    
    


    (2) Generates a binary variable and continuous variables.

     
    
    clear all
    set memory 300m
    set obs 1000             //<--No of observations
    generate y=uniform()<.5  //<--binary variables 
    forvalues i=1/100 {      //<--No. of cont. variables
     generate a`i'=uniform() //<--cont. variables 
    }
    tabulate y
    
    


    (3) Generates a categorical variable and continuous variables.

     
    
    clear all
    set memory 300m
    set more off
    set obs 1000             //<--No of observations
    generate y=mod(_n,4)+1   //<--cat. variable
    forvalues i=1/10 {       //<--No of cont. variables 
     generate a`i'=uniform() //<--cont. variables
     }
    tabulate y
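
    If the generated dataset needs to be the same on every run, the random-number seed can be set first. This is a minimal sketch under a few assumptions: the seed value is arbitrary, runiform() is used in place of uniform(), and set memory is left out.

```stata
* Sketch only: a reproducible test dataset
clear all
set seed 12345                     // arbitrary seed, makes the draws repeatable
set obs 1000                       // number of observations

generate byte y = runiform() < .5  // binary variable
forvalues i = 1/5 {                // number of continuous variables
    generate double a`i' = runiform()
}

describe, short                    // reports observations and variables
```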
    
    


    For help on specific commands type: help and then the specific command eg. help obs

    Printing log files (March 2011)

    Recently a few people have inquired about the printing of log files. People have had problems with the truncation of the right-hand side of the log file.

    Stata has a few settings that allow control over the way a log is printed.

    Option 1
    Stata has various system settings. These can be seen by typing: query
    To set the width of the text across the page use:

    set linesize #
    Example:
    set linesize 85

    The above example sets the line size of the Results window, and hence of the log, to 85 characters.
    (Note: not all commands are affected by the linesize setting, see the Stata 11 manual for more details)

    Note: The linesize setting must be made before the log is opened.
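
    The order of the steps can be sketched as follows. The log file name linesize_demo and the log name demo are hypothetical; the log is written to the current working directory.

```stata
* Sketch only: set the line size first, then open the log
display c(linesize)          // current setting
set linesize 85

log using linesize_demo, text replace name(demo)
sysuse auto, clear
summarize
log close demo
```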

    Option 2
    When printing you can also control the print font size. To change this load the log file into a Viewer window and :



    Option 3
    Print using the print command and include overrides eg.
    print c:/experiment.smcl, header( off) fontsize( 6) logo(off) lmargin(3)

    The overrides for the translators can be found by typing "translator query" and then the name of the translator. For example:

    translator query smcl2prn


    For further help on the above code, type the following on the Stata command line:
    help log
    query
    help viewer
    help translator


    Working with dates (February 2011)

    Stata has a considerable collection of time and date functions. These can be found by typing:
    help date()

    Often you wish to limit a command to observations before, after or between particular dates. This is easily done using the td() date pseudofunction or, if the dataset has been set for time series, the tin() function.

    Example using a Pseudofunction

    Find the number of observations greater than a specified date.

     
    clear
    input str20 starts_d a
    "20jan1980" 1
    "20jan1981" 2
    "20jan1982" 3
    "20jan1983" 4
    "20jan1984" 5
    "20jan1985" 6
    "20jan1986" 7
    "20jan1987" 8
    "20jan1988" 9
    end
    list
    
    generate date1=date(starts_d,"DMY")       //<-Note 1
    
    summarize a if date1>td(25April1985)   //<-Note 2
    
    

    Note 1: Generates a new variable (date1) which is the elapsed time in days from a date datum (1 Jan 1960). This variable is numeric.

    Note 2: Summarizes a subset of the data, the subset being determined by the pseudofunction td(). The number of observations in the subset is shown under Obs.
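
    For the "between two dates" case, td() can be combined with inrange(). This is a minimal sketch on a shortened copy of the example data; the two limit dates are arbitrary.

```stata
clear
input str20 starts_d a
"20jan1982" 3
"20jan1983" 4
"20jan1984" 5
"20jan1985" 6
"20jan1986" 7
end

generate date1 = date(starts_d, "DMY")
format date1 %td

* observations dated from 1 Jan 1983 to 31 Dec 1985 inclusive
count if inrange(date1, td(1jan1983), td(31dec1985))
```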

    Example using the tin() function
    Find the number of observations up to a specified date.

     
    
    clear
    input str20 starts_d a
    "20jan1980" 1
    "20jan1981" 2
    "20jan1982" 3
    "20jan1983" 4
    "20jan1984" 5
    "20jan1985" 6
    "20jan1986" 7
    "20jan1987" 8
    "20jan1988" 9
    end
    list
    
    generate date1=date(starts_d,"DMY")     
    format date1 %td
    tsset date1                 //<-Note 3   
    
    list  if tin(,25Apr1985)    //<-Note 4  
    
    

    Note 3: tsset is the command to set the data for time series

    Note 4: tin() determines the subset of the data. This function allows a lower and upper limit to be specified; the lower limit being on the left and the upper on the right. If the left hand limit is omitted Stata assumes that the lower limit is to be taken from the beginning of the data and conversely if the right hand limit is omitted Stata assumes the end of the dataset.
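
    Both forms of tin() can be tried on a shortened copy of the example data. This is only a sketch; the limit dates are arbitrary.

```stata
clear
input str20 starts_d a
"20jan1982" 3
"20jan1983" 4
"20jan1984" 5
"20jan1985" 6
"20jan1986" 7
end

generate date1 = date(starts_d, "DMY")
format date1 %td
tsset date1

* both limits given
count if tin(1jan1983, 31dec1985)

* upper limit omitted: from 20 Jan 1985 to the end of the data
count if tin(20jan1985,)
```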



    For further help on the above code type the following on the Stata command line:
    help date()
    help tsset

    Producing Multiple graphs (January 2011)

    Multiple graphs can be produced in Stata 11 with loops. To graph every numeric variable as a histogram, the following can be used:

     
    sysuse auto, clear
    
    foreach i of varlist _all {
     capture confirm  numeric variable `i'
      if _rc==0 {
        histogram `i', name("`i'")
      }
    }
    
    
    The Stata "confirm" command checks whether the variable is numeric. Because it is prefixed with "capture", no error is displayed; instead the return code _rc is set to 0 if the variable is numeric and to some other value if it is not. The return code _rc is then checked with the "if" command: if it is 0 the histogram is drawn, otherwise the "foreach" loop moves on to the next variable.
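    To see the return code at work outside the loop, the check can be run by hand on one string and one numeric variable. This is only a small illustration of the mechanism described above.

    ```stata
    sysuse auto, clear

    capture confirm numeric variable make   // make is a string variable
    display _rc                             // nonzero, so no histogram would be drawn

    capture confirm numeric variable mpg    // mpg is numeric
    display _rc                             // 0, so the histogram would be drawn
    ```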


    If you do not wish to run all the variables in the dataset the following can be used:
     
    sysuse auto, clear
    graph drop _all  // drop existing graphs
    local a "mpg turn"
    
    foreach i of local a {
     capture confirm  numeric variable `i'
      if _rc==0 {
        histogram `i', name("`i'")
      }
    }
    
    exit
    
    


    Alternatively, using Stata's ds command:

     
    sysuse auto, clear
    graph drop _all  // drop existing graphs
    
    ds , has(type int) 
    return list
    
    foreach i of varlist `r(varlist)' {
    
      histogram `i', name("`i'")
    }
    
    exit
    
    


    Or

     
    sysuse auto, clear
    graph drop _all  // drop existing graphs
    
    ds make ,  not
    return list
    
    foreach i of varlist `r(varlist)' {
    
      histogram `i', name("`i'")
    }
    
    exit
    
    This time the ds command is still used, but the variables that you do not wish to graph are excluded with the not option.


    The default display for multiple graphs is to show each graph in a separate graphics window. To show all the graphs in the one window (tabbed graphs), the Stata setting autotabgraphs can be set to on, eg.

    set autotabgraphs on

    Also, when several graphs are displayed in the one graphics window, the layout can be altered by dragging a tab to the desired part of the window.



    For further help on the above code, type the following on the Stata command line:
    help capture
    help ds
    help foreach

    Stata 11 PDF (December 2010)

    Stata 11 includes the manuals in PDF format; all 8000+ pages! The manuals include detailed examples of Stata commands, technical details, references and the maths behind each command. While Stata's online help is handy for those who are already familiar with a command, the manuals are very useful for learning about new commands. There are various ways of accessing the PDF manuals:

    1. To access the entire set of PDF manuals you can use the Pull down menu: Help>PDF Documentation




    2. For a specific entry, open a Stata online help page (eg. help regress ) and then click on the hyperlink




    3. Creating a hyperlink in the Results window. This is particularly helpful for Stata courses or for emailing a reference to a fellow Stata user.

    display in smcl "{manpage GSM 141} {hline 2} starting MAC"
    display in smcl "{manlink R regress} {hline 2} Linear regression"

    OR

    Creating your own ado file with PDF hyperlinks eg.

    *******pdf.ado************
    program pdf
    display in smcl "{manpage GSM 141} {hline 2} starting MAC"
    display in smcl "{manlink R regress} {hline 2} Linear regression"
    end
    **************************

    Save the above file as pdf.ado and put it on the adopath (suggest c:/ado/personal).
    Then, to bring up the hyperlinks, type pdf on the Stata command line.


    For further help on the above code, type the following on the Stata command line:
    help adopath
    help display
    help smcl


    Regular Expressions (November 2010)

    Stata has regular expressions that allow you to work with simple or complex text.

    Regular expressions are listed under string functions.

    One application of regular expressions is working with address data. The following shows how to (in most cases) separate the postcode and state from an address.

     
    clear
    
    input ///
    str100 address
    "1234 West St Blackburn  3000 Vic"
    "West  St 1234 Blackburn  3000 vic"
    "West  St 1234 Blackburn  Vic 3000"
    "West  St 1234 Blackburn  sa 3000"
    "12 West St Backburner  2001 nsw"
    end
    list
    
    //getting postcode
    generate postcode1=regexs(2) if regexm(address,"(^.*)([0-9][0-9][0-9][0-9])")       //comment: reg1
    
    //get state
    generate state=regexs(0) if regexm(address,"([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa])")  //comment: reg2
    
    //You could verify that the first digit of the postcode matches the state
    generate check=1 if lower(state)=="vic" & regexm(postcode1,"[0-9]") & regexs(0)=="3" //comment: reg3
    
    list
    
    Notes:
    reg1: (^.*) means match any character "." zero or more times "*", starting from the beginning of the string "^". The brackets around it mark a subsection of the string; in this case subsection 1.
    Subsection 1 continues until the last 4-digit number, which is matched by subsection 2: ([0-9][0-9][0-9][0-9])

    reg2: ([Vv][Ii][Cc]|[Nn][Ss][Ww]|[Ss][Aa]) looks for one of three alternatives. The first requires 3 characters: V or v, then I or i, then C or c. If these are not found, it looks for a match with the next group of characters, and so on. The "|" symbol is a logical OR.

    reg3: lower(state)=="vic" checks the state; the lower() function makes sure that we are comparing the states in lower case. regexm(postcode1,"[0-9]") matches the first digit of the postcode, and regexs(0)=="3" checks that this digit is 3; the correct start of a Vic postcode.
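    To see what each subsection of reg1 returns, the same pattern can be tried on a single made-up address. The address and the variable names whole, part1 and part2 are assumptions used only for this sketch.

    ```stata
    clear
    set obs 1
    generate str40 address="West St Blackburn 3000 Vic"   // hypothetical test address

    local re "(^.*)([0-9][0-9][0-9][0-9])"
    generate whole=regexs(0) if regexm(address,"`re'")   // everything that was matched
    generate part1=regexs(1) if regexm(address,"`re'")   // subsection 1: text before the postcode
    generate part2=regexs(2) if regexm(address,"`re'")   // subsection 2: the 4-digit postcode
    list
    ```

    regexs(0) always returns the whole matched text, while regexs(1), regexs(2) and so on return the bracketed subsections in order.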

    Now assume that the postcode has been incorrectly coded with the inclusion of alpha characters and needs to be cleaned up. The following is one way of doing this.

     
    clear
    
    input ///
    str100 address
    " 3a00c1 West St Blackburn  3a00c0 Vic"
    "West  St 123 Blackburn  3Re00c1 vic"
    "West  St 123 Blackburn  Vic 3f000"
    "West  St 123 Blackburn  sa 30jj00"
    "12 West St Backburner  2001 nsw"
    end
    list
    
    tempvar a1 a2 a3
    gen `a1'=""
    gen `a2'=""
    gen `a3'=""
    
    local aa "[A-Za-z]"
    
    //assume that the post code is in the second half of the string
    replace `a1'=regexs(0) if regexm( substr(address,strlen(address)/2,.)," ([3])(`aa'|[0-9])*") //comment: reg4
    replace `a2'=regexs(3) if regexm(`a1', " ([3])(`aa'*)([0-9]*)")                                
    replace `a3'=regexs(5) if regexm(`a1', " ([3])(`aa'*)([0-9]*)(`aa'*)([0-9]*)")                 
    
    generate code="3"+`a2'+`a3' if `a1'!=""
    
    list
    
    Notes:
    reg4: substr(address,strlen(address)/2,.) limits the search to the second half of the string. The space between the " and the ( in " ([3]) indicates that a space is required, and ([3]) indicates that the match must start with the number 3. The second subsection, (`aa'|[0-9])*, looks for a lower or uppercase letter OR (as indicated by the OR symbol "|") a digit. The "*" at the end of the second subsection means zero or more times.



    The following is a problem that requires separating the years, months and days into separate variables.

     
    
    clear
    input ///
    str40 dpr         
    "2 yrs 5months 26 days"   
    "3 yrs 2 months"                 
    "1yr 9 months"                  
    "1 yr 8 months"                   
    "1 yr 11 months 28 days"           
    "1 yr 12 days"                 
    "3 yrs 3 months12 days"         
    "3yrs 4 months 26 days"          
    "1 yr 9mnths 8 days"     
    end        
    list
    
    //[0-9]+ allows one or more digits, so that eg. "11 months" returns 11 and not just 1
    generate   year=regexs(1) if regexm(dpr, "^([0-9]+)[ ]?y")
    generate months=regexs(1) if regexm(dpr, "([0-9]+)[ ]?m")
    generate   days=regexs(1) if regexm(dpr, "([0-9]+)[ ]?d")
    
    list
    


    For further help on the above code, type the following on the Stata command line:
    findit regular expressions

    Stata's profile.do command (October 2010)

    A useful addition to your Stata setup is a profile.do file. This is a do file that Stata looks for and runs when starting a Stata session.
    To create a profile.do file, click on the "New do-file editor" button or type doedit on the Stata command line, and then type in the commands that you wish to have executed when Stata starts up. Then save this file where Stata can find it, ie. on the adopath.

    Included in the profile.do file can be:
    Stata settings eg.

    set memory 30m
    set matsize 800


    Setting the default directory:
    cd c:/data

    Defining quick keys (function keys) eg.

    global F4 "summarize;"
    Pressing F4 now executes the summarize command.

    global F5 "sysuse auto, clear;"
    Pressing F5 loads the auto data set that comes with Stata.

    global F6 " display in smcl _newline(60);"
    Pressing F6 creates 60 new lines so the Results window looks clean.


    The profile.do file can also be used to load dialogue boxes into the USER pulldown menu. For an example see:
    http://www.stata-journal.com/sjpdf.html?articlenum=pr0012
    (this shows how to include meta-analysis dialogue boxes)

    When including a profile.do make sure that it is on the adopath; so Stata can find it. To see the adopath type: adopath


    For further help on the above code see:
    Stata 11 Getting started manual
    help adopath
    help profile.do


    Use System variables' _n and _N (September 2010)

    Stata's system variables _n and _N can be used to do a large number of otherwise difficult tasks. In this tip we illustrate some of the things that they can be used for.
    Definition:
    _n : Number of the current observation
    _N : Total number of observations in the data set currently in memory


    **Example 1
    Generating a variable that holds a sequence of numbers equal to the Stata observation number. The resulting variable: number

    Generating a variable equal to the last observation number. The resulting variable: number_T

     
    clear all
    set obs 10
    generate number=_n
    generate number_T=_N
    
    

    The result of running the above is:
     
         +-------------------+
         | number   number_T |
         |-------------------|
      1. |      1         10 |
      2. |      2         10 |
      3. |      3         10 |
      4. |      4         10 |
      5. |      5         10 |
         |-------------------|
      6. |      6         10 |
      7. |      7         10 |
      8. |      8         10 |
      9. |      9         10 |
     10. |     10         10 |
         +-------------------+
    


    **Example 2
    Reversing the data so that the _N (last) observation becomes the first. This is done for a particular variable.
     
    clear
    set obs 10
    generate number=_n
    generate rev_number=number[_N-_n+1]
    list
    

    The result of running the above is:
     
         +-------------------+
         | number   rev_nu~r |
         |-------------------|
      1. |      1         10 |
      2. |      2          9 |
      3. |      3          8 |
      4. |      4          7 |
      5. |      5          6 |
         |-------------------|
      6. |      6          5 |
      7. |      7          4 |
      8. |      8          3 |
      9. |      9          2 |
     10. |     10          1 |
         +-------------------+
    


    **Example 3
    Using _N with the bysort command to generate a variable that holds the total number of children in each family.
     
    clear
    
    input ///
    famid child
    1 1
    2 1
    2 2
    2 3
    3 1
    3 2
    3 3
    3 4
    end
    
    bysort famid: generate number=_N
    
    list, sepby(famid)
    

    The result of running the above is:
     
         +------------------------+
         | famid   child   number |
         |------------------------|
      1. |     1       1        1 |
         |------------------------|
      2. |     2       1        3 |
      3. |     2       2        3 |
      4. |     2       3        3 |
         |------------------------|
      5. |     3       1        4 |
      6. |     3       2        4 |
      7. |     3       3        4 |
      8. |     3       4        4 |
         +------------------------+
    


    **Example 4
    _n and _N can also be used as a qualifier. This example marks, for each family, the child who has the greatest income. The income variable is in brackets, which tells Stata to sort the observations by income within each family. When sorted, the last observation (_N) of each family is the one with the greatest income for that family.
     
    clear 
    
    input ///
    famid child income
    1 1 100
    2 1  150
    2 2  200
    2 3  250
    3 1  10
    3 2  100
    3 3  500
    3 4  250
    end
    
    bysort famid (income): generate number=1 if _n==_N
    
    l, sepby(famid)
    

    The result of running the above is:
     
         +---------------------------------+
         | famid   child   income   number |
         |---------------------------------|
      1. |     1       1      100        1 |
         |---------------------------------|
      2. |     2       1      150        . |
      3. |     2       2      200        . |
      4. |     2       3      250        1 |
         |---------------------------------|
      5. |     3       1       10        . |
      6. |     3       2      100        . |
      7. |     3       4      250        . |
      8. |     3       3      500        1 |
         +---------------------------------+
    


    **Example 5
    Generating lags and leads in the data.
     
    clear 
    
    input ///
    time sales
    1  100
    2   150
    3   200
    4   250
    5   10
    6   100
    7   500
    8   250
    end
    
    generate lead=sales[_n+1]
    generate lag=sales[_n-1]
    generate lags=(sales[_n-1]+sales[_n-2])/2
    
    list
    

    The result of running the above is:
     
         +----------------------------------+
         | time   sales   lead   lag   lags |
         |----------------------------------|
      1. |    1     100    150     .      . |
      2. |    2     150    200   100      . |
      3. |    3     200    250   150    125 |
      4. |    4     250     10   200    175 |
      5. |    5      10    100   250    225 |
         |----------------------------------|
      6. |    6     100    500    10    130 |
      7. |    7     500    250   100     55 |
      8. |    8     250      .   500    300 |
         +----------------------------------+
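
    The ideas in Examples 3 to 5 can be combined: within a by group, _n counts observations from the start of the group and subscripts such as [_N] or [_n-1] stay inside the group. The following is a sketch using the family data from Example 4; the variable names rank, highest and gap are made up for the illustration.

    ```stata
    clear
    input famid child income
    1 1 100
    2 1 150
    2 2 200
    2 3 250
    3 1 10
    3 2 100
    3 3 500
    3 4 250
    end

    bysort famid (income): generate rank=_n                   // 1 = lowest income in the family
    bysort famid (income): generate highest=income[_N]        // greatest income, copied to every member
    bysort famid (income): generate gap=income-income[_n-1]   // missing for the first member of each family

    list, sepby(famid)
    ```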
    


    For further help on the above code see:
    User's Guide: [U] 13.4 System variables (_variables)
    help bysort

    Producing an edited log file (August 2010)

    Stata's log file reproduces what you see in the Results window. Often there is a lot of material that is not needed for a final report, and this material needs to be edited out before presenting the report to others. Stata's log file can be edited from the do file as it is written. Just write a do file as is normally done and then decide what is required to be included.

    The example below has 2 ways of controlling the final log file output:
    1. Turning the log on and off so only the material that you wish to see is added
    To do this, define a few local macros at the start of the do file and include these where required between Stata commands.

    2. Removing any text as required
    Using filefilter to remove the unnecessary text.

     
    
    //set macros
    local new "capture log using out1, text replace"
    local on "capture log using out1,text  append"
    local off "capture log close"
    
    
    sysuse auto, clear    
    `new'      
    
    *this is a comment
    
    `off'                                                //off
    regress mpg weight
    
    `on'                                                 //on
    display "`e(rss)'"
    
    `off'                                                //off
    generate gpm=1/mpg
    
    `on'                                                 //on
    
    *this is GPM
    
    summarize gpm
    `off'                                                //off
    
    type out1.log  //displays log file before filefilter
    
    filefilter out1.log out2.log, from("off'") to(" ") replace
    filefilter out2.log out3.log, from("`") to(" ") replace 
    filefilter out3.log out4.log, from("  //off") to("") replace 
    filefilter out4.log out5.log, from(".") to("") replace 
    
    type out5.log  //displays log file
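
    As an alternative to the macros above, Stata's log command also has off and on subcommands that suspend and resume an open log (see help log). The following is only a sketch of that variation, not the method used in this tip; the log records a notice each time it is paused or resumed, so some filtering with filefilter may still be wanted.

    ```stata
    sysuse auto, clear
    capture log close
    log using out1, text replace

    log off                  // suspend logging
    regress mpg weight
    log on                   // resume logging
    display e(rss)

    log off
    generate gpm=1/mpg
    log on
    summarize gpm

    log close
    type out1.log            // displays the log file
    ```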
    
    
    For further help on the above code see:
    help macro
    help filefilter
    help type

    The input command (July 2010)

    When you're working with a data management or statistical command in Stata that you have not previously used, you may not be confident that you are using it correctly. So rather than work with the complete data set, it's often useful to make up a small data set that contains the critical cases and run the command on this to see if it does what you had anticipated. Once satisfied, you can run it on the complete data set. For example, if I wished to identify the observations dated from the current date up to 4 days in advance, the following could be used:

     
    clear
    
    input ///  
    str15 dates
    "12/7/2010"
    "13/7/2010"
    "14/7/2010"
    "15/7/2010"
    "16/7/2010"
    "17/7/2010"
    "18/7/2010"
    "19/7/2010"
    end
    list
    
    gen date1=1 if inrange(date(dates, "DMY"), date(c(current_date),"DMY"),date(c(current_date),"DMY")+4)
    list
    exit
    


    After running the above (here on 14 July 2010) we see the result:

     
    . list
    
         +-------------------+
         |     dates   date1 |
         |-------------------|
      1. | 12/7/2010       . |
      2. | 13/7/2010       . |
      3. | 14/7/2010       1 |
      4. | 15/7/2010       1 |
      5. | 16/7/2010       1 |
         |-------------------|
      6. | 17/7/2010       1 |
      7. | 18/7/2010       1 |
      8. | 19/7/2010       . |
         +-------------------+
    
    . exit                                                                                               
    
    Errors in logic can now be spotted more easily, and you have saved time by not running the complete data set. Once this runs satisfactorily it can be included in the main do file.
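
    Because c(current_date) changes every day, the listing will differ each time the test is rerun. While testing, it can help to fix the reference date. The sketch below assumes a reference date of 14jul2010 and stores it in a local macro called today (a name chosen here); it also codes the flag as 0/1 rather than 1/missing.

    ```stata
    clear
    input str15 dates
    "12/7/2010"
    "13/7/2010"
    "14/7/2010"
    "15/7/2010"
    "16/7/2010"
    "17/7/2010"
    "18/7/2010"
    "19/7/2010"
    end

    local today=td(14jul2010)      // assumed reference date, fixed for testing
    generate byte date1=inrange(date(dates,"DMY"),`today',`today'+4)
    list
    count if date1                 // the 5 dates from 14/7/2010 to 18/7/2010
    ```

    When the logic has been checked, td(14jul2010) can be swapped back to date(c(current_date),"DMY").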

    For further information on this command see:
    help input

    For further help on the above code see:
    help comments
    help date
    help dates
    help inrange()
    help creturn list





    Splitting the Do Editor (June 2010)

    In Stata 11 the do editor can be split, making it easier to do some types of work. To do this there must be at least two tabs in your do editor. Drag one of these to the middle of the editor. When a selection box appears, select one of its positions and two tabbed do editor windows appear.

    Pulling a tab to the centre



    Now there are two do editor windows


    Factor variables and lincom to produce a table (May 2010)

    Stata 11's factor variables can be combined with lincom to quickly produce tables.

    In this example we look at the table on p. 226 of "Statistical Modeling for Biomedical Researchers: A Simple Introduction to the Analysis of Complex Data", 2nd Edition, by William D. Dupont (see our bookshop to order).

    The data set can be downloaded: http://biostat.mc.vanderbilt.edu/dupontwd/wddtext/index.html

     
    
    set more off
    cd "C:\data\dupont"  //if the data is stored in a different directory change this
                          //to where it has been stored
    use "5.5.EsophagealCa.dta", clear
    
    recode tobacco  3=2 4=3, g(smoke)
    
    label define q_smoke 1 "0-9" 2 "10-29" 3 ">=30"
    label value smoke q_smoke
    
    logistic cancer i.alcohol i.smoke i.age [fw=patients]
    
      forvalues i=1/4 {  //alcohol
         forvalues j=1/3 {   //smoke
    
           qui: lincom  `i'.alcohol + `j'.smoke, or
           local a`i'`j'=r(estimate)
    
         }
      }
    
    local a11=1
    
    decode alcohol, gen(a)
    
    contract a
    keep a
    
    matrix aa=( `a11', `a12', `a13' \ `a21', `a22' ,`a23' \ `a31', `a32' ,`a33'\ `a41', `a42' ,`a43')
    svmat aa
    rename aa1 Tobacco_0_9
    rename aa2 Tobacco_10_29
    rename aa3 Tobacco_30
    
    list
    
    
    exit
    
    In the above, the forvalues loops run through the different levels of alcohol and smoke. These are then applied to the factor variables in the lincom command. The value returned by lincom is stored in a local macro, one at a time. After going through all the combinations of alcohol and smoke, the macros are assembled into a Stata matrix, which is then put into the data set and the resulting variables renamed.

    For more information on the specific commands type help and then the command eg. help lincom






    Stata Graphs (April 2010)

    From time to time Stata is used to produce non-standard/interesting graphs. I have compiled some of these graphs. These have mainly been presented on the Statalist. To see these graphs click here . This page will be updated from time to time.

    To see some of the User written graph commands click here . (from a previous tip)






    Tabdisp (March 2010)

    tabdisp is a Stata command that allows you to display Stata tables. This command allows a lot of control over the way that the elements are displayed.
    If cell percentages are required the following can be used:

     
    
    sysuse auto, clear
     
    contract for rep78
    list
    summarize _freq
    generate percentage=(_freq/r(sum))*100
    
    tabdisp for rep78, cell(percentage) cellwidth(7)
    
    
    

    Or if the % symbol is also required:
    
    sysuse auto, clear
    
    contract for rep78
    list
    
    summarize _freq
    
    generate percentage=(_freq/r(sum))*100
    gen freq=string(percentage, "%5.2f")
    
    replace freq=freq + "%"
    
    tabdisp for rep78, cell(freq) cellwidth(7)
    
    
    

    If the above is all that is required, the user-written programs tab2way or tab3way could be used instead.

    There are many other ways to display your data eg. including the words max and min in the table cells
    
    sysuse auto, clear
    
    contract for rep78
    list
    
    sort _freq
    
    tostring _freq, gen(freq)
    
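    replace freq=freq+ " Max" in `=_N'   //last observation after sorting: largest frequency
    replace freq=freq+ " Min" in 1       //first observation after sorting: smallest frequency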
    
    tabdisp for rep78, cell(freq) cellwidth(7)
    
             
    


    For help on the individual commands type help and then the command name.
    To download the user written command tab2way or tab3way , type: ssc install tab2way or ssc install tab3way



    Tables to spreadsheet (February 2010)

    The tabulate command allows the values of the table to be saved as matrices, using the options matcell(), matrow() and matcol(). These matrices can be put into a spreadsheet. The table command, however, does not have these matrix options. Nevertheless, there are workarounds that make it easy to put the results that the table command would have given into a spreadsheet. This tip explores a number of ways that this can be done:

    This is the command that we wish to use, getting the resulting table out of Stata and into a spreadsheet

     
    
    sysuse auto, clear
    
    //table in official Stata
    table for rep78 , c(mean price)
    
    


    The following gives us what we want but does not allow the output to be put into a spreadsheet
    
    sysuse auto, clear
    
    collapse (mean) price, by(foreign rep78)
    list
    
    tabdisp foreign rep78 , c(price)
    
    


    This time getting the table into a Stata data set so it can be exported to a spreadsheet
    This method has the advantage that the column and row labels are also included
    
    sysuse auto, clear
    
    collapse (mean) price, by(foreign rep78)
    list
    
    drop if rep78==.
    
    reshape wide price, i(foreign) j(rep78)  //because the data is in long form it can be reshaped
                                                 // into the required table
    list
    
    outsheet using c:/table, replace  //outputting the table to a form that can be read with a spreadsheet
    
    


    This time using Mata to manipulate the initial data
    
    sysuse auto, clear
    collapse (mean) price, by(foreign rep78)
    fillin foreign rep78
    drop if rep78==.
    sort for rep78
    list
    
    
    mata:       //start of Mata
    a=st_data(.,.)
    a
    
    s=J(2,6,.)
    s
    
    for(i=1; i<=10; i++) {
    r=a[i,2]
    c=a[i,1]
    s[r+1,c]=a[i,3]
    }
    
    
    names = st_varname((1..3))
    names
    
    b2=st_varvaluelabel(names[1,1])
    b2
    if(b2!="") {
    zy2=uniqrows(a[.,1])
    b3=st_vlmap(b2, zy2)
    b3
    } else {
    b3=strofreal(uniqrows(a[.,1]))
    b3
    }
    
    b2a=st_varvaluelabel(names[1,2])
    b2a
    if(b2a!="") {
    zy2a=uniqrows(a[.,2])
    b3a=st_vlmap(b2a, zy2a)
    b3a
    } else {
    b3a=strofreal(uniqrows(a[.,2]))
    b3a
    }
    
    table=(""\b3a) ,(b3',"."\strofreal(s))
    table
    
    mm_outsheet( "c:/table1" ,table, mode="r")  //user written program output to a Excel readable file
    end
    
    As you can see there are a number of different ways of getting table information out of Stata.
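
    One further route uses the matrix options of tabulate mentioned at the start of this tip. Note that tabulate saves cell frequencies, not the mean prices produced by the table command, so this is a sketch of the mechanism only. The matrix names freq, rows and cols and the file name table2.csv are assumptions made for the illustration.

    ```stata
    sysuse auto, clear

    //save the cell frequencies and the row and column values as matrices
    tabulate foreign rep78, matcell(freq) matrow(rows) matcol(cols)
    matrix list freq

    clear            //clears the data; the matrices remain in memory
    svmat freq       //one variable per column of the matrix: freq1-freq5
    list

    outsheet using table2.csv, comma replace  //can be opened in a spreadsheet
    ```

    The row and column values are held in the matrices rows and cols and can be listed with matrix list if they are needed as labels.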


    For help on the individual commands type help and then the command name. To download the user written command mm_outsheet, type: ssc install moremata



    Tables to spreadsheet (January 2010)

    When a large number of tables are required to be put into a spreadsheet and no user-written program is available to easily do this, the following method can be used:

    Write a program for the particular table (or any output) that you require. If there are a number of different tables then write a program for each type of table.

    The program starts a log file and then runs the table command. It then closes the log file. The log file is then put through a file filter to remove any unwanted text.

    The partially cleaned up log file is then imported into Stata using the insheet command and then further cleaned up; removing any unwanted text and then the columns in the table are split into Stata columns. The extent of the clean up depends on the desired output.

    Having finished the cleaning up, this is either saved or appended to, using the required program option.

    Then you go on to append the next table to the file.
    When finished the file containing the tables can be opened in a spreadsheet


    clear programs //Clears the previous program to allow for modifications.
    //This can be removed when you are happy with the program
    program tables
    version 11.0
    syntax varlist(max=2 min=2), gen(string) [append] //gen() is required, append is optional
    tokenize `varlist' //split varlist
     
    capture log using a, text replace
    label var `1' `=strtoname("`:var lab `1'' " )' //combining the label into one word
    label var `2' `=strtoname("`:var lab `2'' " )'
    table `1' `2', stubwidth(40)
    log close
     
    filefilter a.log a1.log , from("-") to("") replace //deleting unwanted text in the log file
    filefilter a1.log a2.log , from("|") to("") replace
    filefilter a2.log a3.log , from("+") to("") replace
    insheet using a3.log, clear //brings the modified log file into Stata
    drop in -4/-1 //get rid of other material
    drop if strpos(v1,"log")
    drop if strpos(v1,"pause")
    drop if strpos(v1,"resumed")
    drop if strpos(v1,"unnamed")
    drop if strpos(v1,":")
    capture drop v2
    split v1
    drop v1
     
    //additional cleaning up if required
    replace v11=subinstr(v11,"_", " ",. ) in 1/2
    quietly: d
    local a1=round(`r(k)'/2)
    replace v1`a1'=v11[1] in 1
    replace v11="" in 1
     
    set obs `=_N+2' //two line space between tables
     
    if "`append'"!="append" {
    save `gen', replace
    }
     
    if "`append'"=="append" {
    append using `gen'
    save `gen' , replace //saves file to hard disc
    }
    end //end of program
     
     
     
    *********************************************
    **the commands that calls the above program
    *********************************************
    sysuse auto, clear
    set more off
    cd c:/
    tables for rep78, gen(aa) //1st table
     
    sysuse auto, clear //2nd table
    tables rep78 for, append gen(aa)
     
    sysuse auto, clear //3rd table
    tables for rep78, append gen(aa)
     
    list ,noheader sep(0) noobs
     
    outsheet using c:/aa.csv, comma nonames replace //saving to disk this can be opened in a spreadsheet
     
    exit


    For more information, see the help for the specific command, eg. help filefilter


    User written table output commands include:
    tabout
    logout
    esttab



    Point Estimates for a Regression (December 2009)

    After a regression, point estimates can be obtained in several ways.
    Examples:

    sysuse auto, clear
    regress mpg weight
    display _b[weight]*3000+_b[_cons]


    OR

    sysuse auto, clear
    regress mpg weight
    lincom weight*3000+_cons


    OR

    sysuse auto, clear
    regress mpg weight
    //then open the data editor and add an additional observation for the weight variable eg. 3000
    //then run the following
    predict a
    //then display the point estimate with the following
    display a[_N]


    This then displays the point estimate. The last method is useful when a number of estimates need to be made.
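
    The last method can also be carried out without opening the data editor. The following is a sketch in which the extra observation is added with set obs and replace; the variable name yhat is chosen here for illustration.

    ```stata
    sysuse auto, clear
    regress mpg weight

    set obs `=_N+1'              //add one empty observation at the end
    replace weight=3000 in L     //L refers to the last observation
    predict yhat                 //fitted values, including the new observation

    display yhat[_N]                     //point estimate at weight=3000
    display _b[weight]*3000+_b[_cons]    //the same estimate, calculated by hand
    ```

    To obtain several estimates, add more observations and fill in the required weights before running predict.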

    For more information see:
    help predict



    Doing things quietly in Stata (November 2009)

    Stata's quietly command allows commands to be run without outputting to the results window. This is useful if you only require the returned results (eg. r(mean) etc see help return list ) and not the actual output.

    Example:
    sysuse auto, clear
    quietly summarize mpg, detail

    or

    quietly: summarize mpg, detail


    Also you can have a block quiet:

    sysuse auto, clear
    quietly {

    summarize mpg, detail
    local a=r(mean)
    summarize price, detail
    local b=r(mean)
    }


    If you wish to see specific output in a quiet block you can add noisily to this
    Example:

    sysuse auto, clear
    quietly {
    summarize mpg, detail
    local a=r(mean)
    noisily summarize price, detail
    local b=r(mean)
    }
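
    After a quiet command the returned results are still available and can be displayed selectively. A short sketch:

    ```stata
    sysuse auto, clear
    quietly summarize mpg, detail

    //show only the results that are wanted
    display "mean = " r(mean) "   median = " r(p50) "   n = " r(N)
    ```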

    For more information see:
    help quietly



    Graphing functions (October 2009)

    The histogram command has an option (normal) that overlays a normal distribution on the graph. The twoway histogram graph, however, does not have this option. The same result is easily obtained by adding a function graph, as shown in the following example:

    sysuse auto, clear
    quietly summarize mpg
    twoway (histogram mpg, bin(10)) ///
    (function y=normalden(x, `r(mean)', `r(sd)'), range(4 44) xlabel(#10) )




    Lots of other functions can be drawn eg.

    twoway function t=tden(1, x), range(-5 5) xsize(4) ysize(2) color(blue) ///
    lstyle(p1solid) xlabel(-5(1)5) recast(area) || function z=normalden(x), range(-5 5) ///
    color(maroon) lwidth(thick)

    twoway function c=chi2(1,x), range(0 5) xsize(4) ysize(3) yline(.5)

    twoway function c=Fden(5, 10, x), range(0 5) xsize(4) ysize(3) yline(.3)


    Stata 11 - Variable manager (September 2009)

    Getting variable names into a do file:
    The Stata 11 variable manager makes this easy. Just highlight the variable name(s) in the variable manager, right click and then click on "copy variable list". Go to the do editor and paste where required.


    Filtering variable names
    On the top left hand side of the variable manager is the variable filter. Start typing any part of the variable name in the filter and the variables that include this text remain in the variable manager list; the others disappear. This is a great feature for looking for a particular variable in a large dataset.

    For more information:
    help varmanage (Access Stata's PDF manual by clicking on the online help hyperlink: [D] varmanage )


    Getting a Subset of a large dataset into Stata (August 2009)

    The various flavours of Stata have limits on the number of variables, label lengths, macro lengths etc. One of these limits is the maximum number of variables that can be loaded into Stata.
    In Stata/IC 11 the limit is 2,047 variables.

    To see the limits of the various flavours of Stata see: help limits

    If your data set contains more than 2,047 variables and you do not need all of them, the second syntax of Stata's use command can load a subset of the data set into Stata:
    help use
    use [varlist] [if] [in] using filename [, clear nolabel]

    example:
    use mpg using "c:/program files/stata11/auto", clear

    This loads only the mpg variable into Stata.
    If you wish to inspect a dataset on disk without loading it (to see variable names etc.) you can use the second syntax of Stata's describe command:

    describe [varlist] using filename [, file_options]

    example:
    describe using "c:/program files/stata11/auto", varlist
    return list
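
    The two syntaxes can be tried together without a large file. The following is a minimal sketch, assuming a copy of auto.dta saved to a temporary file stands in for a large dataset on disk; the tempfile name bigdata is made up for the illustration.

```stata
sysuse auto, clear
tempfile bigdata                 // hypothetical stand-in for a large file on disk
save `bigdata'

clear
* look at the variable names without loading the data
describe using `bigdata', varlist
display "`r(varlist)'"

* load three variables, and only the foreign cars
use make mpg foreign if foreign==1 using `bigdata', clear
describe
```

    Only the listed variables and the observations that satisfy the if qualifier are read into memory.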


    Also see:
    help memory


    Capture (July 2009)

    Controlling the unknown

    Stata commands that result in an error issue a non-zero return code (_rc). In Stata 10 and Stata 11 the return codes can be seen in the Review window (you may need to widen the Review window to see the _rc column).

    If a command in a do file produces an error the do file will stop. This can be prevented by prefixing the command with capture, e.g.

    log close //example 1

    capture log close //example 2

    In example 1 above, a do file/program would stop running if there was no log file open. Stata requires a log file to be open before it can be closed, and no other log file to be open before it can open one.
    In example 2, a do file/program would continue to run even if there was no log file open. The capture command suppresses the error so that the do file can carry on.

    Apart from preventing a do file/program from stopping, the capture command can also capture a command's return code in _rc. The return code (_rc) can then be used to make a decision in your do file/program.

    Example
    sysuse auto, clear

    tostring mpg, replace //for the purposes of the example convert mpg to a string variable
    describe
    foreach v of varlist price-foreign {
    capture confirm numeric variable `v'
    display _rc //allows you to see the return code
    if _rc { //if _rc is not 0 (zero) the condition is true and Stata executes the block
    destring `v',replace
    describe `v'
    }
    }
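
    Two further uses of capture are sketched below, assuming the auto data set; the variable name no_such_var is made up and deliberately does not exist. The first part branches on the return code, and the second shows the common capture drop idiom, which drops a variable only if it is there.

```stata
sysuse auto, clear

* capture suppresses the error and stores the return code in _rc
capture confirm variable no_such_var     // no_such_var is a made-up name
if _rc {
    display as text "no_such_var not found (return code " _rc ")"
}

* capture drop: no error if the variable does not exist yet
capture drop first
generate first = 1
```

    Because the error is suppressed, it is worth checking _rc straight after the captured command whenever the outcome matters.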

    Also see:
    http://www.stata.com/statalist/archive/2009-06/msg00623.html    (An example of how to use a return code to set up the default directory in Stata.)
    help confirm
    help capture



    Transparent Graphs (June 2009)

    Stata graphs can be made transparent in MS Word and other software. For example the following graph was produced in Stata and then made transparent in Word.

    The above graph was produced in Stata by:

    sysuse auto, clear

    twoway ///
    (histogram mpg if rep78==3, fcolor(green)) ///
    (histogram mpg if rep78==4, fcolor(blue))
    graph export c:/hist.wmf, replace

    Then in Word 2003
    Insert>Picture>from file and then c:/hist.wmf

    Click on graph
    Edit picture
    Right Click on a bar that you wish to make transparent
    Format AutoShape>Color and lines tab>Fill section and then move the transparency slider to 50% and press OK
    Continue to edit all the bars this way. The legend can also be modified as per above
    Save


    Also see:
    http://www.stata.com/statalist/archive/2009-04/msg00574.html
    http://www.stata.com/statalist/archive/2009-04/msg00612.html


    Getting Stata's Graph editor commands into Stata graphs (May 2009)
    Stata has a great graph editor. However, after you have modified a graph, the editor does not produce the equivalent Stata code. It is possible to retrieve the editing commands if they were recorded with the Graph Editor recorder: add gr_edit at the start of each recorded line and append these lines to the initial graph code. You then have the code to reproduce the graph.

    Example:
    Assume that you have run the following
    sysuse auto, clear
    histogram mpg
    Then click on the Start Graph Editor icon and press the Start recording icon. Alter the color of the histogram bins. Stop the recorder and save the recording on the hard disk with a suitable name and path. Then open the recording (just saved) in Stata's Do Editor.
    The line:
    plotregion1.plot1.style.editstyle area(shadestyle(color(gs7))) editcopy
    was retrieved and gr_edit added to the front of it.

    the complete file would look like:
    sysuse auto, clear
    histogram mpg
    gr_edit plotregion1.plot1.style.editstyle area(shadestyle(color(gs7))) editcopy


    Running this produces the original graph complete with the edit.

    Alternatively you could save the recording and include it as follows:
    sysuse auto, clear
    histogram mpg, play(hist1) //hist1 is the name of the recording


    Also see:
    http://www.stata.com/statalist/archive/2008-07/msg00932.html
    help graph play


    Weaving Stata results into a Word Report (April 2009)
    It is possible to put results from Stata into a Word document by first obtaining your results in Stata and then using mail merge to get them into Word.

    For example, if you wish to automate your report writing and require the maximum and minimum mpg in a Word report (using the auto.dta data set that comes with Stata), this can be done with the do file below. The do file uses the user written package moremata, which must first be installed. To install it, type the following on the Stata command line:
    ssc install moremata

    Once it is installed, run the following Stata do file:

    ********************weaving do file*********************************
    sysuse auto, clear
    *determine max and min mpg
    quietly: sum mpg
    local max_mpg =r(max)
    local min_mpg =r(min)
    di `max_mpg' //only if required to see results in Stata
    di `min_mpg' //only if required to see results in Stata

    mata
    a="max_mpg"\st_local("max_mpg")
    a1="min_mpg"\st_local("min_mpg")
    a2=a,a1
    a2
    mm_outsheet("c:/tips.txt", a2, "r")
    end
    ********************weaving do file*********************************

    After running the above, the text file tips.txt is produced (in the c:/ drive).
    Then in your MS Word report include the following:

    The maximum value of mpg is: {MERGEFIELD "max_mpg"}
    The minimum value of mpg is: {MERGEFIELD "min_mpg"}

    Open the data source in Word and then run Mail Merge

    After running mail merge your report should look like:

    The maximum value of mpg is: 41
    The minimum value of mpg is: 12


    Using this method you can include tables, graphs etc. into your Word document.
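
    If moremata is not available, a merge data source can also be written with official Stata's file command. The following is a minimal sketch under the same assumptions as the do file above, except that tips.txt is written to the current working directory rather than c:/ (an assumption made so the example runs anywhere).

```stata
sysuse auto, clear
quietly summarize mpg
local max_mpg = r(max)
local min_mpg = r(min)

* write a header row and one data row, tab-delimited
tempname fh
file open `fh' using "tips.txt", write text replace
file write `fh' "max_mpg" _tab "min_mpg" _n
file write `fh' "`max_mpg'" _tab "`min_mpg'" _n
file close `fh'

type tips.txt
```

    The first row holds the field names that the MERGEFIELD codes refer to, and the second row holds the values.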

    References:
    http://ideas.repec.org/p/boc/asug05/14.html

    Also look at:
    findit texdoc
    findit esttab
    findit estout


    Stopping Stata during the running of a do file (March 2009)
    When running a do file you may wish to inspect the data at various points. Stata has a number of ways of doing this. For example:

    Option 1:
    Use the edit command. This opens the data editor and allows you to inspect the data. When the editor is closed the do file continues to run. (Instead of edit you could use browse to open the data browser.)

    sysuse auto, clear
    regress mpg weight
    edit //stops Stata and opens the data edit window
    summarize
    exit


    Option 2:
    Stopping Stata by using the more command

    sysuse auto, clear
    regress mpg weight
    more
    summarize
    exit


    Option 3:
    sleep stops Stata for a specified number of milliseconds

    sysuse auto, clear
    regress mpg weight
    sleep 1000 //sleep specifies the number of milliseconds to wait
    beep //used to wake you up if the sleep is too long
    summarize
    exit


    Option 4:
    exit stops a do file. To run more of the do file move the exit command down the do file and run again.

    sysuse auto, clear
    regress mpg weight
    local a 1
    exit //the do file stops at this point; move exit to a later line and run again
    display `a'

    For more information see:

    help edit
    help browse
    help more
    help exit



    Putting Greek symbols in graphs (February 2009)
    Greek symbols (or other symbols) can be added to Stata graphs. To add these you must first set up your computer, e.g.

    In Windows XP
    Click on the start button (bottom left hand side of screen)
    Click on the Control Panel
    Click on Regional and Language Options
    Click on the Advanced tab
    Select Greek (or another language if you require it)
    Then click Apply and then OK (the computer will then need to be restarted)


    then in Stata:
    using the pull down menu:
    Edit>Preferences>Graph Preferences
    Then select the font Arial Greek

    To see the numbers used in the extended code you can use the Nick Cox written graph:
    asciiplot
    (this is a user written command and must first be downloaded)
    To download asciiplot type the following on the Stata command line
    ssc install asciiplot

    then for example, type:

    scatter weight mpg, title( Example of Greek characters in a Graph `=char(238)' `=char(243)' `=char(236)' )




    Or the Stata graphics Editor can be used to include Greek symbols

    For more information see:

    Data Management Manual: char(n)

    For an article on char()
    See http://www.stata-journal.com/sjpdf.html?articlenum=dm0006



    Doing things by levels of a variable (January 2009)
    Using Stata's bysort prefix command

    bysort is a Stata prefix command that allows you to execute commands by levels/groups of the variable(s) that you specify

    Example:
    If you wanted to generate a new variable with a 1 at the first occurrence of each level of mpg the following can be used (using the auto data set that comes with Stata):

    sysuse auto, clear //load the auto data set into Stata
    bysort mpg: gen first=1 if _n==1

    If you wanted to generate a new variable with a 1 at the last occurrence of a level of mpg the following can be used:

    sysuse auto, clear //load the auto data set into Stata
    bysort mpg: gen last=1 if _n==_N

    Sorting within the group eg. if you wanted the car with the smallest weight within each level of mpg the following can be used:

    bysort mpg (weight): gen first_low_weight=1 if _n==1

    Note that the brackets around the weight variable name tell Stata that weight is not to be used to define the levels/groups; instead, the observations are sorted by weight within each level of mpg.
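
    The same prefix can be used to number observations within groups and to count group sizes. The following is a short sketch using the auto data set; the new variable names seq, group_size and lightest are chosen only for the illustration.

```stata
sysuse auto, clear
bysort mpg (weight): generate seq = _n          // 1, 2, ... within each mpg level
by mpg: generate group_size = _N                // number of cars at that mpg level
by mpg: generate byte lightest = (_n == 1)      // 1 for the lightest car, 0 otherwise
list mpg weight seq group_size lightest in 1/10
```

    After the bysort line the data are already sorted by mpg, so the following lines can use the plain by prefix.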

    For more information see:

    Stata 10 Data Management manual
    Online help bysort
    Online help for other prefix commands: help prefix
    help _n
    help _N



    Automation of Tables in Stata (December 2008)

    A tutorial showing different options for the automatic production of tables can be obtained by the following commands:

    ssc install tabletutorial

    to install and then

    help tabletutorial




    Memory usage in Stata (November 2008)

    Stata generally stores the whole of the dataset that it is working with in the computer's memory. Therefore, the computer should have sufficient RAM to load all of the data. Storing data in memory allows fast access to the data. If the computer has insufficient memory and the operating system allows it, Stata uses virtual memory, i.e. part of the data is stored on the computer's hard disk; however, this can be very slow.

    Stata assigns an amount of memory to itself in which to store the data, so this setting must be large enough to hold the entire data set. The memory setting in Stata can be changed to allow sufficient memory for the data set.

    What is sufficient memory?
    To determine this the online calculator can be used
    online calculator

    A quick way to determine the average width of a variable (in bytes) is as follows:
    (type the following on the command line or into a do file:)

    describe
    display r(width)/r(k)

    Then put this number (average variable width) into the online calculator. The result from the online calculator is the minimum memory required, so allow 30-50% more than this for additional variables etc.

    Then set the memory using the set memory command, e.g. set memory 50m

    Other useful Stata memory commands are:
    compress
    memory
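
    As a rough cross-check on the calculator, the size of the data can be worked out from the results that describe leaves behind. This is a minimal sketch using the auto data set; the local macro names are made up for the illustration, and the figure covers the data only, not Stata's overheads.

```stata
sysuse auto, clear
quietly describe
local avg_width  = r(width)/r(k)     // average bytes per variable
local data_bytes = r(width)*r(N)     // bytes per observation times observations
display "average variable width: " %5.2f `avg_width' " bytes"
display "approximate data size:  " `data_bytes' " bytes"

* compress stores each variable in the smallest type that holds its values
compress
quietly describe
display "after compress:         " r(width)*r(N) " bytes"
```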


    References
    http://www.stata.com/statalist/archive/2005-07/msg00348.html




    Stata Comment (October 2008)
    Stata has a number of ways of adding comments to Stata code. Some of these are:

    *
    The star at the start of a line tells Stata to ignore what follows eg.
    *this is ignored

    /* */
    The /* */ are used to add comments between code eg.
    regress mpg /* weight is the independent variable */weight
    or /* */ can be used to continue a command over two lines, e.g.
    twoway (scatter mpg weight) /*
    */ (lfitci mpg weight)


    ///
    Stata ignores what is after /// and continues on the next line eg.
    regress mpg /// dependent variable
    weight


    The #delim ; command is useful in a number of ways. One use is to comment out blocks of code/text eg.

    #delimit ;
    *
    display "this is a comment"
    display "this is a comment"
    display "this is a comment"
    *;
    #delimit cr

    di "this is the end"
    exit


    The lines between the first * and the ; are ignored, because after #delimit ; a comment that starts with * runs until the next semicolon.

    For further information on comments type help comments
    For further information on #delimit type help delimit



    Stata user written graphs (September 2008)


    New graphs have been added.

    Apart from the official Stata graphs many users have written special graph commands and have made these available for download.
    To see just some of these click
    More user written graphs will be added in the future.
    To see how these have been written (the code) use Stata's viewsource command.




    Stata tables (August 2008)

    If you require a table of cell percentages you could use:
    sysuse auto, clear
    svyset rep78
    svy:tab rep78 for, per


    An easier way is to use Philip Ryan's user written command tab2way:

    tab2way rep78 for, cellpct

    This command also has many other options. To download it, type the following on the Stata command line:
    ssc install tab2way

    Stata users have written many commands for tables. To see a list of some of them type the following on Stata command line (when online):
    findit tab table
    Then to download click on the hyperlink and follow the instructions
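
    Official Stata's tabulate can also report cell percentages without any download. A minimal sketch with the auto data set:

```stata
sysuse auto, clear
* cell percentages only; nofreq suppresses the frequencies
tabulate rep78 foreign, cell nofreq
```

    Observations with a missing rep78 are left out of the table unless the missing option is added.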


    Sending Command(s) to the Stata Do Editor from the Stata Review Window (July 2008)

    While running Stata interactively, either with dialogue boxes or from the command line, the command(s) that you issue are recorded in the Review window. These commands can be put directly into the Do Editor for rerunning a session, for modifying the commands and rerunning, or as a record of the analysis.
    All or some of the contents of the Review window can be put into the Do Editor as follows:

    In the Review window, select the command(s) that you wish to send to the Do Editor by:

    • Clicking on the command, if a single command is required
    • Holding down the shift key and selecting the commands, if more than one command is required
    • Right clicking and pressing select all, if all the commands currently in the Review window are required

    Then right click the mouse button and select send to do-file editor.
    The Do Editor will then open with the highlighted command(s) in it. To run them, use the Do Editor pulldown menu Tools>Do, or the Do icon (in Stata 10 this is the icon on the far right), or save the file and run it from the command line, e.g. save it as c:/dofile and run it by typing do c:/dofile on the Stata command line.
    For more details on the do command type the following on the Stata command line:
    help do




    Creating a Stata dataset from multiple Excel worksheets (June 2008)
    There are a number of ways of doing this:

    • odbc
    • Stat/transfer
    • Stata with the append command

    In this example the Excel file is called book2 and is in c:/ drive. The file has two work sheets: kk1 and kk2

    odbc
    clear
    tempfile kka
    odbc load, dsn("Excel Files;DBQ=c:\book2.xls") table("kk1$")
    save `kka'
    list

    clear
    odbc load, dsn("Excel Files;DBQ=c:\book2.xls") table("kk2$")
    list
    append using `kka'
    list
    exit

    Also see:
    http://www.ats.ucla.edu:80/stat/stata/faq/odbc.htm



    Using Stat/Transfer 9
    With Stat/Transfer this would be done as follows:
    Open tab: option 3
    And then tick "concatenate worksheet pages"



    Stata with the append command

    Save each Excel worksheet as a csv in Excel. In this example c:/book2_kk1.csv and c:/book2_kk2.csv are the two files created

    insheet using c:/book2_kk1.csv, clear
    save c:/book2_kk1
    list
    clear
    insheet using c:/book2_kk2.csv, clear
    list
    append using c:/book2_kk1
    list
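
    When there are many worksheets, the append step can be put in a loop. The following is a minimal sketch in which two small datasets typed in with input stand in for the worksheets; the tempfile names kk1 and kk2 and the variables id and x are made up for the illustration.

```stata
* build two small example datasets (stand-ins for the two worksheets)
clear
input id x
1 10
2 20
end
tempfile kk1
save `kk1'

clear
input id x
3 30
end
tempfile kk2
save `kk2'

* start from an empty dataset and append each file in turn
clear
foreach f in "`kk1'" "`kk2'" {
    append using "`f'"
}
list
```

    With real files, the list after foreach would hold the names of the saved .dta files instead of the temporary files.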



    For more details on these Stata commands type the following on the Stata command line:
    help append
    help insheet





    catplot (May 2008)
    Nick Cox has written a useful graph command (catplot) that graphs categorical variables. This user written program can be downloaded for free.

    To download this:
    On the Stata commandline window type:
    findit catplot
    then click on the hyperlink
    catplot from http://fmwww.bc.edu/RePEc/bocode/c
    and then follow instructions


    If the catplot command didn't exist and you wanted to produce a bar plot of the frequencies of the categories of rep78 then you would have to do something like:

    sysuse auto, clear
    tab rep78, g(z)
    graph hbar (sum) z* , bargap(13) asc ///
    yvaroptions(relabel(1 "1" 2 "2" 3 "3" 4 "4" 5 "5"))


    With catplot this is made easier with the following:

    sysuse auto, clear
    catplot hbar rep78



    For more details on catplot see the online help: help catplot (once installed)
    For other graphs that Nick Cox has written see: http://www.ats.ucla.edu/stat/Stata/faq/graph/njcplot.htm





    Stata Users' Group Meeting Proceedings (April 2008)
    Material documenting the Stata Users' group meetings is worth looking through. It contains articles on a large number of topics.

    The list of Stata Users' Group meetings can be found at:
    http://www.stata.com/meeting/proceedings.html


    For example if you're not sure what regular expressions are, then have a look at:
    http://ideas.repec.org/s/boc/wsug07.html

    Or the following may be of interest:
    Panel data methods for microeconometrics using Stata
    http://repec.org/wcsug2007/cameronwcsug.pdf


    Powerful new tools for time series analysis may be of interest:
    http://repec.org/nasug2007/StataTS07.beamer.7727.pdf


    Interested in Stata and genetics? Then have a look at:
    A brief introduction to genetic epidemiology using Stata
    http://repec.org/usug2007/slides_nshephard.pdf

    There is lots more.




    Programming Stata - learning by examples (March 2008)
    Below are a number of do files that can be run in Stata, allowing you to see how Stata programming works. By seeing the input and the output you can learn some of the basics of the Stata programming language (for the finer points refer to the Stata manuals). By learning some programming, Stata can be used more efficiently, e.g. by using a macro rather than typing the same thing again and again. To use the tutorial:

  • Download the tutorial (do file)

  • Open the do file in the Stata do editor

  • Highlight an example

  • click on the do icon

  • Look at the result in the Results window. If you wish to confirm that you understand what is happening in the example, make some changes to the example and check that the results are as you would expect.

    Tutorial 1 - do files
    Tutorial 2 - macros
    Tutorial 3 - loops
    Tutorial 4 - if statement
    Tutorial 5 - incrementing, _n, and _N
    Tutorial 6 - local extended macros

    More tutorials will be added in the following weeks

    Also see:
    Stata 10 Users Guide
    Stata 10 Programming Manual
    The Stata Journal (2005) Nicholas J. Cox "Suggestions on Stata programming style" 5, Number 4, pp. 560-566
    The Stata Journal (2002) Nicholas J. Cox "How to face lists with fortitude" 2, Number 2, pp. 202-222
    Stata Netcourse NC151 "Introduction to Stata programming"
    Stata Netcourse NC152 "Advanced Stata programming"

    (Back issues of the Stata Journal can be purchased from Survey Design and Analysis - contact details below)
    (To enroll in a Stata Netcourse please contact us)




    Mata - learning by examples (February 2008)
    Mata is Stata's matrix programming language. The advantage of Mata is that it is fast, and some problems are easier to solve in Mata.
    The Mata manuals are very useful for learning Mata. To complement the manuals, some Mata tutorials are attached.
    The tutorials are a series of examples in a do file. To use the tutorial:

  • Download the tutorial (do file)

  • Open the do file in the Stata do editor

  • Highlight an example

  • click on the do icon

  • Look at the result in the Results window. If you wish to confirm that you understand what is happening in the example, make some changes to the example and check that the results are as you would expect.

    Tutorial 1 Getting Data in Mata
    Tutorial 2 Looping, If statement and examples
    Tutorial 3 Subscripting matrices
    Tutorial 4 string and numerical matrices, getting a mata matrix into Stata
    Tutorial 5 Mata functions
    Tutorial 6 Mata pointers and Mata optimize
    Tutorial 7 Mata matrix maths and solving simultaneous equations


    Also see:
    Stata 10 Mata manuals (the entire Mata manual can be found in Stata's online help for Mata, e.g. help mata)
    The Stata Journal (2007) William Gould "Mata Matters: Structures", 7, Number 4, pp. 556 – 570
    The Stata Journal (2007) William Gould "Mata Matters: Subscripting", 7, Number 1, pp. 106 – 116
    The Stata Journal (2006) William Gould "Mata Matters: Precision", 6, Number 4, pp. 550 – 560
    The Stata Journal (2006) William Gould "Mata Matters: Interactive use", 6, Number 3, pp. 387 – 396
    The Stata Journal (2006) William Gould "Mata Matters: Creating new variables–sounds boring, isn't", 6, Number 1, pp. 112 – 123
    The Stata Journal (2005) William Gould "Mata Matters: Using views onto the data", 5, Number 4, pp. 567 – 573
    The Stata Journal (2005) William Gould "Mata Matters: Translating Fortran", 5, Number 3, pp. 421 – 441

    (Back issues of the Stata Journal can be purchased from Survey Design and Analysis - contact details below)




    Stata's display command (January 2008)

    Stata's display command is useful for writing to the Stata results window or using it as an online calculator

    The display command has features that allow various types of output and the tools to format and enhance these.

    Controlling the color of the output
    Example:
    display as text "green" as error " red" as result " yellow" as input " white"
    (text, error, result and input are styles)

    Controlling where the text is placed
    Example:
    display _column(50) "column"


    Including smcl (smcl is Stata's mark up and control language)
    Examples:
    display "{center: this}"
    display "{hline}"


    Formatting
    Example:
    display %9.5f 9
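
    A format can be combined with any expression, which is what makes display handy as a calculator. The following is a short sketch; the local macro name formatted is made up for the illustration.

```stata
display 2 + 2                             // simple arithmetic
display %9.2f sqrt(2)                     // two decimal places
display %12.2fc 1234567.891               // c adds commas as thousands separators

* the same formatting can be stored in a macro for later use
local formatted : display %9.2f sqrt(2)
display "sqrt(2) is about `formatted'"
```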


    Stata's system values (type creturn list to see these)
    Example:
    display ("$S_DATE")
    display c(current_date)


    Link to Stata 11 PDF manual (New for Stata 11)
    Handy for passing on a reference to a specific topic in the PDF manuals
    Input the following commands and then click on the hyperlink in the Results windows.
    Examples:
    display in smcl "{manpage GSM 141} {hline 2} starting MAC"
    display in smcl "{manlink R regress} {hline 2} Linear regression"


    Also see:
    Stata 10 programming manual display
    Stata 10 programming manual smcl
    Stata Journal Ryan, Philip (2004) "Stata tip 4: Using display as an online calculator", 4:1 Page 93.
    In Mata: display()




    Creating a binary variable from a continuous variable (December 2007)

    One way of creating a binary variable is to generate a new variable containing 0 and then replace the contents of the variable with 1 based on a qualifier, e.g.

    generate dummy1=0
    replace dummy1=1 if mpg <=25

    this can be simplified to one line eg.

    generate dummy1= mpg <=25

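    This works because mpg <=25 is either true or false. A logical expression in Stata evaluates to 1 if true and 0 if false.

    If the variable in the expression contains missing values, add the qualifier if !missing(). Stata treats a missing value as larger than any number, so without the qualifier an observation with missing mpg would be coded 0 rather than missing, e.g.

    generate dummy1= mpg <=25 if !missing(mpg)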

    Other ways of creating dummy variables can be found at: Stata FAQ
    Also see: What is true and false in Stata?
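
    The effect of the qualifier is easiest to see with a variable that really has missing values. The following sketch uses rep78 from the auto data set instead of mpg, because mpg has no missing values; the variable names low_naive and low_rep are made up for the illustration.

```stata
sysuse auto, clear
generate byte low_naive = rep78 <= 3                    // missing rep78 becomes 0
generate byte low_rep   = rep78 <= 3 if !missing(rep78) // missing rep78 stays missing
tabulate low_naive low_rep, missing
```

    The cross-tabulation shows the observations that the first version codes as 0 and the second leaves as missing.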




    New subcommand for listing user written commands (November 2007)

    Many user written commands are stored in the SSC (Statistical Software Components) archive. In the latest ado update for Stata 10 a new subcommand has been added to ssc:

    ssc whatshot

    The syntax is: ssc whatshot [, n(#) author(name)]

    Examples:
    ssc whatshot


    ssc whatshot, author(cox)

    To get these commands you need to update Stata. To do this with the pull down menu:
    Help>Official Updates and then click on www.stata.com. Then follow instructions.

    For more information on SSC type help ssc on the Stata command line




    Undocumented commands (October 2007)

    In addition to the commands found in the Stata manual there are also undocumented commands that you may find useful. To see these type help undocumented.
    One command that you may find useful is: twoway__histogram_gen
    This command generates the coordinates of the bars in a histogram. An example of how it works is:

    sysuse auto, clear
    twoway__histogram_gen mpg , fraction gen(h x)
    l h x in 1/20
    twoway (scatter h x) (histogram mpg, fraction)
    tab x h
    exit



    Another command that you may find useful is matalabel
    This command generates 3 matrices in Mata, one for each of: value label name, value and label

    An example of how it works is:


    sysuse auto, clear
    matalabel , generate("a" "b" "c")
    mata
    a
    b
    c
    mata describe
    vallab=(a,c)
    vallab
    b
    end





    User written program - examples (September 2007)

    When learning new commands in Stata it is often useful to have examples of how the syntax is applied. Stata's documentation includes many examples and allows you to download the data sets for these (File/Example datasets), thus allowing you to reproduce the results. Stata's online help also includes many examples. Another useful source of examples is Nick Cox's user written program examples.
    The following is some of what you get by typing examples egen:


    Setup
    . sysuse auto, clear
    Create highrep78 containing the value of rep78 if rep78 is equal to 3, 4, or 5, otherwise highrep78 contains missing (.)
    . egen highrep78 = anyvalue(rep78), v(3/5)

    List the result
    . list rep78 highrep78



    To see a description of examples type the following on the Stata command line when online
    ssc describe examples

    To install examples, type the following on the Stata command line when online
    ssc install examples





    Settings for Stata (August 2007)

    Various features of Stata can be set to individual preferences or changed to meet the requirements for a particular analysis.
    To see what can be set type query on the Stata command line

    Amongst the things that can be set (in Stata 10) is whether or not you would like graphs tabbed on the graph window or each open graph in a separate graph window.
    The syntax for this command is:
    set autotabgraphs on , permanently

    Other set commands that you are likely to use are:
    set more off
    set memory

    For more information see the Stata 10 reference manuals

    [R] query -- Display system parameters
    [R] set -- Overview of system parameters






    Copy as picture - Copying from the results windows to Word and Excel (July 2007)

    Stata 10 has a copy feature that allows you to copy highlighted parts of the results windows to Word, Excel and other packages, as a picture. To use this, highlight what you want copied in the results window, right click the mouse button and click on to "Copy as Picture". Then paste into another package. In the other package this can usually be cropped and edited in the normal way.





    Estout - Stata Regression Tables (June 2007)

    Estout is a useful user written command for outputting regression results in various forms. For more information see the estout web site.



    Adoupdate (May 2007)

    The commands under update are useful for keeping Stata's executable file and the official Stata ado files up to date (see help update). However, these do not keep user written ado files (user written programs that you have downloaded) up to date. To ensure that you are working with the latest version of a user written ado file, type adoupdate on the Stata command line, or use the pulldown menu Help/SJ and user written programs and then click on update. (You must be online to use this command.)

    For more information on the adoupdate command see help adoupdate.
    Also, see update



    Nested Do file (April 2007)

    Stata allows you to break up your analysis into logical sections, each part being a separate do file, with all the parts of the analysis called from one master do file, e.g.


    **master****the do files below are called from a do file that you have named master.do (any other name can be used)
    .
    .
    .
    do projA_data
    do projA_error_checking
    run projA_data_man
    if M1==2 {
    do projA_A1 // projA_A1 finishes the analysis and exits
    }
    do projA_results
    exit
    **master*************


    In fact Stata allows nesting up to a depth of 64, i.e. a do file calls another do file which calls another do file, up to 64 levels deep.


    Nesting do files has some advantages:

    • Being able to reuse do files (that you have previously used and know to have no bugs) for other projects
    • Stata doesn't have a "goto line X" command. However, if you break down your analysis into do files the same thing can be achieved.
    • Allows a quick overall view of the analysis
    • Smaller do files are easier to debug than large do files
    • Some do files can be executed with run (no output to the screen) and others with do (output to the screen). This is easier than using Stata's quietly command


    Disadvantages:
    • More files to manage


    To learn more see: Stata 9 Users guide [U] 16.2 and [U]16.6.2
    For the maximum depth of nested do files, type help limits in Stata



    Personal help file (March 2007)

    Stata comes with help files for its commands. However, you may wish to compile a list of frequently used but hard to remember commands in your personal help file.

    Your own help file is saved as a file with the .hlp extension, e.g. me.hlp, somewhere on the adopath.

    An example of a help file is as follows:

    {smcl}
    {* 03may2005}{...}
    {cmd:help Joe Blow } {right:updated 1 March 2007}
    {hline}
    {title:Wildcards and symbols}
    {p2col :{helpb comments:comments} *, ?, ///, etc.}{p_end}

    {hi:Wildcard zero or more} * or ~
    {hi:Wildcard one character} ?

    {hi:Continue onto next line} ///
    {hi:Comment out line} * at beginning or /// midline

    To learn more about smcl see the Stata Users Guide or look at a Stata help file (.hlp extension) in the do file editor.



    spmap -- Visualization of Spatial Data (February 2007)

    spmap is a user-written command that can be downloaded for free.

    To download:
    Make sure that you are online.
    Type findit spmap
    Then click on the hyperlink.

    Once installed type help spmap to see the help file. At the bottom of the help file there are examples of what can be done. Click the hyperlink to see the graphs.

    Here are some of the examples.


    stcmd - Using Stat/Transfer within Stata (January 2007)

    The user-written command stcmd can be used within Stata to change the format of datasets stored on disk. stcmd uses Stat/Transfer to do this.

    To use this command you must first have both Stat/Transfer and stcmd installed.
    To get Stat/Transfer, contact Survey Design and Analysis Services (details below).
    To get stcmd, type findit stcmd in the Stata Command window and follow the instructions to install the program.


    Examples

    Using stcmd to convert a Stata data set to Excel
    stcmd "c:\Program Files\Stata9\auto.dta" c:/auto.xls, replace

    Using stcmd to convert a Stata data set to SPSS
    stcmd "c:\Program Files\Stata9\auto.dta" c:/auto.sav

    Using stcmd to convert many files from Excel to Stata
    stcmd mat*.xls *.dta


    For more information see help stcmd (stcmd must first be installed).
    Also see fdasave for another way of saving Stata data in a format that SAS can read.






    encode (December 2006)

    encode is a useful command for converting string variables to numeric ones. encode assigns the numeric codes in alphabetical order, e.g.

    With the following dataset

    var1
    a
    b
    c

    encode var1, gen(var1a)

    var1   var1a
    a      1
    b      2
    c      3


    (Note: when Stata encodes a variable it creates a value label; to see this, type label list)

    If this is not the coding that you require, a way around it is to define a value label first and then pass it to encode with the label() option.

    Eg.
    If you have:

    var1
    a
    b
    c

    But would like var1a encoded as:

    var1a
    3
    1
    2

    You would first define the value label, e.g.
    label define preference1 3 "a" 1 "b" 2 "c"

    And then apply it with the encode command
    encode var1, label(preference1) gen(var1a)

    Resulting in:
    var1   var1a
    a      3
    b      1
    c      2


    The code to run the above:
    clear
    input str1 var1
    a
    b
    c
    end
    label define preference1 3 a 1 b 2 c
    encode var1, label(preference1) gen(var1a)
    label list
    list, nolab


    For more information see:
    Stata 9 Data Management manual



    kdensity (November 2006)

    One of the problems with combining a number of histograms is that, generally, once there are more than three, the graph becomes unreadable. kdensity may be a solution to this problem.

    sysuse auto, clear
    twoway (kdensity mpg if rep78==1, color(red)) ///
    (kdensity mpg if rep78==2, color(blue) ) ///
    (kdensity mpg if rep78==3, color(black)) ///
    (kdensity mpg if rep78==4, width(1) color(green)) ///
    (kdensity mpg if rep78==5, color(purple)) , ///
    legend(label( 1 mpg at rep78=1) ///
    label( 2 mpg at rep78=2) ///
    label( 3 mpg at rep78=3) ///
    label( 4 mpg at rep78=4) ///
    label( 5 mpg at rep78=5))





    For more information see:
    Stata 9 graphics manual
    A Visual Guide to Stata Graphics by Michael Mitchell
    Stata Journal Vol 3 No. 2



    Intermediate graph commands (October 2006)

    Graphs cannot always be combined, even with the addplot option. However, you can still get combined graphs by using the pci, twoway scatteri and twoway pcarrow commands. For example, if you wished to add a box plot to a scatter plot, this could be achieved with the aid of the pci command and a twoway scatter.
    sysuse auto, clear
    qui sum mpg, detail

    local a= r(p25)
    local b= r(p75)
    local c=r(p50)
    local uav=`b'- 1.5*(`a'-`b') // upper whisker end: p75 + 1.5*IQR
    local lav=`a'+ 1.5*(`a'-`b') // lower whisker end: p25 - 1.5*IQR
    twoway (scatter mpg weight) ///
    (pci `a' 3000 `b' 3000, lcolor(red)) ///
    (pci `a' 3400 `b' 3400, lcolor(red)) ///
    (pci `a' 3000 `a' 3400, lcolor(red)) ///
    (pci `b' 3000 `b' 3400, lcolor(red)) ///
    (pci `c' 3000 `c' 3400, lcolor(red)) ///
    (pci `uav' 3200 `b' 3200, lcolor(red)) ///
    (pci `uav' 3150 `uav' 3250, lcolor(red)) ///
    (pci `lav' 3200 `a' 3200, lcolor(red)) ///
    (pci `lav' 3150 `lav' 3250, lcolor(red)) ///
    (scatteri `c' 3450 "Median",mlabangle(45) mlabsize(8)) ///
    , legend(off)


    For more information see:
    Stata tip 21 SJ 5 No. 2 pp282-284
    Stata 9 graphics manual



    MATA (September 2006)

    Mata is Stata 9's new matrix programming language.
    If you haven't had a look at Mata yet, then here are some examples of what you can use it for:

    Example 1
    Sorting rows in alphabetical order (statalist-digest V4 #2451)
    (the user written program moremata must first be installed)


    clear

    input str20 x1 str20 x2 str20 x3
    "massagli,mark" "wood,j." "dessent,harold"
    "beletz,elaine" "carter,annie" "curtis,barbara"
    "bradshaw,joe" "brown,arnold" "dunaway,lowell"
    "schneider,mark" "mullins,bobby" "sump,lawrence"
    end

    list

    tempfile foo

    mata
    C= J(3,1,"")     //creates a new vector
    A = st_sdata(.,.)'    // a transposed copy of the string data in Stata

    for (i = 1; i <=cols(A); i++) {
    A = sort(A,i)
    C = C,A[.,i]
    }

    C=C[.,(2::cols(A)+1)]'
    mm_outsheet("`foo'",C, mode="r")
    end
    insheet using `foo', clear tab
    list


    Example 2
    xpose using mata (statalist-digest V4 #2328)
    (the user written program moremata must first be installed)


    clear
    tempfile tmp1

    input str16 v1 str2 v2 str2 v3 str2 v4
    "Sex" M M M
    "Age" 47 66 56
    "Left eye"
    "Right eye" Y Y Y
    "Lower eyelid" Y Y Y
    "Upper eyelid"
    "Lateral canthus"
    "Medial canthus" Y Y
    "Recurrent lesion"
    "Primary lesion" Y Y Y
    end

    list

    mata
    A = st_sdata(.,.)'
    mm_outsheet("`tmp1'",A, mode="r")
    end

    insheet using "`tmp1'",clear
    l, ab(15) noobs


    Mata can do much more. To learn more see:

    Translating Fortran
    SJ 5(3), 3rd quarter 2005, 421 - 441

    Using views onto data
    SJ 5(4), 4th quarter 2005, 567 - 573

    Creating new variables (Sounds boring, isn't)
    SJ 6(1), 1st quarter 2006, 112 - 123

    Interactive use
    SJ 6(3), 3rd quarter 2006, 387 - 396

    Various responses on Statalist
    Stata 9 Mata Reference Manual



    numlabel (August 2006)

    numlabel is a command that prefixes numeric values to value labels.

    Without numlabel



    numlabel , add //adding numlabel



    For more information see help numlabel and the Stata 9 Data Management Manual
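
    A minimal sketch of the effect, using the auto dataset shipped with Stata, whose value label origin is attached to the variable foreign:

    ```stata
    sysuse auto, clear
    tabulate foreign            // labels shown as Domestic, Foreign
    numlabel origin, add        // prefix each label with its numeric value
    tabulate foreign            // labels now shown as 0. Domestic, 1. Foreign
    numlabel origin, remove     // restore the original labels
    ```

    Typing numlabel, add without a label name applies the prefix to every value label in the dataset.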



    viewsource (July 2006)

    viewsource is a command that allows a file located on the adopath to be viewed in the Stata viewer.
    Example: To view the code for the t test type viewsource ttest.ado


    For more information see help viewsource and the Stata 9 Programming Manual



    datasignature - Determine whether data have changed (June 2006)

    If you have updated Stata 9 to the latest update (17 May 2006), you will find that a new command has been added: datasignature. (To find out what else the update added, use the pulldown menu Help > What's new, or type whatsnew on the Stata command line.)

    datasignature gives a signature (a short string of numbers) based on the following:
    1. The number of observations and number of variables in the data.
    2. The values of the variables.
    3. The names of the variables.
    4. The order in which the variables occur in the dataset if varlist is not specified, or in varlist if it is.
    5. The storage formats of the individual variables.

    datasignature can be used for the following:

    Examples of interactive use
    1. checking against a previous datasignature to see whether the data have changed.
    2. checking whether you are working with the same dataset as your colleagues.


    For more information see help datasignature
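
    A minimal sketch of the first use, assuming a release in which datasignature leaves its result in r(datasignature); the macro name before is only illustrative:

    ```stata
    sysuse auto, clear
    datasignature
    local before "`r(datasignature)'"   // keep the signature for later comparison

    replace mpg = mpg + 1 in 1          // change a single value
    datasignature
    display "before: `before'"
    display "after:  `r(datasignature)'"
    assert "`r(datasignature)'" != "`before'"   // the signature has changed
    ```

    The same comparison works between colleagues: each person runs datasignature on their copy and the two strings are compared.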



    Simple Thematic Mapping (May 2006)

    tmap is a user-written Stata program that allows you to map your data.
    For more information on using tmap see:
    FAQ
    Stata Journal Vol 4 - No 4

    Some shape data sources for Australia:
    AEC
    VDS Technologies
    Maps based on postcode can be purchased.

    I produced the Victorian electoral map using the do-file below, shading by the variable actual (actual population). Other maps can be generated by adding your own data and mapping that instead.

    *-----------------------start do file------------------------------------
    clear
    cd "C:\ASTATA INFO\learning\tmap" //where the data has been downloaded to
    set matsize 3000
    mif2dta VIC20030129_elb, genid(id)
    use VIC20030129_elb-database
    describe
    tmap choropleth actual, id(id) map("VIC20030129_elb-Coordinates.dta") palette(Reds)
    exit
    data downloaded from
    http://www.aec.gov.au/_content/Who/profiles/gis/gis_datadownload.htm
    *-----------------------end do file------------------------------------




    Separate (April 2006)

    A useful Stata command for splitting a variable into separate new variables, based on either an expression or a variable, is separate.
    E.g.
    separate mpg, by(mpg>20)
    will create two new variables, one holding mpg where mpg<=20 and the other holding mpg where mpg>20

    other examples are:
    separate mpg, by(mpg)
    separate mpg, by(mpg) gen(MPG)

    For more information on the separate command see the Stata 9 Data Management manual or online by typing help separate.
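
    A minimal sketch with the auto dataset, splitting mpg by the categorical variable foreign; the checks at the end simply confirm what separate produced:

    ```stata
    sysuse auto, clear
    separate mpg, by(foreign)       // creates mpg0 (foreign==0) and mpg1 (foreign==1)
    describe mpg0 mpg1
    summarize mpg0 mpg1
    * each new variable is missing wherever the other group applies
    assert missing(mpg1) if foreign == 0
    assert mpg0 == mpg if foreign == 0
    ```

    The separated variables are handy for graphs, where each one can be given its own marker or colour.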


    Macro Expressions (March 2006)

    Rather than typing
    local a=r(N)
    forvalues i = 1/`a' {
    }

    The above code can be reduced to
    forvalues i = 1/`= r(N)' {
    }

    For more information on macro expressions see the Stata 9 Users guide [U] 18.3.8
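
    A runnable sketch of the shorter form, using the auto dataset; the macro name seen is only illustrative:

    ```stata
    sysuse auto, clear
    quietly count if foreign == 1
    local seen 0
    forvalues i = 1/`= r(N)' {      // `= r(N)' is evaluated before the loop starts
        local ++seen
    }
    display "looped `seen' times"

    * the same device works inside strings
    quietly summarize mpg
    display "mean mpg is about `= round(r(mean))'"
    ```

    Because the expression is evaluated once, when the line is expanded, later changes to r() inside the loop do not affect the loop bounds.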



    inlist() & inrange() functions (February 2006)
    Stata has many functions that make using Stata easier. Eg.

    count if mpg==22 | mpg==25 | mpg==34 | mpg==45
    can be written as:
    count if inlist(mpg,22,25,34,45)

    Another function is inrange() eg.
    count if inrange(mpg, 23,34)

    Instead of
    count if mpg>=23 & mpg<=34
    These functions can be used after the if qualifier with commands such as generate, list, summarize etc., or directly in assert.

    Examples:
    assert inlist(mpg,22,25,34,45)
    generate mpg1=mpg if inlist(mpg,22,25,34,45)
    list mpg if inlist(mpg,22,25,34,45)
    list mpg if inlist(mpg,22,25,34,45) | inlist(mpg,15,26,35,55) // use 2 inlist functions when the list exceeds the max. allowed for 1 function
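
    The equivalences can be checked directly. A small sketch with the auto dataset (the macro names are only illustrative):

    ```stata
    sysuse auto, clear
    * inlist() is equivalent to a chain of == comparisons joined by |
    count if inlist(rep78, 1, 2)
    local n_inlist = r(N)
    count if rep78 == 1 | rep78 == 2
    assert r(N) == `n_inlist'

    * inrange() is equivalent to the two-sided comparison
    count if inrange(mpg, 23, 34)
    local n_inrange = r(N)
    count if mpg >= 23 & mpg <= 34
    assert r(N) == `n_inrange'

    * inlist() also accepts strings
    count if inlist(make, "AMC Concord", "Volvo 260")
    ```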


    set trace on (January 2006)
    local all `"`all' `"`=`v'[`i']'"'"'
    set trace off

    with our data a section of the trace will look like this:
    - local all `"`all' `"`=`v'[`i']'"'"'
    = local all `" `"Volvo 260"' `"11995"' `"17"' `"5"'"'
    - set trace off

    The first line is the line being executed. It has a - in front of it to indicate it is being executed. The second line is after macro substitution has occurred. It has a = in front of it to indicate that substitution has occurred.

    See also the user written command: tr
    To install this command: ssc install tr

    For more information on trace see the Stata 9 programming manual. Also see the Stata command pause.


    Do-file Editor (December 2005)
    When typing commands in the Stata Do-file Editor, individual commands or a selection of commands can be run by highlighting the section that you would like to run and then pressing the do icon. This allows you to try out your file section by section.


    Regular Expressions (November 2005)

    Regular expressions allow the matching of complex text patterns. Regular expressions are supported in Stata 9 through the functions:
    regexm - regular expression match
    regexs - return nth subexpression from match
    regexr - replace matched expression with new string

    For example
    Suppose you wish to have the day as a separate variable in the following data set:

    clear
    
    input ///
    str25 date
    "12jan2003"
    "1april1995"
    "17may1977"
    "02september2000"
    end
    
    list
    
    The following could be used:
    gen day=regexs(1) if regexm(date, "(^[0-9]+)")

    breaking this down:
    regexm: match expression
    ^: anchor the match at the beginning (LHS) of the string
    [0-9]: match any single digit from 0 to 9
    +: one or more of the previous item, i.e. digits 0 to 9 (matching stops when a letter comes up, e.g. the j of jan)
    ( ): the brackets mark a subexpression. In this case there is only one group, hence regexs uses 1
    regexs(): returns a subexpression, here the first one


    The output from the regular expression:
    . gen day=regexs(1) if regexm(date, "(^[0-9]+)")
    
    . list
    
         +-----------------------+
         |            date   day |
         |-----------------------|
      1. |       12jan2003    12 |
      2. |      1april1995     1 |
      3. |       17may1977    17 |
      4. | 02september2000    02 |
         +-----------------------+
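
    The same idea extends to several subexpressions. As a sketch, assuming the dates always consist of digits, then lowercase letters, then digits, the month and year can be pulled out of the same data with regexs(2) and regexs(3) (the variable names month and year are only illustrative):

    ```stata
    clear
    input str25 date
    "12jan2003"
    "1april1995"
    "17may1977"
    "02september2000"
    end

    * three groups: day digits, month letters, year digits
    gen month = regexs(2) if regexm(date, "^([0-9]+)([a-z]+)([0-9]+)")
    gen year  = regexs(3) if regexm(date, "^([0-9]+)([a-z]+)([0-9]+)")
    list
    ```

    The results are strings; real() or destring can convert year to a number if needed.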
    

    Another example:
    We have some text that includes citations. We wish to create a new variable that contains the text of the last citation. In this case the last citation is not at the end of the text, so it is useful to reverse the text and then look for the desired pattern.
    clear
    
    input ///
    id  str200 cit_1
    1   "EP696218-A -- WO9215370-A   SUND _SUND-Individual_"
    2   "WO9425112-A -- GB298635-A"
    3   "EP578126-A -- CH180906-A    AGE_OK"
    4   "EP562128-A -- DE1684639-A"
    5   "WO9318277-A -- DK137935-B"
    6   "US4434855-A   SEC OF NAVY _USNA_"
    end
    list
    
    gen kk1=reverse(regexs(1)) if regexm(reverse(cit_1), "([A-Z][-][0-9]+[A-Z]+)")
    
    list
    
    For a FAQ on regular expression go here



    Text Editors (October 2005)
    The text editor that comes with Stata is fine for small programs. However, as the size of the program increases other text editors are often used to make programming easier. For a discussion on various text editors go here


    The function sum() (September 2005)

    sum(x) returns the running sum of x. A basic use of sum() would be: generate running_tot =sum(1)

    Another example of the use of sum(): given the data below, you need to create a new
    variable that starts at zero, returns to zero whenever id changes, and increases by 1 whenever var2 changes.

    id   var2

    1    7
    1    7
    1    7
    1    7
    1    7
    1    7
    1    8
    1    8
    2    8
    2    8
    2    1
    bysort id: gen running_tot=sum(var2[_n]!=var2[_n-1] & _n>1)
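
    A self-contained sketch of this calculation. Two details are assumptions added here rather than part of the tip: the helper variable order, which keeps the original row order within each id, and the condition _n > 1, which holds the first row of each id at zero.

    ```stata
    clear
    input id var2
    1 7
    1 7
    1 7
    1 7
    1 7
    1 7
    1 8
    1 8
    2 8
    2 8
    2 1
    end

    gen long order = _n            // remember the original row order
    * count changes in var2 within id; _n > 1 keeps the first row of each id at zero
    bysort id (order): gen running_tot = sum(var2 != var2[_n-1] & _n > 1)
    list, sepby(id)
    ```

    Without the _n > 1 condition the first row of each group compares var2 with a missing value, which counts as a change, so the variable would start at 1.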

    further information can be found by typing help sum() on the Stata command line


    clonevar (July 2005)

    Stata 9 has a useful command that generates an exact copy of an existing variable.

    eg clonevar MPG=mpg

    for more information see help clonevar
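
    What "exact copy" means is easiest to see next to generate. A small sketch with the auto dataset (foreign2 and foreign3 are illustrative names):

    ```stata
    sysuse auto, clear
    clonevar foreign2 = foreign     // copies values, storage type, labels and format
    generate foreign3 = foreign     // copies the values only
    describe foreign foreign2 foreign3
    * the value label travels with clonevar but not with generate
    assert "`: value label foreign2'" == "`: value label foreign'"
    assert "`: value label foreign3'" == ""
    ```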



    Docking Stata 9 Windows (June 2005)
    The UCLA site has much useful information on using Stata. If you are new to Stata 9 then the movie on docking windows will be useful. To see this go here

    Getting the path and file name onto the Stata command line (March 2005)
    Stata 8 has a handy way of getting file names, complete with the path, onto the command line. Rather than typing folders, subfolders and the file name, use the pull-down menu File/Filename and click on the required file; the path and file name will be shown on the command line, enclosed in quotation marks. This is particularly handy when the path consists of many subdirectories with long names. You can then add commands such as cd or use in front of it on the command line.

    Tabout - a user written command (February 2005)
    -tabout- produces publication-quality tables from Stata, with the output exported to a text file. The output can be tab-delimited, HTML code or LaTeX/TeX code. -tabout- provides extensive user control over the formatting of data and labels and generates table headers automatically.

    ssc install tabout

    (or: ssc install tabout, replace).

    To make learning the syntax easy, an example file which can be used as a tutorial is available here

     


    window command (February 2005)

    The window command can be useful for adding your frequently used commands to the pull-down menu, pushing commands to the Review window, displaying the current file name in the top left-hand corner of the Stata window, and a lot more.

    To have your current file name displayed on the Stata window you can add the following to your do file:

    window manage maintitle "`c(filename)'"

    See your programming manual for further details on the window command


    ds - Describing Variables and Saving Results (January 2004)

    ds lists the variable names of the dataset currently in memory in a compact form. The command is useful if you require a list of variables that satisfies certain criteria. The list that results is saved in r(varlist) which can be used in other commands eg.
    (Using the auto dataset supplied with Stata)

    use "c:/stata8/auto.dta", clear
    ds, not(vall origin)
    list `r(varlist)'
    ds m*
    list `r(varlist)'

    See describe in the Stata reference manual for more details.
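
    A further sketch, assuming a release in which ds supports the has() option, selects variables by storage type and passes the saved list on:

    ```stata
    sysuse auto, clear
    ds, has(type string)            // only string variables
    display "`r(varlist)'"
    ds, has(type numeric)
    summarize `r(varlist)'          // pass the saved list straight to another command
    ```

    Because r(varlist) is overwritten by the next ds, copy it into a local macro first if it is needed later.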
    Also see statalist-digest V4 #1701 & #1607

     


    WORKING IN ROWS (December 2004)

    The egen command has a number of functions that make it easier to work with data in rows. Rather than using xpose or reshape to convert the data to columns these commands may be able to be used.

    Egen's row functions:

    rfirst(varlist)
    may not be combined with by. It gives the first nonmissing value in
    varlist for each observation (row). If all values in varlist are
    missing for an observation, newvar is set to missing.

    rlast(varlist)
    may not be combined with by. It gives the last nonmissing value in
    varlist for each observation (row). If all values in varlist are
    missing for an observation, newvar is set to missing.

    rmax(varlist)
    may not be combined with by. It gives the maximum value (ignoring
    missing values) in varlist for each observation (row). If all values in
    varlist are missing for an observation, newvar is set to missing.

    rmean(varlist)
    may not be combined with by. It creates the (row) means of the
    variables in varlist, ignoring missing values; for example, if three
    variables are specified and, in some observations, one of the variables
    is missing, in those observations newvar will contain the mean of the
    two variables that do exist. Other observations will contain the mean
    of all three variables. Where none of the variables exist, newvar is
    set to missing.

    rmin(varlist)
    may not be combined with by. It gives the minimum value in varlist for
    each observation (row). If all values in varlist are missing for an
    observation, newvar is set to missing.

    rmiss(varlist)
    may not be combined with by. It gives the number of missing variables
    in varlist for each observation (row). String variables -- if specified
    -- are counted as containing missing when their value is ""; numeric
    variables are counted as containing missing when their value is system
    missing (.) or extended missing (.a, ..., .z).

    robs(varlist) [, strok]
    may not be combined with by. It gives the number of nonmissing values
    in varlist for each observation (row) -- this is the value used by
    rmean() for the denominator in the mean calculation.

    String variables may not be specified unless option strok is also
    specified. If strok is specified, string variables will be counted as
    containing missing values when they contain ""; numeric variables will
    be counted as containing missing when their value is ., as usual.

    rsd(varlist)
    may not be combined with by. It creates the (row) standard deviations
    of the variables in varlist, ignoring missing values. Also see rmean().

    rsum(varlist)
    may not be combined with by. It creates the (row) sum of the variables
    in varlist, treating missing as 0.
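
    From Stata 9 onward these functions are documented under longer names (rowmean(), rowmiss(), rowtotal() and so on). A minimal sketch, assuming such a release and three made-up variables x1 to x3:

    ```stata
    clear
    input x1 x2 x3
    1 2 3
    4 . 6
    . . .
    end

    egen m     = rowmean(x1 x2 x3)    // mean of the nonmissing values in each row
    egen nmiss = rowmiss(x1 x2 x3)    // number of missing values in each row
    egen tot   = rowtotal(x1 x2 x3)   // row sum, treating missing as 0
    list
    ```

    Note the last row: the mean is missing because nothing was observed, while the total is 0 because missing values are treated as zero.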

     


    A REMINDER TO START A LOG (November 2004)

    Would you like to be reminded to start a log each time that you start Stata?

    One way of doing this is to include the command below in your profile.do file

    db log

    For information on profile see the GETTING STARTED MANUAL - More on starting and stopping Stata

     


    Version Control (October 2004)

    PROBLEM: Stata is continually being improved, meaning programs and do-files written for older versions might stop working.

    SOLUTION: Specify the version of Stata you are using at the top of programs and do-files that you write:

    ------------------------------------------ myprog.do ---
    version 8.2

    use mydata, clear
    regress ......
    ------------------------------------------ myprog.do ---


    ---------------------------------------- example.ado ---
    program myprog
    version 8.2
    ...
    end
    ---------------------------------------- example.ado ---

    For further information see the Stata programming manual

     


    Assert (September 2004)

    Assert is a useful command for verifying your data, e.g.

    assert sex=="Male" | sex=="Female"

    assert mpg<50 & mpg>10
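
    A short sketch with the auto dataset shows what happens when an assertion holds and when it does not; capture is used here only so that the failing check does not stop the do-file:

    ```stata
    sysuse auto, clear
    assert mpg < 50 & mpg > 10      // passes silently: every car satisfies this
    capture assert !missing(rep78)  // fails: some cars have no repair record
    display _rc                     // nonzero return code signals the failure
    assert inlist(foreign, 0, 1)    // a coded variable takes only known values
    ```

    Without capture, a failed assert stops the do-file at that line, which is usually what you want in a data-checking script.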

    Also see Stata reference manual for further information.