CHP03 - Data, Variables & Labels

1. Viewing data

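sysuse auto, clear
browse                          // browse the data
list make price mpg in 1/20     // list observations 1-20
describe                        // describe the dataset
describe, detail
d make price mpg                // d abbreviates describe
summarize                       // basic summary statistics
summarize, detail
sum make price mpg              // make is a string variable, so it shows 0 obs
codebook                        // detailed information on each variable
inspect                         // properties of each variable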
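
summarize also stores its results in r(), so they can be reused by later commands. A minimal sketch with the auto data; the local macro name pmean is illustrative only:

```stata
sysuse auto, clear
quietly summarize price          // no output, but results are stored in r()
display "N = " r(N) ", mean = " r(mean) ", sd = " r(sd)
local pmean = r(mean)            // save it before another command overwrites r()
count if price > `pmean'         // number of cars priced above the mean
```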

2. Data types (storage types)

2.1 numeric

                                                         Closest to
    Storage                                              0 without
    type                 Minimum              Maximum    being 0     bytes
    ----------------------------------------------------------------------
    byte                    -127                  100    +/-1          1
    int                  -32,767               32,740    +/-1          2
    long          -2,147,483,647        2,147,483,620    +/-1          4
    float   -1.70141173319*10^38  1.70141173319*10^38    +/-10^-38     4
    double  -8.9884656743*10^307  8.9884656743*10^307    +/-10^-323    8
    ----------------------------------------------------------------------
    Precision for float  is 3.795x10^-8.
    Precision for double is 1.414x10^-16.
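
The precision figures explain a classic pitfall: generate creates float variables by default, while Stata evaluates a literal such as 0.1 in double precision, so an equality test against a float variable can fail. A minimal sketch on a one-observation toy dataset (variable names are illustrative):

```stata
clear
set obs 1
generate x = 0.1              // default storage type is float
generate double y = 0.1       // stored in double precision
count if x == 0.1             // 0: the float 0.1 differs from the double 0.1
count if x == float(0.1)      // 1: round the literal to float precision first
count if y == 0.1             // 1: both sides are double
```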

2.2 string

    String
    storage       Maximum
    type          length         Bytes
    -----------------------------------------
     str1             1             1
     str2             2             2
      ...             .             .
      ...             .             .
      ...             .             .
     str2045         2045           2045

     strL            2000000000     2000000000
    -----------------------------------------
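
A str# variable holds at most # characters, but replace promotes it to a longer str# when needed (unless nopromote is given), and strL is used for very long text. A minimal sketch with illustrative variable names:

```stata
clear
set obs 1
generate str5 s = "abc"          // fixed-length string, up to 5 characters
replace s = "abcdefgh"           // replace promotes s to str8 automatically
describe s
generate strL memo = "long text can be stored in a strL variable"
describe memo
```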

2.3 datetime

2.3.1 Types of dates–human readable forms (HRFs)

    Date type         Examples of HRFs
    --------------------------------------------
    datetime          20jan2010 09:15:22.120  
       
    date              20jan2010, 20/01/2010, ...
    
    weekly date       2010w3
    monthly date      2010m1
    quarterly date    2010q1
    half-yearly date  2010h1
    yearly date       2010
    --------------------------------------------

2.3.2 Types of dates–Stata internal form (SIF)

    SIF type        Examples in SIF       Units
    -----------------------------------------------------------------
    datetime/c      1,579,598,122,120     milliseconds since 
                                          01jan1960 00:00:00.000, 
                                          assuming 86,400 s/day
    
    datetime/C      1,579,598,146,120     milliseconds since 
                                          01jan1960 00:00:00.000, 
                                          adjusted for leap seconds*
                                          
    date                       18,282     days since 01jan1960
                                          (01jan1960 = 0)
                                          
    weekly date                 2,602     weeks since 1960w1
    monthly date                  600     months since 1960m1
    quarterly date                200     quarters since 1960q1
    half-yearly date              100     half-years since 1960h1
    yearly date                  2010     years since 0000
    -----------------------------------------------------------------
    SIF datetime/C is equivalent to coordinated universal time (UTC). 
    In UTC, leap seconds are periodically inserted because the length of the mean solar day is slowly increasing. 
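
Stata's pseudofunctions td(), tc(), tw(), tm(), tq() and th() turn an HRF typed directly into code into its SIF, which is a quick way to check the SIF examples in the table. A minimal sketch on a one-observation dataset (variable names are illustrative):

```stata
clear
set obs 1
generate long   d = td(20jan2010)                 // 18,282
generate double c = tc(20jan2010 09:15:22.120)    // 1,579,598,122,120
generate int    w = tw(2010w3)
generate int    m = tm(2010m1)                    // 600
generate int    q = tq(2010q1)                    // 200
generate int    h = th(2010h1)                    // 100
format c %18.0f                                   // show the full integer
list
```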

2.3.3 HRF-to-SIF conversion functions

                         Function to convert
    SIF type        HRF to SIF                     Note
    --------------------------------------------------------------------
    datetime/c      tc =      clock(HRFstr, mask)  tc must be double
    datetime/C      tC =      Clock(HRFstr, mask)  tC must be double
    
    date            td =       date(HRFstr, mask)  td may be float or long
    
    weekly date     tw =     weekly(HRFstr, mask)  tw may be float or int
    monthly date    tm =    monthly(HRFstr, mask)  tm may be float or int
    quarterly date  tq =  quarterly(HRFstr, mask)  tq may be float or int
    half-year date  th = halfyearly(HRFstr, mask)  th may be float or int
    yearly date     ty =     yearly(HRFstr, mask)  ty may be float or int
    --------------------------------------------------------------------
    Warning: To prevent loss of precision, datetime SIFs must be stored as doubles.
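
A minimal sketch of the conversion functions on one made-up observation (the variable names dstr, tstr, mstr, d_sif, c_sif, c_bad and m_sif are hypothetical). The mask lists the order of the components in the string; c_bad shows what happens when a datetime is stored as float instead of double:

```stata
clear
input str9 dstr str22 tstr str7 mstr
"20jan2010" "20jan2010 09:15:22.120" "2010-01"
end
generate long   d_sif = date(dstr, "DMY")        // days since 01jan1960
generate double c_sif = clock(tstr, "DMYhms")    // ms since 01jan1960: needs double
generate float  c_bad = clock(tstr, "DMYhms")    // float: milliseconds are lost
generate int    m_sif = monthly(mstr, "YM")      // months since 1960m1
format d_sif %td
format c_sif c_bad %tc
format m_sif %tm
list
```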

2.3.4 Displaying SIFs in HRF

                     Display format to 
    SIF type         present SIF in HRF
    -----------------------------------
    datetime/c            %tc
    datetime/C            %tC
    date                  %td
    weekly date           %tw
    monthly date          %tm
    quarterly date        %tq
    half-yearly date      %th
    yearly date           %ty
    -----------------------------------
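
A display format changes only how a SIF is shown, not the stored number. A minimal sketch (the variable name d is illustrative); %tdCCYY-NN-DD is one of the customizable %td variants:

```stata
clear
set obs 1
generate long d = td(20jan2010)
display d                         // the raw SIF
format d %td
list d                            // shown as 20jan2010
format d %tdCCYY-NN-DD
list d                            // shown as 2010-01-20
```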

2.3.5 SIF-to-SIF conversion

                | To:
    From:       |     datetime/c   datetime/C   date
    ------------+------------------------------------------
    datetime/c  |                  tC=Cofc(tc)  td=dofc(tc)
    datetime/C  |     tc=cofC(tC)               td=dofC(tC)
    date        |     tc=cofd(td)  tC=Cofd(td)  
    weekly      |                               td=dofw(tw)
    monthly     |                               td=dofm(tm)
    quarterly   |                               td=dofq(tq)
    half-yearly |                               td=dofh(th)
    yearly      |                               td=dofy(ty)
    -------------------------------------------------------
                | To:
    From:       |     weekly       monthly      quarterly 
    ------------+------------------------------------------
    date        |     tw=wofd(td)  tm=mofd(td)  tq=qofd(td)
    -------------------------------------------------------
                | To:
    From:       |     half-yearly  yearly
    ------------+------------------------------------------
    date        |     th=hofd(td)  ty=yofd(td)
    -------------------------------------------------------
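
The table has no direct path between, say, monthly and quarterly dates, so such conversions go through date, e.g. qofd(dofm(tm)). A minimal sketch with illustrative variable names:

```stata
clear
set obs 1
generate double tc_val = tc(20jan2010 09:15:22.120)
generate long   d  = dofc(tc_val)      // datetime/c -> date
generate int    m  = mofd(d)           // date -> monthly
generate int    q  = qofd(d)           // date -> quarterly
generate int    y  = yofd(d)           // date -> yearly
generate long   d1 = dofm(m)           // monthly -> first day of that month
generate int    q2 = qofd(dofm(m))     // monthly -> quarterly, via date
format d d1 %td
format m %tm
format q q2 %tq
list
```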

2.3.6 Other datetime data processing

2.3.7 Datetime exercise

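Extract numeric year, month, and day from a date string stored in year-month-day order (varname is the string variable; the mask must follow the order in the string, e.g. "YMD" for 2010-01-20 or "DMY" for 20jan2010)

generate long timestamp = date(varname, "YMD")   // daily SIF: days since 01jan1960
format timestamp %td                             // display the SIF as a readable date
gen  year=year(timestamp)
gen  month=month(timestamp)
gen  day=day(timestamp)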
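
A self-contained version of the exercise with two made-up dates; the variable names datestr and tdate are hypothetical:

```stata
clear
input str10 datestr
"2010-01-20"
"1999-12-31"
end
generate long tdate = date(datestr, "YMD")   // string -> daily SIF
format tdate %td
generate year  = year(tdate)
generate month = month(tdate)
generate day   = day(tdate)
list
```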

2.4 missing values
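
A numeric missing value is written . (with extended codes .a to .z), and missing values sort above every nonmissing number, so a condition such as rep78 > 4 also selects the missing observations. A minimal sketch with the auto data, where rep78 has 5 missing values:

```stata
sysuse auto, clear
count if missing(rep78)                 // the missing observations
count if rep78 > 4                      // includes the missing values
count if rep78 > 4 & !missing(rep78)    // excludes them
```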

2.5 Compressing data
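
compress demotes each variable to the smallest storage type that still holds all of its values exactly, so no information is lost. A minimal sketch with the auto data, where mpg (range 12 to 41) is stored as int:

```stata
sysuse auto, clear
describe mpg        // storage type int
compress            // demote variables wherever this loses no information
describe mpg        // storage type byte
```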

2.6 Changing storage types

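sysuse auto, clear
list gear_ratio in 1/5
d gear_ratio                     // storage type float
recast int gear_ratio, force     // force accepts the loss of precision: decimals are dropped
d gear_ratio                     // storage type int
list gear_ratio in 1/5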

3. Display formats
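
format sets how a variable is shown without changing the stored value, and display accepts a format directly. A minimal sketch with the auto data; the formats chosen are only examples:

```stata
sysuse auto, clear
format price %9.0fc          // comma as thousands separator
format gear_ratio %4.1f      // one decimal place
list make price gear_ratio in 1/3
display %6.2f 3.14159        // shows 3.14, right-aligned in 6 columns
```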

4. Variables

4.1 Generating variables

*Syntax
generate [type] newvar[:lblname] =exp [if] [in] [, before(varname) | after(varname)]
replace oldvar =exp [if] [in] [, nopromote]

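*Examples
webuse genxmpl3, clear
generate age2 = age^2          // default storage type float
drop age2                      // age2 already exists; drop it before re-creating it
generate int age2 = age^2      // explicit storage type int
webuse genxmpl1, clear
replace age2 = age^2           // overwrite the existing variable age2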
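
Observations excluded by if get missing values from generate ("" for strings), and replace can then fill them in. A minimal sketch with the auto data; weight_class, expensive and the cutoffs are illustrative:

```stata
sysuse auto, clear
generate weight_class = "light" if weight < 3000     // other obs get ""
replace  weight_class = "heavy" if weight >= 3000    // fill the remaining obs
generate byte expensive = price > 10000 if !missing(price)   // 0/1 indicator
tabulate weight_class expensive
```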

4.2 Naming variables
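
Stata variable names have up to 32 characters made of letters, digits and underscores, and cannot begin with a digit; names starting with an underscore are best avoided because Stata uses them internally. rename changes names; a minimal sketch with the auto data (the new names are illustrative):

```stata
sysuse auto, clear
rename price price_usd                          // one variable
rename (mpg weight) (miles_per_gal weight_lb)   // several variables at once
describe, short
```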

4.3 Variable types

    Variable
        Quantitative variable
            Continuous variable
            Discrete variable
        Qualitative variable
            Binary variable
            Nominal variable
            Ordinal variable

5. Labels (label)

5.1 Dataset labels

*Syntax
label data ["label"]
*Examples
sysuse auto, clear
d
label data "1978 automobile price data"
d   // note the data label shown in the header

5.2 Variable labels

*Syntax
label variable varname ["label"]
*Examples
sysuse auto, clear
label var price    "Car price"
label var foreign  "Car origin (0 domestic; 1 foreign)"
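
Dataset and variable labels can be read back into local macros with extended macro functions, which is handy in loops or graph titles. A small sketch; the label texts and macro names are illustrative:

```stata
sysuse auto, clear
label data "1978 automobile data"
label variable price "Car price (USD)"
local dl  : data label              // the dataset label
local lbl : variable label price    // the variable label of price
display "`dl'"
display "`lbl'"
```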

5.3 Value labels

*Syntax
* Define value label
label define lblname # "label" [# "label" ...] [, add modify replace nofix]
* Assign value label to variables
label values varlist lblname [, nofix]
* Remove value labels
label values varlist [.]
* List names of value labels
label dir
* List names and contents of value labels
label list [lblname [lblname ...]]
* Copy value label
label copy lblname lblname [, replace]
* Drop value labels
label drop {lblname [lblname ...] | _all}
* Save value labels in do-file
label save [lblname [lblname...]] using filename [, replace]

*Examples
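sysuse auto, clear
* label define lblname ...        define a value label
* label values varname lblname    attach the label to the values of a variable
label define repair  1 "Good" 2 "Fairly good" 3 "Medium" 4 "Fairly poor"
label values rep78 repair
* Display value labels
label dir
label list
label list repair
* Add and modify value labels
label def repair 5 "Poor", add         // add: only for values not yet labeled
label def repair 3 "Average", modify   // modify: change an existing label
label list repair
* Drop value labels
label drop repair
label list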
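
Once attached, value labels appear in tables, can be looked up one at a time, and can be turned into a string variable with decode. A minimal sketch that defines the labels from scratch (rep78_str and l3 are illustrative names):

```stata
sysuse auto, clear
label define repair 1 "Good" 2 "Fairly good" 3 "Medium" 4 "Fairly poor" 5 "Poor"
label values rep78 repair
tabulate rep78                          // the table shows labels instead of codes
local l3 : label repair 3               // the label attached to value 3
display "`l3'"
decode rep78, generate(rep78_str)       // string variable holding the labels
```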

Commands in this chapter:
browse, list, describe, summarize, codebook, inspect
compress, recast, format, display
gen, replace, rename, label