Stata - Create 3-way percentages table
I would like to have a 3-way table displaying column or row percentages using three categorical variables. The command below gives the counts, but I cannot find how to get percentages instead.

    sysuse nlsw88
    table married race collgrad, col

    --------------------------------------------------------------------
              |                college graduate and race
              | ---- not college grad ----    ------ college grad ------
      married | white  black  other  Total    white  black  other  Total
    ----------+---------------------------------------------------------
       single |   355    256      5    616      132     53      3    188
      married |   862    224     12  1,098      288     50      6    344
    --------------------------------------------------------------------

How can I get the percentages?
This answer shows a miscellany of tricks. The downside is that I don't know an easy way to do exactly what you ask. The upside is that these tricks are all easy to understand and often useful.

Let's use your example, which is excellent for the purpose.
    . sysuse nlsw88, clear
    (NLSW, 1988 extract)

Tip #1: You can calculate a percent variable yourself. I focus on % single. In this data set married is binary, so I won't show the complementary percent. Once you have calculated it, you can (a) rely on the fact that it is constant within the groups used to define it and (b) tabulate it directly. I find that tabdisp is underrated by users. It's billed as a programmer's command, but it is not difficult to use at all. tabdisp lets you set a display format on the fly; it does no harm, and might be useful for other commands, to assign one directly using format.
    . egen pcsingle = mean(100 * (1 - married)), by(collgrad race)
    . tabdisp collgrad race, c(pcsingle) format(%2.1f)

    --------------------------------------
                     |        race
    college graduate | white  black  other
    -----------------+--------------------
    not college grad |  29.2   53.3   29.4
        college grad |  31.4   51.5   33.3
    --------------------------------------

    . format pcsingle %2.1f

Tip #2: A user-written command, groups, offers different flexibility. groups can be installed from SSC (strictly, it must be installed before you can use it). It's a wrapper for various kinds of tables, using list as its display engine.
    . * do the installation once
    . ssc inst groups
    . groups collgrad race pcsingle

      +-------------------------------------------------------+
      |         collgrad    race   pcsingle   Freq.   Percent |
      |-------------------------------------------------------|
      | not college grad   white       29.2    1217     54.19 |
      | not college grad   black       53.3     480     21.37 |
      | not college grad   other       29.4      17      0.76 |
      |     college grad   white       31.4     420     18.70 |
      |     college grad   black       51.5     103      4.59 |
      |-------------------------------------------------------|
      |     college grad   other       33.3       9      0.40 |
      +-------------------------------------------------------+

We can improve on that. We can set better header text using characteristics. (In practice, these can be less constrained than variable names but often need to be shorter than variable labels.) We can use separators by calling up standard list options.
    . char pcsingle[varname] "% single"
    . char collgrad[varname] "college?"
    . groups collgrad race pcsingle , subvarname sepby(collgrad)

      +-------------------------------------------------------+
      |         college?    race   % single   Freq.   Percent |
      |-------------------------------------------------------|
      | not college grad   white       29.2    1217     54.19 |
      | not college grad   black       53.3     480     21.37 |
      | not college grad   other       29.4      17      0.76 |
      |-------------------------------------------------------|
      |     college grad   white       31.4     420     18.70 |
      |     college grad   black       51.5     103      4.59 |
      |     college grad   other       33.3       9      0.40 |
      +-------------------------------------------------------+

Tip #3: Wire display formats into a variable by making a string equivalent. I don't illustrate this fully, but I often use it when I want to combine display of counts with numerical results that have decimal places in tabdisp. format(%2.1f) and format(%3.2f) might be fine for most variables (and incidentally the important detail is the number of decimal places), but they would lead to display of a count of 42 as 42.0 or 42.00, which would look pretty silly. The format() option of tabdisp does not reach inside a string and change its contents; it doesn't even know what the string variable contains or where it came from. So, strings are shown by tabdisp as they come, which is what you want.
    . gen s_pcsingle = string(pcsingle, "%2.1f")
    . char s_pcsingle[varname] "% single"

groups also has an option to save what is tabulated as a fresh dataset.
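To round out Tip #3, here is a minimal sketch of counts and percents shown side by side in tabdisp through string copies. The variables n and s_n are illustrative names that do not appear in the original answer, and the sketch assumes a fresh copy of nlsw88 in which married has no missing values.

```stata
* A sketch only: n and s_n are illustrative names, not from the original answer
sysuse nlsw88, clear

* percent single and cell count, both constant within collgrad x race cells
egen pcsingle = mean(100 * (1 - married)), by(collgrad race)
egen n = count(married), by(collgrad race)

* string copies carry their own formatting, so no format() option is needed
gen s_pcsingle = string(pcsingle, "%2.1f")
gen s_n = string(n)

* each cell shows the count and then the percent, one above the other
tabdisp collgrad race, c(s_n s_pcsingle)
```

Because both cell variables are strings, the count appears without decimal places while the percent keeps one.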
Tip #4: To have a total category, temporarily double up the data. The clone of the original is relabelled as a total category. You may need to do some extra calculations, but nothing there amounts to rocket science: a smart high school student could figure it out. Here a concrete example for line-by-line study beats lengthy explanations.
    . preserve
    . local np1 = _N + 1
    . expand 2
    (2,246 observations created)
    . replace race = 4 in `np1'/l
    (2,246 real changes made)
    . label def racelbl 4 "total", modify
    . drop pcsingle
    . egen pcsingle = mean(100 * (1 - married)), by(collgrad race)
    . char pcsingle[varname] "% single"
    . format pcsingle %2.1f
    . gen istotal = race == 4
    . bysort collgrad istotal: gen total = _N
    . * for percents of the global total, we need to correct for doubling up
    . scalar alltotal = _N/2
    . * the table shows percents for college & race | collgrad and for collgrad | total
    . bysort collgrad race : gen pc = 100 * cond(istotal, total/alltotal, _N/total)
    . format pc %2.1f
    . char pc[varname] "percent"
    . groups collgrad race pcsingle pc , show(f) subvarname sepby(collgrad istotal)

      +-------------------------------------------------------+
      |         college?    race   % single   percent   Freq. |
      |-------------------------------------------------------|
      | not college grad   white       29.2      71.0    1217 |
      | not college grad   black       53.3      28.0     480 |
      | not college grad   other       29.4       1.0      17 |
      |-------------------------------------------------------|
      | not college grad   total       35.9      76.3    1714 |
      |-------------------------------------------------------|
      |     college grad   white       31.4      78.9     420 |
      |     college grad   black       51.5      19.4     103 |
      |     college grad   other       33.3       1.7       9 |
      |-------------------------------------------------------|
      |     college grad   total       35.3      23.7     532 |
      +-------------------------------------------------------+

Note that _N (the number of observations, or the group size under by:) is needed throughout, not _n (the current observation number). Note also the trick of using a variable not shown explicitly (istotal) to add separator lines.
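Coming back to the layout in the question, the same do-it-yourself idea can be pushed to a full three-way table. This is a minimal sketch, not the only route: it assumes you are happy to replace the data in memory with one observation per cell (hence the preserve and restore), and the names n, coltotal and pc are illustrative.

```stata
* A sketch only: n, coltotal and pc are illustrative names
sysuse nlsw88, clear
preserve

* reduce to one observation per married x race x collgrad cell, with its count
contract married race collgrad, freq(n)

* column percents: each cell as a percent of its race x collgrad column total
egen coltotal = total(n), by(race collgrad)
gen pc = 100 * n / coltotal

* rows are married, columns are race, supercolumns are collgrad
tabdisp married race collgrad, c(pc) format(%4.1f)

restore
```

For row percentages, change the by() variables so that the totals are taken over the groups that define each row. As a check against the counts in the question, the single, white, not college grad cell should come out as 100 * 355/1217, about 29.2, matching the tabdisp table under Tip #1.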