Functions

Multilevel mixed-effects models
Whether the groupings in your data arise in a nested fashion (students nested in classrooms and classrooms nested in schools) or in a nonnested fashion (elementary school crossed with middle school), you can fit a multilevel model to account for the lack of independence within these groups. Fit models for continuous, binary, count, ordinal, and survival outcomes. Estimate variances of random intercepts and random coefficients. Compute intraclass correlations. Predict random effects. Estimate relationships that are population averaged over the random effects.
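As a rough illustration, a nested model with random intercepts might be fit as follows. This is a minimal sketch that assumes internet access and uses the productivity example dataset from the Stata manuals (states nested in regions) in place of education data. The dataset, the variable names, and the re prefix are stand-ins for illustration only.

```stata
* Example dataset from the Stata manuals (downloaded by webuse)
webuse productivity, clear

* Three-level model: random intercepts for region and for state within region
mixed gsp private emp hwy water other unemp || region: || state:

* Intraclass correlations at each level of nesting
estat icc

* Predict the random effects (one new variable per random intercept)
predict re*, reffects
```

With school data, the same pattern would place the higher-level grouping variable (for example, school) first and the lower-level one (classroom) second.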
Structural equation modeling (SEM)
Estimate mediation effects, analyze the relationship between an unobserved latent concept such as verbal abilities and the observed variables that measure verbal abilities, or fit a model with complex relationships among both latent and observed variables. Fit models with continuous, binary, count, and ordinal outcomes. Even fit hierarchical models with groups of correlated observations such as children within the same schools. Evaluate model fit. Compute indirect and total effects. Fit models by drawing a path diagram or using the straightforward command syntax.
General linear models
Fit one- and two-way models. Or fit models with three, four, or even more factors. Analyze data with nested factors, with fixed and random factors, or with repeated measures. Use ANCOVA models when you have continuous covariates and MANOVA models when you have multiple outcome variables. Further explore the relationships between your outcome and predictors by estimating effect sizes and computing least-squares and marginal means. Perform contrasts and pairwise comparisons. Analyze and plot interactions.
IRT (item response theory)
Explore the relationship between unobserved latent characteristics such as mathematical aptitude and the probability of correctly answering test questions (items). Or explore the relationship between teacher job satisfaction and self-reported responses to questions related to job satisfaction. IRT can be used to create measures of such unobserved traits or place individuals on a scale measuring the trait. It can also be used to select the best items for measuring a latent trait. IRT models are available for binary, graded, rated, partial-credit, and nominal response items. Visualize the relationships using item characteristic curves, and measure overall test performance using test information functions.
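For binary items, the workflow might look like the following sketch. It assumes internet access and uses the masc1 example dataset from the Stata manuals (nine binary items, q1 through q9, from a mathematics and science test). The dataset and the choice of a two-parameter logistic model are illustrative assumptions.

```stata
* Example dataset from the Stata manuals (downloaded by webuse)
webuse masc1, clear

* Two-parameter logistic model: a discrimination and a difficulty per item
irt 2pl q1-q9

* Item characteristic curves for the fitted items
irtgraph icc

* Test information function for the whole instrument
irtgraph tif
```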
Linear, binary, and count regressions
Fit classical linear regression models of the relationship between a continuous outcome, such as a reading test score, and the determinants of the score, such as teaching method and the student's reading level in the previous grade. If your response is binary (for example, pass or fail test), ordinal (education level), count (number of students), or categorical (private, public, or home school), don't worry. Stata has maximum likelihood estimators—logistic, ordered logistic, Poisson, multinomial logit, and many others—that estimate the relationship between such outcomes and their determinants. A vast array of tools is available after fitting such models. Predict outcomes and their confidence intervals. Test equality of parameters. Compute linear and nonlinear combinations of parameters.
Meta-analysis
Combine results of multiple studies to estimate an overall effect. Use forest plots to visualize results. Use subgroup analysis and meta-regression to explore study heterogeneity. Use funnel plots and formal tests to explore publication bias and small-study effects. Use trim-and-fill analysis to assess the impact of publication bias on results. Perform cumulative meta-analysis. Use the meta suite, or let the Control Panel interface guide you through your entire meta-analysis.
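A typical sequence with the meta suite might look like the sketch below. It assumes Stata 16 or later and internet access, and it uses pupiliqset, an example dataset from the Stata manuals in which the effect sizes have already been declared. With your own data, you would first declare them with meta set or meta esize.

```stata
* Example dataset with effect sizes already declared (downloaded by webuse)
webuse pupiliqset, clear

* Overall effect size and heterogeneity statistics
meta summarize

* Forest plot of the study-level and overall estimates
meta forestplot

* Funnel plot and Egger regression-based test for small-study effects
meta funnelplot
meta bias, egger

* Trim-and-fill analysis of the impact of publication bias
meta trimfill
```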
Multiple imputation
Account for missing data in your sample using multiple imputation. Choose from univariate and multivariate methods to impute missing values in continuous, censored, truncated, binary, ordinal, categorical, and count variables. Then, in a single step, estimate parameters using the imputed datasets, and combine results. Fit a linear model, logit model, Poisson model, hierarchical model, survival model, or one of the many other supported models. Use the mi command, or let the Control Panel interface guide you through your entire MI analysis.
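The impute-then-estimate workflow might be sketched as follows. It assumes internet access and uses the mheart5 example dataset from the Stata manuals, in which age and bmi contain missing values. The dataset, the number of imputations, and the seed are illustrative choices, not recommendations.

```stata
* Example dataset with missing values in age and bmi (downloaded by webuse)
webuse mheart5, clear

* Declare the data as mi data and register the variables to impute
mi set mlong
mi register imputed age bmi

* Impute both variables jointly from a multivariate normal model
mi impute mvn age bmi = attack smokes hsgrad female, add(10) rseed(2232)

* Fit the model on each imputed dataset and combine the results
mi estimate: logit attack smokes age bmi hsgrad female
```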
Contrasts, marginal means, and profile plots
Quickly and easily obtain contrasts for categorical variables and their interactions. Typing r.edlevel will give you all the contrasts of education level with a reference category. Typing a.edlevel will give you each paired contrast with the next higher education level. There are many more named contrasts, and you can specify your own. If you don't like typing, use a dialog box to select your contrasts. Marginal means are just a simple command or mouse click away after almost any estimation command. Evaluating interaction effects, the effects of moderating variables, is just as easy. And this is not just for linear models, but also for models with binary, ordinal, and count outcomes, and even for hierarchical models, with correct handling of random effects. A simple command or a few mouse clicks will get you a profile plot of any of these results.
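The contrast operators can be tried on any categorical predictor. The sketch below uses the auto dataset that ships with Stata, with rep78 (a five-level repair record) standing in for an education-level variable. The dataset and model are assumptions chosen only so that the commands run as written.

```stata
* Dataset shipped with Stata; rep78 stands in for a categorical predictor
sysuse auto, clear
regress price i.rep78

* Each level compared with the reference (base) level
contrast r.rep78

* Each level compared with the next (adjacent) level
contrast a.rep78

* Marginal means for each level, followed by a profile plot
margins rep78
marginsplot
```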
Power, precision, and sample size
Before you conduct your experiment, determine the sample size needed to detect meaningful effects without wasting resources. Do you intend to compute CIs for means or variances, or perform tests for proportions or correlations? Do you plan to fit a Cox proportional-hazards model or compare survivor functions using a log-rank test? Do you want to use a Cochran-Mantel-Haenszel test of association or a Cochran-Armitage trend test? Use Stata's power command to compute power and sample size, create customized tables, and automatically graph the relationships between power, sample size, and effect size for your planned study. Or, use the ciwidth command to do the same but for CIs instead of hypothesis tests by computing the required sample size for the desired CI precision. Instead of commands, use the interactive Control Panel to perform your analysis.
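For instance, a two-sample comparison of means could be planned as in the sketch below. The effect size (half a standard deviation), the target power, and the range of sample sizes are hypothetical values chosen for illustration.

```stata
* Sample size needed to detect a difference of 0.5 (SD = 1) with 80% power
power twomeans 0 0.5, sd(1) power(0.8)

* Power across a range of total sample sizes, shown as a graph
power twomeans 0 0.5, sd(1) n(40(20)200) graph
```

No dataset needs to be loaded for these commands, because all inputs are supplied as arguments and options.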
Choice models
Model your discrete choice data. If your outcome is, for instance, high-school graduates' choices to attend college, attend a trade school, or work, you can fit a conditional logit, multinomial probit, or mixed logit model. Is your outcome instead a ranking of preferred alternatives? Fit a rank-ordered probit or rank-ordered logit model. Regardless of the model fit, you can use margins to easily interpret the results. Estimate how much distance to the nearest college affects the probability of enrolling in college and even the probability of going to a trade school.
Causal inference
Estimate experimental-style causal effects from observational data. With Stata's treatment-effects estimators, we can use a potential-outcomes (counterfactuals) framework to estimate, for instance, the effect of family structure on child development or the effect of unemployment on anxiety. Fit models for continuous, binary, count, fractional, and survival outcomes with binary or multivalued treatments using inverse-probability weighting (IPW), propensity-score matching, nearest-neighbor matching, regression adjustment, or doubly robust estimators. If the assignment to a treatment is not independent of the outcome, you can use an endogenous treatment-effects estimator.
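Several of these estimators share a common syntax, which might be sketched as follows. The example assumes internet access and uses the cattaneo2 dataset from the Stata manuals (infant birthweight and maternal smoking) rather than education data. The dataset and the covariates are illustrative assumptions.

```stata
* Example dataset from the Stata manuals (downloaded by webuse)
webuse cattaneo2, clear

* Regression adjustment: outcome model, then the treatment variable
teffects ra (bweight mage mmarried fbaby) (mbsmoke)

* Inverse-probability weighting: outcome, then the treatment model
teffects ipw (bweight) (mbsmoke mage mmarried fbaby)

* Doubly robust: both an outcome model and a treatment model
teffects ipwra (bweight mage mmarried fbaby) (mbsmoke mage mmarried fbaby)
```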
Bayesian analysis
Fit Bayesian regression models using one of the Markov chain Monte Carlo (MCMC) methods. You can choose from a variety of supported models or even program your own. Extensive tools are available to check convergence, including multiple chains. Compute posterior mean estimates and credible intervals for model parameters and functions of model parameters. You can perform both interval and model-based hypothesis testing. Compare models using Bayes factors. Compute model fit using posterior predictive values. Generate predictions.
Multivariate methods
Use multivariate analyses to evaluate relationships among variables from many different perspectives. Perform multivariate tests of means, or fit multivariate regression and MANOVA models. Explore relationships between two sets of variables, such as aptitude measurements and achievement measurements, using canonical correlation. Examine the number and structure of latent concepts underlying a set of variables using exploratory factor analysis. Or use principal component analysis to find underlying structure or to reduce the number of variables used in a subsequent analysis. Discover groupings of observations in your data using cluster analysis. If you have known groups in your data, describe differences between them using discriminant analysis.
Automated reporting and dynamic document generation
Stata is designed for reproducible research, including the ability to create dynamic documents incorporating your analysis results. Create Word or PDF files, populate Excel worksheets with results and format them to your liking, and mix Markdown, HTML, Stata results, and Stata graphs, all from within Stata.
