Information

The sample code that follows uses a dataframe df containing the variables listed below.

Not all variables are required for every estimator.

Variable Description
id A variable indicating the unit/individual that an observation belongs to in our data.*

* For repeated cross-sections, the id variable should instead be the group/level at which treatment is assigned. For example, if treatment is assigned by county or state, use that as the id variable.

time A variable indicating time-periods in our study for each observation.
outcome Outcome variable for each observation.
treat Treatment variable for each observation. Should equal 1 for observations of a unit that has received treatment and 0 otherwise.
rel.time A relative time variable indicating, for an observation in period \(t\), how many time periods away it is from the period in which unit \(i\) first received treatment. For example, if unit \(i\) is first treated in 2005 and the observation \(it\) is from 2003, the relative time is -2.** This means rel.time = 0 in the initial year of treatment for unit \(i\).

** A common question concerns never-treated units. Generally, we do not set rel.time to NA for observations of units that never get treated, because this can cause issues with estimation. Instead, we generally either 1) set it to a large or small number outside the range of the time variable, or 2) set it to 0 (as in csdid).

group Variable specifying whether a unit is part of the treatment group or the never-treated (control) group. Units that never receive treatment get value 0, and units that receive treatment at some point within the study get value 1.

This is different from treat. For never-treated units, treat = group always (both equal 0). For units that receive treatment at some point, group = 1 in all time periods (including periods before receiving treatment), whereas treat = 1 only after receiving treatment and treat = 0 before receiving treatment.

covar (Optional) Covariate(s) to condition on for parallel trends.
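
To make the relationship between treat, group, and rel.time concrete, here is a minimal sketch in plain Python (not the estimation code itself). It assumes a hypothetical mapping first_treat from each unit to its first treated period, with None for never-treated units. The unit names, years, and the constant NEVER_TREATED_REL_TIME are illustrative assumptions. The sketch also assumes treat switches to 1 in the first treated period (where rel.time = 0), and it uses option 2 from the note on never-treated units (rel.time set to 0).

```python
# Hypothetical input: each unit's first treated period (None = never treated).
first_treat = {"A": 2005, "B": None}
periods = range(2003, 2008)  # illustrative study window: 2003 to 2007

# Option 2 from the note on never-treated units: use 0 instead of a missing value.
# Option 1 would instead use a number outside the range of the time variable.
NEVER_TREATED_REL_TIME = 0

rows = []
for unit, start in first_treat.items():
    ever_treated = start is not None
    for t in periods:
        rows.append({
            "id": unit,
            "time": t,
            # group is fixed per unit: 1 if the unit is ever treated, else 0.
            "group": 1 if ever_treated else 0,
            # treat varies over time (assumed to be 1 from the first treated period onward).
            "treat": 1 if ever_treated and t >= start else 0,
            # rel.time counts periods relative to the first treated period.
            "rel.time": t - start if ever_treated else NEVER_TREATED_REL_TIME,
        })

for row in rows:
    print(row)
```

For unit A, the 2003 observation has rel.time = -2, treat = 0, and group = 1, matching the example in the table. For unit B, every observation has treat = group = 0.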