Information

In the sample code that follows, the data frame `df` I use contains the variables below. Not all variables are required for each estimator.

| Variable | Description |
|---|---|
| `id` | The unit/individual that an observation belongs to. For repeated cross-sections, `id` should instead be the group/level at which treatment is assigned. For example, if treatment is assigned by county or state, use that as the `id` variable. |
| `time` | The time period of each observation. |
| `outcome` | The outcome variable for each observation. |
| `treat` | The treatment indicator for each observation. It should equal 1 if the unit is treated in that period and 0 if it is untreated. |
| `rel.time` | Relative time: for an observation of unit \(i\) in period \(t\), the number of periods between \(t\) and the period in which unit \(i\) first received treatment. For example, if unit \(i\) is first treated in 2005 and the observation \(it\) is from 2003, the relative time is -2. This means `rel.time = 0` in the first year of treatment for unit \(i\). A common question concerns never-treated units. Generally, we do not set `rel.time` to `NA` for units that never get treated, since this can cause issues with estimation. Instead, we generally either 1) set it to a large or small number not in the range of the `time` variable, or 2) set it to 0 (as in csdid). |
| `group` | Indicates whether a unit belongs to the treatment group or the never-treated (control) group. Units that never receive treatment get the value 0, and units that receive treatment at some point within the study get the value 1. This is different from `treat`. For never-treated units, `treat = group` always. For units that receive treatment at some point, `group = 1` in all time periods (including periods before treatment), whereas `treat = 1` only after receiving treatment and `treat = 0` before. |
| `covar` | (Optional) Covariate(s) to condition on for parallel trends. |
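
To make the relationship between `treat`, `group`, and `rel.time` concrete, here is a minimal sketch in base R that derives all three from a first-treatment year. The toy panel, the helper column `first.treat`, and the placeholder value -1000 are assumptions made purely for illustration; they are not part of the data set described above.

```r
# Toy panel: 3 units observed 2003-2006 (values are made up for illustration).
df <- expand.grid(id = 1:3, time = 2003:2006)

# Hypothetical helper column: first treatment year per unit (NA = never treated).
first.treat.by.id <- c(`1` = 2005, `2` = 2004, `3` = NA)
df$first.treat <- unname(first.treat.by.id[as.character(df$id)])

never <- is.na(df$first.treat)

# group: 1 in every period for ever-treated units, 0 for never-treated units.
df$group <- ifelse(never, 0, 1)

# treat: 1 only from the first treatment period onward, 0 otherwise.
df$treat <- ifelse(!never & df$time >= df$first.treat, 1, 0)

# rel.time: periods relative to first treatment. Never-treated units get a
# placeholder far outside the observed range instead of NA (option 1 above);
# -1000 is an arbitrary choice. Option 2 would use 0 here instead.
df$rel.time <- ifelse(never, -1000, df$time - df$first.treat)

df[order(df$id, df$time), ]
```

In this sketch, unit 1 is first treated in 2005, so its 2003 observation has `rel.time = -2`, `group = 1`, and `treat = 0`, while unit 3 is never treated and has `treat = group = 0` in every period.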