pyfixest
Python package for TWFE, 2-stage DiD (Gardner 2021), and the interaction-weighted estimator (Sun and Abraham 2021).
pyfixest is a Python port of the fixest R package, with slightly different syntax and offerings. pyfixest can implement the standard TWFE estimator for difference-in-differences. Documentation can be found here.
Install the package by entering the following into the terminal:
pip install -U pyfixest
Sample code
Start by loading packages and the data:
import pyfixest as pf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('df.csv')
We can use the pf.feols() function to run the TWFE estimation for the ATT:
mod = pf.feols(
    fml = "outcome ~ treat + covar | id + time",
    data = df,
    vcov = {"CRV1": "id"}, # change "id", do not touch "CRV1"
)
mod.summary()
#> ###
#>
#> Estimation: OLS
#> Dep. var.: outcome, Fixed effects: id+time
#> Inference: CRV1
#> Observations: 950
#>
#> | Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
#> |:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
#> | treat | -3.683 | 0.361 | -10.200 | 0.000 | -4.400 | -2.966 |
#> | covar | 1.018 | 0.032 | 31.414 | 0.000 | 0.954 | 1.083 |
#> ---
#> RMSE: 1.609 R2: 0.725 R2 Within: 0.623
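To make the `| id + time` part of the formula more concrete, here is a minimal sketch of the within transformation behind a TWFE estimate. It is not pyfixest's implementation. It assumes a balanced panel, a single treatment regressor, and noise-free made-up data (the panel dimensions, fixed effects, and `true_att` are all hypothetical), in which case demeaning by unit and by period and then running OLS recovers the treatment coefficient.

```python
import numpy as np

# Hypothetical balanced panel: 4 units x 5 periods.
# Units 2 and 3 are treated from period 3 onward.
n_units, n_periods, true_att = 4, 5, -3.0
unit_fe = np.array([1.0, 2.0, 3.0, 4.0])[:, None]       # unit fixed effects
time_fe = np.array([0.0, 0.5, 1.0, 1.5, 2.0])[None, :]  # period fixed effects

treat = np.zeros((n_units, n_periods))
treat[2:, 3:] = 1.0

# Noise-free outcome so the recovered coefficient is easy to check
outcome = unit_fe + time_fe + true_att * treat


def double_demean(a):
    """Subtract unit means and period means, then add back the grand mean."""
    return (a
            - a.mean(axis=1, keepdims=True)   # unit (row) means
            - a.mean(axis=0, keepdims=True)   # period (column) means
            + a.mean())


y_tilde = double_demean(outcome)
d_tilde = double_demean(treat)

# OLS slope of demeaned outcome on demeaned treatment
att_hat = (d_tilde * y_tilde).sum() / (d_tilde ** 2).sum()
print(att_hat)
```

With unbalanced panels or additional covariates the simple double demeaning no longer applies directly, which is one reason to rely on pf.feols() rather than a hand-rolled version.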
Dynamic treatment effects for a TWFE event study can be calculated as follows:
mod = pf.feols(
    fml = "outcome ~ i(rel_time, group, ref = -1) + covar | id + time",
    data = df,
    vcov = {"CRV1": "id"}, # change "id", do not touch "CRV1"
)
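The event-study formula relies on `rel_time` and `group` already being present in the data. If they are not, one possible way to build them is sketched below. The `first_treat` column (period of first treatment, missing for never-treated units) is an assumption for illustration and is not part of the original data, and `group` is assumed here to be a 0/1 ever-treated indicator.

```python
import pandas as pd

# Hypothetical long panel; `first_treat` is an assumed helper column.
nan = float("nan")
df = pd.DataFrame({
    "id":          [1, 1, 1, 1, 2, 2, 2, 2],
    "time":        [1, 2, 3, 4, 1, 2, 3, 4],
    "first_treat": [3, 3, 3, 3, nan, nan, nan, nan],  # unit 2 is never treated
})

# 1 for units that are ever treated, 0 otherwise
df["group"] = df["first_treat"].notna().astype(int)

# Periods relative to first treatment. Never-treated rows get the reference
# value -1 as a placeholder; since group = 0 for them, their interaction
# terms are zero regardless of the value chosen.
df["rel_time"] = (df["time"] - df["first_treat"]).fillna(-1).astype(int)

print(df)
```

Other conventions for never-treated units exist, so check which one matches your data before estimating.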
The iplot option within pyfixest is flawed, so we will plot the estimates manually with matplotlib.
# save the results into tidy
res = mod.tidy()

# drop covariates from result dataframe (if required)
res = res.drop('covar')

# select needed columns
res = res[['Estimate', '2.5%', '97.5%']]
# no t=-1 estimate, so we will need to add it
new = pd.DataFrame({
    'Estimate': [0.00],
    '2.5%': [0.00],
    '97.5%': [0.00]
})
pre = res.iloc[:8] # split dataframe so only pre-treat
post = res.iloc[8:] # split dataframe so only post-treat
plot_df = pd.concat([pre, new, post], ignore_index = True) # stick t=-1 between pre and post
# rel_time variable
rel_time = np.arange(-9, 9, dtype=np.int64)
plot_df['rel_time'] = rel_time # add to dataframe
# plotting time
x = plot_df['rel_time']
y = plot_df['Estimate']
yerr = [y - plot_df['2.5%'], plot_df['97.5%'] - y] # err distances

fig, ax = plt.subplots()
ax.scatter(x, y, color = "black", s = 15) # s is for size
ax.errorbar(x, y, yerr = yerr, fmt = 'none', color = 'black', ecolor = 'black', capsize = 0)
ax.axvline(x = -0.5, color = "gray")
ax.axhline(y = 0, color = "gray")
ax.grid(True, linewidth = 0.3, alpha = 0.5, color = "gray")
ax.set_title('Event Study Estimates')
ax.set_xlabel('Time to initial treatment')
ax.set_ylabel('Estimate')
plt.show()
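The split at row 8 above is tied to this particular event window (8 pre-treatment coefficients). If your window differs, a small helper along the following lines may be less error-prone. This is only a sketch: `add_reference_period` is a hypothetical function name, the numbers are made up, and the caller is assumed to pass `rel_times` in the same order as the coefficient rows.

```python
import pandas as pd


def add_reference_period(res, rel_times, ref=-1):
    """Attach event times to the estimates and add a zero row for the omitted period."""
    out = res[["Estimate", "2.5%", "97.5%"]].copy()
    out["rel_time"] = list(rel_times)  # must match the row order of `res`
    ref_row = pd.DataFrame({
        "Estimate": [0.0],
        "2.5%": [0.0],
        "97.5%": [0.0],
        "rel_time": [ref],
    })
    # Sorting by event time puts the reference row in the right position
    return (pd.concat([out, ref_row], ignore_index=True)
              .sort_values("rel_time")
              .reset_index(drop=True))


# Made-up estimates purely to exercise the helper
res = pd.DataFrame(
    {
        "Estimate": [0.1, -0.2, -3.5, -3.9],
        "2.5%": [-0.5, -0.8, -4.2, -4.6],
        "97.5%": [0.7, 0.4, -2.8, -3.2],
    },
    index=["t-3", "t-2", "t0", "t1"],  # placeholder coefficient labels
)

plot_df = add_reference_period(res, rel_times=[-3, -2, 0, 1])
print(plot_df)
```

The resulting `plot_df` has the same columns as the one built by hand, so the plotting code can be reused unchanged.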