Inference after ranking#
This is a template for regression analysis after ranking. It provides conditionally quantile-unbiased estimates and “almost” quantile-unbiased hybrid estimates of the parameters.
Click the badge below to use this template on your own data. This will open the notebook in a Jupyter binder.
Instructions:

1. Upload a file named data.csv to this folder with your conventional estimates. Open data.csv to see an example. In this file, we named our dependent variable “dep_variable” and have estimated parameters named “policy0”, …, “policy9”. The first column of data.csv contains the conventional estimates \(m\) of the unknown parameters. The remaining columns contain consistent estimates of the covariance matrix \(\Sigma\). In data.csv, \(m = (0, 1, ..., 9)\) and \(\Sigma = I\).
2. Modify the code if necessary.
3. Run the notebook.
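For reference, a sketch of how one might generate a file with this layout using pandas. The exact header names are an assumption based on the description of the example file; open the bundled data.csv to confirm the conventions before uploading your own.

```python
import numpy as np
import pandas as pd

# Sketch: build a data.csv matching the layout described above.
# First column: conventional estimates m; remaining columns: the
# estimated covariance matrix Sigma. Here m = (0, 1, ..., 9) and
# Sigma = I, as in the bundled example file.
names = [f"policy{i}" for i in range(10)]
m = np.arange(10, dtype=float)
sigma = np.eye(10)

df = pd.DataFrame(
    np.column_stack([m, sigma]),
    index=names,
    # "dep_variable" as the estimates column header is an assumption.
    columns=["dep_variable"] + names,
)
df.to_csv("data.csv")
```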
Citations#
@techreport{andrews2019inference,
  title={Inference on winners},
  author={Andrews, Isaiah and Kitagawa, Toru and McCloskey, Adam},
  year={2019},
  institution={National Bureau of Economic Research}
}
@article{andrews2022inference,
  title={Inference for Losers},
  author={Andrews, Isaiah and Bowen, Dillon and Kitagawa, Toru and McCloskey, Adam},
  journal={AEA Papers and Proceedings},
  volume={112},
  year={2022},
  month={May},
  pages={635-42},
  doi={10.1257/pandp.20221065},
  url={https://www.aeaweb.org/articles?id=10.1257/pandp.20221065}
}
Runtime warnings and long running times#
If you are estimating the effects of many policies, or the policy effects are close together, you may see RuntimeWarning messages and experience long runtimes. These warnings are common, usually benign, and can be safely ignored.
[1]:
import matplotlib.pyplot as plt
import seaborn as sns

from multiple_inference.bayes import Improper
from multiple_inference.rank_condition import RankCondition

data_file = "data.csv"
alpha = 0.05  # significance level for p-values and confidence intervals

# The Improper model reproduces the conventional (unadjusted) estimates;
# RankCondition performs inference conditional on the estimated ranks.
# sort=True sorts the parameters by their conventional estimates.
conventional_model = Improper.from_csv(data_file, sort=True)
ranked_model = RankCondition.from_csv(data_file, sort=True)
sns.set()
We’ll start by summarizing and plotting the conventional estimates.
[2]:
conventional_results = conventional_model.fit(title="Conventional estimates")
conventional_results.summary(alpha=alpha)
[2]:
|         | coef  | pvalue (1-sided) | [0.025 | 0.975] |
|---------|-------|------------------|--------|--------|
| policy9 | 9.000 | 0.000            | 7.040  | 10.960 |
| policy8 | 8.000 | 0.000            | 6.040  | 9.960  |
| policy7 | 7.000 | 0.000            | 5.040  | 8.960  |
| policy6 | 6.000 | 0.000            | 4.040  | 7.960  |
| policy5 | 5.000 | 0.000            | 3.040  | 6.960  |
| policy4 | 4.000 | 0.000            | 2.040  | 5.960  |
| policy3 | 3.000 | 0.001            | 1.040  | 4.960  |
| policy2 | 2.000 | 0.023            | 0.040  | 3.960  |
| policy1 | 1.000 | 0.159            | -0.960 | 2.960  |
| policy0 | 0.000 | 0.500            | -1.960 | 1.960  |

Dep. Variable: y
[3]:
conventional_results.point_plot(alpha=alpha)
plt.show()
One property we want our estimators to have is quantile-unbiasedness. An estimator is quantile-unbiased if the true parameter falls below its \(\alpha\)-quantile estimate with probability \(\alpha\) given its estimated rank. For example, the true effect of the top-performing treatment should fall below its median estimate half the time.
Similarly, we want confidence intervals to have correct conditional coverage. Correct conditional coverage means that the parameter should fall within our \(\alpha\)-level confidence interval with probability \(1-\alpha\) given its estimated rank. For example, the true effect of the top-performing treatment should fall within its 95% confidence interval 95% of the time.
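To see why this matters, a small simulation (standard-normal noise, zero true effects; an illustration, not part of the template) shows that the conventional estimate of the top performer is biased upward given its rank, the familiar winner’s curse:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_policies = 10_000, 10
true_effects = np.zeros(n_policies)  # all policies truly have zero effect

# Conventional estimates: true effect plus standard-normal noise.
estimates = true_effects + rng.standard_normal((n_trials, n_policies))

# In each trial, record the apparent winner's conventional estimate.
winner_estimates = estimates.max(axis=1)

# If the conventional estimate were median-unbiased for the winner, it
# would exceed the true effect (0) in about half of the trials. Instead,
# the maximum of 10 noisy estimates exceeds 0 almost always.
share_above_truth = (winner_estimates > 0).mean()
print(share_above_truth)  # close to 1, not 0.5
```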
Below, we compute the optimal quantile-unbiased estimates and conditionally correct confidence intervals for each parameter given its rank.
[4]:
conditional_results = ranked_model.fit(title="Conditional estimates")
conditional_results.summary(alpha=alpha)
/home/docs/checkouts/readthedocs.org/user_builds/dsbowen-conditional-inference/envs/latest/lib/python3.10/site-packages/multiple_inference-1.2.0-py3.10.egg/multiple_inference/stats.py:468: RuntimeWarning: divide by zero encountered in log
/home/docs/checkouts/readthedocs.org/user_builds/dsbowen-conditional-inference/envs/latest/lib/python3.10/site-packages/multiple_inference-1.2.0-py3.10.egg/multiple_inference/stats.py:476: RuntimeWarning: divide by zero encountered in scalar divide
/home/docs/checkouts/readthedocs.org/user_builds/dsbowen-conditional-inference/envs/latest/lib/python3.10/site-packages/scipy/optimize/_numdiff.py:576: RuntimeWarning: invalid value encountered in subtract
df = fun(x) - f0
[4]:
|         | coef (median) | pvalue (1-sided) | [0.025 | 0.975] |
|---------|---------------|------------------|--------|--------|
| policy9 | 8.686         | 0.000            | 5.067  | 10.933 |
| policy8 | 8.000         | 0.001            | 4.078  | 11.922 |
| policy7 | 7.000         | 0.001            | 3.078  | 10.922 |
| policy6 | 6.000         | 0.003            | 2.078  | 9.922  |
| policy5 | 5.000         | 0.009            | 1.078  | 8.922  |
| policy4 | 4.000         | 0.023            | 0.078  | 7.922  |
| policy3 | 3.000         | 0.058            | -0.922 | 6.922  |
| policy2 | 2.000         | 0.136            | -1.922 | 5.922  |
| policy1 | 1.000         | 0.285            | -2.922 | 4.922  |
| policy0 | 0.314         | 0.406            | -1.933 | 3.933  |

Dep. Variable: y
[5]:
conditional_results.point_plot(alpha=alpha)
plt.show()
Conditional inference is a demanding requirement: conditionally quantile-unbiased estimates can be highly variable, and conditionally correct confidence intervals can be unrealistically long. We can often obtain more reasonable estimates by focusing on unconditional rather than conditional inference.
Imagine we ran our randomized controlled trial 10,000 times and want to estimate the effect of the top-performing treatment. We need conditional inference if we’re interested in the subset of trials where a specific parameter \(k\) was the top performer. However, we can use unconditional inference if we’re only interested in being right “on average” across all 10,000 trials.
Below, we use hybrid estimates to compute approximately quantile-unbiased estimates and unconditionally correct confidence intervals for each parameter.
If you don’t know whether you need conditional or unconditional inference, use unconditional inference.
[6]:
hybrid_results = ranked_model.fit(beta=.005, title="Hybrid estimates")
hybrid_results.summary(alpha=alpha)
[6]:
|         | coef (median) | pvalue (1-sided) | [0.025 | 0.975] |
|---------|---------------|------------------|--------|--------|
| policy9 | 8.687         | 0.005            | 5.680  | 10.977 |
| policy8 | 8.000         | 0.005            | 4.680  | 11.320 |
| policy7 | 7.000         | 0.005            | 3.680  | 10.320 |
| policy6 | 6.000         | 0.005            | 2.680  | 9.320  |
| policy5 | 5.000         | 0.005            | 1.680  | 8.320  |
| policy4 | 4.000         | 0.005            | 0.680  | 7.320  |
| policy3 | 3.000         | 0.055            | -0.320 | 6.320  |
| policy2 | 2.000         | 0.140            | -1.320 | 5.320  |
| policy1 | 1.000         | 0.288            | -2.320 | 4.320  |
| policy0 | 0.313         | 0.409            | -1.977 | 3.320  |

Dep. Variable: y
[7]:
hybrid_results.point_plot(alpha=alpha)
plt.show()