Inference after ranking#

This is a template for regression analysis after ranking. It estimates parameters using conditionally quantile-unbiased estimates and “almost” quantile-unbiased hybrid estimates.

Click the badge below to use this template with your own data. It will open the notebook in a Jupyter Binder.

Binder

Instructions:

  1. Upload a file named data.csv to this folder containing your conventional estimates. Open data.csv to see an example. In that file, the dependent variable is named “dep_variable” and the estimated parameters are named “policy0”, …, “policy9”. The first column of data.csv contains the conventional estimates \(m\) of the unknown parameters; the remaining columns contain a consistent estimate of the covariance matrix \(\Sigma\). In data.csv, \(m=(0, 1, \ldots, 9)\) and \(\Sigma = I\).

  2. Modify the code if necessary.

  3. Run the notebook.
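As a sketch of the expected layout, the snippet below generates a compatible file for a hypothetical three-policy example (the names and values here are illustrative, not part of the template; I assume the dependent-variable name heads the estimates column, as in the data.csv description above):

```python
import numpy as np
import pandas as pd

# Hypothetical three-policy example mirroring the data.csv layout:
# first column holds the conventional estimates m, remaining columns hold Sigma.
names = [f"policy{i}" for i in range(3)]
df = pd.DataFrame(np.eye(3), columns=names, index=names)  # Sigma = I
df.insert(0, "dep_variable", [0.0, 1.0, 2.0])             # m = (0, 1, 2)
df.to_csv("data.csv")
```

Replace the illustrative values with your own estimates and covariance matrix before running the notebook.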

Citations#

@techreport{andrews2019inference,
  title={Inference on winners},
  author={Andrews, Isaiah and Kitagawa, Toru and McCloskey, Adam},
  year={2019},
  institution={National Bureau of Economic Research}
}

@article{andrews2022inference,
  author={Andrews, Isaiah and Bowen, Dillon and Kitagawa, Toru and McCloskey, Adam},
  title={Inference for Losers},
  journal={AEA Papers and Proceedings},
  volume={112},
  year={2022},
  month={May},
  pages={635--42},
  doi={10.1257/pandp.20221065},
  url={https://www.aeaweb.org/articles?id=10.1257/pandp.20221065}
}

Runtime warnings and long running times#

If you are estimating the effects of many policies, or the policy effects are close together, you may see RuntimeWarning messages and experience long runtimes. These warnings are common, usually benign, and can typically be ignored.
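If the warnings clutter your output, one option is to silence them with Python's standard warnings module. This is a convenience for readability, not part of the template itself:

```python
import warnings

# Suppress RuntimeWarning messages (e.g., divide-by-zero in intermediate
# numerical routines) while leaving other warning categories visible.
warnings.filterwarnings("ignore", category=RuntimeWarning)
```

Place this near the top of the notebook, before fitting any models.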

[1]:
import matplotlib.pyplot as plt
import seaborn as sns

from multiple_inference.bayes import Improper
from multiple_inference.rank_condition import RankCondition

data_file = "data.csv"
alpha = .05

conventional_model = Improper.from_csv(data_file, sort=True)
ranked_model = RankCondition.from_csv(data_file, sort=True)
sns.set()

We’ll start by summarizing and plotting the conventional estimates.

[2]:
conventional_results = conventional_model.fit(title="Conventional estimates")
conventional_results.summary(alpha=alpha)
[2]:
Conventional estimates
coef pvalue (1-sided) [0.025 0.975]
policy9 9.000 0.000 7.040 10.960
policy8 8.000 0.000 6.040 9.960
policy7 7.000 0.000 5.040 8.960
policy6 6.000 0.000 4.040 7.960
policy5 5.000 0.000 3.040 6.960
policy4 4.000 0.000 2.040 5.960
policy3 3.000 0.001 1.040 4.960
policy2 2.000 0.023 0.040 3.960
policy1 1.000 0.159 -0.960 2.960
policy0 0.000 0.500 -1.960 1.960
Dep. Variable y
[3]:
conventional_results.point_plot(alpha=alpha)
plt.show()
../_images/examples_rank_conditions_4_0.png

One property we want our estimators to have is quantile-unbiasedness. An estimator is quantile-unbiased if the true parameter falls below its \(\alpha\)-quantile estimate with probability \(\alpha\) given its estimated rank. For example, the true effect of the top-performing treatment should fall below its median estimate half the time.

Similarly, we want confidence intervals to have correct conditional coverage. Correct conditional coverage means that the parameter should fall within our \(\alpha\)-level confidence interval with probability \(1-\alpha\) given its estimated rank. For example, the true effect of the top-performing treatment should fall within its 95% confidence interval 95% of the time.
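To see why conditioning on rank matters, a quick simulation (independent of the multiple_inference library, and assuming \(m \sim N(\theta, I)\) with all true effects equal) shows that the conventional estimate of the top-ranked policy systematically overshoots its true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(10)       # all 10 true policy effects equal 0
n_trials = 10_000

# Draw conventional estimates m ~ N(theta, I) in each simulated trial
# and record the winner's (maximum) conventional estimate.
m = rng.standard_normal((n_trials, len(theta))) + theta
winners = m.max(axis=1)

# A quantile-unbiased median estimate would exceed the true effect (0)
# half the time; the winner's conventional estimate exceeds it almost always.
print(np.mean(winners > 0))
```

The printed fraction is close to 1 rather than 0.5, illustrating the winner's-curse bias that the rank-conditioned estimates below are designed to correct.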

Below, we compute the optimal quantile-unbiased estimates and conditionally correct confidence intervals for each parameter given its rank.

[4]:
conditional_results = ranked_model.fit(title="Conditional estimates")
conditional_results.summary(alpha=alpha)
/home/docs/checkouts/readthedocs.org/user_builds/dsbowen-conditional-inference/envs/latest/lib/python3.10/site-packages/multiple_inference-1.2.0-py3.10.egg/multiple_inference/stats.py:468: RuntimeWarning: divide by zero encountered in log
/home/docs/checkouts/readthedocs.org/user_builds/dsbowen-conditional-inference/envs/latest/lib/python3.10/site-packages/multiple_inference-1.2.0-py3.10.egg/multiple_inference/stats.py:476: RuntimeWarning: divide by zero encountered in scalar divide
/home/docs/checkouts/readthedocs.org/user_builds/dsbowen-conditional-inference/envs/latest/lib/python3.10/site-packages/scipy/optimize/_numdiff.py:576: RuntimeWarning: invalid value encountered in subtract
  df = fun(x) - f0
[4]:
Conditional estimates
coef (median) pvalue (1-sided) [0.025 0.975]
policy9 8.686 0.000 5.067 10.933
policy8 8.000 0.001 4.078 11.922
policy7 7.000 0.001 3.078 10.922
policy6 6.000 0.003 2.078 9.922
policy5 5.000 0.009 1.078 8.922
policy4 4.000 0.023 0.078 7.922
policy3 3.000 0.058 -0.922 6.922
policy2 2.000 0.136 -1.922 5.922
policy1 1.000 0.285 -2.922 4.922
policy0 0.314 0.406 -1.933 3.933
Dep. Variable y
[5]:
conditional_results.point_plot(alpha=alpha)
plt.show()
../_images/examples_rank_conditions_7_0.png

Conditional inference is a demanding requirement: conditionally quantile-unbiased estimates can be highly variable, and conditionally correct confidence intervals can be unrealistically long. We can often obtain more reasonable estimates by targeting unconditional rather than conditional inference.

Imagine we ran our randomized controlled trial 10,000 times and want to estimate the effect of the top-performing treatment. We need conditional inference if we’re interested in the subset of trials where a specific parameter \(k\) was the top performer. However, we can use unconditional inference if we only need to be right “on average” across all 10,000 trials.

Below, we use the hybrid approach to compute approximately quantile-unbiased estimates and unconditionally correct confidence intervals for each parameter.

If you don’t know whether you need conditional or unconditional inference, use unconditional inference.

[6]:
hybrid_results = ranked_model.fit(beta=.005, title="Hybrid estimates")
hybrid_results.summary(alpha=alpha)
[6]:
Hybrid estimates
coef (median) pvalue (1-sided) [0.025 0.975]
policy9 8.687 0.005 5.680 10.977
policy8 8.000 0.005 4.680 11.320
policy7 7.000 0.005 3.680 10.320
policy6 6.000 0.005 2.680 9.320
policy5 5.000 0.005 1.680 8.320
policy4 4.000 0.005 0.680 7.320
policy3 3.000 0.055 -0.320 6.320
policy2 2.000 0.140 -1.320 5.320
policy1 1.000 0.288 -2.320 4.320
policy0 0.313 0.409 -1.977 3.320
Dep. Variable y
[7]:
hybrid_results.point_plot(alpha=alpha)
plt.show()
../_images/examples_rank_conditions_10_0.png