{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inference after ranking\n",
    "\n",
    "This is a template for regression analysis after ranking. It estimates the parameters using conditionally quantile-unbiased estimates and \"almost\" quantile-unbiased hybrid estimates.\n",
    "\n",
    "Click the badge below to use this template on your own data. This will open the notebook in a Jupyter binder.\n",
    "\n",
    "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gl/dsbowen%2Fconditional-inference/HEAD?urlpath=lab/tree/docs/examples/rank_conditions.ipynb)\n",
    "\n",
    "Instructions:\n",
    "\n",
    "1. Upload a file named `data.csv` to this folder with your conventional estimates. Open `data.csv` to see an example. In this file, we named our dependent variable \"dep_variable\", and have estimated parameters named \"policy0\",..., \"policy9\". The first column of `data.csv` contains the conventional estimates $m$ of the unknown parameters. The remaining columns contain consistent estimates of the covariance matrix $\\Sigma$. In `data.csv`, $m=(0, 1,..., 9)$ and $\\Sigma = I$.\n",
    "2. Modify the code if necessary.\n",
    "3. Run the notebook.\n",
    "\n",
    "### Citations\n",
    "\n",
    "    @techreport{andrews2019inference,\n",
    "      title={Inference on winners},\n",
    "      author={Andrews, Isaiah and Kitagawa, Toru and McCloskey, Adam},\n",
    "      year={2019},\n",
    "      institution={National Bureau of Economic Research}\n",
    "    }\n",
    "\n",
    "    @article{andrews2022inference,\n",
    "      Author = {Andrews, Isaiah and Bowen, Dillon and Kitagawa, Toru and McCloskey, Adam},\n",
    "      Title = {Inference for Losers},\n",
    "      Journal = {AEA Papers and Proceedings},\n",
    "      Volume = {112},\n",
    "      Year = {2022},\n",
    "      Month = {May},\n",
    "      Pages = {635-42},\n",
    "      DOI = {10.1257/pandp.20221065},\n",
    "      URL = {https://www.aeaweb.org/articles?id=10.1257/pandp.20221065}\n",
    "    }\n",
    "\n",
    "### Runtime warnings and long running times\n",
    "\n",
    "If you are estimating the effects of many policies or the policy effects are close together, you may see `RuntimeWarning` messages and experience long runtimes. Runtime warnings are common, usually benign, and can be safely ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "from multiple_inference.bayes import Improper\n",
    "from multiple_inference.rank_condition import RankCondition\n",
    "\n",
    "data_file = \"data.csv\"\n",
    "alpha = .05\n",
    "\n",
    "conventional_model = Improper.from_csv(data_file, sort=True)\n",
    "ranked_model = RankCondition.from_csv(data_file, sort=True)\n",
    "sns.set()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll start by summarizing and plotting the conventional estimates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conventional_results = conventional_model.fit(title=\"Conventional estiamtes\")\n",
    "conventional_results.summary(alpha=alpha)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conventional_results.point_plot(alpha=alpha)\n",
    "plt.show()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One property we want our estimators to have is *quantile-unbiasedness*. An estimator is quantile-unbiased if the true parameter falls below its $\\alpha$-quantile estimate with probability $\\alpha$ given its estimated rank. For example, the true effect of the top-performing treatment should fall below its median estimate half the time.\n",
    "\n",
    "Similarly, we want confidence intervals to have *correct conditional coverage*. Correct conditional coverage means that the parameter should fall within our $\\alpha$-level confidence interval with probability $1-\\alpha$ given its estimated rank. For example, the true effect of the top-performing treatment should fall within its 95% confidence interval 95% of the time.\n",
    "\n",
    "Below, we compute the optimal quantile-unbiased estimates and conditionally correct confidence intervals for each parameter given its rank."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conditional_results = ranked_model.fit(title=\"Conditional estimates\")\n",
    "conditional_results.summary(alpha=alpha)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conditional_results.point_plot(alpha=alpha)\n",
    "plt.show()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conditional inference is a strict requirement. Conditionally quantile-unbiased estimates can be highly variable. And conditionally correct confidence intervals can be unrealistically long. We can often obtain more reasonable estimates by focusing on *unconditional* inference instead of *conditional* inference.\n",
    "\n",
    "Imagine we ran our randomized control trial 10,000 times and want to estimate the effect of the top-performing treatment. We need *conditional* inference if we're interested the subset of trials where a specific parameter $k$ was the top performer. However, we can use *unconditional* inference if we're only interested in being right \"on average\" across all 10,000 trials.\n",
    "\n",
    "Below, we use *hybrid estimates* to compute approximately quantile-unbiased estimates and unconditionally correct confidence intervals for each parameter.\n",
    "\n",
    "If you don't know whether you need conditional or unconditional inference, use unconditional inference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hybrid_results = ranked_model.fit(beta=.005, title=\"Hybrid estimates\")\n",
    "hybrid_results.summary(alpha=alpha)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hybrid_results.point_plot(alpha=alpha)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "a31fe93114e6fe9c0b874076e62df141d5b35f609e1bfa94ca168a298e55e549"
  },
  "kernelspec": {
   "display_name": "Python 3.9.0 ('conditional-inference')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}