multiple_inference.confidence_set

Simultaneous confidence sets and multiple hypothesis testing.

References

Storey, John D., and Robert Tibshirani (2003). "Statistical significance for genomewide studies." Proceedings of the National Academy of Sciences, 100(16), 9440–9445.

Romano, Joseph P., and Michael Wolf (2005). "Stepwise multiple testing as formalized data snooping." Econometrica, 73(4), 1237–1282.

Mogstad, Magne, Joseph P. Romano, Azeem Shaikh, and Daniel Wilhelm (2020). "Inference for ranks with applications to mobility across neighborhoods and academic achievement across countries." National Bureau of Economic Research working paper.

Classes

ConfidenceSetResults(*args[, n_samples]) – Results for simultaneous confidence sets.
ConfidenceSet(mean, cov[, X, endog_names, ...]) – Model for simultaneous confidence sets.
AverageComparison(*args, **kwargs) – Compare each parameter to the average value across all parameters.
BaselineComparison(*args, baseline, **kwargs) – Compare parameters to a baseline parameter.
PairwiseComparisonResults(*args[, ...]) – Results of pairwise comparisons.
PairwiseComparison(mean, cov[, X, ...]) – Compute pairwise comparisons.
MarginalRankingResults(model, *args[, n_samples]) – Marginal ranking results.
MarginalRanking(mean, cov[, X, endog_names, ...]) – Estimate rankings with marginal confidence intervals.
SimultaneousRankingResults(model, *args[, ...]) – Simultaneous ranking results.
SimultaneousRanking(mean, cov[, X, ...]) – Estimate rankings with simultaneous confidence intervals.

class multiple_inference.confidence_set.AverageComparison(*args, **kwargs)[source]

Compare each parameter to the average value across all parameters.

Subclasses ConfidenceSet.

Examples

import numpy as np
from multiple_inference.confidence_set import AverageComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = AverageComparison(x, cov)
results = model.fit()
print(results.summary())
                     Confidence set results
================================================================
   coef (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------
x0              -1.000  0.000  0.000        -1.604        -0.396
x1               0.000  1.000  1.000        -0.604         0.604
x2               1.000  0.000  0.000         0.396         1.604
===============
Dep. Variable y
---------------
class multiple_inference.confidence_set.BaselineComparison(*args, baseline: int | str, **kwargs)[source]

Compare parameters to a baseline parameter.

Subclasses ConfidenceSet.

Parameters:

baseline (Union[int, str]) – Index or name of the baseline parameter.

Examples

import numpy as np
from multiple_inference.confidence_set import BaselineComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = BaselineComparison(x, cov, exog_names=["x0", "x1", "x2"], baseline="x0")
results = model.fit()
print(results.summary())
                     Confidence set results
================================================================
   coef (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------
x1               1.000  0.045  0.012         0.022         1.978
x2               2.000  0.000  0.000         1.022         2.978
===============
Dep. Variable y
---------------
class multiple_inference.confidence_set.ConfidenceSet(mean: Sequence[float], cov: ndarray, X: ndarray | None = None, endog_names: str | None = None, exog_names: Sequence[str] | None = None, columns: Sequence[int] | Sequence[str] | Sequence[bool] | None = None, sort: bool = False, random_state: int = 0)[source]

Model for simultaneous confidence sets.

Examples

import numpy as np
from multiple_inference.confidence_set import ConfidenceSet

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = ConfidenceSet(x, cov)
results = model.fit()
print(results.summary())
                     Confidence set results
================================================================
   coef (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------
x0              -1.000  0.004  0.002        -1.755        -0.245
x1               0.000  1.000  1.000        -0.755         0.755
x2               1.000  0.004  0.002         0.245         1.755
===============
Dep. Variable y
---------------
print(results.test_hypotheses())
    param>0  param<0
x0    False     True
x1    False    False
x2     True    False
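
This means that x0 is significantly less than 0 and x2 is significantly greater than 0.
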
class multiple_inference.confidence_set.ConfidenceSetResults(*args: Any, n_samples: int = 10000, **kwargs: Any)[source]

Results for simultaneous confidence sets.

Subclasses multiple_inference.base.ResultsBase.

Parameters:

n_samples (int, optional) – Number of samples to draw when approximating the confidence set. Defaults to 10000.

test_hypotheses(alpha: float = 0.05, columns: Sequence[int] | Sequence[str] | Sequence[bool] | None = None, two_tailed: bool = True, fast: bool | str = 'auto') → DataFrame | Series[source]

Test the null hypothesis that each parameter is equal to 0.

Parameters:
  • alpha (float, optional) – Significance level. Defaults to 0.05.

  • columns (ColumnsType, optional) – Selected columns. Defaults to None.

  • two_tailed (bool, optional) – Run two-tailed hypothesis tests. Set to False to run one-tailed hypothesis tests. Defaults to True.

  • fast (Union[bool, str], optional) – Whether to avoid the stepdown procedure for faster computation. Defaults to “auto”.

Returns:

Results dataframe (if two-tailed) or series (if one-tailed).

Return type:

Union[pd.DataFrame, pd.Series]
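
For example, to run one-tailed rather than two-tailed tests (a sketch reusing the ConfidenceSet example above; output omitted):

import numpy as np
from multiple_inference.confidence_set import ConfidenceSet

x = np.arange(-1, 2)
cov = np.identity(3) / 10
results = ConfidenceSet(x, cov).fit()
# Two-tailed tests (the default) return a dataframe with "param>0" and
# "param<0" columns; one-tailed tests return a boolean series
one_tailed = results.test_hypotheses(two_tailed=False)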

class multiple_inference.confidence_set.MarginalRanking(mean: Sequence[float], cov: ndarray, X: ndarray | None = None, endog_names: str | None = None, exog_names: Sequence[str] | None = None, columns: Sequence[int] | Sequence[str] | Sequence[bool] | None = None, sort: bool = False, random_state: int = 0)[source]

Estimate rankings with marginal confidence intervals.

Subclasses ConfidenceSet.

Examples

import numpy as np
from multiple_inference.confidence_set import MarginalRanking

x = np.arange(-1, 2)
cov = np.diag([1, 2, 3]) / 10
model = MarginalRanking(x, cov)
results = model.fit()
print(results.summary())
                  Marginal ranking
=========================================================
   rank (conventional) pvalue 0.95 CI lower 0.95 CI upper
---------------------------------------------------------
x0               3.000    nan         2.000         3.000
x1               2.000    nan         1.000         3.000
x2               1.000    nan         1.000         2.000
===============
Dep. Variable y
---------------
class multiple_inference.confidence_set.MarginalRankingResults(model: MarginalRanking, *args: Any, n_samples: int = 10000, **kwargs: Any)[source]

Marginal ranking results.

class multiple_inference.confidence_set.PairwiseComparison(mean: Sequence[float], cov: ndarray, X: ndarray | None = None, endog_names: str | None = None, exog_names: Sequence[str] | None = None, columns: Sequence[int] | Sequence[str] | Sequence[bool] | None = None, sort: bool = False, random_state: int = 0)[source]

Compute pairwise comparisons.

Examples

import numpy as np
from multiple_inference.confidence_set import PairwiseComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = PairwiseComparison(x, cov)
results = model.fit()
print(results.summary())
                         Pairwise comparisons
======================================================================
        delta (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------------
x1 - x0                1.000  0.061  0.017        -0.031         2.031
x2 - x0                2.000  0.000  0.000         0.969         3.031
x2 - x1                1.000  0.061  0.017        -0.031         2.031
===============
Dep. Variable y
---------------
print(results.test_hypotheses())
       x0     x1     x2
x0  False  False   True
x1  False  False  False
x2  False  False  False

This means that parameter x2 is significantly greater than x0.

class multiple_inference.confidence_set.PairwiseComparisonResults(*args: Any, n_samples: int = 10000, groups: ndarray | None = None, **kwargs: Any)[source]

Results of pairwise comparisons.

Subclasses ConfidenceSetResults.

Parameters:
  • n_samples (int, optional) – Number of samples to draw to obtain critical values. Defaults to 10000.

  • groups (np.ndarray, optional) – (# params,) array of parameter groups. Defaults to None.

Raises:

ValueError – Length of groups must match the number of parameters.

hypothesis_heatmap(*args: Any, title: str | None = None, axes=None, triangular: bool = False, **kwargs: Any)[source]

Create a heatmap of pairwise hypothesis tests.

Parameters:
  • title (str, optional) – Title.

  • axes (Union[AxesSubplot, Sequence[AxesSubplot]], optional) – Axes to write on. Defaults to None.

  • triangular (bool, optional) – Display the results in a triangular (as opposed to square) output. Usually, you should set this to True if and only if your columns are sorted. Defaults to False.

Returns:

Array of axes.

Return type:

np.ndarray
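
For example (a sketch reusing the PairwiseComparison example above; assumes matplotlib is installed):

import matplotlib.pyplot as plt
import numpy as np
from multiple_inference.confidence_set import PairwiseComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
results = PairwiseComparison(x, cov).fit()
# only set triangular=True if the columns are sorted
results.hypothesis_heatmap(title="Pairwise hypothesis tests")
plt.show()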

test_hypotheses(alpha: float = 0.05, columns: Sequence[int] | Sequence[str] | Sequence[bool] | None = None, criterion: str = 'fwer', groups: Sequence | None = None, fast: bool | str = 'auto') → DataFrame | dict[Any, DataFrame][source]

Test pairwise hypotheses.

Parameters:
  • alpha (float, optional) – Significance level. Defaults to .05.

  • columns (ColumnsType, optional) – Selected columns. In wide format, these are the original column names (e.g., “x0”). In long format, these are the names of the differences (e.g., “x1 - x0”). Defaults to None.

  • criterion (str, optional) – “fwer” to control for the family-wise error rate (using pvalues), “fdr” to control for the false discovery rate (using qvalues). Defaults to “fwer”.

  • groups (Sequence, optional) – (# params,) sequence of parameter groups; hypotheses are tested separately for each group. Defaults to None.

  • fast (Union[bool, str], optional) – Whether to use a fast version of the algorithm. Defaults to “auto”.

Raises:

ValueError – criterion must be one of “fwer” or “fdr”.

Returns:

Results dataframe if only one group is used, mapping of group name to results dataframe if multiple groups are used.

Return type:

Union[pd.DataFrame, dict[Any, pd.DataFrame]]

Notes

When controlling for the familywise error rate, the null hypotheses are $\mu_k \leq \mu_j$. When controlling for the false discovery rate, the null hypotheses are $\mu_k = \mu_j$.
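
For example, to control the false discovery rate rather than the familywise error rate (a sketch reusing the PairwiseComparison example above; output omitted):

import numpy as np
from multiple_inference.confidence_set import PairwiseComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
results = PairwiseComparison(x, cov).fit()
# "fdr" tests use qvalues; the default "fwer" tests use pvalues
fdr_tests = results.test_hypotheses(criterion="fdr")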

class multiple_inference.confidence_set.SimultaneousRanking(mean: Sequence[float], cov: ndarray, X: ndarray | None = None, endog_names: str | None = None, exog_names: Sequence[str] | None = None, columns: Sequence[int] | Sequence[str] | Sequence[bool] | None = None, sort: bool = False, random_state: int = 0)[source]

Estimate rankings with simultaneous confidence intervals.

Subclasses ConfidenceSet.

Examples

import numpy as np
from multiple_inference.confidence_set import SimultaneousRanking

x = np.arange(3)
cov = np.identity(3) / 10
model = SimultaneousRanking(x, cov)
results = model.fit()
print(results.summary())
                   Simultaneous ranking
=========================================================
   rank (conventional) pvalue 0.95 CI lower 0.95 CI upper
---------------------------------------------------------
x0               3.000    nan         2.000         3.000
x1               2.000    nan         1.000         3.000
x2               1.000    nan         1.000         2.000
===============
Dep. Variable y
---------------
print(results.compute_best_params())
x0    False
x1    False
x2     True
dtype: bool

This means we can be 95% confident that the best (largest) parameter is x2.

class multiple_inference.confidence_set.SimultaneousRankingResults(model: SimultaneousRanking, *args: Any, n_samples: int = 10000, **kwargs: Any)[source]

Simultaneous ranking results.

compute_best_params(n_best_params: int = 1, alpha: float = 0.05, superset: bool = True, fast: bool | str = 'auto') → Series[source]

Compute the set of best (largest) parameters.

If superset is True, find a set of parameters that contains the truly best n_best_params parameters with probability 1-alpha. If superset is False, find a set of parameters that is contained in the set of truly best n_best_params parameters with probability 1-alpha.

Parameters:
  • n_best_params (int, optional) – Number of best parameters. Defaults to 1.

  • alpha (float, optional) – Significance level. Defaults to 0.05.

  • superset (bool, optional) – Indicates that the returned set is a superset of the truly best n parameters. If False, the returned set is a subset of the truly best n parameters. Defaults to True.

  • fast (Union[bool, str], optional) – Whether to use a fast version of the selection algorithm. Defaults to “auto”.

Returns:

Indicates which parameters are in the selected set.

Return type:

pd.Series
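
For example (a sketch reusing the SimultaneousRanking example above; output omitted):

import numpy as np
from multiple_inference.confidence_set import SimultaneousRanking

x = np.arange(3)
cov = np.identity(3) / 10
results = SimultaneousRanking(x, cov).fit()
# superset: contains the truly best 2 parameters with 95% confidence
best_superset = results.compute_best_params(n_best_params=2)
# subset: these parameters are among the truly best 2 with 95% confidence
best_subset = results.compute_best_params(n_best_params=2, superset=False)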