multiple_inference.confidence_set

Simultaneous confidence sets and multiple hypothesis testing.

References

@article{storey2003statistical,
    title={Statistical significance for genomewide studies},
    author={Storey, John D and Tibshirani, Robert},
    journal={Proceedings of the National Academy of Sciences},
    volume={100},
    number={16},
    pages={9440--9445},
    year={2003},
    publisher={National Acad Sciences}
}

@article{romano2005stepwise,
    title={Stepwise multiple testing as formalized data snooping},
    author={Romano, Joseph P and Wolf, Michael},
    journal={Econometrica},
    volume={73},
    number={4},
    pages={1237--1282},
    year={2005},
    publisher={Wiley Online Library}
}

@techreport{mogstad2020inference,
    title={Inference for ranks with applications to mobility across neighborhoods and academic achievement across countries},
    author={Mogstad, Magne and Romano, Joseph P and Shaikh, Azeem and Wilhelm, Daniel},
    year={2020},
    institution={National Bureau of Economic Research}
}

Classes

`ConfidenceSetResults`(*args[, n_samples])	Results for simultaneous confidence sets.
`ConfidenceSet`(mean, cov[, X, endog_names, ...])	Model for simultaneous confidence sets.
`AverageComparison`(*args, **kwargs)	Compare each parameter to the average value across all parameters.
`BaselineComparison`(*args, baseline, **kwargs)	Compare parameters to a baseline parameter.
`PairwiseComparisonResults`(*args[, ...])	Results of pairwise comparisons.
`PairwiseComparison`(mean, cov[, X, ...])	Compute pairwise comparisons.
`MarginalRankingResults`(model, *args[, n_samples])	Marginal ranking results.
`MarginalRanking`(mean, cov[, X, endog_names, ...])	Estimate rankings with marginal confidence intervals.
`SimultaneousRankingResults`(model, *args[, ...])	Simultaneous ranking results.
`SimultaneousRanking`(mean, cov[, X, ...])	Estimate rankings with simultaneous confidence intervals.

class multiple_inference.confidence_set.AverageComparison(*args, **kwargs)

Compare each parameter to the average value across all parameters.

Subclasses ConfidenceSet.

Parameters

*args (Any) – Passed to ConfidenceSet.
**kwargs (Any) – Passed to ConfidenceSet.

Examples

import numpy as np
from multiple_inference.confidence_set import AverageComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = AverageComparison(x, cov)
results = model.fit()
print(results.summary())

Confidence set results
================================================================
   coef (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------
x0              -1.000  0.000  0.000        -1.603        -0.397
x1               0.000  1.000  1.000        -0.603         0.603
x2               1.000  0.000  0.000         0.397         1.603
===============
Dep. Variable y
---------------
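
The coefficients in the table above can be read as each parameter minus the average of all parameters. The following is a minimal NumPy sketch of that linear transformation, assuming an equally weighted average; the names `contrast`, `deviations`, and `deviations_cov` are illustrative and are not part of the library's API.

```python
import numpy as np

# Same inputs as the example above.
x = np.arange(-1, 2)
cov = np.identity(3) / 10

# Assumed contrast matrix: each row takes one parameter and subtracts the
# equally weighted average of all parameters.
n_params = len(x)
contrast = np.identity(n_params) - np.full((n_params, n_params), 1 / n_params)

# Point estimates and covariance of the deviations under the linear map.
deviations = contrast @ x
deviations_cov = contrast @ cov @ contrast.T

print(deviations)
print(np.sqrt(np.diag(deviations_cov)))
```

Because the deviations share the same average, they are correlated even when the original estimates are independent, which is why the covariance is transformed along with the mean.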

class multiple_inference.confidence_set.BaselineComparison(*args, baseline: Union[int, str], **kwargs)

Compare parameters to a baseline parameter.

Subclasses ConfidenceSet.

Parameters: baseline (Union[int, str]) – Index or name of the baseline parameter.

Examples

import numpy as np
from multiple_inference.confidence_set import BaselineComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = BaselineComparison(x, cov, exog_names=["x0", "x1", "x2"], baseline="x0")
results = model.fit()
print(results.summary())

Confidence set results
================================================================
   coef (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------
x1               1.000  0.045  0.012         0.022         1.978
x2               2.000  0.000  0.000         1.022         2.978
===============
Dep. Variable y
---------------
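
The coefficients above are consistent with subtracting the baseline estimate from every other estimate. A minimal NumPy sketch of that contrast follows, assuming this is the form of the comparison; `contrast`, `differences`, and `differences_cov` are hypothetical names used only for illustration.

```python
import numpy as np

# Same inputs as the example above.
x = np.arange(-1, 2)
cov = np.identity(3) / 10
names = ["x0", "x1", "x2"]
baseline = names.index("x0")

# Assumed contrast: one row per non-baseline parameter k, equal to
# e_k - e_baseline, so that each row computes x_k - x_baseline.
others = [k for k in range(len(x)) if k != baseline]
contrast = np.identity(len(x))[others]
contrast[:, baseline] -= 1

# Differences from the baseline and their covariance.
differences = contrast @ x
differences_cov = contrast @ cov @ contrast.T

for k, diff in zip(others, differences):
    print(names[k], diff)
```

Every difference involves the same baseline estimate, so the off-diagonal entries of `differences_cov` are nonzero even though `cov` is diagonal.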

class multiple_inference.confidence_set.ConfidenceSet(mean: Sequence[float], cov: ndarray, X: Optional[ndarray] = None, endog_names: Optional[str] = None, exog_names: Optional[Sequence[str]] = None, columns: Optional[Union[Sequence[int], Sequence[str], Sequence[bool]]] = None, sort: bool = False, random_state: int = 0)

Model for simultaneous confidence sets.

Examples

import numpy as np
from multiple_inference.confidence_set import ConfidenceSet

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = ConfidenceSet(x, cov)
results = model.fit()
print(results.summary())

Confidence set results
================================================================
   coef (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------
x0              -1.000  0.004  0.002        -1.755        -0.245
x1               0.000  1.000  1.000        -0.755         0.755
x2               1.000  0.004  0.002         0.245         1.755
===============
Dep. Variable y
---------------

print(results.test_hypotheses())

    param>0  param<0
x0    False     True
x1    False    False
x2     True    False
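
One common way to obtain simultaneous intervals is to simulate the maximum absolute standardized error and use its quantile as a critical value shared by all parameters. The sketch below illustrates that general idea with NumPy only; it is an assumption about the approach, not the library's implementation, and names such as `critical_value` are illustrative.

```python
import numpy as np

# Same inputs as the example above.
x = np.arange(-1, 2)
cov = np.identity(3) / 10
alpha = 0.05
n_samples = 10_000

rng = np.random.default_rng(0)
std = np.sqrt(np.diag(cov))

# Draw estimation errors from a mean-zero normal with covariance `cov`,
# standardize them, and record the largest absolute value in each draw.
draws = rng.multivariate_normal(np.zeros(len(x)), cov, size=n_samples)
max_abs_z = np.abs(draws / std).max(axis=1)

# A single critical value applied to every parameter gives joint coverage.
critical_value = np.quantile(max_abs_z, 1 - alpha)
lower = x - critical_value * std
upper = x + critical_value * std

# A parameter is flagged as positive (negative) when its interval
# lies entirely above (below) zero.
is_positive = lower > 0
is_negative = upper < 0

print(critical_value)
print(is_positive, is_negative)
```

With three independent estimates, the simulated half-width `critical_value * std` should land close to the 0.755 shown in the summary table, and the flags reproduce the pattern in the hypothesis table.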

class multiple_inference.confidence_set.ConfidenceSetResults(*args: Any, n_samples: int = 10000, **kwargs: Any)

Results for simultaneous confidence sets.

Subclasses multiple_inference.base.ResultsBase.

Parameters: n_samples (int, optional) – Number of samples to draw when approximating the confidence set. Defaults to 10000.

test_hypotheses(alpha: float = 0.05, columns: Optional[Union[Sequence[int], Sequence[str], Sequence[bool]]] = None, two_tailed: bool = True) → Union[DataFrame, Series]

Test the null hypothesis that the parameter is equal to 0.

Parameters

alpha (float, optional) – Significance level. Defaults to 0.05.
columns (ColumnsType, optional) – Selected columns. Defaults to None.
two_tailed (bool, optional) – Run two-tailed hypothesis tests. Set to False to run one-tailed hypothesis tests. Defaults to True.

Returns

Results dataframe (if two-tailed) or series (if one-tailed).

Return type

Union[pd.DataFrame, pd.Series]

class multiple_inference.confidence_set.MarginalRanking(mean: Sequence[float], cov: ndarray, X: Optional[ndarray] = None, endog_names: Optional[str] = None, exog_names: Optional[Sequence[str]] = None, columns: Optional[Union[Sequence[int], Sequence[str], Sequence[bool]]] = None, sort: bool = False, random_state: int = 0)

Estimate rankings with marginal confidence intervals.

Subclasses ConfidenceSet.

Examples

import numpy as np
from multiple_inference.confidence_set import MarginalRanking

x = np.arange(-1, 2)
cov = np.diag([1, 2, 3]) / 10
model = MarginalRanking(x, cov)
results = model.fit()
print(results.summary())

                  Marginal ranking
=========================================================
   rank (conventional) pvalue 0.95 CI lower 0.95 CI upper
---------------------------------------------------------
x0               3.000    nan         2.000         3.000
x1               2.000    nan         1.000         3.000
x2               1.000    nan         1.000         2.000
===============
Dep. Variable y
---------------
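
In the table above, rank 1 belongs to the largest estimate. The short sketch below shows one way to compute such conventional ranks from the point estimates with NumPy. It assumes there are no ties and is not taken from the library; the variable names are illustrative.

```python
import numpy as np

# Same point estimates as the example above.
x = np.arange(-1, 2)
names = ["x0", "x1", "x2"]

# Position of each estimate in ascending order (0 = smallest).
ascending_position = np.argsort(np.argsort(x))

# Flip the ordering so that rank 1 is the largest estimate.
ranks = len(x) - ascending_position

for name, rank in zip(names, ranks):
    print(name, rank)
```

The confidence bounds in the table additionally depend on the covariance matrix, which this sketch does not use.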

class multiple_inference.confidence_set.MarginalRankingResults(model: MarginalRanking, *args: Any, n_samples: int = 10000, **kwargs: Any)

Marginal ranking results.

class multiple_inference.confidence_set.PairwiseComparison(mean: Sequence[float], cov: ndarray, X: Optional[ndarray] = None, endog_names: Optional[str] = None, exog_names: Optional[Sequence[str]] = None, columns: Optional[Union[Sequence[int], Sequence[str], Sequence[bool]]] = None, sort: bool = False, random_state: int = 0)

Compute pairwise comparisons.

Examples

import numpy as np
from multiple_inference.confidence_set import PairwiseComparison

x = np.arange(-1, 2)
cov = np.identity(3) / 10
model = PairwiseComparison(x, cov)
results = model.fit()
print(results.summary())

Pairwise comparisons
======================================================================
        delta (conventional) pvalue qvalue 0.95 CI lower 0.95 CI upper
----------------------------------------------------------------------
x1 - x0                1.000  0.061  0.017        -0.031         2.031
x2 - x0                2.000  0.000  0.000         0.969         3.031
x2 - x1                1.000  0.061  0.017        -0.031         2.031
===============
Dep. Variable y
---------------

print(results.test_hypotheses())

       x0     x1     x2
x0  False  False   True
x1  False  False  False
x2  False  False  False

This means that parameter x2 is significantly greater than x0.
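
The deltas in the summary table are differences between pairs of estimates. The following NumPy sketch builds those differences and their standard errors from a contrast matrix, assuming each delta is the later parameter minus the earlier one as the row labels suggest; `contrast`, `labels`, and `deltas` are hypothetical names.

```python
from itertools import combinations

import numpy as np

# Same inputs as the example above.
x = np.arange(-1, 2)
cov = np.identity(3) / 10
names = ["x0", "x1", "x2"]

# All index pairs (j, k) with j < k; each row of the contrast matrix
# computes x_k - x_j.
pairs = list(combinations(range(len(x)), 2))
contrast = np.zeros((len(pairs), len(x)))
for row, (j, k) in enumerate(pairs):
    contrast[row, k] = 1
    contrast[row, j] = -1

labels = [f"{names[k]} - {names[j]}" for j, k in pairs]
deltas = contrast @ x
deltas_std = np.sqrt(np.diag(contrast @ cov @ contrast.T))

for label, delta, std in zip(labels, deltas, deltas_std):
    print(label, delta, round(std, 3))
```

The number of comparisons grows quadratically with the number of parameters, which is why the critical values need to account for many correlated tests at once.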

class multiple_inference.confidence_set.PairwiseComparisonResults(*args: Any, n_samples: int = 10000, groups: Optional[ndarray] = None, **kwargs: Any)

Results of pairwise comparisons.

Subclasses ConfidenceSetResults.

Parameters

n_samples (int, optional) – Number of samples to draw to obtain critical values. Defaults to 10000.
groups (np.ndarray, optional) – (# params,) array of parameter groups. Defaults to None.

Raises

ValueError – Length of groups must match the number of parameters.

hypothesis_heatmap(*args: Any, title: Optional[str] = None, axes=None, triangular: bool = False, **kwargs: Any)

Create a heatmap of pairwise hypothesis tests.

Parameters

title (str, optional) – Title.
axes (Union[AxesSubplot, Sequence[AxesSubplot]], optional) – Axes to write on. Defaults to None.
triangular (bool, optional) – Display the results in a triangular (as opposed to square) output. Usually, you should set this to True if and only if your columns are sorted. Defaults to False.

Returns

Array of axes.

Return type

np.ndarray

test_hypotheses(alpha: float = 0.05, columns: ColumnsType = None, criterion: str = 'fwer', groups: Sequence = None, wide: bool = True) → Union[pd.DataFrame, dict[Any, pd.DataFrame]]

Test pairwise hypotheses.

Parameters

alpha (float, optional) – Significance level. Defaults to .05.
columns (ColumnsType, optional) – Selected columns. In wide format, these are the original column names (e.g., “x0”). In long format, these are the names of the differences (e.g., “x1 - x0”). Defaults to None.
criterion (str, optional) – “fwer” to control for the family-wise error rate (using pvalues), “fdr” to control for the false discovery rate (using qvalues). Defaults to “fwer”.
groups (Sequence, optional) –
wide (bool, optional) – Return the results in wide (square) format. Ignored when controlling for the false discovery rate. Defaults to True.

Raises

ValueError – criterion must be one of “fwer” or “fdr”.

Returns

Results dataframe if only one group is used, or a mapping of group name to results dataframe if multiple groups are used.

Return type

Union[pd.DataFrame, dict[Any, pd.DataFrame]]

Notes

When controlling for the familywise error rate, the null hypotheses are $$\mu_k \leq \mu_j$$. When controlling for the false discovery rate, the null hypotheses are $$\mu_k = \mu_j$$.

class multiple_inference.confidence_set.SimultaneousRanking(mean: Sequence[float], cov: ndarray, X: Optional[ndarray] = None, endog_names: Optional[str] = None, exog_names: Optional[Sequence[str]] = None, columns: Optional[Union[Sequence[int], Sequence[str], Sequence[bool]]] = None, sort: bool = False, random_state: int = 0)

Estimate rankings with simultaneous confidence intervals.

Subclasses ConfidenceSet.

Examples

import numpy as np
from multiple_inference.confidence_set import SimultaneousRanking

x = np.arange(3)
cov = np.identity(3) / 10
model = SimultaneousRanking(x, cov)
results = model.fit()
print(results.summary())

                   Simultaneous ranking
=========================================================
   rank (conventional) pvalue 0.95 CI lower 0.95 CI upper
---------------------------------------------------------
x0               3.000    nan         2.000         3.000
x1               2.000    nan         1.000         3.000
x2               1.000    nan         1.000         2.000
===============
Dep. Variable y
---------------

print(results.compute_best_params())

x0    False
x1    False
x2     True
dtype: bool

Thus, we can be 95% confident that the best (largest) parameter is x2.

class multiple_inference.confidence_set.SimultaneousRankingResults(model: SimultaneousRanking, *args: Any, n_samples: int = 10000, **kwargs: Any)

Simultaneous ranking results.

compute_best_params(n_best_params: int = 1, alpha: float = 0.05, superset: bool = True) → Series

Compute the set of best (largest) parameters.

If superset is True, find a set of parameters that contains the truly best n_best_params parameters with probability 1-alpha. If superset is False, find a set of parameters that are all among the truly best n_best_params parameters with probability 1-alpha.

Parameters

n_best_params (int, optional) – Number of best parameters. Defaults to 1.
alpha (float, optional) – Significance level. Defaults to 0.05.
superset (bool, optional) – Indicates that the returned set is a superset of the truly best n parameters. If False, the returned set is a subset of the truly best n parameters. Defaults to True.

Returns

Indicates which parameters are in the selected set.

Return type

pd.Series