statsmodels.stats.gof.gof_binning_discrete

statsmodels.stats.gof.gof_binning_discrete(rvs, distfn, arg, nsupp=20)

Get bins for chi-square type goodness-of-fit (gof) tests for a discrete distribution.

Parameters:
  • rvs (ndarray) – sample data

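  • distfn (distribution instance) – discrete distribution to test against, for example scipy.stats.poisson; it is used to compute the expected bin probabilities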

  • arg (sequence) – parameters of distribution

  • nsupp (int) – number of bins. The algorithm tries to find bins with equal weights. Depending on the distribution, the actual number of bins can be smaller.

Returns:

  • freq (ndarray) – empirical frequencies for sample; not normalized, adds up to sample size

  • expfreq (ndarray) – theoretical frequencies according to distribution

  • histsupp (ndarray) – bin boundaries for the histogram (1e-8 is added for numerical robustness)

Notes

The results can be used for a chi-square test:

(chis,pval) = stats.chisquare(freq, expfreq)
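The workflow described here can be sketched end to end with NumPy and SciPy only. This is a minimal illustration, not the statsmodels implementation: the helper `equal_weight_bins`, its `max_support` argument, and the Poisson sample are assumptions made for the example. It groups integer support points into bins of roughly equal probability `1/nsupp`, shifts the edges by 1e-8 so that `np.histogram` counts right-closed intervals, and passes the counts to `stats.chisquare`.

```python
import numpy as np
from scipy import stats


def equal_weight_bins(dist, args, nsupp=20, max_support=1000):
    """Hypothetical helper: bins (edges[i], edges[i+1]] of roughly equal
    probability 1/nsupp under the discrete distribution ``dist``."""
    target = 1.0 / nsupp
    lower = int(max(dist.a, -max_support))
    edges = [lower - 1]          # left edge lies just below the support
    mass = []
    last = 0.0
    for k in range(lower, lower + max_support + 1):
        current = dist.cdf(k, *args)
        # close a bin once it has collected at least the target weight
        if current - last >= target - 1e-14:
            edges.append(k)
            mass.append(current - last)
            last = current
            if current > 1 - target:
                break
    # whatever probability is left goes into one final upper-tail bin
    if 1.0 - last > 0:
        edges.append(np.inf)
        mass.append(1.0 - last)
    return np.array(edges, dtype=float), np.array(mass)


# Assumed example data: a Poisson sample with a known rate.
rng = np.random.default_rng(0)
lam = 3.0
rvs = stats.poisson.rvs(lam, size=500, random_state=rng)

edges, mass = equal_weight_bins(stats.poisson, (lam,), nsupp=20)

# np.histogram uses half-open bins [a, b); shifting the edges by 1e-8
# makes each integer k fall into the bin (edges[i], edges[i+1]].
hist_edges = edges + 1e-8
freq, _ = np.histogram(rvs, hist_edges)
expfreq = len(rvs) * mass

chis, pval = stats.chisquare(freq, expfreq)
```

Because heavy support points can carry more than `1/nsupp` probability on their own, the number of bins returned here is usually smaller than `nsupp`, which matches the caveat on the `nsupp` parameter.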

Originally written for the scipy.stats test suite. It still needs to be checked for standalone usage: input checking is insufficient, and it may not run yet (after copy/paste).

refactor: maybe a class, check returns, or separate binning from test results

todo:

optimal number of bins? (check easyfit). The recommendation in the literature is at least 5 expected observations in each bin.
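One way to act on the rule of at least 5 expected observations per bin is to merge adjacent bins after binning. The sketch below is an assumption for illustration, not part of statsmodels: the helper `merge_small_bins` and the sample numbers are hypothetical. It sweeps from left to right, accumulates bins until the expected count reaches the threshold, and folds any leftover tail into the last merged bin.

```python
import numpy as np


def merge_small_bins(freq, expfreq, min_expected=5.0):
    """Hypothetical helper: merge adjacent bins until every merged bin
    has an expected frequency of at least ``min_expected``."""
    merged_f, merged_e = [], []
    acc_f = acc_e = 0.0
    for f, e in zip(freq, expfreq):
        acc_f += f
        acc_e += e
        if acc_e >= min_expected:
            merged_f.append(acc_f)
            merged_e.append(acc_e)
            acc_f = acc_e = 0.0
    if acc_e > 0 or acc_f > 0:
        if merged_e:
            # leftover tail is too small on its own: add it to the last bin
            merged_f[-1] += acc_f
            merged_e[-1] += acc_e
        else:
            # the whole sample never reached the threshold: keep one bin
            merged_f.append(acc_f)
            merged_e.append(acc_e)
    return np.array(merged_f), np.array(merged_e)


# Made-up frequencies, chosen only to show the merging.
freq = [2, 1, 10, 12, 3, 1]
expfreq = [1.5, 2.0, 9.0, 11.0, 3.5, 2.0]
new_freq, new_expfreq = merge_small_bins(freq, expfreq)
# new_freq    -> [13., 12., 4.]
# new_expfreq -> [12.5, 11., 5.5]
```

Merging preserves the totals of both arrays, so the merged arrays can still be passed to `stats.chisquare`. The degrees of freedom then follow the reduced number of bins.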