# Optimal testing of discrete distributions with high probability

@article{Diakonikolas2020OptimalTO, title={Optimal testing of discrete distributions with high probability}, author={Ilias Diakonikolas and Themis Gouleakis and Daniel M. Kane and John Peebles and Eric Price}, journal={Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing}, year={2020} }

We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property P, and parameters 0< є, δ <1, we want to distinguish with probability at least 1−δ whether these distributions satisfy P or are є-far from P in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to δ = Ω(1)), and provided sample-optimal testers for a… Expand

#### 3 Citations

The Price of Tolerance in Distribution Testing

- Computer Science, Mathematics
- ArXiv
- 2021

We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution p over {1, . . . , n}, is it ε1-close to or ε2-far from a reference distribution q (in… Expand

The Sample Complexity of Robust Covariance Testing

- Computer Science, Mathematics
- COLT
- 2021

The main result is that the sample complexity of covariance testing dramatically increases in the contaminated setting, and a sample complexity lower bound of Ω(d) for ǫ an arbitrarily small constant and γ = 1/2 is proved. Expand

Near-optimal learning of tree-structured distributions by Chow-Liu

- Computer Science, Mathematics
- STOC
- 2021

The upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): it is proved that for three random variables X,Y,Z each over Σ, testing if I(X; Y ∣ Z) is 0 or ≥ ε is possible with O(|Σ|3/ε) samples. Expand

#### References

SHOWING 1-10 OF 56 REFERENCES

A New Approach for Testing Properties of Discrete Distributions

- Computer Science, Mathematics
- 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS)
- 2016

The sample complexity of the algorithm depends on the structure of the unknown distributions - as opposed to merely their domain size - and is significantly better compared to the worst-case optimal L1-tester in many natural instances. Expand

Optimal Algorithms for Testing Closeness of Discrete Distributions

- Computer Science, Mathematics
- SODA
- 2014

This work presents simple testers for both the e1 and e2 settings, with sample complexity that is information-theoretically optimal, to constant factors, and establishes that the sample complexity is Θ(max{n2/3/e4/3, n1/2/&epsilon2}. Expand

Near-Optimal Closeness Testing of Discrete Histogram Distributions

- Computer Science, Mathematics
- ICALP
- 2017

A new algorithm for testing the equivalence between two discrete histograms and a nearly matching information-theoretic lower bound are investigated, improving on previous work by polynomial factors in the relevant parameters. Expand

Optimal Testing for Properties of Distributions

- Computer Science, Mathematics
- NIPS
- 2015

This work provides a general approach via which sample-optimal and computationally efficient testers for discrete log-concave and monotone hazard rate distributions are obtained. Expand

Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

- Mathematics, Computer Science
- 2015 IEEE 56th Annual Symposium on Foundations of Computer Science
- 2015

This work designs a sample optimal and computationally efficient algorithm for testing the equivalence of two unknown univariate distributions under the Ak-distance metric, and yields new, simple L1 closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions. Expand

Testing k-Modal Distributions: Optimal Algorithms via Reductions

- Computer Science, Mathematics
- SODA
- 2013

A new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way and enables us to transform various distribution testing problems for k-modal distributions over {1,..., n} to the corresponding distributionTesting problems for unrestricted distributions over a much smaller domain {1..., e} where e = O(k log n). Expand

Testing Ising Models

- Computer Science, Mathematics
- SODA
- 2018

It is demonstrated that, in this structured setting, this study of distribution testing on structured multivariate distributions can avoid the curse of dimensionality, obtaining sample and time efficient testers for independence and goodness-of-fit. Expand

Sample-Optimal Identity Testing with High Probability

- Mathematics, Computer Science
- Electron. Colloquium Comput. Complex.
- 2017

The new upper and lower bounds show that the optimal sample complexity of identity testing is $\Theta\left( \frac{1}{\epsilon^2}\left(\sqrt{n \log(1/\delta)} + \log (1/ \delta) \right)\right) for any $n, \ep silon$, and $\delta$. Expand

Testing Conditional Independence of Discrete Distributions

- Computer Science, Mathematics
- 2018 Information Theory and Applications Workshop (ITA)
- 2018

This work studies the problem of testing conditional independence for discrete distributions and develops a general theory providing tight variance bounds for specific estimators of this form, up to constant factors, for all such estimators. Expand

Testing Conditional Independence of Discrete Distributions

- Mathematics, Computer Science
- ITA
- 2018

The main algorithmic result of this work is the first conditional independence tester with sublinear sample complexity for discrete distributions over[n], and a general theory providing tight variance bounds for all such estimators is developed. Expand