Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

Spencer Gordon, Eric Jahn, Bijan Mazaheri, Yuval Rabani, Leonard Schulman

June 2024

Abstract

We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_{1}, \dots, X_{n}$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^{2} \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n \geq 2k - 1$ is necessary and sufficient for identification. We show, for any $n \geq 2k - 1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition and (b) a novel way of bounding the condition number of key matrices, called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
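To make the setting concrete, the sketch below simulates a mixture of $k$ product distributions and computes the statistics an identification algorithm would work from. It is not the authors' algorithm. Several simplifying assumptions are made: the variables are binary (the abstract allows general discrete variables); the statistics used are the multilinear moments $\mathbb{E}[\prod_{i \in S} X_i]$ over subsets $S$; and the helper `hadamard_extension` (a hypothetical name) builds the matrix whose row for subset $S$ has entries $\prod_{i \in S} p_{j,i}$, which is our reading of what a Hadamard extension of the $k \times n$ parameter matrix looks like. Under these assumptions the observable moments are exactly that matrix applied to the mixing weights, so its condition number governs how sampling error propagates into the recovered parameters.

```python
import itertools

import numpy as np


def hadamard_extension(P):
    """Hypothetical helper (assumed definition): for a k x n matrix P of
    per-component Bernoulli means, return the list of subsets S of the n
    coordinates and a (2^n x k) matrix H with H[S, j] = prod_{i in S} P[j, i].
    The empty subset yields a row of ones."""
    k, n = P.shape
    subsets, rows = [], []
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            subsets.append(S)
            # Product over the chosen columns; empty selection gives ones.
            rows.append(np.prod(P[:, list(S)], axis=1))
    return subsets, np.array(rows)


def sample_mixture(w, P, num_samples, rng):
    """Draw samples from a mixture of k product distributions over {0,1}^n.
    A hidden component j is chosen with probability w[j]; then each X_i is
    an independent Bernoulli(P[j, i])."""
    k, n = P.shape
    z = rng.choice(k, size=num_samples, p=w)          # hidden labels
    return (rng.random((num_samples, n)) < P[z]).astype(float)


def empirical_moments(X, subsets):
    """Estimate E[prod_{i in S} X_i] for every subset S from the samples."""
    return np.array([X[:, list(S)].prod(axis=1).mean() for S in subsets])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 2
    n = 2 * k - 1                                      # minimal n for identification
    w = np.array([0.4, 0.6])                           # mixing weights
    P = np.array([[0.2, 0.7, 0.5],
                  [0.8, 0.3, 0.6]])                    # component means

    subsets, H = hadamard_extension(P)
    exact = H @ w                                      # moments are linear in w
    X = sample_mixture(w, P, 200_000, rng)
    emp = empirical_moments(X, subsets)

    print("max |exact - empirical|:", np.max(np.abs(exact - emp)))
    print("condition number of H:", np.linalg.cond(H))
```

With $n = 2k - 1 = 3$ the matrix has $2^3 = 8$ rows and $k = 2$ columns; the empirical moments should agree with `H @ w` up to sampling error of order $1/\sqrt{N}$.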

Type: Conference paper

Publication: In the 37th Annual Conference on Learning Theory
