# The Sparse Hausdorff Moment Problem, with Application to Topic Models

### Abstract

We consider the problem of identifying, from its first m noisy moments, a probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent to the problem of learning a distribution on m observable binary random variables $X_1,X_2,\ldots,X_m$ that are iid conditional on a hidden random variable $U$ taking values in ${1,2,\ldots,k}$. Our focus is on accomplishing this with $m=2k$, which is the minimum $m$ for which verifying that the source is a $k$-mixture is possible (even with exact statistics). This problem, so simply stated, is quite useful: e.g., by a known reduction, any algorithm for it lifts to an algorithm for learning pure topic models.

We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables using a sample of size $(1/w_{\min})2 \cdot (1/\zeta)\mathcal{O}(k)$ and post-sampling runtime of only $\mathcal{O}(k^2+o(1))$ arithmetic operations. Here $w_\min$ is the minimum probability of an outcome of $U$, and $\zeta$ is the minimum separation between the distinct success probabilities of the $X_i$s. Stated in terms of the moment problem, it suffices to know the moments to additive accuracy $w_{\min} \cdot \zeta \mathcal{O}(k)$. It is known that the sample complexity of any solution to the identification problem must be at least exponential in $k$. Previous results demonstrated either worse sample complexity and worse $\mathcal{O}(kc)$ runtime for some $c$ substantially larger than 2, or similar sample complexity and much worse $k \mathcal{O}(k^2)$ runtime.

##### Bijan Mazaheri
###### Ph.D Candidate in Computing and Mathematical Sciences

I am a Ph.D candidate at Caltech. My interests include mixture models, high level data fusion, and stability to distribution shift - usually through the lense of causality.