
The central limit theorem (CLT) is a fundamental result in probability and statistics, stating that the average of many independent random variables is approximately Gaussian. The CLT underpins numerous widely used data analysis methods for estimation, hypothesis testing, constructing confidence intervals, and uncertainty assessment. However, the accuracy of the CLT approximation may degrade significantly in high-dimensional data problems. To address this challenge, a growing body of literature has recently emerged that aims to develop CLT bounds supporting valid statistical inference in high dimensions. In this talk, I will introduce a novel and near-rate-optimal CLT for hyper-rectangles that holds under minimal conditions. As an application, I will examine ordinary least squares regression in high-dimensional and model-free settings commonly encountered in data science. I will present bounds for the Gaussian approximation error of the ordinary least squares estimator, yielding practical confidence sets with guaranteed coverage and accuracy. Our results highlight the dependence on the dimensionality and other characteristics of the data-generating distribution, enabling efficient high-dimensional inference with off-the-shelf methods.

Joint work with Heejong Bong, Arun Kumar Kuchibhotla, and Larry Wasserman.

Bio: I am a Professor in the Department of Statistics and Data Science at Carnegie Mellon University. I received my PhD in Statistics at Carnegie Mellon in 2005 under the supervision of Stephen E. Fienberg and never left. My research interests revolve mainly around the theoretical properties of statistical and machine learning models for high-dimensional data under various structural assumptions, such as sparsity or intrinsic low dimensionality. In my research I have investigated a broad range of issues related to the feasibility of statistical inference in a variety of problems, including high-dimensional regression, time series, privacy, categorical data analysis and graphical modeling, statistical network analysis, algebraic statistics, density-based clustering, topological and geometric data analysis, and change-point detection. Though predominantly theoretical in nature, my research is motivated by highly practical problems in data science and has direct methodological implications, as it aims to derive theoretical guarantees in support of existing methods and algorithms.
