Identification and Semiparametric Efficiency Theory of Nonignorable Missing Data with a Shadow Variable
We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome but independent of the ...
Highlights
Problem statement
Missingness not at random (MNAR) arises in many empirical studies in biomedical, socioeconomic, and epidemiological research. A fundamental problem of MNAR is identification, that is, the parameter of interest ...
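As a brief illustration (our notation, not taken from the paper): writing Y for the outcome, X for fully observed covariates, R for the missingness indicator (R = 1 when Y is observed), and Z for the shadow variable, the shadow variable assumption commonly used in this literature can be stated as

\[
Z \not\perp Y \mid (X, R = 1)
\qquad\text{and}\qquad
Z \perp R \mid (X, Y),
\]

that is, Z is associated with the outcome among the observed cases but carries no further information about missingness once the outcome and covariates are accounted for.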
Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression
We study a localized notion of uniform convergence known as an “optimistic rate” [34, 39] for linear regression with Gaussian data. Our refined analysis avoids the hidden constant and logarithmic factor in existing results, which are known to be crucial ...
Highlights
Problem Statement
Generalization theory proposes to explain the ability of machine learning models to generalize to fresh examples by bounding the gap between the test error (error on new examples) and training error (error on the data they ...
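As a schematic sketch (our simplification; constants, logarithmic factors, and the paper's refinements are omitted), an optimistic rate bounds the test error of a predictor h in terms of its training error and a complexity term, roughly

\[
L_{\mathrm{test}}(h) \;\lesssim\; \left(\sqrt{L_{\mathrm{train}}(h)} \;+\; \sqrt{\frac{\mathrm{comp}(\mathcal{H})}{n}}\,\right)^{2},
\]

where n is the sample size and comp(H) is a complexity measure of the predictor class. When the training error is zero, as in interpolation learning, the bound collapses to the faster comp(H)/n rate, which is why hidden constants and logarithmic factors multiplying this term matter.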
Language Models in the Loop: Incorporating Prompting into Weak Supervision
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Rather than apply the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions ...
Highlights
Problem statement
The goal of this paper is to use large language models to create smaller, specialized models. These specialized models can be better suited to specific tasks because they are tuned for them and are less expensive to serve in ...
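A minimal sketch of the general idea (hypothetical names such as llm and lf_spam_prompt; this is not the paper's code or any particular library's API): a prompt to a language model is wrapped as a labeling function whose noisy votes a weak-supervision label model can later combine.

```python
# Hypothetical sketch: wrapping a prompted language model as a labeling function
# for a weak-supervision pipeline. `llm` is a stand-in stub, not a real model call.

ABSTAIN, SPAM, HAM = -1, 1, 0

def llm(prompt: str) -> str:
    """Stand-in for a large language model; replace with a real model call."""
    return "yes" if "free money" in prompt.lower() else "no"

def lf_spam_prompt(text: str) -> int:
    """Labeling function: ask the model a yes/no question and map the answer to a label."""
    answer = llm(
        f"Does the following message look like spam? Answer yes or no.\n\n{text}"
    ).strip().lower()
    if answer.startswith("yes"):
        return SPAM
    if answer.startswith("no"):
        return HAM
    return ABSTAIN  # unparseable answers abstain, as is standard in weak supervision

if __name__ == "__main__":
    votes = [lf_spam_prompt(t) for t in ["Claim your free money now!", "Meeting at 3pm?"]]
    print(votes)  # noisy votes would then be aggregated by a label model
```

In practice, many such prompted labeling functions would be aggregated by a label model and used to train the smaller, specialized end model.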
Principal Component Networks: Utilizing Low-Rank Activation Structure to Reduce Parameters Early in Training
Recent works show that overparameterized neural networks contain small subnetworks that exhibit comparable accuracy to the full model when trained in isolation. These results highlight the potential to reduce the computational costs of deep neural network ...
Highlights
Problem Statement
Many recent results show that large neural networks can lead to improved generalization. Yet, training these large models comes with increased computational costs. In an effort to address this issue, several works have shown ...
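A minimal NumPy sketch of the underlying low-rank-activation idea (our illustration, not the paper's Principal Component Network construction): if a layer's activations lie close to a low-dimensional subspace, the next layer's weights can be re-parameterized in that subspace, cutting the parameter count from d x k to r x k.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic activations that are exactly rank 64 (real activations are only
# approximately low rank, so the projection would then be an approximation).
H = rng.normal(size=(1024, 64)) @ rng.normal(size=(64, 512))   # n x d activations
W = rng.normal(size=(512, 256)) * 0.05                          # d x k next-layer weights

# Principal directions of the activations via SVD.
U, S, Vt = np.linalg.svd(H, full_matrices=False)
r = 64                            # number of retained components (assumed rank)
P = Vt[:r].T                      # d x r projection onto the top components

W_reduced = P.T @ W               # r x k: the only weights that need to be kept
out_full = H @ W
out_low_rank = (H @ P) @ W_reduced  # same computation routed through the subspace
print(np.linalg.norm(out_full - out_low_rank) / np.linalg.norm(out_full))
```

With r well below d, both the parameter count and the matrix-multiply cost of the layer shrink accordingly.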