# Study Linear Regression in Detail with Research Papers Part 1 (Artificial Intelligence) | by Monodeep Mukherjee | October 2022

1. Adaptive greedy forward variable selection for linear regression models with incomplete data using multiple imputation **(arXiv)**

**Author :** Yong Shiuan Lee

**Summary :** Variable selection is crucial for parsimonious modeling in the age of Big Data. Missing values are common in data and complicate variable selection. The multiple imputation (MI) approach produces multiple imputed datasets for the missing values and has been widely applied in various variable selection procedures. However, performing variable selection directly on the whole MI data or on bootstrapped MI data may not be worthwhile in terms of computational cost. To quickly identify the active variables in the linear regression model, we propose the adaptive grafting procedure with three pooling rules on the MI data. The proposed methods proceed iteratively: they first find the active variables based on the complete-case subset, and then expand the working data matrix in both the number of active variables and the number of available observations. A comprehensive simulation study shows the selection accuracy in different aspects and the computational efficiency of the proposed methods. Two real-data examples illustrate the strength of the proposed methods.
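To make the "start from the complete cases" step more concrete, here is a minimal Python sketch. It is not the paper's adaptive grafting procedure: plain greedy forward selection by residual sum of squares stands in for grafting, and no imputation or pooling is performed. The function name `forward_select_complete_cases`, the stopping threshold `tol`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def forward_select_complete_cases(X, y, max_vars=4, tol=0.01):
    """Hypothetical helper: greedy forward selection on fully observed rows.

    At each step, add the column that most reduces the residual sum of
    squares (RSS); stop when the relative gain falls below `tol`.
    """
    complete = ~np.isnan(X).any(axis=1)        # rows with no missing covariate
    Xc = X[complete] - X[complete].mean(axis=0)
    yc = y[complete] - y[complete].mean()
    active = []
    total = float(yc @ yc)                     # RSS of the empty model
    rss = total
    while len(active) < max_vars:
        best_j, best_rss = None, rss
        for j in range(Xc.shape[1]):
            if j in active:
                continue
            cols = active + [j]
            coef, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
            resid = yc - Xc[:, cols] @ coef
            candidate_rss = float(resid @ resid)
            if candidate_rss < best_rss:
                best_j, best_rss = j, candidate_rss
        if best_j is None or (rss - best_rss) / total < tol:
            break                              # no candidate helps enough
        active.append(best_j)
        rss = best_rss
    return active

# Simulated data (an assumption): only columns 0 and 3 are truly active.
rng = np.random.default_rng(0)
n, p = 400, 6
X_full = rng.normal(size=(n, p))
beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = X_full @ beta + rng.normal(scale=0.5, size=n)

# Knock out roughly 10% of the covariate entries at random.
X = X_full.copy()
X[rng.random(size=X.shape) < 0.1] = np.nan

selected = forward_select_complete_cases(X, y)
print("selected columns:", selected)
```

With about 10% of entries missing in six columns, only around half of the rows are complete, which is why the abstract describes expanding the working data once the active set is known: rows need to be complete only on the selected variables.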

2. Quantum Communication Complexity of Linear Regression **(arXiv)**

**Author :** Ashley Montanaro, Changpeng Shao

**Summary :** Dequantized algorithms show that quantum computers do not have exponential speedups for many linear algebra problems in terms of time and query complexity. In this work, we show that quantum computers can have exponential speedups in communication complexity for some fundamental linear algebra problems. We mainly focus on solving linear regression and on Hamiltonian simulation. In the quantum case, the task is to prepare the quantum state of the result. To allow a fair comparison, in the classical case, the task is to sample from the result. We investigate these two problems in bipartite and multipartite models, propose near-optimal quantum protocols, and prove quantum/classical lower bounds. In this process, we propose an efficient quantum protocol for quantum singular value transformation, which is a powerful technique for designing quantum algorithms. As a result, for many linear algebra problems where quantum computers lose exponential speedups in terms of time and query complexity, it is possible to have exponential speedups in communication complexity.

3. A note on centering in subsample selection for linear regression **(arXiv)**

**Author :** Hai Ying Wang

**Summary :** Centering is a common technique used in linear regression analysis. With the responses and covariates centered, the ordinary least squares estimator of the slope parameter can be calculated from a model without the intercept. If a subsample is selected from centered full data, the subsample is typically not centered. In this case, is it still appropriate to fit a model without the intercept? The answer is yes, and we show that the least squares estimator of the slope parameter obtained from a model without the intercept is unbiased and has a smaller variance-covariance matrix in the Loewner order than that obtained from a model with the intercept. We further show that for non-informative weighted subsampling, when a weighted least squares estimator is used, using the weighted means of the full data to shift the subsample improves estimation efficiency.
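The centering facts in this summary can be checked numerically. The Python sketch below uses simulated data (the sample sizes, coefficients, and noise level are assumptions, not values from the paper). It first confirms that, on the full data, the no-intercept fit on centered variables gives the same slope as the fit with an intercept. It then draws a uniform subsample from the centered data and fits it both with and without an intercept. It does not reproduce the paper's variance comparison or its weighted subsampling result.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ beta + rng.normal(scale=0.1, size=n)

# Full data, raw scale: fit with an explicit intercept column.
X1 = np.column_stack([np.ones(n), X])
coef_full, *_ = np.linalg.lstsq(X1, y, rcond=None)
slope_with_intercept = coef_full[1:]

# Full data, centered: fit without an intercept.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
slope_centered, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# The two slope estimates agree up to floating-point error.
print(np.allclose(slope_with_intercept, slope_centered))

# A uniform subsample of the centered data is generally not centered itself.
idx = rng.choice(n, size=50, replace=False)
print("subsample covariate means:", Xc[idx].mean(axis=0))

# Subsample fit without an intercept (the choice the paper argues for).
slope_sub_no_intercept, *_ = np.linalg.lstsq(Xc[idx], yc[idx], rcond=None)

# Subsample fit with an intercept, for comparison.
sub1 = np.column_stack([np.ones(idx.size), Xc[idx]])
coef_sub, *_ = np.linalg.lstsq(sub1, yc[idx], rcond=None)
slope_sub_with_intercept = coef_sub[1:]

print("no intercept:  ", slope_sub_no_intercept)
print("with intercept:", slope_sub_with_intercept)
```

A single draw only shows that both subsample estimates land near the true slope. Seeing the variance ordering described in the summary would require repeating the subsampling many times and comparing the empirical covariance matrices of the two estimators.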