What is dimensionality reduction?
Machine learning algorithms have gained fame for their ability to extract relevant information from datasets with many features, such as tables with dozens of columns and images with millions of pixels. Thanks to advances in cloud computing, you can often run very large machine learning models without noticing how much computing power is working in the background.
But every new feature you add to your problem adds to its complexity, making it harder to solve with machine learning algorithms. To deal with this, data scientists use dimensionality reduction, a set of techniques that remove redundant and irrelevant features from their machine learning models.
Dimensionality reduction cuts the costs of machine learning and can sometimes make it possible to solve complicated problems with simpler models.
The curse of dimensionality
Machine learning models map features to outcomes. For example, suppose you want to create a model that predicts the amount of precipitation in a month. You have a dataset of different information collected from different cities in separate months. The data points include temperature, humidity, city population, traffic, number of concerts held in the city, wind speed, wind direction, air pressure, number of bus tickets purchased, and the amount of precipitation. Obviously, not all of this information is relevant to predicting precipitation.
Some of the features might have nothing to do with the target variable. For instance, the population and the number of bus tickets purchased obviously do not affect precipitation. Other features might be correlated with the target variable without having a causal relationship with it. For example, the number of outdoor concerts might be correlated with the amount of precipitation, but it is not a good predictor of rain. In other cases, such as carbon emissions, there might be a relationship between the feature and the target variable, but the effect will be negligible.
In this example, it is obvious which features are useful and which are not. In other problems, the redundant features might not be obvious and require further analysis of the data.
But why bother removing the extra dimensions? When you have too many features, you also need a more complex model, which in turn means you need much more training data and more computing power to train your model to an acceptable level.
And because machine learning has no understanding of causality, models try to map whatever features are included in their dataset to the target variable, even when there is no causal relationship. This can lead to inaccurate and erroneous models.
On the other hand, reducing the number of features can make your machine learning model simpler, more efficient, and less data-hungry.
Problems caused by too many features are often referred to as the “curse of dimensionality,” and they are not limited to tabular data. Consider a machine learning model that classifies images. If your dataset is composed of 100×100-pixel images, then your problem space has 10,000 features, one per pixel. However, even in image classification problems, some of the features are redundant and can be removed.
Dimensionality reduction identifies and removes the features that hurt the machine learning model’s performance or do not contribute to its accuracy. There are several dimensionality reduction techniques, each of which is useful in certain situations.
Feature selection
A basic and very efficient dimensionality reduction method is to identify and select a subset of the features that are most relevant to the target variable. This technique is called “feature selection.” Feature selection is especially effective when you are dealing with tabular data in which each column represents a specific kind of information.
When doing feature selection, data scientists do two things: keep the features that are strongly correlated with the target variable and that contribute the most to the dataset’s variance. Libraries such as Python’s Scikit-learn have plenty of good functions for analyzing, visualizing, and selecting the right features for machine learning models.
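As a minimal sketch of what this looks like in code, Scikit-learn’s SelectKBest can score every column against the target and keep the top performers. The synthetic dataset and the choice of k=7 below are illustrative assumptions, not values from a real weather dataset:

```python
# Minimal feature-selection sketch with Scikit-learn.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for a tabular dataset: 25 features, only 7 informative.
X, y = make_regression(n_samples=500, n_features=25, n_informative=7, random_state=0)

selector = SelectKBest(score_func=f_regression, k=7)  # keep the 7 best-scoring features
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)         # (500, 7)
print(selector.get_support())  # boolean mask showing which columns were kept
```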
For example, a data scientist can use scatter plots and heatmaps to visualize the covariance of different features. If two features are strongly correlated with each other, then they will have a similar effect on the target variable, and including both in the machine learning model would be redundant. In that case, you can remove one of them without negatively impacting the model’s performance.
The same tools can help visualize the correlations between the features and the target variable. This helps you remove variables that do not affect the target. For instance, you might find out that out of 25 features in your dataset, seven account for 95 percent of the effect on the target variable. This allows you to shave off 18 features and make your machine learning model much simpler without suffering a significant penalty to its accuracy.
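Here is a sketch of this kind of analysis with pandas and seaborn. The weather-style DataFrame below is a hypothetical stand-in; in practice you would load your own data:

```python
# Sketch: visualize feature correlations and rank features against the target.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the weather table from the example above.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(15, 8, 200),
    "humidity": rng.uniform(20, 100, 200),
    "wind_speed": rng.exponential(10, 200),
})
df["precipitation"] = 0.8 * df["humidity"] + rng.normal(0, 5, 200)

corr = df.corr()  # pairwise correlation matrix
sns.heatmap(corr, annot=True, cmap="coolwarm", center=0)
plt.show()

# Rank features by absolute correlation with the target, strongest first.
print(corr["precipitation"].drop("precipitation").abs().sort_values(ascending=False))
```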
Projection techniques
Sometimes you don’t have the option of removing individual features. But that doesn’t mean you can’t simplify your machine learning model. Projection techniques, also known as “feature extraction,” simplify a model by compressing several features into a lower-dimensional space.
A common example used to illustrate projection techniques is the “Swiss roll” (pictured below), a set of data points that swirl around a focal point in three dimensions. This dataset has three features. The value of each point (the target variable) is measured based on how far it is along the convoluted path from the center of the Swiss roll. In the image below, red points are closer to the center and yellow points are farther along the roll.
In its current state, creating a machine learning model that maps the features of the Swiss roll points to their value is a difficult task and would require a complex model with many parameters. But with the help of dimensionality reduction techniques, the points can be projected onto a lower-dimensional space that can be learned with a simple machine learning model.
There are various projection techniques. In the example above, we used “locally linear embedding” (LLE), an algorithm that reduces the dimensionality of the problem space while preserving the key elements that separate the values of the data points. When our data is processed with LLE, the result looks like the following image, which is like an unrolled version of the Swiss roll. As you can see, points of each color remain together. In fact, this problem can be further simplified into a single feature and modeled with linear regression, the simplest machine learning algorithm.
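This example can be reproduced in a few lines with Scikit-learn. The hyperparameters below, such as n_neighbors, are illustrative choices:

```python
# Sketch: unroll the Swiss roll with locally linear embedding (LLE).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3-D Swiss roll; t is each point's position along the winding path.
X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Project the 3 features down to 2 while preserving local neighborhoods.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=42)
X_unrolled = lle.fit_transform(X)

print(X.shape, "->", X_unrolled.shape)  # (1000, 3) -> (1000, 2)
```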
While this example is hypothetical, you will often face problems that can be simplified by projecting the features onto a lower-dimensional space. For instance, “principal component analysis” (PCA), a popular dimensionality reduction algorithm, has found many useful applications in simplifying machine learning problems.
In the excellent book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, data scientist Aurélien Géron shows how you can use PCA to reduce the MNIST dataset from 784 features (28×28 pixels) to about 150 features while preserving 95 percent of the variance. Such a level of dimensionality reduction has a huge impact on the costs of training and running artificial neural networks.
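A sketch of that experiment with Scikit-learn follows. Passing a float to PCA’s n_components asks it to keep the smallest number of components that preserve that fraction of the variance; the exact count it picks may differ slightly from run to run of the book’s figure:

```python
# Sketch: PCA keeping the fewest components that explain 95% of the variance.
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

# Download MNIST: 70,000 images, each flattened to 784 pixel features.
X, _ = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

pca = PCA(n_components=0.95)  # a float means "preserve 95% of the variance"
X_reduced = pca.fit_transform(X)

print(X.shape[1], "->", pca.n_components_)  # 784 -> roughly 150 components
```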
There are a few caveats to consider regarding projection techniques. Once you develop a projection, you must transform new data points into the lower-dimensional space before running them through your machine learning model. However, the costs of this preprocessing step are small compared to the benefits of having a lighter model. A second consideration is that the transformed data points no longer directly represent their original features, and converting them back to the original space can be tricky and, in some cases, impossible. This can make it difficult to interpret the inferences made by your model.
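Continuing the PCA sketch above, both caveats show up directly in code: new samples must pass through the already-fitted transform, and the trip back is only approximate. The batch X_new here is a hypothetical placeholder for incoming data:

```python
import numpy as np

# Hypothetical batch of new samples with the same 784 features as MNIST.
X_new = np.random.rand(5, 784)

# New data must be projected with the already-fitted PCA before inference.
X_new_reduced = pca.transform(X_new)

# Reversing the projection is only approximate: the discarded 5% of variance is gone.
X_new_approx = pca.inverse_transform(X_new_reduced)
print(X_new.shape, "->", X_new_reduced.shape, "->", X_new_approx.shape)
```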
Dimensionality reduction in the machine learning toolbox
Having too many features makes your model inefficient. But cutting out too many features won’t help either. Dimensionality reduction is one among the many tools data scientists can use to create better machine learning models. And as with every tool, it should be used with caution and care.