Comparison of three feature selection strategies: improving your machine learning models is not a problem

What is feature selection? Any problem comes with plenty of irrelevant detail, so we need to identify its key features in order to model it clearly. Data problems are no different: some of the available features are redundant or irrelevant. Feature selection is the research field that tries to pick out the important features algorithmically.

Why not throw all the features directly into the machine learning model?

Real-world problems do not come packaged as tidy open-source datasets, and the data you collect is not limited to information relevant to the problem. Feature selection helps you maximize feature relevance while reducing irrelevance, which increases the likelihood of building a better model and reduces the overall size of the model.

Feature selection methods at a high level

For example, suppose we want to predict ticket sales for a water park. To do so, we decide to look at weather data, ice cream sales, coffee sales, and the season.

From the table below we can see that tickets sell clearly better in summer than in the other seasons, and that they do not sell well in winter. Coffee sales are stable throughout the year. Ice cream sells all year round, but its peak is in June.

Table 1: The fictional data used in the text.

Figure 1: Graphical comparison of various fictional data.

We want to predict water park ticket sales, but we probably don't need all of the data to get the best results. The data has N dimensions, and some subset of K of them gives the best results. The trouble is that there are a great many possible subsets of different sizes.

Our goal is to reduce the number of dimensions without losing predictive power. Let's take a step back and look at the tools we can use.

Exhaustive search

This technique is guaranteed to find the best possible set of features for building a model. It is reliable because it searches through every possible combination of features and keeps the combination that gives the lowest model error.

In our example there are 15 possible feature combinations to search, which I calculated with the formula 2^n - 1 (here n = 4). This method works when the number of features is small, but it is not feasible if you have 3000 features.
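
As a minimal sketch of the idea, the snippet below enumerates every non-empty subset of four features and confirms the 2^n - 1 count. The feature names are hypothetical labels for the four inputs of the example, and score_fn is an assumed placeholder for "train a model on this subset and return its validation error"; it is not something the article defines.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of the given feature names."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            yield subset


def exhaustive_search(features, score_fn):
    """Return the subset with the lowest score (e.g. validation error).

    score_fn is a placeholder: in practice it would train and validate
    a model using only the features in the subset it receives.
    """
    return min(all_feature_subsets(features), key=score_fn)


# Hypothetical names standing in for the four inputs of the example.
FEATURES = ["temperature", "ice_cream_sales", "coffee_sales", "season"]

subsets = list(all_feature_subsets(FEATURES))
print(len(subsets))  # 2**4 - 1 = 15
```

The count doubles with every added feature, which is why the same loop is hopeless for thousands of features.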

Fortunately, there is a slightly better approach.

Random feature selection

In most cases, random feature selection works well. If you want to reduce the number of features by 50%, simply pick 50% of the features at random and delete them.

After training the model, validate its performance and repeat the process until you are satisfied. Unfortunately, this is still a brute-force approach.
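
The loop described above could look roughly like the following sketch. The function name, its parameters (keep_fraction, max_rounds, target_error, seed) and the toy_error scoring function are all assumptions made for illustration; toy_error in particular is a made-up stand-in for training and validating a real model.

```python
import random


def random_feature_selection(features, score_fn, keep_fraction=0.5,
                             max_rounds=20, target_error=None, seed=0):
    """Repeatedly keep a random fraction of the features and score it.

    Returns the best (subset, error) pair seen. Stops early if
    target_error is given and reached.
    """
    rng = random.Random(seed)
    n_keep = max(1, round(len(features) * keep_fraction))
    best_subset, best_error = None, float("inf")
    for _ in range(max_rounds):
        # Keep n_keep randomly chosen features, drop the rest.
        subset = tuple(sorted(rng.sample(features, n_keep)))
        error = score_fn(subset)
        if error < best_error:
            best_subset, best_error = subset, error
        if target_error is not None and best_error <= target_error:
            break
    return best_subset, best_error


# Hypothetical feature names and a made-up error function.
FEATURES = ["temperature", "ice_cream_sales", "coffee_sales", "season"]


def toy_error(subset):
    # Placeholder for "train on this subset and return validation error".
    return 0.1 if "ice_cream_sales" in subset else 1.0


best_subset, best_error = random_feature_selection(FEATURES, toy_error)
print(best_subset, best_error)
```

Nothing guides the sampling, so the number of rounds needed grows quickly with the size of the feature set.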

What should you do when you have to deal with a large feature set that cannot be scaled down this way?

Minimum redundancy maximum relevance (mRMR) feature selection

Combining all of these ideas gives us our algorithm: mRMR feature selection. The idea behind the algorithm is to minimize the redundancy among features while maximizing their relevance to the target. To do that, we need to compute both a redundancy term and a relevance term.
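
One common way to write the two quantities is the mutual-information formulation of mRMR, sketched below. The notation is assumed here rather than taken from the article: S is the set of selected features, c is the target, and I(x; y) is the mutual information between x and y.

```latex
\begin{aligned}
\text{relevance:}\quad  D(S, c) &= \frac{1}{|S|} \sum_{f_i \in S} I(f_i; c) \\
\text{redundancy:}\quad R(S)    &= \frac{1}{|S|^2} \sum_{f_i, f_j \in S} I(f_i; f_j) \\
\text{mRMR:}\quad &\max_{S}\; \bigl[\, D(S, c) - R(S) \,\bigr]
\end{aligned}
```

A feature set scores well when its members each say a lot about the target (large D) while saying little about one another (small R).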

Let's write a quick script that runs mRMR on the fictional data.
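
A script along these lines might look like the sketch below. It is not the article's script: it assumes a greedy variant of mRMR that scores each candidate by its mutual information with the target minus its mean mutual information with the features already chosen, and the monthly values are invented placeholder levels (0 = low, 1 = medium, 2 = high), not the contents of Table 1.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) of two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedily pick k feature names: relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Invented placeholder data: one discretized level per month, Jan to Dec.
toy_target = [0, 0, 0, 1, 1, 2, 2, 2, 1, 0, 0, 0]  # ticket sales level
toy_features = {
    "temperature":     [0, 0, 1, 1, 2, 2, 2, 2, 1, 1, 0, 0],
    "ice_cream_sales": [0, 0, 0, 1, 1, 2, 2, 2, 1, 1, 0, 0],
    "coffee_sales":    [1] * 12,  # flat all year
    "season":          [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 0],
}

print(mrmr(toy_features, toy_target, k=2))
```

Real-valued columns would first need to be discretized, or the mutual information replaced by an estimator suited to continuous data.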

I had no particular expectations for the results. Ice cream sales appear to model ticket sales accurately, while temperature does not. In this example it seems that a single variable is enough to model ticket sales accurately, but that is certainly not the case in real problems.
