# How to Choose ML Algorithms for Regression Problems?

There’s this buzz everywhere – Machine Learning!

So, what is this “Machine Learning(ML)?”

Let’s consider a practical example. Imagine trying to judge the outcome of a task you are doing for the first time — say, learning to drive a car. How would you assess yourself? With uncertainty?

On the other hand, how would you rate yourself at the same task after a couple of years of practice? Your mindset would probably have shifted from uncertainty to something far more certain. So, how did you gain that expertise?

Most likely, you gained experience by tweaking some parameters, and your performance improved. Right? That, in essence, is machine learning.

A computer program is said to learn from experience (E) with respect to some task (T) if its performance (P) at that task improves with experience.

In the same vein, machines learn through mathematical models, and all data ultimately reaches them as 0s and 1s. As a result, we don’t code the logic for our program; instead, we want the machine to figure out the logic from the data on its own.

Furthermore, if you want to find the relationship between experience, job level, rare skills, and salary, you need to teach a machine learning algorithm from examples.

In such a case study, you tweak the features to predict the labels. You do not code the algorithm itself; your focus should be on the data.

Therefore, the concept is **Data + Algorithm = Insights**. The algorithms have already been developed for us; we only need to know which one to use for our problem. Let’s take a look at regression problems and the best way to choose an algorithm.

## The Machine Learning Overview

According to Andreybu, a German scientist with more than five years of machine learning experience, “If you can understand whether the machine learning task is a regression or classification problem, then choosing the right algorithm is a piece of cake.”

To enumerate, the main difference between them is that the output variable in regression is numerical (or continuous), whereas the output in classification is categorical (or discrete).

### Regression in Machine Learning

To start with, regression algorithms attempt to estimate the mapping function (f) from the input variables (x) to a numerical or continuous output variable (y). The output could be a real value — either an integer or a floating-point value. Therefore, regression predictions are usually quantities or sizes.

For example, if you are provided with a dataset about houses, and you are asked to predict their prices, that is a regression task because the price will be a continuous output.

Examples of the common regression algorithms include linear regression, Support Vector Regression (SVR), and regression trees.
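To make the house-price example concrete, here is a minimal sketch of simple linear regression fit with the closed-form least-squares solution. The data points are hypothetical toy values, not from any real dataset.

```python
# Fit y = a*x + b by ordinary least squares (pure Python, toy data).

def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    a = num / den
    b = mean_y - a * mean_x
    return a, b

# Hypothetical houses: size in square metres -> price in $1000s
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

a, b = fit_linear(sizes, prices)
print(a, b)  # this toy data fits exactly: a = 3.0, b = 0.0
```

Given the fitted line, a prediction for a new house is simply `a * size + b` — a continuous number, which is what makes this a regression task.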

### Classification in Machine Learning

By contrast, in classification, y is a category that the mapping function predicts. For single or multiple input variables, a classification model attempts to predict one or more discrete class labels.

For instance, given the same dataset about houses, a classification algorithm could try to predict whether each house “sells for more or less than the recommended retail price.” Here there are two discrete categories: above or below the said price.

Examples of the common classification algorithms include logistic regression, Naïve Bayes, decision trees, and K Nearest Neighbors.
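As a tiny illustration of one listed algorithm, here is a K Nearest Neighbors sketch with k = 1, predicting the above/below-price category from hypothetical house features (the feature names and values are invented for the example):

```python
# 1-nearest-neighbor classifier: predict the label of the closest
# training point (pure Python, toy data).

def nearest_neighbor(train, labels, query):
    # squared Euclidean distance to every training point
    dists = [sum((a - b) ** 2 for a, b in zip(p, query)) for p in train]
    return labels[dists.index(min(dists))]

# Hypothetical features: (size_m2, age_years) -> above/below asking price
train = [(60, 30), (65, 25), (120, 5), (130, 2)]
labels = ["below", "below", "above", "above"]

print(nearest_neighbor(train, labels, (125, 4)))  # "above"
```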

## Choosing the Right Algorithms

### Understand Your Data

- Take a look at the summary statistics
- Look at percentiles to identify the range of the data
- Averages and medians describe the central tendency
- Correlations can indicate strong relationships
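The steps above can be tried directly with Python’s standard library; the price figures here are hypothetical:

```python
# Quick summary statistics on toy price data (in $1000s).
import statistics

prices = [150, 210, 270, 330, 900]  # note the extreme value at the end

print(statistics.mean(prices))    # 372.0 — pulled up by the outlier
print(statistics.median(prices))  # 270 — a more robust central tendency
print(statistics.quantiles(prices, n=4))  # quartiles reveal the spread
```

Comparing the mean against the median is a cheap first check for skew or outliers before any modeling starts.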

### Visualize the Data

- Box plots can reveal outliers
- Density plots and histograms show the spread of data
- Scatter plots can describe quantity relationships
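In practice you would reach for a plotting library, but even a quick text histogram shows the spread of the data. The values below are hypothetical:

```python
# A minimal text histogram: one '#' per occurrence of each value.
from collections import Counter

values = [1, 2, 2, 3, 3, 3, 4, 4, 9]
counts = Counter(values)
for v in sorted(counts):
    print(f"{v:>2} | {'#' * counts[v]}")
```

The lone bar far from the others (at 9) hints at an outlier worth investigating during cleaning.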

### Clean the Data

- Deal with missing values. Models can be sensitive to them, and missing data for certain variables can result in inaccurate predictions
- Although tree models are less sensitive to outliers, regression models and other models based on equations are more sensitive to them
- Outliers could be the result of bad data collection, or they could be legitimate extreme values
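The two cleaning steps above — imputing missing values and flagging outliers — can be sketched like this, using median imputation and the common interquartile-range (IQR) rule on hypothetical readings:

```python
# Fill missing values with the median, then flag IQR outliers.
import statistics

raw = [12.0, 14.0, None, 13.0, 15.0, 14.0, 98.0]

# 1) impute missing values with the median of the observed values
observed = [v for v in raw if v is not None]
median = statistics.median(observed)
filled = [median if v is None else v for v in raw]

# 2) flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
outliers = [v for v in filled if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
print(outliers)  # the extreme 98.0 stands out
```

Whether to drop or keep a flagged point then depends on the judgment call above: bad data collection versus a legitimate extreme value.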

### Curate the Data

Furthermore, while converting the raw data into a polished form the models can consume, one must take care of the following:

- Make the data easier to interpret.
- Capture more complex data.
- Focus on reducing data redundancy and dimensionality.
- Normalize the variable values.
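The last step, normalizing variable values, is often done with min-max scaling, which maps each value into [0, 1]. The salary figures below are hypothetical:

```python
# Min-max normalization: rescale every value into the [0, 1] range.

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

salaries = [30000, 45000, 60000, 90000]
print(min_max(salaries))  # [0.0, 0.25, 0.5, 1.0]
```

Normalization keeps a large-scale feature (like salary) from dominating small-scale ones (like years of experience) in distance- or gradient-based models.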

### Categorize the Problem Through the Input Variables

- If you have labeled data, it’s a supervised learning problem.
- If you have unlabelled data and want to find structure, it’s an unsupervised learning problem.
- In case you want to optimize an objective function by interacting with an environment, it’s a reinforcement learning problem.

### Categorize the Problem Through the Output Variable

- If the output of your model is a number, it’s a regression problem.
- When the output of your model is a class, then it’s a classification problem.
- If the output of your model is a set of input groups, it’s a clustering problem.

### The Constraint Factor

- Take note of the storage capacity available, as requirements vary across models.
- Does the prediction have to be fast? In real-time scenarios, such as classifying road signs, predictions must be made as fast as possible to avoid accidents.

### Finally, Find the Algorithm

Now that you have a clear picture of your data, you could implement proper tools to choose the right algorithm.

Meanwhile, for a better decision, here is a checklist of factors to consider:

- See if the model aligns with your business goal
- How much pre-processing the model requires
- How accurate the model is
- How explainable the model is
- How fast the model is: how long it takes to build, and how long it takes to make predictions
- How scalable the model is

In addition, one must pay attention to the complexity of the algorithm while choosing.

Generally speaking, a model counts as more complex when:

- It requires a large number of features to learn and predict the target
- It relies on more complex feature engineering (e.g., using polynomial terms, interactions, or principal components)
- It has more computational overhead (e.g., a single decision tree vs. a random forest of 100 trees)

Besides, the same algorithm can be made more or less complex by hand, depending on the number of parameters involved and the scenario under consideration. For instance, you could design a regression model with more features, polynomial terms, and interaction terms; or you could design a decision tree with less depth to keep it simple.
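Adding polynomial and interaction terms by hand can be sketched as below; the feature names are hypothetical:

```python
# Expand two raw features into a richer (more complex) feature set
# by adding squared and interaction terms.

def expand_features(x1, x2):
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

print(expand_features(2.0, 3.0))  # [2.0, 3.0, 4.0, 9.0, 6.0]
```

A linear model trained on the expanded features can fit curved relationships, at the cost of the extra complexity discussed above.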

## The Common Machine Learning Algorithms

### Linear Regression

Linear regression is probably the simplest algorithm of them all.

A few examples where linear regression is used:

- Estimating the time to go from one location to another
- Predicting sales of a particular product next month
- Impact of blood alcohol content on coordination
- Predict monthly gift card sales and improve yearly revenue projections

### Logistic Regression

There are a lot of advantages to this algorithm: it can integrate many features while remaining interpretable, and the model is easy to update with new data.

To put it differently, you could use this for:

- Predicting customer churn.
- The particular case of credit scoring or fraud detection.
- Measuring the effectiveness of marketing campaigns.
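To ground the churn example, here is a minimal logistic-regression sketch trained with gradient descent on a hypothetical toy dataset (one feature: months a customer has been inactive):

```python
# Logistic regression from scratch: sigmoid + gradient descent
# on the per-sample log-loss (toy, single-feature data).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.1, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # gradient of the log-loss for one sample
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Hypothetical data: months inactive -> churned (1) or not (0)
months_inactive = [0, 1, 2, 6, 7, 8]
churned = [0, 0, 0, 1, 1, 1]

w, b = train(months_inactive, churned)
print(sigmoid(w * 7 + b) > 0.5)  # predicts churn for 7 months inactive
```

Because the output of `sigmoid` is a probability, the same model supports both ranking customers by churn risk and hard yes/no classification at a threshold.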

### Decision Trees

Single trees are rarely used on their own, but composed with many others they build efficient algorithms such as Random Forest or Gradient Tree Boosting. However, one of their disadvantages is that they don’t support online learning, so you have to rebuild the tree when new examples come in.

Trees are excellent for:

- Investment decisions
- Predicting bank loan defaulters
- Sales lead qualification
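The simplest possible decision tree is a single-split “decision stump.” The sketch below searches for the threshold that best separates loan defaulters in a hypothetical toy dataset:

```python
# Decision stump: pick the threshold t minimizing misclassifications
# for the rule "predict default when x >= t" (toy data).

def best_stump(xs, ys):
    best = None
    for t in sorted(set(xs)):
        errors = sum((x >= t) != y for x, y in zip(xs, ys))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

# Hypothetical feature: debt-to-income ratio -> defaulted (1) or not (0)
debt_ratio = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
defaulted = [0, 0, 0, 1, 1, 1]

t = best_stump(debt_ratio, defaulted)
print(t)  # 0.7 — splits the two groups cleanly
```

A full decision tree applies this split search recursively to each resulting subset, and ensembles like Random Forest combine many such trees.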

### Naive Bayes

Most importantly, Naive Bayes is a good choice when CPU and memory resources are a limiting factor. However, its main disadvantage is that it can’t learn interactions between features.

It can be used for:

- Face recognition
- Marking an email as spam or not
- Sentiment analysis and text classification
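The spam use case can be sketched with a tiny multinomial Naive Bayes classifier using add-one (Laplace) smoothing; the documents and vocabulary are invented toy data, and equal class priors are assumed:

```python
# Multinomial Naive Bayes on toy word lists, with Laplace smoothing.
import math

spam_docs = [["win", "money", "now"], ["free", "money"]]
ham_docs = [["meeting", "tomorrow"], ["project", "update", "tomorrow"]]
vocab = {w for d in spam_docs + ham_docs for w in d}

def word_log_probs(docs):
    counts = {}
    for doc in docs:
        for w in doc:
            counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values()) + len(vocab)  # add-one smoothing
    return {w: math.log((counts.get(w, 0) + 1) / total) for w in vocab}

spam_lp, ham_lp = word_log_probs(spam_docs), word_log_probs(ham_docs)

def classify(doc):
    # equal class priors assumed; compare summed log-likelihoods
    s = sum(spam_lp[w] for w in doc if w in vocab)
    h = sum(ham_lp[w] for w in doc if w in vocab)
    return "spam" if s > h else "ham"

print(classify(["free", "money", "now"]))  # "spam"
```

The per-word independence assumption is exactly why Naive Bayes is so cheap — and why it cannot model interactions between features.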

## Conclusion

Therefore, generally speaking, in a real-world scenario it is somewhat hard to land on the right machine learning algorithm for the purpose. However, you could use this checklist to shortlist a few algorithms at your convenience.

Moreover, opting for the right solution to a real-life problem requires sound business understanding along with the right algorithm. So, feed your data into the candidate algorithms, run them in parallel or in series, and at the end evaluate their performance to select the best one(s).

If you are looking to specialize further, you may want to check out a course on deep learning.