Let’s consider a practical example. Imagine the probability of a good outcome on a task you are doing for the first time, say, learning to drive a car. How would you rate your own performance? With uncertainty, most likely.
On the other hand, how would you rate yourself on the same task after a couple of years of practice? Your mindset would probably have shifted from uncertainty to confidence. So, how did you gain that expertise?
Most likely, you gained experience by tweaking some parameters, and your performance improved. Right? That, in essence, is machine learning.
A computer program is said to learn from experience (E) on some task (T) if its performance (P) on that task improves with experience.
In the same vein, machines learn through mathematical concepts, and all data for them is represented as 0s and 1s. As a result, we don’t code the logic for our program; instead, we want the machine to figure out the logic from the data on its own.
Furthermore, if you want to find the relation between experience, job level, rare skills, and salary, you need a machine learning algorithm to learn it from data.
In this case study, you tweak the features to predict the label. You do not code the algorithm’s logic yourself; your focus is on the data.
Therefore, the concept is Data + Algorithm = Insights. Secondly, the algorithms are already developed for us; we only need to know which one to use for our problem. Let’s take a look at the regression problem and the best way to choose an algorithm.
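The Data + Algorithm = Insights idea can be sketched with the salary example above. Here is a minimal illustration using made-up numbers: the data is a toy table of experience versus salary, the algorithm is ordinary least squares, and the insight is the learned rule we can apply to unseen cases.

```python
import numpy as np

# Hypothetical toy data: years of experience vs. salary (in thousands).
experience = np.array([1, 2, 3, 5, 8, 10], dtype=float)
salary = np.array([45, 50, 60, 80, 105, 120], dtype=float)

# The "algorithm" here is ordinary least squares:
# fit salary = w * experience + b.
w, b = np.polyfit(experience, salary, deg=1)

# The "insight": a learned rule, applied to a case we never coded for.
predicted = w * 6 + b  # expected salary for 6 years of experience
```

Note that we never wrote the pricing logic ourselves; the line was fitted from the data alone.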
The Machine Learning Overview
According to Andreybu, a German scientist with more than five years of machine learning experience, “If you can understand whether the machine learning task is a regression or classification problem then choosing the right algorithm is a piece of cake.”
To enumerate, the main difference between them is that the output variable in regression is numerical (or continuous), whereas in classification it is categorical (or discrete).
Regression in Machine Learning
To start with, regression algorithms attempt to estimate the mapping function (f) from the input variables (x) to a numerical or continuous output variable (y). The output could be any real value, either an integer or a floating-point number. Regression predictions are therefore usually quantities or sizes.
For example, if you are provided with a dataset about houses, and you are asked to predict their prices, that is a regression task because the price will be a continuous output.
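The house-price task can be sketched in a few lines. This is a toy example with invented numbers, not a real dataset: each house is described by its area and bedroom count, and the model outputs a continuous price.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: [area in m^2, number of bedrooms] per house.
X = np.array([[50, 1], [70, 2], [90, 2], [120, 3], [150, 4]])
y = np.array([100_000, 150_000, 190_000, 250_000, 320_000])  # prices

model = LinearRegression().fit(X, y)

# The output is a continuous number, which is what makes this a regression task.
price = model.predict(np.array([[100, 3]]))[0]
```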
By contrast, in the case of classification algorithms, y is a category that the mapping function predicts. To elaborate, given one or several input variables, a classification model will attempt to predict one or more discrete class labels.
For instance, if you are provided with a dataset about houses, a classification algorithm can try to predict whether each house will “sell for more or less than the recommended retail price.” Here there are two discrete categories: above or below the said price.
Examples of the common classification algorithms include logistic regression, Naïve Bayes, decision trees, and K Nearest Neighbors.
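To contrast with the regression sketch above, here is the same made-up housing data recast as a classification problem using one of the listed algorithms, logistic regression. The labels (above/below the recommended price) are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same hypothetical houses, but the target is now a category:
# 1 if the house sold above the recommended retail price, else 0.
X = np.array([[50, 1], [70, 2], [90, 2], [120, 3], [150, 4], [60, 1], [130, 3]])
y = np.array([0, 0, 1, 1, 1, 0, 1])

clf = LogisticRegression().fit(X, y)
label = clf.predict(np.array([[140, 4]]))[0]  # discrete output: 0 or 1
```

The only change from the regression example is the target: a category instead of a number, which is exactly the distinction drawn above.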
Choosing the Right Algorithms
Understand Your Data
Take a look at the summary statistics
Use the ‘Percentile’ parameter to identify the ranges of the data
Averages and medians describe the central tendency
Correlations can indicate strong relationships
Visualize the Data
Box plots can reveal outliers
Density plots and histograms show the spread of data
Scatter plots can describe relationships between quantities
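The summary-statistics steps above can be sketched with pandas on a made-up housing table; the `percentiles` argument of `describe` exposes the ranges of the data, and `corr` surfaces strong relationships.

```python
import pandas as pd

# Hypothetical dataset of house areas and prices (in thousands).
df = pd.DataFrame({
    "area": [50, 70, 90, 120, 150, 60, 130],
    "price": [100, 150, 190, 250, 320, 115, 270],
})

# Summary statistics; percentiles identify the ranges of the data.
summary = df.describe(percentiles=[0.25, 0.5, 0.75, 0.9])

# Averages and medians describe the central tendency.
mean_price = df["price"].mean()
median_price = df["price"].median()

# Correlations can indicate strong relationships.
corr = df["area"].corr(df["price"])
```

For the visualization step, the same DataFrame feeds directly into box plots, histograms, and scatter plots via `df.plot(kind="box")`, `df.hist()`, and `df.plot.scatter(x="area", y="price")`.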
Clean the Data
Deal with missing values; missing data for certain variables can result in inaccurate predictions, so the handling strategy directly affects the outcome
Although tree models are less sensitive to the presence of outliers, regression models and other models that use equations are more sensitive to them
Basically, outliers could be the result of bad data collection, or they could be legitimate extreme values
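A minimal cleaning sketch, on an invented price column, might impute missing values with the median and flag outliers with the interquartile-range rule; whether to drop a flagged point depends on whether it is bad data collection or a legitimate extreme.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value and one extreme value.
prices = pd.Series([100, 150, np.nan, 190, 250, 5000])

# Missing data can bias predictions; one common (not universal)
# fix is imputing with the median.
prices = prices.fillna(prices.median())

# Flag outliers with the 1.5 * IQR rule.
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
is_outlier = (prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)
```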
Curate the Data
Furthermore, while converting the raw data into a polished form that the models can consume, one must take care of the following:
Make the data easier to interpret.
Capture more complex data.
Focus on reducing data redundancy and dimensionality.
Normalize the variable values.
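The normalization step above can be sketched as a z-score transform on an invented feature. Scaling keeps features with large raw values (such as salaries) from dominating distance- or gradient-based models.

```python
import numpy as np

# Hypothetical feature on a much larger scale than the others.
salaries = np.array([45_000.0, 60_000.0, 80_000.0, 120_000.0])

# Z-score normalization: subtract the mean, divide by the standard
# deviation, yielding zero mean and unit variance.
normalized = (salaries - salaries.mean()) / salaries.std()
```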
Categorize the Problem by the Input Variables
If you have labeled data, it’s a supervised learning problem.
If you have unlabelled data and want to find structure, it’s an unsupervised learning problem.
In case you want to optimize an objective function by interacting with an environment, it’s a reinforcement learning problem.
Categorize the Problem by the Output Variable
If the output of your model is a number, it’s a regression problem.
If the output of your model is a class, it’s a classification problem.
If the output of your model is a set of groups inferred from the inputs, it’s a clustering problem.
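The clustering case above is the one least covered so far, so here is a toy sketch with invented points: no labels are given, and the model’s output is a group assignment rather than a known category.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled points forming two obvious groups.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
              [8.0, 8.1], [7.9, 8.0], [8.1, 7.9]])

# No labels were provided: the model infers structure from the
# inputs alone, so the output is a cluster index per point.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
groups = kmeans.labels_
```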
The Constraint Factors
Take note of the storage capacity, as it varies across models.
Does the prediction have to be fast? For instance, in real-time scenarios like road-sign classification, predictions must be as fast as possible to avoid accidents.
Finally, Find the Algorithm
Now that you have a clear picture of your data, you could implement proper tools to choose the right algorithm.
Meanwhile, for a better decision, here is a checklist of factors to consider:
See if the model aligns with your business goal
How much pre-processing the model requires
Check the accuracy of the model
How explainable the model is
How fast the model is: How long does it take to build a model, and how long does the model take to make predictions
The scalability of the model
In addition, one must pay attention to the complexity of the algorithm while choosing.
Generally speaking, a model counts as more complex when:
It relies on many features to learn and predict the target
It relies on more complex feature engineering (e.g., using polynomial terms, interactions, or principal components)
When the scenario has more computational overhead (e.g., a single decision tree vs. a random forest of 100 trees)
Besides, the same algorithm can be made more or less complex manually; it depends purely on the number of parameters involved and the scenario under consideration. For instance, you could design a regression model with more features, or with polynomial and interaction terms. Or you could design a decision tree with greater depth.
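The feature-engineering point can be made concrete: adding degree-2 polynomial and interaction terms to the same linear model multiplies the feature count, and with it the model’s complexity. The data here is random and purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical design matrix with 3 raw features.
rng = np.random.default_rng(0)
X = rng.random((10, 3))

# Squares and pairwise interactions of the 3 raw features triple the
# feature count: 3 linear + 3 squared + 3 interaction terms = 9.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
n_raw, n_poly = X.shape[1], X_poly.shape[1]
```

The same lever exists for trees: `DecisionTreeClassifier(max_depth=...)` caps how complex a single tree is allowed to grow.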
The Common Machine Learning Algorithms
Linear Regression
Linear regression models are probably the simplest ones.
A few examples where linear regression is used:
Estimating the time it takes to travel from one location to another
Predicting sales of a particular product next month
Impact of blood alcohol content on coordination
Predict monthly gift card sales and improve yearly revenue projections
Logistic Regression
There are a lot of advantages to this algorithm: it can integrate more features while remaining easy to interpret, and it is easy to update with new data.
To put it differently, you could use this for:
Predicting customer churn.
The particular case of credit scoring or fraud detection.
Measuring the effectiveness of marketing campaigns.
Decision Trees
Single trees are used rarely, but composed with many others they build efficient algorithms such as Random Forest or Gradient Tree Boosting. However, one of their disadvantages is that they don’t support online learning, so you have to rebuild the tree when new examples come in.
Trees are excellent for:
Predicting bank loan defaulters
Sales lead qualifications
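The loan-defaulter use case can be sketched with a shallow tree on invented figures; capping the depth keeps the learned rules easy to read off the splits, which is a big part of why trees work well here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical loan data: [income in thousands, existing debt in thousands].
X = np.array([[30, 20], [80, 10], [45, 30], [100, 5], [25, 25], [90, 15]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = defaulted, 0 = repaid

# A shallow tree keeps the decision rules interpretable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
prediction = tree.predict(np.array([[35, 28]]))[0]
```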
Naive Bayes
Most importantly, Naive Bayes is the right choice when CPU and memory resources are a limiting factor. However, its main disadvantage is that it can’t learn interactions between features.
It can be used for:
Marking an email as spam or not.
Sentiment analysis and text classification.
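The spam use case fits in a few lines on a made-up mini-corpus: bag-of-words counts feed a multinomial Naive Bayes model, which scores each word independently, precisely the independence assumption behind its inability to learn feature interactions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical mini-corpus of emails.
emails = [
    "win a free prize now",
    "claim your free money",
    "meeting agenda for tomorrow",
    "lunch plans this week",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts; Naive Bayes treats each word independently.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
clf = MultinomialNB().fit(X, labels)

guess = clf.predict(vectorizer.transform(["free prize money"]))[0]
```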
Therefore, generally speaking, in a real-world scenario it is somewhat hard to identify the right machine learning algorithm for the purpose. However, you could use this checklist to shortlist a few algorithms at your convenience.
Moreover, opting for the right solution to a real-life problem requires expert business understanding along with the right algorithm. So, feed your data into the candidate algorithms, run them in parallel or serially, and in the end evaluate their performance to select the best one(s).