Normalisation vs Standardisation
Any machine learning algorithm generally involves components such as an optimisation procedure, a cost function, a modelling technique and, most importantly, a "dataset to learn from". It is often said that an ML algorithm performs only as well as the dataset it is fed.
Most of the time spent following the "Knowledge Discovery from Data" pipeline goes into data collection, cleaning and pre-processing. Data pre-processing can involve multiple techniques, such as data transformation, analysing redundant data or outlier detection. Anomalies in the dataset can cause our model to under-perform. There are a couple of questions we should answer before applying a model to any dataset (a quick sketch of these checks follows the list):
- Can the model handle missing values?
- What will be the effect of outliers in the dataset?
- Is feature scaling required before training the ML model?
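A minimal sketch of these checks, using pandas on a made-up flat-price table (the column names and values are placeholders, not from the post):

```python
import pandas as pd

# Hypothetical flat-price data for illustration only.
df = pd.DataFrame({
    "rooms":   [2, 3, 4, 3, 5],
    "toilets": [1, 2, 2, 1, 3],
    "area":    [450.0, 780.0, None, 620.0, 5000.0],  # one missing value, one likely outlier
    "price":   [52.0, 90.0, 110.0, 75.0, 400.0],
})

# 1. Missing values: count them per column before deciding how to handle them.
print(df.isna().sum())

# 2. Outliers: summary statistics quickly expose extreme values (e.g. area = 5000).
print(df.describe())

# 3. Feature scaling: compare per-feature ranges; very different ranges hint that scaling is needed.
print(df.max(numeric_only=True) - df.min(numeric_only=True))
```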
The point of concern for this post is feature scaling of the dataset.
Feature Scaling
Let's take one of the simplest use cases of ML, Linear Regression, to predict
Price of flat ~ f(#rooms, #toilets, Area of flat, …)
We can note that the feature #rooms will lie in the range (0, 10], while a feature like Area of flat will lie in the range [100, 1000], and when creating a model using LR…
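To make that range mismatch concrete, here is a minimal sketch with made-up numbers, using scikit-learn's MinMaxScaler (normalisation) and StandardScaler (standardisation) to bring both features onto a comparable scale before fitting a linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy features: #rooms in (0, 10], Area of flat in [100, 1000].
X = np.array([[2, 350.0], [3, 520.0], [4, 760.0], [5, 980.0]])
y = np.array([55.0, 80.0, 120.0, 150.0])  # hypothetical flat prices

# Normalisation: squash each feature into [0, 1].
X_norm = MinMaxScaler().fit_transform(X)

# Standardisation: zero mean and unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# After scaling, no feature dominates the fit purely because of its units.
model = LinearRegression().fit(X_std, y)
print(model.coef_)
```

The choice between the two scalers is covered in the rest of the post; the point here is only that the raw #rooms and Area columns live on very different scales.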