Data Exploration
1) Variable identification
2) Cleaning of data
3) Transformation
4) Types of analysis
5) Missing values treatment
6) Outlier treatment
outlier = data point that lies far from the rest of the set, e.g. a measurement or entry error, or a genuinely rare value.
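One common way to flag such points is the interquartile-range (IQR) rule. The sketch below is only an illustration of that rule, not a method prescribed by these notes: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample `readings` are all assumptions made for the example.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings; 95 is far from the others.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
print(flagged)
```

Whether a flagged point should be dropped, capped, or kept depends on the data; the rule only marks candidates for review.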
Feature identification
select the most relevant and appropriate features of your data
The curse of dimensionality
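In ML applications, we often have high-dimensional data; as the number of dimensions grows, the data becomes sparse and far more samples are needed to cover the space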
Feature selection and feature extraction
Feature Selection
using only variables or features that are relevant to the problem at hand
Feature extraction
focuses on combining existing features into new, derived features that better represent the data while also eliminating extra or redundant dimensions
Principal Component Analysis
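a feature extraction technique that projects the data onto a small number of orthogonal directions (principal components) capturing the most variance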
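To make the idea concrete, here is a minimal sketch that finds only the first principal component using power iteration on the covariance matrix. It is an assumption-laden illustration: the function name `first_principal_component`, the iteration count, and the sample `points` are hypothetical, and a real project would more likely use a linear algebra library to get all components at once.

```python
from math import sqrt

def first_principal_component(rows, iterations=200):
    """Return (unit direction of greatest variance, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the all-ones start vector is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v, centered

# Hypothetical 2-D data lying roughly along the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
direction, centered = first_principal_component(points)
# Project each centered point onto the component: 2 features become 1.
projected = [sum(c * u for c, u in zip(row, direction)) for row in centered]
```

Here the two correlated features collapse into a single derived feature (`projected`) that retains almost all of the variance.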
Pearson correlation analysis
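measuring linear correlation between pairs of features; the coefficient r ranges from -1 to 1, and strongly correlated features are candidates for removal as redundant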
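The Pearson coefficient can be computed directly from its definition (covariance divided by the product of the standard deviations). The sketch below assumes two equal-length numeric features; the name `pearson` and the `height`/`weight` values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the 1/(n-1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / sqrt(var_x * var_y)

# Hypothetical features with an exact linear relationship.
height = [150, 160, 170, 180, 190]
weight = [50, 58, 66, 74, 82]
r = pearson(height, weight)
```

A value of r close to 1 or -1 suggests one of the two features carries little extra information; note that r only captures linear relationships.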
Cleaning and preparing data
--Handling missing data
----Missing categorical data
----Missing numerical data
--Handling noise
--Handling outliers
--Transforming and normalizing data
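Two of the steps above, filling in missing values and normalizing, can be sketched as follows. This is one simple choice among many: it assumes missing entries are represented as `None`, fills numerical gaps with the mean and categorical gaps with the most frequent value, and uses min-max scaling to [0, 1]. The names `impute` and `min_max` and the sample columns are hypothetical.

```python
from statistics import mean, mode

def impute(values, numeric):
    """Replace None with the mean (numeric) or the mode (categorical)."""
    present = [v for v in values if v is not None]
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical columns with one missing entry each.
ages = [20, None, 40, 30]
colors = ["red", None, "blue", "red"]

filled_ages = impute(ages, numeric=True)
filled_colors = impute(colors, numeric=False)
scaled_ages = min_max(filled_ages)
```

Mean and mode imputation are easy to apply but shrink the variance of the column; dropping rows or using a model-based fill may suit some datasets better.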
[MLJS][MLJS02]