Data Mining

WHAT this book is about

What is predictive modeling? What kind of data do I need to train a model? Which algorithm should I choose? How do I configure the parameters? How do I know which model is the most suitable one? Most importantly, how can I build the best model?

You can find a reliable answer to each of these questions in our book. It combines data mining theory with exercises and real-world cases, giving you a solid understanding of data mining techniques and their uses and a quick, manageable start.

WHY you need THIS book

Because you DON'T need another syntax-focused, algorithm-free AI handbook that is essentially a Python programming book and leaves your model building to luck;

Because you DON'T need empty theory and abstract jargon;

Because you DON'T need a mathematically demanding algorithm design manual that is difficult to read;

Because you NEED a beginner-friendly, just-in-time book that helps you create your best model.

HOW easy data modeling can be

You only need high school mathematics to become skilled at data modeling.

But life has no shortcuts: you can't expect to gain a deep understanding of the numerous data mining algorithms by reading this one book (a PhD in statistics is surely not a waste of time). You will, however, be able to grasp the basic logic of data mining with relative ease, build models using proper tools, and learn to evaluate, choose, and optimize models.


  • Chapter 1 The concept of data mining
  • Chapter 2 Data exploration
    • 2.1 The significance of data exploration
    • 2.2 Determine data type
    • 2.3 Quantitative data exploration method and graph
    • 2.4 Qualitative data exploration method and graph
    • 2.5 Variable correlation analysis and graph
  • Chapter 3 Data pre-processing
    • 3.1 Variable preliminary filtering
    • 3.2 Outlier handling
    • 3.3 Missing value handling
    • 3.4 Categorical variable handling
    • 3.5 Time variable processing
    • 3.6 Skewness handling
    • 3.7 Balanced sampling
    • 3.8 Data standardization
    • 3.9 Dataset split
  • Chapter 4 Modeling
    • 4.1 Supervised learning
    • 4.2 Common concepts
    • 4.3 Linear model
    • 4.4 Tree model
    • 4.5 Ensemble learning
    • 4.6 Deep learning
    • 4.7 Automatic modeling
  • Chapter 5 Model evaluation
    • 5.1 Classification model evaluation
      • Confusion matrix
      • Accuracy table
      • ROC, AUC
      • Gini, KS
      • Lift graph
      • Recall chart
    • 5.2 Regression model evaluation
      • Model error evaluation
      • Residual plot
      • Result comparison graph
  • Chapter 6 Model tuning
    • 6.1 Derived variable
      • Binning
      • Variable transformation
      • Variable transformation aligned with target
      • Variable interaction
      • Ratio
      • Date time variable
      • Other derivatives
    • 6.2 Algorithm selection and parameter tuning
      • Algorithm selection
      • Parameter tuning
    • 6.3 Appendix - Common algorithm parameters
  • Chapter 7 Comprehensive cases
    • 7.1 Classification model case
    • 7.2 Regression model case


Resource Link