HOW DO MODERN DATA SCIENTISTS CHANGE THE WORLD?
“Data is the New Oil”
Companies that understand their customers and grow their business
through data are the companies of the future
Why Does the World Need Data Scientists?
MULTIDISCIPLINARY
We need to understand the PROBLEM
1. How management thinks
2. How the customer thinks
3. How the market shifts
Types of Analysis
- Descriptive Analytics
Understand historical data
Look for reasons behind past successes/failures
- Predictive Analytics
Determine future outcomes
- Prescriptive Analytics (Optimization)
Go beyond predicting future outcomes
Suggest actions to benefit from predictions
Goals: Get actionable insights, smarter decisions, better business outcomes
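To make the three types of analysis concrete, here is a minimal Python sketch on a made-up monthly sales series. All numbers, the `expected_profit` helper, and its margin and holding-cost values are assumptions chosen for illustration; they do not come from a real dataset.

```python
from statistics import mean

# Hypothetical monthly unit sales (made-up numbers for illustration only).
sales = [100, 110, 120, 130, 140, 150]
months = list(range(len(sales)))

# Descriptive: summarize what already happened.
average_sales = mean(sales)

# Predictive: fit a least-squares line and extrapolate one month ahead.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast = intercept + slope * len(sales)


# Prescriptive: pick the action (stock level) with the best expected profit
# given the forecast. Margin and holding cost are assumed values.
def expected_profit(stock, demand, margin=5.0, holding_cost=2.0):
    sold = min(stock, demand)
    return margin * sold - holding_cost * (stock - sold)


candidate_stock_levels = [140, 160, 180]
best_stock = max(candidate_stock_levels, key=lambda s: expected_profit(s, forecast))

print(f"average={average_sales}, forecast={forecast}, best stock={best_stock}")
```

The descriptive step only looks backward, the predictive step produces a number about the future, and the prescriptive step turns that number into a recommended action.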
The Workflow of a Data Science Project
Cross-Industry Standard Process for Data Mining (CRISP-DM)
1. Understand the business, the problem, and the objective
2. Data collection: get familiar with the data
3. Clean, format, blend, sample: exploratory data analysis
4. Model selection, feature selection, tuning
5. Communicate insights: explanatory visualization
6. Evaluate model quality: check whether the objective is met
CRISP-DM (Business Understanding)
- Determine Business Objective
Background, Business Objective
- Assess Situation
Data, Resources, Assumptions
- Determine Goals
Ideally with quantitative success criteria
- Develop Project Plan
Estimate timeline, budget, methodology
Example:
- Business Hypothesis: A company wanted to know the profile of customers who have historically taken up-sell products
- Expected Output: A list of customers with a high probability of up-sell
- Data Availability: Lifetime data (2015–2020)
- Methodology: Descriptive behavioral analysis of customer profiles, up-sell versus non-up-sell
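The methodology in this example, comparing the profile of up-sell and non-up-sell customers, could be sketched as follows. The records, the field names (`tenure_months`, `monthly_spend`, `upsold`), and the `profile_by_group` helper are hypothetical and only illustrate the idea of a group-wise descriptive comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records; field names and values are made up.
customers = [
    {"id": 1, "tenure_months": 36, "monthly_spend": 80.0, "upsold": True},
    {"id": 2, "tenure_months": 6, "monthly_spend": 25.0, "upsold": False},
    {"id": 3, "tenure_months": 48, "monthly_spend": 95.0, "upsold": True},
    {"id": 4, "tenure_months": 12, "monthly_spend": 30.0, "upsold": False},
    {"id": 5, "tenure_months": 24, "monthly_spend": 60.0, "upsold": True},
    {"id": 6, "tenure_months": 18, "monthly_spend": 35.0, "upsold": False},
]


def profile_by_group(records, flag, features):
    """Average each feature separately for every value of the flag."""
    groups = defaultdict(list)
    for record in records:
        groups[record[flag]].append(record)
    return {
        group: {f: mean(r[f] for r in rows) for f in features}
        for group, rows in groups.items()
    }


profile = profile_by_group(customers, "upsold", ["tenure_months", "monthly_spend"])
for group, averages in profile.items():
    print("upsold" if group else "not upsold", averages)
```

In this toy data the up-sell group has longer tenure and higher spend. On real data, differences like these would be candidate features for ranking customers by up-sell probability.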
CRISP-DM (Data Understanding)
- Collect Initial Data
Initial data collection report
- Describe Data
Data description report
- Explore Data
Data exploration report
- Verify Data Quality
Carefully document problems and issues found
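As an illustration of the "Verify Data Quality" step, the sketch below counts a few common problems (missing values, duplicate rows, implausible values) so they can be written down. The rows, the field names, and the `quality_report` helper are assumptions. The 2015 to 2020 year range is borrowed from the data availability example above.

```python
from collections import Counter

# Hypothetical transaction rows (made-up values); None marks a missing value.
rows = [
    {"user_id": "u1", "amount": 120.0, "year": 2016},
    {"user_id": "u2", "amount": None, "year": 2018},   # missing amount
    {"user_id": "u1", "amount": 120.0, "year": 2016},  # exact duplicate
    {"user_id": "u3", "amount": -5.0, "year": 2021},   # negative, year out of range
]


def quality_report(rows, year_range=(2015, 2020)):
    """Count common data quality problems so they can be documented."""
    first_year, last_year = year_range
    missing = Counter(
        key for row in rows for key, value in row.items() if value is None
    )
    # Dicts are not hashable, so turn each row into a sorted tuple of items.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    return {
        "missing_by_field": dict(missing),
        "duplicate_rows": sum(n - 1 for n in row_counts.values()),
        "negative_amounts": sum(
            1 for row in rows if row["amount"] is not None and row["amount"] < 0
        ),
        "years_out_of_range": sum(
            1 for row in rows if not first_year <= row["year"] <= last_year
        ),
    }


report = quality_report(rows)
print(report)
```

The checks here are only examples. Which rules count as a quality problem depends on the business context established in the previous phase.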
Example:
- Data Sources
Users Profile, Users Transaction
- Data Location
Inter Department, Across Department, External Data, Public Data
- Data Format
Hard Copy, Digital Documents, Database
- Data Types
Numerical, Text, Image, Audio, Video
- Acquisition Method
Data Warehousing, REST API, Web Scraping
CRISP-DM (Data Preparation)
CRISP-DM
1. Modeling
- Select Modelling Technique
Assumptions, measure of accuracy
- Generate Test Design
Test design
- Build Model
Parameter settings, model description
- Assess Model
Model assessment (iterate the above)
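The modeling steps above (test design, build, assess) can be sketched with a deliberately simple model: a single threshold on one feature, fitted on a training split and scored on a held-out split. The data, the helper names, and the 25% test fraction are assumptions for illustration. A real project would choose a technique suited to the problem.

```python
import random
from statistics import mean

# Hypothetical (feature, label) pairs: for example monthly spend and whether
# the customer took an up-sell. Values are made up and cleanly separable.
rows = [(x, 0) for x in range(1, 11)] + [(x, 1) for x in range(21, 31)]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Test design: shuffle reproducibly, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Build model: midpoint between the two class means.

    Assumes the positive class has the larger feature values.
    """
    positive_mean = mean(x for x, y in train if y == 1)
    negative_mean = mean(x for x, y in train if y == 0)
    return (positive_mean + negative_mean) / 2


def accuracy(threshold, rows):
    """Assess model: share of rows where the prediction matches the label."""
    return mean(int((x > threshold) == bool(y)) for x, y in rows)


train, test = train_test_split(rows)
threshold = fit_threshold(train)
print(f"threshold={threshold:.1f}, test accuracy={accuracy(threshold, test)}")
```

Scoring on rows the model never saw during fitting is the point of the test design. Iterating means changing the technique or its parameters and repeating the same split-fit-assess loop.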
2. Evaluation
- Evaluate Results
Metric for evaluation
- Review Process
Evaluate every step
- Determine Next Steps
To deploy or not to deploy?
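One way to connect the evaluation metric to the deploy decision is to compare it against the quantitative success criteria set during Business Understanding. The sketch below uses precision and recall as the metrics. The labels, predictions, and the 0.7 thresholds are made-up assumptions.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical held-out labels and model predictions (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

precision, recall = precision_recall(y_true, y_pred)

# Assumed success criteria; in practice these come from the business goals.
MIN_PRECISION = 0.7
MIN_RECALL = 0.7
deploy = precision >= MIN_PRECISION and recall >= MIN_RECALL

print(f"precision={precision}, recall={recall}, deploy={deploy}")
```

Fixing the thresholds before looking at the results keeps the deploy decision tied to the original objective rather than to whatever the model happened to achieve.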
3. Visualization
- Know the Audience
Content is adjusted based on the audience
- Storytelling
Manage the flow of insights
- Visualization is All About Perception
• Colour • Typography • Choosing the right chart