HOW DO MODERN DATA SCIENTISTS CHANGE THE WORLD?

Meno Triono
3 min read · May 17, 2021

“Data is the New Oil”

Companies that understand their customers and grow their business
through data are the companies of the future.

Why Does the World Need Data Scientists?

MULTIDISCIPLINARY

We need to understand the PROBLEM:

1. How management thinks

2. How the customer thinks

3. How the market shifts

Types of Analysis

  1. Descriptive Analytics
    Understand historical data
    Look for the reasons behind past successes and failures
  2. Predictive Analytics
    Estimate likely future outcomes
  3. Prescriptive Analytics (Optimization)
    Go beyond predicting future outcomes
    Suggest actions that benefit from the predictions

Goals: actionable insights, smarter decisions, better business outcomes
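
To make the three types concrete, here is a minimal sketch on toy numbers. The sales figures, the stock level, and the reorder rule are all invented for illustration, and a straight-line trend stands in for whatever predictive model a real project would use.

```python
from statistics import mean

# Hypothetical monthly sales (units sold), invented for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive: summarize what already happened.
average_sales = mean(monthly_sales)
total_growth = monthly_sales[-1] - monthly_sales[0]


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept).
    """
    xs = range(len(ys))
    x_mean = mean(xs)
    y_mean = mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_mean - slope * x_mean


# Predictive: extrapolate the trend one month ahead.
slope, intercept = fit_line(monthly_sales)
next_month_forecast = slope * len(monthly_sales) + intercept

# Prescriptive: turn the forecast into a suggested action.
current_stock = 120  # hypothetical stock level
units_to_order = max(0, round(next_month_forecast) - current_stock)

print(f"Average so far: {average_sales}, growth: {total_growth}")
print(f"Forecast for next month: {next_month_forecast}")
print(f"Suggested order: {units_to_order} units")
```

The same data feeds all three steps; what changes is the question being asked: what happened, what is likely to happen, and what to do about it.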

The Workflow of a Data Science Project

CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Developed in 1996 by big players in data analysis (SPSS, Teradata, Daimler, OHRA, NCR)

1. Understand the business, the problem, and the objective
2. Data collection: get familiar with the data
3. Clean, format, blend, sample: exploratory data analysis
4. Model selection, feature selection, tuning
5. Communicate insight: explanatory visualization
6. Evaluate model quality: is the objective met?

CRISP-DM (Business Understanding)

  1. Determine Business Objective
    Background, Business Objective
  2. Assess Situation
    Data, Resources, Assumptions
  3. Determine Goals
    Ideally with quantitative success criteria
  4. Develop Project Plan
    Estimate timeline, budget, methodology

Example:

  • Business Hypothesis: a company wants to know the profile of customers who have historically bought up-sell products
  • Expected Output: a list of customers with a high probability of accepting an up-sell
  • Data Availability: lifetime data (2015–2020)
  • Methodology: descriptive behavioral analysis of customer profiles, up-sell versus non-up-sell
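
A minimal sketch of what the methodology in this example could look like in code: split customers into up-sell and non-up-sell groups and compare the average of each profile feature. The records, the field names (`tenure_years`, `monthly_spend`, `upsold`), and the helper `profile_by_group` are hypothetical, not the company's actual data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records, invented for illustration.
customers = [
    {"id": 1, "tenure_years": 5, "monthly_spend": 80, "upsold": True},
    {"id": 2, "tenure_years": 1, "monthly_spend": 30, "upsold": False},
    {"id": 3, "tenure_years": 4, "monthly_spend": 70, "upsold": True},
    {"id": 4, "tenure_years": 2, "monthly_spend": 40, "upsold": False},
]


def profile_by_group(rows, features, label="upsold"):
    """Average each feature separately for every value of the label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row)
    return {
        group: {f: mean(r[f] for r in members) for f in features}
        for group, members in groups.items()
    }


profiles = profile_by_group(customers, ["tenure_years", "monthly_spend"])
for group, averages in profiles.items():
    print("up-sell" if group else "not up-sell", averages)
```

Large gaps between the two groups point to features worth using when ranking customers by up-sell probability.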

CRISP-DM (Data Understanding)

  1. Collect Initial Data
    Initial data collection report
  2. Describe Data
    Data description report
  3. Explore Data
    Data exploration report
  4. Verify Data Quality
    Carefully document problems and issues found
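
Verifying data quality can start with a few simple counts. The sketch below is one possible shape for such a check, assuming records arrive as dictionaries; the sample rows, the field names, and the helper `quality_report` are hypothetical.

```python
def quality_report(rows, required_fields, key="id"):
    """Count missing values per field and repeated keys across rows."""
    missing = {field: 0 for field in required_fields}
    seen_keys = set()
    duplicate_keys = 0
    for row in rows:
        for field in required_fields:
            # Treat None and the empty string as missing.
            if row.get(field) in (None, ""):
                missing[field] += 1
        if row.get(key) in seen_keys:
            duplicate_keys += 1
        seen_keys.add(row.get(key))
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical user profile rows with deliberate problems.
rows = [
    {"id": 1, "age": 34, "city": "Jakarta"},
    {"id": 2, "age": None, "city": "Bandung"},
    {"id": 2, "age": 29, "city": ""},
    {"id": 3, "age": 41, "city": "Surabaya"},
]

report = quality_report(rows, ["age", "city"])
print(report)
```

The resulting dictionary is the kind of thing that belongs in the documented list of problems and issues found.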

Example:

  • Data Sources
    Users Profile, Users Transaction
  • Data Location
    Within a department, across departments, external data, public data
  • Data Format
    Hard Copy, Digital Documents, Database
  • Data Types
    Numerical, Text, Image, Audio, Video
  • Acquisition method
    Data Warehousing, REST API, Web Scraping

CRISP-DM (Data Preparation)

Data understanding and preparation will usually consume half or more of your project time
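
The clean, format, blend, and sample steps from the workflow can be sketched in a few lines. This is only an illustration under stated assumptions: the two tables, their fields, and the helpers `clean_profiles`, `parse_amount`, and `blend` are hypothetical, and real projects usually do this with a dataframe library or SQL.

```python
import random

# Hypothetical raw tables, invented for illustration.
user_profiles = [
    {"user_id": "1", "city": " jakarta "},
    {"user_id": "2", "city": "Bandung"},
    {"user_id": "3", "city": None},
]
user_transactions = [
    {"user_id": "1", "amount": "100.50"},
    {"user_id": "1", "amount": "20"},
    {"user_id": "2", "amount": "n/a"},
]


def clean_profiles(rows):
    """Clean: drop rows without a city. Format: cast ids, tidy city names."""
    return [
        {"user_id": int(r["user_id"]), "city": r["city"].strip().title()}
        for r in rows
        if r["city"]
    ]


def parse_amount(text):
    """Return the amount as a float, or None when it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def blend(profile_rows, transaction_rows):
    """Blend: attach each user's total valid spend to their profile."""
    totals = {}
    for t in transaction_rows:
        amount = parse_amount(t["amount"])
        if amount is not None:
            uid = int(t["user_id"])
            totals[uid] = totals.get(uid, 0.0) + amount
    return [
        dict(p, total_spend=totals.get(p["user_id"], 0.0))
        for p in profile_rows
    ]


prepared = blend(clean_profiles(user_profiles), user_transactions)

# Sample: a reproducible random subset for quick exploration.
sample = random.Random(42).sample(prepared, k=1)
print(prepared)
```

Even on three rows, most of the code deals with bad values rather than analysis, which is why this phase takes so much project time.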

CRISP-DM (Modeling, Evaluation, Visualization)

1. Modeling

  • Select Modeling Technique
    Assumptions, measure of accuracy
  • Generate Test Design
    Test design
  • Build Model
    Parameter settings, model description
  • Assess Model
    Model assessment (iterate the above)
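
The four modeling steps can be walked through with a deliberately tiny model. The sketch below uses a one-feature threshold rule (a decision stump) chosen only because it fits in a few lines; it is not a recommendation, and the data, the split size, and the helper names are hypothetical.

```python
import random

# Hypothetical (monthly_spend, upsold) pairs. The toy data is deliberately
# easy to separate, so the score here says nothing about real data.
data = [
    (20, 0), (25, 0), (30, 0), (35, 0), (40, 0),
    (60, 1), (65, 1), (70, 1), (75, 1), (80, 1),
]


def train_test_split(rows, n_test, seed):
    """Test design: hold out n_test shuffled rows for assessment."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(rows, threshold):
    """Measure of accuracy: share of rows where the rule matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


def fit_threshold(rows):
    """Build model: try midpoints between neighbouring feature values and
    keep the one with the highest training accuracy."""
    values = sorted({x for x, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(rows, t))


train, test = train_test_split(data, n_test=3, seed=0)
threshold = fit_threshold(train)

# Assess model: score on rows the model never saw during fitting.
test_accuracy = accuracy(test, threshold)
print(f"threshold={threshold}, test accuracy={test_accuracy}")
```

If the assessment disappoints, the loop goes back to technique selection or parameter settings and runs again.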

2. Evaluation

  • Evaluate Results
    Metrics for evaluation
  • Review Process
    Evaluate every step
  • Determine Next Steps
    To deploy or not to deploy?
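
One way to make the deploy decision less subjective is to compare evaluation metrics against the quantitative success criteria agreed during business understanding. A minimal sketch, assuming a binary up-sell label; the labels, predictions, and threshold values in `SUCCESS_CRITERIA` are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


def precision_recall(counts):
    """Precision: how many predicted up-sells were real.
    Recall: how many real up-sells were found."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical success criteria from the business understanding phase.
SUCCESS_CRITERIA = {"precision": 0.70, "recall": 0.60}

# Hypothetical held-out labels and model predictions (1 = up-sell).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
precision, recall = precision_recall(counts)
ready_to_deploy = (
    precision >= SUCCESS_CRITERIA["precision"]
    and recall >= SUCCESS_CRITERIA["recall"]
)
print(counts, precision, recall, ready_to_deploy)
```

Which metric matters most depends on the business cost of each kind of error, so the criteria should come from the objective, not from the model.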

3. Visualization

  • Know the Audience
    Adjust the content to the audience
  • Storytelling
    Manage the flow of insights
  • Visualization Is All About Perception
    Colour, typography, choosing the right chart

Meno Triono

A digital enthusiast with a background in digital marketing and a love of data science