Monday, 18 November 2013

DATA MINING - Introduction



Data mining is often compared with gold mining. Large quantities of ore must be processed before the gold can be extracted. Data mining can help us find the ‘hidden gold’ of knowledge in raw data.

What is Data?
Data is what we collect and store
Knowledge is what helps us to make informed decisions
The extraction of knowledge from data is called DATA MINING
Data mining can also be defined as the exploration and analysis of large quantities of data in order
  to discover meaningful patterns and rules (Berry and Linoff, 2000).
The ultimate goal of data mining is to discover knowledge.

Big Data?
• Modern organisations must respond quickly to any change in the market.
  However, an organisation must also determine which trends are relevant, and this cannot
  be accomplished without access to historical data, which are stored in large databases called
  DATA WAREHOUSES.

What is a data warehouse?
• The main characteristic of a data warehouse is its capacity. A data warehouse is really big: it
   includes millions, even billions, of data records.
• A data warehouse is designed to support decision making in the organisation
• The information needed can be obtained with traditional query tools
• These tools might also help us in discovering important relationships in the data

What is the difference between a query tool and data mining?
• Traditional query tools are assumption-based: the user must already know which question to ask.
  With a data mining tool, instead of assuming certain relationships between different variables in
   a data set (and studying these relationships one at a time), we can determine the most significant
   factors that influence the outcome.
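
As a rough illustration of letting the data point to the most significant factor, the sketch below ranks every variable by the strength of its correlation with the outcome. Pearson correlation is only a simple stand-in chosen here, not a method named in these notes, and the records and column names are invented for the example.

```python
import math

# Hypothetical customer records; all column names and values are invented.
data = {
    "age":            [25, 60, 33, 48, 41, 55],
    "monthly_visits": [1, 2, 6, 7, 9, 3],
    "distance_km":    [5, 9, 4, 8, 6, 7],
}
outcome = [10, 22, 58, 71, 88, 30]  # hypothetical monthly spend


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Rank all variables at once instead of testing one assumed relationship at a time.
ranking = sorted(data, key=lambda name: abs(pearson(data[name], outcome)), reverse=True)
for name in ranking:
    print(f"{name}: {pearson(data[name], outcome):+.2f}")
```

A query tool would answer a question such as "how much do customers with more than five visits spend?", but only after somebody thought of asking it. The ranking needs no such starting assumption.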

What Tasks Can Be Performed with Data Mining?
• Many problems of intellectual, economic, and business interest can be phrased in terms of the
  following five tasks:
    Classification 
    Estimation 
    Prediction 
    Affinity grouping
    Clustering 

Classification
• Classification consists of examining the features of a newly presented object and assigning it to
  one of a predefined set of classes

Examples of classification tasks include:
      Classifying credit applicants as low, medium, or high risk
      Choosing content to be displayed on a Web page
      Determining which phone numbers correspond to fax machines
      Spotting fraudulent insurance claims
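
To make the credit-risk example concrete, here is a minimal sketch of one possible classifier, a 1-nearest-neighbour rule. The technique, the two features, and the labelled applicants are all assumptions made for illustration. A new applicant simply receives the class of the most similar known applicant.

```python
import math

# Hypothetical labelled applicants:
# (debt-to-income ratio, share of late payments) -> predefined risk class.
training = [
    ((0.10, 0.00), "low"),
    ((0.15, 0.05), "low"),
    ((0.35, 0.10), "medium"),
    ((0.40, 0.20), "medium"),
    ((0.70, 0.40), "high"),
    ((0.80, 0.50), "high"),
]


def classify(features, training):
    """Return the class of the closest labelled example (1-nearest-neighbour)."""
    nearest = min(training, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


print(classify((0.12, 0.02), training))  # low
print(classify((0.75, 0.45), training))  # high
```

The set of classes (low, medium, high) is fixed in advance. The classifier only decides which of them a new object belongs to.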

Estimation
• Estimation is similar to classification except that the target variable is numerical rather than
  categorical. 

Examples of estimation tasks include:
        Estimating the number of children in a family
        Estimating a family’s total household income
        Estimating the lifetime value of a customer
        Estimating the probability that someone will respond to a balance transfer solicitation
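
Because the target is a number, an estimation model outputs a value rather than a class label. The sketch below fits a straight line by ordinary least squares, which is just one simple technique picked for illustration. The records (years of education of the main earner against household income) are invented.

```python
# Hypothetical records: (years of education of the main earner,
# household income in thousands). Values are invented.
records = [(10, 31.0), (12, 37.0), (14, 47.0), (16, 53.0), (18, 62.0)]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(records)


def estimate_income(years):
    """Estimate household income (in thousands) for a family not in the records."""
    return intercept + slope * years


print(f"Estimated income: {estimate_income(15):.1f}")  # Estimated income: 49.9
```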

Prediction
• Prediction is similar to classification and estimation, except that for prediction, the results lie in
  the future

Examples of prediction tasks include:
          Predicting the size of the balance that will be transferred if a credit card prospect accepts a
           balance transfer offer
          Predicting which customers will leave within the next 6 months
          Predicting which telephone subscribers will order a value-added service such as three-way
           calling or voice mail

Affinity Grouping or Association Rules
The task of affinity grouping is to determine which things go together. 

An example of an association task:

          Affinity grouping is one simple approach to generating rules from data. If two items, say cat
           food and kitty litter, occur together frequently enough, we can generate two association
           rules:
          People who buy cat food also buy kitty litter with probability P1.
          People who buy kitty litter also buy cat food with probability P2.
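
The two probabilities can be estimated by counting baskets. In the sketch below, P1 is the share of cat food baskets that also contain kitty litter, and P2 is the share of kitty litter baskets that also contain cat food. The baskets are invented, and the function name `confidence` is only a label chosen for this example.

```python
# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"cat food", "kitty litter", "milk"},
    {"cat food", "kitty litter"},
    {"cat food", "bread"},
    {"cat food", "milk"},
    {"kitty litter", "milk"},
    {"bread", "milk"},
]


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket) by counting."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    with_both = [b for b in with_antecedent if consequent in b]
    return len(with_both) / len(with_antecedent)


p1 = confidence(baskets, "cat food", "kitty litter")
p2 = confidence(baskets, "kitty litter", "cat food")
print(f"P1 = {p1:.2f}, P2 = {p2:.2f}")  # P1 = 0.50, P2 = 0.67
```

P1 and P2 differ because each rule is measured against a different set of baskets, even though both rules describe the same pair of items.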

Clustering
Clustering refers to the grouping of records, observations, or cases into classes of similar objects.
 A cluster is a collection of records that are similar to one another, and dissimilar to records in other
 clusters.

For example, clustering might be the first step in a market segmentation effort: Instead of trying
 to come up with a one-size-fits-all rule for “what kind of promotion do customers respond to best,”
 first divide the customer base into clusters of people with similar buying habits, and then ask what
 kind of promotion works best for each cluster.
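
One common way to form such clusters is k-means, sketched below in plain Python. The choice of k-means, the customer records, and the starting centroids are all assumptions made for this example. Note that no class labels are given in advance: the groups emerge from the data.

```python
import math

# Hypothetical customers: (store visits per month, average basket value).
customers = [(2, 15), (3, 18), (2, 20), (10, 60), (11, 65), (12, 58)]


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move each centroid."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Start from two arbitrary customers as initial centroids (an assumption).
clusters, centroids = k_means(customers, centroids=[customers[0], customers[3]])
for cluster, centroid in zip(clusters, centroids):
    print(centroid, cluster)
```

With these records the customers split into occasional small-basket shoppers and frequent large-basket shoppers, and a different promotion could then be tried on each group.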

Why Now?
Most of the data mining techniques have existed, at least as academic algorithms, for years or 
 decades. However, it is only in the last decade that commercial data mining has caught on in a big 
 way. This is due to the convergence of several factors:
          The data is being produced.
          The data is being warehoused.
          Computing power is affordable.
          Interest in customer relationship management is strong.
          Commercial data mining software products are readily available.

Data Mining Today
 Supermarkets Become Information Brokers
 Cross-Selling
 Holding On to Good Customers
 Weeding Out Bad Customers
 Revolutionising an Industry
 Etc.

What are we going to learn?
Association
Clustering
Classification


