Data mining is often compared with gold mining. Large quantities of ore must be processed before the gold can be extracted. Data mining can help us find the ‘hidden gold’ of knowledge in raw data.
What is Data?
•Data is what we collect and store
•Knowledge is what helps us to make informed decisions
•The extraction of knowledge from data is called DATA MINING
•Data mining can also be defined as the exploration and analysis of large quantities of data in order
to discover meaningful patterns and rules (Berry and Linoff, 2000).
•The ultimate goal of data mining is to discover knowledge.
Big Data?
• Modern organisations must respond quickly to any change in the market.
• However, an organisation must also determine which trends are relevant, and this cannot
be accomplished without access to historical data stored in large databases called
DATA WAREHOUSES
What is a data warehouse?
• The main characteristic of a data warehouse is its capacity. A data warehouse is really big – it
includes millions, even billions, of data records
• A data warehouse is designed to support decision making in the organisation
• The information needed can be obtained with traditional query tools
• These tools might also help us in discovering important relationships in the data
What is the difference between a query tool and data mining?
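• Traditional query tools are assumption-based: the user must first assume a relationship in the
data and then query the database to confirm or reject it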
• With a data mining tool, instead of assuming certain relationships between different variables in
a data set (and studying these relationships one at a time), we can determine the most significant
factors that influence the outcome.
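The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the factor names (owns_home, has_children, urban) and the scoring rule (difference in response rate with and without a factor) are all hypothetical and are not taken from any particular tool.

```python
# Hypothetical customer records: (owns_home, has_children, urban, responded).
# All values are invented for illustration.
rows = [
    (True,  True,  True,  True),
    (True,  False, True,  True),
    (True,  False, False, True),
    (True,  False, False, False),
    (False, True,  True,  False),
    (False, False, True,  False),
    (False, True,  False, False),
    (False, False, False, False),
]
FACTORS = ("owns_home", "has_children", "urban")
records = [dict(zip(FACTORS + ("responded",), row)) for row in rows]

def response_rate(subset):
    """Share of records in the subset that responded (0.0 if the subset is empty)."""
    return sum(r["responded"] for r in subset) / len(subset) if subset else 0.0

# Query-tool style: we assume home ownership matters and check only that.
homeowner_rate = response_rate([r for r in records if r["owns_home"]])

# Data-mining style: score every factor and let the data rank them.
def rank_factors(records, factors):
    scores = {}
    for factor in factors:
        with_factor = [r for r in records if r[factor]]
        without_factor = [r for r in records if not r[factor]]
        scores[factor] = abs(response_rate(with_factor) - response_rate(without_factor))
    return sorted(factors, key=lambda f: scores[f], reverse=True)

ranking = rank_factors(records, FACTORS)
print(homeowner_rate, ranking)
```

The query answers only the question that was asked, while the ranking step examines all candidate factors at once. Real data mining tools use far more robust measures than this simple rate difference.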
What Tasks Can Be Performed with Data Mining?
• Many problems of intellectual, economic, and business interest can be phrased in terms of the
following tasks:
•Classification
•Estimation
•Prediction
•Affinity grouping
•Clustering
Classification
• Classification consists of examining the features of a newly presented object and assigning it to
one of a predefined set of classes.
•Examples of classification tasks include:
•Classifying credit applicants as low, medium, or high risk
•Choosing content to be displayed on a Web page
•Determining which phone numbers correspond to fax machines
•Spotting fraudulent insurance claims
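As a minimal sketch of classification, the Python below assigns a new credit applicant to one of three predefined risk classes using a nearest-centroid rule. The technique, the two features (debt-to-income ratio and a scaled late-payment score) and all the training values are assumptions chosen for illustration. The source does not prescribe a particular classifier.

```python
import math

# Hypothetical labelled applicants: (debt_to_income, late_payment_score) -> risk class.
training = [
    ((0.10, 0.0), "low"),
    ((0.15, 0.1), "low"),
    ((0.35, 0.3), "medium"),
    ((0.40, 0.4), "medium"),
    ((0.70, 0.8), "high"),
    ((0.80, 0.9), "high"),
]

def class_centroids(examples):
    """Average the feature vectors of each class."""
    sums = {}
    for features, label in examples:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        total = [t + f for t, f in zip(total, features)]
        sums[label] = (total, count + 1)
    return {label: tuple(t / n for t in total) for label, (total, n) in sums.items()}

def classify(features, centroids):
    """Assign the object to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

model = class_centroids(training)
new_applicant = (0.38, 0.35)
risk = classify(new_applicant, model)
print(risk)
```

The classes (low, medium, high) are fixed in advance. The classifier only decides which of them a newly presented object belongs to.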
Estimation
• Estimation is similar to classification except that the target variable is numerical rather than
categorical.
•Examples of estimation tasks include:
•Estimating the number of children in a family
•Estimating a family’s total household income
•Estimating the lifetime value of a customer
•Estimating the probability that someone will respond to a balance transfer solicitation.
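A small Python sketch of estimation, assuming a straight-line relationship fitted by ordinary least squares. The variables (years employed, household income in thousands) and the data points are hypothetical. Least squares is only one of many possible estimation techniques.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: years employed -> household income (thousands).
years_employed = [1, 3, 5, 7]
income = [30, 40, 50, 60]

slope, intercept = fit_line(years_employed, income)

def estimate_income(years):
    """Return a numerical estimate rather than a class label."""
    return slope * years + intercept

print(estimate_income(4))
```

Unlike the classifier, the output here is a number on a continuous scale, which is what distinguishes estimation from classification.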
Prediction
• Prediction is similar to classification and estimation, except that for prediction, the results lie in
the future.
•Examples of prediction tasks include:
•Predicting the size of the balance that will be transferred if a credit card prospect accepts a
balance transfer offer
•Predicting which customers will leave within the next 6 months
•Predicting which telephone subscribers will order a value-added service such as three-way
calling or voice mail
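A rough sketch of prediction in Python, under the assumption that past behaviour is a guide to future behaviour: churn rates are learned from historical records whose outcome is already known, then applied to current customers whose outcome is not yet known. The segments ("recent", "lapsed"), the customer names, the data and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical history: (segment, left_within_6_months). The outcome is already known.
history = [
    ("recent", False), ("recent", False), ("recent", True), ("recent", False),
    ("lapsed", True), ("lapsed", True), ("lapsed", False), ("lapsed", True),
]

def churn_rates(records):
    """Fraction of customers in each segment who left."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [churned, total]
    for segment, churned in records:
        counts[segment][0] += int(churned)
        counts[segment][1] += 1
    return {seg: churned / total for seg, (churned, total) in counts.items()}

rates = churn_rates(history)

# Current customers: the outcome lies in the future, so we can only predict it.
current_customers = {"alice": "recent", "bob": "lapsed"}
at_risk = [name for name, seg in current_customers.items() if rates[seg] >= 0.5]
print(rates, at_risk)
```

The prediction can only be checked by waiting to see which customers actually leave.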
Affinity Grouping or Association Rules
•The task of affinity grouping is to determine which things go together.
•Affinity grouping is one simple approach to generating rules from data.
•For example, if two items, say cat food and kitty litter, occur together frequently enough, we
can generate two association rules:
•People who buy cat food also buy kitty litter with probability P1.
•People who buy kitty litter also buy cat food with probability P2.
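The two probabilities can be estimated directly from transaction data. The Python sketch below uses hypothetical shopping baskets. P1 and P2 are computed as conditional frequencies, often called the confidence of a rule.

```python
# Hypothetical market baskets; the items and counts are invented for illustration.
baskets = [
    {"cat food", "kitty litter", "milk"},
    {"cat food", "kitty litter"},
    {"cat food", "bread"},
    {"kitty litter", "milk"},
    {"bread", "milk"},
    {"kitty litter", "bread"},
]

def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)

p1 = confidence(baskets, "cat food", "kitty litter")   # cat food -> kitty litter
p2 = confidence(baskets, "kitty litter", "cat food")   # kitty litter -> cat food
print(p1, p2)
```

With these baskets P1 is 2/3 and P2 is 1/2. The two rules are not symmetric, because each probability is conditioned on a different set of baskets.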
Clustering
•Clustering refers to the grouping of records, observations, or cases into classes of similar objects.
A cluster is a collection of records that are similar to one another, and dissimilar to records in other
clusters.
•For example, clustering might be the first step in a market segmentation effort: Instead of trying
to come up with a one-size-fits-all rule for “what kind of promotion do customers respond to best,”
first divide the customer base into clusters of people with similar buying habits, and then ask what
kind of promotion works best for each cluster.
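As an illustration of clustering, here is a minimal one-dimensional k-means sketch in Python. The choice of k-means, the single feature (monthly spend), the data and the starting centres are assumptions. The source does not name a specific clustering algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest centre and re-averaging."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer; no class labels are given in advance.
monthly_spend = [10, 12, 11, 50, 52, 48]
centers, clusters = kmeans_1d(monthly_spend, centers=[10, 50])
print(centers, clusters)
```

No predefined classes are supplied here. The two groups (low spenders and high spenders) emerge from the similarity of the records themselves, and each group can then be studied separately.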
Why Now?
•Most of the data mining techniques have existed, at least as academic algorithms, for years or
decades. However, it is only in the last decade that commercial data mining has caught on in a big
way. This is due to the convergence of several factors:
•The data is being produced.
•The data is being warehoused.
•Computing power is affordable.
•Interest in customer relationship management is strong.
•Commercial data mining software products are readily available.
Data Mining Today
•Supermarkets become information brokers
•Cross-selling
•Holding on to good customers
•Weeding out bad customers
•Revolutionizing an industry
•Etc.
What are we going to learn?
•Association
•Clustering
•Classification