DATA MINING - Introduction ~ STMIK Triguna Dharma 7SIC5 Class

Data mining is often compared with gold mining. Large quantities of ore must be processed before the gold can be extracted. Data mining can help us find the ‘hidden gold’ of knowledge in raw data.

What is Data?

•Data is what we collect and store

•Knowledge is what helps us to make informed decisions

•The extraction of knowledge from data is called DATA MINING

•Data mining can also be defined as the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules (Berry and Linoff, 2000).

•The ultimate goal of data mining is to discover knowledge.

Big Data?

• Modern organisations must respond quickly to any change in the market. However, an organisation must also determine which trends are relevant, and this cannot be accomplished without access to historical data, which are stored in large databases called data warehouses.

What is a data warehouse?

• The main characteristic of a data warehouse is its capacity. A data warehouse is really big: it includes millions, even billions, of data records.

• A data warehouse is designed to support decision making in the organisation

• The information needed can be obtained with traditional query tools

• These tools might also help us in discovering important relationships in the data

What is the difference between a query tool and data mining?

• Traditional query tools are assumption-based: the user must already suspect that a relationship exists and ask the right question to confirm it

• With a data mining tool, instead of assuming certain relationships between different variables in a data set (and studying these relationships one at a time), we can determine the most significant factors that influence the outcome.
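The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the attribute names (`region`, `age_group`, `has_card`) and the scoring function `spread` are all hypothetical, and the score used here (the gap between the best and worst response rate across an attribute's values) is a crude stand-in for what a real data mining tool would compute.

```python
# Hypothetical customer records, invented for illustration only.
records = [
    {"region": "north", "age_group": "young", "has_card": True,  "responded": True},
    {"region": "north", "age_group": "old",   "has_card": True,  "responded": True},
    {"region": "south", "age_group": "young", "has_card": True,  "responded": True},
    {"region": "south", "age_group": "old",   "has_card": True,  "responded": True},
    {"region": "north", "age_group": "young", "has_card": False, "responded": False},
    {"region": "north", "age_group": "old",   "has_card": False, "responded": False},
    {"region": "south", "age_group": "young", "has_card": False, "responded": False},
    {"region": "south", "age_group": "old",   "has_card": False, "responded": True},
]

def response_rate(rows):
    """Share of rows in which the customer responded."""
    return sum(r["responded"] for r in rows) / len(rows)

# Query-tool style: the analyst assumes that region matters and checks
# that single relationship by hand.
north = [r for r in records if r["region"] == "north"]
print("north response rate:", response_rate(north))

def spread(rows, attribute):
    """Gap between the highest and lowest response rate across the attribute's values."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r)
    rates = [response_rate(g) for g in groups.values()]
    return max(rates) - min(rates)

# Data-mining style: score every candidate attribute and rank them,
# without assuming in advance which one is relevant.
attributes = ["region", "age_group", "has_card"]
ranking = sorted(attributes, key=lambda a: spread(records, a), reverse=True)
print("attributes ranked by influence:", ranking)
```

In this toy data the ranking puts `has_card` first, a factor the analyst who only queried by region would not have examined.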

What Tasks Can Be Performed with Data Mining?

• Many problems of intellectual, economic, and business interest can be phrased in terms of the following five tasks:

•Classification

•Estimation

•Prediction

•Affinity grouping

•Clustering

Classification

• Classification consists of examining the features of a newly presented object and assigning it to one of a predefined set of classes.

•Examples of classification tasks include:

•Classifying credit applicants as low, medium, or high risk

•Choosing content to be displayed on a Web page

•Determining which phone numbers correspond to fax machines

•Spotting fraudulent insurance claims
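As a minimal sketch of the credit-risk example, the following Python code assigns a new applicant to one of the predefined classes by majority vote among the most similar labelled applicants (a k-nearest-neighbours classifier). The technique, the two features and all the figures are assumptions chosen for illustration; the source does not prescribe a particular classification method.

```python
from collections import Counter
import math

# Hypothetical labelled applicants:
# (debt-to-income ratio, share of late payments) -> risk class.
training = [
    ((0.10, 0.00), "low"),
    ((0.15, 0.05), "low"),
    ((0.20, 0.02), "low"),
    ((0.35, 0.10), "medium"),
    ((0.40, 0.15), "medium"),
    ((0.45, 0.12), "medium"),
    ((0.70, 0.40), "high"),
    ((0.80, 0.35), "high"),
    ((0.75, 0.50), "high"),
]

def classify(applicant, training, k=3):
    """Assign the majority class among the k nearest labelled applicants."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Newly presented applicants whose class is not yet known.
for applicant in [(0.12, 0.01), (0.42, 0.11), (0.78, 0.45)]:
    print(applicant, "->", classify(applicant, training))
```

The set of classes (low, medium, high) is fixed before any new applicant is seen, which is what distinguishes classification from clustering.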

Estimation

• Estimation is similar to classification except that the target variable is numerical rather than categorical.

•Examples of estimation tasks include:

•Estimating the number of children in a family

•Estimating a family’s total household income

•Estimating the lifetime value of a customer

•Estimating the probability that someone will respond to a balance transfer solicitation.
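To illustrate a numerical target, here is a minimal sketch that estimates customer lifetime value from average monthly spend with a least-squares line. The choice of a straight-line model, the single input variable and the data are assumptions made for the example, not something the source specifies.

```python
# Hypothetical customers: average monthly spend and observed lifetime value
# (same currency; figures invented for illustration).
monthly_spend = [20, 35, 50, 65, 80]
lifetime_value = [450, 700, 1050, 1300, 1600]

def fit_line(xs, ys):
    """Ordinary least squares with one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(monthly_spend, lifetime_value)

def estimate_lifetime_value(spend):
    """Return a number (the estimate), not a class label."""
    return intercept + slope * spend

print(round(estimate_lifetime_value(60), 2))
```

Unlike the classifier, the output here can take any value on a continuous scale, so customers can be ranked by their estimates.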

Prediction

• Prediction is similar to classification and estimation, except that for prediction, the results lie in the future.

•Examples of prediction tasks include:

•Predicting the size of the balance that will be transferred if a credit card prospect accepts a balance transfer offer

•Predicting which customers will leave within the next 6 months

•Predicting which telephone subscribers will order a value-added service such as three-way calling or voice mail
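The customer-attrition example can be sketched as follows: a rule is fitted on past customers whose outcome is already known, then applied to current customers whose outcome still lies in the future. The single-threshold rule, the feature (days since last purchase) and every record below are hypothetical and serve only to show the workflow.

```python
# Hypothetical history: (days since last purchase at snapshot time,
# whether the customer left within the following 6 months).
history = [
    (5, False), (12, False), (20, False), (33, False), (41, True),
    (47, True), (58, True), (60, False), (75, True), (90, True),
]

def fit_threshold(history):
    """Pick the inactivity threshold that misclassifies the fewest past customers."""
    best_threshold, best_errors = None, len(history) + 1
    for threshold in sorted({days for days, _ in history}):
        # Rule under test: a customer leaves if inactive for at least `threshold` days.
        errors = sum((days >= threshold) != left for days, left in history)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

threshold = fit_threshold(history)

# Current customers: the outcome is not yet known, so the rule predicts it.
current = {"alice": 10, "bob": 62, "carol": 44}
at_risk = sorted(name for name, days in current.items() if days >= threshold)
print("threshold:", threshold, "at risk:", at_risk)
```

The prediction can only be checked by waiting six months and observing who actually left, which is the practical difference from classification and estimation.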

Affinity Grouping or Association Rules

•The task of affinity grouping is to determine which things go together.

•Affinity grouping is one simple approach to generating rules from data. For example, if two items, say cat food and kitty litter, occur together frequently enough, we can generate two association rules:

•People who buy cat food also buy kitty litter with probability P1.

•People who buy kitty litter also buy cat food with probability P2.
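The two probabilities can be estimated directly from transaction data. The sketch below assumes a small, invented list of market baskets; `confidence` is a hypothetical helper that computes the share of baskets containing the first item that also contain the second.

```python
from fractions import Fraction

# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"cat food", "kitty litter", "milk"},
    {"cat food", "kitty litter"},
    {"cat food", "bread"},
    {"kitty litter", "milk"},
    {"cat food", "kitty litter", "bread"},
    {"milk", "bread"},
    {"cat food", "milk"},
]

def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return None  # the rule cannot be evaluated on this data
    with_both = sum(1 for b in with_antecedent if consequent in b)
    return Fraction(with_both, len(with_antecedent))

p1 = confidence(baskets, "cat food", "kitty litter")
p2 = confidence(baskets, "kitty litter", "cat food")
print(f"cat food -> kitty litter: P1 = {p1}")
print(f"kitty litter -> cat food: P2 = {p2}")
```

With these baskets P1 is 3/5 and P2 is 3/4. The two rules involve the same pair of items but have different probabilities, because each is conditioned on a different item.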

Clustering

•Clustering refers to the grouping of records, observations, or cases into classes of similar objects. A cluster is a collection of records that are similar to one another, and dissimilar to records in other clusters.

•For example, clustering might be the first step in a market segmentation effort: instead of trying to come up with a one-size-fits-all rule for “what kind of promotion do customers respond to best,” first divide the customer base into clusters of people with similar buying habits, and then ask what kind of promotion works best for each cluster.
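A minimal sketch of the segmentation step, using plain k-means as one possible clustering technique (the source does not name a specific algorithm). The customer data, the two features and the choice of starting centroids are assumptions; in practice the features would usually be rescaled first so that neither dominates the distance.

```python
import math

# Hypothetical customers: (purchases per month, average basket value).
customers = [(1, 20), (2, 25), (1, 30), (9, 80), (10, 95), (8, 90)]

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Start from two customers chosen by hand so the run is deterministic.
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No class labels are supplied: the two groups (occasional small-basket buyers and frequent large-basket buyers) emerge from the data, and each group can then be studied separately to see which promotion works best for it.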

Why Now?

•Most of the data mining techniques have existed, at least as academic algorithms, for years or decades. However, it is only in the last decade that commercial data mining has caught on in a big way. This is due to the convergence of several factors:

•The data is being produced.

•The data is being warehoused.

•Computing power is affordable.

•Interest in customer relationship management is strong.

•Commercial data mining software products are readily available.

Data Mining Today

•Supermarkets become information brokers

•Cross-selling

•Holding on to good customers

•Weeding out bad customers

•Revolutionizing an industry

•Etc.

What are we going to learn?

•Association

•Clustering

•Classification


Monday, 18 November 2013
