Introduction

 

Digging information out of a pool of data is termed data mining. The information industry holds an enormous amount of data that is useless unless it is converted into useful information and analyzed, for example to detect fraud, learn buyers' preferences, control the manufacturing of products and understand the market better.

Data mining helps entrepreneurs know their customers better: their choices, the deals they look for, their income and the criteria by which they like to spend. It also indicates how often a customer spends and makes it possible to relate different people with similar choices.

Apart from these, it also assists the corporate sector.

 

 DATA MINING TASKS

Data mining tasks are categorized as "Descriptive" and "Classification and prediction" on the basis of the type of the data.

·  Descriptive function
It describes the general features of the information in the database, such as:

- Class/concept description
- Mining of frequent patterns
- Mining of association
- Mining of correlation
- Mining of clusters

1. CLASS/CONCEPT DESCRIPTION
Class: the products sold by the company, for example clothes.
Concept: how customers spend their money, for example regular shoppers or those who buy on a budget.

These descriptions can be derived in two ways:

– Data Characterization: Summarizing the data of the class under study, called the 'target class'.
– Data Discrimination: Comparing the target class with a designated contrasting class.
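
As a rough illustration of the two approaches, the sketch below first summarizes a target class (characterization) and then compares it with a contrasting class (discrimination). The purchase records, the class names `budget` and `big_spender`, and both helper functions are hypothetical and chosen only for this example.

```python
from statistics import mean

# Hypothetical purchase records: (customer class, amount spent)
purchases = [
    ("budget", 20.0), ("budget", 30.0), ("budget", 25.0),
    ("big_spender", 200.0), ("big_spender", 300.0),
]

def characterize(records, target):
    """Summarize the amounts spent by the target class."""
    amounts = [amount for cls, amount in records if cls == target]
    return {
        "count": len(amounts),
        "mean": mean(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }

def discriminate(records, target, contrast):
    """Compare the target class with a contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {"mean_difference": t["mean"] - c["mean"]}

print(characterize(purchases, "budget"))
print(discriminate(purchases, "big_spender", "budget"))
```

Real systems summarize many attributes at once; a single numeric attribute is used here only to keep the idea visible.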

2. MINING OF FREQUENT PATTERNS
Products (patterns) that occur often in transactional data are termed frequent patterns.

– Frequent item set: Products that often appear together, such as top and bottom wear in the clothing section.
– Frequent sub sequence: Products that are generally bought one after another, such as buying pet food followed by pet treats.
– Frequent sub structure: Graphs, trees or other structural forms, which may be combined with item sets or sub sequences.
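
The idea of a frequent item set can be sketched by counting how often each pair of products appears in the same transaction and keeping the pairs that reach a minimum count. The baskets and the `min_count` threshold below are made up for illustration, and the brute-force pair counting is only a minimal sketch, not an efficient algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (each set is one basket)
transactions = [
    {"top", "bottom"},
    {"top", "bottom", "belt"},
    {"top", "belt"},
    {"bottom"},
]

def frequent_pairs(baskets, min_count):
    """Return the item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical ordering
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

print(frequent_pairs(transactions, min_count=2))
```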

3. MINING OF ASSOCIATION
Items that are generally bought together fall in this category. With its help a business discovers the percentage of association between products bought together, for example that a mobile phone is bought with a mobile cover 60 percent of the time and with a screen guard 40 percent of the time.
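
The percentage of association in the mobile phone example corresponds to what is usually called the confidence of a rule: of all transactions containing the first item, the fraction that also contain the second. The sketch below reproduces the 60 and 40 percent figures on an invented set of transactions; the data and the `confidence` helper are assumptions for illustration.

```python
def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)

# Hypothetical data: 10 phone purchases, 6 with a cover and 4 with a screen guard
transactions = (
    [{"phone", "cover"}] * 6
    + [{"phone", "screen_guard"}] * 4
    + [{"cover"}] * 2
)

print(confidence(transactions, "phone", "cover"))
print(confidence(transactions, "phone", "screen_guard"))
```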

4. MINING OF CORRELATION
It reveals the effect of the purchase of one product on the purchase of another: positive, negative or none at all.
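
One common way to express this effect is the lift measure, P(A and B) / (P(A) P(B)): a value above 1 suggests a positive effect, below 1 a negative effect, and exactly 1 no effect. The text does not name a specific measure, so lift is an assumption here, as are the baskets and helper names. The sketch also assumes both items occur at least once, otherwise the division fails.

```python
def lift(baskets, a, b):
    """Lift of items a and b: P(a and b) / (P(a) * P(b))."""
    n = len(baskets)
    p_a = sum(1 for t in baskets if a in t) / n
    p_b = sum(1 for t in baskets if b in t) / n
    p_ab = sum(1 for t in baskets if a in t and b in t) / n
    return p_ab / (p_a * p_b)

def effect(value, tol=1e-9):
    """Translate a lift value into positive, negative or none."""
    if value > 1 + tol:
        return "positive"
    if value < 1 - tol:
        return "negative"
    return "none"

# Hypothetical baskets
baskets = [
    {"phone", "cover"}, {"phone", "cover"}, {"phone"},
    {"tea"}, {"coffee"}, {"tea"},
]

print(effect(lift(baskets, "phone", "cover")))   # bought together more than chance
print(effect(lift(baskets, "tea", "coffee")))    # never bought together here
```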

5. MINING OF CLUSTERS
It groups similar products together. Products within a cluster resemble one another, and each cluster differs from the others.
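
A very small sketch of clustering, assuming one numeric attribute (for example price): sort the values and start a new cluster whenever the gap to the previous value exceeds a chosen limit. This gap-based grouping is a deliberately simple stand-in chosen for illustration, not a method named in the text, and the prices and `max_gap` are invented.

```python
def gap_clusters(values, max_gap):
    """Group sorted values; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)   # close enough to the previous value
        else:
            clusters.append([v])     # start a new cluster
    return clusters

# Hypothetical product prices
prices = [5, 7, 6, 50, 52, 100]
print(gap_clusters(prices, max_gap=5))
```

Values inside each resulting group lie close together, while the groups themselves are separated by larger gaps.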

 

 

·  Classification and prediction
The class label of some items may be unknown. Classification and prediction is a procedure that can be used to discover the class or concept of such data.
The derived model can be presented as:
(a) Classification (If-Then) rules
(b) Decision trees
(c) Mathematical formulae
(d) Neural networks

 

FUNCTIONS:

– Classification: Deriving a model that distinguishes the classes or concepts of the data. The model is built from objects whose class label is known.
– Prediction: Regression analysis is used to predict unknown numerical values rather than class labels. It is also used to identify sales trends on the basis of the available data.
– Outlier analysis: Data that does not conform to the general model of the available data is outlier data.
– Evolution analysis: It describes the trends of objects whose behaviour changes over time.
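
Prediction and outlier analysis can be sketched together: fit a least-squares line to known values, use it to predict an unknown numerical value, and flag observations that lie far from the line. The monthly sales figures, the tolerance `tol` and the helper names are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(slope, intercept, x):
    """Predict a numerical value from the fitted line."""
    return slope * x + intercept

def outliers(points, slope, intercept, tol):
    """Points whose distance from the fitted line exceeds tol."""
    return [(x, y) for x, y in points
            if abs(y - predict(slope, intercept, x)) > tol]

# Hypothetical sales per month
months = [1, 2, 3, 4]
sales = [10, 20, 30, 40]

slope, intercept = fit_line(months, sales)
print(predict(slope, intercept, 5))                             # forecast for month 5
print(outliers([(5, 50), (6, 95)], slope, intercept, tol=10))   # (6, 95) is far off the trend
```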

HOW DOES CLASSIFICATION WORK?

It involves two stages:

– Building the classifier or model

– Using the classifier for classification

 

BUILDING THE CLASSIFIER

- It is the learning step

- Classification algorithms build the classifier

- The training set is made up of database tuples and their associated class labels

- Each tuple in the training set belongs to a category or class; the tuples are also known as samples, objects or data points.

 

 

 

 

Figure 1

 

USING THE CLASSIFIER

The classifier is used for classification. This includes estimating the accuracy of the classification rules and then applying them to new data tuples if the accuracy is considered acceptable.
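
The two stages can be sketched with the simplest possible model, a single If-Then rule of the form "IF income >= threshold THEN high ELSE low". The learning step picks the threshold that classifies the most training tuples correctly; the second step estimates accuracy on separate labelled tuples and, if acceptable, classifies a new tuple. The attribute `income`, the labels `high` and `low`, the data and the 0.9 accuracy bar are all hypothetical.

```python
def classify(threshold, income):
    """Apply the rule: IF income >= threshold THEN 'high' ELSE 'low'."""
    return "high" if income >= threshold else "low"

def build_classifier(training):
    """Learning step: choose the threshold with the most correct training tuples."""
    best_threshold, best_correct = None, -1
    for candidate in sorted({income for income, _ in training}):
        correct = sum(1 for income, label in training
                      if classify(candidate, income) == label)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def accuracy(threshold, labelled):
    """Fraction of labelled tuples the rule classifies correctly."""
    hits = sum(1 for income, label in labelled
               if classify(threshold, income) == label)
    return hits / len(labelled)

# Hypothetical tuples: (income, class label)
training = [(20, "low"), (25, "low"), (30, "low"), (60, "high"), (70, "high")]
test = [(22, "low"), (65, "high"), (80, "high")]

threshold = build_classifier(training)
if accuracy(threshold, test) >= 0.9:      # accuracy considered acceptable
    print(classify(threshold, 45))        # classify a new, unlabelled tuple
```

Decision trees and neural networks follow the same build-then-use pattern, only with richer models than a single threshold.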

 

Figure 2

 

 

DATA MINING TASK PRIMITIVES

A data mining task can be specified in the form of a query.

- The query is input to the system.

- The query is defined in terms of data mining task primitives.

- The primitives therefore allow the user to communicate interactively with the data mining system.

 

The query is specified through the following primitives:

 

·  Relevant data to be mined:

This is the part of the database that is of interest to the user.

It is composed of database attributes and data warehouse dimensions of interest.

 

·  Kind of knowledge to be mined

It specifies the functions to be performed, which are:

- characterization
- discrimination
- association and correlation analysis
- classification
- prediction
- clustering
- outlier analysis
- evolution analysis

 

·  Background knowledge

It permits the mining of information at multiple levels of abstraction.

E.g. concept hierarchies.
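
A concept hierarchy can be pictured as a mapping from a lower level to a higher one, which lets the same data be viewed at a coarser level. The sketch below rolls hypothetical city-level sales up to country level; the cities, figures and the `roll_up` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: city -> country
city_to_country = {"Mumbai": "India", "Delhi": "India", "Paris": "France"}

def roll_up(sales_by_city, hierarchy):
    """Aggregate city-level sales to the next level of the hierarchy."""
    totals = defaultdict(int)
    for city, amount in sales_by_city.items():
        totals[hierarchy[city]] += amount
    return dict(totals)

print(roll_up({"Mumbai": 10, "Delhi": 5, "Paris": 7}, city_to_country))
```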

 

·  Interestingness measures and thresholds for pattern evaluation

They are used to evaluate the patterns that have been discovered.

 

·  Presentation for visualizing the discovered patterns

It refers to the form in which discovered patterns are displayed, such as rules, tables, charts, decision trees, graphs etc.
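
To make the primitives concrete, the sketch below bundles them into a single query object. This is not the syntax of any real data mining query language; the class `MiningQuery`, its field names, the default values and the `validate` helper are all hypothetical.

```python
from dataclasses import dataclass, field

# Functions a query may request, taken from the list of kinds of knowledge
ALLOWED_KINDS = {
    "characterization", "discrimination", "association and correlation analysis",
    "classification", "prediction", "clustering",
    "outlier analysis", "evolution analysis",
}

@dataclass
class MiningQuery:
    relevant_attributes: list                                  # relevant data to be mined
    kind_of_knowledge: str                                     # function to be performed
    concept_hierarchies: dict = field(default_factory=dict)    # background knowledge
    min_support: float = 0.0                                   # interestingness threshold
    presentation: str = "table"                                # how patterns are displayed

def validate(query):
    """Check that the query names a known function and a sensible threshold."""
    return (query.kind_of_knowledge in ALLOWED_KINDS
            and 0.0 <= query.min_support <= 1.0)

query = MiningQuery(
    relevant_attributes=["customer_id", "item", "amount"],
    kind_of_knowledge="classification",
    min_support=0.4,
    presentation="decision tree",
)
print(validate(query))
```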

 

ISSUES IN DATA MINING

Gathering the data can be complicated because the information is not all available in a single place; it has to be collected from varied sources.

The major points of concern are:

(i) Mining methodology and user interaction

(ii) Performance issues

(iii) Diverse data type issues

 

The following diagram shows the issues in data mining:

Figure 3

 

DATA WAREHOUSE

To support management's decision making, a data warehouse exhibits the following features:

Subject oriented

A data warehouse is considered subject oriented because its information is organized around a subject, such as sales, customers or products. In addition, it does not focus on ongoing operations but on the analysis of data for decision making.

Integrated

Since the data is collected from varied sources and integrated, it can be studied reliably.

Time variant

The data is identified with a particular time period and provides a historical point of view.

Non volatile

The data warehouse is kept separate from the operational database, so new information does not delete or replace previously stored information.

 

Data warehousing involves data cleaning, integration and consolidation. It follows one of two approaches: query driven or update driven. The former builds wrappers and integrators (also called mediators) on top of the sources, while the latter integrates the data in advance and makes it available for direct querying. The update driven approach is the one used today.

 

APPLICATIONS

Data mining is used in:

·  Retail industry

·  Telecommunication

·  Financial data analysis

·  Intrusion detection

·  Biological data analysis

·  Other scientific applications

·  Data mining in banking/finance

In the financial arena, data mining is used to predict loan payments, analyze credit policy and detect fraud.

·  Data mining in marketing

Similarly, in the retail industry it helps in better understanding customers, products, sales, etc.

·  Data mining in healthcare

It helps manage large volumes of data, as in bioinformatics, enabling the study of biological fields such as genomics, proteomics and biomedical research.

 

 

TRENDS IN DATA MINING

Concepts in data mining evolve constantly. Current trends include:

·  Visualization

·  Exploring applications

·  Web mining

·  Biological mining

·  Privacy protection

·  Distributed data mining