
What is data mining?



Data mining involves exploring and analyzing large amounts of data to find patterns in that data. The techniques came out of the fields of statistics and artificial intelligence (AI), with a bit of database management thrown into the mix.

Generally, the goal of data mining is either classification or prediction. In classification, the idea is to sort data into groups. For example, a marketer might be interested in the characteristics of those who responded to a promotion versus those who didn't respond. These are two classes.

In prediction, the idea is to predict the value of a continuous (that is, nondiscrete) variable. For example, a marketer might be interested in predicting
those who will respond to a promotion. Typical algorithms used in data mining include the following:

✓ Classification trees: 
A popular data-mining technique that is used to classify a dependent categorical variable based on measurements of one or more predictor variables. The result is a tree with nodes and links between the nodes that can be read to form if-then rules. (A runnable sketch of all four of these techniques follows this list.)

✓ Logistic regression: 
A statistical technique that is a variant of standard regression but
extends the concept to deal with classification. It produces a formula that predicts the probability of the occurrence as a function of the independent variables.

✓ Neural networks: 
A software algorithm that is modeled after the parallel architecture of animal brains. The network consists of input nodes, hidden layers, and output nodes. Each of the units is assigned a weight. Data is given to the input nodes, and by a system of trial and error, the algorithm adjusts the weights until it meets a certain stopping criterion. Some people have likened this to a black-box (you don't necessarily know what is going on inside) approach.

✓ Clustering techniques like K-nearest neighbors: 
A technique that identifies groups of similar records. The K-nearest neighbor technique calculates the distances between a new record and the points in the historical (training) data. It then assigns the record to the class of its nearest neighbor in the data set.
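Here is a minimal sketch of how the four techniques might be applied, assuming Python with scikit-learn (the original text names no library) and a synthetic stand-in for the marketer's responded/didn't-respond data; every name and parameter below is illustrative:

```python
# Fit each of the four techniques on the same synthetic data and compare
# the predicted probability of "responder" for one record.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: 500 customers, 4 numeric attributes, binary outcome
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

models = {
    "classification tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "logistic regression": LogisticRegression(),
    "neural network": MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X, y)                     # learn from the historical data
    proba = model.predict_proba(X[:1])  # class probabilities for one record
    print(f"{name}: P(responder) = {proba[0, 1]:.2f}")
```

Note that logistic regression returns probabilities directly from its fitted formula, while the tree and K-nearest neighbors derive them from leaf counts and neighbor votes, respectively.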

Here’s a classification tree example. Consider the situation where a telephone company wants to determine which residential customers are likely to disconnect their service. The telephone
company has information consisting of the following attributes: how long the person has had
the service, how much he spends on the service, whether he has had problems with the service,
whether he has the best calling plan for his needs, where he lives, how old he is, whether he has other services bundled together with his calling plan, competitive information concerning other carriers' plans, and whether he still has the service or has disconnected the service.

Of course, you can find many more attributes than this. The last attribute is the outcome variable; this is what the software will use to classify the customers into one of the two groups — perhaps called stayers and flight risks.

The data set is broken into training data and a test data set. The training data consists of observations described by the attributes and an outcome variable, which is binary in the case of a classification model: in this case, stayer or flight risk.

The algorithm is run over the training data and comes up with a tree that can be read like a series of rules. For example, if the customers have been with the company for more than ten years and they are over 55 years old, they are likely to remain as loyal customers. These rules are then run over the test data set to determine how good this model is on "new data." Accuracy measures are provided for the model. For example, a popular technique is the confusion matrix. This matrix is a table that provides information about how many cases were correctly versus incorrectly classified. If the model looks good, it can be deployed on other data as it is available (that is, using it to predict new cases of flight risk). Based on the model, the company might decide, for example, to send out special offers to those customers whom it thinks are flight risks.
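As a rough sketch of this whole workflow, here is how it might look in Python with scikit-learn; the attribute names, the synthetic data, and the rule that generates the outcome are all invented for illustration, not the telephone company's real records:

```python
# Train a classification tree on invented churn data, read its rules,
# evaluate it with a confusion matrix, and apply it to a new customer.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 1000
customers = pd.DataFrame({
    "years_of_service": rng.integers(0, 30, n),
    "monthly_spend": rng.uniform(20, 120, n).round(2),
    "had_problems": rng.integers(0, 2, n),
    "age": rng.integers(18, 90, n),
})
# Invented ground truth: long-tenured older customers tend to stay
stays = (customers["years_of_service"] > 10) & (customers["age"] > 55)
outcome = np.where(stays | (rng.random(n) > 0.4), "stayer", "flight risk")

# Break the data set into training data and a test data set
X_train, X_test, y_train, y_test = train_test_split(
    customers, outcome, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# The tree reads like a series of if-then rules
print(export_text(tree, feature_names=list(customers.columns)))

# Run the model over the test data; summarize accuracy as a confusion matrix
predictions = tree.predict(X_test)
print(confusion_matrix(y_test, predictions, labels=["stayer", "flight risk"]))

# If the model looks good, deploy it on new cases as they arrive
new_customer = pd.DataFrame([{"years_of_service": 2, "monthly_spend": 95.0,
                              "had_problems": 1, "age": 30}])
print(tree.predict(new_customer))  # e.g. ['flight risk']
```

The printed tree is the if-then rule view described above, the confusion matrix is the table of correctly versus incorrectly classified test cases, and the final prediction is the deployment step, flagging a new customer as a likely flight risk.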
