BIG DATA IMPLEMENTATION

To get the most business value from big data, it needs to be integrated
into your business processes. How can you take action based on your
analysis of big data unless you can understand the results in context with
your operational data? Differentiating your company as a result of making
good business decisions depends on many factors. One factor that is
becoming increasingly important is your capability to integrate internal
and external data sources composed of both traditional relational data and
newer forms of unstructured data. While this may seem like a daunting task,
the reality is that you probably already have a lot of experience with data
integration. Don’t toss aside everything you have learned about delivering
data as a trusted source to your organization. You will want to place a high
priority on data quality as you move to make big data analytics actionable.
However, to bring your big data environments and enterprise data environments
together, you will need to incorporate new methods of integration that
support Hadoop and other nontraditional big data environments.
Two major categories of big data integration are covered in this chapter: the
integration of multiple big data sources in big data environments and the
integration of unstructured big data sources with structured enterprise data.
We cover the traditional forms of integration such as extract, transform, and
load (ETL) and new solutions designed for big data platforms.
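To make the ETL pattern concrete, here is a minimal sketch in Python that extracts customer records from a CSV export, applies a simple cleansing transform, and loads the result into a local SQLite table. The file name, column names, and cleansing rules are hypothetical and stand in for whatever your own sources and targets require.

# Minimal ETL sketch: extract rows from a CSV export, cleanse them,
# and load them into SQLite. Names and rules below are hypothetical.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from the source file.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: normalize fields and drop rows with no customer id.
    for row in rows:
        if not row.get("customer_id"):
            continue
        yield (row["customer_id"].strip(),
               row.get("email", "").strip().lower())

def load(records, db_path="warehouse.db"):
    # Load: write the cleansed records into the target table.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS customers "
                 "(customer_id TEXT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", records)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("customers.csv")))

The same three-step shape carries over to big data platforms; what changes is the scale of the sources and the engines that run each step.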
Identifying the Data You Need
Before you can begin to plan for integration of your big data, you need to take stock of the type of data you are dealing with. Many organizations are
recognizing that a lot of internally generated data has not been used to its full
potential in the past. By leveraging new tools, organizations are gaining new
insight from previously untapped sources of unstructured data in e-mails,
customer service records, sensor data, and security logs. In addition, much
interest exists in looking for new insight based on analysis of data that is
primarily external to the organization, such as social media, mobile phone
location, traffic, and weather.
Your analysis may require that you bring several of these big data sources
together. To complete your analysis, you need to move large amounts of data
from log files, Twitter feeds, RFID tags, and weather data feeds and integrate
all these elements across highly distributed data systems. After you complete
your analysis, you may need to integrate your big data with your operational
data. For example, healthcare researchers explore unstructured information
from patient records in combination with traditional patient data, such as
test results, to improve both patient care and quality of care. Big data
sources such as information from medical devices and clinical trials may be
incorporated into the analysis as well.
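As a minimal sketch of that kind of integration, assume an earlier text-analysis step has already reduced unstructured clinical notes to per-patient flags; the Python code below then joins those flags with structured lab results on a shared patient identifier using pandas. The column names and sample values are illustrative only.

# Sketch of joining results derived from unstructured notes with
# structured operational records. All names are hypothetical.
import pandas as pd

# Flags extracted upstream, e.g., by text analysis of clinical notes.
note_flags = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "mentions_chest_pain": [True, False, True],
})

# Structured operational data, e.g., lab test results.
lab_results = pd.DataFrame({
    "patient_id": [101, 102, 104],
    "troponin_level": [0.8, 0.02, 0.05],
})

# Integrate the two views on the shared patient identifier.
combined = note_flags.merge(lab_results, on="patient_id", how="outer")
print(combined)

An outer join is used here so that patients who appear in only one of the two sources are still carried forward for review rather than silently dropped.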
As you begin your big data analysis, you probably do not know exactly what
you will find. Your analysis will go through several stages. You may begin
with petabytes of data, and as you look for patterns, you may narrow your
results. The following three stages are described in more detail:
✓ Exploratory stage
✓ Codifying stage
✓ Integration and incorporation stage
Exploratory stage
In the early stages of your analysis, you will want to search for patterns in the
data. It is only by examining very large volumes (terabytes and petabytes)
of data that new and unexpected relationships and correlations among elements
may become apparent. These patterns can provide insight into customer
preferences for a new product, for example. You will need a platform
such as Hadoop for organizing your big data to look for these patterns.
As described in Chapters 9 and 10, Hadoop is widely used as an underlying
building block for capturing and processing big data. Hadoop is designed
with capabilities that speed the processing of big data and make it possible
to identify patterns in huge amounts of data in a relatively short time. The
two primary components of Hadoop — Hadoop Distributed File System
(HDFS) and MapReduce — are used to manage and process your big data.
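The following minimal sketch shows the MapReduce pattern in the style of Hadoop Streaming, where the mapper and reducer each read lines from standard input and emit tab-separated key/value pairs, and Hadoop handles distributing, sorting, and shuffling the data between the two phases. The log format and search terms are hypothetical; the point is only to show how a pattern-counting job splits into a map step and a reduce step.

# MapReduce sketch in the Hadoop Streaming style: mapper and reducer
# read stdin and write tab-separated key/value pairs to stdout.
# The search terms and log format are hypothetical.
import sys
from itertools import groupby

TERMS = ("error", "timeout", "checkout")  # illustrative patterns

def mapper():
    # Emit (term, 1) for every input line that mentions the term.
    for line in sys.stdin:
        lowered = line.lower()
        for term in TERMS:
            if term in lowered:
                print(f"{term}\t1")

def reducer():
    # Input arrives sorted by key, so equal keys are adjacent;
    # sum the counts for each key and emit the total.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        print(f"{key}\t{total}")

if __name__ == "__main__":
    # Local test of the same logic, without a cluster:
    #   cat access.log | python count_terms.py map | sort \
    #                  | python count_terms.py reduce
    if sys.argv[1:] == ["map"]:
        mapper()
    else:
        reducer()

On a real cluster, HDFS holds the input files and Hadoop runs many copies of the mapper in parallel, one per block of data, before feeding the sorted intermediate pairs to the reducers.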
