Getting Performance Right

Just having a faster computer isn’t enough to ensure the right level of performance to handle big data. You need to be able to distribute components of your big data service across a series of nodes. See Figure 1.
In distributed computing, a node is an element contained within a cluster of systems or within a rack. A node typically includes CPU, memory, and some kind of disk.

However, a node can also be a blade, with CPU and memory that rely on nearby storage within the rack.

Within a big data environment, these nodes are typically clustered together to provide scale. For example, you might start out with a big data analysis and continue to add more data sources. To accommodate the growth, an organization simply adds more nodes into a cluster so that it can scale out to accommodate growing requirements. However, it isn’t enough to simply expand the number of nodes in the cluster. Rather, it is important to be able to send part of the big data analysis to different physical environments.
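The scale-out idea above can be sketched in a few lines. This is an illustrative, framework-neutral example, not the behavior of any particular product: the node names, partition labels, and round-robin placement are all assumptions made for the sake of the sketch.

```python
# Hypothetical sketch: scaling out by partitioning work across a
# growing list of nodes. Adding nodes to the cluster spreads the same
# partitions over more machines, increasing parallel capacity.

def assign_partitions(partitions, nodes):
    """Map each data partition to a node, round-robin."""
    return {p: nodes[i % len(nodes)] for i, p in enumerate(partitions)}

partitions = [f"part-{i}" for i in range(8)]

# Start with a small cluster...
small = assign_partitions(partitions, ["node-1", "node-2"])

# ...then scale out by simply adding nodes as the data grows.
large = assign_partitions(partitions, ["node-1", "node-2", "node-3", "node-4"])
```

With two nodes each machine holds four partitions; with four nodes each holds two, which is the essence of accommodating growth by adding nodes rather than buying a bigger machine.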

Where you send these tasks and how you manage them makes the difference between success and failure.
In some complex situations, you may want to execute many different algorithms in parallel, even within the same cluster, to achieve the required speed of analysis. Why would you execute different big data algorithms in parallel within the same rack? The closer together the distributed functions are, the faster they can execute. Although it is possible to distribute big data analysis across networks to take advantage of available capacity, this kind of distribution must be driven by performance requirements. In some situations, the speed of processing takes a back seat; in others, getting results fast is the whole point, and you want to make sure that the interacting functions are in close physical proximity to one another. In general, the big data environment has to be optimized for the type of analytics task.

Therefore, scalability is the linchpin of making big data operate successfully. Although it would be theoretically possible to run a big data environment on a single large system, it is not practical. To understand the need for scalability in big data, one only has to look at cloud scalability and understand both the requirements and the approach. Like cloud computing, big data requires fast networks and inexpensive clusters of hardware that can be combined in racks to increase performance. These clusters are supported by software automation that enables dynamic scaling and load balancing.

The design and implementation of MapReduce are excellent examples of how distributed computing can make big data operationally viable and affordable. For more information on MapReduce, refer to Chapter 8. In essence, we are at one of the unique turning points in computing where technology concepts come together at the right time to solve the right problems.

Combining distributed computing, improved hardware systems, and practical solutions such as MapReduce and Hadoop is changing data management in profound ways.
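The MapReduce pattern mentioned above can be shown in miniature. This is a single-process sketch of the classic word-count example, with the map, shuffle, and reduce phases written as plain functions; frameworks such as Hadoop run these same phases distributed across the nodes of a cluster.

```python
# Minimal single-process sketch of the MapReduce pattern (word count).
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would between
    # the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: combine each key's values into a final count.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big clusters", "data nodes"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

Because each map call touches only its own document and each reduce call only its own key, every phase can be spread across many nodes independently, which is exactly what makes the approach scale out.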
