
Ten Big Data Best Practices

While we are at an early stage in the evolution of big data, it is never too early to get started with good practices so that you can leverage what you are learning and the experience you are gaining. As with every important emerging technology, it is important to understand why you need to leverage the technology and have a concrete plan in place. In this chapter, we provide you with the top-ten best practices you need to understand as you begin the journey to manage big data.

Understand Your Goals
Many organizations start their big data journey by experimenting with a single project that might provide some concrete benefit. By selecting a project, you have the freedom to test without risking capital expenditures. However, if all you end up doing is a series of one-off projects, you will likely not have a good plan in place when you begin to understand the value of leveraging big data in the company. Therefore, after you conclude some experiments and have a good initial understanding of what might be possible, you need to set some short- and long-term goals. What do you hope to accomplish with big data? Could parts of your business be more profitable with the infusion of more data to predict customer behavior or buying patterns? It is important for IT and the business units to collaborate on well-defined goals.

After you understand the goals you have for leveraging big data, your work is just beginning. You now need to get to the meat of the issues and involve all the stakeholders in the business. Big data affects every aspect of your organization, including the historical data you already store, the information sources managed by different business units, and new data sources in some business areas that few managers are even aware of. Getting a task force together is a great way to bring representatives of the business together so that they can see how their data management issues are related. The task force can evolve into a team that helps various business units adopt best practices, and it should include upper-management leaders who set business strategy and direction.

Establish a Road Map
At this stage, you have experimented with big data and determined your company’s goals and objectives. You have a good understanding of what upper management and business units need to accomplish. It is time to establish a road map. Your road map is your action plan. You clearly can’t do all the projects and meet all the demands from your company simultaneously. Your road map needs to begin with the set of foundational services that can help your company get started. Part of your road map should include the existing data services. Make sure that your road map has benchmarks that are reasonable and achievable. If you take on too much, you will not be able to demonstrate to management that you are executing well. Therefore, you don’t need a ten-year road map. Begin with a one- to two-year road map with well-defined goals and outcomes. You should include both business and technical goals as part of the road map.

Discover Your Data
No company ever complains that it has too little data. In reality, companies are swimming in data. The problem is that companies often don’t know how to use that data pragmatically to be able to predict the future, execute on important business processes, or simply gain new insights. The goal of your big data strategy and plan should be to find a way to leverage data for more predictable business outcomes. But you need to walk before you run. We recommend that you start by embarking on a discovery process. You need to get a handle on what data you already have, where it is, who owns and controls it, and how it is currently used. What are the third-party data sources that your company relies on? This process will give you a lot of insights.

For example, it will let you know how many data sources you have and how much overlap exists. This process will also help you to understand the gaps in knowledge about those sources. You might discover that lots of duplicate data exists in one area of the business and almost no data exists in another area. You might discover that you are dependent on third-party data that isn’t as accurate as it should be. Spend the time you need to do this discovery process because it will be the foundation for your planning and execution of your big data strategy.
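The discovery pass described above can be partly automated. The sketch below is a hypothetical illustration, with all source names and record keys invented: given a catalog of data sources, it counts duplicate records within each source and measures the overlap between every pair of sources.

```python
# Hypothetical data-discovery pass: profile a catalog of sources for
# internal duplicates and pairwise overlap. Source names and customer
# IDs are invented for illustration.

from collections import Counter

def profile_sources(sources):
    """Return per-source duplicate counts and pairwise overlap counts.

    `sources` maps a source name to a list of record keys
    (for example, customer IDs).
    """
    report = {"duplicates": {}, "overlap": {}}
    for name, records in sources.items():
        counts = Counter(records)
        # A key appearing n times contributes n-1 duplicates.
        report["duplicates"][name] = sum(c - 1 for c in counts.values())
    names = sorted(sources)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = set(sources[a]) & set(sources[b])
            report["overlap"][(a, b)] = len(shared)
    return report

# Example catalog: two CRM extracts and a billing feed (invented data).
catalog = {
    "crm_east": ["C001", "C002", "C002", "C003"],
    "crm_west": ["C003", "C004"],
    "billing":  ["C001", "C004", "C005"],
}
report = profile_sources(catalog)
```

A real inventory would also record ownership and update cadence for each source, but even this simple overlap report surfaces the duplicate-heavy and data-poor areas the text mentions.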

Figure Out What Data You Don’t Have
Now that you have discovered what data you have, it is time to think about what is missing. Take advantage of the task force you have set up. Business leaders are your best source of information. These leaders will understand better than anyone else what is keeping them from making even better decisions.

When you start this process of determining what you need and what is missing, it is good to encourage people to think out of the box. For example, you might want to ask something like this: “If you could have any information at any speed to support the business, and cost were no issue, what would you want?” This doesn’t mean that cost isn’t an issue; rather, you are looking for management to imagine what could really change the business. With the innovation happening in the data space, some of these wild ideas and hopes are actually possible.

Understand the Technology Options
At this point, you understand your company’s goals, you have an understanding of what data you have, and you know what data is missing. But how do you take action to execute your strategy? You have to know what technologies are available and how they might help your company produce better outcomes.
Therefore, do your homework. Begin to understand the value of technologies such as Hadoop, streaming data offerings, and complex event-processing products. You should look at different types of databases, such as in-memory databases, spatial databases, and so on. You should get familiar with the tools and techniques that are emerging as part of the big data ecosystem. It is important that your team has enough of an understanding of the available technology to make well-informed choices.
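To make one of these technologies less abstract: the processing model behind Hadoop is map/reduce, and its essence can be sketched in a few lines of ordinary Python, with no cluster involved. This is a toy illustration, not how you would run Hadoop in practice: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
# Toy sketch of the map/reduce model underlying Hadoop, run locally.
# map_step emits (key, value) pairs, the sort acts as the shuffle,
# and reduce_step aggregates each key's values.

from itertools import groupby
from operator import itemgetter

def map_step(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.lower().split()]

def reduce_step(key, values):
    return (key, sum(values))

def map_reduce(lines):
    mapped = [pair for line in lines for pair in map_step(line)]
    mapped.sort(key=itemgetter(0))                      # shuffle/sort phase
    return dict(
        reduce_step(key, [v for _, v in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    )

counts = map_reduce(["big data big plans", "data first"])
```

The point of the real framework is that the map and reduce functions stay this simple while the platform distributes them across many machines and very large inputs.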

Plan for Security in Context with Big Data
While companies always list data security as one of the most important issues they need to manage, they are often unprepared for the complexities involved in managing data that is highly distributed and highly complex. In the early stages of big data analytics, analysts typically do not secure the data, because only a small portion of it will be saved for further analysis. However, once an analyst selects a subset of data to bring into the company, that data has to be secured against both internal and external risk.

Some of this data will have private information that must be masked so that no one without authorization has access. For security to be effective in the context of big data, you need to have a well-defined plan.
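One common masking approach is pseudonymization: replacing direct identifiers with a keyed one-way hash so that analysts can still join records without seeing the raw values. The sketch below is a minimal illustration under invented assumptions; the field names and the key are placeholders, and a real deployment would need proper key management and a policy defining which fields are sensitive.

```python
# Minimal pseudonymization sketch: sensitive fields are replaced with a
# keyed HMAC digest so the same input always maps to the same stable
# pseudonym. The key and field list below are invented placeholders.

import hashlib
import hmac

SECRET_KEY = b"rotate-me-outside-source-control"   # placeholder key
SENSITIVE_FIELDS = {"email", "ssn"}                # assumed policy

def mask_record(record):
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]  # stable pseudonym
        else:
            masked[field] = value                    # pass through as-is
    return masked

row = {"email": "pat@example.com", "ssn": "000-00-0000", "region": "east"}
safe = mask_record(row)
```

Because the same input always yields the same pseudonym, masked records from different sources can still be joined on the hashed field, which is exactly the property an analyst needs.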

Plan a Data Governance Strategy
Information governance is the ability to create an information resource that can be trusted by employees, partners, and customers. A governance strategy is the joint responsibility of IT and the business. It is key that concrete rules exist that dictate how big data will be governed. For example, rules determine how data must be protected depending on the circumstances and on governmental requirements: healthcare data must be stored so that identities and personal data remain private, and financial markets have their own set of data governance requirements that must be adhered to. Problems can develop when an analyst collects and analyzes huge volumes of information but does not remember to implement the right governance to protect that data. In addition, data sources themselves may be proprietary; when these sources are used within an organization, restrictions may exist on how much data can be used and for what purposes. Accountability for managing data in the right way is at the heart of a good data governance strategy.

Plan for Data Stewardship
It is easy to fall into the trap of assuming that the results of data analytics are correct. Management likes numbers and likes to make decisions based on what the numbers say. But hazards can occur if the data isn’t managed in the right way. For example, a company determining which customers are potentially the best targets for a new product offering might want to analyze 10 or 15 different sources of data to come up with the results.
Do you have common metadata across these data sources? If not, is a process in place to vet each source and make sure that it is accurate and usable? Using data sources that are based on different metadata and different assumptions can send a company off in the wrong direction. So be careful, and make sure that the data you collect can be used in a way that helps the company make the most informed and accurate decisions. This also means understanding how to integrate these new data sources with historical data systems, such as the data warehouse.
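The metadata vetting step described above can be made concrete as a schema check run before any source is combined with the warehouse. The sketch below is hypothetical, with all field names, types, and units invented: each candidate source is compared against the schema the analysis expects, and unit mismatches of the kind that silently skew results are flagged.

```python
# Hedged sketch of a metadata vetting step: before combining sources,
# verify that each one exposes the expected fields with the expected
# types and units. All schema contents are invented for illustration.

EXPECTED_SCHEMA = {
    "customer_id": {"type": "str"},
    "revenue":     {"type": "float", "unit": "USD"},
}

def vet_source(name, schema):
    """Return a list of human-readable problems; an empty list means usable."""
    problems = []
    for field, spec in EXPECTED_SCHEMA.items():
        if field not in schema:
            problems.append(f"{name}: missing field '{field}'")
            continue
        for key, want in spec.items():
            got = schema[field].get(key)
            if got != want:
                problems.append(
                    f"{name}: field '{field}' has {key}={got!r}, expected {want!r}"
                )
    return problems

# A warehouse-conformant source and a partner feed reporting EUR, not USD.
warehouse = {"customer_id": {"type": "str"},
             "revenue": {"type": "float", "unit": "USD"}}
partner   = {"customer_id": {"type": "str"},
             "revenue": {"type": "float", "unit": "EUR"}}
issues = vet_source("partner_feed", partner)
```

Catching a currency mismatch at ingestion time is far cheaper than discovering it after a targeting decision has already been made on blended numbers.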

Continually Test Your Assumptions
You will begin to find that making use of new data sources and massive amounts of data that could never be processed in the past can help make your company much better at anticipating the future. You will be able to determine the best actions to take in near real time based on what your data tells you about a customer or a decision you need to make.
Even if you have all the processes in place to ensure that you have the right controls and the right metadata defined, it is still important to test continuously. What types of outcomes are you getting from your analysis? Do the results seem accurate? If you are getting results that seem hard to believe, it is important to evaluate how those outcomes were produced.
With more accurate data, you will achieve better outcomes. However, in some cases, you may see a problem that wasn’t apparent before. Therefore, don’t just assume that the data is always right. Test your assumptions and what you know about your business.
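Continuous testing of outcomes can start as something very simple: compare each reported metric against a plausible range and flag anything that falls outside it. The sketch below is a minimal illustration; the metric names and thresholds are invented, and in practice the plausible ranges come from the business itself.

```python
# Minimal sanity-check sketch for analysis output: flag any metric that
# falls outside its plausible range rather than trusting it because it
# is a number. Metric names and bounds are invented for illustration.

def check_outcomes(metrics, bounds):
    """Return alerts for metrics outside their plausible (low, high) range."""
    alerts = []
    for name, value in metrics.items():
        low, high = bounds.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            alerts.append(
                f"{name}={value} outside plausible range [{low}, {high}]"
            )
    return alerts

# Example run: a conversion rate above 100% should never survive review.
metrics = {"conversion_rate": 1.4, "avg_order_value": 62.0}
bounds  = {"conversion_rate": (0.0, 1.0), "avg_order_value": (5.0, 500.0)}
alerts = check_outcomes(metrics, bounds)
```

Checks like this won't prove a result is right, but they catch the hard-to-believe results the text warns about before management makes decisions based on them.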

Study Best Practices and Leverage Patterns
As the big data market matures, companies will gain more experience with best practices, or techniques that are successful in getting the right results. You can access best practices in several different ways. You can meet with peers who are investigating ways to leverage big data to gain business results. You can also look to vendors and systems integrators who have codified best practices into patterns that are available to customers. It is always better to learn from others than to repeat a mistake that someone else has already made and learned from. As the market matures further, you will be able to leverage many more codified best practices to make your strategy and execution plan more successful.
