Small Thinking About Big Data

It is time to end small thinking about big data. Instead of thinking about how to apply the insights of big data to business problems, we often hear more tactical questions, such as how to store large amounts of data or analyze them in new ways. This thinking is small because it focuses on technology and new forms of data in an isolated and abstract way. Bigger thinking starts with questions like these:


  • Big data is really just “data.” What’s the best way to handle all our data?
  • Big data is one piece of a larger puzzle. How can we effectively combine it with existing analytics to yield the greatest impact?
  • Big data needs to enhance business operations. How can we use big data to create better products and services?

Big data doesn’t mean that we must hit the reset button. We still need to harvest information from enterprise applications and construct a comprehensive structured model of our business. We need to securely manage information as an asset. We need to control access to data to protect privacy and comply with regulations. And we need to enable everyone to explore as much data as possible. Big data doesn’t mean we flip the off switch on all past business intelligence (BI) activities. It means that we understand how to build on those activities while adding new capabilities.

We must remember that big data isn’t about technology; it is a movement, a mind-set, that’s ingrained in an organization. How can we channel the energy that surrounds big data into a cultural transformation? Movements don’t succeed without a compelling vision of the future. The goal should be to create a data culture, to build on what we’ve done in the past, to get everyone involved with data, and to derive more value and better analytics from all the data used to make business decisions. This is the real victory: to do a better job with everything we have by adding new capabilities.

Starting a big data movement involves challenges:

  • Transforming company culture to be data-driven and compete on analytics
  • Discovering nuggets of information about customers, products, and performance across systems and data formats (ERP, legacy systems, web logs, email, voice, text, social media, and more)
  • Making data and analytics accessible to as many people as possible

The right vision for each company will differ, but for most companies a movement should be characterized by:

  • Using business questions, not technology capabilities, to drive the architecture
  • Increasing self-service access to data to encourage data-driven decisions
  • Enabling fast, iterative discovery that allows analytical teams to “swim” in the data and see what signals or trends emerge


A Guide to REST and API Design

In his 1966 book “The Psychology of Science,” American psychologist Abraham Maslow tackled the idea that those in the field of psychology needed to approach treatment from multiple perspectives, to take on new ideas, and not just continue using the same theories and techniques created by Freud and his followers so many years ago. Acknowledging that changing your point of view can be difficult, Maslow wrote, “It is tempting, if the only tool you have is a hammer, to treat everything like a nail.” We have all had this experience. We get so used to the way things have been done in the past that we sometimes don’t question the reasons for doing them.

It may seem curious to refer to psychology in a work on REST and API Design, but it serves to illustrate two distinct points:

(1) all design decisions, regardless of whether they pertain to software or architecture, should be made within the context of functional, behavioral, and social requirements, not passing trends; (2) when you only know how to do one thing well, everything tends to look the same.

“If all you have is a hammer, then everything looks like a nail.”

—Abraham Maslow, The Psychology of Science


Don’t Let the Cloud Obscure IT Transparency

Business leaders look to the CIO to be the broker of cloud and other IT service providers. But this presents new challenges to all IT leaders, even those who have already achieved IT transparency across their portfolio of products and services. Rather than a step in the wrong direction, this new cloud era is an opportunity: proven IT financial management strategies, used correctly, will empower IT leaders to succeed amid these new challenges. This technology dossier examines the challenges of cloud transparency and sets out a roadmap for bringing performance and financial management practices to shared services organizations.

The CIO role has changed. CIO’s “State of the CIO” survey found that 72 percent of CIOs expect that within the next three to five years they will become more focused on business strategy, compared to today, where 52 percent are focused on transformation and 22 percent are focused on functional issues. Obviously, this new business strategy focus will include cloud. CIOs must adopt methodologies that provide transparency in order to succeed in this new cloud era. But it will be difficult to make that transition if IT is unable to prove that it is getting the best value for IT expenditures in a way that allows more resources to be spent to improve efficiency and drive innovation. The ability to efficiently manage and optimize cloud services is a critical aspect of managing IT like a business. IT financial management (ITFM) transparency initiatives can demonstrate how IT is improving its collaboration with financial management and business consumers.

Automating costing, budgeting and forecasting, consumption reporting and demand management capabilities is essential to analyzing and managing IT costs and the value of services delivered. But that may be even more difficult due to differences in the way providers account for their cloud services delivery.


“If a company is using Amazon Web Services, Microsoft Azure and IBM Cloud Services, each one provides billing data differently, so it’s difficult to get a holistic view across all vendors,” said Bob Svec, SVP of ComSci and PowerSteering product lines at Upland Software. “ComSci can normalize the data across all the cloud providers and deliver various views across all vendors and lines of business. This also enhances an enterprise’s contract negotiations as they can leverage information from all vendors.”
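
To make that normalization idea concrete, here is a minimal sketch, in Python, of mapping differently shaped billing exports from several providers into one common schema so costs can be rolled up per line of business. It is not ComSci’s implementation, and every field name below is an illustrative assumption rather than an actual provider export format.

  # Minimal sketch: normalize cloud billing records into a common schema.
  # All field names are illustrative assumptions, not real export formats.
  from collections import defaultdict

  def normalize(record, provider):
      """Map a provider-specific billing record to a common shape."""
      if provider == "aws":
          return {"service": record["productName"],
                  "cost_usd": float(record["unblendedCost"]),
                  "line_of_business": record.get("lob_tag", "unassigned")}
      if provider == "azure":
          return {"service": record["meterCategory"],
                  "cost_usd": float(record["costInBillingCurrency"]),
                  "line_of_business": record.get("lob_tag", "unassigned")}
      if provider == "ibm":
          return {"service": record["resource_name"],
                  "cost_usd": float(record["cost"]),
                  "line_of_business": record.get("lob", "unassigned")}
      raise ValueError(f"unknown provider: {provider}")

  def cost_by_line_of_business(feeds):
      """feeds: iterable of (provider, records); returns cost per line of business."""
      totals = defaultdict(float)
      for provider, records in feeds:
          for raw in records:
              row = normalize(raw, provider)
              totals[row["line_of_business"]] += row["cost_usd"]
      return dict(totals)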


Agility, Simplicity & Modernization

CIOs today have more responsibility than ever for fulfilling the business objectives of their organizations. Among the strongest shared business objectives for IT and business leaders:

  • Increasing revenue growth
  • Reducing operating costs
  • Driving productivity improvements

Greater agility, standardization and modernization within IT are key facilitators for achieving these business goals. Key technologies are being adopted to achieve these goals, not only cloud and big data but also next-generation data management platforms running on interoperable and reliable operating systems and hardware.

SAP HANA running on Red Hat® Enterprise Linux® for SAP HANA® on Dell platforms addresses these business and technology drivers in a highly integrated, tested and high-performing solution.

Key Enablers to Achieving Business Goals

New technologies in and of themselves are important only to the extent that they produce positive business outcomes. For example, showing how in-memory technology can speed up analysis of marketing campaign results to get better insights into how advertising budgets should be realigned and distributed is a relevant business outcome aided by technology. When contemplating new technologies—and their potential value to the enterprise—it’s important to map them to these common enterprise IT goals: agility, standardization and modernization.


Cloud Migration

Moving to the cloud can be cost-effective. At the same time, the challenges and risks of doing so are significant and, in many cases, are preventing companies from being as agile as they would like in terms of moving to the cloud. Data migration is the top challenge, with application migration not far behind.

It has always been the case that changes to production are risky, even with applications hosted on premise. On average, approximately 60% of production-related performance issues and outages are related to changes in either hardware or software. Cloud carries even higher risk, particularly when IT organizations lack the visibility into cloud systems necessary for pre-planning and decision making.

There are a host of decisions to be made in the migration process, some common to virtually all types of cloud services and others specific to hosting services such as infrastructure as a service (IaaS). IaaS, software as a service (SaaS), and platform as a service (PaaS) migrations, for example, all require planning in a variety of key areas, including the following:

  1. Vendor selection
  2. Migration planning (if moving from on-premise to off-premise hosting)
  3. Integrations
  4. Network sizing
  5. Service-level agreements (SLAs) and associated monitoring

IaaS and PaaS consumers have these additional considerations:

  1. Cross-provider service and cost comparisons
  2. Architecture considerations, such as code and data locations
  3. Data privacy
  4. Provisioning of hardware and software
  5. Skills development
  6. Performance/availability monitoring

Data privacy and migration support are the top factors IT professionals cite as important to making vendor selections (see Figure 2). The migration process is considered to be so risky, in fact, that some consumers have opted to go the pricier service provider route since expertise in supporting data and application migrations is a key value proposition of service providers.

However, achieving the cost savings of a true “cloud” IaaS, such as AWS or Azure, requires companies to develop the expertise necessary to take a hands-on approach, which requires clear visibility into both the existing systems and the new cloud service. Such visibility is delivered by “cloud-ready” enterprise management tools, such as New Relic.

The Data Warehouse

When it comes to managing today’s data and how it is used, current data warehousing solutions simply can’t keep up. Based on assumptions and technologies from decades ago, conventional data warehouses are ill-equipped to bring together all of the data you need to analyze and to support all of the different ways in which you need to use that data.

Data Has Changed

It used to be the case that most of the data you wanted to analyze came from sources in your data center: transactional systems, enterprise resource planning (ERP) applications, customer relationship management (CRM) applications, and the like. The structure, volume, and rate of the data were all fairly predictable and well known. Today a significant and growing share of data—application logs, web applications, mobile devices, and social media—comes from outside your data center, even outside your control. And that’s without emerging new sources such as the Internet of Things. That data is also frequently stored in newer, more flexible data structures such as JSON and Avro. With data volume expected to increase 50-fold in the next decade, demands are increasing on both the systems themselves and on the people who manage and use them.
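
To make “more flexible data structures” concrete, here is a small, hypothetical JSON event of the kind a web or mobile application might emit; the field names are invented for illustration. Unlike a fixed relational row, nested and optional fields can appear without an upfront schema change.

  import json

  # Hypothetical clickstream event; field names are illustrative only.
  event = json.loads("""
  {
    "user_id": "u-1029",
    "action": "add_to_cart",
    "device": {"type": "mobile", "os": "iOS"},
    "timestamp": "2024-05-01T12:34:56Z",
    "experiments": ["new_checkout", "promo_banner"]
  }
  """)
  print(event["device"]["os"])  # nested fields are reachable without a predefined schema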

The Ways Data Is Used Have Changed

At one time, it was sufficient to load updated data once a week or overnight and then generate and publish a report or dashboard every Monday morning. Not today. The value of much of today’s data decays rapidly, making it a requirement to get data into the hands of analysts as quickly and easily as possible so that they can use the data to test hypotheses, create what-if scenarios, correlate trends, and project revenues.

Traditional Data Warehouses Can’t Keep Up

The harsh reality of data warehousing is that conventional solutions are simply too costly, inflexible, and complex for today’s—not to mention tomorrow’s—data. These solutions were designed for managing predictable, slow-moving, and easily categorized data that largely came from internal enterprise applications under your control. They require customers to purchase everything they need for peak demand up front, spending hundreds of thousands of dollars (millions in some cases) just to get started. This all but guarantees that most of the technology will sit underutilized the majority of the time. As one Director of Analytics put it, “We have to buy for the 99th percentile even though we only reach that level one day per year.”


The Operational Data Lake

For years, the operational data store (ODS) has been a steady and reliable data tool. An ODS is often used to offload operational reporting from expensive online transaction processing (OLTP) and data warehouse systems, thereby significantly reducing costs and preserving report performance. Similarly, an ODS can prevent real-time reports from slowing down transactions on OLTP systems. An ODS also facilitates real-time reports that draw data from multiple systems. For example, let’s say you wanted to run a report that showed real-time customer profitability (a minimal sketch follows the list below):

  • An ODS would allow you to pull data from your financial system, CRM system, and supply chain system, supporting a comprehensive view of the business.
  • When compared with a data warehouse, an ODS offers a real-time view of operational data. While data warehouses keep historical data, an ODS keeps more recent data.
  • Another important use case for an ODS is supporting the ETL (Extract, Transform, Load) pipeline.
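
As a minimal illustration of that first bullet, the sketch below combines hypothetical extracts from financial, CRM, and supply chain systems into a per-customer profitability view. The field names are assumptions made for illustration, not a specific ODS schema.

  # Sketch: combine three operational extracts into a per-customer
  # profitability view. All field names are illustrative assumptions.
  from collections import defaultdict

  def profitability(financial_rows, crm_rows, supply_rows):
      revenue = defaultdict(float)          # from the financial system
      for r in financial_rows:
          revenue[r["customer_id"]] += r["revenue"]

      cost_to_serve = defaultdict(float)    # from the supply chain system
      for r in supply_rows:
          cost_to_serve[r["customer_id"]] += r["fulfillment_cost"]

      segment = {r["customer_id"]: r["segment"] for r in crm_rows}  # from CRM

      return [{"customer_id": cust,
               "segment": segment.get(cust, "unknown"),
               "profit": rev - cost_to_serve.get(cust, 0.0)}
              for cust, rev in revenue.items()]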

Companies use ODSs as a more cost-effective platform than data warehouses to perform data transformations and ensure data quality, such as data matching, cleansing, de-duping, and aggregation.

Big data repositories such as Hadoop are causing organizations to question whether they want to keep all of the ODSs they have put in place over the years. After all, Hadoop promises many of the same benefits. Hadoop is cost-effective because it uses scale-out technology, which enables companies to spread data across commodity servers. An ODS, on the other hand, uses outdated scale-up technology. Scaling an ODS is prohibitively expensive, requiring more and more specialized hardware to get the required performance. Plus, most scale-up technologies become creaky when they exceed a terabyte of data, which is increasingly common.

What if you could upgrade the ODS so you could scale it affordably, while at the same time providing yourself with a platform to experiment with unstructured big data? This is the principle behind the operational data lake, which includes, at its heart, a Hadoop relational database management system (RDBMS).

A data lake is a repository for large quantities of both structured and unstructured data. It allows companies to store data in its native format, while maintaining the integrity of the data and allowing different users to tap into the data in its original form. The operational data lake is the structured part of the data lake, powered by an operational RDBMS built on Hadoop.

Because the operational data lake is built on Hadoop, it offers all the capabilities of Hadoop, enabling organizations to move into big data at their own pace while getting immediate value from operational data stored in Hadoop. Companies may consider adding a Hadoop-based operational data lake as an ODS replacement or, if they have already implemented Hadoop, consider adding a Hadoop RDBMS to their existing data lake. Here are some details about each of these approaches.

ODS Replacement. For organizations with an ODS in place, a Hadoop-based operational data lake allows them to:

  • Deploy an affordable scale-out architecture
  • Leverage existing SQL expertise and applications
  • Speed up operational reports and analytics by parallelizing queries
  • Augment structured data with semi-structured and unstructured data stored in Hadoop

Because operational data stores can be migrated to Hadoop, the operational data lake reduces the cost of using an ODS and offers the opportunity to offload workloads from expensive data warehouses. Companies spend millions of dollars on enterprise data analytics and data warehouses. As much as half of the workloads sitting in one of those warehouses can be handled in a more cost-efficient operational data lake.

Are Your Capacity Management Processes Fit for the Cloud Era?

In any utility environment (electricity, gas, water, mobile telecoms…), ensuring there are enough resources or capacity available to meet demand is critical to satisfying the needs of the consumer. At the same time, it is important to ensure that there is not an oversupply of capacity in a market, as this will ultimately impact the profitability and viability of the supplier’s business model. These rules apply not only to traditional utilities but also to the provisioning of IT infrastructure used to host the applications that support the business.

Virtualization accelerated this concept through its ability to share compute resources amongst multiple application workloads, and the agility it provides to rapidly provision and reconfigure compute resources through software. Industrialization of capacity management processes in virtualized data centers presents a significant opportunity to optimize ongoing CAPEX and OPEX, whilst assuring consistent delivery of application performance. It also ensures greater business agility by reducing the lead time to stand up new application services. Whilst virtualization is a great enabler for getting a better return on investment from IT infrastructure, many enterprise IT departments and service providers have relatively immature capacity management processes and are not exploiting the latest innovations that would enable them to transform their situation.


Today, many organizations apply very simple principles to determine their requirements for compute capacity in their virtualized data centers. This is typically based on a resource allocation model: the total amount of memory and CPU capacity allocated to all virtual machines in a compute cluster is divided by an assumed level of overprovisioning (e.g. 2:1, 4:1, 8:1, 12:1) to calculate the requirement for physical resources.
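
As a worked example of that allocation-based model, the short sketch below divides total allocated vCPU and memory by an assumed overprovisioning ratio to estimate the physical capacity a cluster would be sized for. The virtual machine sizes and the ratio are illustrative assumptions only.

  # Sketch of the allocation-based sizing model described above.
  # VM allocations and the overprovisioning ratio are illustrative assumptions.
  vms = [
      {"name": "app01", "vcpus": 8,  "mem_gb": 32},
      {"name": "db01",  "vcpus": 16, "mem_gb": 128},
      {"name": "web01", "vcpus": 4,  "mem_gb": 16},
  ]
  overprovision_ratio = 4  # i.e. a 4:1 virtual-to-physical ratio

  total_vcpus = sum(vm["vcpus"] for vm in vms)   # 28 vCPUs allocated
  total_mem = sum(vm["mem_gb"] for vm in vms)    # 176 GB allocated

  physical_cores_needed = total_vcpus / overprovision_ratio  # 7 cores
  physical_mem_needed = total_mem / overprovision_ratio      # 44 GB

  print(f"Size the cluster for ~{physical_cores_needed:.0f} cores "
        f"and ~{physical_mem_needed:.0f} GB of RAM")

Note that the same ratio is applied to both CPU and memory here purely to mirror the allocation model described above; it says nothing about what the workloads actually consume, which is precisely the weakness discussed below.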

The level of overprovisioning is often directly related to different tiers of infrastructure and the service levels offered by these. For example, a Platinum service level may be offered on a compute cluster that has a conservative overprovisioning ratio of 2:1, and a Bronze service might be offered on a compute cluster with a more aggressive overprovisioning ratio of 12:1. The application owner will make a decision about the tier of infrastructure on which they want their application to run based on a trade-off between the level of risk they are willing to assume and the cost. This process of capacity management is typically managed in spreadsheets or in simple databases that do not take into account the actual resource consumption driven by each application workload running in the operational environment.

Spreadsheet planning is often augmented with simple monitoring tools that send alerts to Operations when resources within the virtual infrastructure cross predetermined thresholds, indicating the risk of performance issues. In such circumstances, Operations will be notified of performance risks and mobilize their technical resources to investigate potential issues and devise remediation plans. A capacity management strategy based on resource allocation and overprovisioning ratios is further flawed because application owners typically overspecify the amount of CPU and memory resources their application is going to need when they request virtual machines. This invariably results in larger virtual machine configurations than are actually required to run the applications reliably, but because this capacity management strategy is based on allocation and not on actual utilization, it inherently erodes the level of efficiency that can be driven from the underlying infrastructure.
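
A tiny, hypothetical example shows how far allocation can drift from reality when owners over-specify their requests; the numbers are illustrative assumptions, not measurements.

  # Sketch: allocated capacity vs. actual peak utilization for one VM.
  # The numbers are illustrative assumptions.
  requested_vcpus = 16        # what the application owner asked for
  peak_observed_vcpus = 3.5   # what monitoring actually measured at peak

  unused = requested_vcpus - peak_observed_vcpus
  waste_pct = 100 * unused / requested_vcpus

  print(f"{waste_pct:.0f}% of the allocation never gets used")  # ~78%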

Cloud Security

Cloud computing promises to change the way we use computing, with significant economic and efficiency benefits. The speed of adoption depends on how quickly trust in new cloud models can be established. That trust is especially important when data is stored in new ways and in new locations, including, for example, in different countries.
Key Topics:
• What is different about cloud?
• What are the new security challenges cloud introduces?
• What can be done and what should be considered further?

Cloud computing moves us away from the traditional model, where organizations dedicate computing power to a particular business application, to a flexible model for computing where users access business applications and data in shared environments.
Cloud is a new consumption and delivery model; resources can be rapidly deployed and easily scaled (up and down), with processes, applications and services provisioned ‘on demand’. It can also enable a pay-per-usage model. In these models, the risk profile for data and security changes, and that change is an essential factor in deciding which cloud computing models are appropriate for an organization.

There are existing security challenges, experienced in other computing environments, and there are new elements which are necessary to consider. The challenges include:
• Governance
• Data
• Architecture
• Applications
• Assurance

Why SSL Certificates Are Critical

SSL certificates have been in use for almost 25 years, and they continue to serve a vital role in protecting data as it travels across the Internet and other networks. From online financial transactions to e-commerce to product development, SSL certificates make it possible for users around the world to communicate sensitive information with the confidence that it is safe from malicious hackers.

The Internet has evolved in innumerable ways over the past decade and a half, so why do SSL certificates continue to instill trust? Simply put, SSL certificates are very effective in protecting data in transit. In fact, according to some calculations it would take about six thousand trillion years, or about a million times longer than Earth has existed, to crack the 128-bit encryption used with SSL certificates by brute force. Even so, the security industry is ever vigilant, and many Certification Authorities have started phasing in 2048-bit keys for their SSL certificates, further strengthening the protection for online data communications.
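
The arithmetic behind that figure is easy to reproduce. A rough back-of-the-envelope estimate, assuming an attacker who can test on the order of 10^15 keys per second and, on average, has to search half the keyspace, lands in the same range of thousands of trillions of years.

  # Back-of-the-envelope brute-force estimate for a 128-bit key.
  # The attack rate is an assumption chosen purely for illustration.
  keyspace = 2 ** 128                # possible 128-bit keys
  keys_per_second = 1e15             # assumed attacker throughput
  seconds_per_year = 60 * 60 * 24 * 365

  avg_seconds = (keyspace / 2) / keys_per_second  # expect to search half the space
  avg_years = avg_seconds / seconds_per_year

  print(f"~{avg_years:.1e} years")   # roughly 5.4e15, i.e. thousands of trillions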

Sometimes, the first sign that there is a “lost” SSL certificate is a call from a customer who has noticed an expired certificate and asks if it really is safe to make a purchase at the website. Other times it may be something more serious, like a phishing incident that allows cybercriminals to steal sensitive customer data. Or, a security breach that occurs at a Certification Authority (CA) reverberates through an organization due to its inability to act quickly for lack of visibility into its SSL certificate inventory. Whatever the case may be, losing track of SSL certificates can cause significant financial loss and reputation damage. Fortunately, discovering and managing SSL certificates within the enterprise does not have to be complex or time-consuming.