Business Analytics with In-Memory Databases

Written by Srikanth Chintamaneni on January 24, 2012. Posted in Articles.

Abstract

Business intelligence (BI) and data warehouse vendors are increasingly turning to in-memory technology in place of traditional disk-based storage to speed up implementations and extend self-service capabilities.

For years, creating customer data queries and building business intelligence reports has been a prolonged activity, because the information needed must be pulled from operational systems and then managed in separate analytical data warehouse systems that can accept the queries. Now, however, true ‘in-memory analytics’ has arrived: a technology that allows operational data to be held in a single database that can handle all the day-to-day customer transactions and updates, as well as analytical requests, in virtually real time.

Starting Questions

Successful Business Analytics project implementations start by asking the right questions. Here are a few that should be on your short list.

· How do I manage and maintain the performance of my existing reports as data volumes keep growing?

· What is a cost-effective alternative to data warehouses that provides the ability to analyze very large data sets, but is much simpler to set up and administer?

· What can I do today to support near-real-time reporting requirements without relying heavily on IT departments?

· How can I demonstrate value to my company by extending real-time ad-hoc query capabilities to high-volume transaction areas such as Financial Services?

· How do I minimize the administration overhead and yet provide a transparent reporting environment to end users?

The purpose of this article is to put both BI technologies, in-memory and disk-based, in perspective, explain the differences between them, and finally explain, in simple terms, why disk-based BI technology is not on its way to extinction. Rather, it sets out the requisites for considering an in-memory database BI solution.

But before we get to that, let us understand the differences between disk-based and in-memory databases.

Disk-based and In-memory Databases

Whether a database is disk-based or in-memory, the distinction is about where the data resides while it is actively being queried by an application: with disk-based databases, the data is queried while stored on disk; with in-memory databases, the data being queried is first loaded into RAM (Random Access Memory).

Disk-based databases are engineered to efficiently query data residing on the hard drive. At a very basic level, these databases assume that the entire data set cannot fit inside the relatively small amount of RAM available and therefore must have very efficient disk reads in order for queries to be returned within a reasonable time frame. In-memory databases, on the other hand, work under the opposite assumption: that the data can fit entirely inside the RAM. The engineers of in-memory databases benefit from using the fastest storage a computer has (RAM), but have much less of it at their disposal.

The fundamental trade-off between in-memory and disk-based technologies is faster reads and limited amounts of data versus slower reads and practically unlimited amounts of data. These are two critical considerations for business intelligence applications, as it is important both to have fast query response times and to have access to as much data as possible.
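The distinction above can be made concrete with a minimal sketch. It uses Python's built-in sqlite3 module purely as an illustration; the table name `sales`, the sample rows, and the helper `revenue_by_region` are assumptions made for this example and are not taken from any BI product. The same query runs once against a database file on disk and once against a database held entirely in RAM.

```python
import os
import sqlite3
import tempfile

# Hypothetical sample data: (region, amount)
ROWS = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)]

def revenue_by_region(target):
    """Load ROWS into the SQLite database at `target` and total amounts per region.

    `target` is either a file path (disk-based) or ":memory:" (held in RAM).
    """
    conn = sqlite3.connect(target)
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)
        conn.commit()
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()

# Disk-based: the data lives in a file and is read from storage when queried.
with tempfile.TemporaryDirectory() as workdir:
    disk_result = revenue_by_region(os.path.join(workdir, "sales.db"))

# In-memory: the whole database lives in RAM and disappears when closed.
mem_result = revenue_by_region(":memory:")

print(disk_result == mem_result)
```

Both runs return the same answer; only the place where the data resides differs. This is not a benchmark: with a data set this small, and with the operating system caching file reads, no meaningful speed difference would be visible.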

Fast analysis, better insight and rapid deployment with minimal IT involvement!

What is it?

As the name suggests, the key difference between conventional BI tools and in-memory products is that the former query data on disk while the latter query data in random access memory (RAM). When a user runs a query against a typical data warehouse, the query normally goes to a database that reads the information from multiple tables stored on a server’s hard disk. With a server-based in-memory database, all information is initially loaded into memory. Users then query and interact with the data loaded into the machine’s memory.

BI with in-memory databases may sound like caching, a common approach to speeding up query performance, but in-memory databases do not suffer from the same limitations. Caches are typically subsets of data, stored on and retrieved from disk (though some may load into RAM). The key difference is that cached data is usually predefined and very specific, often to an individual query, whereas with an in-memory database the data available for analysis is potentially as large as an entire data mart.
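The contrast with caching can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the `ORDERS` records, the `query_cache` dictionary, and the two helper functions. A cache can only answer the questions it was populated for, while a full data set held in memory can answer questions nobody anticipated.

```python
from collections import defaultdict

# Hypothetical order records held in memory: (customer, product, amount)
ORDERS = [
    ("acme", "widget", 100.0),
    ("acme", "gadget", 40.0),
    ("globex", "widget", 75.0),
]

# A query cache: precomputed results, each tied to one predefined query.
query_cache = {
    "total_by_customer": {"acme": 140.0, "globex": 75.0},
}

def cached_answer(query_name):
    """Return a precomputed result, or None if the question was not anticipated."""
    return query_cache.get(query_name)

def in_memory_total_by(column_index):
    """Total the amount column grouped by any other column of the full data set."""
    totals = defaultdict(float)
    for row in ORDERS:
        totals[row[column_index]] += row[2]
    return dict(totals)

by_customer = cached_answer("total_by_customer")        # served from the cache
by_product_cached = cached_answer("total_by_product")   # cache miss: returns None
by_product = in_memory_total_by(1)                      # computed ad hoc from RAM

print(by_customer, by_product_cached, by_product)
```

The cache has no answer for the per-product question, whereas the in-memory data set can be regrouped along any column on request.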

In-memory databases are designed specifically to take advantage of the immense amount of addressable memory now available with the latest 64-bit operating systems. In-memory technology uses the multiple gigabytes of memory space available in 64-bit servers as its data store. In-memory analysis is designed to improve the overall performance of a BI system as perceived by users, especially for complex queries that take a long time to process in the database, or when accessing a very large database where all queries are hampered by the database size. An in-memory database allows data to be analyzed at both an aggregate and a detailed level without the time-consuming and costly step of developing ETL processes and data warehouses or building multidimensional OLAP cubes. Since data is kept in memory, calculations return very quickly, even on extremely large data sets analyzed by multiple concurrent users.

This kind of immediate, interactive analysis is particularly important when people are trying to discover unknown patterns or uncover new opportunities.

Who is it for?

· When selecting an in-memory solution, consider one that operates seamlessly within an end-to-end BI platform where its usage is completely transparent to users and report developers.

· Ideal for setting up departmental BI applications and for meeting the BI needs of small to medium-sized businesses, as it requires very little up-front effort and no ETL.

· Populated quickly from any database source, in-memory databases and their associated metadata layers can be used seamlessly as a source for many reports, dashboards, and analyses.

· Look for technology that has been designed to avoid excessive administrative burdens and can scale to enterprise levels in terms of user numbers, data security and data governance.

· The leading benefit of business analytics with in-memory databases is delivering decision insight with the agility that businesses demand. It is a win for business users, who gain self-service analysis capabilities, and for IT departments, which can spend far less time on query analysis, cube building, aggregate table design, and other time-consuming performance-tuning tasks.

Know your challenges

· Regardless of what fancy algorithm is used with an in-memory database, storing the entire dataset in RAM has a serious implication: the amount of data one can query with this technology is limited by the amount of free RAM available, and there will always be much less available RAM than available disk space.

· Limited memory space means that the quality and effectiveness of the BI application will be hindered: the more historical data to which we have access and/or the more fields we can query, the better the analysis, insight and, well, intelligence one can get.

· One could add more and more RAM, but then the required hardware becomes exponentially more expensive. Beyond 64GB, we can no longer use what is categorized as a personal computer but will require a full-blown server, which brings us into very expensive computing territory.

· Note that the amount of RAM required depends on the number of people simultaneously querying it. Having 5-10 people using the same in-memory BI application could easily double the amount of RAM required for the intermediate calculations that need to be performed to generate the query results.

Finding the right mix

· A key success factor in most BI solutions is having a large number of users, so we need to tread carefully when considering in-memory technology for real-world BI. Otherwise, the hardware costs may spiral beyond what the organization is willing or able to spend.

· Some of these databases introduce additional optimizations which further improve performance. Most of them also employ compression techniques to represent even more data in the same amount of RAM.

· The future of BI lies in technologies that leverage the respective benefits of both disk-based and in-memory technologies to deliver fast query responses and extensive multi-user access without huge hardware requirements. These types of technologies are not theoretical anymore and are already utilized by businesses worldwide. Some are designed to distribute different portions of complex queries across multiple cheaper computers (this is a good option for cloud-based BI systems) and some are designed to take advantage of 21st-century hardware (multi-core architectures, upgraded CPU cache sizes, etc.) to extract more juice from off-the-shelf computers.
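The article does not say which compression techniques in-memory databases use. As one illustration, the sketch below shows dictionary encoding, a technique commonly associated with column-oriented stores; choosing it here is an assumption, and the function names and sample column are hypothetical. Repeated values are stored once in a lookup table, and the column itself holds only small integer codes.

```python
def dictionary_encode(column):
    """Replace each value with an integer code; return (lookup table, codes)."""
    dictionary = []        # position in this list is the code for that value
    code_for_value = {}    # reverse lookup used while encoding
    encoded = []
    for value in column:
        if value not in code_for_value:
            code_for_value[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(code_for_value[value])
    return dictionary, encoded

def dictionary_decode(dictionary, encoded):
    """Rebuild the original column from the lookup table and the codes."""
    return [dictionary[code] for code in encoded]

# Hypothetical low-cardinality column, typical of dimensions such as region.
regions = ["north", "south", "north", "north", "east", "south"]
dictionary, encoded = dictionary_encode(regions)

print(dictionary)   # each distinct value is stored once
print(encoded)      # the column becomes a list of small integers
print(dictionary_decode(dictionary, encoded) == regions)
```

The saving grows with repetition: a column with millions of rows but only a handful of distinct values needs the strings stored once, plus a compact code per row.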

Summary

Business analytics with in-memory databases provides companies with a faster, more flexible, and arguably lower-cost way of accessing and processing information, allowing users to get answers to business questions in seconds rather than hours. By virtue of its high-performance architecture, in-memory technology has the potential to help midsize organizations become more informed and agile, and respond more quickly to changing market conditions.

In addition, advances in technology and lower costs of memory and CPU make this type of technology more attractive than ever before. Matching the appropriate architectural approach with the kind of business analytics solutions needed by a midsize company has the potential to deliver benefits such as reduced time to insight, greater agility, increased self-service and lower overall IT demands.

Srikanth Chintamaneni is a manager in the Information Management service line of Deloitte Consulting India Pvt. Ltd. He has over 13 years of experience in providing consulting services involving data warehouse and content management solutions in the Health care, Commercial & Consumer Finance, and Industrial Products industry segments. His capabilities support services involving data profiling, data modeling, report design, and end-to-end data warehouse implementations.