OLAP: CONCEPTS AND APPLICATIONS TO DATAWAREHOUSING
Information analysis is a vital capability for corporate databases. Fast analysis of data stored in databases or warehouses, is indispensable for businesses that wish to stay ahead in the present competitive market scene. Online analytical processing or OLAP provides real-time information analysis for business decision-making. Besides this, OLAP gives the flexibility of viewing data from several viewpoints, a property referred to as the multidimensionality of data present in existing databases. OLAP can be clearly defined in terms of five buzzwords – Fast Analysis of Shared Multidimensional Information, or FASMI for short. OLAP finds applications particularly for the analyses needs of businesses owning large data warehouses. A data warehouse stores non-operational or analytic data, which OLAP processes to provide comprehensive information resources. It can be used to generate business intelligence, the knowledge of market trends for example, that is an essential ingredient to success for any enterprise. It is the technology that promises growth, leadership and proficiency for a business organization.
Shreyansh Jain Final year
MGM’s College of Engg. And Tech.
1. BUSINESS INFORMATION NEEDS
Company XYZ is an international player in the car-manufacturing arena. It is planning to expand its horizons in global markets, with the launch of a new series of motor vehicles. Since XYZ has market holds in different parts of the world, the promotion and the subsequent launch timings is critical to its success. What it needs is the exact information about the launch period – whether other companies would be joining in the bandwagon – does the buying pattern of the targeted customers support the proposed plans and if not what changes should be brought in the local promotion policies so as to achieve the highest profits. All these questions would remain unanswered without the present state-of-the-art Decision Support systems. With the help of Decision Support systems capable of generating business knowledge, companies like XYZ can easily be leaders in their sport.
Businesses today face similar challenges, particularly owing to the fast shrinking world. Most big businesses today span several nations, and technologies like the Internet play truant to the cause of globalization. In such competitive business environments, the ones that have the best and the most information, almost certainly leads. But merely possessing information in huge quantities is not sufficient for gaining an edge over business rivals. It is like having everything and yet nothing. Analyzing this information into comprehensive, business knowledge is the need of the hour.
The following are some of the information needs for contemporary businesses:
Patterns, relationships and hierarchies in market trends, and
Business knowledge pertaining to their competitors
2. THE NEED FOR MULTIDIMENSIONAL INFORMATION
Information, as it is usually stored in the present-day databases, is in the form of relations. Tables hold data, which can be queried on request and the information, retrieved. This is sufficient as long as one does not need to attach any semantics to this data. Once a person needs to acquire context and meaning of this information, and relationships to other pieces of information, simple querying is of little help. Information, here, must be viewed as slices or dimensions. Viewing information from different points implies that one can model different scenarios based on the same set of information, by just arranging them in different analytical formats.
Multidimensionality allows for assigning relationships between seemingly unrelated information fields. It allows combining information from different sources, and relating them to obtain useful knowledge.
Consider the example of the demand versus pricing data of a particular product. This data represents the sum total of all the demand in different regions of the business. It also aggregates demands over different time periods. However, for a better understanding of the real market demand, one needs to apply the dimensions of time, as well as the specific locations to the available data. This fact brings out the need for multidimensionality of information visualization and access.
Fig.1. Simple 2 Dimensional Data
Fig.2. Multi-Dimensional Data with the Time axis
3. DATA WAREHOUSING
A data warehouse is the cohesive data model that defines the central data repository for an organization. Data warehousing is a technique used basically to separate the analytical data from the operational data in a business organization. Analytical data is that part of the information that is required to be archived for providing the decision support at a later stage of business. Data Warehousing is the collection of such analytical data arranged in a meaningful pattern such that the stress is on faster query execution unlike the Transactional Processing where the stress is on better flow of information. A Data Warehouse may be defined as a subject-oriented repository, designed with enterprise-wide access in mind.
Fig.3. Architecture of a data warehouse
The architecture of a Data warehousing system can be briefly broken up into the following components: -
Operational databases store information required for current transactions in different departments of an organization.
The various (isolated) departments of an organization have specific functions and jobs to perform which require inputs in the form of data and then processing of the data for the information required.
The master database, which stores the analytical information, extracted from the operational databases. The analytical information is that part of the operational databases which is of strategic or managerial importance, at some stage in time.
A data mart is a subset of enterprise-wide data warehouse, which typically supports an enterprise element (department, region, function, etc.). For example, for a data warehouse of a business house a likely classification would be – the customer database, a collection of sales figures, prices of the products etc. Each of these sub-groups acts as mini data warehouses.
They are the end users of the data warehouses that may be working through the data marts, and deriving necessary information for activities like decision-making, strategic planning, or business analysis. These are the groups, which require information to be retrieved or mined from the warehouse. The time required for retrieval is a major issue with most data warehouses.
A data warehouse stores tactical information that merely answers specific questions about past events. It is capable of delivering only that information that is directly related to some known facts or fields. But what if a person wants to simulate the behaviour of a system (such as a market) if certain conditions prevailed? If a person needs to predict or at least observe the trend of some variable when some affecting value is varied, then it is not possible for a conventional data warehouse to produce the information required. This is where OLAP comes into play. OLAP or online-analytical processing is a technology complementary to data warehousing, in the sense that it works on the data stored in a warehouse. Warehouses need OLAP to produce multidimensional views of the information that it stores, and OLAP needs data warehouses for the data it needs to process.
4. OLAP Defined
In contrast to the conventional data warehousing systems, where the data exists in the form of the relations, the OLAP or the On Line Analytical Processing provides a better insight into the untapped potential of data. It supports the multidimensional view of data, which provides manifold increase in the information content of same amount of data. OLAP provides fast, consistent, and efficient access to the various views of information.
OLAP may be defined in terms of just five keywords – Fast Analysis of Shared Multidimensional Information. Fast, such that the most complex queries requiring not more than 5 seconds to be processed. Analysis alludes to the process of analyzing information of all relevant kinds in order to process complex queries and establishing clear criteria for the results of such queries. The information to be used for analysis is generally obtained from a shared source, such as a data warehouse. The information may be related in more than one or two dimensions. For example, a particular set of business data may be related, variously, to sales figures, market trends, consumer buying patterns, supplier conditions and the liquidity of the business. Presented in such a multidimensional detail, such information can be useful and vital to managerial decision-making.
5. OLAP TECHNIQUES
Online analytical processing or OLAP is implemented in many different ways. However, the commonest way is to stage information retrieved from different corporate databases, such as data warehouses, is staged – stored temporarily into OLAP multi-dimensional databases for retrieval by the front-end systems. The multidimensional database is optimized for fast retrieval. Several techniques for the quickening of data retrieval and analysis are used on the procedural side of the database management.
Fig.4. OLAP Implementation
OLAP is basically concerned with multidimensionality of data. Multidimensional models take advantage of inherent relationships in data to populate data in multi-dimensional matrices called Data Cubes or Hypercubes (for more than three dimensions). A standard spreadsheet, signifying a conventional database, is a two-dimensional matrix. One example would a spreadsheet of regional sales by product for a particular time period. Products could be shown as rows with sales revenues for each region comprising the columns. Adding a time dimensional such as the organization’s fiscal quarters would produce a three-dimensional matrix, which could be represented using a cube.
Fig : 5
n figure 4 , there is a three dimensional data cube that organizes product sales data by fiscal quarters and sales regions. Each cell could contain data for specific product, specific fiscal quarter and specific region. By including additional dimensions, a data hyper-cube could be produced, although more than three dimensions cannot be graphically visualized. The data can be queried directly in any combinations of dimensions, bypassing complex database queries.
A multidimensional model of storage involves two types of tables – dimension tables and fact tables. A dimension table simply stores the tuples of attributes of the dimension. A fact table can be thought of as having tuples, one per a recorded fact. This fact contains some measured or observed variable(s), and identifies it with pointers to dimension table. The fact table contains data and the dimensions identify each tuple in that data.
Changing from one dimension hierarchy to another is easily achieved in a data cube by a technique called Pivoting (also called rotation). Several other techniques go into making a complete OLAP implementation. These vary from drilling down into finer details, from different databases, to rolling up and grouping into larger units.
The following techniques are used for OLAP implementation: -
Consolidation or Roll Up
Consolidation involves the aggregation of data. This can involve simple roll-ups or complex grouping involving inter-related data. For example, sales office can be rolled-up to district and district again to regions.
OLAP can go in reverse direction and can display detailed data that comprises consolidated data. This is called drill-down. For example, the sales by individual products or by sales-representatives that make up a region’s total sales, could be easily accessed.
Slicing and dicing
Slicing and dicing refers to the ability to look at a database from various viewpoints. One slice of sales database might show all sales of product type within a region. Another slice might show all sales by sales channel, within each product-type. Slicing and dicing is often performed along the time axis in order to analyze trends and final patterns.
6. OLAP APPLICATIONS
OLAP is widely used in several realms of data management. Some of these applications include: -
Extensive, real-time decision support system for managers
All of the above applications need ability to provide managers with the information they need to make effective decisions about an organization's strategic directions. The key indicator of a successful OLAP application is its ability to provide information, as needed, that is, its ability to provide "just-in-time" information for effective decision-making. This requires more than a base level of detailed data.
7. APPLYING OLAP TO DATA WAREHOUSING
Data warehousing is primarily used to archive data that is of the nature of business knowledge. A primary requirement is the fast analysis of this shared data, resulting in multidimensional views of the data, which in turn results in knowledge acquisition. OLAP comes into play when it is required to analyze data stored in a warehouse, so as to generate complex results, in a time-constrained environment – that is, just-in-time information. Most data warehouses support, what is called Ad hocquerying, which implies that any combination of complex queries can be executed against the stored data.
Several tradeoffs must be made when applying OLAP to warehouses, particularly because of the bulk of data that has to be analyzed. For example, an OLAP system must be able to process a complex query in the shortest possible time. For this, it can use two approaches – to have pre-calculated data, or to apply all calculations on the fly. The first approach requires huge amounts of data storage area, and hence exotic hardware support, which proves to be cost-ineffective. The second approach is to apply calculations, on the fly, which results in slower execution of the queries. Thus, an optimal approach is to pre-calculate some data, and calculate in real-time other parts of the queries. These elements need to be balanced in an optimal way for a good implementation of OLAP systems.
8. BENEFITS OF USING OLAP
OLAP holds several benefits for businesses: -
OLAP helps managers in decision-making through the multidimensional data views that it is capable of providing, thus increasing their productivity.
OLAP applications are self-sufficient owing to the inherent flexibility provided to the organized databases.
In conjunction with data warehousing, OLAP can be used to provide reduction in the application backlog, faster information retrieval and reduction in query drag..
The applications and benefits of online analytical processing (OLAP) indicate clearly that it is one of the revolutionary technologies that can be used extensively in data management. All businesses, big or small, can benefit from using OLAP, at some scale or other. The instant access to information and the apparent knowledge gained from this information is invaluable considering against the cost involved in establishing this setup.