Data Warehouse: Data Granularity and Data Partitioning


Author: 瞧瞧秘密不说话 | Published: 2017-01-11 19:22 | Views: 0

Granularity

Granularity refers to the level of detail or summarization of the units of data in the data warehouse.

The more detail there is, the lower the level of granularity. The less detail there is, the higher the level of granularity.

Granularity is the single most critical design issue in the data warehouse environment because it profoundly affects the volume of data that resides in the data warehouse and the type of query that can be answered.

The volume of data in a warehouse is traded off against the level of detail of a query.

In almost all cases, data comes into the data warehouse at too high a level of granularity. This means that the developer must spend a lot of design and development resources breaking the data apart before it can be stored in the data warehouse. Occasionally, though, data enters the warehouse at too low a level of granularity.

The Benefits of Granularity

The granular data found in the data warehouse is the key to reusability, because it can be used by many people in different ways.

Looking at the data in different ways is only one advantage of having a solid foundation. A related benefit is the ability to reconcile data, if needed.

Another related benefit of a low level of granularity is flexibility.

Another benefit of granular data is that it contains a history of activities and events across the corporation.

Determining the level of granularity is the most important design issue in the data warehouse environment.

An Example of Granularity

There is, then, a very good case for the compaction of data in a data warehouse. When data is compacted, significant savings can be realized in the amount of DASD (direct access storage device, that is, disk storage) used, the number of index entries required, and the processor resources required to manipulate data.

Put another way, with a very low level of granularity, you can answer practically any query. But a high level of granularity limits the number of questions that the data can handle.
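The trade-off can be illustrated with a minimal Python sketch. The call records, customer names, and the `summarize_by_month` helper are all hypothetical and chosen only for illustration. The monthly summary holds fewer rows than the detail, and it can still answer a question about monthly totals, but a question about a specific day can only be answered from the detail.

```python
from collections import defaultdict

# Hypothetical call-detail records: (customer, date, minutes).
calls = [
    ("alice", "2017-01-03", 12),
    ("alice", "2017-01-03", 5),
    ("alice", "2017-01-20", 30),
    ("bob", "2017-01-07", 8),
]


def summarize_by_month(records):
    """Roll detail up to one row per (customer, month): a higher level of granularity."""
    totals = defaultdict(lambda: {"calls": 0, "minutes": 0})
    for customer, date, minutes in records:
        month = date[:7]  # "YYYY-MM"
        totals[(customer, month)]["calls"] += 1
        totals[(customer, month)]["minutes"] += minutes
    return dict(totals)


monthly = summarize_by_month(calls)

# Answerable at both levels: total minutes for alice in January 2017.
total_from_detail = sum(
    m for c, d, m in calls if c == "alice" and d.startswith("2017-01")
)
total_from_summary = monthly[("alice", "2017-01")]["minutes"]

# Answerable only from the detail: did alice place a call on 2017-01-20?
called_on_20th = any(c == "alice" and d == "2017-01-20" for c, d, _ in calls)
```

The summary stores two rows where the detail stores four, which is the volume saving; the price is that the date of each call is no longer recoverable from it.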

Another consideration in designing granularity is determining which architectural entities will feed off the data warehouse. Each DSS architectural entity has its own unique considerations. The data warehouse must be designed to feed the lowest level of granularity needed by any architectural entity.

Dual Levels of Granularity

Most of the time, there is a great need for efficiency in storing and accessing data, and for the ability to analyze data in great detail.

The data warehouse in this example contains two types of data: lightly summarized data and “true archival” detail data.

Lightly summarized data is detailed data that has been summarized only to a very small extent. For example, phone call information may be summarized by the hour, or bank checking information may be summarized by the day.

There is significantly less data in the lightly summarized database than in the detailed database. Of course, there is a limit to the level of detail that can be accessed in the lightly summarized database.

At the true archival level of data, all the detail coming from the operational environment is stored. There is truly a multitude of data at this level. For that reason, it makes sense to store the data on a medium such as magnetic tape or another bulk storage medium because the volume of data is so large.

If a pattern of searching the true archival level of data develops over time, the designer may want to create some new fields of data at the lightly summarized level, so that most of the processing can occur there.
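A rough sketch of the two levels, under assumed data, might look like the following. The record layout, phone numbers, and function names are hypothetical. Calls are summarized by the hour, as in the phone example above; a question about an hourly total is served from the summary, while a question about an individual call has to go back to the detail.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical detail rows as they might arrive from the operational environment.
detail = [
    {"caller": "555-0100", "start": "2017-01-11 09:05:00", "seconds": 120},
    {"caller": "555-0100", "start": "2017-01-11 09:40:00", "seconds": 300},
    {"caller": "555-0100", "start": "2017-01-11 14:10:00", "seconds": 60},
    {"caller": "555-0199", "start": "2017-01-11 09:15:00", "seconds": 45},
]


def lightly_summarize(rows):
    """Build one row per (caller, hour) holding the call count and total seconds."""
    summary = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for row in rows:
        start = datetime.strptime(row["start"], "%Y-%m-%d %H:%M:%S")
        hour = start.replace(minute=0, second=0)
        key = (row["caller"], hour)
        summary[key]["calls"] += 1
        summary[key]["seconds"] += row["seconds"]
    return dict(summary)


summary = lightly_summarize(detail)


def seconds_in_hour(caller, hour):
    """Served entirely from the lightly summarized level."""
    return summary.get((caller, hour), {"seconds": 0})["seconds"]


def longest_call(caller):
    """Needs individual calls, so it must read the archival detail."""
    return max(r["seconds"] for r in detail if r["caller"] == caller)
```

If `longest_call` turned out to be requested often, one option in the spirit of the paragraph above would be to add a `max_seconds` field to each summary row so that the archival level no longer has to be searched for it.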

Living Sample Database

The greatest asset of a living sample database is that it is very efficient to access. Because its size is a fraction of the larger database from which it was derived, it is correspondingly much more efficient to access and analyze.

If very high degrees of accuracy are desired, a useful technique is to formulate the request and go through the iterative processing on the living sample database. In doing so, the DSS analyst quickly formulates the request. Then, after several iterations of analysis have been done, when the request is understood, it is run one final time against the large database.
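The workflow can be sketched in Python as follows. Everything here is an assumption made for illustration: the synthetic balances, the 1% sampling rate, and the `share_above` query. The analyst iterates on the small sample while the question is still being shaped, then runs the settled request once against the full data.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical full database: 100,000 account balances.
full_db = [random.uniform(0, 10_000) for _ in range(100_000)]

# Living sample: a 1% random subset of the full database.
sample_db = random.sample(full_db, k=len(full_db) // 100)


def share_above(balances, threshold):
    """Fraction of accounts whose balance exceeds the threshold."""
    return sum(b > threshold for b in balances) / len(balances)


# Iterate cheaply on the sample while the request is being formulated.
for threshold in (2_500, 5_000, 7_500):
    print(threshold, round(share_above(sample_db, threshold), 3))

# Once the request is understood, run it one final time against the large database.
estimate = share_above(sample_db, 7_500)
exact = share_above(full_db, 7_500)
```

Each exploratory pass touches 1,000 rows instead of 100,000. The sample result is only an estimate, which is why the final run goes against the full database when high accuracy matters.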

Partitioning as a Design Approach

A second major design issue of data in the warehouse (after granularity) is partitioning.

Partitioning of data refers to the breakup of data into separate physical units that can be handled independently.

It is often said that if both granularity and partitioning are done properly, then almost all other aspects of the data warehouse design and implementation come easily.

Data is partitioned when data of a like structure is divided into more than one physical unit of data. In addition, any given unit of data belongs to one and only one partition.

The choices for partitioning data are strictly up to the developer. In the data warehouse environment, however, it is almost mandatory that one of the criteria for partitioning be by date.
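As a minimal sketch, date-based partitioning can be expressed as a function that maps every row to exactly one partition key. The row layout and the choice of one partition per calendar month are assumptions for illustration, not a prescribed scheme.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (record_id, activity_date, amount).
rows = [
    (1, "2016-11-30", 40.0),
    (2, "2016-12-01", 15.5),
    (3, "2016-12-19", 99.0),
    (4, "2017-01-02", 10.0),
]


def partition_key(row):
    """Map a row to exactly one partition, here one per calendar month."""
    return row[1][:7]  # "YYYY-MM"


def partition_by_date(records):
    """Split rows of like structure into separate units keyed by month."""
    partitions = defaultdict(list)
    for row in records:
        partitions[partition_key(row)].append(row)
    return dict(partitions)


partitions = partition_by_date(rows)

# A whole partition can be handled independently, for example archived or dropped.
archived = partitions.pop("2016-11")
```

Because `partition_key` is a pure function of the row, no row can land in two partitions, and removing the November partition leaves the others untouched.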

As a rule, it makes sense to partition data warehouse data at the application level.

The acid test for the partitioning of data is to ask the question, “Can an index be added to a partition with no discernible interruption to other operations?” If an index can be added at will, then the partition is fine enough. If an index cannot be added easily, then the partition needs to be broken down more finely.
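The idea can be sketched with Python's built-in `sqlite3` module, assuming application-level partitioning into one table per year. The table names, columns, and the `minutes_for` helper are hypothetical. An in-memory SQLite database cannot demonstrate the absence of interruption on a production system; the sketch only shows that an index can be scoped to a single partition while queries still span all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Application-level partitioning: one physical table per year (hypothetical names).
for year in (2015, 2016):
    conn.execute(
        f"CREATE TABLE calls_{year} (caller TEXT, call_date TEXT, minutes INTEGER)"
    )

conn.executemany(
    "INSERT INTO calls_2015 VALUES (?, ?, ?)",
    [("alice", "2015-03-01", 10), ("bob", "2015-07-09", 4)],
)
conn.executemany(
    "INSERT INTO calls_2016 VALUES (?, ?, ?)",
    [("alice", "2016-02-14", 7)],
)

# Add an index to the 2015 partition only; calls_2016 is not touched.
conn.execute("CREATE INDEX idx_calls_2015_caller ON calls_2015 (caller)")

indexed_tables = [
    row[0]
    for row in conn.execute("SELECT tbl_name FROM sqlite_master WHERE type = 'index'")
]


def minutes_for(caller, years):
    """The application, not the DBMS, decides which partitions a query reads."""
    total = 0
    for year in years:
        cursor = conn.execute(
            f"SELECT COALESCE(SUM(minutes), 0) FROM calls_{year} WHERE caller = ?",
            (caller,),
        )
        total += cursor.fetchone()[0]
    return total
```

The smaller each partition is, the less work building such an index involves, which is the intuition behind breaking a partition down further when the test fails.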


      Article title: Data Warehouse: Data Granularity and Data Partitioning

      Article link: https://www.haomeiwen.com/subject/itbrbttx.html