跳至正文

3-2 Foundation:Key Concepts about Tables

Aggregation is a crucial step in business analysis, as we explained in article 3-1. In general, data aggregation is the act of calculating and summarizing many rows of data into a single row for each grouping basis.In practice, aggregations can occur at all stages of analysis and have many types.To help novices better understand the aggregation process of cases introduced in later articles, this article will thoroughly introduce fundamental concepts about tables.These concepts will assist analysts in reaching advanced “multidimensional analysis”.

1 、Start and End of Aggregation:Detail and Aggegate Table

In analysis,the “Detail Table” stores unaggregated raw data, and its function is operational recording-keeping. [Kimball & Ross,The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, Third Edition.] “Aggregate Table”,by contrast, is used to solve business problems and assist decision making.In Tableau, we often represent it using cross table, while in Excel, we use the pivot table.

The “detail table” and “aggregate table” are separately the start and end points of aggregation. These two kinds of table correspond to different business scenarios and have their own normative characteristics.

1.1 Detail Table

“Detail table” is a direct reflection of business.Each row of “Detail table” represents a complete business record,while eachfield corresponds to a column of data that delivers the same business information.

Some people might be familiar with the 5W2H analytical technique. This approch employs seven queries (What? Why? Where?When?Who? How? How much?) to gather information. The “detail table” stores the business data in a similar manner.

A sales table, for instance, displays a company’s history of transaction.It captures crucial elements in business processes,so that analysts can identify what happened in a transaction like ” when and where,what does who provide to whom” and “how much a transaction is” . All fields in the “detail table” are derived from the 5W2H elements and their attributes or extensions. [In the business system, information of a business process will be split into multiple tables in order to meet the database’s ACID principle. This article does not consider the impact of table splitting .]

Typically, data in the “detail table” is stored in a two-dimensional structure( or relational structure),consisting of columns and rows. The combination of fields that can identify a particular row in a table is the primary key, which represents the level of detail the table can reach[Let’s call it the LOD of the table].As shown in Figure 3-2-1, the primary key of the sales table could be the combination of Order ID * Product ID.

Figure 3-2-1 Sales Records-A Sample of “Detail Table”

1.2 Aggregate Table

The “aggregate table,” which is dynamic and temporary compared with the “detail table,” is created during business analysis.It is concerned with clear expression of viewpoints rather than standardized data structure.The “aggregate table” appears in several forms. In addition to the common cross table, we can also display the aggregate results with diversified visual charts, as shown in Figure 3-2-2.

Figure 3-2-2 “Aggregate Table”with Its Diversified Forms

The “aggregate table” and the corresponding business problem share the same level of detail(LOD), which is represented by the dimension fields. For instance, the LOD of the “aggregate table” for the problem “sales of each category” is “category,” and for the problem “Sales of each customer cohort and each year” ,it is “order year * customer cohort”.

In complex scenarios, the address of a business problem relies on multiple aggregate tables, and these aggregate tables are derived from multiple detail tables. In this case ,the data warehouses gradually develop to improve the efficiency of query, and maintain the stability of the business system.The data warehouse builds numerous intermediate aggregate tables, which act as temporary detail tables and play a crucial role in linking fundamental business data with sophisticated analysis problems.

2 、Technical Concepts :Physical and Logical Table

From a technical perspective, the “detail table” and “aggregate table”, can also be called the “physical table” and “logical table”.

Aggregate table exists because of analysis problems, so it is subjective, dynamic and transient. The aggregate result will change with the modification of filter conditions, and disappear with the end of user access. This kind of temporary table, which depends on the existence of subjective problems, is the “logical table”.

By contrast, “detail table” is the mapping and recording of enterprise business processes, so detail tables are objective and stable in structure. With the continuous operation of the company, new records will be added permanently stored. Taking the retail industry as an example, every new order generates a new row in the transaction table and will not change thereafter;if an order return occurs, new data will be generated in the return table rather than change the transaction table. This kind of static table, which has a relatively stable structure, is the “physical table”.

Like the relationship between the “detail table” and the “aggregate table”, the logical tables are abstractions of physical tables. In visuallization, “visual graph” is only the special presentation form of “aggregation table”, so it can be regarded as a special “logical table”; The “detail table” that visual charts depend on is the “physical table”.

As illustrated in Figure 3-2-3, there is a correspondence between the tables in the technology and analysis worlds and the behaviors of the business world.Understanding this framework is helpful to better understand data modeling and advanced calculation.

Figure 3-2-3 The Classification of Tables

3、 Fields in Tables

Physical tables will automatically contain the key components of business processes since they are the mapping of the business process. Simple abstractions of [orders] and [products], like [order ID] and [product ID], are constructed along with data recording.They will also be recorded on the physical table even though they do not directly exist in the business process.

Other fields, such as [profit rate] and [discount rate], are different from the above fields. The aggregation results of these fields are not directly calculated by themselves, but based on the recalculation of other fields. [eg. The total profit rate = the sum of profit/ the sum of sales rather than the sum of profit rates for each order] Therefore, these fields are advanced abstractions and should be stored in logical tables.

The following figure illustrates the fields recorded in different kinds of tables.

Figure 3-2-4 Fields in Tables

You can see from the above figure that certain paradigms are followed by both logical and physical tables in order to guarantee data consistency. For instance, elements that indicate attributes frequently take the form of characters, whereas fields that indicate measurements usually take the form of numbers.In fact, in order to maximize storage and enhance data accuracy, analytical tools have divided data into several categories from the outset of design.

In Tableau,we have at least 5 important data types.

  • Date and Date& time

Unlike strings such as “customer ID”, Date and Date& time have the characteristics of “continuity”, that is, they follow a logical sequence nuturally.

  • Boolean

Boolean data is a yes-or-no judgment.Because computers are binary computing devices, making effective use of this classification will increase the computational efficiency of analysis. [Since Tableau 2020.3, fields with multiple values cannot be changed to “Boolean” fields.]

  • Strings

Other character type data are collectively referred to as “strings”, such as [company name], [product ID], etc.

  • Number (integer)

Numbers without decimals like “190, 8, 35” are in this type.

  • Number (decimal)

Like “12.09, 5.21, 7.07”.

Numbers can perform arithmetic operations such as addition, subtraction, multiplication, and division, which is the foundation of aggregation. Numbers and dates,which have their own sequence (continuity),can build the coordinate axis.

Remember: Data types are not necessarily related to field classifications of dimensions and measures.

Data types such as string and number are determined at the database design stage. They objectively exist before data query and problem analysis. Although it can be adjusted as needed later, its objective attributes have not been changed.

Dimension and measure are relative to problems and aggregate tables. A field may be used as a dimension in one problem and a measure in another. Therefore, the classification of dimensions and measures is subjective.

.


了解 喜乐君 的更多信息

订阅后即可通过电子邮件收到最新文章。

了解 喜乐君 的更多信息

立即订阅以继续阅读并访问完整档案。

Continue reading