263-3010-00: Big Data
Section 2
Lessons Learned from the Past
Swiss Federal Institute of Technology Zurich
Eidgenössische Technische Hochschule Zürich
Last Edit Date: 09/24/2024
Disclaimer and Terms of Use:
We do not guarantee the accuracy or completeness of this summary. Some of the course material may not be included, and some of the content may be incorrect. You should use this file properly and legally. We are not responsible for any consequences of using this file.
This personal note is adapted from Professor Ghislain Fourny. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Data Independence¶
Data independence refers to the separation of the logical view of data from its physical storage. When the idea was first proposed (by Edgar Codd in 1970), the argument was that the most intuitive logical view is the table, a format that is easily understood and has been in use for thousands of years.
A relational database management system (RDBMS) provides a logical model for manipulating data, separate from the physical storage layer. In the 1970s, this physical layer was a single computer's hard drive, and the storage format was irrelevant to the user. As the RDBMS and the hardware are updated, the user's interaction remains unchanged, while performance may improve automatically. Updates to the logical layer, by contrast, focus on long-term compatibility and functionality.
A database management system can be viewed as a four-layer stack:
1. A logical query language with which the user can query data
2. A logical model for the data
3. A physical compute layer that processes the query on an instance of the model
4. A physical storage layer where the data is physically stored
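The separation between the logical and physical layers can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `employee` table, its contents, and the helper `run_query` are made up for the example. The same logical query is run against two different physical storage layers (RAM and a file on disk) and returns the same answer.

```python
import os
import sqlite3
import tempfile


def run_query(conn):
    """Create a small hypothetical table and run one logical query on it."""
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ada", "Zurich"), (2, "Alan", "Bern"), (3, "Grace", "Zurich")],
    )
    # The query only mentions the logical model (table and attributes),
    # never where or how the data is physically stored.
    return conn.execute(
        "SELECT name FROM employee WHERE city = ? ORDER BY name", ("Zurich",)
    ).fetchall()


# Physical storage 1: the database lives in RAM only.
memory_conn = sqlite3.connect(":memory:")
in_memory_result = run_query(memory_conn)
memory_conn.close()

# Physical storage 2: the database lives in a file on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    disk_conn = sqlite3.connect(os.path.join(tmp_dir, "example.db"))
    on_disk_result = run_query(disk_conn)
    disk_conn.close()

print(in_memory_result == on_disk_result)
```

The function `run_query` plays the role of the user: it would not need to change if the storage layer were swapped again.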
Relational Database Management Systems¶
Main concepts¶
Relational database management systems (RDBMS) are based on a tabular data format. The core of the relational model is thus the concept of a table. The following table summarizes the elements of this model:
Element | Description | Example |
---|---|---|
Table | A collection of records. For example, in an employee table, each record represents a person, and in a products table, each record represents a product. Therefore, in some models that generalize tables, the term "collection" is used. | |
Attribute | A property that records can have. For example, if a record is an employee, an attribute could be their last name or city of residence. Other common synonyms for "attribute" include column, field, property, and key. | |
Row | A record in a collection. A row links properties with the values relevant to the record it represents. Common synonyms for "row" include record, entity, document, item, and the more formal term "business object," which is popular among MBA graduates. | |
Primary key | A particular attribute or set of attributes that uniquely identify a record in its table. For example, a social security number (AHV in Switzerland) can identify a person, or a code (HG, CAB) can identify a building at ETH. | |
Value | The content of one attribute in one row. For example, the first name (a string) of a student in a table. | |
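
The vocabulary above can be made concrete with a small sketch using Python's built-in `sqlite3` module. The `employee` table and its contents are hypothetical and chosen only for illustration; the comments map each step to a term from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be read by attribute name

# Table: "employee". Attributes: id, last_name, city. Primary key: id.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO employee (id, last_name, city) VALUES (?, ?, ?)",
    [(1, "Meier", "Zurich"), (2, "Keller", "Basel")],
)

# Row: one record of the table, looked up through its primary key.
row = conn.execute("SELECT * FROM employee WHERE id = ?", (2,)).fetchone()
attributes = row.keys()  # the attribute names of the record
value = row["city"]      # a single value inside the row

# A primary key identifies a record uniquely, so a duplicate is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Huber', 'Bern')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(attributes, value, duplicate_rejected)
```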
Relational Properties¶
Why are relational tables called relational tables?
A mathematical relation over several domains is defined as a subset of the Cartesian product of these domains. Each element of a mathematical relation, that is, each record in the table, is thus a tuple made of one value picked from each domain.
In general, we can say that a relational table is made of:
- A set of attributes (schema)
- A set / bag / list of tuples (extension)
The same relation can be written in three representations: mathematical, JSON, and table.
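The three representations named above can be illustrated on one small relation. The following is a minimal sketch, not the course's own example: the attributes `name` and `city`, their domains, and the tuples are all hypothetical.

```python
import json
from itertools import product

# Hypothetical attributes and domains, one domain per attribute.
attributes = ("name", "city")
names = {"Ada", "Alan"}
cities = {"Zurich", "Bern"}

# The Cartesian product contains every possible (name, city) pair.
cartesian_product = set(product(names, cities))

# Mathematical representation: a relation is a subset of that product.
relation = {("Ada", "Zurich"), ("Alan", "Bern")}
is_relation = relation <= cartesian_product

# JSON representation: one object per tuple, keyed by attribute name.
records = [dict(zip(attributes, t)) for t in sorted(relation)]
as_json = json.dumps(records)

# Table representation: a header line followed by one line per tuple.
lines = [" | ".join(attributes)] + [" | ".join(t) for t in sorted(relation)]
as_table = "\n".join(lines)

print(is_relation)
print(as_json)
print(as_table)
```

Here the relation is stored as a Python `set`, which matches the "set of tuples" reading of the extension; a bag or list would allow duplicates or impose an order.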
Relational integrity¶
A collection $T$ fulfills relational integrity if all its records have identical support, where the support of a record is the set of attributes for which it has a value:
$$\forall t, u \in T, \; \text{support}(t) = \text{support}(u)$$
If a collection, in particular a table, has relational integrity, then this common support is a property of the table: it is the set of attributes of the table $T$, denoted $\text{Attributes}_{T}$.
The extension of the table, sometimes denoted $\text{Extension}_{T}$ when used together with $\text{Attributes}_{T}$, is its actual content, which is $T$ itself. We use $T$ and $\text{Extension}_{T}$ interchangeably, depending on the context.
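The definition can be checked mechanically. The sketch below assumes that each record is modeled as a Python dictionary and that its support is the set of its keys; the function names `support`, `has_relational_integrity`, and `attributes_of` are introduced here for illustration only, and the sample records are made up.

```python
def support(record):
    """Return the set of attributes for which the record has a value."""
    return set(record.keys())


def has_relational_integrity(collection):
    """True if all records of the collection have identical support."""
    supports = [support(record) for record in collection]
    return all(s == supports[0] for s in supports)


def attributes_of(collection):
    """Return the common support of a non-empty collection with integrity."""
    if not collection or not has_relational_integrity(collection):
        raise ValueError("the collection has no well-defined attribute set")
    return support(collection[0])


# Hypothetical collections used only to exercise the functions.
good = [
    {"name": "Ada", "city": "Zurich"},
    {"name": "Alan", "city": "Bern"},
]
bad = [
    {"name": "Ada", "city": "Zurich"},
    {"name": "Alan"},  # the attribute "city" is missing
]

print(has_relational_integrity(good))
print(has_relational_integrity(bad))
print(sorted(attributes_of(good)))
```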
The following collection is an example that respects relational integrity: