263-3010-00: Big Data - ETH Zürich

Computer Science

Official Description from ETH Zürich:

The key challenge of the information society is to turn data into information, information into knowledge, knowledge into value. This has become increasingly complex. Data comes in larger volumes, diverse shapes, from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations.

Do you want to be able to query your own data productively and efficiently in your future semester projects, master thesis, or PhD thesis? Are you looking for something beyond the Python+Pandas hype? This course teaches you how to do so, as well as the dos and don'ts.

"Big Data" refers to the case when the amount of data is very large (100 GB and more), or when the data is not completely structured (or messy). The Big Data revolution has led to a completely new way to do business, e.g., develop new products and business models, but also to do science -- which is sometimes referred to as data-driven science or the "fourth paradigm".

Unfortunately, the quantity of data produced and available -- now in the Zettabyte range (that's 21 zeros) per year -- keeps growing faster than our ability to process it. Hence, new architectures and approaches for processing it are needed. Harnessing them must involve a deep understanding of data not only in the large, but also in the small.

The field of databases evolves at a fast pace. In order to be prepared, to the extent possible, for the (r)evolutions that will take place in the next few decades, the emphasis of the lecture will be on the paradigms and core design ideas, while today's technologies will serve as supporting illustrations thereof.

After attending this lecture, you should have gained an overview and understanding of the Big Data landscape, which is the basis on which one can make informed decisions, i.e., pick and orchestrate the relevant technologies for addressing each of your projects efficiently and consistently.

Archived Document(s):

263-3010-00 Section 01 - Introduction

263-3010-00 Section 02 - Lessons Learned from the Past

263-3010-00 Section 03 - Cloud Storage

263-3010-00 Section 04 - Distributed File Systems

263-3010-00 Section 05 - Syntax

263-3010-00 Section 06 - Wide Column Stores

263-3010-00 Section 07 - Data Models and Validation

263-3010-00 Section 08 - Massive Parallel Processing

263-3010-00 Section 09 - Resource Management

263-3010-00 Section 10 - Generic Dataflow Management

263-3010-00 Section 11 - Document Stores

263-3010-00 Section 12 - Querying Denormalized Data

263-3010-00 Section 13 - Graph Databases

263-3010-00 Section 14 - OLAP and Data Cubes