Section outline
-
The Big Graph Databases course (ECE_5DA04_TP) presents the new needs raised by the heterogeneity of data, the evolution toward a graph view of connected information, the tools and techniques used to address these needs, and the most prominent classes of systems in the area. We will consider graph querying via structured queries and (semi-)structured search, as well as how reasoning on semantic graphs can be used to infer information. The course will also touch on modern data management architectures and systems, such as in-memory databases, cloud databases, and query processing in shared-nothing MapReduce clusters.
Instructors:
- Ioana Manolescu, Inria and Ecole Polytechnique, ioana.manolescu@inria.fr
- Madhulika Mohanty, Inria and Ecole Polytechnique, madhulika.mohanty@inria.fr
- Garima Gaur, Inria and Ecole Polytechnique, garima.gaur@inria.fr
-
We will start with an introduction to Big Data and its requirements.
Then, we will recall fundamental features of a Database Management System.
-
We introduce the need for graph databases, notably the insufficiencies of relational database management:
- limited support for heterogeneity
- difficult to write path queries
- impossible to query the data together with the schema
- lack of support for data interoperability
- lack of support for knowledge and reasoning.
Then, we cover the two main classes of graph databases available today:
- RDF graph databases;
- Property graph databases.
We focus on the data model and query languages.
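As a rough illustration of the two data models named above, the sketch below represents the same small graph once as RDF-style triples and once as a property graph, using plain Python structures. All identifiers and data (`alice`, `knows`, `KNOWS`, `since`, the helper names `match` and `reachable`) are hypothetical and chosen only for this example; real systems would use SPARQL or Cypher rather than hand-written functions. The `reachable` helper also hints at why path queries, listed above as difficult in relational systems, are a natural fit for graphs.

```python
from collections import deque

# RDF view: every fact is a (subject, predicate, object) triple.
# All identifiers here are hypothetical illustration data.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "inria"),
    ("alice", "type", "Person"),
]

# Property graph view: nodes and edges carry a label plus key/value properties.
nodes = {
    "alice": {"label": "Person", "props": {"name": "Alice"}},
    "bob": {"label": "Person", "props": {"name": "Bob"}},
    "carol": {"label": "Person", "props": {"name": "Carol"}},
}
edges = [
    ("alice", "KNOWS", "bob", {"since": 2020}),
    ("bob", "KNOWS", "carol", {"since": 2022}),
]


def match(pattern, triples):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def reachable(start, edge_label, edges):
    """Nodes reachable from `start` via one or more `edge_label` edges (BFS)."""
    adjacency = {}
    for src, label, dst, _props in edges:
        if label == edge_label:
            adjacency.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Triple pattern: "who knows whom?"
print(match((None, "knows", None), triples))
# Path query: everyone Alice can reach through KNOWS edges of any length.
print(sorted(reachable("alice", "KNOWS", edges)))  # ['bob', 'carol']
```

In the triple view, node types and attributes are themselves triples, whereas the property graph attaches them directly to nodes and edges; the path query is the same breadth-first traversal in both cases.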
-
-
Opened: Thursday, 5 December 2024, 1:30 PM
Due: Wednesday, 18 December 2024, 11:55 PM
In this lab, we will have a hands-on session to learn how to set up the Neo4j desktop application, create a graph database by bulk loading, and write Cypher queries ranging from simple queries to complex path queries.
Please submit one file, as described in the Assignment file. Use the specified naming convention, where LASTNAME and FIRSTNAME should consist of ASCII letters only.
(1) a query text file called "LASTNAME_FIRSTNAME.txt".
-
-
We will introduce the main classes of systems for working with Big Data.
Then, we will introduce the Cloud Computing paradigm, before going into more detail on how cloud resources can be leveraged to build a database.
-
-
Opened: Thursday, 19 December 2024, 1:30 PM
Due: Wednesday, 8 January 2025, 11:55 PM
-
-
First, we will finalize the discussion of massively parallel query processing on top of MapReduce (based on the slides from the previous session).
Then, we will discuss a family of cloud data service architectures, provided by major companies nowadays.
Finally, we will delve into the algorithms used to integrate heterogeneous data sources, which will be the basis for the last session (lab).
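To give a flavour of query processing on top of MapReduce, mentioned in this session, here is a minimal single-process sketch of a repartition (reduce-side) equi-join. It is only an assumption about the kind of technique involved, not necessarily the algorithm presented in the course slides. The function names (`map_phase`, `shuffle`, `reduce_join`, `mapreduce_join`) and the sample `authors` and `papers` data are hypothetical; a real cluster would run the map and reduce steps in parallel on different machines.

```python
from collections import defaultdict


def map_phase(records, relation_name, key_index):
    """Map: emit (join_key, (relation_tag, record)) for every record."""
    for record in records:
        yield record[key_index], (relation_name, record)


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reduce: pair every left record with every right record for one key."""
    left = [rec for tag, rec in values if tag == "L"]
    right = [rec for tag, rec in values if tag == "R"]
    return [(l, r) for l in left for r in right]


def mapreduce_join(left_records, left_key, right_records, right_key):
    """Simulate the full map -> shuffle -> reduce pipeline in one process."""
    pairs = list(map_phase(left_records, "L", left_key))
    pairs += list(map_phase(right_records, "R", right_key))
    results = []
    for key, values in shuffle(pairs).items():
        results.extend(reduce_join(key, values))
    return results


# Hypothetical sample data: authors(id, name) and papers(title, author_id).
authors = [(1, "Ada"), (2, "Bob")]
papers = [("p1", 1), ("p2", 1), ("p3", 3)]

for author, paper in mapreduce_join(authors, 0, papers, 1):
    print(author, paper)
```

Because all records sharing a join key land in the same reduce group, each group can be joined independently, which is what makes this pattern suitable for shared-nothing clusters.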
-
-
Opened: Thursday, 16 January 2025, 12:00 AM
Due: Friday, 31 January 2025, 11:59 PM
-