Topic outline

  • This course

    In this class, we will look at several recent developments concerning data on the Web and on the Internet:

    Blockchains
    Blockchains have revolutionized the exchange of money. We will discuss the theoretical foundations of blockchains and their implementation.
    Data Security
    With more and more of our lives being digitized, we have to protect our data, on the Web and offline, against hackers, companies, and sometimes governments. We will discuss current best practices.
    Information Extraction
    Many big companies systematically harvest information about people, events, and other entities from the Web (think of Google's Knowledge Graph or IBM's Watson question answering system). We will discuss the algorithms that turn textual content into structured information, including natural language processing.

    All material in this Moodle is for the participants of the course only, and may not be shared publicly.

  • Lecturers

    • Fabian Suchanek (professor at Télécom Paris)
    • Nicoleta Preda (associate professor at Versailles University)
    • Nedeljko Radulovic (teaching assistant, PhD student at Télécom Paris)

  • Grading

    • First session: 50% labs + 50% exam
    • Second session: 50% original lab grades + 50% re-exam

    Each lab has equal weight.

  • Schedule

    Day 13:30-15:00 15:15-16:45
    2019-11-20 Introduction, Motivation, Applications, Knowledge Representation, What the GAFA know (Fabian)
    2019-11-27 Data Security (Fabian) Lab 1: Password cracking (Fabian & Ned) — Deadline during the lab!
    2019-12-04 Formal Grammars 1, Evaluation (Fabian) Lab 2: Instance Extraction (Fabian & Ned)
    2019-12-11 Formal Grammars 2, POS Tagging, Dependency Parsing, Disambiguation (Fabian) Lab 3: Disambiguation (Fabian & Ned)
    2019-12-18 Fact Extraction, Information Extraction by Deep Learning (Fabian) Lab 4: Deep Learning (Fabian & Ned)
    2020-01-08 Blockchains (Nicoleta) Blockchains (Nicoleta)
    2020-01-15 Lab 5: Blockchains (Nicoleta & Ned)
    ? 13:30-15:00: Exam (no material allowed)

    The schedule beyond the current date is tentative.

    Additional optional material: Named Entity Recognition, NLP