Key techniques for the analysis of large graphs.

This course presents various NLP methods inspired by the study of natural language and of the underlying cognitive processes. The techniques and concepts studied nevertheless have a broader scope in artificial intelligence, where they are used to study reasoning, decision making and symbolic machine learning. They include:

  • Syntactic processing using context-free grammars. Basic parsing methods.
  • Knowledge representation – Meaning representation – Procedural semantics – Aspect.
  • Relevance: interest, newsworthiness, argumentative relevance and processing.

The future of Artificial Intelligence requires thinking beyond massive data processing. Future intelligent systems will be able to process structures and symbolic data.

The SD206 module (Logic and knowledge representation) is an introduction to symbolic AI.
It will introduce the Prolog programming language and fundamental notions of symbolic AI related to problem solving, formal Logic, symbolic machine learning, knowledge representation, natural language processing and Complexity.

Prolog is a unique, declarative language. With Prolog, problems are addressed in terms of constraints rather than procedures: programmers ideally provide knowledge to the machine, which then takes over the job of solving. Prolog is an excellent vehicle for studying fundamental notions of Computer Science such as recursion, declarative programming, unification and backtracking. It was invented to address symbolic AI problems: knowledge processing, natural language processing and reasoning. Its principles are used in ontologies, in unification grammars and in constraint programming.
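
As a rough illustration of what unification and backtracking mean, here is a minimal Python sketch. It is not Prolog and is not taken from the course material. The family facts, the convention that capitalized strings are variables, and the names `unify`, `walk`, `solve_parent` and `solve_ancestor` are all assumptions made for this example.

```python
# Hypothetical facts, read as parent(tom, bob), parent(bob, ann), parent(bob, pat).
PARENTS = [("tom", "bob"), ("bob", "ann"), ("bob", "pat")]


def is_var(term):
    # Assumed convention: a string starting with an uppercase letter is a variable.
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    # Return an extended substitution making a and b equal, or None on failure.
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def solve_parent(x, y, subst):
    # Try every fact in turn; each successful unification is one solution.
    for fact in PARENTS:
        s = unify((x, y), fact, subst)
        if s is not None:
            yield s


def solve_ancestor(x, y, subst, depth=0):
    # Rule 1: ancestor(X, Y) holds if parent(X, Y) holds.
    yield from solve_parent(x, y, subst)
    # Rule 2: ancestor(X, Y) holds if parent(X, Z) and ancestor(Z, Y) hold.
    z = f"Z{depth}"  # fresh intermediate variable for this call
    for s in solve_parent(x, z, subst):
        yield from solve_ancestor(z, y, s, depth + 1)


# Query: who are the descendants of tom?
descendants = [walk("Who", s) for s in solve_ancestor("tom", "Who", {})]
print(descendants)
```

The generators play the role of backtracking here: when one fact fails to unify, the loop simply moves on to the next candidate, and the caller only states what relation it wants satisfied rather than how to search for it.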

Topics presented in SD206 belong to the symbolic branch of AI, which relies on the use of structures and Logic. They will be taken from the following list:

  • Prolog (recursion, backtracking, unification)
  • Formal Logic (propositions, predicates, proof by refutation)
  • Natural language processing (DCG, parsing through unification)
  • Symbolic machine learning (symbolic induction, minimum complexity)
  • Knowledge representation (description logics, ontologies, semantic Web)
  • Problem solving

SD206 is part of the ‘Data Science’ track at Telecom ParisTech.
It finds a natural follow-up in SD213 (Symbolic Natural Language Processing, SNLP).

The course will present algorithms for data analysis and data mining, focusing on both the practical and the theoretical aspects of exploring large volumes of data.
During the course, students will become familiar with the most effective algorithms for data clustering, ranking, association rules and recommender systems, as well as with algorithms for detecting communities and interesting events in social networks. Students will work on a project in which they implement some of these algorithms on a Hadoop cluster (one of the most effective systems for processing large volumes of data) and analyze real-world data.