Seminar by Julien Tissier

"Constrained learning of binary representations of semantic similarities" by Julien Tissier

at 1:30 PM

Room F021a
Building F
Laboratoire Hubert Curien
18 rue du Professeur Benoît Lauras
42000 Saint-Étienne

Abstract

During this presentation, I will talk about how to compute language representations in order to solve natural language processing tasks. More specifically, I will introduce the concept of word embedding: the idea of mapping words to numerical vectors. I will then present two contributions made so far during my PhD to improve word embeddings. The first one is Dict2vec, a model that uses word definitions from lexical dictionaries in a semi-supervised way to encode additional semantic knowledge into word representations. The word embeddings learned with Dict2vec perform around 15%-30% better on semantic similarity tasks than other common embeddings from Facebook or Stanford. The second one is a novel autoencoder-based architecture that transforms real-valued embeddings into binary embeddings. The resulting binary vectors are much smaller (37.5 times less space in memory) and allow one to speed up computations by a factor of 30, at a cost of only ~2% in accuracy on downstream NLP tasks. Finally, I will present my current and future work on word disambiguation and multi-lingual Wikipedia entity hierarchization.
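The memory and speed figures in the abstract can be illustrated with a small sketch. This is not the autoencoder from the talk; it is a minimal stand-in (sign of a random projection) assuming 300-dimensional float32 vectors compressed to 256-bit codes, which is where the 37.5× ratio comes from (300 × 32 bits / 256 bits = 37.5). The speedup comes from replacing floating-point dot products with XOR plus popcount on the binary codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes (not from the talk): 300-dim float32 vectors, 256-bit codes.
real_dim, bin_bits = 300, 256
real_bytes = real_dim * 4              # 1200 bytes per real-valued vector
bin_bytes = bin_bits // 8              # 32 bytes per binary code
ratio = real_bytes / bin_bytes         # 1200 / 32 = 37.5

# Toy binarization standing in for the autoencoder: sign of a random projection.
W = rng.standard_normal((real_dim, bin_bits))
emb = rng.standard_normal((5, real_dim)).astype(np.float32)
codes = np.packbits((emb @ W > 0).astype(np.uint8), axis=1)  # shape (5, 32)

def hamming(a, b):
    """Distance between two binary codes: XOR then count set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

print(ratio)                           # 37.5
print(hamming(codes[0], codes[1]))     # some value in [0, 256]
```

On real hardware the XOR/popcount pair maps to single CPU instructions, which is why similarity search over binary codes can be an order of magnitude faster than cosine similarity over floats.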

Bio: I recently started my 3rd year as a PhD student under the supervision of Amaury Habrard and Christophe Gravier.

This seminar will be given in English.