"Constrained learning of binary representations of semantic similarities" by Julien Tissier
January 10, 2019
at 1:30 PM
Room F021a
Building F
Laboratoire Hubert Curien
18 rue du Professeur Benoît Lauras
42000 Saint-Étienne
Seminar by Julien Tissier
Abstract
During this presentation, I will talk about how to compute language representations in order to solve natural language processing tasks. More specifically, I will introduce the concept of word embedding, the idea of mapping words to vectors of numerical values. I will then present two contributions I have made so far during my PhD to improve word embeddings. The first one is Dict2vec, a model that uses word definitions from lexical dictionaries in a semi-supervised way to encode additional semantic knowledge into word representations. The word embeddings learned with Dict2vec perform around 15%-30% better on semantic similarity tasks than other common embeddings from Facebook or Stanford. The second one is a novel architecture based on an autoencoder that transforms real-valued embeddings into binary embeddings. The produced binary vectors are much smaller (37.5 times less memory) and speed up computations by a factor of 30, while losing only ~2% accuracy on downstream NLP tasks. Finally, I will present my current and future work on word sense disambiguation and multilingual Wikipedia entity hierarchization.
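To illustrate the space and speed argument, here is a minimal sketch in Python. It is not the autoencoder architecture presented in the talk: it uses naive random-hyperplane binarization as a stand-in, and the sizes (300-dimensional float32 vectors, 256-bit codes) are assumptions chosen because they reproduce the 37.5x memory figure (300 × 32 = 9600 bits vs 256 bits).

```python
import numpy as np

# Illustrative sizes (assumptions, not from the talk): a 300-dim
# float32 embedding takes 300 * 32 = 9600 bits; a 256-bit binary
# code takes 256 bits, i.e. 9600 / 256 = 37.5x less memory.
REAL_DIM, BINARY_BITS = 300, 256

rng = np.random.default_rng(0)
# Stand-in for pretrained real-valued word embeddings.
real_vectors = rng.standard_normal((1000, REAL_DIM)).astype(np.float32)

# Naive binarization via the sign of random projections -- only a
# placeholder for the autoencoder-based method from the talk.
projection = rng.standard_normal((REAL_DIM, BINARY_BITS)).astype(np.float32)
binary_codes = np.packbits(real_vectors @ projection > 0, axis=1)  # 32 bytes/word

def hamming_similarity(a: np.ndarray, b: np.ndarray) -> int:
    """Number of matching bits between two packed binary codes."""
    return BINARY_BITS - int(np.unpackbits(a ^ b).sum())

# Nearest neighbour of word 0 under Hamming similarity: XOR plus a
# bit count is why binary codes allow much faster comparisons than
# float dot products.
sims = [hamming_similarity(binary_codes[0], c) for c in binary_codes[1:]]
print("best match for word 0:", 1 + int(np.argmax(sims)))
```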
Bio: I recently started my 3rd year as a PhD student under the supervision of Amaury Habrard and Christophe Gravier.
This seminar will be given in English.