Data Mining for Complex Data: Documents, Graphs, Social Networks

Information collected in digital form is growing not only in quantity but also in complexity. In many domains, we try to capture several aspects of real-world objects at once. Some aspects are represented in tabular form, others as unstructured text, as graphs modeling relationships between objects, or as images of varying complexity (schemas or photographs, for instance), and the information may be spread across different media. This requires revisiting the usual mining methods so that they can analyze large collections of complex data. We address this issue by finding relevant representations and similarity measures, and by designing methods that efficiently solve mining tasks such as prediction, clustering, classification, or pattern extraction.

In data mining for complex data such as XML documents, graphs, or social networks, we are currently interested in the following topics:

- Community detection in attributed graphs
Diverse systems such as social networks can be modeled as an attributed graph in which both nodes and links are described by a set of attributes. These attributes characterize the entities of the network and their relationships. We propose to exploit both kinds of information (links and attributes) to identify the important nodes and detect the community structure of the network.
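
To make the idea of combining links and attributes concrete, here is a minimal sketch. It is not the method developed by the group: it assumes that node attributes are plain sets of keywords, keeps a link only when its two endpoints are similar enough (Jaccard similarity), and reports the connected components of the filtered graph as communities. The function names, the toy data, and the `threshold` parameter are all hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two attribute sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attributed_communities(edges, attrs, threshold=0.3):
    """Toy community detection on an attributed graph (illustrative only).

    A link is kept only if the attribute sets of its endpoints have a
    Jaccard similarity of at least `threshold`; communities are the
    connected components of the filtered graph.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if jaccard(attrs.get(u, ()), attrs.get(v, ())) >= threshold:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    communities = []
    for start in sorted(attrs):
        if start in seen:
            continue
        # Depth-first traversal of one connected component.
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(sorted(component))
    return communities


# Hypothetical toy network: links plus keyword attributes per node.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
attrs = {
    "a": {"ml", "graphs"}, "b": {"ml", "graphs"}, "c": {"ml", "nlp"},
    "d": {"vision", "video"}, "e": {"vision", "video"},
    "f": {"vision", "images"},
}
print(attributed_communities(edges, attrs))
```

In this toy network the link between `c` and `d` joins nodes with no attribute in common, so it is dropped and the graph splits into two groups. A purely structural method would see a single connected graph here, which is the motivation for using both sources of information.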

- Learning dynamic networks and link prediction
Usually, a social network is represented by a single graph. In practice, however, networks are often dynamic and must be represented by a sequence of graphs. We study the ability of probabilistic models such as Bayesian graphical models to learn such networks and capture their properties. We are also designing new algorithms to find frequent substructures (mostly subgraphs) in dynamic networks.
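
As an illustration of the "sequence of graphs" representation, the sketch below stores a dynamic network as a list of snapshots (each a list of undirected links) and ranks candidate links with a common-neighbours count. This is a simple, well-known baseline heuristic, not the Bayesian graphical models mentioned above; the function name `predict_links`, the `top_k` parameter, and the toy snapshots are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def predict_links(snapshots, top_k=3):
    """Rank node pairs that are not linked in the latest snapshot.

    Baseline heuristic: the score of a pair is the number of neighbours
    the two nodes share, where neighbourhoods are accumulated over all
    snapshots of the dynamic network.
    """
    neighbours = defaultdict(set)
    for links in snapshots:
        for u, v in links:
            neighbours[u].add(v)
            neighbours[v].add(u)

    latest = {frozenset(link) for link in snapshots[-1]}
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if frozenset((u, v)) in latest:
            continue  # already linked in the most recent snapshot
        common = len(neighbours[u] & neighbours[v])
        if common:
            scores[(u, v)] = common
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical dynamic network observed at three successive time steps.
snapshots = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("b", "c"), ("c", "d")],
    [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d"), ("d", "e"), ("c", "e")],
]
print(predict_links(snapshots))
```

Here the pair (`b`, `e`) comes first because the two nodes share two neighbours (`c` and `d`). The heuristic ignores the order in which links appeared; a model that actually learns the dynamics would exploit that ordering.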

- Social recommendation in digital libraries
In very large digital libraries of scientific publications, it is often difficult for a neophyte to find the key papers to read in order to get the best "first glance" at a subject. Using information retrieval methods together with text and social mining approaches, we aim to design tools that help neophyte users in their search. See http://www.istex.fr/neotex-exploration-de-documents-textuels-dun-domaine-par-un-neophyte/
This topic is also addressed in the context of the Information retrieval theme.
 

 
