LRI Seminar
Room 445 - Exemplar queries on documents: Finding related information across and within documents by topological information exploitation.
Yannis Velegrakis

24 April 2017, 11h00
Room/Building: 445/PCRI-N

Research activities: Web Data Management

Abstract:
Finding information related to some data at hand has traditionally relied on content similarity. We claim that terms used in a document should be treated differently depending on their location within it, i.e., they should have different weights and/or different semantics.
In the first part of the talk, we deal with short documents, in particular forum posts. We describe an approach that finds forum posts related to a given post. It does so by treating differently the sections of a post that were written with different intentions in mind. We explain how variations in text features can be exploited to recognise intentions, even though these intentions are not explicitly stated in the text. We then segment the posts according to the identified intentions and compute similarities between segments only if they share the same intention. The similarity between two posts is then computed by combining the similarities of their respective segments with matching intentions. We provide evidence that this approach significantly improves on the performance of existing methods for finding related posts. This evidence comes from an extensive set of experiments with real datasets and users.
In the second part of the talk, we deal with full documents, in particular Wikipedia articles. We describe how topological and temporal information can be used to better identify, within a document, the information related to a given fact. We describe a new structure, called a ladder, that links together different parts of a document that are contextually related even if they do not match syntactically.

About the speaker:
Yannis Velegrakis is a faculty member at the University of Trento, head of the Data Management Group, and coordinator of the EIT Digital MSc program in Trento. He holds a PhD in Computer Science from the University of Toronto. His research areas include large-scale data management (Big Data), social data analytics, integration of heterogeneous data, query answering, data quality and graph data. Prior to joining the University of Trento, he held a researcher position at AT&T Research Labs in the United States. He has also spent time as a visitor at the University of California, Santa Cruz, the IBM Almaden Research Center, and the Center for Advanced Studies of the IBM Toronto Lab. He has been general chair of VLDB13, and PC chair of WebDB12, DESWEB10/11, SWAE07, SDSW14, and ExploreDB17. He holds 2 US patents and has been a Marie Curie fellow and a Paris-Saclay “Jean d’Alembert” fellow.

Find out more:
Seminars
Building Distributed Computing Abstractions in the
Distributed Algorithms
Tuesday 11 July 2017 - 10h30
Room: 465 - PCRI-N
Antonella Del Pozzo

TBA
Distributed Algorithms
Wednesday 05 July 2017 - 10h30
Room: 465 - PCRI-N
David Doty

Room 465 - Direct-Coupling Analysis of nucleotide
Thursday 18 May 2017 - 16h00
Room: 465 - PCRI-N
Martin Weigt

Room 445 - Resolving Entities in the Web of Data
Data and Knowledge Integration
Friday 05 May 2017 - 16h30
Room: 445 - PCRI-N
Vassilis Christophides

2017-04-28
Graph Theory
Friday 28 April 2017 - 14h30
Room: 435 - PCRI-N
Evelyne Flandrin