Room 445 - Exemplar queries on documents: Finding related information across and within documents by topological information exploitation.
24 April 2017, 11h00
Research activities:
Web data management
Finding information related to some data at hand has traditionally been based on content similarity. We claim that terms used in documents should be treated differently depending on their location within the document, i.e., they should have different weights and/or different semantics.
More information:
In the first part of the talk, we deal with short documents, and in particular with forum posts. We describe an approach that finds forum posts related to a post at hand. It does so by treating differently the sections of a post that have been written with different intentions in mind. We explain how variations in text features can be exploited to recognise intentions, even though they are not explicitly stated in the text. We then segment the posts according to the identified intentions and compute similarities across segments only if they have the same intention. The similarity between two posts is then computed as a combination of the similarities of their respective segments with matching intentions. We provide evidence that this approach significantly improves on the performance of existing methods for finding related posts. The evidence is based on an extensive set of experiments with real datasets and users.
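The segment-wise comparison described above can be sketched in Python as follows. This is only an illustrative sketch, not the speaker's implementation. It assumes that each post has already been segmented and labelled with intentions (the labels "problem", "request" and "context" are made up for the example), it uses bag-of-words cosine similarity as a stand-in for whatever segment similarity the approach actually uses, and it takes an unweighted mean over shared intentions as one possible "combination". All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_intention(segments):
    """Merge the token counts of all segments sharing an intention label.

    `segments` is a list of (intention_label, text) pairs; the labels are
    assumed to come from an earlier intention-recognition step (not shown).
    """
    grouped = defaultdict(Counter)
    for intention, text in segments:
        grouped[intention].update(bag_of_words(text))
    return grouped


def post_similarity(post_a, post_b):
    """Compare segments only when their intentions match, then combine.

    Assumption: the combination is an unweighted mean over the intentions
    that occur in both posts; a weighted scheme would fit equally well.
    """
    a = group_by_intention(post_a)
    b = group_by_intention(post_b)
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(cosine(a[i], b[i]) for i in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical, hand-labelled posts.
    post_1 = [("problem", "laptop screen flickers after update"),
              ("request", "how can I roll back the driver")]
    post_2 = [("problem", "screen flickers on my laptop"),
              ("context", "I bought the driver CD last year")]
    print(post_similarity(post_1, post_2))
```

In this sketch the word "driver" appears in both posts but contributes nothing to the score, because it occurs in segments with different intentions; only the two "problem" segments are compared.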
In the second part of the talk, we deal with full documents, and in particular with Wikipedia documents. We describe how topological and temporal information can be used to better identify, within a document, the information related to a fact at hand. We describe a new structure, called a ladder, that is used to link together different parts of the document that are contextually related even if they do not match syntactically.
About the speaker:
Yannis Velegrakis is a faculty member at the University of Trento, head of the Data Management Group, and coordinator of the EIT Digital MSc program in Trento. He holds a PhD degree in Computer Science from the University of Toronto. His research areas include large-scale data management (Big Data), social data analytics, integration of heterogeneous data, query answering, data quality, and graph data. Prior to joining the University of Trento, he held a researcher position at AT&T Research Labs in the United States. He has also spent time as a visitor at the University of California, Santa Cruz, the IBM Almaden Research Center, and the Center of Advanced Studies of the IBM Toronto Lab. He has been a general chair for VLDB13, and PC chair for WebDB12, DESWEB10/11, SWAE07, SDSW14, and ExploreDB17. He holds 2 US patents and has been a fellow of Marie Curie and of Paris Saclay "Jean d'Alembert".