LaHDAK team seminar
Speeding up information extraction programs: a holistic optimizer and a learning-based approach to rank documents
Helena Galhardas

27 March 2015, 14:00 - 15:30
Room/Building: 445/PCRI-N

Research activities: Web data management

Abstract:
A wealth of information produced by individuals and organizations is expressed in natural language text. Text lacks the explicit structure needed to support rich querying and analysis. Information extraction systems are sophisticated software tools that discover structured information in natural language text. Unfortunately, information extraction is a challenging and time-consuming task.

In this talk, I will first present our proposal to optimize information extraction programs. It consists of a holistic approach that focuses on: (i) optimizing all key aspects of the information extraction process collectively and in a coordinated manner, rather than focusing on individual subtasks in isolation; (ii) accurately predicting the execution time, recall, and precision for each information extraction execution plan; and (iii) using these predictions to choose the best execution plan to execute a given information extraction program.
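Points (ii) and (iii) can be illustrated with a minimal Python sketch. It is not the optimizer presented in the talk: the `PlanEstimate` record, the `choose_plan` function, the time budget, and the use of F1 as the selection criterion are all assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PlanEstimate:
    """Predicted behaviour of one candidate execution plan (hypothetical record)."""
    name: str
    time_s: float      # predicted execution time, in seconds
    recall: float      # predicted recall, in [0, 1]
    precision: float   # predicted precision, in [0, 1]


def f1(plan: PlanEstimate) -> float:
    """Harmonic mean of predicted precision and recall (0 if both are 0)."""
    total = plan.precision + plan.recall
    return 0.0 if total == 0 else 2 * plan.precision * plan.recall / total


def choose_plan(estimates: Iterable[PlanEstimate],
                time_budget_s: float) -> Optional[PlanEstimate]:
    """Pick the plan with the best predicted F1 among those within the budget.

    Assumption: the user states a time budget; a real optimizer could use
    other constraints (for example a minimum recall) instead.
    """
    feasible = [p for p in estimates if p.time_s <= time_budget_s]
    if not feasible:
        return None
    return max(feasible, key=f1)


# Made-up predictions for three hypothetical plans.
plans = [
    PlanEstimate("A", time_s=10, recall=0.50, precision=0.90),
    PlanEstimate("B", time_s=30, recall=0.80, precision=0.80),
    PlanEstimate("C", time_s=120, recall=0.95, precision=0.90),
]
best = choose_plan(plans, time_budget_s=60)
print(best.name if best else "no feasible plan")
```

The sketch only covers the final selection step; the quality of the choice depends entirely on how accurate the per-plan predictions are, which is the part the abstract identifies as the core of the approach.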

I will then briefly present a principled, learning-based approach for ranking documents according to their potential usefulness for an extraction task. Our online learning-to-rank methods exploit the information collected during extraction: as new documents are processed, the fine-grained characteristics of the useful documents are revealed. These methods also decide when the ranking model should be updated, significantly improving document ranking quality over time.
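The general idea of ranking documents online can be sketched as follows. This is a toy illustration, not the method from the talk: the `OnlineRanker` class, its perceptron-style update, and the update trigger (enough newly processed documents, at least one of them mispredicted) are assumptions chosen for brevity.

```python
from typing import Dict, List, Tuple


class OnlineRanker:
    """Toy linear ranker over document terms (hypothetical, for illustration)."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.weights: Dict[str, float] = {}
        self.learning_rate = learning_rate
        # Documents processed since the last model update, with their outcome.
        self.buffer: List[Tuple[List[str], bool]] = []

    def score(self, terms: List[str]) -> float:
        """Sum of the weights of the distinct terms in the document."""
        return sum(self.weights.get(t, 0.0) for t in set(terms))

    def rank(self, docs: Dict[str, List[str]]) -> List[str]:
        """Return document ids ordered from most to least promising."""
        return sorted(docs, key=lambda d: self.score(docs[d]), reverse=True)

    def observe(self, terms: List[str], useful: bool) -> None:
        """Record whether extraction on a processed document produced output."""
        self.buffer.append((terms, useful))

    def should_update(self, min_examples: int = 2) -> bool:
        """Assumed trigger: enough new documents, at least one mispredicted."""
        if len(self.buffer) < min_examples:
            return False
        return any((self.score(t) > 0) != useful for t, useful in self.buffer)

    def update(self) -> None:
        """Perceptron-style update on mispredicted documents, then clear the buffer."""
        for terms, useful in self.buffer:
            if (self.score(terms) > 0) != useful:
                delta = self.learning_rate if useful else -self.learning_rate
                for t in set(terms):
                    self.weights[t] = self.weights.get(t, 0.0) + delta
        self.buffer.clear()


ranker = OnlineRanker()
ranker.observe(["earthquake", "magnitude"], useful=True)
ranker.observe(["recipe", "cake"], useful=False)
if ranker.should_update():
    ranker.update()

# Unprocessed documents are re-ranked with the updated model.
pending = {"d1": ["cake", "recipe"], "d2": ["earthquake", "report"]}
print(ranker.rank(pending))
```

Separating `should_update` from `update` mirrors the point made above that deciding when to refresh the model is a choice in its own right, since retraining after every document may cost more than it gains.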

This is joint work with Gonçalo Simões, INESC-ID and IST/University of Lisbon, and Pablo Barrio and Luis Gravano from Columbia University, NY.

More information: http://web.ist.utl.pt/helena.galhardas/