Pascal Workshop and Pascal Challenge.
Type I and type II errors for Multiple
Simultaneous Hypothesis Testing
Workshop with associated Special Issue
Paris, France, May 15-16, 2007
G. Gavin, S. Gelly, Y. Guermeur, S. Lallich, J. Mary, M. Sebag, O. Teytaud
PDF version http://www.lri.fr/~teytaud/risq.pdf
HTML version http://www.lri.fr/~teytaud/risq
Videos are here: http://videolectures.net/msht07_paris/
Registration is free, but please email olivier.teytaud@inria.fr
so that we can organize the workshop more easily.
The preliminary schedule is available at http://www.lri.fr/~teytaud/sched.html.
(more information coming soon)
(If booking is difficult for language reasons, please feel free to email "teytaud@lri.fr" and we will take care of the booking.)
All hotels below are in the heart of Paris.
- Hotel du Marais, tel. (33)1 42 72 30 26: not very close to the workshop, but not far either, and much less expensive (from 25 to 35 euros/night)
- Hotel de Lille, 40 rue de Lille, 75007 Paris, tel. (33)1 42 61 29 09: very close to the workshop (minimum 100 euros/night)
See also this list of hotels (from 41 euros/night), which are not very far from the workshop: http://www.dma.ens.fr/~stoltz/MFLT/Accomodation.html
Please feel free to ask for help (email olivier.teytaud@inria.fr).
Multiple Simultaneous Hypothesis Testing is a central issue in many areas of information extraction:
- rule extraction [6];
- validation of gene influence [3];
- validation of spatio-temporal pattern extraction (e.g. in brain imaging [7]);
- other forms of spatial or temporal data (e.g. spatial collocation rules [8]);
- other forms of multiple hypothesis testing [4].
In all of the above frameworks, the goal is to extract patterns such that some quantity of interest is significantly greater than some given threshold:
- in rule extraction, the goal is typically the extraction of rules with confidence, lift and support significantly higher than a given threshold;
- in multiple hypothesis testing, the goal is typically the extraction of significant comparisons among various averages simultaneously;
- in spatio-temporal pattern extraction, the goal is typically the extraction of smooth (spatio-temporal) subsets of
with correlation significantly higher than a given threshold.
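For the rule-extraction case, the three quantities named above can be made concrete with a short Python sketch. It uses the usual textbook definitions for a rule A => C: support = P(A and C), confidence = P(C | A), lift = confidence / P(C). The function name `rule_metrics` and the toy transactions are illustrative assumptions, not part of the challenge material.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent.

    `transactions` is a list of sets of items (hypothetical toy data below).
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)   # transactions containing A
    n_c = sum(consequent <= t for t in transactions)   # transactions containing C
    n_ac = sum((antecedent | consequent) <= t for t in transactions)  # both A and C
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy data, invented for illustration only.
transactions = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
    frozenset({"butter", "milk"}),
]
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"butter"})
print(support, confidence, lift)
```

These are point estimates computed on a finite sample; deciding whether they are significantly above a threshold, for many candidate rules at once, is exactly where type I and type II errors arise.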
Along these lines, a type I error is to extract an entity which does not
satisfy the considered constraint, while a type II error is to miss an
entity which does satisfy the constraint. Estimating, bounding, or
(even better!) reducing type I and type II errors is the
goal of the proposed challenge.
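As an illustration of these two error types, the following Python sketch simulates many simultaneous one-sided tests and counts both kinds of error, with and without a Bonferroni correction. Everything in it is an assumption made for the example: Gaussian data with known unit variance, the parameter values, the function names, and the choice of Bonferroni as the correction (a standard multiple-comparison procedure, not a method prescribed by the challenge).

```python
import random
from statistics import NormalDist, fmean


def simulate_p_values(m=200, m_true=20, n=25, effect=0.8, seed=0):
    """Return one-sided p-values and ground-truth flags for m hypotheses.

    The first m_true hypotheses have mean `effect` (the constraint
    "mean > 0" really holds); the remaining ones have mean 0.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values, is_true = [], []
    for i in range(m):
        mu = effect if i < m_true else 0.0
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z-statistic, variance assumed known (= 1)
        p_values.append(1.0 - std_normal.cdf(z))
        is_true.append(i < m_true)
    return p_values, is_true


def count_errors(p_values, is_true, level):
    """Count type I errors (false extractions) and type II errors (misses)."""
    type_1 = sum(1 for p, t in zip(p_values, is_true) if p <= level and not t)
    type_2 = sum(1 for p, t in zip(p_values, is_true) if p > level and t)
    return type_1, type_2


alpha = 0.05
p_values, is_true = simulate_p_values()
m = len(p_values)
raw_errors = count_errors(p_values, is_true, alpha)
bonferroni_errors = count_errors(p_values, is_true, alpha / m)
print("uncorrected (type I, type II):", raw_errors)
print("Bonferroni  (type I, type II):", bonferroni_errors)
```

Because the Bonferroni threshold alpha / m is never larger than alpha, the corrected procedure can only extract fewer entities: its type I count cannot exceed the uncorrected one, and its type II count cannot be smaller. This is the trade-off that results combining both risks have to address.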
VC theory [2], empirical process theory [5] and various approaches related to simultaneous hypothesis testing [4] are fully relevant, as are more specific approaches, e.g. those based on simulation, resampling or probes [9]. The challenge consists in extending previous results to the field of simultaneous hypothesis testing, or in proposing new results specifically related to this topic.
We welcome survey papers related to type I and type II errors, and papers presenting new results, proposing theoretical bounds or smart empirical experiments. In the latter case, the experimental setting as well as the algorithmic principles and explicit criteria must be carefully described and discussed; the use of publicly available software will be much appreciated.
Results combining type I and type II risk are particularly welcome. Asymptotic and non-asymptotic results are equally welcome.
Keywords: Empirical process, Learning theory, Multiple hypothesis testing, Rule extraction, Bioinformatics, Statistical Validation of Information Extraction.
- Announcement of the challenge: January 11, 2006.
- Deadline for submissions: February 10, 2007.
- Notification of acceptance of submitted results: March 2007.
- Challenge Workshop: Paris, France, May 15-16, 2007.
Submissions (in PS or PDF) should be sent by email to "olivier.teytaud@inria.fr".
- no fee.
- venue: Université Paris-5, 45 rue des Saints-Pères, Paris (downtown). Close to metro "Saint-Germain-des-Prés".
- large map: http://www.justfranceinparis.com/paris677map.html
- local map: http://kenobi.univ-paris5.fr/cgi-bin/WebObjects/WebSiteAdmin.woa/wa/viewArticle?type=99&typeRoot=99&couNumero=794# (click on "Centre des Saints-Pères")
- another local map: http://www.biomedicale.univ-paris5.fr/Plan-d-acces.html
- how to get there:
- Bus: see map above
- Metro line 4, station "Saint-Germain-des-Prés"
- Metro line 10, station "Mabillon"
- Metro line 12, station "Rue du Bac"
- RER B, to station "Denfert Rochereau" or "Chatelet Les Halles", and then Metro line 4, to station "Saint-Germain-des-Prés"
- From Orly airport: metro "Orlyval" to Antony, and then RER B to station "Denfert Rochereau" or "Chatelet Les Halles", and then Metro line 4, to station "Saint-Germain-des-Prés".
- From Roissy-Charles-De-Gaulle, RER B to station "Denfert Rochereau" or "Chatelet Les Halles", and then Metro line 4, to station "Saint-Germain-des-Prés".
- RER A, to station "Chatelet Les Halles", and then metro line 4 to station "Saint-Germain-des-Prés"
- from other places: fill in the forms at http://www.ratp.fr.
Email for any information: olivier.teytaud@inria.fr.
- Gérald Gavin (univ. Lyon 1);
- Sylvain Gelly (univ. Paris-Sud);
- Yann Guermeur (Cnrs, Loria);
- Stéphane Lallich (univ. Lyon 2);
- Jérémie Mary (univ. Lille);
- Michèle Sebag (Cnrs);
- Olivier Teytaud (Inria).
[1] M. Anthony and P.L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
[2] V.N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[3] M.D. Birkner, K.S. Pollard, M.J. van der Laan, and S. Dudoit. Multiple Testing Procedures and Applications to Genomics. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 168, January 2005. http://www.bepress.com/ucbbiostat/paper168
[4] J.C. Hsu. Multiple Comparisons: Theory and Methods. Chapman & Hall, 1996.
[5] A. van der Vaart and J.A. Wellner. Weak Convergence and Empirical Processes. Springer Series in Statistics, 1996.
[6] R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of SIGMOD-93, pages 207-216, 1993.
[7] D. Pantazis, T.E. Nichols, S. Baillet, and R.M. Leahy. A Comparison of Random Field Theory and Permutation Methods for the Statistical Analysis of MEG Data. NeuroImage, 25, 355-368, April 2005.
[8] M. Salmenkivi. Efficient Mining of Correlation Patterns in Spatial Point Data. In Proceedings of PKDD 2006, pages 359-370.
[9] H. Stoppiglia, G. Dreyfus, R. Dubois, and Y. Oussar. Ranking a Random Feature for Variable and Feature Selection. JMLR, 2003.
Olivier Teytaud
2007-09-20