WAC3 - 2007

Web as Corpus 2007, UCLouvain, Louvain-la-Neuve (Belgium), September 15-16, 2007

Call for papers

3rd Web as Corpus Workshop (WAC3)
incorporating Cleaneval

Sept. 15-16, 2007
University of Louvain, Louvain-la-Neuve, Belgium

More and more people are using Web data for linguistic and NLP research. The workshop provides a venue for exploring how we can use it effectively and what we will find if we do.

We invite submissions that:

  • describe Web corpus collection projects, or modules for one part of the process (crawling, filtering, language-id, tokenising, lemmatising, POS-tagging, indexing, ...)
  • explore characteristics of Web data from a linguistic/NLP perspective, including registers, domains and frequency distributions
  • use crawled Web data for NLP purposes (with emphasis on the data rather than the use)


Anyone using Web data needs to clean it, removing unwanted material such as HTML markup, navigation bars and advertisements. To date there has been no sharing of resources or expertise, and cleaning has often been done only minimally. Cleaneval is an exercise to promote sharing and to improve our understanding of the issues. It will take the now-familiar form of an open competition and shared task. More info on Cleaneval.

Previous WAC workshops

More info on WAC1, held at the Corpus Linguistics conference, Birmingham, UK, July 2005.
More info on WAC2, held at EACL, Trento, Italy, April 2006.

Invited speaker: Kevin Scannell

Kevin Scannell, of Saint Louis University, Missouri, USA, has been working with scholars of a range of smaller languages to develop Web corpora for those languages; his website currently lists 135 corpora/languages.


For regular papers:
Papers (6-10 pages), demos (max. 2 pages) and posters (max. 2 pages) must be written in English and follow the ACL formatting guidelines. Template files (.doc and LaTeX) are available on the website.


The workshop will be held at the Université catholique de Louvain, in the elegant new city of Louvain-la-Neuve (Belgium). Large computer rooms will be available for demo sessions.

Points of contact

Workshop Co-chairs

Cédrick Fairon, UCLouvain, CENTAL, fairon@tedm.ucl.ac.be
Gilles-Maurice de Schryver, Universiteit Gent, gillesmaurice.deschryver@ugent.be

WAC3 committee

Marco Baroni, U. of Trento, Italy
Massimiliano Ciaramita, Italian National Research Council, Laboratory for Applied Ontology, Italy
Guy Deville, FUNDP, Belgium
Thierry Dutoit, FPMs, TCTS Lab, Belgium
Stefan Evert, U. of Osnabrück, Institute of Cognitive Science, Germany
Cédrick Fairon, UCLouvain, CENTAL, Belgium
Nuria Gala, U. de Provence, DELIC, France
Sylviane Granger, UCLouvain, Center for English Corpus Linguistics, Belgium
Gregory Grefenstette, Commissariat à l'Énergie Atomique, France
Benoît Habert, LIMSI, France
Tony Hartley, U. of Leeds, United Kingdom
Adam Kilgarriff, Lexical Computing Ltd, United Kingdom
Christophe Lejeune, ULg, CEMAD, Belgium
Sébastien Paumier, Université de Marne-la-Vallée, France
Kevin Scannell, Saint Louis University, USA
Gilles-Maurice de Schryver, Universiteit Gent, Belgium
Klaus Schulz, Ludwig-Maximilians-Universität München, Germany
Jean Senellart, Systran, France
Serge Sharoff, U. of Leeds, United Kingdom

Cleaneval committee

Marco Baroni, U. of Trento; Secretary, SIGWAC
Tony Hartley, U. of Leeds
Adam Kilgarriff, Lexical Computing Ltd; Chair, SIGWAC
Serge Sharoff, U. of Leeds

Local organisation team

Bernadette Dehottay, UCLouvain, CENTAL, dehottay@tedm.ucl.ac.be
Julia Medori, CENTAL, UCLouvain
Laurent Kevers, CENTAL, UCLouvain
Hubert Naets, CENTAL, UCLouvain
Isabelle Lecroart, CENTAL, UCLouvain
Claude Devis, CENTAL, UCLouvain

Contact us:
Bernadette Dehottay
Université catholique de Louvain
Centre for Natural Language Processing (CENTAL)
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Tel. +32 10 47 37 88
Fax. +32 10 47 26 06

Last update: August 2007