WAC3 - 2007

Web as Corpus 2007, UCLouvain, Louvain-la-Neuve (Belgium), September 15-16, 2007

Outline Program

Fri 14 September
20:00 Welcome Cocktail

Sat 15 September
Auditorium Socrate, Place Cardinal Mercier, Louvain-la-Neuve
9:30 – 9:45 Opening Session
9:45 – 10:45 Invited Speaker:
The Crúbadán Project: Corpus building for under-resourced languages
Kevin Scannell,
Saint Louis University, USA
10:45 – 11:15 Classifying Web corpora into domain and genre using automatic feature identification
Serge Sharoff,
University of Leeds, UK
Coffee break
11:45 – 12:15 A Human Evaluation of Filtering Functions for Pattern-based Extraction of Arbitrary Relations from the Web
Sebastian Blohm & Philipp Cimiano,
University of Karlsruhe, Germany
12:15 – 12:45 Identification of Languages and Encodings in a Multilingual Document
Anil Kumar Singh & Jagadeesh Gorla,
International Institute of Information Technology, Hyderabad, India
Lunch time
14:30 – 16:00
Overview, data preparation, scoring, results of Cleaneval
Marco Baroni, Francis Chantree, Adam Kilgarriff & Serge Sharoff
Coffee break
16:30 – 18:00
System descriptions

Sun 16 September
Auditorium Socrate, Place Cardinal Mercier, Louvain-la-Neuve
9:00 – 10:00 Invited Speaker (to be confirmed) or Panel: "A WAC search engine"
10:00 – 10:30 CorpEus, a 'web as corpus' tool designed for the agglutinative nature of Basque
Igor Leturia & Antton Gurrutxaga,
Elhuyar Foundation
Iñaki Alegria & Aitzol Ezeiza,
University of the Basque Country, Spain
10:30 – 11:00 Implementing a BNC-Compare-able Web Corpus
William Fletcher,
United States Naval Academy, USA
Coffee break
11:30 – 12:00 Yet another web crawler
Fabrice Issac,
Université Paris 13, France
12:00 – 12:30 textBox: a Tool for Written Corpus Linguistic Investigation
Emmanuel Cartier,
Université Paris 13, France
Lunch time
14:00 – 15:30
Panel: Lessons Learned, future Cleanevals
15:30 – 16:00 Closing Session

Invited speaker: Kevin Scannell

Kevin Scannell, of Saint Louis University, Missouri, USA, has been working with scholars of a range of smaller languages to develop web corpora for those languages; the project website currently lists 135 corpora/languages.

Workshop Co-chairs

Prof. Cédrick Fairon, UCLouvain, Cental, fairon@tedm.ucl.ac.be
Prof. Gilles-Maurice de Schryver, Universiteit Gent, gillesmaurice.deschryver@ugent.be

Last update: August 2007