CENTRE FOR NATURAL LANGUAGE PROCESSING



Title

Automatic structuring of raw texts for clustering and categorization

Abstract

The main goal of the Stratego project is to develop tools that improve information retrieval in large-scale textual databases.


More precisely, the project focuses on:

  • automatic structuring of raw documents (e.g. digitized documents) into XML documents that conform to a DTD or an XML schema,
  • automatic classification of documents in predefined categories, and
  • semi-automatic generation of thesauri from specialized corpora (e.g. legal texts).


Several research teams are involved in this project:

  • CENTAL (UCL) (Centre for Natural Language Processing),
  • IRIDIA (ULB) (Artificial Intelligence research laboratory of the Université Libre de Bruxelles),
  • ISYS (UCL) (Information Systems Unit), and
  • SIC (ULB) (Information and Communication Sciences Department).

Publications

Walloon Region (Région wallonne)

Wist 2

Duration

  • 36 months.
  • Start: October 2007.

Researchers

Advisor

Industrial partner