WORKSHOP TALN 2006
Workshop: Resources for the processing of written language,
following the contributions of Technolangue
Thursday, April 13, 2006 Louvain, Belgium
This workshop is devoted to written-language resources in Natural Language Processing.
First, we take stock of the benefits of Technolangue, the French National Initiative for
Human Language Technologies, in the part devoted to written resources.
We also discuss the sharing of resources and intellectual property rights,
with information on the forthcoming creation by the CNRS of a network of linguistic resource centers.
This workshop has a three-point plan:
1. A presentation of the results of Technolangue (written-language resources), project by project.
2. Information on the network of linguistic resource centers.
3. A round table on the production of resources in an industrial context.
The Technolangue program
Technolangue is the French National Program on Human Language Technologies (HLT) supported by the French ministries
in charge of Research, Industry and Culture. Following a report to the Prime Minister in 2000 concerning the
major role of HLT in the Information Society, the Technolangue initiative was launched in April 2002 as a
large national programme on Language Technologies (LT) with a view to setting up a permanent infrastructure
for Language Resources, Evaluation, Standards and Survey.
Technolangue is coordinated with related existing R&D programmes in the field of Information and
Communication Technologies, the so-called Research & Innovation Technological Networks (RRIT),
mainly in three domains: Telecommunications, Software Engineering, and Audiovisual & Multimedia, and,
to a lesser extent, the Nanotechnologies domain.
Technolangue is a three-year programme, expected to be completed in late 2005 or early 2006,
but one main goal is to ensure the economic viability of a permanent infrastructure for evaluation.
A steering committee of 15 members, equally representing the NLP and speech communities,
companies and academic labs, is in charge of supervising the review of the proposals and steering the programme.
The overall funding budget dedicated to the programme is over 7.5 million euros, coming from the three ministries
in charge of Research, Industry and Culture. The global effort, made both by the national bodies and the
industrial sector, reaches about 11.5 million euros, as many projects are funded on a shared-cost basis.
In late 2002, 27 projects were shortlisted from 52 proposals after the evaluation process, and 21 projects
were finally funded. More than 90 different participants are involved in the projects: 33 from industry,
39 public research centers, 11 others and 11 from outside France, which take part in the programme on a self-funding basis.
The Technolangue programme comprises four action lines:
- The first one aims at stimulating the production and distribution of Linguistic Resources and
basic language processing tools. It aims at the emergence of a "toolbox" containing the minimal linguistic
and software resources necessary for the automatic processing of the French language. One major aspect is the wide
distribution of these resources: low cost, shareware or freeware, open source conditions.
- Secondly, Technolangue addresses the Evaluation topic by funding the organization of evaluation campaigns.
More generally, the programme aims at creating a permanent infrastructure for evaluating HLT technologies for
spoken and written language.
- The third goal is to support French participation, which is traditionally weak, in standardization
committees at the international level. This goal also includes making sure that the results
of the negotiations carried out are passed on directly to the stakeholders.
- The last action line concerns the creation of a Web Portal on HLT in order to ensure a permanent technological,
scientific and industrial watch by making available the results of the projects and news concerning the HLT field,
from both the academic and industrial sectors.
The 21 funded projects aim at the creation of linguistic resources and basic tools, the standardization process,
the evaluation topic and the creation of the website for disseminating information.
Presentation of the results of Technolangue
The workshop will begin with a presentation of the results of "Technolangue-Resources".
Stéphane Chaudiron, coordinator of the Technolangue program, will speak first, followed by a presentation of each sub-project.
This cluster concerns three projects:
Digital resource centers
As a transition between the results already obtained from Technolangue and the round-table discussion,
Laurent Romary or Gérard Sabah will give information on the forthcoming creation by the CNRS of a network of linguistic resource centers.
Industrial round table
This round table will be led by Claude de Loupy, Malek Boualem and Christian Fluhr.
We will debate several questions, for example:
- Why do industrial companies so often redevelop their own resources and tools when such resources already exist, either for sale or free of charge?
- What must be done to avoid this?
- What is the place of industrial companies in calls such as Technolangue, under which what is produced must be
made available to users free of charge or at low cost?
- What business plan must be set up behind this?
- What is the return on investment?
- What intellectual property difficulties arise?
- Where do we stand on standardization in the field of resources?
- What solutions would standardization bring?
- What should we think of the Web as a source of linguistic resources?
- Using the Web as a basis for various NLP-related tasks is an idea that has recently been exploited.
- This question will undoubtedly refer to the talks given at the
Atala conference on this subject.
Claude de Loupy
Malek Boualem, France Telecom
Sylvie Brunessaux, EADS Defence and Security Systems SA
Stéphane Chaudiron, Ministère chargé de la Recherche
Khalid Choukri, ELRA/ELDA
Christian Fluhr, CEA
Claude de Loupy, Université Paris 10
Jacques Mathieu, Ministère chargé de l'Industrie
Denis Maurel, Université François-Rabelais de Tours, Laboratoire d'informatique
Laurent Romary, Directeur de l'information scientifique du CNRS
Gérard Sabah, Directeur de recherche, LIMSI-CNRS
For the presentation of Technolangue:
and Denis Maurel
For the round table:
Claude de Loupy