Automatic creation of semantic metadata in Stylo articles

This project aims to help authors and editors produce and manage the semantic context that defines an article, through automatic semantic enrichment. The enriched data relies on controlled languages and semantic web languages, so that discoverability is defined upstream by the creators of the referenced content rather than downstream by search engine harvesting strategies.

Issues

In scholarly information retrieval, search engines are increasingly shaped by inductive approaches. These approaches extract semantic information that is not determined by the creators of scholarly publications but induced through synthesis, or even simplification, of the content, without verifying its consistency with the specialized vocabulary of the field. As a result, we increasingly risk losing control over the information we produce and over its meaning. At a time when generative LLMs are widely used for a variety of tasks whose implications have not yet been fully assessed, it becomes all the more urgent to reintroduce into the texts we produce semantic layers that we fully understand and have the power to shape. This project therefore experiments with different techniques for the semantic enrichment of articles in Stylo, using extraction methods based not only on inductive models but also on deductive strategies, with the aim of producing semantically rich metadata transparently.

Technical challenges

Analysis of existing algorithms and their theoretical and epistemological implications
Modelling of protocols for semantic enrichment of articles
Analysis of the needs of partner journals
Design of a prototype to be integrated into Stylo's writing module

Research activities

Choosing the relevant metadata for semantic enrichment
Benchmarking of state-of-the-art systems for identifying this data (e.g. NER, keyword extraction)
Prototyping the integration of the module at different stages (during writing, before export)
Evaluation on a development instance
Deployment of the module on Stylo
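
To make the idea of a deductive strategy more concrete, here is a minimal Python sketch of keyword identification by lookup in a controlled vocabulary. It is an illustration under stated assumptions, not the project's implementation: the vocabulary entries, the example.org URIs, and the names Concept, VOCABULARY and match_concepts are all hypothetical, and nothing here reflects Stylo's actual data model or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    label: str  # preferred label in the controlled vocabulary
    uri: str    # identifier of the concept (hypothetical example URIs below)


# Hypothetical controlled vocabulary: surface form -> concept.
# A real vocabulary would be supplied by a journal or a thesaurus.
VOCABULARY = {
    "semantic web": Concept("Semantic Web", "https://example.org/vocab/semantic-web"),
    "metadata": Concept("Metadata", "https://example.org/vocab/metadata"),
    "digital publishing": Concept(
        "Digital publishing", "https://example.org/vocab/digital-publishing"
    ),
}


def match_concepts(text, vocabulary=VOCABULARY):
    """Return the concepts whose surface form occurs in the text.

    Matching is case-insensitive and limited to whole words. Concepts are
    returned in order of first occurrence, each at most once.
    """
    found = []
    for surface, concept in vocabulary.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), concept))
    return [concept for _, concept in sorted(found, key=lambda pair: pair[0])]


if __name__ == "__main__":
    sample = "Metadata matters for discoverability on the Semantic Web."
    for concept in match_concepts(sample):
        print(concept.label, concept.uri)
```

Because every suggested keyword traces back to an explicit vocabulary entry, an author or editor can see why it was proposed and accept or reject it. That traceability is the kind of transparency the project contrasts with purely inductive extraction.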

Deliverables

A prototype for semantic annotation of articles integrated into Stylo.

People

Marcello Vitali-Rosati
Director
Louis-Olivier Brassard
Doctoral student
Alexia Schneider
Doctoral student