Automatic creation of semantic metadata in Stylo articles

This project aims to help authors and editors produce and manage the semantic context that defines an article, by means of automatic semantic enrichment. The enriched data relies on controlled languages and semantic web languages, so that discoverability is defined upstream by the creators of the referenced content rather than downstream by search engine harvesting strategies.

Issues

In scholarly information retrieval, search engines are increasingly shaped by inductive approaches. These approaches extract semantic information that is not determined by the creators of scholarly publications but induced through synthesis, or even simplification, of the content, without checking its consistency with the specialized vocabulary of the field. As a result, we increasingly risk losing control over the information we produce and over its meaning. At a time when generative LLMs are widely used for tasks whose implications have not yet been fully assessed, it becomes all the more urgent to reintroduce into our texts semantic layers that we fully understand and shape. This project therefore experiments with different techniques for the semantic enrichment of articles in Stylo, using extraction methods based not only on inductive models but also on deductive strategies, with the aim of producing semantically rich metadata transparently.

Technical challenges

Research activities

Deliverables

A prototype for the semantic annotation of articles, integrated into Stylo.

People

Partners