Automatic creation of a semantic context in Stylo articles

This project aims to help authors and editors produce and manage a semantic context that defines an article. This context can then be used to enrich the article through queries on the Isidore search engine or other platforms.

Issues

Describing an article semantically is often difficult. What are its main themes? How does the text relate to its discipline or field of research? Identifying the semantic field determines the relationships the article can have with other documents, as well as how it circulates. In practice, reflection on this context is often limited to adding a few keywords. Because this information is so scarce, search engines systematically favour inductive algorithmic approaches that extract keywords from the raw text, which reduces, if not eliminates, the control authors and editors have over the meaning of the text. This project aims to develop semi-automatic semantic enrichment tools to address this issue.

Technical challenges

Analysis of existing algorithms and their theoretical and epistemological implications
Modelling of protocols for semantic enrichment of articles
Analysis of the needs of partner journals
Design of a prototype to integrate into Stylo's writing module

Research activities

First, we will select algorithms for automatic knowledge extraction and topic modelling to analyze the text of the articles. These algorithms will present users with word clouds whose terms can be selected and editorialized. The idea is to place the choices of authors and editors at the centre of the process by adapting the algorithms' output to the users' contextual choices.
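
As an illustration only, the loop described above (an algorithm proposes terms, then authors and editors decide) could be sketched as follows. The project has not yet selected its algorithms, so this sketch assumes a plain TF-IDF ranking as a stand-in for knowledge extraction or topic modelling. The function names `candidate_terms` and `curate`, the stopword list, and the sample texts are all hypothetical and are not part of Stylo.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a real linguistic resource.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "that", "for", "on", "with", "as", "by", "this"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def candidate_terms(article, corpus, top_n=5):
    """Rank the article's terms by TF-IDF against a reference corpus.

    Stand-in for the extraction step: the result plays the role of the
    word cloud shown to the user.
    """
    docs = [set(tokenize(d)) for d in corpus]
    counts = Counter(tokenize(article))
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        doc_freq = sum(1 for d in docs if term in d)
        # Smoothed inverse document frequency, so unseen terms stay finite.
        idf = math.log((1 + len(docs)) / (1 + doc_freq)) + 1.0
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def curate(candidates, accepted, added=()):
    """Apply the editorial choice: keep accepted candidates in ranked
    order, then append any terms the author or editor added by hand."""
    kept = [t for t in candidates if t in accepted]
    return kept + [t for t in added if t not in kept]


article = ("Digital editing changes scholarly editing. "
           "Editing tools shape meaning.")
corpus = ["Scholarly journals publish articles.",
          "Digital tools are everywhere."]

proposed = candidate_terms(article, corpus, top_n=4)
context = curate(proposed, accepted={"editing", "meaning"},
                 added=["digital humanities"])
print(proposed)
print(context)
```

The point of the sketch is the split between the two functions: the ranking only proposes, and the final context contains nothing the author or editor did not accept or add.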

Deliverables

A prototype for semantic annotation of articles integrated into Stylo.

People

Marcello Vitali-Rosati
Director
Pierre Lévy
Axis director