Minutes of the AI workshop of February 13th 2025

Introduction and Research Questions

Nicolas Sauret: Question: Is it not necessary to take a strong stance on the issue of AI usage, as Dominique Boullier did (see [AOC, February 9, 2025](https://aoc.media/analyse/2025/02/09/sommet-ia-la-necessaire-secession-semantique-europeenne/))? There is a technological and political acceleration around AI.

He cites the example of Stylo, which positioned itself against the then-hegemonic text-formatting tools.

Suggests drafting an ethical and political-ecology framework for the use of AI in Revue3.0 and more broadly in research conducted in our disciplines.

Experimentation and tinkering are at the heart of our practices, but there is a strong temptation to use LLMs without questioning them and without considering alternatives to them.

Samuel Szoniecky: It is about being able to measure the impact of AI on our work and on the environment. S. Szoniecky works on evaluating powers in human-machine-environment interactions: how to produce sustainable and mutually beneficial development? How to evaluate? Currently, his approach relies on tinkering and testing, which allow him to assess pros and cons. Ecosystemic vision: interrelation of each actor and modeling of coexistences. Current artificial instances, "agents," could be "conceptual characters," representing a viewpoint to be questioned. These agents need to be questioned, constructed, and evaluated.

Nicolas Sauret: What are the metrics, especially ecological, of the impact of these tools?

Samuel Szoniecky: Definition of powers:

Stéphane Pouyllau (in the chat): Event: "Scientific Publishing and Artificial Intelligence" Friday, March 28, 2025, from 09:15 to 12:30, in person. Event program.

Gérald Kembellec (in the chat): https://rsf.org/fr/projet-spinoza - This prototype allows journalists to quickly access precise information extracted from legal and scientific documents.

Discussion

Marcello Vitali-Rosati: There is a similarity between the current criticism of "AI" and the almost reactionary criticisms of the web in its early days: comments such as "we no longer read, we lose x, y and z." The interest lies in questioning systemic interactions. What are the power issues? The current configuration differs from that of the early web, especially economically: the costs and scale can no longer be compared.

See Florence Maraninchi, "Why I Don’t Use ChatGPT".

Second question: What are we talking about when we talk about AI? Conversational models? There are many models designed for specific functionalities (vector analysis, calculation, translation, etc.), but today, the understanding/development of AI is increasingly focused on prompts, even in the case of RAG.

Stéphane Pouyllau: Maraninchi’s article is missing the forest for the trees by focusing on criticizing certain uses. It would be a shame to deprive ourselves of models based on prompts. We can nuance by using, for example, small language models (SLMs). These alternatives allow us to question the position of editors and researchers regarding these technologies and address ethical/ecological questions.

What is the position of scientific publishers on the use of scientific corpora for training and RAG? The publication of the Common Corpus in March 2024 changed the game for training multilingual/French models, and many researchers have seized it.

Nicolas Sauret (in the chat): In line with Stéphane, the stance of independent press editors regarding the use of their content for LLM training. Tribune: Three conditions to guarantee the reliability of information and preserve democracy in the era of generative AI.

Samuel Szoniecky: "Agents" are not just chatbots but also all tools that allow exploiting databases/corpora.

Example: Using AI to retrieve Deleuze’s courses in audio, automatically transcribe them, represent them as diagrams, and finally question the knowledge base and precisely retrieve information using RAG. In other words, at each step of the editorial chain, there are agents.
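The retrieval step of such a chain (RAG) can be sketched minimally as follows. This is an illustrative toy, not the project's actual tooling: a real pipeline would use a speech-to-text model (e.g. Whisper) for the transcription step and vector embeddings for retrieval; here a bag-of-words similarity stands in for both, and all function names are hypothetical.

```python
# Toy sketch of the retrieval-augmented step in an editorial chain.
# A real system would replace bag() with an embedding model.
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a transcript into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def bag(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bag(query)
    return sorted(chunks, key=lambda c: cosine(q, bag(c)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt sent to the language model."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

transcript = ("the concept of the fold appears in Leibniz ... "
              "the plane of immanence ...")
chunks = chunk(transcript, size=8)
prompt = build_prompt("What does Deleuze say about the fold?",
                      retrieve("fold Leibniz", chunks))
```

Each function here corresponds to one of the "agents" of the editorial chain; swapping any single step (transcription, indexing, retrieval) for another tool leaves the rest of the chain intact.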

It is important to evaluate the powers at play and specify the questions related to these agents, as well as to ask questions related to dissemination issues.

Marcello Vitali-Rosati: What you present has nothing to do with what we normally call AI. Is it necessary/useful to maintain the very generic notion of "AI" as we use it today? We put together many technologies and uses under the single term AI.

Samuel Szoniecky: This is our responsibility as specialists. It is up to us to position ourselves and affirm that AI is not limited to chatbots. This will show the relevance of our voice as researchers.

Nicolas Sauret: Are there discussions about AI usage within journals? To my knowledge, tests have been done with ChatGPT.

Aurélien Berra: Training someone to use a chatbot is not necessarily a good use of researchers’/CNRS agents’ time. The interest of Revue3.0 is to be able to meet in small groups to conduct precise experiments.

Florence Daniel (Natures Sciences Sociétés journal): Today, the experiments being conducted use ChatGPT because, despite the relevant questions raised about ecology, etc., ChatGPT accomplishes the tasks we ask of it. The objective of Revue3.0 could be to support editors in adopting new practices or tools.

Aurélien Berra (Humanités Numériques journal): Revue3.0 could develop a protocol for AI use in scientific publishing.

Bertrand Gervais (Captures journal): Support for editors is necessary. How to integrate skills related to AI use? Journals do not have the means to offer this type of training.

Be careful not to prescribe; we must support/train. How to ensure that editors of journals in various disciplines are not lost regarding these tools?

Stéphane Pouyllau (in the chat): "AI processing tools in editorial workflows must be integrated into professional skills. This is a UX, UI, and prompt library issue built with journal stakeholders. This is a great project for Stylo."

Gérald Kembellec (in the chat): "What Stéphane says is what we are working on in the INTD journal."

Gérald Kembellec: We will have very technical journals with large datasets that allow RAG. In the Data-documentation research axis: how will collaborative indexing and documentation work interact and integrate with AI that can perform the same tasks?

A still cautious approach, because having agents and multiple humans interact raises the question of control: layers and sub-layers of code are not necessarily verified or verifiable, and if errors creep in, the consequences can be significant.

Nicolas Sauret (referring to Dominique Boullier’s article): Pure statistics do not work; there is invisible human labor and capitalist/colonialist dynamics behind it. Boullier proposes a project offering alternatives that draw on the classification skills specific to humans.

Gérald Kembellec: Working on two document engineering projects. Tests with generative AI for classification show the need for collaboration between a documentation specialist, stakeholders, and editors: a heuristic project that shows the impossibility of removing humans from the documentation process, who must, for example, explain the necessary rules and repeat them several times with chatbots to obtain the desired result.

Stéphane Pouyllau: Industries integrate RAG tools or fine-tune models for technical documentation (e.g., the elevator industry). Their first tests, done with public models not trained on their data, were unsatisfactory, so they waited and preferred to fine-tune a more recent Mistral model with their own data.

Marcello Vitali-Rosati: Several avenues:

- Design possible experimentation protocols (which experiments with which tools), for example, regarding evaluation.

- Ethical questions: ecological impact, power, structures (see "The Eye of the Master" by Matteo Pasquinelli). Reversal of the question: AI does not make us lose our jobs. Today’s LLMs better match our expectations for the division of labor, which is why they seem better to us. These questions could be part of the future of this workshop.

Nicolas Sauret: From an experimentation standpoint, it is necessary to ask what is worth experimenting with: some tasks can be automated very well with expert systems.

We should set up training within Revue3.0, as we do for Stylo, on AI use for research and journal activities.

Samuel Szoniecky: Play collectively to find a collective response to our questions about AI.


References and Useful Links

Collective of representative professional organizations of publishers (Alliance, FNPS, Geste, SEPM, Spiil). "OPINION. ’Three conditions to guarantee the reliability of information and preserve democracy in the era of generative AI.’" La Tribune, February 5, 2025. https://www.latribune.fr/opinions/tribunes/opinion-trois-conditions-pour-garantir-la-fiabilite-l-information-et-preserver-la-democratie-a-l-heure-de-l-ia-generative-1017557.html?id=1325156934235714.

Boullier, Dominique. "AI Summit: The Necessary European Semantic Secession." AOC media - Analyse Opinion Critique, February 9, 2025. https://aoc.media/analyse/2025/02/09/sommet-ia-la-necessaire-secession-semantique-europeenne/.

Langlais, Pierre-Carl. "Releasing Common Corpus: the largest public domain dataset for training LLMs." Accessed February 13, 2025. https://huggingface.co/blog/Pclanglais/common-corpus.

Maraninchi, Florence. "Why I Don’t Use ChatGPT." Academia (blog), February 2, 2025. https://doi.org/10.58079/1382x.

Pasquinelli, Matteo. "The Eye of the Master." Verso. Accessed February 13, 2025. https://www.versobooks.com/en-ca/products/735-the-eye-of-the-master.

Reporters Without Borders and Alliance de la presse d’information générale, Projet Spinoza: "Artificial Intelligence for Journalists, by Journalists and Media," Reporters Without Borders, April 16, 2024. https://rsf.org/fr/projet-spinoza.

Event: "Scientific Publishing and Artificial Intelligence," Friday, March 28, 2025, 09:15–12:30, in person. Program: https://www.fnps.fr/2025/03/28/colloque-edition-ia-2025/.