Minutes of the AI workshop of February 27th (Gérald Kembellec and Joaquine Barbet)

Full Workshop Recording

Introduction - Gérald Kembellec

His vision was quite focused on the quality of content versus form and the primary role of the "Author/Editor/Documentalist" relationship in the serial documentation of online journals to ensure good discoverability.

Ongoing project at INTD at CNAM: new editorial modes, particularly questioning AI.

>Multimodal intelligibility of scholarly hypertext: the role of the documentalist. A necessary collaboration for serial documentation in the scientific editorial chain. H2PTM, Oct 2021, Paris, France. https://hal.science/hal-03419892/

His 2021 stance is presented in the article cited above: a critical position that is less relevant today, since it focused mainly on the opposition between content and form in the use of AI. These are nevertheless tools that we are bound to use if we want to move forward, as they are of our time.

The workshop addresses two points:

  1. Presentation by Joaquine Barbet of the conclusions of her project supervised by the Center for Active Pedagogical Resources (CREPAC) at CNAM Paris. Human Resources issue: human indexers responsible for synthesis were replaced for economic reasons, notably because funders favor trendy terms like "data" over "archives." Monitoring and experimentation project in library science/archival science regarding article summarization. Replacing archiving and summary production with automated solutions has produced stopgap measures that also need to be observed.
  2. A second point will address the use of semantic web models with AI methods, following an approach outlined by David Shotton’s 2009 Semantic publishing. Discussion proposed on producing high-quality scientific publications in both content and form and engaging in knowledge discovery with a commitment to quality.

Automation Project for Summary Production - Joaquine Barbet

Barbet, Joaquine. "Generative AI: the future of documentary productions? Automating summaries: explorations and proof of concept within Crepac". Thesis for the Professional Title "Documentary Engineering Project Manager", 2024. cf. Related files.

Introduction and Methodology

Joaquine Barbet: study of the potential of generative AI for documentary productions, particularly for scientific article summarization.

  1. monitoring technologies
  2. practical application, with a feasibility study

The aim is to understand how to use these technologies in documentary practice, given that the industry quickly adopted AI, notably as a marketing argument ("powered by AI"). But what role could these tools have for information professionals?

At CREPAC, the project was carried out by one person full-time for 3 months. Exploratory and experimental work on automating article summaries. Documentary inflation and lack of human resources led to delays in processing articles by the human indexers.

Criteria:

Objectives:

Methodology:

Inspired by the AI lab of the Library of Congress (LC Labs) and its "Understand, Experiment, Implement" strategy.

Analysts were included in the reflection and in the evaluation of test results: production of an evaluation grid to determine the quality of AI-generated summaries, with evaluation on a Likert scale (a scale from psychology that rates an item from 1 to 5, from "strongly disagree" to "strongly agree").
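As an illustration of how such a grid could be aggregated, here is a minimal Python sketch. The criteria names and the ratings are hypothetical: the actual grid used at CREPAC is not reproduced in these minutes. It only shows the general idea of averaging 1-to-5 Likert ratings per criterion across several analysts.

```python
from statistics import mean

# Hypothetical criteria: the real evaluation grid is not given in the minutes.
CRITERIA = ("fidelity", "concision", "readability")


def validate(score: int) -> int:
    """Reject anything outside the 1-5 Likert range."""
    if not 1 <= score <= 5:
        raise ValueError(f"Likert score must be between 1 and 5, got {score}")
    return score


def aggregate(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Return the mean rating per criterion across all analysts."""
    return {c: mean(validate(r[c]) for r in ratings) for c in CRITERIA}


# Made-up ratings from two analysts for one AI-generated summary.
ratings = [
    {"fidelity": 2, "concision": 4, "readability": 5},
    {"fidelity": 3, "concision": 4, "readability": 4},
]
summary_scores = aggregate(ratings)
```

A low mean on a criterion such as fidelity, next to high means on the formal criteria, would be one way to surface the "formally correct but overinterpreted" pattern reported in the results.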

Corpus: texts in French and English because although summaries are always in French, articles can be in English or even German. Texts of different lengths to evaluate tool performance.

Tools tested:

Frédéric Clavert (in the chat): No trials with locally installable models (like llama)?

Marcello Vitali-Rosati (in the chat): my question is also: did you use the LLM or the chat application based on the LLM?

Joaquine Barbet: Legal questions: copyright issue related to the summaries produced and the articles used, particularly with the use of OpenAI and Mistral Chats. What legal exploitation of copyrighted texts? This exploitation is authorized for non-commercial use. To prevent any legal conflict, use of royalty-free articles.

Results

Joaquine Barbet: The first tests may seem impressive, but the evaluation phase shows that while the output is formally correct, ChatGPT tends to overinterpret the texts.

What role do we want generative AI to play?

Hypothesis: the role occupied by "tedious and automatable" tasks.

Therefore: is summary production a "tedious and automatable" task?

Gérald Kembellec (chat): The documentalist indexer who trains the model could be "uberized" by the tool

Marcello Vitali-Rosati (chat): in my opinion, the question is not whether AI "can" replace. Yes, it can, of course. The question is: do we "want" to replace. The question is not what is automatable, but what we want to automate. What are the values? If we seek productivity, a machine will always be faster and more efficient. But for example: no one will read the text. We will reduce ourselves to writing bullet lists, asking ChatGPT to write us an article and then asking ChatGPT to turn the article into bullet points...

Joaquine Barbet: "Turnkey" tools were quickly abandoned because these tools do not allow the same modulation as a prompt. Automation of indexing and summarization via an adapted prompt.

Gérald Kembellec: Using this type of tool is viable if the prompt is written by a subject specialist, as it must be adapted to specific needs.

Discussion

Marcello Vitali-Rosati:

Olivier Le Deuff (chat): She loves it: the LLM of love! (Elle elle l’aime: le LLM de l’amour!)

Gérald Kembellec: It is important to refocus the debate, especially because the tendency to apply the methodological rationalization of hard sciences to human sciences justifies their disappearance. Or at least leads to the conclusions of Pierre Mounier. But we are condemned to adapt to productivity/cost reduction requirements to survive as cultural/academic institutions.

Joaquine Barbet: it is not about producing summaries for the sake of producing summaries but to offer as many articles as possible to students and invite them to read through these summaries.

Gérald Kembellec: Unfortunately, economic constraints prevail in the debate.

Servanne Monjour: What is a summary? Does the prompt define it? Do you perceive a change in the function of the summary object itself? Notably because general public tools like ChatGPT tend to create pastiches, distort the text, and adopt journalistic functions. We observe in student exercises on journalistic articles using ChatGPT that notions such as the conclusion are not clearly conveyed. There are a priori as many ways to summarize as there are editorial practices (and journals).

Olivier Le Deuff (chat): indicative summary that indicates the content of the document

Gérald Kembellec (chat): absolutely Olivier

Servanne Monjour (chat): ok, but in some disciplines, that’s not always the case. In literature, I often see summaries that pose the problematic without the results.

Gérald Kembellec (chat): We’re almost on methodological plagiarism

Olivier Le Deuff (chat): before, during exams, if your summary ran even one word over the limit, it was directly below average or even zero with some strict graders. In fact, will titles and summaries in the future be reviewed by AIs for SEO reasons, a bit like online press?

Servanne Monjour (chat): yes, I had the same issue with a professor who deducted a point for each missing dot on the Js.

Gérald Kembellec (chat): The notion of "calculated authority"

Olivier Le Deuff (chat): except that it’s not really authority but popularity in my opinion...

Joaquine Barbet: the prompt is based on the work of indexers: very short summaries. Definition of a context (which audience, which objective) and a format with realization steps: preliminary analysis before producing the summary.
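A structure of this kind (context, then steps, then format) could be sketched as follows in Python. This is not the prompt used in the project, which is not reproduced in these minutes: the wording of the steps, the function name `build_summary_prompt` and the 80-word limit are assumptions made for illustration.

```python
def build_summary_prompt(article_text: str, audience: str, objective: str, max_words: int) -> str:
    """Assemble a prompt: context first, then working steps, then the output format."""
    # Hypothetical steps: a preliminary analysis precedes the writing of the summary.
    steps = [
        "Identify the research question, method and main results of the article.",
        "Note the key concepts that an indexer would retain.",
        "Write the summary from these notes only, without adding interpretation.",
    ]
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return (
        f"Context: you are assisting documentalist indexers. Audience: {audience}. "
        f"Objective: {objective}.\n"
        f"Steps:\n{numbered}\n"
        f"Format: one paragraph in French, at most {max_words} words.\n"
        f"Article:\n{article_text}"
    )


# Example call; the word limit is an assumed value, not one from the project.
prompt = build_summary_prompt(
    article_text="(article text goes here)",
    audience="students",
    objective="invite them to read the full article",
    max_words=80,
)
```

Keeping the context, steps and format as separate parameters is what lets a subject specialist adapt the prompt to specific needs, which "turnkey" tools do not allow.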

Marcello Vitali-Rosati: Relevance is defined by the algorithm, as with the PageRank imposed by Google. Same logic here: the "good" summary risks becoming the one produced by ChatGPT, i.e., a self-fulfilling prophecy.

Cardon, Dominique, et Liz Carey-Libbrecht. « Inside the Mind of PageRank: A study of Google’s algorithm ». Réseaux 177, nᵒ 1 (2013): 63‑95.

Olivier Le Deuff (chat): yes but I disagree with Cardon who ultimately adopts Google’s point of view, I take Arendt’s point of view.

Davin Baragiotta (chat, question relayed orally):

Given the observed lower quality (e.g., overinterpretation by the bot), what approach is planned? Accept the output as is (100% automatic), or have a documentalist check the summaries before publication (semi-automatic, assisted publication)?

Joaquine Barbet: it is currently an aid to summary production rather than a total replacement.

Marie-Alice Belle (editorial director for journal Meta) (chat): For some time, we have been receiving book reviews apparently signed by colleagues but which suggest that they were largely produced by an AI. The stakes are different from those of indexing, but it seemed relevant to mention the question of the use of these technologies in the dissemination of scientific knowledge.

Gérald Kembellec (chat): @Marie-Alice Belle: I am speechless

Olivier Le Deuff (chat): I will write a review of l’Éloge du bug with ChatGPT, but I will list it as a co-signer

Marcello Vitali-Rosati (chat): no, it’s you who signs! if you want to sign. The idea is just: you sign what you assume responsibility for

(citing previous workshop exchanges): As long as someone signs, they assume responsibility for the content produced.

Olivier Le Deuff (chat): in any case, the best is to write the reviews of your own books and have them signed by others

Gérald Kembellec (chat): Co-authorship. The author would have an "editorial" vision.

Frédéric Clavert: From the moment of signing, we assume legal and intellectual responsibility for the content (ChatGPT cannot be a co-signer). In this context, the question of authorship is the question of assuming responsibility for the content.

Marcello Vitali-Rosati: it depends on the authorial function: if it is mainly about responsibility, then it doesn’t matter how the text is produced. Legal responsibility.

Marie-Alice Belle (chat): it seems to me that there is a question of recognition of authority here; a summary seems to be considered as "purely informative" text while a review is supposed to be mediated by human subjectivity. But do we always agree on these distinctions?

Olivier Le Deuff (chat): what about article evaluations or article proposals?

Frédéric Clavert: Responsibility is not authorship. For the review, we could ask the same question raised by Servanne about the summary. A review for Lectures is very close to a summary. A review for the Annales should NOT be a summary, it is stated in the instructions. (It is also more work and more satisfying to write a review for the Annales).

Gérald Kembellec: A review is an argued, subjective summary, which is theoretically not what generative AIs produce. An AI would have a rather conservative view, in the sense that the model does not question the content but simply reproduces it, unlike a human, whose interpretation is more than a transcription of the content.

Marcello Vitali-Rosati: We can control its temperature and imagine that a model set to a very high temperature can produce "original" content in the sense of something less consensual. But parameterization is not at stake in mainstream applications.
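To make the role of temperature concrete, here is a minimal Python sketch of temperature-scaled sampling probabilities. It is a generic illustration of the principle, not the internals of any particular model or application, and the three logits are made-up values.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [value / temperature for value in logits]
    highest = max(scaled)  # subtracted before exponentiating, for numerical stability
    exps = [math.exp(value - highest) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words, from most to least "consensual".
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.5)   # sharper: the top candidate dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: unlikely candidates gain weight
```

At low temperature the most probable candidate takes most of the probability mass; at high temperature the less expected candidates are sampled more often, which is the sense in which the output becomes less consensual.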

Olivier Le Deuff (chat)

Frédéric Clavert (chat): It’s not so much an idea as a common practice, either through online services or things like openwebui that you can connect to ollama (locally installable models) with an interface for RAG with PDF documents, for example.

Olivier Le Deuff (chat): yes but I was referring to the idea of integrating it into the revue3.0 project. Is it an opportune path or not at all? I don’t have the answer

Frédéric Clavert (chat): @Olivier ah, oops, sorry. We’re thinking about it at JDH in any case.

Gérald Kembellec: Wishes to touch briefly on the second topic: documenting the code "sub-layer" of publications, in terms of content and form (the "digital palimpsest"), to make discoverability much more effective. Initial tests: marking up a few articles with schema.org while disambiguating certain authors or concepts, and asking ChatGPT to generate code that is "telling", i.e. designed to improve discoverability for scraping bots such as Google Scholar. Concept of "equipped reading" by Jean-Edouard Bigot: semantic plugins that surface complementary information, concentrating it in one place so the reader does not have to move from one medium to another. Could this model be implemented by hybridizing LLMs with Shotton-type or Semantic Web modeling?

Frédéric Clavert (chat): question of an AI plugin to give more context to authors. You have to be a specialist in the subject to get correct results.

Gérald Kembellec (chat): Indeed, it needs to be verified 😉

Frédéric Clavert (chat): @Gérald My colleague Gabor Toth does a lot of NLP with LLMs, and that would allow for schema.org, I suppose. However, the small problem is that you have to justify and evaluate the results.

References and Useful Links

Full Workshop Recording

Methodology "Understand, Experiment, Implement" for integrating new technologies: « LC Labs AI Planning Framework ». Accessed February 27, 2025. https://libraryofcongress.github.io/labs-ai-framework/.

Likert scale for evaluation: Rensis Likert. A Technique for the Measurement of Attitudes. Accessed February 27, 2025. http://archive.org/details/likert-1932.

Barbet, Joaquine. « IA génératives : le futur des productions documentaires ? L’automatisation de résumé Explorations et preuve de concept au sein du Crepac ». Thesis for the Professional Title "Documentary Engineering Project Manager" (2024). cf. Related files.

Broudoux, Évelyne, et Madjid Ihadjadene. « Comment la confiance peut-elle s’exercer dans les « autorités calculées » ? » Hermès, La Revue 88, nᵒ 2 (16 décembre 2021): 202‑6.

Broudoux, Évelyne. « La prise de conscience du pouvoir de l’outil : l’auteur face à ses pratiques ». Billet. Autorités calculées (blog), 13 mai 2013. https://doi.org/10.58079/ahei.

Cardon, Dominique, et Liz Carey-Libbrecht. « Inside the Mind of PageRank: A study of Google’s algorithm ». Réseaux 177, nᵒ 1 (2013): 63‑95.

Kembellec, Gérald. « Intelligibilité multimodale de l’hypertexte érudit : le rôle du documentaliste. Une nécessaire collaboration pour la documentarisation sérielle dans la chaîne éditoriale scientifique ». In H2PTM. Paris, France, 2021. https://hal.science/hal-03419892.

Related files