Minutes of the AI workshop of January 30, 2025 (Frédéric Clavert)
Introduction to AI Workshops by Marcello Vitali-Rosati
The idea behind these workshops is to initiate shared exchanges and reflection on how the algorithms commonly called "AI" are being implemented in our writing and scientific publishing practices.
In the DH community, reflection on algorithm implementation dates back 70 years. Although the introduction of transformers has brought changes, it cannot be called a revolution. The objective of this workshop is to move beyond the commonplace that "AI = ChatGPT" and the current trend of adopting mainstream applications without reflection.
These workshops aim to be a place for reflection on the theory and infrastructural questions related to the implementation of AI-related technologies. It is also an opportunity to reflect on the needs in this area and potentially to fund pilot projects and experiments within the framework of the Revue3.0 partnership.
Presentation by Frédéric Clavert of the "explain code" functionality of JDH
The Journal of Digital History was founded in 2020, with its first publication in 2021. The integration of AI technologies has been more focused on the readers than on the authors.
Multilayer articles:
- narrative layer (text)
- hermeneutic layer
- code/data layer
Format: Jupyter notebook (didactic format with code and markdown cells). The different layers are organized with a tagging system.
Possibility to execute the article on MyBinder to test the author's hypotheses.
Alternative: retrieve the source code from GitHub.
Reproducibility issue.
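The layer system described above relies on Jupyter's per-cell tags (stored under `metadata.tags` in the notebook JSON). A minimal sketch of reading an article's layers programmatically — the tag names `"narrative"` and `"hermeneutics"` are illustrative assumptions, not necessarily the tags the JDH actually uses:

```python
import json

def cells_by_layer(notebook_json: str, tag: str) -> list[str]:
    """Return the source of every cell carrying the given layer tag.

    Jupyter stores per-cell tags under metadata.tags; the tag names
    used here are hypothetical examples of a layer-tagging scheme.
    """
    nb = json.loads(notebook_json)
    return [
        "".join(cell.get("source", []))
        for cell in nb.get("cells", [])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]

# Minimal notebook with one tagged cell per layer.
demo = json.dumps({
    "cells": [
        {"cell_type": "markdown",
         "metadata": {"tags": ["narrative"]},
         "source": ["The argument of the article."]},
        {"cell_type": "code",
         "metadata": {"tags": ["hermeneutics"]},
         "source": ["df = load_data()"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

print(cells_by_layer(demo, "narrative"))  # → ['The argument of the article.']
```

Filtering on tags like this is also what makes it possible to render each layer separately in the journal's interface.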
New design (beta phase): additional layer with "data&code"
Latest integrations:
- Possibility to run the code directly in the journal's interface (integration of MyBinder in the article's page).
- "Explain code": AI functionality providing real-time code explanation.
  - Use of open-source models such as Llama, served via [Groq](https://groq.com/).
  - Rationale: the explanations provided by authors are not always explicit.
  - Prompt: explain the code of the cell to a beginner → reader-oriented. Exact prompt: "PYTHON_BEGINNER": "You are a helpful assistant to explain Python code easily. You reply with very short answers."
  - [Example of a generated answer.](https://pad.libreon.fr/uploads/6a2c1749-f309-44c0-9a1e-f1835f4085ba.png)
Upcoming integrations:
- Reader functionality: LLMs to write article summaries
Article illustrating the functionalities:
Eriksson, M., Skotare, T., & Snickars, P. (2024). Tracking and tracing audiovisual reuse: Introducing the Video Reuse Detector. Journal of Digital History, 3(1). https://doi.org/10.1515/JDH-2024-0009?locatt=label:JDHFULL
Summary on the narrative layer and not the hermeneutic layer (view renderer)
Comparison of several models shows that Gemini is the most effective for these tasks.
Elisabeth Guerard: These functionalities are aimed at readers, but also at reviewers and technical reviewers.
AI usage survey @C2DH: concludes that everyone uses or will use generative AI. Frequency of use is mainly daily or weekly. Survey from a few months ago: mainly ChatGPT and Microsoft/GitHub Copilot, as they are provided by the university and easy to access.
Discussion
Servanne Monjour: What is the process for validating the results provided by the AI?
Elisabeth Guerard:
- Comparison of the summary generated by the AI with the abstract provided by the author.
- No validation protocol for the "explain code" functionality.
Marcello Vitali-Rosati:
- The problem with validating code explained by a probabilistic algorithm is that we fall into the paradigm of likelihood rather than truth/verification.
- Example of NotebookLM, which does RAG (Retrieval-Augmented Generation) with Gemini: the results of RAG or code explanation are of such high quality that only a specialist can tell whether they diverge from the author's intention.
- How to distinguish "highly similar" from "true"? We are witnessing a paradigm shift in favor of the likely.
Frédéric Clavert: The explanations are aimed at beginners, i.e., they provide a starting point for understanding for people with low or no digital literacy.
These systems should not be considered as giving a definitive answer.
The purpose of the tool should be probabilistic maieutics.
Interaction with a chatbot is a discussion that has no social value.
Nicolas Sauret: Chatbots impose a design (conversational mode): the interface mimics a social discussion when it is not one → reflection is needed on this point.
Aurélien Berra: Interface question: the metaphor of conversation is part of these tools' success. Oracular dimension (click → response).
Response: The explanations given by the "explain code" feature will differ with each request but will often remain very similar.
Aurélien Berra:
- The new interface provides a framework over which editors have no control: it is ultimately a third instance of interpretation (editor/author + AI).
- Why do we want to understand the code? What can we gain from the explanation of elements that will always remain trivial?
Frédéric Clavert:
- The explanation provided encourages the reader to increase their digital literacy.
Aurélien Berra:
- Couldn’t AI tools help the editorial team meet acceptance criteria?
- Could a tool be developed to detect automatically generated code or text?
Frédéric Clavert:
- Regarding support for editing and acceptance criteria, the criteria are quite broad (e.g., no required character count).
- The question of code evaluation is more difficult: there is no guarantee that reviewers are able to evaluate the code either.
- Question of copyright / LLM: not resolved at the journal level.
- Detection of AI use: copy-pasting is an authorial act.
The real work consists of dialoguing with the AI: validation comes after human review.
Nicolas Sauret: Why look for a "chatbot interface" when a static text generated in advance and integrated at publication can be sufficient?
And furthermore: what about a functionality for conversing with the article?
Frédéric Clavert: The tool is still in the reflection phase, particularly on the use of the "explain code" button. Already, the act of pressing the button creates a distinction between what is written by the author and the generated explanation.
Elisabeth Guerard: It is also the developers’ wish to experiment with the integration of a Flask API and to play with current technologies.
Servanne Monjour: The tool was implemented for readers, but who are these readers? What statistical elements do we have on the readership?
Frédéric Clavert: The JDH has data on its readership but not centered on the AI functionality. What we know is that the time spent on a page/article is 7 to 9 minutes on average compared to 1.5 minutes on average on the web. This means that readers stay and read. New version (3) appreciated.
Elisabeth Guerard (in the chat): With Matomo we can now track external links, which we could not see with Analytics, so use of MyBinder may now be observable.
Tools mentioned
MyBinder: https://mybinder.org/ (launches executable Jupyter notebook environments from a Git repo)
Groq: https://groq.com/ (server for AI inference)
Matomo: https://fr.matomo.org/ (alternative to Google Analytics that also allows tracking of external links used by users on the site)