Welcome to my website. I work in Computational Literary Studies, applying machine learning and NLP techniques to large digitized text collections. My research explores the formal and historical dynamics of literary evolution, examining how intertextual relations shape the formation and transformation of literary genres.
My doctoral research focuses on French narrative fiction from the late nineteenth to the first half of the twentieth century, studying the emergence and reconfiguration of novelistic subgenres such as detective and adventure fiction. Using large-scale corpora, vector-space representations, and network analysis, I model the evolution of textual structures and character archetypes—notably the figure of the detective.
Research Interests
Computational Literary Studies
Natural Language Processing
Literary Genre Evolution
Intertextuality
Network Analysis
Digital Humanities
French Literature (19th-20th c.)
Recent Updates
2025/12 | Attending CHR 2025 in Luxembourg (December 9-12)
Software & Projects
BookNLP-fr
A tailored NLP pipeline for processing 19th- and 20th-century French literature. It adapts BookNLP to French novels and includes named entity recognition, coreference resolution, and character extraction.
GitHub | Publication
Refined Gallica Corpus
A curated corpus of French novels from Gallica (BnF's digital library), cleaned and processed for computational literary analysis. Includes metadata enrichment and OCR quality filtering for 19th-20th century fiction. (link TBA)
Research Replication Materials
Code and data for reproducing key findings from my publications:
Operationalizing Canonicity | Latent Structures of Intertextuality | Detective Archetype
Featured Publications
Operationalizing Canonicity: A Quantitative Study of French 19th and 20th Century Literature
Barré, J. et al. (2023).
Journal of Cultural Analytics, 8(1).
DOI | PDF
This figure zooms in on a single author, Colette, and maps each of her novels as a point in a 2D space built from its writing style (using the same kind of text-based features as in the main model). The blue dots are the novels that later became canonical; the orange ones are the rest. On this map, Colette’s canonical books cluster tightly together, mostly in a region corresponding to works from the late 1920s and 1930s, while earlier popular series like Claudine sit farther away. This suggests that even within one author’s oeuvre, only a subset of texts shares the stylistic “signature” that institutions tend to select and preserve as canonical.
Latent Structures of Intertextuality in French Fiction
Barré, J. (2024).
Conference on Computational Humanities Research (CHR 2024), Aarhus, Denmark.
Proceedings | ArXiv | PDF
This figure shows how much each novel “resembles” other novels published before and after it, and compares two groups: canonical works (blue) and the rest of the archive (orange). For every book, we turn its text into a numerical vector and measure how similar it is to novels from different years, centering the graph on its publication date (0 on the x-axis). Both curves peak around the year of publication: books look most like the novels of their own time. But after that, the blue curve stays higher than the orange one: canonical novels remain more similar to later fiction than ordinary novels do. In other words, once they appear, canonical works keep “speaking the language” of future literature for longer, while most other books fade more quickly from the intertextual landscape.
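The computation behind this kind of curve can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the paper: the function names (`cosine`, `similarity_profile`), the use of cosine similarity, and the toy two-dimensional vectors are all hypothetical choices made for the example. Each book is a tuple of publication year, text vector, and a canonical flag, and the function averages similarity by year offset relative to each book's publication date.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_profile(books, group):
    """Mean similarity to all other books, keyed by year offset.

    `books` is a list of (year, vector, is_canonical) tuples.
    Only books whose canonical flag equals `group` serve as the
    centre (offset 0); every other book in the list is a comparison target.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, (year_i, vec_i, canon_i) in enumerate(books):
        if canon_i != group:
            continue
        for j, (year_j, vec_j, _) in enumerate(books):
            if i == j:
                continue  # never compare a book with itself
            offset = year_j - year_i  # negative = earlier, positive = later
            sums[offset] += cosine(vec_i, vec_j)
            counts[offset] += 1
    return {off: sums[off] / counts[off] for off in sorted(sums)}


# Made-up toy data, for illustration only (not taken from the corpus).
books = [
    (1900, [1.0, 0.0], True),
    (1900, [1.0, 0.0], False),
    (1910, [1.0, 1.0], False),
    (1920, [0.0, 1.0], False),
]

canonical_curve = similarity_profile(books, True)   # the "blue" curve
archive_curve = similarity_profile(books, False)    # the "orange" curve
```

Plotting the two dictionaries against their keys gives one curve per group, centred on 0. With real data, the vectors would come from whatever text representation the study uses, and offsets would typically be smoothed or binned before plotting.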
Modeling the Birth of a Literary Archetype: The Case of the Detective Figure in French Fiction
Barré, J. et al. (2025).
Conference on Computational Humanities Research (CHR 2025).
arXiv | PDF
This figure shows all the detectives in our corpus as dots on a map built from their character embeddings: numerical fingerprints that summarize the verbs, adjectives and nouns most often used around their names in the novels. Three main clusters emerge: one brings together the early, very rational “puzzle-solver” detectives of the late 19th and early 20th century, another groups the more human, empathic investigators of the interwar and mid-century period (like Maigret), and a third contains the later hardboiled and néo-polar detectives, whose language is more physical, colloquial and tied to social violence. The fact that nearby dots tend to come from similar publication years shows that these clusters trace a historical trajectory: as the way authors write about detectives changes, their embeddings drift across the map and form these three successive generations of the archetype.