EC Library Guide on generative artificial intelligence and large language models: Selected articles
Selected research papers
- Are large language models intelligent? Are humans?
Häggström, O., Computer Sciences & Mathematics Forum, 8 (1), 2023.
Claims that large language models lack intelligence are abundant in current AI discourse. To the extent that the claims are supported by arguments, these usually amount to claims that the models (a) lack common sense, (b) know only facts they have been trained with, (c) are merely matrix multiplications, (d) only predict the next word in a chain, (e) lack a world model, (f) have no grounding of symbols, (g) lack creativity, or (h) lack consciousness. Here, each of these arguments is applied, with minor modifications, to demonstrate that humans also lack intelligence. This should make us suspicious of the validity of these arguments.
- Artificial intelligence and the affective labour of understanding: The intimate moderation of a language model
Perrotta, C., Selwyn, N. and Ewin, C., New Media & Society, (Online First), 2022.
Interest in artificial intelligence (AI) language models has grown considerably following the release of ‘generative pre-trained transformer’ (GPT). Framing AI as an extractive technology, this article details how GPT harnesses human labour and sensemaking at two stages: (1) during training when the algorithm ‘learns’ biased communicative patterns extracted from the Internet and (2) during usage when humans write alongside the AI. This second phase is framed critically as a form of unequal ‘affective labour’ where the AI imposes narrow and biased conditions for the interaction to unfold, and then exploits the resulting affective turbulence to sustain its simulation of autonomous performance. Empirically, this article draws on an in-depth case study where a human engaged with an AI writing tool, while the researchers recorded the interactions and collected qualitative data about perceptions, frictions and emotions.
- ChatGPT: Not intelligent
Smith, B., in AI: From Robotics to Philosophy – The Intelligent Robots of the Future, or Human Evolutionary Development Based on AI Foundations, 2023.
In the book Why Machines Will Never Rule the World, Jobst Landgrebe and Barry Smith argue that the efforts of many in the artificial intelligence community to create an artificial general intelligence (AGI) are doomed to fail. Here “AGI” is defined as referring to a machine that would exhibit cognitive capacities equivalent to, or even surpassing, those of human beings. The argument for the impossibility of such a machine has the following form: (1) they analyse the properties of complex systems such as the Earth’s weather system or the traffic system of Istanbul; (2) they demonstrate that there are severe limits on our ability to predict mathematically the behaviour of systems of this sort; and (3) they show that these limits also determine the ability of computers to make such predictions.
- ChatGPT and the AI Act
Helberger, N. and Diakopoulos, N., Internet Policy Review, 12 (1), 2023.
It is not easy being a tech regulator these days. The European institutions are working hard towards finalising the AI Act in autumn, and then generative AI systems like ChatGPT come along! In this essay, we comment on the European AI Act, arguing that its current risk-based approach is too limited to cope with ChatGPT & co.
- ChatGPT and the future of health policy analysis: Potential and pitfalls of using ChatGPT in policymaking
Sifat, R.I., Annals of Biomedical Engineering, 51 (7), 2023.
Scholars increasingly rely on new artificial intelligence models for convenience and simple access to necessities due to the rapid evolution of scientific literature and technology. The invention of ChatGPT by OpenAI stands out as a key example of how significant advances in large language model technology have recently changed the field of artificial intelligence (AI). Since ChatGPT’s development, it has been tested by multiple sectors on various topics to see how well it functions in a natural and conversational mode. The crucial question is how much ChatGPT can influence global health policy analysis. In this article, the researcher briefly explains ChatGPT’s potential and the difficulties that users, such as researchers or policymakers, may continue to face.
- ChatGPT and the generation of digitally born “knowledge”: How does a generative AI language model interpret cultural heritage values?
Spennemann, D.H.R., Knowledge, 3 (3), 2023.
The public release of ChatGPT, a generative artificial intelligence language model, caused widespread public interest in its abilities but also concern about the implications of the application for academia, depending on whether it was deemed benevolent (e.g., supporting analysis and simplification of tasks) or malevolent (e.g., assignment writing and academic misconduct). While ChatGPT has been shown to provide answers of sufficient quality to pass some university exams, its capacity to write essays that require an exploration of value concepts is unknown. This paper presents the results of a study where ChatGPT-4 (released May 2023) was tasked with writing a 1500-word essay to discuss the nature of values used in the assessment of cultural heritage significance. Based on an analysis of 36 iterations, ChatGPT wrote essays of limited length, at about 50% of the stipulated word count, that were primarily descriptive and lacked depth or complexity. The concepts, which are often flawed and suffer from inverted logic, are presented in an arbitrary sequence with limited coherence and without any defined line of argument.
Given that it is a generative language model, ChatGPT often splits concepts and uses one or more words to develop tangential arguments. While ChatGPT provides references as tasked, many are fictitious, albeit with plausible authors and titles. At present, ChatGPT has the ability to critique its own work but seems unable to incorporate that critique in a meaningful way to improve a previous draft. Setting aside conceptual flaws such as inverted logic, several of the essays could possibly pass as a junior high school assignment but fall short of what would be expected in senior school, let alone at a college or university level.
- ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health
De Angelis, L., Baglivo, F., Arzilli, G., et al., Frontiers in Public Health, 11 (February), 2023.
Large Language Models (LLMs) have recently garnered attention with the release of ChatGPT, a user-centred chatbot released by OpenAI. In this Viewpoint, we retrace the evolution of LLMs to understand the revolution brought by ChatGPT. The opportunities offered by LLMs in supporting scientific research are multiple, and various models have already been tested on Natural Language Processing (NLP) tasks in this domain. However, alarming ethical and practical challenges emerge from the use of LLMs, particularly in the medical field, given the potential impact on public health. Infodemic is a trending topic in public health, and the ability of LLMs to rapidly produce vast amounts of text could fuel the spread of misinformation at an unprecedented scale; this could create an “AI-driven infodemic”, a novel public health threat. Policies to counter this phenomenon need to be developed rapidly, and the inability to accurately detect artificial-intelligence-produced text remains an unresolved issue.
- Competition between AI foundation models: Dynamics and policy recommendations
Schrepel, T. and Pentland, A., MIT Connection Science Working Paper, (1), 2023.
Generative AI is set to become a critical technology for our modern economies. While we are currently experiencing strong, dynamic competition between the underlying foundation models, legal institutions have an important role to play in ensuring that the spring of foundation models does not turn into a winter, with an ecosystem frozen by a handful of players.
- The ethics of ChatGPT: Exploring the ethical issues of an emerging technology
Kshetri, N., Dwivedi, Y.K., Davenport, T.H., et al., International Journal of Information Management, 74 (February), 2024.
"Artificial Intelligence” in all its forms has emerged as a transformative technology that is in the process of reshaping many aspects of industry and wider society at a global level. It has evolved from a concept to a technology that is driving innovation, transforming productivity and disrupting existing business models across numerous sectors. The industrial and societal impact of AI is profound and multifaceted, offering opportunities for growth, efficiency, and improved healthcare, but also raising ethical and societal challenges as the method is integrated into many aspects of human life and work. This editorial is developed by contributors of the 4th Royal Society Yusef Hamied Workshop ( in 2023 devoted to Artificial Intelligence), designed to enhance collaboration between Indian and the UK scientists and to explore future research opportunities. The insights shared at the workshop are shared here.
- Exploring the effects of generative AI on academic libraries
van der Graaf, M., Zenodo, 2024.
What will be the effects of Generative Artificial Intelligence on Academic Libraries? This key question for everyone involved in academic libraries is addressed in this White Paper. After a description of what a librarian should know about generative AI, we discuss the effects on the environment of academic libraries: on science and on the behaviour of library users. We foresee a paradigm shift in discovery and discuss the impact on the usage of library collections. As a next step in this exploratory tour, we discuss the potential uses of AI in metadata production and in opening up heritage collections for Q&A interactions. Finally, we touch on a few issues that have so far received little attention in the literature: the question of allowing commercial AI chatbots access to library collections, the need to redevelop library courses on information literacy, the effects of AI on Open Science, and the effects on the organisation of academic libraries.
- Extrapolation and AI transparency: Why machine learning models should reveal when they make decisions beyond their training
Cao, X. and Yousefzadeh, R., Big Data & Society, 10 (1), 2023.
The right to artificial intelligence (AI) explainability has consolidated into a consensus in the research community and in policy-making. However, a key component of explainability has been missing: extrapolation, which can reveal whether a model is making inferences beyond the boundaries of its training. We report that AI models extrapolate outside their range of familiar data frequently and without notifying users and stakeholders. Knowing whether a model has extrapolated or not is a fundamental insight that should be included in explaining AI models, in favor of transparency, accountability, and fairness. Instead of dwelling on the negatives, we offer ways to clear the roadblocks to promoting AI transparency. Our commentary is accompanied by practical clauses useful to include in AI regulations such as the AI Bill of Rights, the National AI Initiative Act in the United States, and the AI Act by the European Commission.
- Generative AI entails a credit–blame asymmetry
Porsdam Mann, S., Earp, B.D., Nyholm, S., et al., Nature Machine Intelligence, 5 (5), 2023.
Generative AI programs can produce high-quality written and visual content that may be used for good or ill. We argue that a credit–blame asymmetry arises for assigning responsibility for these outputs and discuss urgent ethical and policy implications focused on large-scale language models.
- Generative artificial intelligence in marketing: Applications, opportunities, challenges, and research agenda
Kshetri, N., Dwivedi, Y.K., Davenport, T.H., et al., International Journal of Information Management, (October), 2023.
While all functional areas in organizations are benefiting from recent developments in generative artificial intelligence (GAI), marketing has been particularly positively affected by this breakthrough innovation. However, scholars have paid little attention to the transformative impacts GAI has on marketing activities. This editorial article aims to fill this void. It outlines the current state of generative artificial intelligence in marketing. The article discusses the facilitators and barriers for the use of generative artificial intelligence in marketing. It highlights the effectiveness of insights generated by GAI in personalizing content and offerings and argues that marketing content generated by GAI is likely to be more personally relevant than that produced by earlier generations of digital technologies. The article explains how higher efficiency and productivity of marketing activities can be achieved by using GAI to create marketing content. It also describes the roles of insights and marketing content generated by GAI in improving the sales lead generation process. Implications for research, practice and policy are also discussed.
- Halting generative AI advancements may slow down progress in climate research
Larosa, F., Hoyas, S., García-Martínez, J., et al., Nature Climate Change, 13 (6), 2023.
Large language models offer an opportunity to advance climate and sustainability research. We believe that a focus on regulation and validation of generative artificial intelligence models would provide more benefits to society than a halt in development.
- Large language models: Their success and impact
Makridakis, S., Petropoulos, F. and Kang, Y., Forecasting, 5 (3), 2023.
ChatGPT, a state-of-the-art large language model (LLM), is revolutionizing the AI field by exhibiting humanlike skills in a range of tasks that include understanding and answering natural language questions, translating languages, writing code, passing professional exams, and even composing poetry, among its other abilities. ChatGPT has gained immense popularity since its launch, amassing 100 million active monthly users in just two months, thereby establishing itself as the fastest-growing consumer application to date. This paper discusses the reasons for its success as well as the future prospects of similar large language models (LLMs), with an emphasis on their potential impact on forecasting, a specialized and domain-specific field.
This is achieved by first comparing the answers of the standard ChatGPT with those of a custom version trained on published papers from a subfield of forecasting in which the answers to the questions asked are known, allowing the correctness of the two ChatGPT versions to be assessed. Then, we also compare the responses of the two versions on how judgmental adjustments to statistical/ML forecasts should be applied by firms to improve their accuracy. The paper concludes by considering the future of LLMs and their impact on all aspects of our life and work, as well as on the field of forecasting specifically. Finally, the conclusion section is generated by ChatGPT, which was provided with a condensed version of this paper and asked to write a four-paragraph conclusion.
- Large language models encode clinical knowledge
Singhal, K., Azizi, S., Tu, T., et al., Nature, 620 (7972), 2023.
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, the authors present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries, and a new dataset of medical questions searched online, HealthSearchQA. They propose a human evaluation framework for model answers along multiple axes, including factuality, comprehension, reasoning, possible harm and bias. The evaluation shows that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Human evaluations also reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
In addition, the authors evaluate Pathways Language Model (PaLM, a 540-billion-parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, they introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly but remains inferior to clinicians.
- Ontologies, Arguments, and Large-Language Models
Beverley, J., Franda, F., Karray, H., Maxwell, D., Benson, C. et al., 14th International Conference on Formal Ontology in Information Systems, 2024.
The explosion of interest in large language models (LLMs) has been accompanied by concerns over the extent to which generated outputs can be trusted, owing to the prevalence of bias, hallucinations, and so forth. Accordingly, there is growing interest in knowledge representation (KR) tools – ontologies and knowledge graphs in particular – to make LLMs more trustworthy. This rests on the long history of KR in constructing human-comprehensible justification for model outputs as well as traceability concerning the impact of evidence on other evidence. Understanding the nature of arguments and argumentation is critical to justification and traceability, especially when LLM output conflicts with what is expected by users. The central contribution of this article is to extend the Arguments Ontology (ARGO) – an ontology of terms and relations specific to the domain of argumentation and evidence broadly construed – into the space of LLM outputs in the interest of promoting justification and traceability. The authors outline a strategy for creating ARGO-based ‘blueprints’ to help LLM users explore justifications for outputs, and conclude by describing critical applications at the intersection of LLM and knowledge representation research.
- An overview of the capabilities of ChatGPT for medical writing and its implications for academic integrity
Liu, H., Azam, M., Bin Naeem, S., et al., Health Information and Libraries Journal, 40 (4), 2023.
The artificial intelligence (AI) tool ChatGPT, which is based on a large language model (LLM), is gaining popularity in academic institutions, notably in the medical field. This article provides a brief overview of the capabilities of ChatGPT for medical writing and its implications for academic integrity. It lists AI generative tools, describes their common uses for medical writing, and lists AI generative text detection tools. It provides recommendations for policymakers, information professionals, and medical faculty on the constructive use of AI generative tools and related technology, and highlights the role of health sciences librarians and educators in discouraging students from using ChatGPT-generated text in their academic work.
- Popping the chatbot hype balloon
Goudarzi, S., Bulletin of the Atomic Scientists, 79 (5), 2023.
Since ChatGPT’s release in November 2022, artificial intelligence has come into the spotlight. Inspiring both fascination and fear, chatbots have stirred debates among researchers, developers, and policy makers. The concerns range from concrete and tangible ones—which include replication of existing biases and discrimination at scale, harvesting personal data, and spreading misinformation—to more existential fears that their development will lead to machines with human-like cognitive abilities. Understanding how chatbots work and the human labor and data involved can better help evaluate the validity of concerns surrounding these systems, which although innovative, are hardly the stuff of science fiction.
- The power of generative AI: A review of requirements, models, input–output formats, evaluation metrics, and challenges
Bandi, A., Adapa, P.V. and Kuchi, Y.E., Future Internet, 15 (8), 2023.
Generative artificial intelligence (AI) has emerged as a powerful technology with numerous applications in various domains. There is a need to identify the requirements and evaluation metrics for generative AI models designed for specific tasks. The research aims to investigate the fundamental aspects of generative AI systems, including their requirements, models, input–output formats, and evaluation metrics. The study addresses key research questions and presents comprehensive insights to guide researchers, developers, and practitioners in the field. Firstly, the requirements necessary for implementing generative AI systems are examined and grouped into three distinct categories: hardware, software, and user experience. Furthermore, the study explores the different types of generative AI models described in the literature by presenting a taxonomy based on architectural characteristics, such as variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion models, transformers, language models, normalizing flow models, and hybrid models. A comprehensive classification of input and output formats used in generative AI systems is also provided.
Moreover, the research proposes a classification system based on output types and discusses commonly used evaluation metrics in generative AI. The findings contribute to advancements in the field, enabling researchers, developers, and practitioners to effectively implement and evaluate generative AI models for various applications. The significance of the research lies in understanding that generative AI system requirements are crucial for effective planning, design, and optimal performance. A taxonomy of models aids in selecting suitable options and driving advancements. Classifying input–output formats enables leveraging diverse formats for customized systems, while evaluation metrics establish standardized methods to assess model quality and performance.
- Regulating ChatGPT and other large generative AI models
Hacker, P., Engel, A. and Mauer, M., ACM International Conference Proceeding Series, (June), 2023.
Large generative AI models (LGAIMs), such as ChatGPT, GPT-4 or Stable Diffusion, are rapidly transforming the way we communicate, illustrate, and create. However, AI regulation, in the EU and beyond, has primarily focused on conventional AI models, not LGAIMs. This paper situates these new generative models in the current debate on trustworthy AI regulation and asks how the law can be tailored to their capabilities. After laying technical foundations, the legal part of the paper proceeds in four steps, covering (1) direct regulation, (2) data protection, (3) content moderation, and (4) policy proposals. It suggests a novel terminology to capture the AI value chain in LGAIM settings by differentiating between LGAIM developers, deployers, professional and non-professional users, as well as recipients of LGAIM output.
The authors tailor regulatory duties to these different actors along the value chain and suggest strategies to ensure that LGAIMs are trustworthy and deployed for the benefit of society at large. Rules in the AI Act and other direct regulation must match the specificities of pre-trained models. The paper argues for three layers of obligations concerning LGAIMs (minimum standards for all LGAIMs; high-risk obligations for high-risk use cases; collaborations along the AI value chain). In general, regulation should focus on concrete high-risk applications, and not the pre-trained model itself, and should include (i) obligations regarding transparency and (ii) risk management. Non-discrimination provisions (iii) may, however, apply to LGAIM developers. Lastly, (iv) the core of the DSA's content moderation rules should be expanded to cover LGAIMs. This includes notice and action mechanisms, and trusted flaggers.
- Structured like a language model: Analysing AI as an automated subject
Magee, L., Arora, V. and Munn, L., Big Data & Society, 10 (2), 2023.
Drawing on the resources of psychoanalysis and critical media studies, this article develops an analysis of large language models (LLMs) as ‘automated subjects’. It argues that the intentional, fictional projection of subjectivity onto LLMs can yield an alternate frame through which artificial intelligence (AI) behaviour, including its productions of bias and harm, can be analysed. First, it introduces language models, discusses their significance and risks, and outlines the case for interpreting model design and outputs with support from psychoanalytic concepts. It then traces a brief history of language models, culminating with the releases, in 2022, of systems that realise ‘state-of-the-art’ natural language processing performance. Finally, it engages with one such system, OpenAI's InstructGPT, as a case study, detailing the layers of its construction and conducting exploratory and semi-structured interviews with chatbots.
These interviews probe the model's moral imperatives to be ‘helpful’, ‘truthful’ and ‘harmless’ by design. The model acts, the authors argue, as the condensation of often competing social desires, articulated through the internet and harvested into training data, which must then be regulated and repressed. This foundational structure can, however, be redirected via prompting, so that the model comes to identify with, and transfer, its commitments to the immediate human subject before it. In turn, these automated productions of language can lead the human subject to project agency onto the model, occasionally effecting further forms of countertransference. The article concludes that critical media methods and psychoanalytic theory together offer a productive frame for grasping the powerful new capacities of AI-driven language systems.
- Last Updated: Apr 14, 2025 9:41 AM
- URL: https://ec-europa-eu.libguides.com/llm-and-genAI