Schlagwort: Large Language Models

From Archives to Algorithms – How Research Transforms

Pre-digital age: Walter Höfer (left) and Library Director Otto Steiner (right) in the Institute’s catalogue room in the 1970s (photo: MPIL)

Researchers from all over the world have for the past 100 years contributed to international law using the resources of the Max Planck Institute for Comparative Public Law and International Law (MPIL). At its heart is one of the now biggest libraries for public international law, comparative public law, and European law in Europe. Yet, the techniques and methods of research have changed vastly since the Institute’s foundation in 1924. In the following, this post might prompt some nostalgia of lost books and broken type writers. Subsequently, it will look into the amazing possibilities that digitalisation, in particular recently hyped products that use artificial intelligence (AI), provides for.

What Research Actually Is

What did not change over all the years is the definition of research. In the Cambridge Dictionary “research” is referred to as “a detailed study of a subject, especially in order to discover (new) information or reach a (new) understanding”[1]. On the one hand, research aims to look at what others have written and creatively rearrange those findings into new ideas and hopefully ground-breaking discoveries. On the other hand, research means to create an understanding of the circumstances that are relevant to a question, of the law, the history, the society. As legal scholars we want to resolve longstanding legal issues, explain the law and its consequences, follow developments, and call for change where necessary. Research stands in between an idea the author would like to shed more light on, and the end result, usually a written piece like a book or journal article. To explore a thought thoroughly, we require knowledge about the written law as it stands, about cases from national and international courts and tribunals, about arguments from fellow scholars. Especially in international law this knowledge needs to extend to resources from all over the world.

With regards to the limited space of this format, this piece will focus on research as a way to gather literature. Other processes such as writing or editing will only be touched upon briefly.

How to Research Without the Internet

Librarian Ruth Fugger at the typewriter (1970s)[2]

For the younger generation, including myself, it is almost impossible to imagine a world without the internet. Most of MPIL’s current PhD students and postdocs grew up in an already digitalised world and certainly never had to write a book or journal article without access to the infinite knowledge provided for by the world wide web. But when looking at the amount of brilliant treatises, monographies, articles, and more from pre-internet times, one cannot deny that it must certainly have been possible to research extensively back then. However, the approach is in many ways not even remotely comparable anymore.

The library is until this day organised in a two-fold way. Firstly, there is the alphabetical order, sorted by the author of a piece. Secondly, there is a systematic order where the books are sorted by topic.[3] Previously, each available resource had its own index card, carefully labelled and sorted within stacks of cards, which were themselves organised into card index boxes. These boxes followed the overall two-fold system. One set of boxes was alphabetically sorted while another set conformed with the organisation of the library by topic. In addition, there were keyword indexes which helped to locate a certain subject matter. In doubt of where a certain book was placed or if one simply did not know which book to look for, this keyword index was the first stop. Certainly, the librarian as well as experienced colleagues could lead the way in case additional help was required to find suitable literature. A very helpful tool nowadays is the possibility of the digital OPAC to display the location of a certain book on a map. Without this, one must rely on printed floor maps or a knowledgeable colleague. In total, the endeavour to find a book took much longer than it does today with the help of some digital tools.

The old card catalogue in the basement of the Institute. It was used until 1998 and contains more than 1 million cards (photo: MPIL)

But the output process, too, has changed in many ways. Until not too long-ago submissions, from small book reviews to  dissertations, were written by hand or on a type writer. Oftentimes, the authors were supported by typists who converted dictated material into writing. Even an author to this very blog has submitted a handwritten contribution on paper requiring digitalisation by the editorial team. Contributions were, moreover, submitted in close contact to a supervisor or editor whereas nowadays upload forms or a simple e-mail are mostly preferred. When it comes to the editorial process, student assistants were tasked with proofreading texts for a spell and grammar check. Pieces that needed to be translated into a different language often came back only weeks later from a professional translator’s office. Today, automated checks for plagiarism are almost the standard, something that used to require a well-versed expert in the field of the submission when done manually. All these tasks can nowadays, often with the help of AI, be completed within minutes and less people involved.[4]

Digitalisation, Generative Artificial Intelligence and What it’s Worth

When the MPIL Heidelberg moved to a new building in 1996, its first website was introduced simultaneously. At the very beginning, the website merely displayed the opening hours of the library. Later on, the library’s catalogue was added and enabled the  public to search for titles and their availability at the Institute, a major step in digitalisation. The MPIL’s intranet for employee information and access to digitalised publications followed shortly after, and laid the foundation for digital communication at the MPIL.[5]

[pdf-embedder url=”https://mpil100.de/wp-content/uploads/2024/04/Virtuelles-Institut-Flyer.pdf” title=”Virtuelles Institut Flyer”]

The “virtual institute“. Brochure on the introduction of the Internet at the MPIL 1998 (photo: MPIL)

We see that there has been a major digital transformation at the MPIL over the past century. From typewriters to computers, from letters to emails, from books to online publications. These are only a few examples. Especially in the past year, the major breakthrough in AI accessibility has opened new, unknown possibilities for researchers. With the launch of tools such as ChatGPT, Claude or LLaMa, the world of AI has become instantaneously available for anyone with access to the internet, at this point at least 65% of the world’s population.[6] Not only can it be a useful tool for scientists to improve their work, but it has become its own field of research. Scholars are now more than ever writing about automated weapon systems, governments using software for decision-making and human rights being potentially impaired by deployment of AI.

This blogpost will focus on the capabilities of generative AI, in particular Large Language Models (LLMs), for our research, and try to shed some light on the usefulness of this technology. LLMs are designed to understand and generate human language and learn by so-called “deeplearning”. During a pre-training phase they process vast amounts of data and learn the relationships between words, grammar, and context to acquire this language understanding. When given input, the LLM utilises its learned knowledge to make predictions or decisions on which word to display next without relying on explicit human instructions but on probabilities.[7]

The use cases of such LLMs for research are sheer endless. From search for relevant literature, case or document summaries to writing outlines, proofreading, or researching in foreign languages. The key to get the desired output is prompt engineering. A prompt is the input that the LLM is given and the starting point for its predictions. It is therefore important to give the AI as much information as possible, always including a basic instruction, a topic, and an output goal (e.g. “write an argument for a reform of the UN Security Council that works to convince an international scholar”).[8] While there is an abundance of different types of prompting, I will here focus on some basics. During “chain-of-thought” prompting the key is to break down bigger tasks into smaller pieces. This technique forces the AI to “think” step‑by‑step and prevents it from filling context gaps by making guesses, which waters down the end result. The output can also be elevated by providing the AI with model examples to consider and shape its response around, the so called “few-shot” prompting. Lastly, “grounded prompting” refers to feeding the AI specific source material, thereby enlarging the given context the AI uses as a basis for its replies. The general rule is: the more context one can provide, the more accurate the output will be.

Nevertheless, the technology has its limits. Training data might be biased and therefore display only certain perspectives. Some material might be copyrighted, and the AI does not indicate whether that is the case. Further, there is a risk of hallucinations, i.e. the production of false or inaccurate information.[9] It is therefore essential to always double check the results independently, ask for sources, and challenge the arguments given. AI is not a one‑fits‑all solution to any issue one might run into as a legal researcher, but rather a way to speed up, facilitate, and enhance research. To ensure ethical use, it should not be used to write full journal articles or books. Not only is AI not capable of doing this in a way that the result lives up to the standards of a human researcher. Rather, one might be confronted with issues of plagiarism or breaching a code of conduct.

Admittedly, all of those technical terms and daily news about broader capabilities and revolutionary inventions can seem intimidating. This past year was only the start of what is to come, with AI rapidly evolving every day. Many scholars fear that their jobs will be swallowed by AI, that their writings become valueless, and that they cannot keep up with outputs produced by machines. However, the key to combat that fear is education on the topic, flexibility and adaptability, and a willingness to incorporate new ways of research into one’s own routine. Many researchers already use AI unknowingly. Google is one of the leading companies in the field of AI development, so when using their platform for a search inquiry, one is automatically confronted with AI algorithms that determine which results will show up.[10] Especially in a research field that encompasses many international resources, most scholars will use a digital translator. One of the most popular products on the market, and also one purchased by MPIL (and licensed by MPG), is DeepL, a tool that works with deeplearning technology to improve its understanding of text and language to ensure that the output even replicates slight linguistic connotations of the input text.[11]

Conclusion

The main advantages we get from a digitalised world lie in the significant reduction of tedious and time-consuming tasks. Especially tasks at the beginning of each research, like finding specific books or trying to figure out which literature to start with, are less time consuming now. We can increase our productivity, start moving to reading and writing much quicker, and focus on mapping out contributions to the scientific discourse. Similarly, we can facilitate the editing process and thereby publish faster, especially when it concerns timely and pressing issues of (international) law.

*** First part of the title by ChatGPT.

[1] ‘Research’, in: Cambridge University Press, Cambridge Dictionary, last accessed 25 March 2024.

[2] Photo: MPIL.

[3] See: MPIL, ‘Recherche’, last accessed 22 March 2024.

[4] See: Irene Pietropaoli, Use of Artificial Intelligence in Legal Practice,   British Institute of International and Comparative Law, 17 October 2023. <https://www.biicl.org/documents/11984_use_of_artificial_intelligence_in_legal_practice_final.pdf> accessed 26 March 2024.

[5] Michaela Fahlbusch, Die Interne Homepage – Das Intranetangebot Des Max-Planck-Instituts Für Ausländisches Öffentliches Recht Und Völkerrecht Heidelberg,  Forum Bibliothek und Information 53 (2001), 256-257, 256. Dietmar Bussmann, Virtuelles Institut, Pressemitteilung 13 February 1998, <https://idw-online.de/de/news2462> accessed 12 April 2024.

[6]  Statista, Number of internet and social media users worldwide as of January 2024, lastaccessed 22 March 2024.

[7] Ashish Vaswani et al., Attention Is All You Need, 31st Conference on Neural Information Processing Systems (NIPS 2017), 5  ; Guodong (Troy) Zhao, How ChatGPT Really Works, Explained for Non-Technical People,  Medium, 19 April 2023   last accessed 22 March 2024.

[8] Daniel Schwarcz and Jonathan H Choi, AI Tools for Lawyers: A Practical Guide, Minnesota Law Review Headnotes 1 (2023) , 5. <

[9] UK Department for Science, Innovation & Technology, Capabilities and Risks from Frontier AI – A Discussion Paper on the Need for Ruther Research into AI Risk,  AI Safety Summit hosted by the UK 1- 2 November 2023.

[10] Justin Burr, 9 Ways We Use AI in Our Products,  The Keyword (blog.google.com) 19 January 2023, last accessed 22 March 2024.

[11] Deep L,  ‘How Does DeepL Work?’, last accessed 22 March 2024.