Where?
How?
Current Focus?
“I want slides that are useful to me - not to the one giving the talk”
me
What's the problem anyway?
“While some domains operate with relatively homogeneous text data, historical research is characterized by texts defined by their diversity in form and content, presenting a significant challenge for NLP-tasks.”
NER4all-Paper
“Yet, due to the high linguistic and genre diversity of sources, only limited canonisation of spellings, the level of required historical domain knowledge, and the scarcity of annotated training data, established approaches to natural language processing (NLP) have been both extremely expensive and yielded only unsatisfactory results in terms of recall and precision.”
NER4all-Paper
“The process of detecting and classifying named entities in historical texts is often less straightforward than it appears as it involves a degree of interpretation by domain experts familiar with the specific subject, historical context, and time period.”
NER4all-Paper
Our hypothesis is that LLMs can handle this diversity out of the box, provided the prompt supplies the historical context and domain expertise that established NLP pipelines lack.
How do you even test for this?
“To test our approach, we created ground truth with manually annotated named entities from the 1921 Baedeker travel guide for Berlin[…].”
NER4all-Paper
Why this Resource?
Examples from the 1921 Baedeker Travel Guide for Berlin
“We […] match the extracted spans. We used the most lenient criteria […], meaning it suffices to have an overlap with the annotation and having the correct entity-type annotated.”
“We [allowed] for up to 1 error every 5 generated characters,[…]”
NER4all-Paper
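The two criteria above can be read as one concrete matching rule. A minimal sketch of what it could look like (our reconstruction, not the paper's published evaluation code): a prediction counts as a hit if its span overlaps a gold annotation of the same entity type, and generated surface forms are compared fuzzily, tolerating up to one edit per five generated characters.

```python
# Our reconstruction of the lenient matching criteria quoted above;
# function names and details are ours, not the paper's evaluation code.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(generated: str, source: str) -> bool:
    """Tolerate up to 1 error for every 5 generated characters."""
    return edit_distance(generated, source) <= len(generated) // 5

def is_hit(pred: tuple[int, int, str], gold: tuple[int, int, str]) -> bool:
    """Most lenient span criterion: any overlap plus the correct entity type."""
    (p_start, p_end, p_type), (g_start, g_end, g_type) = pred, gold
    return p_type == g_type and p_start < g_end and g_start < p_end

# A prediction that only clips the gold span still counts as correct ...
assert is_hit((10, 17, "LOC"), (8, 20, "LOC"))
# ... and a slightly garbled generated span still matches its source text.
assert fuzzy_match("Brandenbrug Tor", "Brandenburg Tor")
```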
Is it any good?
“readily-available, state-of-the-art LLMs significantly outperform two leading NLP frameworks, spaCy and flair, for NER in historical documents by seven to twenty-two percent higher F1-Scores.”
NER4all-Paper
| Variant (all 0-shot) | Recall (μ ± σ) | Recall impact | Precision (μ ± σ) | Precision impact | F1-Score (μ ± σ) | F1 impact |
|---|---|---|---|---|---|---|
| Specific Context + PE | 0.84 ± 0.10 | 3.71 % | 0.91 ± 0.08 | 3.99 % | 0.87 ± 0.08 | 3.80 % |
| Specific Context | 0.81 ± 0.19 | 0.00 % | 0.87 ± 0.19 | 0.00 % | 0.84 ± 0.19 | 0.00 % |
| Generic Context + PE | 0.80 ± 0.11 | -2.35 % | 0.92 ± 0.10 | 2.92 % | 0.84 ± 0.10 | -0.07 % |
| No Context | 0.75 ± 0.15 | -7.43 % | 0.90 ± 0.09 | 2.31 % | 0.81 ± 0.11 | -3.64 % |
| Baseline flair | 0.76 ± 0.13 | -6.65 % | 0.89 ± 0.10 | 1.46 % | 0.81 ± 0.11 | -3.86 % |
| Baseline spaCy | 0.71 ± 0.13 | -12.79 % | 0.62 ± 0.11 | -29.32 % | 0.66 ± 0.10 | -21.66 % |

Impact: relative change versus the Specific Context variant (0.00 %).
“surprisingly and against our expectations, zero-shot approaches, […] perform better than few-shot approaches until the number of examples reaches 16 and more.”
NER4all-Paper
“Although the process is less straightforward for historians who rely on domain expertise, we discovered that the LLM-based approach can replicate or even exceed human-level performance for certain tasks.”
NER4all-Paper
What does this mean?
“[LLMs] outperform […] as soon as a bit of contextual information and persona modelling is included in the prompts.”
“Our ablation study shows how providing historical context to the task […] turns focus away from a purely linguistic approach [and] are core to a successful prompting strategy.”
NER4all-Paper
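To make the ablation dimensions in the table below concrete, here is a minimal sketch of how such prompt variants could be assembled. We read “+ PE” as the persona modelling mentioned in the quote; the persona and context wordings are our own stand-ins, not the paper's verbatim prompts.

```python
# An illustration of the ablation dimensions (context level, persona on/off);
# all wordings here are our own stand-ins, not the paper's actual prompts.

PERSONA = "You are a historian specializing in early 20th-century Berlin."

CONTEXTS = {
    "none": "",
    "generic": "The following text comes from a historical document.",
    "specific": (
        "The following text comes from the 1921 Baedeker travel guide "
        "for Berlin."
    ),
}

TASK = (
    "Identify all named entities (persons, locations, organizations) in the "
    "text and return each one together with its entity type."
)

def build_prompt(text: str, context: str = "specific", persona: bool = True) -> str:
    """Assemble one prompt variant: optional persona (PE) + context + task."""
    parts = [PERSONA] if persona else []
    if CONTEXTS[context]:
        parts.append(CONTEXTS[context])
    parts += [TASK, text]
    return "\n\n".join(parts)

# "Specific Context + PE" vs. the bare "No Context" variant:
print(build_prompt("Unter den Linden 17, Café Bauer.", "specific", True))
print(build_prompt("Unter den Linden 17, Café Bauer.", "none", False))
```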
| Variant (0-shot unless noted) | Recall (μ ± σ) | Precision (μ ± σ) | F1-Score (μ ± σ) |
|---|---|---|---|
| Specific Context + PE, 32-shot | 0.89 ± 0.09 | 0.90 ± 0.06 | 0.89 ± 0.06 |
| Specific Context + PE | 0.84 ± 0.10 | 0.91 ± 0.08 | 0.87 ± 0.08 |
| Specific Context | 0.81 ± 0.19 | 0.87 ± 0.19 | 0.84 ± 0.19 |
| Generic Context + PE | 0.80 ± 0.11 | 0.92 ± 0.10 | 0.84 ± 0.10 |
| Generic Context | 0.81 ± 0.11 | 0.90 ± 0.10 | 0.85 ± 0.09 |
| No Context + PE | 0.74 ± 0.15 | 0.91 ± 0.10 | 0.81 ± 0.11 |
| No Context | 0.75 ± 0.15 | 0.90 ± 0.09 | 0.81 ± 0.11 |
“We argue that in order to do so, one has to reconceptualize NER from a purely linguistic task into a humanist endeavour that requires some level of domain expertise and aims at activating the vast body of information LLMs have ingested during their training.”
NER4all-Paper
“We propose a paradigmatic shift in the use of LLMs for NLP tasks: the redefinition of these tasks from a purely linguistic dimension to a content-oriented humanities dimension.”
NER4all-Paper
What now?
“NER [is made available] for all historians by removing the barrier of scripting languages and computational skills required for established NLP tools and instead leveraging natural language prompts and consumer-grade tools and frontends.”
NER4all-Paper
You should try it out yourself!
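As a starting point, a prompt along these lines (our paraphrase of the strategy above, not the paper's verbatim prompt) can be pasted into any consumer chat frontend together with a passage from your source:

```text
You are a historian specializing in early 20th-century Berlin. The following
passage comes from the 1921 Baedeker travel guide for Berlin. Identify all
named entities (persons, locations, organizations) in the passage and list
each one with its entity type.

[paste your passage here]
```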
“In future work, we plan to investigate how well our approach can handle earlier linguistic forms to determine its broader applicability across different historical periods and languages.”
NER4all-Paper
All data and code will be made available shortly.
NER4all-Authors
Questions?