Dataset Card for BnL Historical Newspapers
Dataset Summary
The Bibliothèque nationale du Luxembourg (BnL) has digitised over 800,000 pages of Luxembourg newspapers. This dataset currently has one configuration, covering the subset of these newspapers that sits under the "Processed Datasets" collection. The BnL:
processed all newspapers and monographs that are in the public domain and extracted the full text and associated meta data of every single article, section, advertisement… The result is a large number of small, easy to use XML files formatted using Dublin Core.
[More Information Needed]
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
The dataset currently contains a single configuration.
Data Instances
An example instance from the dataset:
{'id': 'https://persist.lu/ark:/70795/wx8r4c/articles/DTL47',
'article_type': 8,
'extent': 49,
'ispartof': 'Luxemburger Wort',
'pub_date': datetime.datetime(1853, 3, 23, 0, 0),
'publisher': 'Verl. der St-Paulus-Druckerei',
'source': 'newspaper/luxwort/1853-03-23',
'text': 'Asien. Eine neue Nedcrland-Post ist angekommen mil Nachrichten aus Calcutta bis zum 5. Febr.; Vom» vay, 12. Febr. ; Nangun und HongKong, 13. Jan. Die durch die letzte Post gebrachle Nachricht, der König von Ava sei durch seinen Bruder enlhronl worden, wird bestätigt. (K. Z.) Verantwortl. Herausgeber, F. Schümann.',
'title': 'Asien.',
'url': 'http://www.eluxemburgensia.lu/webclient/DeliveryManager?pid=209701#panel:pp|issue:209701|article:DTL47',
'language': 'de'
}
Data Fields
- 'id': This is a unique and persistent identifier using ARK.
- 'article_type': The type of the exported data, possible values ('ADVERTISEMENT_SECTION', 'BIBLIOGRAPHY', 'CHAPTER', 'INDEX', 'CONTRIBUTION', 'TABLE_OF_CONTENTS', 'WEATHER', 'SHIPPING', 'SECTION', 'ARTICLE', 'TITLE_SECTION', 'DEATH_NOTICE', 'SUPPLEMENT', 'TABLE', 'ADVERTISEMENT', 'CHART_DIAGRAM', 'ILLUSTRATION', 'ISSUE')
- 'extent': The number of words in the text field.
- 'ispartof': The complete title of the source document, e.g. “Luxemburger Wort”.
- 'pub_date': The publishing date of the document, e.g. “1848-12-15”.
- 'publisher': The publisher of the document, e.g. “Verl. der St-Paulus-Druckerei”.
- 'source': Describes the source of the document. For example, the dc:source value newspaper/luxwort/1848-12-15 means that this article comes from the newspaper “luxwort” (ID for Luxemburger Wort) issued on 15.12.1848.
- 'text': The full text of the entire article, section, advertisement etc. It includes any titles and subtitles as well. The content does not contain layout information, such as headings, paragraphs or lines.
- 'title': The main title of the article, section, advertisement, etc.
- 'url': The link to the BnLViewer on eluxemburgensia.lu to view the resource online.
- 'language': The language of the text, possible values ('ar', 'da', 'de', 'fi', 'fr', 'lb', 'nl', 'pt')
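The 'source' field described above packs three pieces of information into one string. The following is a minimal sketch of how it could be split apart, assuming every value follows the three-segment pattern seen in the examples (document kind, publication ID, ISO date). The names SourceRef and parse_source are hypothetical helpers introduced for illustration and are not part of the dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical container for the parts of a 'source' value."""
    kind: str          # document kind, e.g. "newspaper"
    publication: str   # short publication ID, e.g. "luxwort"
    issue_date: date   # issue date taken from the last path segment


def parse_source(source: str) -> SourceRef:
    """Split a 'kind/publication/YYYY-MM-DD' string into its parts.

    Assumes exactly three slash-separated segments; a value with a
    different shape raises ValueError.
    """
    kind, publication, raw_date = source.split("/")
    return SourceRef(kind, publication, date.fromisoformat(raw_date))


# The 'source' value from the example instance.
ref = parse_source("newspaper/luxwort/1853-03-23")
print(ref.publication, ref.issue_date.year)
```

Records whose 'source' value does not match this shape would need separate handling, so it may be worth catching ValueError when applying the helper across the whole split.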
Data Splits
This dataset contains a single split, train.
Dataset Creation
Curation Rationale
[More Information Needed]
Source Data
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
[More Information Needed]
Licensing Information
[More Information Needed]
Citation Information
@misc{bnl_newspapers,
title={Historical Newspapers},
url={https://data.bnl.lu/data/historical-newspapers/},
author={Bibliothèque nationale du Luxembourg},
}
Contributions
Thanks to @davanstrien for adding this dataset.