Dataset Card for BnL Historical Newspapers

Dataset Summary

The BnL (Bibliothèque nationale du Luxembourg) has digitised over 800,000 pages of Luxembourg newspapers. This dataset currently has one configuration, covering a subset of these newspapers that is published in the BnL's "Processed Datasets" collection. The BnL:

processed all newspapers and monographs that are in the public domain and extracted the full text and associated metadata of every single article, section, advertisement, etc. The result is a large number of small, easy-to-use XML files formatted using Dublin Core.
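To illustrate how a Dublin Core record could map onto this dataset's fields, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The namespace URIs are the published Dublin Core and DCMI Terms URIs, but the XML record itself is hypothetical: it is assembled from the example instance in this card, and the real BnL export files likely use a different root element and contain more elements. The `parse_record` helper is likewise illustrative, not part of any BnL or Hugging Face tooling.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for Dublin Core elements and DCMI metadata terms.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical, simplified record built from this card's example instance.
# The actual BnL XML files may be structured differently.
SAMPLE_XML = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:identifier>https://persist.lu/ark:/70795/wx8r4c/articles/DTL47</dc:identifier>
  <dc:title>Asien.</dc:title>
  <dc:date>1853-03-23</dc:date>
  <dc:publisher>Verl. der St-Paulus-Druckerei</dc:publisher>
  <dc:source>newspaper/luxwort/1853-03-23</dc:source>
  <dc:language>de</dc:language>
  <dcterms:isPartOf>Luxemburger Wort</dcterms:isPartOf>
  <dcterms:extent>49</dcterms:extent>
</record>"""


def parse_record(xml_text: str) -> dict:
    """Map selected Dublin Core elements onto the dataset's field names."""
    root = ET.fromstring(xml_text)

    def text_of(path: str):
        # Return the element's text, or None if the element is absent.
        node = root.find(path, NS)
        return node.text if node is not None else None

    extent = text_of("dcterms:extent")
    return {
        "id": text_of("dc:identifier"),
        "title": text_of("dc:title"),
        "pub_date": text_of("dc:date"),
        "publisher": text_of("dc:publisher"),
        "source": text_of("dc:source"),
        "language": text_of("dc:language"),
        "ispartof": text_of("dcterms:isPartOf"),
        "extent": int(extent) if extent is not None else None,
    }


record = parse_record(SAMPLE_XML)
print(record["ispartof"], record["extent"])  # Luxemburger Wort 49
```

Missing elements come back as `None` rather than raising, since small per-article files may not all carry every element.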

[More Information Needed]

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

The dataset currently contains a single configuration.

Data Instances

An example instance from the dataset:

{'id': 'https://persist.lu/ark:/70795/wx8r4c/articles/DTL47',
 'article_type': 8,
 'extent': 49,
 'ispartof': 'Luxemburger Wort',
 'pub_date': datetime.datetime(1853, 3, 23, 0, 0),
 'publisher': 'Verl. der St-Paulus-Druckerei',
 'source': 'newspaper/luxwort/1853-03-23',
 'text': 'Asien. Eine neue Nedcrland-Post ist angekommen mil Nachrichten aus Calcutta bis zum 5. Febr.; Vom» vay, 12. Febr. ; Nangun und HongKong, 13. Jan. Die durch die letzte Post gebrachle Nachricht, der König von Ava sei durch seinen Bruder enlhronl worden, wird bestätigt. (K. Z.) Verantwortl. Herausgeber, F. Schümann.',
 'title': 'Asien.',
 'url': 'http://www.eluxemburgensia.lu/webclient/DeliveryManager?pid=209701#panel:pp|issue:209701|article:DTL47',
 'language': 'de'
}
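The example can be cross-checked against the field descriptions under Data Fields. The sketch below rests on two assumptions that the card does not confirm: that the integer `article_type` is an index into the list of type names in the order given under Data Fields, and that `extent` is a plain whitespace word count. For this particular instance, a whitespace split of `text` does reproduce the value 49.

```python
# Fields from the example instance above, copied verbatim (OCR errors included).
example = {
    "article_type": 8,
    "extent": 49,
    "text": (
        "Asien. Eine neue Nedcrland-Post ist angekommen mil Nachrichten aus "
        "Calcutta bis zum 5. Febr.; Vom» vay, 12. Febr. ; Nangun und HongKong, "
        "13. Jan. Die durch die letzte Post gebrachle Nachricht, der König von "
        "Ava sei durch seinen Bruder enlhronl worden, wird bestätigt. (K. Z.) "
        "Verantwortl. Herausgeber, F. Schümann."
    ),
}

# Type names in the order listed under Data Fields. ASSUMPTION: the integer
# article_type is an index into this list.
ARTICLE_TYPES = [
    "ADVERTISEMENT_SECTION", "BIBLIOGRAPHY", "CHAPTER", "INDEX", "CONTRIBUTION",
    "TABLE_OF_CONTENTS", "WEATHER", "SHIPPING", "SECTION", "ARTICLE",
    "TITLE_SECTION", "DEATH_NOTICE", "SUPPLEMENT", "TABLE", "ADVERTISEMENT",
    "CHART_DIAGRAM", "ILLUSTRATION", "ISSUE",
]


def article_type_name(index: int) -> str:
    """Look up a type name by position (under the assumption above)."""
    return ARTICLE_TYPES[index]


def word_count(text: str) -> int:
    """Count whitespace-separated tokens; ASSUMED to match how 'extent' is computed."""
    return len(text.split())


print(article_type_name(example["article_type"]))  # SECTION
print(word_count(example["text"]), example["extent"])  # 49 49
```

If the dataset is loaded with the Hugging Face `datasets` library and `article_type` is a `ClassLabel` feature, `ClassLabel.int2str` on that feature would give the authoritative name instead of relying on the list order.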

Data Fields

  • 'id': A unique and persistent identifier using ARK (Archival Resource Key).
  • 'article_type': The type of the exported item; possible values: ('ADVERTISEMENT_SECTION', 'BIBLIOGRAPHY', 'CHAPTER', 'INDEX', 'CONTRIBUTION', 'TABLE_OF_CONTENTS', 'WEATHER', 'SHIPPING', 'SECTION', 'ARTICLE', 'TITLE_SECTION', 'DEATH_NOTICE', 'SUPPLEMENT', 'TABLE', 'ADVERTISEMENT', 'CHART_DIAGRAM', 'ILLUSTRATION', 'ISSUE').
  • 'extent': The number of words in the text field.
  • 'ispartof': The complete title of the source document, e.g. “Luxemburger Wort”.
  • 'pub_date': The publishing date of the document, e.g. “1848-12-15”.
  • 'publisher': The publisher of the document, e.g. “Verl. der St-Paulus-Druckerei”.
  • 'source': The source of the document, taken from the dc:source element. For example, the value newspaper/luxwort/1848-12-15 means that the article comes from the newspaper “luxwort” (the ID for Luxemburger Wort) issued on 15.12.1848.
  • 'text': The full text of the entire article, section, advertisement, etc., including any titles and subtitles. The text contains no layout information, such as headings, paragraphs or lines.
  • 'title': The main title of the article, section, advertisement, etc.
  • 'url': The link to the BnLViewer on eluxemburgensia.lu for viewing the resource online.
  • 'language': The language of the text; possible values: ('ar', 'da', 'de', 'fi', 'fr', 'lb', 'nl', 'pt').
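The 'source' and 'url' fields pack several identifiers into one string. A minimal sketch for splitting them apart with the standard library follows; it assumes every 'source' value has the three-part layout newspaper/luxwort/1853-03-23 seen in this card, and that viewer URLs carry a `pid` query parameter and a `|`-separated fragment like the example's. Both helper names are hypothetical, and other document kinds (e.g. monographs) may not follow these layouts.

```python
import datetime
from urllib.parse import parse_qs, urlsplit


def parse_source(source: str):
    """Split '<kind>/<title-id>/<YYYY-MM-DD>' into (kind, title_id, date).

    ASSUMPTION: the three-part layout seen in this card holds for all records.
    """
    kind, title_id, date_str = source.split("/")
    return kind, title_id, datetime.date.fromisoformat(date_str)


def parse_viewer_url(url: str):
    """Return (issue pid, article id) from an eluxemburgensia.lu viewer link."""
    parts = urlsplit(url)
    pid = parse_qs(parts.query)["pid"][0]
    # Fragment such as 'panel:pp|issue:209701|article:DTL47' -> dict of key/value.
    fragment = dict(item.split(":", 1) for item in parts.fragment.split("|"))
    return pid, fragment.get("article")


record = {
    "id": "https://persist.lu/ark:/70795/wx8r4c/articles/DTL47",
    "source": "newspaper/luxwort/1853-03-23",
    "url": "http://www.eluxemburgensia.lu/webclient/DeliveryManager?pid=209701#panel:pp|issue:209701|article:DTL47",
}

print(parse_source(record["source"]))  # ('newspaper', 'luxwort', datetime.date(1853, 3, 23))
pid, article = parse_viewer_url(record["url"])
print(pid, article)  # 209701 DTL47
# The article id in the URL matches the last path segment of the ARK identifier.
print(record["id"].rsplit("/", 1)[-1] == article)  # True
```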

Data Splits

This dataset contains a single split: 'train'.
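Because only a 'train' split is provided, a held-out set has to be carved out by the user. The sketch below does a seeded random split over a plain list of records; the function name and the 10% default are illustrative choices, not part of the dataset. With the `datasets` library, `Dataset.train_test_split(test_size=..., seed=...)` serves the same purpose.

```python
import random


def train_validation_split(records, validation_fraction=0.1, seed=42):
    """Shuffle a copy of the records with a fixed seed and cut off a validation slice."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global state is untouched
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


records = [{"id": i} for i in range(100)]
train, validation = train_validation_split(records)
print(len(train), len(validation))  # 90 10
```

A random split mixes issues from the same day and newspaper across both sets; splitting by 'pub_date' or 'source' instead may give a more realistic evaluation for historical text.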

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@misc{bnl_newspapers,
  title={Historical Newspapers},
  url={https://data.bnl.lu/data/historical-newspapers/},
  author={{Bibliothèque nationale du Luxembourg}},
}

Contributions

Thanks to @davanstrien for adding this dataset.
