Conference paper, Year: 2020

Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction

Abstract

We present our contributions to the two tracks of the 2020 FinTOC Shared Task: Table of Contents (ToC) extraction in English documents and in French documents. We describe our work on Title Detection and on ToC Extraction separately. For ToC Extraction, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For Title Detection, we compare surface features to character-based features on various training configurations. We show that title detection results are very sensitive to the kind of training dataset used.
Main file
Daniel_FinTOC2020_TOC_detection.pdf (284.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03024867, version 1 (26-11-2020)

Identifiers

  • HAL Id: hal-03024867, version 1

Cite

Emmanuel Giguet, Gaël Lejeune, Jean-Baptiste Tanguy. Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction. 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation @COLING’2020, Dec 2020, Barcelona, Spain. ⟨hal-03024867⟩