HPC User Report from P. Uhrig (Chair of English Linguistics)

Multimodal Corpus Linguistics

Contact:

Dr. Peter Uhrig
Chair of English Linguistics
Friedrich-Alexander-Universität Erlangen-Nürnberg

Mainly used HPC resources at RRZE

Throughput computing on all of RRZE's HPC clusters

While linguistic annotation of textual data is mainstream in linguistic research, creating such datasets is expensive. For small collections, manual annotation is still an option, but for large quantities of text (i.e. billions of words) only automatic annotation is feasible. Even traditional textual analysis is computationally expensive enough to necessitate HPC resources. When audio-visual data comes into the picture, the required CPU time multiplies.

Motivation and problem definition

The aim of this research is to make collections of texts and audiovisual data searchable and to analyze them with automatic methods. To achieve this, forced alignment of transcripts and audio tracks is performed, followed by image analysis (currently mainly hand movement, head movement, and facial expressions). In addition, a full set of linguistic annotations is run on the data, e.g. PoS tagging, lemmatization, and dependency parsing.
For monomodal text data, the linguistic annotation and an analysis of co-occurrence frequencies are run on the HPC systems.
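
To illustrate what an analysis of co-occurrence frequencies can look like, here is a minimal Python sketch. It is not the project's code: the window-based counting, the function names, and the choice of pointwise mutual information (PMI) as the association measure are assumptions made for illustration only.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count word frequencies and unordered co-occurrences within a token window."""
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in sentences:
        word_freq.update(tokens)
        for i, left in enumerate(tokens):
            # Pair each token with the tokens that follow it inside the window.
            for right in tokens[i + 1 : i + 1 + window]:
                pair_freq[tuple(sorted((left, right)))] += 1
    return word_freq, pair_freq


def pmi(pair, word_freq, pair_freq):
    """Pointwise mutual information (base 2) of an observed pair, from raw counts."""
    n_words = sum(word_freq.values())
    n_pairs = sum(pair_freq.values())
    p_pair = pair_freq[pair] / n_pairs
    p_a = word_freq[pair[0]] / n_words
    p_b = word_freq[pair[1]] / n_words
    return math.log2(p_pair / (p_a * p_b))


# Toy input: already tokenized sentences (real input would be annotated corpus text).
sentences = [
    ["strong", "tea", "is", "served"],
    ["strong", "tea", "and", "strong", "coffee"],
]
word_freq, pair_freq = cooccurrence_counts(sentences, window=1)
score = pmi(("strong", "tea"), word_freq, pair_freq)
print(f"PMI(strong, tea) = {score:.2f}")
```

On corpora with billions of words, counts like these would typically be computed per chunk and merged afterwards, which is one reason such analyses fit throughput-oriented HPC use.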

Methods and codes

The software used for Natural Language Processing (Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/) is mostly off-the-shelf; some features depend on our own code.
For the forced alignment, gentle (https://lowerquality.com/gentle/) is currently used for English.
Gesture recognition is performed by a piece of software developed in the context of the Distributed Little Red Hen Lab (http://redhenlab.org) at Case Western Reserve University by Sergiy Turchyn, based on OpenCV (https://opencv.org).
Some of these tools can make use of multiple CPUs and/or GPUs, but the vast majority of the code is not parallel and thus relies only on throughput computing.
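
Throughput computing with serial code usually means splitting the input into independent pieces and running many single-core jobs side by side. The following Python sketch shows one possible way to do this, assuming a Slurm array job submitted with `--array=0-(N-1)`; the file names and the `annotate` placeholder are hypothetical and do not come from the project.

```python
import os


def files_for_task(files, task_id, num_tasks):
    """Return the files assigned to one task via round-robin partitioning."""
    return [f for i, f in enumerate(sorted(files)) if i % num_tasks == task_id]


def annotate(path):
    """Hypothetical placeholder for a serial annotation step on one file."""
    return f"annotated {path}"


# Slurm sets these variables for array jobs; the defaults allow a local test run.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

# Hypothetical corpus file list; a real run would read it from the file system.
corpus_files = ["part_003.txt", "part_001.txt", "part_002.txt", "part_004.txt"]

for path in files_for_task(corpus_files, task_id, num_tasks):
    print(annotate(path))
```

Because every task works on a disjoint subset of files, no communication between jobs is needed, and the number of array tasks can be chosen to match whatever nodes happen to be free.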

Results

The automatically annotated data enables users of our databases to find relevant data for their research projects, e.g. find abstract grammatical structures, find the exact locations of certain expressions in the video recordings, etc. The annotations are made available via a customized web-based search interface.
For instance, a study of clausal subjects in English, which are nearly impossible to find without syntactic annotation, was carried out on a much larger scale than would otherwise have been possible. In multimodal research, the data is used to find gestures associated with certain constructions in a semi-automatic approach. To what extent a fully automatic approach can be used for different research questions is currently being investigated in an ongoing research project.

Outreach

The following publications/talks report on the infrastructure or present results generated with RRZE’s HPC facilities:

Book:

  • Peter Uhrig (2018): Subjects in English [revised PhD thesis; will be published in spring 2018 in the series Trends in Linguistics. Studies and Monographs with De Gruyter Mouton]

Articles:

  • Stefan Evert/Peter Uhrig/Sabine Bartsch/Thomas Proisl (2017): “E-VIEW-alation – a large-scale evaluation study of association measures for collocation identification.” In Electronic lexicography in the 21st century. Proceedings of the eLex 2017 conference, Leiden, The Netherlands.
  • Peter Uhrig/Thomas Proisl (2012): “Less hay, more needles – using dependency-annotated corpora to provide lexicographers with more accurate lists of collocation candidates.” Lexicographica 28.
  • Thomas Proisl/Peter Uhrig (2012): “Efficient Dependency Graph Matching with the IMS Open Corpus Workbench.” LREC 2012, Istanbul.

Invited Talks:

  • Peter Uhrig (2017): Texts – Sounds – Images: Multimodal Corpus Linguistics. LMU München.
  • Peter Uhrig (2017): Researching co-speech gesture in NewsScape – an integrated workflow for retrieval, annotation, and analysis. International Conference on Multimodal Communication: Developing New Theories and Methods, Osnabrück. [Plenary Workshop on Methods]
  • Peter Uhrig (2017): Demo on multimodal data extraction and annotation. Time concepts and their expression: creativity, cognition, communication: CREATIME workshop, Pamplona, Spain.
  • Peter Uhrig/Thomas Proisl (2012): Sprachstrukturen effizient speichern, verarbeiten und abfragen: Das Erlanger Treebank.info-Projekt. Vortragsreihe Digital Humanities Erlangen. [Repeated for the general public: Lange Nacht der Wissenschaften 2013, Erlangen.]
  • Peter Uhrig (2012): A fast and user-friendly interface for large treebanks. Universität Trier.
  • Peter Uhrig/Thomas Proisl (2011): Treebank.info – Ein System zur Abfrage syntaktisch annotierter Korpora. Otto-Friedrich-Universität Bamberg.

Further Talks and Conference Papers:

  • Stefan Evert/Peter Uhrig/Sabine Bartsch/Thomas Proisl: E-VIEW-alation – a large-scale evaluation study of association measures for collocation identification. Electronic Lexicography in the 21st century: Lexicography from scratch. Leiden (The Netherlands).
  • Peter Uhrig (2017): NewsScape and the Distributed Little Red Hen Lab – A digital infrastructure for the large-scale analysis of TV broadcasts. Anglistentag 2017, Regensburg.
  • Peter Uhrig (2017): Gesture and Argument Structure – gesture as evidence for item-specific and general knowledge. 14th International Cognitive Linguistics Conference, Tartu (Estonia).
  • Peter Uhrig (2017): A corpus infrastructure for accessing multimodal data: NewsScape and the Distributed Little Red Hen Lab. ICAME 38, Prague.
  • Sabine Bartsch/Stefan Evert/Thomas Proisl/Peter Uhrig (2015): (Association) measure for measure: Comparing collocation dictionaries with co-occurrence data for a better understanding of the notion of collocation. ICAME 36, Trier.
  • Thomas Proisl/Peter Uhrig (2012): Using Dependency-Annotated Corpora to Improve Collocation Extraction. ICAME 33, Leuven.
  • Peter Uhrig/Thomas Proisl (2012): Geparste Korpora für alle! Pre-conference workshop at the GAL-Kongress 2012, Erlangen.
  • Peter Uhrig/Thomas Proisl (2011): A fast and user-friendly interface for large treebanks. Corpus Linguistics 2011, Birmingham.
  • Thomas Proisl/Peter Uhrig (2011): Verbesserung der Kollokationsextraktion durch Verwendung dependenzannotierter Korpora. GAL Sektionentagung 2011, Bayreuth.
  • Peter Uhrig (2011): Als die Sprachwissenschaft fast zu einer Naturwissenschaft wurde: Wie der Computer die Sprachforschung revolutioniert hat. Lange Nacht der Wissenschaften, Erlangen.
  • Peter Uhrig/Thomas Proisl (2011): The Erlangen Treebank. Vortragsreihe Approaches to Corpus Linguistics, IZ LVK, Erlangen.
  • Peter Uhrig/Thomas Proisl (2011): The Treebank.info project. Software Demonstration. ICAME 32, Oslo.

Researcher’s Bio and Affiliation

Dr. Peter Uhrig is a researcher at the Chair of English Linguistics. He is currently working on a post-doctoral project on large-scale multimodal corpus linguistics, for which HPC resources are essential.

Erlangen National High Performance Computing Center (NHR@FAU)
Martensstraße 1
91058 Erlangen
Germany