
Showing 1–42 of 42 results for author: Hämäläinen, M

  1. Securing Hybrid Wireless Body Area Networks (HyWBAN): Advancements in Semantic Communications and Jamming Techniques

    Authors: Simone Soderi, Mariella Särestöniemi, Syifaul Fuada, Matti Hämäläinen, Marcos Katz, Jari Iinatti

    Abstract: This paper explores novel strategies to strengthen the security of Hybrid Wireless Body Area Networks (HyWBANs), essential in smart healthcare and Internet of Things (IoT) applications. Recognizing the vulnerability of HyWBAN to sophisticated cyber-attacks, we propose an innovative combination of semantic communications and jamming receivers. This dual-layered security mechanism protects against u…

    Submitted 24 April, 2024; originally announced April 2024.

    Journal ref: Digital Health and Wireless Solutions, 2024

  2. arXiv:2402.16420  [pdf, other]

    cs.CL

    Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

    Authors: Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hämäläinen

    Abstract: We present our work on predicting United Nations sustainable development goals (SDG) for university courses. We use an LLM named PaLM 2 to generate training data given a noisy human-authored course description as input. We use this data to train several different smaller language models to predict SDGs for university courses. This work contributes to better university level adaptation of SDG…

    Submitted 23 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 3 figures, 2 tables

  3. arXiv:2312.14586  [pdf, other]

    eess.AS cs.SD

    Noise Morphing for Audio Time Stretching

    Authors: Eloi Moliner, Leonardo Fierro, Alec Wright, Matti Hämäläinen, Vesa Välimäki

    Abstract: This letter introduces an innovative method to enhance the quality of audio time stretching by precisely decomposing a sound into sines, transients, and noise and by improving the processing of the latter component. While there are established methods for time-stretching sines and transients with high quality, the manipulation of noise or residual components has lacked robust solutions in prior re…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: submitted to IEEE Signal Processing Letters

  4. arXiv:2305.15380  [pdf, ps, other]

    cs.CL

    Sentiment Analysis Using Aligned Word Embeddings for Uralic Languages

    Authors: Khalid Alnajjar, Mika Hämäläinen, Jack Rueter

    Abstract: In this paper, we present an approach for translating word embeddings from a majority language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore, we align these word embeddings and present a novel neural network model that is trained on English data to conduct sentiment analysis and then applied to endangered language data through the aligned word embeddings. To test ou…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023)

  5. arXiv:2301.01134  [pdf, other]

    cs.MM cs.CL cs.CV

    Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos

    Authors: Khalid Alnajjar, Mika Hämäläinen, Shuo Zhang

    Abstract: We present the first openly available multimodal metaphor annotated corpus. The corpus consists of videos including audio and subtitles that have been annotated by experts. Furthermore, we present a method for detecting metaphors in the new dataset based on the textual content of the videos. The method achieves a high F1-score (62%) for metaphorical labels. We also experiment with other modalitie…

    Submitted 15 December, 2022; originally announced January 2023.

    Comments: Figlang 2022

  6. arXiv:2212.02911  [pdf, ps, other]

    cs.CL

    Modern French Poetry Generation with RoBERTa and GPT-2

    Authors: Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

    Abstract: We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder of the model is a RoBERTa based one while the decoder is based on GPT-2. This way the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generatio…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: ICCC 2022

  7. arXiv:2212.02907  [pdf, other]

    cs.CL

    Emotion Conditioned Creative Dialog Generation

    Authors: Khalid Alnajjar, Mika Hämäläinen

    Abstract: We present a DialGPT based model for generating creative dialog responses that are conditioned based on one of the following emotions: anger, disgust, fear, happiness, pain, sadness and surprise. Our model is capable of producing a contextually apt response given an input sentence and a desired emotion label. Our model is capable of expressing the desired emotion with an accuracy of 0.6. The best…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: NLP4DH 2022

  8. arXiv:2212.02170  [pdf, other]

    cs.CL

    Automatic Generation of Factual News Headlines in Finnish

    Authors: Maximilian Koppatz, Khalid Alnajjar, Mika Hämäläinen, Thierry Poibeau

    Abstract: We present a novel approach to generating news headlines in Finnish for a given news story. We model this as a summarization task where a model is given a news article, and its task is to produce a concise headline describing the main topic of the article. Because there are no openly available GPT-2 models for Finnish, we will first build such a model using several corpora. The model is then fine-…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: INLG 2022

  9. arXiv:2212.02168  [pdf, ps, other]

    cs.CL

    Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog

    Authors: Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

    Abstract: We present a method for extracting a multilingual sentiment annotated dialog data set from Fallout New Vegas. The game developers have preannotated every line of dialog in the game with one of 8 different sentiments: anger, disgust, fear, happy, neutral, pained, sad and surprised. The game has been translated into English, Spanish, German, French and Italian. We conduct experi…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: FDG 2022

  10. arXiv:2211.16992  [pdf, other]

    eess.AS cs.SD

    Extreme Audio Time Stretching Using Neural Synthesis

    Authors: Leonardo Fierro, Alec Wright, Vesa Välimäki, Matti Hämäläinen

    Abstract: A deep neural network solution for time-scale modification (TSM) focused on large stretching factors is proposed, targeting environmental sounds. Traditional TSM artifacts such as transient smearing, loss of presence, and phasiness are heavily accentuated and cause poor audio quality when the TSM factor is four or larger. The weakness of established TSM methods, often based on a phase vocoder stru…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023 on Oct 27, 2022

  11. arXiv:2211.01889  [pdf, other]

    cs.CL

    When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

    Authors: Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

    Abstract: Prerecorded laughter accompanying dialog in comedy TV shows encourages the audience to laugh by clearly marking humorous moments in the show. We present an approach for automatically detecting humor in the Friends TV show using multimodal data. Our model is capable of recognizing whether an utterance is humorous or not and assessing its intensity. We use the prerecorded laughter in the show as…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Outstanding paper award in COLING 2022

  12. arXiv:2207.04453  [pdf]

    cs.CL

    Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP

    Authors: Teemu Pöyhönen, Mika Hämäläinen, Khalid Alnajjar

    Abstract: Role-playing games (RPGs) have a considerable amount of text in video game dialogues. Quite often this text is semi-annotated by the game developers. In this paper, we extract a multilingual dataset of persuasive dialogue from several RPGs. We show the viability of this data in building a persuasion detection system using a natural language processing (NLP) model called BERT. We believe that video…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: DiGRA 2022

  13. arXiv:2205.08024  [pdf, ps, other]

    cs.CL

    Harnessing Multilingual Resources to Question Answering in Arabic

    Authors: Khalid Alnajjar, Mika Hämäläinen

    Abstract: The goal of the paper is to predict answers to questions given a passage of the Qur'an. The answers are always found in the passage, so the task of the model is to predict where an answer starts and where it ends. As the initial data set is rather small for training, we make use of multilingual BERT so that we can augment the training data by using data available for languages other than Arabic. Furth…

    Submitted 16 May, 2022; originally announced May 2022.

  14. arXiv:2112.14153  [pdf, other]

    cs.CL

    Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

    Authors: Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

    Abstract: The study forms a technical report of various tasks that have been performed on the materials collected and published by Finnish ethnographer and linguist, Matthias Alexander Castrén (1813-1852). The Finno-Ugrian Society is publishing Castrén's manuscripts as new critical and digital editions, and at the same time different research groups have also paid attention to these materials. We discuss th…

    Submitted 28 December, 2021; originally announced December 2021.

    Comments: Proceedings of the Workshop on Natural Language Processing for Digital Humanities

  15. arXiv:2112.12489  [pdf, other]

    cs.CL

    TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language

    Authors: Quan Duong, Mika Hämäläinen, Khalid Alnajjar

    Abstract: Measuring the semantic similarity of different texts has many important applications in Digital Humanities research such as information retrieval, document clustering and text summarization. The performance of different methods depends on the length of the text, the domain and the language. This study focuses on experimenting with some of the current approaches to Finnish, which is a morphological…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: Workshop on Natural Language Processing for Digital Humanities (NLP4DH)

  16. arXiv:2111.04574  [pdf]

    cs.CL

    Detecting Depression in Thai Blog Posts: a Dataset and a Baseline

    Authors: Mika Hämäläinen, Pattama Patpong, Khalid Alnajjar, Niko Partanen, Jack Rueter

    Abstract: We present the first openly available corpus for detecting depression in Thai. Our corpus is compiled from expert-verified cases of depression in several online blogs. We experiment with two different LSTM based models and two different BERT based models. We achieve a 77.53% accuracy with a Thai BERT model in detecting depression. This establishes a good baseline for future research on the same c…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: Workshop on Noisy User-generated Text (at EMNLP)

  17. arXiv:2111.03800  [pdf, other]

    cs.CL

    Finnish Dialect Identification: The Effect of Audio and Text

    Authors: Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

    Abstract: Finnish is a language with multiple dialects that not only differ from each other in terms of accent (pronunciation) but also in terms of morphological forms and lexical choice. We present the first approach to automatically detect the dialect of a speaker based on a dialect transcript and transcript with audio recording in a dataset consisting of 23 different dialects. Our results show that the b…

    Submitted 6 November, 2021; originally announced November 2021.

    Comments: EMNLP 2021

  18. arXiv:2109.11326  [pdf, ps, other]

    cs.CL

    The Current State of Finnish NLP

    Authors: Mika Hämäläinen, Khalid Alnajjar

    Abstract: There are a lot of tools and resources available for processing Finnish. In this paper, we survey recent papers focusing on Finnish NLP related to many different subcategories of NLP such as parsing, generation, semantics and speech. NLP research is conducted in many different research groups in Finland, and it is frequently the case that NLP tools and models resulting from academic research are m…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Seventh international workshop on computational linguistics of Uralic languages (IWCLUL)

  19. arXiv:2109.08702  [pdf, other]

    cs.CL

    When a Computer Cracks a Joke: Automated Generation of Humorous Headlines

    Authors: Khalid Alnajjar, Mika Hämäläinen

    Abstract: Automated news generation has become a major interest for news agencies in the past. Oftentimes headlines for such automatically generated news articles are unimaginative as they have been generated with ready-made templates. We present a computationally creative approach for headline generation that can generate humorous versions of existing headlines. We evaluate our system with human judges and…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: Proceedings of the 12th International Conference on Computational Creativity (ICCC 2021)

  20. arXiv:2108.09546  [pdf, ps, other]

    cs.CL

    How Cute is Pikachu? Gathering and Ranking Pokémon Properties from Data with Pokémon Word Embeddings

    Authors: Mika Hämäläinen, Khalid Alnajjar, Niko Partanen

    Abstract: We present different methods for obtaining descriptive properties automatically for the 151 original Pokémon. We train several different word embeddings models on a crawled Pokémon corpus, and use them to automatically rank English adjectives based on how characteristic they are of a given Pokémon. Based on our experiments, it is better to train a model with domain specific data than to use a pret…

    Submitted 21 August, 2021; originally announced August 2021.

    Comments: English translation of Hämäläinen, M., Alnajjar, K. & Partanen, N. (2021). Nettikorpuksen avulla tuotettuja sanavektorimalleja Pokémonien ominaisuuksien kuvaamiseksi. In Saarikivi, T. & Saarikivi, J. (eds.) Turhan tiedon kirja – Tutkimuksista pois jätettyjä sivuja

  21. arXiv:2108.00308  [pdf, ps, other]

    cs.CL

    Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers

    Authors: Mika Hämäläinen, Khalid Alnajjar

    Abstract: We survey human evaluation in papers presenting work on creative natural language generation that have been published in INLG 2020 and ICCC 2020. The most typical human evaluation method is a scaled survey, typically on a 5 point scale, while many other less common methods exist. The most commonly evaluated parameters are meaning, syntactic correctness, novelty, relevance and emotional value, amon…

    Submitted 31 July, 2021; originally announced August 2021.

    Comments: Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

  22. arXiv:2107.03266  [pdf, ps, other]

    cs.CL

    Lemmatization of Historical Old Literary Finnish Texts in Modern Orthography

    Authors: Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

    Abstract: Texts written in Old Literary Finnish represent the first literary work ever written in Finnish starting from the 16th century. There have been several projects in Finland that have digitized old publications and made them available for research use. However, using modern NLP methods on such data poses great challenges. In this paper we propose an approach for simultaneously normalizing and lemmat…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: la 28e Conférence sur le Traitement Automatique des Langues Naturelles (TALN)

  23. arXiv:2106.03391  [pdf, other]

    cs.CL

    Apurinã Universal Dependencies Treebank

    Authors: Jack Rueter, Marília Fernanda Pereira de Freitas, Sidney da Silva Facundes, Mika Hämäläinen, Niko Partanen

    Abstract: This paper presents and discusses the first Universal Dependencies treebank for the Apurinã language. The treebank contains 76 fully annotated sentences, applies 14 parts-of-speech, as well as seven augmented or new features - some of which are unique to Apurinã. The construction of the treebank has also served as an opportunity to develop a finite-state description of the language and facilitate th…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: The First Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP)

  24. arXiv:2106.03389  [pdf, other]

    cs.CL

    Never guess what I heard... Rumor Detection in Finnish News: a Dataset and a Baseline

    Authors: Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter

    Abstract: This study presents a new dataset on rumor detection in Finnish language news headlines. We have evaluated two different LSTM based models and two different BERT models, and have found very significant differences in the results. A fine-tuned FinBERT reaches the best overall accuracy of 94.3% and a rumor label accuracy of 96.0%. However, a model fine-tuned on Multilingual BERT reaches th…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 2021 Workshop on NLP4IF: Censorship, Disinformation, and Propaganda

  25. arXiv:2105.12428  [pdf, other]

    cs.CL

    Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered

    Authors: Mika Hämäläinen, Niko Partanen, Jack Rueter, Khalid Alnajjar

    Abstract: We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages. We present a method for automatically extracting a substantially large amount of training data from FSTs for 22 languages, out of which 17 are endangered. The neural models follow the same tagset as the FSTs in order to make it possible to use them as fallback systems together with the…

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: The 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)

  26. arXiv:2105.05542  [pdf, other]

    cs.CL

    ¡Qué maravilla! Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline

    Authors: Khalid Alnajjar, Mika Hämäläinen

    Abstract: We construct the first ever multimodal sarcasm dataset for Spanish. The audiovisual dataset consists of sarcasm annotated text that is aligned with video and audio. The dataset represents two varieties of Spanish, a Latin American variety and a Peninsular Spanish variety, which ensures a wider dialectal coverage for this global language. We present several models for sarcasm detection that will se…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: Accepted to The Third Workshop on Multimodal Artificial Intelligence (MAI-Workshop)

  27. arXiv:2104.05361  [pdf, ps, other]

    cs.CL

    The Great Misalignment Problem in Human Evaluation of NLP Methods

    Authors: Mika Hämäläinen, Khalid Alnajjar

    Abstract: We outline the Great Misalignment Problem in natural language processing research. This means simply that the problem definition is not in line with the method proposed and the human evaluation is not in line with the definition or the method. We study this misalignment problem by surveying 10 randomly sampled papers published in ACL 2020 that report results with human evaluation. Our results sho…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Workshop on Human Evaluation of NLP Systems at EACL 2021

  28. From Plenipotentiary to Puddingless: Users and Uses of New Words in Early English Letters

    Authors: Tanja Säily, Eetu Mäkelä, Mika Hämäläinen

    Abstract: We study neologism use in two samples of early English correspondence, from 1640–1660 and 1760–1780. Of especial interest are the early adopters of new vocabulary, the social groups they represent, and the types and functions of their neologisms. We describe our computer-assisted approach and note the difficulties associated with massive variation in the corpus. Our findings include that while m…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: In Multilingual Facilitation (2021)

  29. Endangered Languages are not Low-Resourced!

    Authors: Mika Hämäläinen

    Abstract: The term low-resourced has been tossed around in the field of natural language processing to a degree that almost any language that is not English can be called "low-resourced"; sometimes even just for the sake of making a mundane or mediocre paper appear more interesting and insightful. In a field where English is a synonym for language and low-resourced is a synonym for anything not English, cal…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: In Multilingual Facilitation (2021)

  30. arXiv:2012.05331  [pdf]

    cs.CL

    Speech Recognition for Endangered and Extinct Samoyedic languages

    Authors: Niko Partanen, Mika Hämäläinen, Tiina Klooster

    Abstract: Our study presents a series of experiments on speech recognition with endangered and extinct Samoyedic languages, spoken in Northern and Southern Siberia. To the best of our knowledge, this is the first time a functional ASR system is built for an extinct language. For the Kamas language, we achieve a Label Error Rate of 15%, and conclude through careful error analysis that this quality is already very u…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: the 34th Pacific Asia Conference on Language, Information and Computation

  31. Normalization of Different Swedish Dialects Spoken in Finland

    Authors: Mika Hämäläinen, Niko Partanen, Khalid Alnajjar

    Abstract: Our study presents a dialect normalization method for different Finland Swedish dialects covering six regions. We tested 5 different models, and the best model improved the word error rate from 76.45 to 28.58. Contrary to results reported in earlier research on Finnish dialects, we found that training the model with one word at a time gave the best results. We believe this is due to the size of the tr…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: In Proceedings of the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities (GeoHumanities'20)

  32. arXiv:2012.02578  [pdf, other]

    cs.CL

    Ve'rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement

    Authors: Khalid Alnajjar, Mika Hämäläinen, Jack Rueter, Niko Partanen

    Abstract: We present an open-source online dictionary editing system, Ve'rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors. The idea is to incorporate community activities into a state-of-the-art finite-state language description of a seriously endangered minority language, Skolt Sami. Problems involve getting the community to take p…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

  33. arXiv:2011.03502  [pdf, ps, other]

    cs.CL cs.LG

    An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish

    Authors: Quan Duong, Mika Hämäläinen, Simon Hengchen

    Abstract: Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems. Correcting these errors manually is a time-consuming process and a great part of the automatic approaches have been relying on rules or supervised machine learning. We build on previous work on fully au…

    Submitted 6 November, 2020; originally announced November 2020.

  34. arXiv:2010.05269  [pdf, other]

    cs.CL

    Automated Prediction of Medieval Arabic Diacritics

    Authors: Khalid Alnajjar, Mika Hämäläinen, Niko Partanen, Jack Rueter

    Abstract: This study uses a character level neural machine translation approach trained on a long short-term memory-based bi-directional recurrent neural network architecture for diacritization of Medieval Arabic. The results improve on the online tool used as a baseline. A diacritization model has been published openly through an easy to use Python package available on PyPI and Zenodo. We have found tha…

    Submitted 11 October, 2020; originally announced October 2020.

  35. arXiv:2009.02685  [pdf, ps, other]

    cs.CL

    Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity

    Authors: Mika Hämäläinen, Niko Partanen, Khalid Alnajjar, Jack Rueter, Thierry Poibeau

    Abstract: We present a novel approach for adapting text written in standard Finnish to different dialects. We experiment with character level NMT models using both multi-dialectal and transfer learning approaches. The models are tested with over 20 different dialects. The results seem to favor transfer learning, although not strongly over the multi-dialectal approach. We study the influence dialectal a…

    Submitted 6 September, 2020; originally announced September 2020.

    Comments: In proceedings of the Eleventh International Conference on Computational Creativity

  36. arXiv:2004.14062  [pdf, ps, other]

    cs.CL

    Morphological Disambiguation of South Sámi with FSTs and Neural Networks

    Authors: Mika Hämäläinen, Linda Wiechetek

    Abstract: We present a method for conducting morphological disambiguation for South Sámi, which is an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020)

  37. arXiv:2004.04803  [pdf, other]

    cs.CL cs.FL

    FST Morphology for the Endangered Skolt Sami Language

    Authors: Jack Rueter, Mika Hämäläinen

    Abstract: We present advances in the development of an FST-based morphological analyzer and generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami exhibits a rich morphology, on the one hand, and there is little gold standard material for it, on the other. This makes NLP approaches for its study difficult without a solid morphological analysis. The language is severely endangered and th…

    Submitted 9 April, 2020; originally announced April 2020.

    Comments: Accepted to The 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020)

  38. arXiv:1910.13946  [pdf, other]

    cs.CL

    Let's FACE it. Finnish Poetry Generation with Aesthetics and Framing

    Authors: Mika Hämäläinen, Khalid Alnajjar

    Abstract: We present a creative poem generator for the morphologically rich Finnish language. Our method falls into the master-apprentice paradigm, where a computationally creative genetic algorithm teaches a BRNN model to generate poetry. We model several parts of poetic aesthetics in the fitness function of the genetic algorithm, such as sonic features, semantic coherence, imagery and metaphor. Furthermor…

    Submitted 30 October, 2019; originally announced October 2019.

    Journal ref: Proceedings of the 12th International Conference on Natural Language Generation (INLG 2019)

  39. From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction

    Authors: Mika Hämäläinen, Simon Hengchen

    Abstract: Many historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process. Correcting these errors manually is a time-consuming process and a great part of the automatic approaches have been relying on rules or supervised machine learning. We present a fully automatic unsupervised way of extracting parallel data for trainin…

    Submitted 12 October, 2019; originally announced October 2019.

    Journal ref: Proceedings of Recent Advances in Natural Language Processing. Angelova, G., Mitkov, R., Nikolova, I. & Temnikova, I. (eds.). Shoumen: INCOMA, pp. 432-437 (2019)

  40. arXiv:1909.02636  [pdf]

    q-bio.QM cs.LG stat.ML

    Contextual Minimum-Norm Estimates (CMNE): A Deep Learning Method for Source Estimation in Neuronal Networks

    Authors: Christoph Dinh, John GW Samuelsson, Alexander Hunold, Matti S Hämäläinen, Sheraz Khan

    Abstract: Magnetoencephalography (MEG) and Electroencephalography (EEG) source estimates have thus far mostly been derived sample by sample, i.e., independent of each other in time. However, neuronal assemblies are heavily interconnected, constraining the temporal evolution of neural activity in space as detected by MEG and EEG. The observed neural currents are thus highly context dependent. Here, a new met…

    Submitted 5 September, 2019; originally announced September 2019.

    Comments: 14 pages, 9 figures

  41. arXiv:1907.04954  [pdf, other]

    cs.CL

    Modelling the Socialization of Creative Agents in a Master-Apprentice Setting: The Case of Movie Title Puns

    Authors: Mika Hämäläinen, Khalid Alnajjar

    Abstract: This paper presents work on modelling the social psychological aspect of socialization in the case of a computationally creative master-apprentice system. In each master-apprentice pair, the master, a genetic algorithm, is seen as a parent for its apprentice, which is an NMT based sequence-to-sequence model. The effect of different parenting styles on the creative output of each pair is in the foc…

    Submitted 10 July, 2019; originally announced July 2019.

    Journal ref: Proceedings of the 10th International Conference on Computational Creativity. Grace, K., Cook, M., Ventura, D. & Maher, M. L. (eds.). Association for Computational Creativity, pp. 266-273 (2019)

  42. arXiv:1803.04364  [pdf]

    q-bio.NC cs.DM stat.ML

    Maturation Trajectories of Cortical Resting-State Networks Depend on the Mediating Frequency Band

    Authors: Sheraz Khan, Javeria Hashmi, Fahimeh Mamashli, Konstantinos Michmizos, Manfred Kitzbichler, Hari Bharadwaj, Yousra Bekhti, Santosh Ganesan, Keri A Garel, Susan Whitfield-Gabrieli, Randy Gollub, Jian Kong, Lucia M Vaina, Kunjan Rana, Steven Stufflebeam, Matti Hamalainen, Tal Kenet

    Abstract: The functional significance of resting state networks and their abnormal manifestations in psychiatric disorders are firmly established, as is the importance of the cortical rhythms in mediating these networks. Resting state networks are known to undergo substantial reorganization from childhood to adulthood, but whether distinct cortical rhythms, which are generated by separable neural mechanisms…

    Submitted 12 February, 2018; originally announced March 2018.