Showing 1–36 of 36 results for author: Utiyama, M

  1. arXiv:2402.11197  [pdf, other]

    cs.CL

    Centroid-Based Efficient Minimum Bayes Risk Decoding

    Authors: Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, Masao Utiyama

    Abstract: Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation. However, MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations. We propose centroid-based MBR (CBMBR) decoding to improve the speed of MBR decoding.…

    Submitted 17 February, 2024; originally announced February 2024.

  2. Universal Multimodal Representation for Language Understanding

    Authors: Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

    Abstract: Representation learning is the foundation of natural language processing (NLP). This work presents new methods to employ visual information as assistant signals to general NLP tasks. For each sentence, we first retrieve a flexible number of images either from a light topic-image lookup table extracted over the existing sentence-image pairs or a shared cross-modal embedding space that is pre-traine…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  3. arXiv:2212.00460  [pdf, other]

    cs.CL

    Language Model Pre-training on True Negatives

    Authors: Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

    Abstract: Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones. Taking the former text as positive and the latter as negative samples, the PLM can be trained effectively for contextualized representation. However, the training of such a type of PLMs highly relies on the quality of the automatically constructed samples. Existing PLMs simply treat…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: Accepted by AAAI 2023

  4. arXiv:2108.12599  [pdf, other]

    cs.CL cs.AI cs.HC cs.IR

    Smoothing Dialogue States for Open Conversational Machine Reading

    Authors: Zhuosheng Zhang, Siru Ouyang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

    Abstract: Conversational machine reading (CMR) requires machines to communicate with humans through multi-turn interactions between two salient dialogue states of decision making and question generation processes. In open CMR settings, as the more realistic scenario, the retrieved background knowledge would be noisy, which results in severe challenges in the information transmission. Existing studies common…

    Submitted 2 September, 2021; v1 submitted 28 August, 2021; originally announced August 2021.

    Comments: Accepted by EMNLP 2021 Main Conference

  5. arXiv:2107.12627  [pdf, other]

    cs.CL

    Cross-lingual Transferring of Pre-trained Contextualized Language Models

    Authors: Zuchao Li, Kevin Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

    Abstract: Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant. I…

    Submitted 27 July, 2021; originally announced July 2021.

  6. arXiv:2104.03523  [pdf, ps, other]

    cs.CL

    User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization

    Authors: Shohei Higashiyama, Masao Utiyama, Taro Watanabe, Eiichiro Sumita

    Abstract: Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specifi…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: NAACL-HLT 2021

  7. Text Compression-aided Transformer Encoding

    Authors: Zuchao Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita

    Abstract: Text encoding is one of the most important steps in Natural Language Processing (NLP). It has been done well by the self-attention mechanism in the current state-of-the-art Transformer encoder, which has brought about significant improvements in the performance of many NLP tasks. Though the Transformer encoder may effectively capture general information in its resulting representations, the backbo…

    Submitted 11 February, 2021; originally announced February 2021.

  8. arXiv:2012.15086  [pdf, other]

    cs.CL cs.AI cs.CV

    Accurate Word Representations with Universal Visual Guidance

    Authors: Zhuosheng Zhang, Haojie Yu, Hai Zhao, Rui Wang, Masao Utiyama

    Abstract: Word representation is a fundamental component in neural language understanding models. Recently, pre-trained language models (PrLMs) offer a new performant method of contextualized word representations by leveraging the sequence-level context for modeling. Although the PrLMs generally give more accurate contextualized word representations than non-contextualized models do, they are still subject…

    Submitted 30 December, 2020; originally announced December 2020.

  9. arXiv:2010.05122  [pdf, ps, other]

    cs.CL

    SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

    Authors: Zuchao Li, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita

    Abstract: In this paper, we introduced our joint team SJTU-NICT's participation in the WMT 2020 machine translation shared task. In this shared task, we participated in four translation directions of three language pairs: English-Chinese, English-Polish on supervised machine translation track, German-Upper Sorbian on low-resource and unsupervised machine translation tracks. Based on different conditions of…

    Submitted 10 October, 2020; originally announced October 2020.

    Comments: WMT20

  10. arXiv:2008.01523  [pdf, other]

    cs.CL

    A System for Worldwide COVID-19 Information Aggregation

    Authors: Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, et al. (4 additional authors not shown)

    Abstract: The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-…

    Submitted 11 October, 2020; v1 submitted 27 July, 2020; originally announced August 2020.

    Comments: Accepted to EMNLP 2020 Workshop NLP-COVID

  11. arXiv:2004.10171  [pdf, other]

    cs.CL

    Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation

    Authors: Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

    Abstract: Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. That is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thi…

    Submitted 21 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020

  12. arXiv:2004.04507  [pdf, other]

    cs.CL

    Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios

    Authors: Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

    Abstract: Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages such as Estonian, and UNMT systems usually perform poorly when there is not adequate training corpus for one language. In th…

    Submitted 23 May, 2021; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: Accepted by NAACL 2021

  13. arXiv:2004.03818  [pdf, other]

    cs.CL cs.LG

    Explicit Reordering for Neural Machine Translation

    Authors: Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

    Abstract: In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve state-of-the-art results for various translation tasks. However, Transformer-based NMT only adds representations of positions sequentially to word vectors in the input sente…

    Submitted 8 April, 2020; originally announced April 2020.

  14. arXiv:2004.02127  [pdf, other]

    cs.CL

    Reference Language based Unsupervised Neural Machine Translation

    Authors: Zuchao Li, Hai Zhao, Rui Wang, Masao Utiyama, Eiichiro Sumita

    Abstract: Exploiting a common language as an auxiliary for better translation has a long tradition in machine translation and lets supervised learning-based machine translation enjoy the enhancement delivered by the well-used pivot language in the absence of a source language to target language parallel corpus. The rise of unsupervised neural machine translation (UNMT) almost completely relieves the paralle…

    Submitted 9 October, 2020; v1 submitted 5 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020, ACL Findings

  15. arXiv:2002.12558  [pdf, other]

    cs.CL

    Modeling Future Cost for Neural Machine Translation

    Authors: Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao

    Abstract: Existing neural machine translation (NMT) systems utilize sequence-to-sequence neural networks to generate target translation word by word, and then make the generated word at each time-step and the counterpart in the references as consistent as possible. However, the trained translation model tends to focus on ensuring the accuracy of the generated target word at the current time-step and does no…

    Submitted 28 February, 2020; originally announced February 2020.

  16. arXiv:2002.12549  [pdf, other]

    cs.CL

    Robust Unsupervised Neural Machine Translation with Adversarial Denoising Training

    Authors: Haipeng Sun, Rui Wang, Kehai Chen, Xugang Lu, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

    Abstract: Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community. The main advantage of the UNMT lies in its easy collection of required large training text sentences while with only a slightly worse performance than supervised neural machine translation which requires expensive annotated translation pairs on some translation tasks. In most…

    Submitted 2 December, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

    Comments: Accepted at COLING 2020

  17. arXiv:1912.11980  [pdf, other]

    cs.CL

    Explicit Sentence Compression for Neural Machine Translation

    Authors: Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao

    Abstract: State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework, in which source sentence representation can be well done by an encoder with self-attention mechanism. Though Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for th…

    Submitted 26 December, 2019; originally announced December 2019.

    Comments: Work in progress, part of this work is accepted in AAAI-2020

  18. arXiv:1911.02971  [pdf, other]

    cs.CL cs.CV cs.LG

    Probing Contextualized Sentence Representations with Visual Awareness

    Authors: Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao

    Abstract: We present a universal framework to model contextualized sentence representations with visual awareness that is motivated to overcome the shortcomings of the multimodal parallel data with manual annotations. For each sentence, we first retrieve a diversity of images from a shared cross-modal embedding space, which is pre-trained on a large-scale of text-image pairs. Then, the texts and images are…

    Submitted 7 November, 2019; originally announced November 2019.

  19. arXiv:1910.14528  [pdf, other]

    cs.CL

    Document-level Neural Machine Translation with Associated Memory Network

    Authors: Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, Hai Zhao, Bao-liang Lu

    Abstract: Standard neural machine translation (NMT) is on the assumption that the document-level context is independent. Most existing document-level NMT approaches are satisfied with a smattering sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network. The capacity of the memory network that detecting the most relevant p…

    Submitted 24 August, 2021; v1 submitted 31 October, 2019; originally announced October 2019.

  20. arXiv:1909.00562  [pdf, other]

    cs.DC cs.CL cs.LG cs.NE

    Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation

    Authors: Junya Ono, Masao Utiyama, Eiichiro Sumita

    Abstract: Reduction of training time is an important issue in many tasks like patent translation involving neural networks. Data parallelism and model parallelism are two common approaches for reducing training time using multiple graphics processing units (GPUs) on one machine. In this paper, we propose a hybrid data-model parallel approach for sequence-to-sequence (Seq2Seq) recurrent neural network (RNN)…

    Submitted 9 September, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

    Comments: 9 pages, 4 figures, 5 tables

  21. arXiv:1908.09605  [pdf, ps, other]

    cs.CL

    Revisiting Simple Domain Adaptation Methods in Unsupervised Neural Machine Translation

    Authors: Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, Chenhui Chu

    Abstract: Domain adaptation has been well-studied in supervised neural machine translation (SNMT). However, it has not been well-studied for unsupervised neural machine translation (UNMT), although UNMT has recently achieved remarkable results in several domain-specific language pairs. Besides the inconsistent domains between training data and test data for SNMT, there sometimes exists an inconsistent domai…

    Submitted 5 May, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

  22. arXiv:1809.07043   

    cs.CL

    NICT's Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task

    Authors: Rui Wang, Benjamin Marie, Masao Utiyama, Eiichiro Sumita

    Abstract: This paper presents the NICT's participation in the WMT18 shared parallel corpus filtering task. The organizers provided 1 billion words German-English corpus crawled from the web as part of the Paracrawl project. This corpus is too noisy to build an acceptable neural machine translation (NMT) system. Using the clean data of the WMT18 shared news translation task, we designed several features and…

    Submitted 11 October, 2018; v1 submitted 19 September, 2018; originally announced September 2018.

    Comments: Due to the policy of our institute, with the agreement of all of the authors, we decide to withdraw this paper

  23. arXiv:1809.07037   

    cs.CL

    NICT's Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task

    Authors: Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

    Abstract: This paper presents the NICT's participation to the WMT18 shared news translation task. We participated in the eight translation directions of four language pairs: Estonian-English, Finnish-English, Turkish-English and Chinese-English. For each translation direction, we prepared state-of-the-art statistical (SMT) and neural (NMT) machine translation systems. Our NMT systems were trained with the t…

    Submitted 11 October, 2018; v1 submitted 19 September, 2018; originally announced September 2018.

    Comments: Due to the policy of our institute, with the agreement of all of the authors, we decide to withdraw this paper

  24. arXiv:1808.08482   

    cs.CL

    Exploring Recombination for Efficient Decoding of Neural Machine Translation

    Authors: Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao

    Abstract: In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations. This means that partial hypotheses with different prefixes will be regarded differently no matter how similar they are. However, this might be inefficient since some partial hypotheses can contain only local differences that will not influence futu…

    Submitted 14 October, 2018; v1 submitted 25 August, 2018; originally announced August 2018.

    Comments: Due to the policy of our institute, with the agreement of all of the authors, we decide to withdraw this paper

  25. arXiv:1805.00178  [pdf, other]

    cs.CL

    Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

    Authors: Rui Wang, Masao Utiyama, Eiichiro Sumita

    Abstract: Traditional Neural machine translation (NMT) involves a fixed training procedure where each sentence is sampled once during each epoch. In reality, some sentences are well-learned during the initial few epochs; however, using this approach, the well-learned sentences would continue to be trained along with those sentences that were not well learned for 10-30 epochs, which results in a wastage of t…

    Submitted 2 October, 2019; v1 submitted 1 May, 2018; originally announced May 2018.

    Comments: Revised version of ACL-2018

  26. arXiv:1804.02559  [pdf, other]

    cs.CL

    Guiding Neural Machine Translation with Retrieved Translation Pieces

    Authors: Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, Satoshi Nakamura

    Abstract: One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose so…

    Submitted 7 April, 2018; originally announced April 2018.

    Comments: NAACL 2018

  27. arXiv:1802.07170  [pdf, ps, other]

    cs.CL

    CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++

    Authors: Xiaolin Wang, Masao Utiyama, Eiichiro Sumita

    Abstract: This paper presents an open-source neural machine translation toolkit named CytonMT (https://github.com/arthurxlw/cytonMt). The toolkit is built from scratch only using C++ and NVIDIA's GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that CytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of var…

    Submitted 2 June, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

  28. arXiv:1711.04231  [pdf, other]

    cs.CL

    Syntax-Directed Attention for Neural Machine Translation

    Authors: Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

    Abstract: Attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at fixed-window source words. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the alig…

    Submitted 19 September, 2019; v1 submitted 11 November, 2017; originally announced November 2017.

    Comments: AAAI2018, revised version

  29. arXiv:1711.00309  [pdf, other]

    cs.CL

    Improving Neural Machine Translation through Phrase-based Forced Decoding

    Authors: Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, Satoshi Nakamura

    Abstract: Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main cha…

    Submitted 1 November, 2017; originally announced November 2017.

    Comments: IJCNLP2017

  30. arXiv:1609.04186  [pdf, other]

    cs.CL

    Neural Machine Translation with Supervised Attention

    Authors: Lemao Liu, Masao Utiyama, Andrew Finch, Eiichiro Sumita

    Abstract: The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating an alignment between a target word and source words. Unfortunately, it has been proved to be worse than conventional alignment models in alignment accuracy. In this paper, we analyze and explain this issue from the point of view of reordering, and propose a su…

    Submitted 14 September, 2016; originally announced September 2016.

    Comments: This paper was submitted into COLING2016 on July 10, and it is under review

  31. arXiv:1607.08693  [pdf, ps, other]

    cs.CL

    Connecting Phrase based Statistical Machine Translation Adaptation

    Authors: Rui Wang, Hai Zhao, Bao-Liang Lu, Masao Utiyama, Eiichiro Sumita

    Abstract: Although more additional corpora are now available for Statistical Machine Translation (SMT), only the ones which belong to the same or similar domains with the original corpus can indeed enhance SMT performance directly. Most of the existing adaptation methods focus on sentence selection. In comparison, phrase is a smaller and more fine grained unit for data selection, therefore we propose a stra…

    Submitted 29 July, 2016; originally announced July 2016.

    Comments: under review by COLING-2016

    Journal ref: It is published in COLING 2016

  32. A Novel Bilingual Word Embedding Method for Lexical Translation Using Bilingual Sense Clique

    Authors: Rui Wang, Hai Zhao, Sabine Ploux, Bao-Liang Lu, Masao Utiyama, Eiichiro Sumita

    Abstract: Most of the existing methods for bilingual word embedding only consider shallow context or simple co-occurrence information. In this paper, we propose a latent bilingual sense unit (Bilingual Sense Clique, BSC), which is derived from a maximum complete sub-graph of pointwise mutual information based graph over bilingual corpus. In this way, we treat source and target words equally and a separated…

    Submitted 2 August, 2016; v1 submitted 29 July, 2016; originally announced July 2016.

    Comments: under review by COLING-2016

  33. arXiv:cs/0105001  [pdf, ps, other]

    cs.CL

    Correction of Errors in a Modality Corpus Used for Machine Translation by Using Machine-learning Method

    Authors: Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Qing Ma, Hitoshi Isahara

    Abstract: We performed corpus correction on a modality corpus for machine translation by using such machine-learning methods as the maximum-entropy method. We thus constructed a high-quality modality corpus based on corpus correction. We compared several kinds of methods for corpus correction in our experiments and developed a good method for corpus correction.

    Submitted 2 May, 2001; originally announced May 2001.

    Comments: 9 pages. Computation and Language. This paper is the English translation of our Japanese paper

    ACM Class: H.3.3; I.2.7

  34. arXiv:cs/0103013  [pdf, ps, other]

    cs.CL

    CRL at Ntcir2

    Authors: Masaki Murata, Masao Utiyama, Qing Ma, Hiromi Ozaku, Hitoshi Isahara

    Abstract: We have developed systems of two types for NTCIR2. One is an enhanced version of the system we developed for NTCIR1 and IREX. It submitted retrieval results for JJ and CC tasks. A variety of parameters were tried with the system. It used such characteristics of newspapers as locational information in the CC tasks. The system got good results for both of the tasks. The other system is a portable…

    Submitted 12 March, 2001; originally announced March 2001.

    Comments: 11 pages. Computation and Language. This paper describes our results of information retrieval in the NTCIR2 contest

    ACM Class: H.3.3; I.2.7

  35. arXiv:cs/0008032  [pdf, ps, other]

    cs.CL

    Japanese Probabilistic Information Retrieval Using Location and Category Information

    Authors: Masaki Murata, Qing Ma, Kiyotaka Uchimoto, Hiromi Ozaku, Masao Utiyama, Hitoshi Isahara

    Abstract: Robertson's 2-poisson information retrieval model does not use location and category information. We constructed a framework using location and category information in a 2-poisson model. We submitted two systems based on this framework to the IREX contest, Japanese language information retrieval contest held in Japan in 1999. For precision in the A-judgement measure they scored 0.4926 and 0.4827,…

    Submitted 28 August, 2000; originally announced August 2000.

    Comments: 7,8 pages. Computation and Language. IRAL'2000, Hong Kong, September 30, 2000

    ACM Class: H.3.3; I.2.7

  36. arXiv:cs/9911006  [pdf, ps, other]

    cs.CL

    Question Answering System Using Syntactic Information

    Authors: M. Murata, M. Utiyama, H. Isahara

    Abstract: Question answering task is now being done in TREC8 using English documents. We examined question answering task in Japanese sentences. Our method selects the answer by matching the question sentence with knowledge-based data written in natural language. We use syntactic information to obtain highly accurate answers.

    Submitted 15 November, 1999; originally announced November 1999.

    Comments: 6 pages, 0 figures. Computation and Language

    ACM Class: H.3.3; I.2.7