Journal of the Operations Research Society of China ›› 2020, Vol. 8 ›› Issue (4): 601-617. doi: 10.1007/s40305-019-00270-z


Proposal of Japanese Vocabulary Difficulty Level Dictionaries for Automated Essay Scoring Support System Using Rubric

Megumi Yamamoto1, Nobuo Umemura2, Hiroyuki Kawano3   

  1 School of Contemporary International Studies, Nagoya University of Foreign Studies, Iwasaki-cho, Nisshin 470-0197, Japan;
  2 School of Media and Design, Nagoya University of Arts and Sciences, Nagoya 470-0196, Japan;
  3 Faculty of Science and Engineering, Nanzan University, Nagoya 466-8673, Japan
  • Received: 2018-12-18  Revised: 2019-07-16  Online: 2020-12-30  Published: 2020-12-29
  • Contact: Megumi Yamamoto, Nobuo Umemura, Hiroyuki Kawano  E-mail: yamamoto@nufs.ac.jp; d_chaser@nuas.ac.jp; kawano@nanzan-u.ac.jp
  • Supported by:
    This work was supported by JSPS KAKENHI (Nos. 18K11589, 17K00432).

Abstract: We are developing a Moodle plug-in that serves as an automated essay scoring (AES) support system for the basic education of university students. The system evaluates essays against a rubric with five evaluation viewpoints: “Contents, Structure, Evidence, Style, and Skill”. Vocabulary level is one of the scoring items under Skill and is calculated using the Japanese Language Learners’ Dictionaries constructed by Sunakawa et al. Because these dictionaries do not fully cover the words used in student-level essays, the accuracy of vocabulary level scoring suffers. In this paper, we propose constructing comprehensive Japanese vocabulary difficulty level dictionaries using Japanese Wikipedia as the corpus. We apply Latent Dirichlet Allocation (LDA) to the Wikipedia corpus and use the word appearance probability as one of the indexes of word difficulty. For words that rarely appear, we use the TF-IDF value instead of the LDA value. As a result, we constructed highly comprehensive Japanese vocabulary difficulty level dictionaries. We confirmed that, using the constructed dictionaries, the vocabulary level can be scored for all words in the test data set.
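
The fallback described in the abstract (LDA-derived probability for most words, TF-IDF for rarely appearing ones) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the `rare_threshold` cut-off, the toy tokenized corpus, and the hard-coded `lda_prob` values standing in for probabilities that a real LDA run would produce.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return, for each word, its maximum TF-IDF value over a tokenized corpus."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = {}
    for doc in docs:
        tf = Counter(doc)
        for word, count in tf.items():
            value = (count / len(doc)) * math.log(n_docs / df[word])
            scores[word] = max(scores.get(word, 0.0), value)
    return scores


def difficulty_source(word, lda_prob, tfidf, counts, rare_threshold=2):
    """Choose the index backing a word: LDA probability, or TF-IDF if rare.

    rare_threshold is a hypothetical cut-off; the abstract does not state one.
    """
    if counts.get(word, 0) < rare_threshold or word not in lda_prob:
        return ("tfidf", tfidf.get(word, 0.0))
    return ("lda", lda_prob[word])


# Toy tokenized corpus (a stand-in for the Wikipedia corpus).
docs = [
    ["student", "essay", "score"],
    ["student", "word", "score"],
    ["student", "word", "latent", "word"],
]
counts = Counter(word for doc in docs for word in doc)
tfidf = tfidf_scores(docs)

# Placeholder word appearance probabilities, NOT the output of a real LDA model.
lda_prob = {"student": 0.30, "word": 0.25, "score": 0.20, "essay": 0.01}

for w in ["student", "word", "essay", "latent"]:
    print(w, difficulty_source(w, lda_prob, tfidf, counts))
```

In this sketch, frequent words keep their LDA-based value, while words below the assumed frequency cut-off (or missing from the LDA vocabulary) fall back to TF-IDF, so every word in the corpus receives some index.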

Key words: Automated essay scoring, Vocabulary level, Dictionary, Wikipedia, Corpus, LDA, Rubric
