Paper by Our School's PhD Students Wins Best Paper Award at ICDE 2015, a Top Database Conference

Updated: 2015-04-20 17:13:48

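The paper "Short Text Understanding Through Lexical-Semantic Analysis", by Wen Hua (first author) and Zhongyuan Wang (second author), both PhD students of the 2012 cohort in our school's Department of Computer Science, was recently accepted as a full paper at ICDE 2015 (31st IEEE International Conference on Data Engineering), a top international database conference, and received the Best Paper Award. This is the first time Chinese scholars have won a best paper award at a top international database conference.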

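Wen Hua entered our school as a computer science undergraduate in 2006. She won a first prize in the National Information Security Competition and was awarded the first Sa Shixuan Elite Fund Scholarship. After completing her bachelor's degree she stayed at our school to pursue a PhD under Professor Xiaofang Zhou, and the school later sent her to the University of Queensland, Australia, on an exchange program.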

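Zhongyuan Wang entered our school as a computer science undergraduate in 2003. He received the Wu Yuzhang Scholarship, our university's highest student honor, and was named an ACM SIGMOD Undergraduate Scholar, an international scholarship awarded to seven students worldwide. After completing his master's degree he joined Microsoft Research Asia and is pursuing a part-time PhD at our school under Professor Ji-Rong Wen.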

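ICDE is a long-established international conference in database research. Together with SIGMOD and VLDB, it is regarded as one of the three top database conferences, and it is ranked as a Class A conference by the China Computer Federation (CCF). Each year it brings together thousands of scholars from around the world to discuss the frontiers of database research. This year's conference was held in Seoul, South Korea, from April 12 to 16, 2015. Wen Hua and Zhongyuan Wang attended as Best Paper Award recipients and held in-depth exchanges with scholars from many countries.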

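The paper addresses the shortcomings of existing natural language processing (NLP) algorithms in understanding short texts. It proposes a short text understanding framework based on semantic knowledge, which substantially improves the accuracy of short text processing (from 52.6% to 89%). Basic information about the paper follows: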

Title: Short Text Understanding Through Lexical-Semantic Analysis

Authors: Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, Xiaofang Zhou

Abstract: Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing methods cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text processing such as topic modeling. Third, short texts are usually more ambiguous. We argue that knowledge is needed in order to better understand short texts. In this work, we use lexical-semantic knowledge provided by a well-known semantic network for short text understanding. Our knowledge-intensive approach disrupts traditional methods for tasks such as text segmentation, part-of-speech tagging, and concept labeling, in the sense that we focus on semantics in all these tasks. We conduct a comprehensive performance evaluation on real-life data. The results show that knowledge is indispensable for short text understanding, and our knowledge-intensive approaches are effective in harvesting semantics of short texts.