Design and Implementation of Word Segmentation for Search Engines
Abstract:
In Chinese search engines, word segmentation plays a central role, and its results directly affect search engine performance. Current Chinese word segmentation methods fall into three categories: methods based on string matching, methods based on understanding, and methods based on statistics. The biggest problems encountered in the development of Chinese word segmentation are ambiguity resolution and new-word recognition. Future work on Chinese word segmentation must both solve these problems, so as to reach higher segmentation accuracy, and pursue industry-specific segmentation to keep widening its range of applications. Building on a study of segmentation algorithms, this paper also designs and implements a system that segments the Chinese text on web pages returned by a search. Experimental results show that the system segments well and that the segmentation algorithm is feasible, which is of practical significance for search engine development.
Keywords: search engine; Chinese word segmentation; string matching
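The abstract names string matching as one of the three families of segmentation methods but does not say which variant the system uses. As an illustration only, the sketch below shows forward maximum matching, a common dictionary-based string-matching technique: at each position it takes the longest dictionary word that matches and falls back to a single character when nothing matches. The function name `forward_max_match` and the toy dictionary `TOY_DICT` are assumptions made for this example and are not taken from the paper.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by greedily taking the longest dictionary word at each position."""
    if max_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"搜索", "引擎", "搜索引擎", "中文", "分词",
            "研究", "研究生", "生命", "起源"}

print(forward_max_match("中文搜索引擎分词", TOY_DICT))
# ['中文', '搜索引擎', '分词']
print(forward_max_match("研究生命起源", TOY_DICT))
# ['研究生', '命', '起源']
```

The second call shows the ambiguity problem the abstract mentions: greedy matching picks 研究生 ("graduate student") and leaves 命 stranded, whereas the intended reading is 研究 / 生命 / 起源 ("study the origin of life"). Words missing from the dictionary fall apart into single characters in the same way, which is the new-word recognition problem.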