Text Feature Extraction for Web Document Preprocessing
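Abstract: This paper designs a feature extraction method based on word-frequency space. The method counts the occurrence frequency (term frequency) of each item in a word-frequency matrix. It then selects a predetermined number of items, ranked by frequency, to form the feature subset (i.e., the keywords). Documents are first segmented into words with the maximum matching algorithm. The resulting words are loaded into a word-frequency matrix, the frequency of each item in the matrix is counted, and the text features are extracted.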
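The pipeline summarized above has three steps: maximum-matching segmentation, a word-frequency matrix, and frequency-based feature selection. It might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes forward maximum matching, because the abstract does not say which matching direction is used. It also assumes a tiny hypothetical dictionary `DICTIONARY` and ranks terms by their raw counts summed across documents. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy dictionary (assumption); a real system would load a full lexicon.
DICTIONARY = {"文本", "特征", "提取", "词频", "矩阵", "特征提取", "词语", "切分"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def forward_max_match(text, dictionary=DICTIONARY, max_len=MAX_WORD_LEN):
    """Segment text by forward maximum matching (assumed variant)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback when nothing matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def term_frequency_matrix(documents):
    """Build a documents x terms matrix of raw term counts."""
    counts = [Counter(forward_max_match(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


def select_features(vocabulary, matrix, k):
    """Sum each term's column and keep the k most frequent terms as features."""
    totals = [sum(row[j] for row in matrix) for j in range(len(vocabulary))]
    # Sort by descending frequency; ties are broken by the term itself.
    ranked = sorted(zip(vocabulary, totals), key=lambda p: (-p[1], p[0]))
    return ranked[:k]


if __name__ == "__main__":
    docs = ["文本特征提取", "词频矩阵特征提取"]
    print(forward_max_match(docs[0]))  # ['文本', '特征提取']
    vocab, tf = term_frequency_matrix(docs)
    print(select_features(vocab, tf, 1))  # [('特征提取', 2)]
```

Summing raw counts is the simplest reading of "select items by frequency." A weighting such as TF-IDF could be substituted in `select_features` without changing the other two steps.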
Keywords: word-frequency matrix; feature; word-frequency space; maximum matching algorithm; word segmentation