
Abstract
Apriori is a classical data mining algorithm that mines frequent itemsets mainly through a join step and a prune step. The traditional Apriori algorithm, however, has two major problems: (1) a large number of candidate itemsets must be generated while computing the frequent itemsets; (2) the database must be scanned repeatedly, because each generation of candidate itemsets requires another database scan.
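As an illustration of the join and prune steps and of the repeated scans, the following is a minimal single-machine sketch of the traditional Apriori algorithm. The function name `apriori`, the `scans` counter, and the use of an absolute support count are assumptions made for this example and are not taken from the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return (frequent itemsets with support counts, number of database scans).

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Scan 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    scans = 1

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        if not candidates:
            break
        # One full pass over the database for each candidate level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        scans += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result, scans
```

In this sketch the number of scans grows with the size of the largest candidate itemset, which is the second problem listed above.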
This thesis mainly studies how to solve the second of these problems in the traditional Apriori algorithm. The aim is to improve the algorithm's efficiency by reducing the number of scans of the original transaction database, so that large data sets can be processed.
The results show that, compared with the traditional Apriori algorithm, MMR-Apriori, which is built on the MapReduce parallel framework, can not only process big data but also improve the running efficiency of the parallelized Apriori algorithm.
The distinguishing feature of this thesis is the optimized design of the MMR-Apriori algorithm: it builds a frequent 1-itemset index table from the generated frequent 1-itemsets and then builds a transaction compression mapping table from that index table. On the one hand, this greatly compresses the original transaction database; on the other hand, the index table and the transaction compression mapping table together greatly reduce the time needed to count the occurrences of each itemset in the candidate k-itemsets generated at each iteration.
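The abstract does not give the exact layout of the two tables, so the following is only a sketch of one plausible reading. It assumes that the index table maps each frequent item to an integer, and that the transaction compression mapping table maps each distinct transaction (with infrequent items removed and items replaced by their indices) to the number of times it occurs. All function names are hypothetical, and the MapReduce distribution is omitted.

```python
from collections import Counter


def build_index_table(transactions, min_support):
    """Assumed layout: frequent item -> integer index."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent_items = sorted(i for i, c in counts.items() if c >= min_support)
    return {item: idx for idx, item in enumerate(frequent_items)}


def compress_transactions(transactions, index_table):
    """Assumed layout: set of item indices -> number of identical transactions."""
    table = Counter()
    for t in transactions:
        key = frozenset(index_table[i] for i in t if i in index_table)
        if key:  # transactions with no frequent item are dropped
            table[key] += 1
    return table


def support(candidate, index_table, compressed):
    """Count a candidate against the compressed table, not the original database."""
    if any(i not in index_table for i in candidate):
        return 0
    ids = frozenset(index_table[i] for i in candidate)
    return sum(n for key, n in compressed.items() if ids <= key)
```

Under these assumptions the original database is read once to build the index table and once to build the compressed table; later candidate counts then iterate over the compressed entries, which are usually fewer and shorter than the original transactions.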
Keywords: frequent itemsets; MapReduce parallel framework; MMR-Apriori; transaction compression mapping table
