A Parallel Max-Miner Algorithm For An Efficient Association Rules Learning (ARL) And Knowledge Mining Using The Intel® oneAPI Toolkit
Arthur V. Ratz
Lviv, Lviv Oblast
This project demonstrates the using of the Intel® oneAPI library to deliver a modern code in Data Parallel C++, implementing a Parallel Max-Miner algorithm to optimize the performance of the association rules learning (ARL) process ...learn more
Project status: Published/In Market
Overview / Usage
Nowadays, a variety of complex multi-dimensional data is generated as a part of the modern scientifical and commercial processes. Naturally, the various of those knowledge, trends and insights might be acquired from the complex multi-dimensional data, being generated, or delivered, from the “edge”, by real-time streams.
In fact, the computer-aided knowledge discovery is one of the most complex, and, at the same time, interesting problems of engineering and computer science, having a widespread of applications in many fields of science and technology, including mathematical and economical statistics, marketing and e-commerce, medical care and bionomics, linguistics, information security, computer graphics and visualization, geolocational systems, artificial intelligence (AI), expert systems, etc.
For a past decade, the knowledge mining has emerged as an efficient approach, used as an essential part of the various data mining solutions, addressing the specific real-world problems, which could not be previously accomplished by solely performing the conventional data processing, itself.
Presently, there is a number of methods for discovering a potentially “interesting” knowledge and insights.
The Association Rules Learning (ARL) is an approach that allows us to do the knowledge discovery efficiently. To perform the association rules learning, the various of data mining algorithms are used.
Specifically, the famous Apriori and FPGrowth algorithms allow to perform the finding of the potentially “interesting” and “meaningful” association rules, exhibited on the arbitrary input data, arranged into sets of transactions, from which the various of knowledge and insights are acquired.
However, the dynamically growing amounts of multi-dimensional big-data, generated or delivered in real-time, typically makes it impossible to use the following algorithms for performing the efficient association rules learning.
Obviously, that, we need a slightly different approach to perform the finding of those association rules, exhibited on the big-data, the potential sizes of which, are growing exponentially.
The Max-Miner algorithm formulated and introduced by Roberto J. Bayardo Jr., in his research work, is a highly efficient alternative to the classical algorithms, mentioned above. Unlike the Apriori and FPGrowth, the Max-Miner algorithm allows to significantly decrease the complexity of the association rules process, from the NP-hard to just O(N^3), as well as to reduce the number of dimensions of the entire association rules learning process. This, in turn, makes it possible to perform the association rules learning in the typically huge amounts of big-data.
Another benefit of using the Max-Miner algorithm is that the entire process of association rules learning, in this case, can be scaled, providing an even more performance speed-up and efficiency, while being executed on the multi-core symmetric Intel® CPUs and other innovative hardware acceleration targets, such as GPGPUs or FPGAs. The following, basically, allows to finally eliminate the persistent problem of the sufficient performance degrades due to the enormously high computational complexity and potentially huge amounts of big-data, itself.
The main goal of this project is to demonstrate the using of Intel® oneAPI HPC Toolkit and other specific performance libraries and tools to deliver a modern code, implementing the parallel Max-Miner algorithm, applying it for finding those “interesting” and “meaningful” association rules, acquiring knowledge and insights, exhibited on the data, being processed.
Methodology / Approach
Roberto J. Bayardo Jr., at the **IBM Almaden Research Center **(San Jose, CA) and **MIT **(Cambridge, MA), in his incredible research work -“Efficiently Mining Long Patterns from Databases” (1998), formulated and introduced a generalized “vanilla” **Max Miner Association Rules Learning (ARL) Algorithm **(https://www.bayardo.org/ps/sigmod98.pdf);
Abhi Sharma, at the “**Stealth-Mode” Startup Project (Mountain View, CA), designed an amazing presentation slides, published in Prezi, demonstrating the using of Max Miner algorithm, by an example, previously formulated by Roberto J. Bayardo Jr., **in his research work, listed above (https://prezi.com/dut1qsejzdwe/max-miner-algorithm/);
Akobir Shakhidi, at the **Loginom Holding **(Moscow, Russia), in a series of his articles, thoroughly discussed about the **Association Rules Learning (ARL) **concept, based on a several of the known classical algorithms (Apriori, FPGrowth, ...), used by Loginom Holding for providing the association rules learning (ARL) functionality of their branded **Loginom AI Analytics Suite **( https://loginom.com/ );
Technologies Used
Intel® oneAPI HPC Toolkit, Data Parallel C (DPC)
Documents and Presentations
Repository
https://github.com/arthurratz/intel_max_miner_oneapi