Logic of Data Mining: The GUHA method

The GUHA method is an original Czech method of exploratory data analysis. Its principle is to offer all the interesting facts that the given data yield about the given problem. GUHA is realized by GUHA procedures. The input of a GUHA procedure consists of the analyzed data and a few parameters that define a very large set of potentially interesting patterns. The output is a list of all prime patterns. A pattern is prime if it is true in the analyzed data and does not immediately follow from other, simpler output patterns.

GUHA has been under development since the 1960s. The oldest GUHA procedure is the ASSOC procedure, which mines for association rules. It mines not only for "classical" association rules with confidence and support, but also for additional association rules describing various relations between two Boolean attributes, including relations corresponding to statistical hypothesis tests. The ASSOC procedure has been implemented several times. The implementation is not based on the well-known Apriori algorithm; instead, it represents the analyzed data as suitable strings of bits. The most widely used GUHA procedure today is the 4ft-Miner procedure, which mines for various types of association rules, including conditional rules.
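To make the bit-string idea concrete, the following is a minimal sketch, not the actual ASSOC or 4ft-Miner implementation. It assumes each Boolean attribute is packed into one Python integer (bit i is set when row i satisfies the attribute) and derives the four-fold contingency table of two attributes with bitwise operations. The names `FourFoldTable`, `to_bits`, and `four_fold_table`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourFoldTable:
    """Four-fold contingency table of two Boolean attributes (illustrative)."""
    a: int  # rows satisfying both the antecedent and the succedent
    b: int  # rows satisfying the antecedent only
    c: int  # rows satisfying the succedent only
    d: int  # rows satisfying neither

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def confidence(self) -> float:
        # Share of antecedent rows that also satisfy the succedent.
        covered = self.a + self.b
        return self.a / covered if covered else 0.0

    def support(self) -> float:
        # Share of all rows that satisfy both attributes.
        return self.a / self.n if self.n else 0.0


def to_bits(column) -> int:
    """Pack a sequence of truth values into an int; bit i stands for row i."""
    bits = 0
    for i, value in enumerate(column):
        if value:
            bits |= 1 << i
    return bits


def four_fold_table(ant_bits: int, suc_bits: int, n_rows: int) -> FourFoldTable:
    """Count the four cells using bitwise AND/NOT and a population count."""
    mask = (1 << n_rows) - 1  # keeps negated strings within n_rows bits
    a = bin(ant_bits & suc_bits).count("1")
    b = bin(ant_bits & ~suc_bits & mask).count("1")
    c = bin(~ant_bits & suc_bits & mask).count("1")
    d = n_rows - a - b - c
    return FourFoldTable(a, b, c, d)


# Hypothetical data: eight rows, two Boolean attributes.
smoker = [1, 1, 1, 0, 0, 1, 0, 0]
cough = [1, 1, 0, 0, 1, 1, 0, 0]
table = four_fold_table(to_bits(smoker), to_bits(cough), len(smoker))
print(table, table.confidence(), table.support())
```

In this sketch a rule "smoker implies cough" would be judged from the single table: confidence is a / (a + b) and support is a / n. Other relations between the two attributes, such as those based on statistical tests, could be evaluated from the same four counts without touching the data again.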

Software tools that compute various contingency tables very quickly were developed for the 4ft-Miner procedure. These tools were used to implement five additional GUHA procedures. All of these procedures are included in the academic software system LISp-Miner. The GUHA procedures of the LISp-Miner system mine for various patterns that are verified using one or two contingency tables. All of these procedures offer fine-grained tools for adjusting the set of relevant patterns to be generated and verified. The LISp-Miner system has been applied many times to practical data mining tasks.

Several research activities are also related to the GUHA method and to the LISp-Miner system. They concern, in particular, logical calculi for data mining, applications of semantics in data mining, analytical reports summarizing data mining results, and the automatic conversion of generalized association rules into natural-language sentences.

Our group does both theoretical and applied research, although the main activities take place in Prague; see more here. Here are some of our recent results:

  • Turunen, E.: Interpreting GUHA Data Mining Logic in Paraconsistent Fuzzy Logic Framework. In A. Tsoukias (Ed.): ADT09, LNAI, (2009), 1-10.
  • Sainio, E., Mesiar, R. and Turunen, E.: A Characterization of Fuzzy Implications Generated by Generalized Quantifiers. Fuzzy Sets and Systems 159(2008), 491-499.

  • Ylirinne, E. and Turunen, E.: Interpreting Data Mining Quantifiers in Mathematical Fuzzy Logic. FSCS 2006. Symposium on Fuzzy Systems in Computer Science 2006. 27.-28. September. Magdeburg, Germany. (2006), 33-41.

  • Coufal, D. and Turunen, E.: Short Term Prediction of Highway Travel Time Using Data Mining and Neuro-Fuzzy Methods. Neural Network World 3-4(2004), 221-231.