This paper studies the problem of categorical data clustering, especially for transactional data characterized by high dimensionality and large volume. Starting from a heuristic method of increasing the height-to-width ratio of the cluster histogram, we develop a novel algorithm – CLOPE, which is very fast and scalable, while being quite effective. We demonstrate the performance of our algorithm on two real world
标签: data transactional categorical clustering
上传时间: 2015-10-24
上传用户:evil
From helping to assess the value of new medical treatments to evaluating the factors that affect our opinions and behaviors, analysts today are finding myriad uses for categorical data methods. In this book we introduce these methods and the theory behind them. Statistical methods for categorical responses were late in gaining the level of sophistication achieved early in the twentieth century by methods for continuous responses. Despite influential work around 1900 by the British statistician Karl Pearson, relatively little development of models for categorical responses occurred until the 1960s. In this book we describe the early fundamental work that still has importance today but place primary emphasis on more recent modeling approaches. Before outlining
标签: evaluating treatments the helping
上传时间: 2014-01-25
上传用户:jennyzai
In this paper, we present LOADED, an algorithm for outlier detection in evolving data sets containing both continuous and categorical attributes. LOADED is a tunable algorithm, wherein one can trade off computation for accuracy so that domain-specific response times are achieved. Experimental results show that LOADED provides very good detection and false positive rates, which are several times better than those of existing distance-based schemes.
标签: algorithm detection containi evolving
上传时间: 2014-01-08
上传用户:aeiouetla