WEKA is an open-source machine learning and data mining workbench that is very popular among data mining researchers and practitioners. It is implemented in Java.
Multi-class classification is one of the central tasks in data mining. It is concerned with automatically assigning a given sample (e.g. a handwritten digit) to one of several predefined classes (e.g. the numeric digits).
WEKA features a variety of classification algorithms that compute, for each sample, a score for its membership in each class.
Examples of such algorithms are Bayesian Classifiers, Hidden Markov Models and Neural Networks.
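To make the notion of per-class scores concrete, the following is a minimal sketch in plain Java. In WEKA, such a score vector is what a classifier's distributionForInstance method returns as a double[]. The class ScoreSketch and its methods are hypothetical helpers written for illustration and are not part of WEKA; the scores are made up.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only, not part of WEKA.
final class ScoreSketch {

    /** Normalizes non-negative scores so that they sum to 1. */
    static double[] normalize(double[] scores) {
        double sum = Arrays.stream(scores).sum();
        if (sum <= 0.0) {
            throw new IllegalArgumentException("scores must have a positive sum");
        }
        return Arrays.stream(scores).map(s -> s / sum).toArray();
    }

    /** Returns the index of the class with the highest score (first index on ties). */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up raw scores for three classes, e.g. the digits 0, 1 and 2.
        double[] probabilities = normalize(new double[] {1.0, 3.0, 1.0});
        System.out.println(Arrays.toString(probabilities));
        System.out.println("predicted class index: " + argMax(probabilities));
    }
}
```

The predicted class is simply the one with the highest score; the remaining scores are the information that a probabilistic visualization can exploit beyond the hard prediction.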
Visualization is a valuable tool that helps classifier designers analyze the performance of their algorithms, find sources of classification errors (i.e. false negatives and false positives), and investigate how these errors can be reduced using the available data features. Recent research in visualization proposes new ways to analyze probabilistic classification results interactively (see figure below).
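The error types mentioned above can be read off a confusion matrix. The following is a small sketch in plain Java, assuming that classes are encoded as integer indices; the class ConfusionSketch and the example labels are hypothetical and do not come from WEKA or from the prototype.

```java
// Hypothetical helper for illustration only, not part of WEKA.
final class ConfusionSketch {

    /** Builds counts[actual][predicted] from two parallel label arrays. */
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    /** False negatives of class c: samples of class c predicted as another class. */
    static int falseNegatives(int[][] counts, int c) {
        int n = 0;
        for (int p = 0; p < counts.length; p++) {
            if (p != c) {
                n += counts[c][p];
            }
        }
        return n;
    }

    /** False positives of class c: samples of other classes predicted as c. */
    static int falsePositives(int[][] counts, int c) {
        int n = 0;
        for (int a = 0; a < counts.length; a++) {
            if (a != c) {
                n += counts[a][c];
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up labels for five samples and three classes.
        int[] actual = {0, 0, 1, 1, 2};
        int[] predicted = {0, 1, 1, 1, 0};
        int[][] counts = confusionMatrix(actual, predicted, 3);
        for (int c = 0; c < 3; c++) {
            System.out.println("class " + c
                    + ": FN=" + falseNegatives(counts, c)
                    + ", FP=" + falsePositives(counts, c));
        }
    }
}
```

Such counts only summarize hard predictions. They do not show how confident the classifier was in each error, which is the gap that visualizations of probabilistic results aim to fill.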
The purpose of this work is to implement parts of the proposed visualizations in the WEKA environment, using Java.
An existing Java-based prototype implementation of these visualizations will be provided.
Data mining experts should be able to visualize the classification results of any probabilistic classifier they apply in WEKA, and to interactively select samples in the visualization for further investigation and detailed feature analysis.
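As one possible illustration of what selecting samples for further investigation could mean on the data side, the sketch below picks out confidently misclassified samples from a matrix of class probabilities. The class SelectionSketch, the method confidentErrors and the threshold parameter are assumptions made for this example; they are not taken from WEKA or from the provided prototype, and an interactive selection in the visualization may use entirely different criteria.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of WEKA or the prototype.
final class SelectionSketch {

    /**
     * Returns the indices of samples whose highest-scoring class differs from
     * the actual class and whose highest score is at least minConfidence.
     */
    static List<Integer> confidentErrors(double[][] probabilities, int[] actual,
                                         double minConfidence) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < probabilities.length; i++) {
            // Find the predicted class as the index of the highest probability.
            int predicted = 0;
            for (int c = 1; c < probabilities[i].length; c++) {
                if (probabilities[i][c] > probabilities[i][predicted]) {
                    predicted = c;
                }
            }
            if (predicted != actual[i] && probabilities[i][predicted] >= minConfidence) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up probabilities for four samples and two classes.
        double[][] probabilities = {
            {0.90, 0.10},
            {0.20, 0.80},
            {0.55, 0.45},
            {0.10, 0.90}
        };
        int[] actual = {0, 0, 1, 1};
        System.out.println(confidentErrors(probabilities, actual, 0.7));
    }
}
```

The selected indices could then be used to look up the feature values of the corresponding samples for detailed analysis.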
Benefit for the Student
- Learning the WEKA environment, the mainstream open-source data mining software
- Gaining experience in developing a plugin to a large open-source project such as WEKA
- Learning how to develop and apply interactive visualizations to support solving real problems
- Gaining experience in implementing new research results in a software project
Benefit for the Project
- Providing a new interactive visualization functionality for WEKA
- Enabling advanced analysis of classification performance in WEKA
- Making research results accessible to data mining practitioners who use WEKA
Requirements
- Solid programming skills in Java, preferably with documented participation in open-source development
- Solid understanding of classification, preferably through documented participation in data mining courses
- Experience in developing interactive user interfaces, preferably with good knowledge of the Java 2D graphics library or similar
Resources
Overview of the topic: http://www.cvast.tuwien.ac.at/ConfusionAnalysis
Article describing the visualization: http://publik.tuwien.ac.at/files/PubDat_229886.pdf
Demonstration video: https://youtu.be/QUZfPImmeEs
The developer version of WEKA: http://www.cs.waikato.ac.nz/ml/weka/downloading.html
The WEKA Manual (Ch. 17 Extending WEKA): http://www.cs.uu.nl/docs/vakken/dm/WekaManual.pdf
Adding tabs in the WEKA Explorer: http://weka.wikispaces.com/Adding+tabs+in+the+Explorer
Explorer error visualization plugins: http://weka.wikispaces.com/Explorer+error+visualization+plugins