Almost every data set includes missing information, but methods for analyzing data almost always need complete information.
Visualizing incomplete data makes it possible to explore the data and the structure
of the missing values simultaneously. This helps in learning about the distribution
of the incomplete information in the data, and in identifying possible structures of the
missing values and their relation to the available information, which can then be used for better imputation of missing values.
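As a minimal sketch of what exploring the structure of missing values can mean in practice, the following base R snippet counts missing values per variable and tabulates the missingness patterns across rows. The data frame `df` and its values are hypothetical toy data, and the snippet deliberately uses only base R; it is not the way VIM itself computes or displays this information.

```r
# Toy data (hypothetical values, for illustration only)
df <- data.frame(
  age    = c(23, NA, 45, 31, NA, 52),
  income = c(1800, 2100, NA, 2500, NA, 3900),
  region = c("A", "B", "B", NA, "A", "C")
)

# Indicator matrix: TRUE where a value is missing
miss <- is.na(df)

# Number of missing values per variable
miss_per_var <- colSums(miss)

# Missingness pattern per row: "1" = missing, "0" = observed,
# in the column order age, income, region
pattern <- apply(miss, 1, function(r) paste(as.integer(r), collapse = ""))

# How often each pattern occurs
pattern_counts <- table(pattern)

print(miss_per_var)
print(pattern_counts)
```

A pattern such as "110" (age and income missing together) hints that the missingness in the two variables may be related, which is the kind of structure a visualization tool is meant to reveal.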
Imputation of incomplete data is necessary to obtain a complete data set. Imputation methods vary from distance-based methods to model-based iterative procedures.
The R package VIM already includes basic tools for visualization and imputation of missing values, and it also provides a point-and-click graphical user interface.
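To illustrate the idea behind a distance-based method, here is a small base R sketch of nearest-neighbour imputation with a single numeric predictor. The function `knn_impute` and the data frame `toy` are hypothetical names introduced only for this example, and the approach (absolute distance on one variable, mean of the k nearest donors) is an assumption chosen for brevity, not the method implemented in VIM or any other package.

```r
# Hypothetical helper: impute `target` from the k nearest donors,
# where distance is the absolute difference in `predictor`.
knn_impute <- function(x, target, predictor, k = 2) {
  # Donors: rows where both target and predictor are observed
  obs <- which(!is.na(x[[target]]) & !is.na(x[[predictor]]))
  # Recipients: target missing, predictor observed
  mis <- which(is.na(x[[target]]) & !is.na(x[[predictor]]))

  for (i in mis) {
    # Distance from recipient i to every donor
    d <- abs(x[[predictor]][obs] - x[[predictor]][i])
    # Indices of the k closest donors (fewer if not enough donors exist)
    nn <- obs[order(d)[seq_len(min(k, length(obs)))]]
    # Impute with the mean of the donors' observed values
    x[[target]][i] <- mean(x[[target]][nn])
  }
  x
}

# Toy data (hypothetical values)
toy <- data.frame(
  age    = c(20, 22, 40, 41, 60),
  income = c(1000, NA, 3000, NA, 5000)
)

imputed <- knn_impute(toy, "income", "age", k = 1)
print(imputed)
```

Donors are fixed before the loop, so imputed values are never reused as donors. A model-based iterative procedure would instead fit a model for each incomplete variable and repeat the fit until the imputed values stabilize.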
The aim of the GSoC project is to
- integrate a selection of available imputation methods from R into the VIM system,
- replace the tcltk-based graphical user interface with a graphical user interface based on Gtk2,
- allow working with complex survey objects from the survey package and generating survey objects. In addition, basic operations from the survey package should be accessible through the GUI.
Benefit for the Student
The student will gain in-depth knowledge of programming in R and Gtk2, as well as basic knowledge of complex sampling designs.
The student will also come to understand the requirements of imputing data from complex survey designs.
Basic knowledge of complex survey methodology. Advanced knowledge of any scripting language and at least basic knowledge of R.