most-hadoop

Description

The student integrates high-performance, parallel data processing routines into the MOST framework. Building on the MOST NoSQL modules (Cassandra, Neo4j), the available data processing routines (e.g. periodic data calculation) have been moved to an independent Java module. On top of this module, the student adds Hadoop support to distribute these calculations.
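
As a rough illustration of what a periodic data calculation could look like, the following minimal sketch groups timestamped measurements into fixed-length periods and averages each group, mirroring the map and reduce steps that a Hadoop job would perform. It uses plain Java only and no Hadoop API; the class PeriodicAverage, the Measurement type and the 60-second period are hypothetical names and values chosen for this example, not taken from the MOST code base.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch; none of these names are part of MOST. */
class PeriodicAverage {

    /** One raw reading: timestamp in milliseconds and the measured value. */
    static final class Measurement {
        final long timestampMillis;
        final double value;

        Measurement(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /** "Map" step: compute the start of the period a timestamp falls into. */
    static long periodStart(long timestampMillis, long periodMillis) {
        return Math.floorDiv(timestampMillis, periodMillis) * periodMillis;
    }

    /** "Reduce" step: average all values that share the same period start. */
    static Map<Long, Double> averagePerPeriod(List<Measurement> data, long periodMillis) {
        return data.stream().collect(Collectors.groupingBy(
                m -> periodStart(m.timestampMillis, periodMillis),
                TreeMap::new,
                Collectors.averagingDouble(m -> m.value)));
    }

    public static void main(String[] args) {
        List<Measurement> data = List.of(
                new Measurement(0L, 10.0),
                new Measurement(30_000L, 20.0),
                new Measurement(60_000L, 30.0));
        // prints {0=15.0, 60000=30.0}
        System.out.println(averagePerPeriod(data, 60_000L));
    }
}
```

In a Hadoop job, the period start would typically be the key emitted by a mapper and the averaging would happen in a reducer, so that different periods can be processed on different nodes.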

Benefit for the student

The student works with state-of-the-art database technologies. He/she gains expertise in the development of highly scalable, distributed data processing frameworks.

Benefit for the Project

Handling building data on an urban level requires a scalable architecture. This work makes it possible to distribute the available data processing routines with the goal of improving performance.

Requirements

Good Java programming skills, interest in NoSQL datastores and data processing algorithms

Mentors

Harald Hofstätter, Stefan Glawischnig, Rainer Bräuer, Robert Zach

Contact

Mentors are regularly available in our GSoC IRC channel #TU-CSE-SoC at irc.freenode.net. You can also reach us via the mailing list: send an email using the prefix [MOST] (a subscription is required).

More information

http://www.iue.tuwien.ac.at/cse/wiki2014/doku.php?id=distributed_data_preprocessing_via_apache_hadoop

 

The yellow elephant in the image is by Hadoop.