Data Mining for High Dimensional Dynamic Data

Motivation

Modern data pose new challenges and requirements for data mining because of their special characteristics. One of these is high dimensionality: an object may be described by a large number of attributes, and these attributes may be correlated or overlap. Modern data are also highly variable, that is, they evolve over time as new records are inserted and old records are removed. A particularly interesting category of dynamic data is stream data, which flow continuously in and out of systems at high speed (e.g. Internet traffic data, sensor data, position tracking data); it is usually impossible to store them all or to scan them multiple times. This project considers both of these aspects of modern data.

Goal

This project is about data mining in high dimensional dynamic data. To address high dimensionality, we draw on subspace clustering, which aims to find clusters in different subspaces of the original feature space. To address the high degree of variability, we rely on stream mining, in particular on clustering and evolution monitoring over data streams.
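To make the two ideas concrete, the following is a minimal sketch rather than the method this project prescribes. It assumes a simple grid-based notion of density, points with coordinates scaled to [0, 1), and a single pass in which each point is seen once and then discarded. The function name `dense_cells` and the parameters `bins`, `max_subspace_dim` and `min_count` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def dense_cells(stream, n_dims, bins=4, max_subspace_dim=2, min_count=3):
    """Count points per grid cell in every subspace of up to
    `max_subspace_dim` dimensions, reading `stream` exactly once.

    Returns {subspace: {cell: count}} restricted to cells that hold at
    least `min_count` points. Coordinates are assumed to lie in [0, 1).
    """
    # Enumerate all candidate subspaces as tuples of dimension indices.
    subspaces = [
        s
        for k in range(1, max_subspace_dim + 1)
        for s in combinations(range(n_dims), k)
    ]
    counts = {s: Counter() for s in subspaces}

    for point in stream:
        # Discretise each coordinate once; the point itself is not stored.
        cell_index = [min(int(x * bins), bins - 1) for x in point]
        for s in subspaces:
            counts[s][tuple(cell_index[d] for d in s)] += 1

    # Keep only the dense cells of each subspace.
    return {
        s: {cell: n for cell, n in cnt.items() if n >= min_count}
        for s, cnt in counts.items()
    }


if __name__ == "__main__":
    # Four points agree on dimensions 0 and 1 but are spread out in dimension 2.
    points = [
        (0.10, 0.80, 0.05),
        (0.12, 0.85, 0.30),
        (0.15, 0.90, 0.55),
        (0.20, 0.78, 0.80),
        (0.70, 0.10, 0.95),
    ]
    for subspace, cells in dense_cells(iter(points), n_dims=3).items():
        print(subspace, cells)
```

In the toy data, the four similar points form a dense cell in the subspace spanned by dimensions 0 and 1, while no cell in dimension 2 reaches the threshold. Enumerating every subspace, as done here, is only feasible for very few dimensions; avoiding that enumeration is part of what makes the problem hard in the high dimensional case.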

Open issues (project, diploma, bachelor's, master's thesis)

Incremental subspace clustering
Subspace clustering over data streams
Modeling, detecting and monitoring subspace cluster changes in order to gain insights into the population evolution
Summarizing cluster changes over an evolving population
...
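
As an illustration of what detecting cluster changes could look like, the following is a minimal sketch under strong assumptions: clusters are disjoint sets of object ids, the same objects can be identified at two consecutive time points, and a change is judged purely by member overlap. The function name `cluster_transitions`, the labels "survives", "splits" and "disappears", and the `threshold` parameter are hypothetical. Subspace clusters, which may overlap and live in different dimensions, would need a richer model.

```python
def cluster_transitions(old, new, threshold=0.6):
    """Classify what happened to each cluster of `old` in `new`.

    `old` and `new` map a cluster id to the set of object ids it contains.
    Clusters within one clustering are assumed to be disjoint.
    Returns {old_id: (label, [new_ids])}.
    """
    transitions = {}
    for old_id, members in old.items():
        if not members:
            transitions[old_id] = ("disappears", [])
            continue

        # Share of the old cluster's members found in each new cluster.
        overlaps = {
            new_id: len(members & new_members) / len(members)
            for new_id, new_members in new.items()
        }
        best = max(overlaps, key=overlaps.get, default=None)

        if best is not None and overlaps[best] >= threshold:
            # One new cluster holds most of the old members.
            transitions[old_id] = ("survives", [best])
            continue

        parts = sorted(n for n, share in overlaps.items() if share > 0)
        covered = sum(overlaps[n] for n in parts)
        if len(parts) > 1 and covered >= threshold:
            # The members are spread over several new clusters.
            transitions[old_id] = ("splits", parts)
        else:
            transitions[old_id] = ("disappears", [])
    return transitions


if __name__ == "__main__":
    old = {"A": {1, 2, 3, 4}, "B": {5, 6, 7, 8}, "C": {9, 10}}
    new = {"X": {1, 2, 3, 11}, "Y": {5, 6}, "Z": {7, 8}, "W": {12}}
    for old_id, change in cluster_transitions(old, new).items():
        print(old_id, change)
```

Collecting such labels over successive time points gives one simple form of summary of how the population evolves. On a real stream, object identities are usually not retained, so the overlap would have to be estimated from cluster summaries instead.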

Requirements

Good programming skills
Knowledge of KDD concepts (e.g. clustering, classification)
Motivation

Retrieved from "http://fogo.dbs.ifi.lmu.de/cms/OffeneThemen/Theses-STREAM"