Colloquium: Tuesday, 11 June, 10:15, WE5/5.013
Stefan Betzmeir (Master WI): Inducing Structural Prototypes for the Classification of Semi-Structured Data
The internet has evolved from a network of documents into a network of heterogeneous web applications. Semi-structured data offers the structural flexibility needed to transfer information between those applications while still providing the necessary semantic annotation. Today the XML data model is the de facto standard meta-model for semi-structured data on the internet. Because of its generic nature, XML has become the foundation of numerous domain-specific formats, producing an abundance of XML data that holds considerable potential for data analysis. This paper presents a case-based reasoning framework for classifying semi-structured data. It supports both the extraction of a prototype tree from a cluster of data trees and the classification of a data tree based on its similarity to a set of prototype trees. Because each cluster is represented by a single prototype, classification runtime remains acceptable even as the data clusters keep growing.
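
To make the idea of prototype-based classification concrete, here is a minimal sketch in Python. It is not the framework presented in the talk: the abstract does not specify how prototypes are induced or how tree similarity is measured, so this sketch assumes a deliberately simple stand-in. Each XML tree is reduced to its set of root-to-node tag paths, a prototype keeps the paths that occur in at least half of a cluster's trees, and similarity is the Jaccard index of two path sets. The names `tag_paths`, `induce_prototype`, `similarity`, `classify` and the `min_support` parameter are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def induce_prototype(cluster, min_support=0.5):
    """Assumed induction rule: keep the paths that occur in at least
    `min_support` of the cluster's trees."""
    counts = Counter(p for doc in cluster for p in tag_paths(doc))
    return {p for p, c in counts.items() if c / len(cluster) >= min_support}


def similarity(paths_a, paths_b):
    """Assumed similarity measure: Jaccard index of two path sets."""
    union = paths_a | paths_b
    return len(paths_a & paths_b) / len(union) if union else 1.0


def classify(xml_text, prototypes):
    """Return the label of the prototype most similar to the data tree."""
    paths = tag_paths(xml_text)
    return max(prototypes, key=lambda label: similarity(paths, prototypes[label]))


# Two tiny, made-up clusters for illustration only.
books = ["<book><title/><author/></book>", "<book><title/><isbn/></book>"]
orders = ["<order><item/><total/></order>", "<order><item/><item/></order>"]
prototypes = {
    "book": induce_prototype(books),
    "order": induce_prototype(orders),
}
label = classify("<book><title/><year/></book>", prototypes)
```

In this sketch a new tree is compared once per prototype rather than once per stored tree, so the cost of `classify` depends on the number of clusters and not on how many trees each cluster contains, which is the runtime argument the abstract makes.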