Monday, April 2
Michael Grossniklaus is an SNSF-funded post-doc in Prof. David Maier’s group at Portland State University, where he works on the project “Exploiting Object Database Technologies for Data Management in the Cloud”. Michael obtained his PhD in Computer Science from ETH Zurich, Switzerland, in the research group of Prof. Moira Norrie. His work is situated in the area of databases and information systems, with a focus on developing innovative technologies to support new and emerging application domains. In the past, he defined an object-oriented version model that supports context-dependent data management and query processing, which has been successfully applied in the XCM content management system and the EdFest mobile tourist information system. During his post-doc in Prof. Stefano Ceri’s group at the Politecnico di Milano, Italy, Michael worked on the C-SPARQL query language and processor for streaming RDF data as well as on the Search Computing (SeCo) project, where he contributed to the dataflow language Panta Rhei. More recently, Michael has begun to work on graph data management and processing. In particular, he is interested in understanding the impact of different graph topologies and processing tasks on existing and novel database technologies.
Social network analysis, ontologies in the semantic web, interaction of proteins in biology, planning transportation grids, and routing network traffic are all applications that use graph data. The graph data instances found in these applications are typically too large to fit into the main memory of a single machine, as graphs with millions or even billions of nodes and edges are not uncommon. As a consequence, classical graph data structures and algorithms are often not applicable, and new techniques to manage and process data graphs are required. To make matters worse, graph topologies and processing tasks can vary greatly from one application to the next, making it very difficult to build general solutions. In this talk, we look at how different application scenarios of graph data can be characterized and matched to suitable data management technologies. To do so, we define a benchmark in terms of a data model, query workload, sample data sets, and usage scenarios. We also report initial performance figures measured on an open-source graph database as well as on commercial relational and object databases.