Research

My most recent research project concerned the visual comparison of a set of Web pages. This research culminated in an accepted poster submission to the World Wide Web Conference 2010 in Raleigh, NC.

The research I did during my studies lay in the area of Information Extraction from the WWW. To extract information from the Web, it is necessary to locate and identify entities of information on Web pages. Check out the online VENTex System to test our Java-based table detection algorithm. After detecting tables (without looking at the implementation), the next step is table recognition. This means that, of all returned tables, only those which may contain "important" information are kept. Having identified these tables, we tried to identify their logical parts, such as headers, subheaders and cells, for further processing. The algorithms for categorizing such entities of information are based on spatial and visual reasoning, for example: "the only cell in a table that has a different background from all the other cells, and whose content is bold, must be of special interest".
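
The visual rule quoted above can be illustrated with a small sketch. It is written in Python for brevity and is not the VENTex implementation (which is Java-based); the `Cell` type, its fields and the function `find_special_cell` are hypothetical names introduced only for this example, and the cells are assumed to arrive already annotated with their rendered background colour and boldness.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical visual description of one rendered table cell."""
    text: str
    background: str  # rendered background colour, e.g. "#ffffff"
    bold: bool       # True if the cell content is rendered in bold


def find_special_cell(cells: List[Cell]) -> Optional[Cell]:
    """Return the one cell whose background differs from all other cells
    and whose content is bold, or None if no such cell exists."""
    counts = Counter(cell.background for cell in cells)
    # The rule needs exactly two backgrounds: one shared, one unique.
    if len(counts) != 2:
        return None
    (_, common_n), (rare_bg, rare_n) = counts.most_common()
    # Exactly one cell may deviate, and at least two cells must agree,
    # otherwise it is unclear which background is the "normal" one.
    if rare_n != 1 or common_n < 2:
        return None
    odd = next(cell for cell in cells if cell.background == rare_bg)
    return odd if odd.bold else None


if __name__ == "__main__":
    table = [
        Cell("Item", "#ffffff", False),
        Cell("Total", "#ffcc00", True),
        Cell("42", "#ffffff", False),
    ]
    print(find_special_cell(table))
```

The sketch deliberately returns nothing when the table is ambiguous (for instance, two cells with two different backgrounds), since the rule only makes sense when one cell clearly stands out from the rest.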