Research
My most recent research project concerned the visual comparison of a set of
Web pages. This research culminated in an accepted poster submission to
the World Wide Web Conference 2010 in Raleigh, NC.
The research I did during my studies lay in the area of Information Extraction
from the WWW. To extract information from the Web, it is necessary to locate and
identify entities of information on Web pages. Check out the online VENTex System
to test our Java-based Table-Detection algorithm. After detecting tables (without
looking at the implementation), the next step is Table Recognition: of all returned
tables, only those that may contain "important" information are kept. Having
identified these tables, we tried to identify their logical parts, such as headers,
subheaders and cells, for further processing. The algorithms for categorizing such
entities of information are based on spatial and visual reasoning, for example "the
only cell in a table that has a different background from all the other cells and
whose content is bold must be of special interest".
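
The quoted rule can be illustrated with a small sketch. This is not the VENTex
implementation (which is Java-based); it is a minimal Python example under stated
assumptions: the Cell class, its fields and the function find_special_cell are
hypothetical names introduced here, and each cell is assumed to already carry its
rendered background color and a bold flag.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical visual description of one rendered table cell."""
    text: str
    background: str  # rendered background color, e.g. "#ffffff"
    bold: bool


def find_special_cell(cells):
    """Return the single bold cell whose background differs from the
    background shared by all other cells, or None if no such cell exists."""
    counts = Counter(cell.background for cell in cells)
    if len(counts) != 2:
        # The rule needs exactly one common and one deviating background.
        return None
    odd = [cell for cell in cells if counts[cell.background] == 1]
    if len(odd) != 1:
        # Either no background is unique, or both are (two-cell table).
        return None
    candidate = odd[0]
    return candidate if candidate.bold else None


# Example: one bold cell on a grey background among plain white cells.
table = [
    Cell("Price list", "#cccccc", True),
    Cell("Apples", "#ffffff", False),
    Cell("1.20", "#ffffff", False),
]
special = find_special_cell(table)
```

The sketch deliberately returns None whenever the rule does not apply cleanly (for
instance, when several cells deviate), since a heuristic of this kind is only
meaningful if exactly one cell stands out.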