Finding pictures on the World Wide Web; 'WebPiction' developed at UB is first to combine text, image-processing technology
By ELLEN GOLDBAUM
Yet, strange as it sounds, that's about what Internet surfers do when they use commercial search engines to track down pictures of people.
That's because these search engines conduct picture searches the same way they conduct document searches: by matching keywords.
The drawback is that there is no guarantee that the faces of people specified in the search will appear in the retrieved images or even that the images will include any faces at all.
But UB computer scientists are developing a prototype of a World Wide Web picture search engine, called WebPiction, that has a unique face-detection component and is designed to produce many more relevant "hits" than existing commercial image-search engines.
That's because it is the first one that finds pictures by combining text and image processing.
Developed at the Center of Excellence for Document Analysis and Recognition (CEDAR), a demonstration version of WebPiction soon will be available to Web surfers and search-engine designers.
"We are a research group and our main objective is developing new technologies," said Rohini Srihari, associate professor of computer science. "The big search-engine companies don't have the researchers to develop sophisticated processing and we obviously don't have the resources to index the whole Web.
"With this prototype, we are saying to search-engine companies, 'These are the technologies that we have developed for picture searching; if you like them, incorporate them into your search engines,'" Srihari explained. Companies or organizations interested in licensing the WebPiction technologies should make inquiries through UB's Office of Technology Transfer.
The techniques used by WebPiction can be incorporated into intranets (internal networks used by individual organizations) or into commercial search engines that index the entire Web.
Developed from a research project called "Show and Tell" that initially was funded by the Advanced Research Projects Agency with additional support from Eastman Kodak, the demonstration version of WebPiction will function with a subset of Web documents, thousands of them culled from a news service that posts hundreds of photographs each day.
WebPiction is among the first search engines to allow users to search for people in pictures with specific visual contexts, such as Bill Clinton at Niagara Falls, queries that today often produce the frustrating and familiar conclusion: "No documents found."
"This research will have a tremendous impact on multimedia information retrieval, where you are trying to retrieve information from text, images, video and audio," said Srihari.
It has the potential to organize pictures in all kinds of databases, from vast photo archives kept by wire services and intelligence agencies to photographs and videos in personal family albums.
The researchers are working with Kodak to further refine the technology for use in indexing and annotating images taken on home videocameras and still cameras.
WebPiction works by exploiting the accompanying caption or Web-page text to figure out what's in a picture.
It does so using natural-language understanding: computational techniques for interpreting the same colloquial language that humans use in conversation.
"In this research, we exploit the fact that images do not appear on the Web in isolation, but rather with accompanying text," said Srihari.
For example, she said, with other search engines, a query for pictures of Pope John Paul might turn up an image with the following caption: "President Clinton preparing for Pope John Paul's visit."
But because it understands that the caption is saying that the Pope does not appear in the picture, WebPiction would reject the image.
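The article does not describe CEDAR's actual natural-language machinery, but the idea can be illustrated with a deliberately crude caption heuristic: a name used in the possessive ("Pope John Paul's visit") or mentioned only after a preposition such as "for" is probably not the person depicted, while a name in subject position probably is. The function below is a hypothetical sketch, not WebPiction's real parser, which would do full linguistic analysis.

```python
import re

def likely_in_picture(caption: str, name: str) -> bool:
    """Crude stand-in for caption understanding: guess whether the named
    person actually appears in the photograph the caption describes."""
    # A possessive mention ("...John Paul's visit") suggests the photo is
    # about something of theirs, not a picture of the person.
    if re.search(re.escape(name) + r"'s\b", caption):
        return False
    # A name reached only via "for"/"of" is an oblique mention, not the subject.
    if re.search(r"\b(?:for|of)\s+(?:\w+\s+)*?" + re.escape(name), caption):
        return False
    # Otherwise, treat a plain mention as evidence the person is depicted.
    return name in caption

caption = "President Clinton preparing for Pope John Paul's visit"
print(likely_in_picture(caption, "Clinton"))         # True
print(likely_in_picture(caption, "Pope John Paul"))  # False
```

On the article's own example, this toy rule keeps the image for a Clinton query and rejects it for a Pope John Paul query, which is exactly the filtering behavior described above.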
It also eliminates the possibility of retrieving a graphic, such as a chart, that doesn't have a single face in it.
"Because other search engines don't do any image analysis, they can produce a graph captioned 'Bill Clinton's approval rating' for a search for pictures of the president," Srihari said.
"Ours employs a face-detection system, so that ensures that there is a face in the picture and that it is, indeed, a picture."
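The article does not say which face detector CEDAR built, but the filtering step it describes can be sketched with an off-the-shelf detector such as OpenCV's Haar cascade (an assumption for illustration only). A graphic like an approval-ratings chart yields no face detections and is discarded.

```python
import numpy as np
import cv2  # OpenCV; a stand-in detector, not CEDAR's actual system

def contains_face(image: np.ndarray) -> bool:
    """Return True if at least one frontal face is detected in a BGR image."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

# A flat gray rectangle stands in for a chart-like graphic: no face,
# so the image would be rejected from a people search.
chart_like = np.full((200, 200, 3), 128, dtype=np.uint8)
print(contains_face(chart_like))  # False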
WebPiction first uses the textual information in a file to narrow the list of candidate images to, say, all of the files whose text indicates that Bill Clinton actually appears in the image, which is already an improvement over conventional search engines.
It then uses image-processing techniques to search through those candidate images, producing the ones that best match the parameters of the query.
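The two-stage search described in the last two paragraphs can be sketched as follows. The function names and the toy text filter are hypothetical; the point is the order of operations: a cheap text pass first narrows the candidates, then an image-processing score ranks only the survivors.

```python
def search(query_name, documents, in_picture, score_image):
    """Hypothetical two-stage picture search.
    documents:   list of (caption, image) pairs
    in_picture:  (caption, name) -> bool, the text stage (caption analysis)
    score_image: image -> float, the image stage (e.g. face matching)"""
    # Stage 1: keep only documents whose text says the person is depicted.
    candidates = [(cap, img) for cap, img in documents
                  if in_picture(cap, query_name)]
    # Stage 2: rank the remaining candidates by image score, best first.
    return sorted(candidates, key=lambda d: score_image(d[1]), reverse=True)

docs = [
    ("Bill Clinton waves at Niagara Falls", "photo1"),
    ("Preparing for Bill Clinton's visit", "photo2"),
    ("Approval ratings chart", "chart1"),
]
# Toy text filter: require a non-possessive mention of the name.
hits = search("Bill Clinton", docs,
              in_picture=lambda cap, n: n in cap and (n + "'s") not in cap,
              score_image=lambda img: 1.0)
print([img for _, img in hits])  # ['photo1']
```

Only the Niagara Falls photo survives: the possessive caption and the chart are filtered out before any image processing runs, which is what makes the first stage an efficiency win as well as a precision win.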
"People want to search for pictures based on visual attributes, for example, indoors or outdoors, by a building or a natural landscape," said Srihari. "Currently, without someone manually annotating each image, this is impossible. Our objective is to design systems that will automatically sort images based on these criteria, as though the computer is filling out a form that provides more information about what is in each image. We hope that if, one day, someone is searching for a picture of Bill Clinton's cat, Socks, in the Rose Garden, our system will be able to find it."
Members of the UB team that developed WebPiction are Zhongfei Zhang, CEDAR research scientist; Jianyong Hu, Yao Pu and Guizhen Yang, master's candidates in the Department of Electrical and Computer Engineering; Shuwei Chen, Joe Koontz and Xiaoyun Wu, master's candidates in the Department of Computer Science, and Aibing Rao, a master's candidate in the departments of Computer Science and Mathematics.