Researchers at the Cornell University Center for Advanced Computing used a supercomputer to download and analyze almost 35 million photographs, taken by more than 300,000 photographers, posted to the photo-sharing Web site Flickr. The goal was to develop new methods for automatically organizing and labeling large-scale data collections.

The Cornell team developed techniques for automatically identifying places that people want to photograph, spanning thousands of locations. “We developed classification methods for characterizing these locations from visual, textual, and temporal features,” says Cornell professor Daniel Huttenlocher. “These methods reveal that both visual and temporal features improve the ability to estimate the location of a photo compared to using just textual tags.”

Because the method scales, it could be used to automatically mine the data in extremely large image collections, possibly leading to an online travel guidebook that automatically identifies the best places to visit.

The researchers used a mean shift procedure for the data analysis and ran their program on the Hadoop Cluster, a 480-core Linux-based supercomputer at the Cornell Center for Advanced Computing. Hadoop uses a computation paradigm called Map/Reduce, which divides an application into small segments of work, each of which can be executed on any node of the cluster.
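To give a flavor of how mean shift can find photo hotspots, here is a minimal sketch in Python. It is not the Cornell implementation; the coordinates are made-up geotags, and the flat-kernel bandwidth is an illustrative assumption. Each point is repeatedly shifted to the mean of its neighbors until points converge on the dense modes, which correspond to frequently photographed places.

```python
import math

# Hypothetical (lat, lon) photo geotags clustered around two landmarks.
points = [
    (40.6892, -74.0445), (40.6895, -74.0440), (40.6890, -74.0450),  # hotspot A
    (48.8584, 2.2945), (48.8580, 2.2950), (48.8588, 2.2940),        # hotspot B
]

def mean_shift(points, bandwidth=0.01, iters=50):
    """Shift every point to the mean of its neighbors within `bandwidth`
    (a flat kernel); after convergence, coincident points form clusters."""
    shifted = list(points)
    for _ in range(iters):
        new = []
        for x, y in shifted:
            nbrs = [(px, py) for px, py in shifted
                    if math.hypot(px - x, py - y) <= bandwidth]
            new.append((sum(p[0] for p in nbrs) / len(nbrs),
                        sum(p[1] for p in nbrs) / len(nbrs)))
        shifted = new
    # Merge points that landed on (nearly) the same mode.
    modes = []
    for p in shifted:
        if not any(math.hypot(p[0] - m[0], p[1] - m[1]) < bandwidth
                   for m in modes):
            modes.append(p)
    return modes

modes = mean_shift(points)
print(len(modes))  # prints 2: one mode per photo hotspot
```

In practice the bandwidth controls the spatial scale of a "place" (a plaza versus a whole city), and a production system would use a spatial index rather than the quadratic neighbor scan shown here.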
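The Map/Reduce paradigm itself can be sketched in a few lines. The following toy Python example simulates, on one machine, what Hadoop distributes across a cluster: a map phase emits key/value pairs (here, a count of 1 per photo keyed by a place tag), a shuffle groups pairs by key, and a reduce phase sums each group. The records and tag names are invented for illustration, not Cornell's data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (photographer, place_tag) records standing in for Flickr metadata.
records = [
    ("alice", "eiffel_tower"), ("bob", "eiffel_tower"),
    ("carol", "statue_of_liberty"), ("dave", "eiffel_tower"),
]

def map_phase(record):
    # Emit one (key, 1) pair per photo, keyed by its place tag.
    _, place = record
    yield (place, 1)

def reduce_phase(key, values):
    # Sum the counts for one key; each key is independent, so in a real
    # cluster different keys can be reduced on different nodes.
    return (key, sum(values))

# Shuffle: sort and group intermediate pairs by key, as the framework would.
intermediate = sorted(pair for r in records for pair in map_phase(r))
result = dict(
    reduce_phase(k, (v for _, v in grp))
    for k, grp in groupby(intermediate, key=itemgetter(0))
)
print(result)  # prints {'eiffel_tower': 3, 'statue_of_liberty': 1}
```

Because map and reduce tasks share no state, the same program scales from this single-process toy to the 480-core cluster simply by letting the framework place tasks on nodes.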
For more information please visit: http://www.cpccci.com