frank's blog

80legs, Webcrawling For The Masses

Posted in ideas, Media, web by aldorf on January 12, 2010

Web crawling means browsing and indexing online content in an automated fashion. Its most prominent use is building the databases behind search engines like Google, but it matters to anyone who needs to find content on the web, such as a movie studio looking for pirated footage or an ad network checking where its ads are being placed. For now, the main options are to build your own web crawler, usually in your own data center, or to use online services like Amazon Elastic MapReduce, says 80legs chief executive Shion Deysarkar.
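To make "browsing and indexing in an automated fashion" a bit more concrete, here is a minimal sketch of the idea in Python. It is not how 80legs or any search engine actually works. The names `LinkAndTextParser`, `crawl` and `PAGES` are made up for illustration, and the three-page "web" lives in memory so the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects link targets and lowercase words from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Remember every href found on an <a> tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Split the text between tags into lowercase words.
        self.words.extend(data.lower().split())


def crawl(start, fetch, max_pages=100):
    """Breadth-first crawl from `start`; returns {word: set of page URLs}.

    `fetch` is any callable that maps a URL to HTML text, or None if missing.
    """
    index = {}
    seen = {start}
    queue = deque([start])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical three-page "web" held in memory (an assumption for this sketch).
PAGES = {
    "/home": '<a href="/movies">Movies</a> <a href="/ads">Ads</a> welcome',
    "/movies": '<a href="/home">Home</a> pirated footage report',
    "/ads": "banner placement report",
}

index = crawl("/home", PAGES.get)
print(sorted(index["report"]))
```

A real crawler would also have to fetch pages over HTTP, resolve relative links, respect robots.txt and spread the work over many machines, which is the part a hosted service takes off your hands.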

With 80legs, you just make your choices from several menus, telling the service where you want it to crawl and what you want it to look for, and it returns a data file with your results.

80legs can crawl 2 billion pages a day!

You can track your brand in every nook of the web. You can launch data-crunching projects that would never have been funded otherwise. The service is like having a “mini-Google” at your disposal.

80legs is also opening an application store, where developers sell apps that further refine the web crawling results. For example, Deysarkar says, developers could sell apps that perform sentiment analysis, look for video fingerprints, or analyze sentence structure.

Similar webcrawling tools are Crimson Hexagon and Radian6.
