ShoppingNotes.com will be the monkey for you re: Yahoo! SearchMonkey

May 13, 2008

Yahoo! is launching its search developer platform, SearchMonkey (I don’t know why they named it monkey. Maybe because monkeys have some intelligence, but only enough to handle trivial tasks?!). Early responses from the blogosphere seem quite positive, so I signed up for an account and played around with it a little. A couple of thoughts came to mind:

First, it’s a really cool concept! It will certainly improve search results significantly. Search engines already show this type of metadata for queries about maps, stock tickers, celebrities, etc., with Google OneBox and Yahoo! Shortcut. SearchMonkey takes the concept one step further to cover broader subjects and more web sites. Just imagine getting Google OneBox-style results for many more queries some day.

On the technical side, however, SearchMonkey uses a DOM-based approach (XPath) for data extraction (a.k.a. scraping). That means SearchMonkey will have the same disadvantages that every DOM-based approach is born with. For example, a person (or monkey) has to write new XPath expressions for each new site added. Even for sites you have already handled, you still need to keep coming back to re-program them whenever their HTML layouts change. With SearchMonkey, Yahoo! seems to be trying to enlist and organize an army of people experienced in scraping (or monkeys, against Google’s army of robots, I guess) and to make their work shareable among the community. It is basically an open-source, teamwork approach.
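To make the maintenance problem concrete, here is a minimal Python sketch of a DOM-based extraction rule. The page snippets, `PRICE_XPATH`, and `extract_price` are all hypothetical, and this is not SearchMonkey’s actual extraction format. It uses the standard library’s `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup, so real-world HTML would need a proper HTML parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical site-specific rule, hand-written for one particular page layout.
PRICE_XPATH = ".//div[@id='product']/span[@class='price']"

def extract_price(xhtml):
    """Return the price text matched by the site-specific XPath, or None."""
    root = ET.fromstring(xhtml)
    node = root.find(PRICE_XPATH)
    return node.text.strip() if node is not None and node.text else None

OLD_LAYOUT = """<html><body>
  <div id="product"><h1>Blue Widget</h1><span class="price">$19.99</span></div>
</body></html>"""

# The same (hypothetical) site after a redesign: the price now sits inside a
# new wrapper element and uses a different class name.
NEW_LAYOUT = """<html><body>
  <div id="product"><h1>Blue Widget</h1>
    <div class="buybox"><span class="sale-price">$19.99</span></div>
  </div>
</body></html>"""

print(extract_price(OLD_LAYOUT))  # $19.99
print(extract_price(NEW_LAYOUT))  # None: someone has to rewrite the rule by hand
```

Nothing about the product changed, only the markup around it, yet the rule silently stops matching. Multiply that by every site and every redesign and you get the maintenance burden described above.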

DOM-based approaches have been around for years and have become the usual choice when you want some smartness in your scraping. But in our experience with the shopping vertical, DOM-based approaches simply don’t work that elegantly.

We at ShoppingNotes.com use a fundamentally different approach. We look not only at the DOM structure of an HTML page but also at its semantics and many other things inside it. The result is that, given any product page from any shopping site, our intelligent software can extract its product price and image. The process is fully automated without any involvement of people (or monkeys). That is, no XPath expressions, templates, scripts, or anything else need to be programmed for any particular site.
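Our own algorithm isn’t described here, so the following is only a toy illustration of the general idea of site-independent extraction, not how ShoppingNotes.com actually works. It scores every currency-looking string using cues such as "price" appearing in a surrounding class or id, penalizes crossed-out prices, and picks the image with the largest declared size. The scoring weights, cue words, and function names are all hypothetical, and like the previous sketch it assumes well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for prices such as "$19.99" or "€1,299.00".
PRICE_RE = re.compile(r"[$£€]\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
CROSSED_OUT = {"del", "s", "strike"}  # tags usually used for old/list prices

def _walk(elem, hints=""):
    """Yield (text, hints) pairs; hints collects lowercased tag/class/id
    names of the element holding the text and all of its ancestors."""
    hints += " " + " ".join([elem.tag, elem.get("class", ""), elem.get("id", "")]).lower()
    if elem.text:
        yield elem.text, hints
    for child in elem:
        yield from _walk(child, hints)
        if child.tail:  # text after a child still belongs to this element
            yield child.tail, hints

def guess_price(xhtml):
    """Score every currency-looking string; no per-site rules are used."""
    best, best_score = None, float("-inf")
    for text, hints in _walk(ET.fromstring(xhtml)):
        tokens = hints.split()
        for match in PRICE_RE.finditer(text):
            score = 0
            if "price" in hints:
                score += 2  # semantic cue that survives most redesigns
            if CROSSED_OUT & set(tokens):
                score -= 3  # likely a crossed-out original price
            if score > best_score:
                best, best_score = match.group(), score
    return best

def guess_image(xhtml):
    """Pick the <img> with the largest declared width x height."""
    def area(img):
        return int(img.get("width", 0)) * int(img.get("height", 0))
    best = max(ET.fromstring(xhtml).iter("img"), key=area, default=None)
    return best.get("src") if best is not None else None

PAGE = """<html><body>
  <div class="nav"><span>Free shipping on orders over $50.00</span></div>
  <div id="product">
    <img src="/logo.png" width="80" height="40"/>
    <img src="/widget-large.jpg" width="400" height="400"/>
    <del>$24.99</del>
    <span class="sale-price">$19.99</span>
  </div>
</body></html>"""

print(guess_price(PAGE), guess_image(PAGE))  # $19.99 /widget-large.jpg
```

Because the cues don’t depend on exact paths, the same code handles both layouts from the XPath example above. A real system would need far richer signals than these few hand-picked ones.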

Sounds impossible?! That’s most people’s response when they first hear about it. In fact, even Wikipedia currently says this can’t be done (maybe I should try to get that page revised). Well, maybe not any more. Head to ShoppingNotes.com now and see it working live for yourself! Simply enter any product page URL from any shopping site and your email address, and we’ll scrape its product image and price in real time for you. And there are no monkeys working behind the scenes when you send in your request.

With Yahoo! SearchMonkey, on the other hand, you’ll need to deal with XML, XSLT, XPath, etc., which may rule out many would-be SearchMonkeys who don’t understand these things (including me).

So ShoppingNotes.com hopes to be the monkey for you in the shopping vertical, so that you don’t have to be. We think our technology would be an interesting complement to the SearchMonkey platform. In fact, we’d be happy to wrap our product scraping function as a SearchMonkey Data Service. What do you guys think?! Anyway, I’m going to the SearchMonkey Launch Party on May 15 and would be happy to chat about this. Any ideas on how our technology can be used are welcome.

P.S.: Our scraping algorithm is already working with most shopping sites, although we are still fine-tuning it. We know it’s not perfect yet, but we are confident that we are heading in the right direction, and that we will get there soon.

Update: I had a chance to meet Amit Kumar, Yahoo! Director and product manager of SearchMonkey, at the Launch Party. He told me that SearchMonkey does have a Web Service interface in addition to the DOM-based approach I mentioned earlier. This makes wrapping our product scraping function as a SearchMonkey Data Service possible (and not difficult). So we’ll get started right away. Thanks, Amit!

Posted by myyang
Filed in Geek talk
Tags: scraping, search, searchmonkey, semantic web, shopping, shopping 3.0, shoppingnotes, yahoo


6 Responses to “ShoppingNotes.com will be the monkey for you re: Yahoo! SearchMonkey”

Markus Says:

May 15, 2008 at 8:52 am
Hi Meng, I think offering your product scraping function as a SearchMonkey Data Service is an excellent idea.

If nothing else, it would get you great exposure on many levels, the yahooligans included.

Congrats on the launch. I always thought George was a surefire winner!

Alan Brown Says:

May 15, 2008 at 9:14 am
You lie. We know there are gerbils inside shoppingnotes.com

myyang Says:

May 19, 2008 at 11:16 pm
Markus, thanks for your kind words! I guess I’m still far far away from the 12M+ user base you have (Oh, I’m drooling…). Wish both of us great success!

myyang Says:

May 19, 2008 at 11:19 pm
Alan, no, I tried gerbils but they didn’t work. That’s why I replaced them with monkeys.

Semantic Shopping Monkeys « The ShoppingNotes blog Says:

July 3, 2008 at 9:53 pm
[…] July 3, 2008 Yahoo! SearchMonkey, Yahoo!’s initiative to open up its search engine for third-party developers to add more flavors to the currently text-only search results, sounded like such a great idea that we jumped in as soon as we heard about it. […]

The ShoppingNotes blog » Blog Archive » Semantic Shopping Monkeys Says:

August 1, 2008 at 11:24 am
[…] Yahoo! SearchMonkey, Yahoo!’s initiative to open up its search engine for third-party developers to add more flavors to the currently text-only search results, sounded like such a great idea that we jumped in as soon as we heard about it. […]
