I just figured out the search engines’ next foray!


I know, I know, some of you are probably sick of me idly speculating on what Google, Yahoo!, and Microsoft are going to do next, but I just had yet another vision that I wanted to share with you.

One of the search engines is going to build or buy a leading OCR and/or photo scanning software package.

Why?

Well, just follow the plotline in your head. Google just built a system (Google Base) which, if perhaps rather inelegantly, lets people add additional content in bulk for that search engine to slurp up.

Google and (separately) the Open Content Alliance are busy scanning the world’s books.

So we have Web pages, music, images, scholarly research, books, and more being indexed… but what about all those zillions of papers folks have lying around? Like the ones I just set about scanning this evening to reduce some of the clutter around my desk.

What have I been scanning? A list of waltz moves, an e-mail directory, a memorable schedule of a recent dance camp I attended, and a funny article I wrote for my high school newspaper.

How much of this would the world be interested in? How much would I really WANT to share? Not all of it, to be sure.

But from older academic papers to newspaper clippings to home photos and more… there’s a TON of information out there that’s not digitized.

Not digitized yet, that is.

And interestingly enough, decent scanners (albeit not slide scanners) are pretty darn cheap ($50 or less, especially used ones on eBay). But really good OCR software? At least $150, from what I’ve gathered. Students, families, home-office professionals… I bet most of them have scanners. But I doubt most of them have OCR software.

Then again, perhaps the search engines could simply piggyback onto non-OCR scanning software and do the OCR in-house on their supercomputers. That would give them greater ability to iterate, run A/B tests on scan quality, etc., without depending upon users to update software.

* * *

Benefit to engines:

  • A huge database to improve NLP (natural language processing) algorithms… better understanding the interplay of text, graphs, photos, etc.
  • Access to a ton of new content
  • Further enticement to consumers to get onto their desktops (e.g., perhaps bundled in with Google Desktop or MSN Search or Yahoo-X1 search, etc.)

Benefit to consumers:

  • Ability to archive documents and/or photos online with greater accuracy, and for less money (even free) for personal retrieval.
  • Easier way to share not-yet-digitized documents with colleagues, using an OCR’d (much less bandwidth-intensive) format
  • Probably other stuff I’m overlooking

* * *

What are your thoughts on this?

1) How feasible do you think it is that one of the search engines will buy/build such a service?
2) Which search engine’d do this first?
3) How useful would it actually be to general consumers? Small business folks? Others?

3 comments
  • J. Nov 28, 2005

    *cough* Riya buyout rumors *cough*

  • Adam Nov 28, 2005

    Heh heh, J, well, one of the head advisors of Riya specifically denied that Google was buyin’ them.  But admittedly that leaves Microsoft and Yahoo.

    Or……… for the wacky-but-could-happen-idea… how ‘bout AOL buying Riya 😉

  • ramon Nov 30, 2005

    I feel sorry for the people who have to scan all those 🙂
