Google to Archive Newspapers to 1764

James Joyner · Tuesday, September 9, 2008 · 7 comments

Google is planning to create a newspaper archive back to the Colonial Era.

Google Inc. is trying to expand the newspaper section of its online library to include billions of articles published during the past 244 years, hoping the added attraction will lure even more traffic to its leading Internet search engine.

The project announced Monday extends Google’s crusade to make digital copies of content created before the Internet’s advent, so the information can become more accessible and, ultimately, Google can make more money from ads shown on its Web site.

As part of the latest initiative, Google will foot the bill to copy the archives of any newspaper publisher willing to permit the stories to be shown for free on Google’s Web site. The participating publishers will receive an unspecified portion of the revenue generated from the ads displayed next to the stories.

This would be tremendously useful for writers, researchers, and other curious folk, and there’s no obvious downside.

I don’t know how lucrative selling access to archives is for the major papers, but I’d guess not very. If Google were willing to take on all of the administrative and logistical burden and return anything close to the current profits to the papers, I would imagine they’d jump at the opportunity.

Comments

Patrick T McGuire says:

Tuesday, 9 September 2008 at 07:38

This would be tremendously useful for writers, researchers, and other curious folk, and there’s no obvious downside.

Except that, in the long run, it moves readers away from printed paper to the internet, thereby furthering the ultimate death of the newspapers themselves.

James Joyner says:

Tuesday, 9 September 2008 at 08:30

Except that, in the long run, it moves readers away from printed paper to the internet, thereby furthering the ultimate death of the newspapers themselves.

I don’t see why. In the first instance, there’s not a lot of market for old newspapers. Now, if someone wants to research a NYT story from 1984, they go to the NYT Index at a library and view it on microfilm/fiche or whatever.

Further, the business of newspapers isn’t printing and distributing paper but rather writing and publishing news. They need to figure out how to better monetize their online presence but that’s their future.

Bithead says:

Tuesday, 9 September 2008 at 08:54

This would be tremendously useful for writers, researchers, and other curious folk, and there’s no obvious downside.

Assuming accuracy. And who checks for this aspect? If the thing is to be used for research… and I’ve no doubt it will be, far more than would be printed matter, going forward, then it seems to me some external checking is in order, particularly given recent complaints about bias at Google… up to and including their board room.

That aside,

Further, the business of newspapers isn’t printing and distributing paper but rather writing and publishing news. They need to figure out how to better monetize their online presence but that’s their future.

And that’s really the thing, assuming that people aren’t running away from the papers because of their biases, as we’ve discussed here previously. I’m not suggesting governmental involvement in such a project, which would by definition make at least the perception of problems even worse, but it would be helpful, I’d think, for a second party to be involved with the process.

Michael says:

Tuesday, 9 September 2008 at 11:04

Assuming accuracy. And who checks for this aspect? If the thing is to be used for research… and I’ve no doubt it will be, far more than would be printed matter, going forward, then it seems to me some external checking is in order, particularly given recent complaints about bias at Google… up to and including their board room.

Presumably they will all be scanned and OCR’d. To actually try and manipulate the final product for any kind of personal, business, or financial gain would make the whole endeavor exponentially more expensive.

I’m not suggesting governmental involvement in such a project, which would by definition make at least the perception of problems even worse, but it would be helpful, I’d think, for a second party to be involved with the process.

The newspapers themselves, assuming they have their own copies, could do the same scan+OCR and then directly compare the texts, which would cost them in proportion to their representation in Google’s sources. Or universities could do it, or even the government, but really, why would any of them bother to spend the time and money that only Google has to accomplish the same thing? The only second party that could possibly have the desire and resources to do it would be Microsoft, and history shows that they won’t become interested until Google has completely dominated the market, and even then their implementation would suck.

Bithead says:

Tuesday, 9 September 2008 at 13:19

Presumably they will all be scanned and OCR’d. To actually try and manipulate the final product for any kind of personal, business, or financial gain would make the whole endeavor exponentially more expensive.

OCR doesn’t seem likely for the vast majority of what they propose, given the antiquity of it.

That aside, I don’t anticipate (though I don’t discount, either) corporate meddling in such matters. But in terms of manual inputting… what do we end up with those involved in higher education? How many Scott Erbs are you going to trust not to be slanted on what they type?

Michael says:

Tuesday, 9 September 2008 at 13:28

OCR doesn’t seem likely for the vast majority of what they propose, given the antiquity of it.

Assuming they were all typeset, OCR will probably work pretty well. For parts that it can’t handle, there’s always reCAPTCHA.

But in terms of manual inputting… what do we end up with those involved in higher education? How many Scott Erbs are you going to trust not to be slanted on what they type?

If I believed that Google would use manual data entry, I would allow for personal malice on the part of the individual. However, I seriously doubt that Google will be using such a slow, expensive, non-technical process when they have some of the smartest engineers on the planet on their payroll.

Even if I’m totally wrong, they’d just use guys in India, who would have no interest in the subject matter to begin with.

SJ Reidhead says:

Tuesday, 9 September 2008 at 16:24

I do hope the project works. If it did, I would gladly pay to use the service, provided small-town papers were archived. As a historian and a researcher, I think the service might well save travel money. If Google does as other archives do, the page will be scanned and entered that way. As a Wyatt Earp scholar, I expect it may just help with a little bit of research.

SJR
The Pink Flamingo