Microsoft Office File Formats to Go XML
Microsoft Office File Formats to Go XML (PC World’s Techlog)
We don’t know a lot about the next version of Microsoft Office. That isn’t a huge surprise considering that the new version (which bears the catchy code-name “Office 12”) isn’t due until some time in the second half of 2006. But Microsoft just made one announcement about the suite upgrade, and it’s a potentially huge one: Word, Excel, and PowerPoint will use XML-based files as their native file format. (These files will use a schema of Microsoft’s own devising; the company has dubbed the format “Microsoft Office Open XML.”)
Why is that big news? As long as there have been Microsoft Office apps, they’ve saved files in proprietary file formats that have been nebulous and poorly documented…which meant that any company that wasn’t Microsoft has had a hard time supporting them. That’s one reason why the road’s been rough for Office alternatives such as WordPerfect Office and OpenOffice.org. Realistically, they’ve had to support Microsoft’s files. And realistically, they can never support the Office formats perfectly. (They’ve gotten better over the years, but PowerPoint, especially, seems difficult to handle.) And even third-party apps that complement Office rather than aiming to replace it (such as DataViz’s Documents to Go, one of my favorites) must figure out how to speak the Office file formats to get their work done. I’ve never heard anyone claim that’s a cakewalk, or that Microsoft’s done everything it could to make the job easy.
XML won’t instantly let the rest of the world grok Office formats, and since Microsoft is writing its own schema, it’ll still have tight control over what documents can and can’t do. (Note that it calls the formats “Open” rather than “Open Source.”) But XML is, by its nature, vastly more transparent than Microsoft’s old way of doing things. And Chris Capossela, the Microsoft vice president we spoke with today, says that the company will work hard to document the XML-based formats well, so that third-party companies have a much simpler time developing applications that play nicely with Office files.
Any time that a major application makes any changes to its file formats, there’s the possibility of compatibility problems. (Especially when there are multiple versions of the app still in wide use–and I know folks who are still running Office 97.) But Microsoft’s plans for smoothing the transition seem sound. For one thing, it says it’ll release free updates that let Office 2000, XP, and 2003 read and write the new file formats. It also says that it will release a bulk converter that will let you translate lots of files automatically. And it says that you’ll be able to opt to go back to the binary file formats as the default if you choose. Still, if you upgrade to the new Office, be prepared for a period during which not every other application you use–or every person you collaborate with–is ready to work with Microsoft Office Open XML files. There’s going to be some confusion, at least in the short term.
Other side benefits of the new formats: Microsoft says that the XML files will be compressed, so they’ll be around 50 to 75 percent smaller than their current equivalents. And it says that if a file gets corrupted, Office should still be able to read everything except for the specific element of the document that got damaged.
The WaPo/AP story on this elaborates on the file size issue:
Next Office Edition to Default to XML
First, the file sizes will be much smaller, letting people send files as e-mail attachments more easily. Also, within a single document, the new XML format will store text, charts, images and other chunks of data as separate components. That will make it easier to access the data and recover undamaged parts of any files that get corrupted, Leach said.
Computer Business Review (Microsoft set to open office via XML formats) adds,
It has been widely reported that the new file extensions will see an “x” added to the famous .doc, .xls, and .ppt extensions, although Pryke-Smith said the company is not confirming the exact file extension details at this stage.
What it has confirmed is that users will still have the choice of saving files with the more traditional formats. The company will also release patches to coincide with Office 12 that will enable Office 2000, Office XP, and Office 2003 users to open, edit, and save files using the new formats.
The company has previously declined suggestions that it should open up its file formats to an industry standards body, and Pryke-Smith again stated that Microsoft is unlikely to do so because it would rather keep control of the formats so that it can provide backward compatibility for the estimated 400 million Office users worldwide. “Something that’s unique to Microsoft is that backwards compatibility is important and the best way we can control that backwards compatibility is to maintain the management of the format,” he said.
This is very interesting, given the ubiquity of Office. It’s ironic that Microsoft is finally doing something about file size bloat at a time when storage is cheaper and more convenient than ever. Not long ago, the inability to email large documents was a huge problem, requiring the saving of materials to disc. Now, with thumb drives and other easy file sharing mechanisms, it’s less problematic. Still, there are often times when I need to send a large file, especially a PowerPoint presentation, and run up against file size limitations.
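For a rough feel of why compressed XML stored as separate components could shrink files and make partial recovery easier, here is a minimal Python sketch. It is not Microsoft's implementation. It assumes a ZIP container, which none of the stories quoted above specify, and the part names (document.xml, chart1.xml), their contents, and the helper functions build_package and read_part are all hypothetical, made up for illustration.

```python
import io
import zipfile

# Hypothetical part names and contents, invented for illustration only.
parts = {
    "document.xml": "<doc>" + "<p>Quarterly results were strong.</p>" * 200 + "</doc>",
    "chart1.xml": "<chart>" + "<point x='1' y='2'/>" * 200 + "</chart>",
}


def build_package(parts):
    """Write each XML part as its own compressed entry in an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    return buf.getvalue()


def read_part(package, name):
    """Read a single part by name without decompressing the other parts."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")


package = build_package(parts)
raw_size = sum(len(xml.encode("utf-8")) for xml in parts.values())
print(f"uncompressed XML: {raw_size} bytes, packaged: {len(package)} bytes")

# Each part is stored and compressed independently, so the chart can be
# pulled out on its own.
chart = read_part(package, "chart1.xml")
```

The sample markup is deliberately repetitive, so it compresses far better than a real document would; the 50 to 75 percent figure is Microsoft's claim, not something this sketch measures. The point is only that each part is compressed and addressed separately, which is the property that would let an application read the undamaged pieces of a file.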
Talked to my engineering prof about this today. He’s a big OpenOffice advocate and used to use StarOffice. The implications related to MS responding to OpenOffice in this fashion are an even bigger deal than just the file size issue.
I’d wait until MS shows the specs of “MS Open XML” before making the breathless proclamations this article does. IIRC, MS was part of the team (with Netscape) that managed to tear up HTML with proprietary tags and inconsistent support for standard language.