January 16th 2007

To DTD or not to DTD

Posted by Chris Finke

› tags: dtd, netscape, rss

Over the weekend, the tech community noticed that a file crucial to the operation of certain RSS readers was MIA. This file, the DTD for RSS 0.91, had been hosted at my.netscape.com, and its purpose was essentially to explain the structure of RSS 0.91 documents and to provide definitions for a set of character entities that could be used in such documents.

Theoretically, RSS readers load this file when parsing an RSS 0.91 feed. In practice, however, most readers (including those built into Firefox and Internet Explorer) either ignore the file or load their own cached copy.
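As a rough illustration of the "ignore the file" approach, the sketch below parses an RSS 0.91 document with Python's xml.etree.ElementTree, which does not download the external DTD. The DOCTYPE identifiers in the sample feed, and the use of Python's HTML entity table as a stand-in for the DTD's character entities, are assumptions made for the example rather than details taken from this post.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Sample feed; the public and system identifiers are assumed for illustration.
FEED = """<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>Caf&eacute; News &copy; 2007</title>
  </channel>
</rss>"""


def parse_without_dtd(text):
    """Parse a feed without fetching its DTD.

    Entities the DTD would normally define are supplied from a built-in
    table (an assumed stand-in for the DTD's own entity list).
    """
    parser = ET.XMLParser()
    for name, codepoint in name2codepoint.items():
        parser.entity[name] = chr(codepoint)
    return ET.fromstring(text, parser=parser)


root = parse_without_dtd(FEED)
title = root.findtext("channel/title")
```

Because the entity table ships with the reader, the feed still parses when the server named in the DOCTYPE is unreachable.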

my.netscape.com is undergoing a redesign, and when we announced the redesign about 10 days ago, the DNS entry for my.netscape.com was changed to point to the new server where My Netscape will be living. This had the effect of making anything under the old my.netscape.com unavailable, since the only thing public on the new server is a splash page. So, ipso facto, the DTD was no longer available.

The unavailability of this file had the effect of causing certain feed readers - Microsoft's Live.com RSS gadget, for one - to refuse to display RSS 0.91 feeds. This is what we call in the technical community "not good." So, we've restored the file (along with the DTD for RSS 0.9) for the time being, but this experience has raised a few important questions: should feed readers be relying on the availability of a static document on a third-party Web server (and thus a connection to the Internet)? Is it truly necessary to request this document every time an RSS 0.91 feed is being parsed? (The RSS 0.91 DTD is requested over four million times per day - that's a lot of wasted bandwidth for a file that won't ever change.) In our opinion, the answer to both of these questions is no.

So until July 1, 2007, the DTDs for RSS 0.9 and 0.91 will be available via my.netscape.com. If you are a software developer, use this time to ensure that your RSS software is capable of displaying RSS feeds even if the DTD is unavailable, or have a backup copy cached locally for your parser to use in the absence of the specified DTD. If you are a content provider, either update your feeds to point to another copy of the DTD, or accept the fact that your feed may not be available through feed readers that don't have a backup plan in the case of a missing DTD.
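For developers, the "backup copy cached locally" advice might look something like the following minimal sketch. The class name DtdStore, the fetch_over_http helper, and the five-second timeout are hypothetical choices for illustration; the idea is only that a reader asks the network for a given DTD at most once and keeps working when the request fails.

```python
import urllib.request


def fetch_over_http(url, timeout=5):
    """Download a DTD; raises OSError (URLError is a subclass) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


class DtdStore:
    """Serve DTDs from a local cache, fetching each one at most once."""

    def __init__(self, fetch=fetch_over_http, bundled=None):
        self._fetch = fetch
        # Copies shipped with the reader are consulted before the network.
        self._cache = dict(bundled or {})

    def get(self, system_id):
        """Return the DTD bytes, or None if no copy can be obtained."""
        if system_id in self._cache:
            return self._cache[system_id]
        try:
            body = self._fetch(system_id)
        except OSError:
            # No copy available: the caller should still display the feed.
            return None
        self._cache[system_id] = body
        return body
```

A reader that bundles its own copy of the DTD never needs the remote server at all, and one that does not bundle a copy can treat a None result as "parse without validation" rather than refusing to show the feed.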


Comments


11:00AM Sel

I think this makes a very good point. I don't run an RSS feed myself, but I read sites that use them every day. I do not understand why you would rely on a file stored elsewhere to make your RSS software work. You have no control over the third party's site, and they may do exactly what happened... make changes. Relying on a remote file, especially a static one, is nonsense. Hopefully people take notice and correct the situation. Otherwise, it's going to be a hot summer for RSS.

11:01AM Daniel Glazman

Disclaimer: I am a former Netscaper, CPD, Composer team and Layout team.

When RSS was designed by Ramanathan Guha in '99, it was an official Netscape technology, extensively used for dogfood and the portal. The URL of the dtd was public, and it became a vital piece of markup for the current web.
Chris, "cool URIs never change". This sentence has been on W3C's site for ages, constantly repeated by W3C staff in Web conferences.

RSS DTD's url should not change, cannot change. The resource it targets should NOT be cached. It's Netscape's responsibility over time to provide that resource to the whole community because that's how it goes: user agents have the right to check the availability of the dtd because they have the right to check whether the RSS feed they deal with is valid or not. DTD urls are unique and stable only for that reason, to avoid cached copies, to provide the world with one and only authenticated, verified, stable copy of the DTD.

I think this decision to cancel access to the RSS DTD with its current public URL is counter-productive and reveals that the current Netscape, that is definitely not the one I worked for, just tramples Web Standards under foot. I urge you to reconsider this decision, that can only hurt Netscape for only a minor win.

11:42AM Dave

When the file goes away, any chance you could retain and publish the data on how often the (missing) file is subsequently requested? No particularly good reason, but I'd find it interesting to see how this sort of thing tails off, and it would make for a pretty graph...

11:45AM Sam

"Cool URIs never change", but it is unreasonable to expect someone to host something forever. What would have happened if Netscape had simply gone out of business instead of becoming another company's subsidiary? Netscape's original mistake was to host the DTD at "my.netscape.com". Ideally, it should have been hosted at something like "publish-dtd.netscape.com" or some ".org" address. This would at least allow Netscape to point the relevant DNS entries to off-site mirrors of the DTD document, and volunteers could step up to the plate as commercial websites go out of business or change business models.

11:58AM Brian J King

There is no reason this file should have to be hosted there. This is hot linking of a file that doesn't even need to be loaded; it's a rapacious and silly waste of bandwidth. The final decision should be made by Netscape and Netscape alone, as they are the ones paying for the bandwidth.

12:12PM lou

Can you provide the user agents that are making the 4 million daily hits? Are they all browser based from live.com?

12:48PM Eric Haszlakiewicz

Daniel, you said:
> RSS DTD's url should not change, cannot change. The resource it targets should NOT be cached.

IMO, those are two contradictory statements. If the url will never change, and the content at the url will never change, why wouldn't you want to cache it?

No-one is saying anything about changing the url. I agree that it shouldn't change, but that isn't a reason to expect that accessing the url is actually going to do something valid. The url is just a convenient way to create a unique identifier for the information you are referring to, and once that information is common knowledge there's no reason to keep going back to the original source to get it.
For instance, if you happen to learn a new word from some dictionary you might refer back to it a few times until you memorize it, but once it gets into common use you'll have it cached, and you'll be able to get it from various other places too. The word, like the url for this dtd, doesn't change, but there is no need for the original location of the definition to keep it.


12:48PM niz

"Why do they call it a _Universal_ Resource Locator?"

The U in URL stands for "Uniform".

1:00PM Sean Harlow

I'm with Netscape on this one. There's no reason to keep hosting that file which will NEVER change (seeing as that RSS 1.0 came out in 2000 and 2.0 in 2002, future development is clearly being done elsewhere) nor is there a reason why RSS readers should be requesting the file so often.

That said, seeing how often it's been requested from so many diverse sources, it would have been better for Netscape to warn the community before taking it down the first time. This blog post having reached Slashdot now qualifies as more than sufficient warning for when it will finally be taken down. Like Dave above, I'd also like to see data on how the traffic to this file falls off following this announcement.

1:27PM Dan Brennan

In response to Sean, I am an avid Slashdot reader, but I don't believe that "having reached Slashdot now qualifies as more than sufficient warning". While many developers read Slashdot, that is not a guarantee that all affected developers will hear the news. I love Slashdot, but simply being posted to one web resource is not nearly enough notice and warning for Netscape to remove such a valuable file from their server.

I am in complete agreement with Mark that if Netscape will not continue to hold the file, it ought to be released to an outside body (W3C is a good first thought).

3:50PM TagNe.ws - Posted in RSS News

Tagged and Posted. Vote for it if it's news.

http://www.tagne.ws/RSS/Netscape-is-going-to-break-RSS-091-feeds-on-July-1-2007/

4:52PM Christian Schmidt

4,000,000 hits/day x 8 KB/hit ~ 32 GB/day

Assuming that Netscape pays less than 1 $/GB, the bandwidth costs are hardly the reason for not wanting to host the DTD.

6:49PM MT

Then they should have kept out of the front line while the web evolved. Most inventions come along with negative side effects. In their case now, that means costs for 8 GB daily traffic. Theoretically avoidable packets. But withstanding such risks shows whether an organisation can handle the consequences of its own on-the-edge technology. Developers all over the world trust that several thousand iconic files and artefacts will be kept up and running.

Taking that resource offline because of financial issues? Sorry, you cannot deal with your heritage.

8:35PM Rich Boakes

The disappearance of this schema file is perhaps a good example of where semantic web technologies can provide a quick win.

It would be relatively simple to look for documents that are marked up as "cacheOf" or "mirrorOf" or "sameAs" or "supersedes" a particular schema (such as this one) - so you'd just need a few people and places to start caching schemas and the problem would become "just another step" in the validation process.

8:30AM Peter Flynn

Persistence of URIs and their resolution is only half of the story. The real problem is that we screwed it up when we made only the System Identifier compulsory in XML Document Type Declarations. By failing to make FPIs an alternative, rather than an adjunct, and by failing to make catalog resolution an inherent feature of XML, we bound ourselves to URLs (sic) for the foreseeable future, with all the attendant disadvantages.

There were lots of compelling reasons at the time for doing it the way it was done, but few of them were based in reality.

There were other mistakes: the dereliction of the ISO 9070 RPO Registry by the GCA (now IdeAlliance), the poor implementation of catalog resolution by most browser/editor authors, my own failure to persist in the project to build an FPI resolver using a suitable distributed registry model like ISO 11179, and the neglect of RSS 2 which led to the version I posted in August 2003.

None of which is going to change any time soon, but hopefully this is enough of a warning shot for developers to learn a little more about the technology before putting out products which break the model.
