Thursday, June 30, 2005

JavaOne: Beyond Blogging - Feed syndication and publishing with Java

Notes from JavaOne 2005...

Kevin Burton, Rojo Networks
Patrick Chanezon, Dave Johnson, Sun Microsystems

RSS seems to be evolving toward REST web services.

What is syndication? A client checks a server for what's new, and the server returns it in a simple, easy-to-parse XML format.

Syndication engines: Meerkat, PlanetPlanet, Technorati, Feedster, PubSub, Feedburner

Use cases: blogging and news, monitoring wiki changes, search results, file distribution (e.g., podcasting), sharing photos (flickr), sharing bookmarks (del.icio.us), monitoring auctions, weather, stocks, etc.

Monitoring bug reports (JIRA), source code checkins, software build results, etc.

What is a feed? XML representation of time-stamped data. Available on the web at a fixed URL. Frequently updated.

Shortcomings: Users need one-click subscribe - there are many proposals but no clear consensus. Most users can't create well-formed content (an estimated 10% of feeds are not well-formed). Feeds can be "lossy" if you don't poll frequently enough - RFC3229 (FeedDiff) can be used to address this. Since feeds are based on polling, the pain can be eased with HTTP conditional GET and caching.
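
The lossiness point can be made concrete. A feed only carries its most recent entries, so if the oldest entry in a new fetch is still newer than the last entry seen on the previous poll, anything published in between may have dropped off. A minimal sketch, assuming entries are reduced to epoch-millisecond timestamps; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: detects new entries and possible gaps between two polls.
final class PollCheck {
    /** Returns the timestamps (epoch millis) of entries newer than lastSeen. */
    static List<Long> newEntries(List<Long> fetched, long lastSeen) {
        List<Long> result = new ArrayList<>();
        for (long t : fetched) {
            if (t > lastSeen) {
                result.add(t);
            }
        }
        return result;
    }

    /**
     * True if even the oldest entry in this fetch is newer than the last one
     * we saw: entries published in between may have fallen off the feed.
     */
    static boolean possiblyLossy(List<Long> fetched, long lastSeen) {
        if (fetched.isEmpty()) {
            return false;
        }
        long oldest = Long.MAX_VALUE;
        for (long t : fetched) {
            oldest = Math.min(oldest, t);
        }
        return oldest > lastSeen;
    }
}
```

The check can only say "maybe": the gap could also be exactly filled by the new entries. Polling more often, or a delta mechanism like the RFC3229 approach mentioned above, is what actually closes it.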

The Atom syndication format is on track to become an IETF standard and incorporates lessons learned from RSS. It has a comprehensive and rigorous specification and is also the basis for the Atom Publishing Protocol.
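
To show what that rigor means in practice, here is a sketch that builds a minimal Atom feed with the JDK's DOM API. It assumes the namespace and required elements from Atom 1.0 as later published in RFC 4287 (feed and entry each need id, title and updated; the feed needs an author unless every entry has one). The class name and all argument values are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes a one-entry Atom 1.0 feed via DOM.
final class AtomWriter {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    static String minimalFeed(String feedId, String feedTitle, String updated,
                              String author, String entryId, String entryTitle) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element feed = doc.createElementNS(ATOM_NS, "feed");
            // Declare the default namespace explicitly on the root element.
            feed.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns", ATOM_NS);
            doc.appendChild(feed);
            // Required feed metadata.
            append(doc, feed, "id", feedId);
            append(doc, feed, "title", feedTitle);
            append(doc, feed, "updated", updated); // RFC 3339 timestamp
            Element authorEl = doc.createElementNS(ATOM_NS, "author");
            feed.appendChild(authorEl);
            append(doc, authorEl, "name", author);
            // Each entry needs its own id, title and updated.
            Element entry = doc.createElementNS(ATOM_NS, "entry");
            feed.appendChild(entry);
            append(doc, entry, "id", entryId);
            append(doc, entry, "title", entryTitle);
            append(doc, entry, "updated", updated);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build feed", e);
        }
    }

    private static void append(Document doc, Element parent, String name, String text) {
        Element e = doc.createElementNS(ATOM_NS, name);
        e.setTextContent(text); // the serializer escapes &, < and > for us
        parent.appendChild(e);
    }
}
```

Letting the DOM serializer handle escaping avoids one of the classic ways hand-built feeds end up not well-formed.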

How to parse a feed: either use your favorite XML parsing technique, like DOM, SAX, a pull parser, JAXB, XMLBeans, Castor-XML, etc. (not recommended, since feed quality can be spotty), or use a parser library (Java options include ROME and Feed Parser).
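
For comparison, the "favorite XML parsing technique" route with the JDK's built-in DOM parser is short. It also shows the problem: a strict parser rejects the whole document on the first well-formedness error. A sketch assuming RSS 2.0 structure (`item` elements with a `title` child); the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads item titles from an RSS 2.0 document with plain DOM.
final class DomRssReader {
    static List<String> itemTitles(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // One stray '&' anywhere in the feed ends up here.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList t = item.getElementsByTagName("title");
            if (t.getLength() > 0) {
                titles.add(t.item(0).getTextContent().trim());
            }
        }
        return titles;
    }
}
```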

Be polite when fetching feeds - use the HTTP If-Modified-Since header, ETags, and compression.
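
A minimal sketch of a polite fetch with `java.net.HttpURLConnection`, assuming the caller saved the `ETag` and `Last-Modified` response headers from the previous fetch. The method prepares the request but does not send it; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: builds a conditional, compressed GET for a feed URL.
final class PoliteFetcher {
    /**
     * etag and lastModified are the values saved from the last response;
     * either may be null on the first fetch. Nothing is sent until the
     * caller calls getResponseCode() or getInputStream().
     */
    static HttpURLConnection prepare(String feedUrl, String etag, String lastModified)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        if (etag != null) {
            conn.setRequestProperty("If-None-Match", etag);
        }
        if (lastModified != null) {
            conn.setRequestProperty("If-Modified-Since", lastModified);
        }
        // Request a compressed body; the caller must check Content-Encoding
        // and wrap the stream in a java.util.zip.GZIPInputStream if needed.
        conn.setRequestProperty("Accept-Encoding", "gzip");
        return conn;
    }
}
```

After connecting, a response code of `HttpURLConnection.HTTP_NOT_MODIFIED` (304) means the cached copy is still current and no body was transferred; otherwise the caller would store the new `ETag` and `Last-Modified` values from `getHeaderField` for the next poll.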

Dealing with malformed feeds - Use a liberal parser such as Feed Parser or Universal Feed Parser.
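
A liberal parser repairs common mistakes instead of giving up. One of the most frequent is an unescaped `&` in a title or link. The toy sketch below illustrates the idea by escaping any `&` that does not start a character or entity reference before handing the text to a strict parser. It is not how Feed Parser or Universal Feed Parser actually work, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: one "liberal" fixup applied before strict parsing.
final class LiberalFixups {
    // An '&' that does NOT begin a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    /** Escapes stray ampersands so a strict XML parser will accept the text. */
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }
}
```

This only handles one class of error: undeclared HTML entities such as `&nbsp;` pass through unchanged and will still be rejected, and ampersands inside CDATA sections would be rewritten too. Real liberal parsers layer many such recoveries, including the character set and encoding fixes mentioned below.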

Serving feeds - generate the XML using ROME, XML-DOM serialization, or template languages like Velocity or JSP. Using a library is highly recommended. Serve it via a web server or app server. Use the correct MIME type. Support conditional HTTP GETs. Cache, cache, cache!
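
On the server side, supporting conditional GET mostly means comparing the client's `If-None-Match` header with the feed's current ETag and answering 304 when they match. A sketch independent of any servlet container; the class, the method names, and the hash-based ETag are hypothetical choices, and the comparison is a simple exact match rather than the full weak/strong rules of HTTP.

```java
// Hypothetical helper: decides between 200 and 304 for a feed request.
final class FeedResponder {
    // Atom's media type is application/atom+xml; application/rss+xml is the
    // type commonly used for RSS.
    static final String ATOM_TYPE = "application/atom+xml";
    static final String RSS_TYPE = "application/rss+xml";

    /** A quoted ETag derived from the feed body (a stronger digest works too). */
    static String etagFor(String body) {
        return "\"" + Integer.toHexString(body.hashCode()) + "\"";
    }

    /** ifNoneMatch is the client's If-None-Match header, or null if absent. */
    static int status(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch != null) {
            for (String tag : ifNoneMatch.split(",")) {
                String t = tag.trim();
                if (t.equals("*") || t.equals(currentEtag)) {
                    return 304; // Not Modified: headers only, no body
                }
            }
        }
        return 200; // send the full feed with the proper Content-Type
    }
}
```

`If-Modified-Since` can be handled the same way by comparing the parsed date against the feed's last modification time. Caching the generated body together with its ETag means repeated polls cost almost nothing.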

Publishing protocols: Old - based on XML-RPC; simple ad-hoc protocols like the Blogger API, MetaWeblog API, Movable Type API, and WikiRPCInterface. New - the REST-based Atom Publishing Protocol.
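
To give a feel for the older XML-RPC style, here is a sketch that builds the request body for `metaWeblog.newPost(blogid, username, password, struct, publish)`. It fills only the `title` and `description` struct members; the class name is hypothetical, and the endpoint the body would be POSTed to (with `Content-Type: text/xml`) depends on the blog software.

```java
// Hypothetical helper: builds an XML-RPC methodCall for metaWeblog.newPost.
final class MetaWeblogRequest {
    /** Escapes XML special characters for use in element text. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static String stringParam(String v) {
        return "<param><value><string>" + esc(v) + "</string></value></param>";
    }

    private static String member(String name, String v) {
        return "<member><name>" + name + "</name><value><string>" + esc(v)
                + "</string></value></member>";
    }

    static String newPost(String blogId, String user, String password,
                          String title, String description, boolean publish) {
        return "<?xml version=\"1.0\"?>"
                + "<methodCall><methodName>metaWeblog.newPost</methodName><params>"
                + stringParam(blogId) + stringParam(user) + stringParam(password)
                // The post itself travels as an XML-RPC struct.
                + "<param><value><struct>"
                + member("title", title) + member("description", description)
                + "</struct></value></param>"
                + "<param><value><boolean>" + (publish ? "1" : "0") + "</boolean></value></param>"
                + "</params></methodCall>";
    }
}
```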

Why not SOAP? Most of the older protocols came before SOAP, and newer ones are coming out of frustration with the complexity of SOAP.

REST = Representational State Transfer. Uses the HTTP verbs GET, POST, PUT, and DELETE.
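
In a REST publishing protocol the verb carries the operation, so no method names go in the body. The sketch maps create/read/update/delete onto HTTP verbs the way the Atom Publishing Protocol does (POST to a collection, GET/PUT/DELETE on a member). The Protocol was still a draft at the time of this talk; the `application/atom+xml;type=entry` media type is taken from the final RFC 5023, and the class, enum and URIs are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: prepares (without sending) REST-style publishing requests.
final class AtomPubClient {
    enum Op {
        CREATE("POST"),   // POST a new entry to the collection URI
        READ("GET"),      // GET an existing member entry
        UPDATE("PUT"),    // PUT a modified entry to the member URI
        DELETE("DELETE"); // DELETE the member URI

        final String method;

        Op(String method) {
            this.method = method;
        }
    }

    static HttpURLConnection prepare(Op op, String uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(uri).toURL().openConnection();
        conn.setRequestMethod(op.method);
        if (op == Op.CREATE || op == Op.UPDATE) {
            conn.setDoOutput(true); // these carry an Atom entry as the request body
            conn.setRequestProperty("Content-Type", "application/atom+xml;type=entry");
        }
        return conn;
    }
}
```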

What is ROME? A Java library of RSS/Atom feed parsers and generators, providing a homogeneous Java representation of feeds. Built on top of Java collections and JDOM.

ROME pros: Simple to use, well documented. Single Java representation of feeds. Lean. Fully pluggable. Widely used, has momentum. It's Java. Cons: Loss of information at the SyndFeed level. DOM overhead.
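
The "single representation" trade-off can be sketched without ROME itself: map RSS `<item>` and Atom `<entry>` onto one format-neutral class, and everything that class has no field for is lost. This is not ROME's API (its real classes are `SyndFeed` and `SyndEntry`); the class below is hypothetical, uses a non-namespace-aware DOM parse, and takes the first Atom `link` it finds.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: one entry type for both RSS 2.0 and Atom input.
final class NormalizedFeed {
    /** Format-neutral entry; fields not listed here (comments, categories...) are dropped. */
    static final class Entry {
        final String title;
        final String link;

        Entry(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    static List<Entry> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        boolean atom = doc.getDocumentElement().getTagName().equals("feed");
        NodeList nodes = doc.getElementsByTagName(atom ? "entry" : "item");
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String link;
            if (atom) {
                // Atom puts the URL in an attribute: <link href="..."/>
                NodeList links = e.getElementsByTagName("link");
                link = links.getLength() > 0 ? ((Element) links.item(0)).getAttribute("href") : "";
            } else {
                // RSS puts it in the element text: <link>...</link>
                link = text(e, "link");
            }
            entries.add(new Entry(text(e, "title"), link));
        }
        return entries;
    }

    private static String text(Element parent, String tag) {
        NodeList n = parent.getElementsByTagName(tag);
        return n.getLength() > 0 ? n.item(0).getTextContent().trim() : "";
    }
}
```

Callers get one simple model for both formats, but an RSS `<comments>` URL or an Atom `rel="edit"` link simply has nowhere to go: the same kind of information loss listed as a con above.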

ROME subprojects - Fetcher, Aqueduct, Modules.

Who's using it? Sun, SnipSnap, Roller, ScheduleWorld, FeedPod, Parss (Antville), Public Interactive, Blog-City, Reger.com...

Feedparser library goals: high-performance operation with a thin API. Support for all RSS versions plus Atom (and other formats like FOAF, OPML, etc.). Event-based, not DOM-based. Uses only Java primitives, no objects. Open source, donated to Apache. Used by rojo.com.
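
"Event-based, not DOM-based" means the parser calls back into your code as it streams through the document, instead of building a tree first. A sketch of the style using the JDK's SAX parser; the `ItemListener` interface and the class are hypothetical and do not reproduce Feedparser's actual listener API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams RSS items to a callback without building a DOM.
final class EventFeedReader {
    /** Receives plain Strings as each item is completed. */
    interface ItemListener {
        void onItem(String title, String link);
    }

    static void read(String xml, ItemListener listener) {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String title = "";
            private String link = "";
            private boolean inItem = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("item")) {
                    inItem = true;
                    title = "";
                    link = "";
                }
                text.setLength(0); // collect character data for this element only
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (!inItem) {
                    return; // ignore channel-level elements
                }
                switch (qName) {
                    case "title": title = text.toString().trim(); break;
                    case "link": link = text.toString().trim(); break;
                    case "item":
                        listener.onItem(title, link); // fire as soon as the item closes
                        inItem = false;
                        break;
                    default: break;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }
}
```

Memory use stays flat regardless of feed size, which is what makes this style attractive when processing millions of feeds.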

The Feedparser 0.5 release is pending. Millions of feeds, high performance, rock solid, in Apache Commons. Its HTTP implementations support ETags, If-Modified-Since, compression, and status code return, and fix a number of design issues with java.net.URL. It also supports an Autodiscovery API for RSS, Atom, and FOAF. It works around bugs in common site implementations (LiveJournal, Blogger, TypePad, etc.). Liberal feed support corrects issues with RSS feeds and XML feeds in general - character set issues, encoding issues, namespace issues.
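
Feed autodiscovery works by scanning a page's HTML for `<link rel="alternate">` tags whose `type` is a feed media type. A rough regex-based sketch covering RSS and Atom only (FOAF discovery uses a different `rel`/`type` pair and is left out); it is not Feedparser's Autodiscovery API, the class name is hypothetical, and it ignores unquoted attribute values and relative URLs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds RSS/Atom feed URLs advertised in an HTML page.
final class FeedAutodiscovery {
    private static final Pattern LINK_TAG =
            Pattern.compile("<link\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // name="value" or name='value'
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    static List<String> discover(String html) {
        List<String> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            String rel = "";
            String type = "";
            String href = null;
            Matcher a = ATTR.matcher(tag.group());
            while (a.find()) {
                String name = a.group(1).toLowerCase(Locale.ROOT);
                String value = a.group(2) != null ? a.group(2) : a.group(3);
                switch (name) {
                    case "rel": rel = value.toLowerCase(Locale.ROOT); break;
                    case "type": type = value.toLowerCase(Locale.ROOT); break;
                    case "href": href = value; break;
                    default: break;
                }
            }
            boolean feedType = type.equals("application/rss+xml") || type.equals("application/atom+xml");
            // rel may hold several space-separated values
            boolean alternate = Arrays.asList(rel.trim().split("\\s+")).contains("alternate");
            if (href != null && feedType && alternate) {
                feeds.add(href);
            }
        }
        return feeds;
    }
}
```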

Roller & BlogClient.
http://www.rollerweblogger.org

