OData is grease to cut data friction

Back in 2007 I talked with Pablo Castro about Astoria, which I described as a way of making data readable and writeable by means of a RESTful interface. The technology has continued to move forward, and I’m now a heavy user of one of its implementations: the Azure table store. Yesterday at PDC we announced the proposed standardization of this approach as OData, which InfoQ nicely summarizes here.

I’ll leave detailed analysis of the proposal, and the inevitable comparisons to Google’s GData, to others who are better qualified. Nowadays I’m mainly a developer building a web service, and from that perspective it’s very clear that wide adoption of something like “ODBC for the cloud” is needed. We have no shortage of APIs, all of which yield XML and/or JSON data, but you have to overcome friction to compose with these APIs.

For example, the elmcity service merges event information from sets of iCalendar feeds and also from three different sources — Eventful, Upcoming, and (recently added) Eventbrite. In each of those three cases, I’ve had to create slightly different versions of the same algorithm:

  • Query for future events
  • Retrieve the count of matching events
  • Page through the matching events
  • Map events into a common data model
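The shape of that algorithm is identical everywhere; only the dialect changes. Here is a minimal Python sketch of the adapter pattern (the service names are real, but the field names, paging details, and mappings are hypothetical placeholders, not the services' actual APIs):

```python
# Hypothetical per-service lookup table: each service reports the
# count of matching events under a different key.
COUNT_FIELDS = {
    "eventful": "page_count",
    "upcoming": "resultcount",
    "eventbrite": "total_results",
}

def count_matching_events(service, response):
    """Read the count of matching events from a service's response,
    using that service's particular field name."""
    return int(response[COUNT_FIELDS[service]])

def page_through(service, fetch_page):
    """Page through matching events, mapping each one into a
    common data model before merging."""
    page = 1
    while True:
        response = fetch_page(page)
        events = response.get("events", [])
        if not events:
            break
        for raw in events:
            yield to_common_model(service, raw)
        page += 1

def to_common_model(service, raw):
    # Map service-specific fields into one shared schema
    # (field names here are illustrative).
    return {"title": raw.get("title"),
            "start": raw.get("start_time"),
            "source": service}
```

With a shared convention like OData, the lookup tables collapse to a single case.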

Each service uses a slightly different syntax to query for future events. And each reports the count of matching events differently: page_count vs. total_results vs. resultcount. OData would normalize the queries. And because the spec says:

The count value included in the result MUST be enclosed in an <m:count> element

it would also normalize the counting of results.
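Concretely, that means a consumer could read the count the same way from any conforming feed. A minimal sketch, assuming the standard OData metadata namespace (and that the service was asked to include the count, e.g. via $inlinecount=allpages):

```python
import xml.etree.ElementTree as ET

# OData's metadata namespace, where m:count lives.
M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

def odata_count(atom_xml):
    """Pull the normalized result count out of an OData Atom feed,
    regardless of which service produced it."""
    feed = ET.fromstring(atom_xml)
    count = feed.find(f"{{{M}}}count")
    return int(count.text) if count is not None else None
```

One parser, instead of one per service.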

Open data on the web has enormous potential value, but if we have to overcome too much data friction in order to combine it and make sense of it, we will often fail to realize that value. ODBC in its era was a terrific lubricant. I’m hoping that OData, widely implemented in software, services, and mashup environments like the just-announced Dallas, will be another.

16 thoughts on “OData is grease to cut data friction”

  1. Anything that usefully normalizes access to web data is a good thing.

    What I particularly like about OData is the focus on resource modeling, a still-underexploited aspect of doing things The HTTP Way.

  2. Jon,

    Yes, OData uses HTTP, URIs, XML, & AtomPub, and even returns JSON, but the spec is huge (and published only in PDF):

    [{spec:"OData MC-APDSU 1.3", pages:219},
    {spec:"HTTP/1.1 RFC2616", pages:176},
    {spec:"YQL Guide", pages:116},
    {spec:"AtomPub RFC5023", pages:53},
    {spec:"XML 1.0", pages:31},
    {spec:"OpenSearch 1.1", pages:21},
    {spec:"GData 2.0", pages:8}
    ]

    Yes, OData is supported by some Microsoft products, but it’s not core. What’s the “dir” command option in Windows 7 to output an Atom feed or as JSON? Where is the HTML version of the spec so we can hyperlink to a specific section?

    Yes, OData defines m:count, but it’s only normalized for data services that implement OData. Why wouldn’t OData reuse OpenSearch’s more descriptive totalResults, startIndex, & itemsPerPage, like GData does, and get paging for free?
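    For comparison, those OpenSearch elements sit directly inside an Atom feed. A hypothetical fragment (numbers are made up):

    ```xml
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
      <opensearch:totalResults>492</opensearch:totalResults>
      <opensearch:startIndex>21</opensearch:startIndex>
      <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
      <!-- entries ... -->
    </feed>
    ```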

    Perhaps I’m skeptical after so many years of wringing web-friendly data out of and into file systems, SharePoint sites & webparts, Excel worksheets, FileSystemObject metadata, PDFs, enterprise databases, intranets, and other well-meaning but proprietary XML formats.

    For now, OData will require another set of adapters … but at least they will be XML adapters.

  3. What’s the “dir” command option in Windows 7 to output an Atom feed or as JSON?

    That’s a great idea.

    Where is the HTML version of the spec so we can hyperlink to a specific section?

    Coming, I’m told.

    Why wouldn’t OData reuse OpenSearch’s more descriptive totalResults, startIndex, & itemsPerPage, like GData does, and get paging for free?

    That’s a good question, I’ll pass it along.

    Perhaps I’m skeptical after so many years of wringing web-friendly data out of and into file systems, SharePoint sites & webparts, Excel worksheets, FileSystemObject metadata, PDFs, enterprise databases, intranets, and other well-meaning but proprietary XML formats.

    Understandable. The demos I have seen address this longstanding frustration. But as always this is one of those chicken-and-egg ecosystem deals. There have got to be a lot of producers and a lot of consumers. When I see that it’s possible to poke an individual value into a deeply nested item within SharePoint using cURL, as is now the case, I’m hopeful. But obviously a lot of other things have to come together for this to take off in the best possible way.

  4. A friend of mine once said something to the effect that if you’re nested more than three layers deep, there’s a problem with your data structure. (He said the idea came from Linus Torvalds.)

    I thought that was really profound, and it got me looking at data in a new way. How we represent data is everything. In the world of programming it should be our number one focus: how it is represented, how it is structured, and so on.

    So beyond just communicating between all the various types of software and finding a consensus there, I believe that it is also well worthwhile seeking some holy grail of data structure as well. Maybe I’m mistaken and the two are separate and can be modularized, but if I’m right, it would pay to develop and form a lasting marriage between the two.

  5. Thanks Micah.

    A couple of points to note:

    OData really has only a few public APIs available. Mainly, this is because you need to add an endpoint that outputs the OData format.

    Among the public APIs mentioned at http://www.odata.org/producer, Dallas (http://pinpoint.microsoft.com/en-US/Dallas) is notable as a growing catalog of sources, some free and others commercial.

    Sharing exposed data sources for OData is possible, but not easy. I have to write about it via their website, and then, maybe, my data source will be included. If I had written a YQL data adapter, I could put it on Github and it would be included in the public data tables list.

    If “write about it via their website” refers to http://www.odata.org/producers, then no, that’s not a registry; it’s just a non-exhaustive list of early adopters.

    Producing your own OData service can range from easy to hard depending on how much of OData you implement. Minimally it’s just an Atom feed and a service document, as I discussed here:

    http://blog.jonudell.net/2010/02/09/producing-and-consuming-odata-feeds-an-end-to-end-example/
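    For the minimal case, the service document is just an AtomPub app:service element pointing at the feed(s). A hypothetical sketch (not the one from the linked post):

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <service xmlns="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom">
      <workspace>
        <atom:title>Default</atom:title>
        <!-- each collection points at one Atom feed of entities -->
        <collection href="events">
          <atom:title>events</atom:title>
        </collection>
      </workspace>
    </service>
    ```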

    For an excellent overview of how to implement OData in a modular way, see this post from Pablo Castro:

    http://blogs.msdn.com/pablo/archive/2010/01/26/implementing-only-certain-aspects-of-odata.aspx

    1. Hi Jon,

      Thanks for your kind reply! I like that the list is growing; I took a peek at the resources, and they’re interesting.

      What I meant with ‘Sharing exposed data sources for OData is possible, not easy’ is that I compared the ways data tables and OData services can be shared. Let’s say I want to (re)use the OData service you created in your blogpost (http://blog.jonudell.net/2010/02/09/producing-and-consuming-odata-feeds-an-end-to-end-example/): how would I be able to access it? I think the success of the OData platform is related to the number of (free or paid) services that are available.
