Thursday, June 30, 2005

JavaOne: The case for data services - bringing order to SOA

Notes from JavaOne 2005...

Michael Carey, BEA Systems

Goal: Learn why a data services layer provides a critical foundation for SOA, and find out how to effectively model and create one.

How it used to be: Databases designed for applications. E-R business object analysis. Map E-R to RDBMS. Multiple levels of schemas - logical, physical, views. Databases were relational. Small yet powerful core concepts - simple logical data model. Declarative queries and updates. Data independence - major productivity gains. Led to a large established software market. Relational DBMS goodies included query optimization and efficient execution, views for data independence, access control. App development was easy with manageable architectures and APIs.

Now: Data is everywhere. The world turns out to be decentralized. Databases come in many flavors. Not all data is SQL-accessible. Packaged apps like SAP, PeopleSoft, Siebel, etc. Custom home-grown apps, files of various shapes and sizes, etc.

Painful to develop applications - no one single view of X for any X - what data do I have about X? How do I stitch together the info I need? What else is X related to? No uniformity (model or language) - data about X is stored in many different formats. Accessing X requires learning/using many different APIs. Manual coding of distributed query plans across formats, APIs. No reuse of artifacts - different access criteria/returned data -> different access plans. Even if they were reusable, how would anyone know?

Data-centric apps: Key pain points are disparate data source formats and APIs, relevant info hard to find and manage. Modeling still matters - business entities are still central, and relate to other business entities.

Web Services: Think XML RPC (but not just RPC). Provide some normalization, consume and produce XML documents, described using WSDL. Talk via messaging and SOAP.

SOA: Loosely coupled interfaces, each subsystem is a component with a service API. Tighter management and operational controls. Performance, security management. (ESB trend). Closer to dealing with heterogeneity. Services all have XML web service foundations. Hide custom logic. But what about the data? What are the business entities? How do they relate? How can I find them?

Data service model requirements: Some data sources are data-oriented (queryable like SQL databases, files of various sorts). Other data sources are service-oriented - not queryable, only specific lookup APIs are provided. Multiple APIs with related base semantics. Data canonicalization is required. Transformations are often needed in both directions. Related to presentation, matching, and searching. Need a modeling approach that handles all this.

Defining data services: A data service is a service, a collection of related web service calls about a given business entity. A business entity has a "shape" that describes the data it contains (could be an XML schema). A data service has a set of read methods. Each read method returns one or more instances of the business entity's shape. A data service also has a set of write methods - public change methods. Each write method takes one or more instances of the business entity as input.
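To make the definition concrete, here is a rough Java sketch of my own (not from the talk) of what a data service for a Customer entity might look like. Everything in it is hypothetical: the Customer shape, the CustomerDataService interface and its method names, and the in-memory implementation that stands in for real back ends. In the talk's model the shape would more likely be an XML schema and the methods web service calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DataServiceSketch {
    // Hypothetical "shape" of the Customer business entity.
    static final class Customer {
        final String id;
        final String name;
        final String city;

        Customer(String id, String name, String city) {
            this.id = id;
            this.name = name;
            this.city = city;
        }
    }

    // Hypothetical data service: every method is about the same business entity.
    interface CustomerDataService {
        // Read methods return one or more instances of the entity's shape.
        Customer getCustomerById(String id);
        List<Customer> getCustomersByCity(String city);

        // Write methods take one or more instances of the entity as input.
        void updateCustomers(List<Customer> changed);
    }

    // In-memory stand-in for the real data sources, only to make the sketch runnable.
    static final class InMemoryCustomerDataService implements CustomerDataService {
        private final Map<String, Customer> store = new LinkedHashMap<>();

        @Override
        public Customer getCustomerById(String id) {
            return store.get(id); // null when the customer is unknown
        }

        @Override
        public List<Customer> getCustomersByCity(String city) {
            List<Customer> result = new ArrayList<>();
            for (Customer c : store.values()) {
                if (c.city.equals(city)) {
                    result.add(c);
                }
            }
            return result;
        }

        @Override
        public void updateCustomers(List<Customer> changed) {
            for (Customer c : changed) {
                store.put(c.id, c); // insert or overwrite by id
            }
        }
    }

    public static void main(String[] args) {
        CustomerDataService service = new InMemoryCustomerDataService();
        service.updateCustomers(Arrays.asList(
                new Customer("c1", "Ada", "London"),
                new Customer("c2", "Bob", "Paris")));
        System.out.println(service.getCustomerById("c1").name);
        System.out.println(service.getCustomersByCity("Paris").size());
    }
}
```

The point is only that callers see one entity shape and a small set of read and write methods, whatever sits behind them.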

Building a data service: Developer must build read, navigation, and write methods. Approaches could include basic hand-coding in Java or using EAI. Hand-coding with XML/XQuery assistance. Declarative coding using XQuery. Other considerations - optimizing performance, service reuse and maintenance, ad-hoc query support.

XQuery engines: Galax, Saxon

XQuery - why not just use / extend SQL for XML? Flat tables vs. hierarchical XML. Uniform tables vs. ragged or schema-less XML. Unordered rows vs. ordered XML content. Relational data vs. mixed XML content. XQuery is to XML as SQL is to tables. W3C XML Query WG. Recommendation expected in early 2006.

Declarative data services: XML, XML schema, web services, XQuery.

Data service layering: Integration logic encapsulated in main read method of single view of customer. Additional views layered on top, complexity similar to other read methods.

Designing data services for reuse: integrate once and reuse many times. Have one private "get all instances" function to encapsulate the integration/transformation logic. Other public read functions can then all be expressed in terms of the main "get all instances" function. Avoid structural biases. Use separate data services with relationships rather than nesting. Use relationship functions to encapsulate matching logic. Keeps data services and their queries "small" and thus manageable, maintainable, reusable. Keep the presentation layer query-free.
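The talk expressed this pattern in XQuery; below is my own rough illustration of the same idea in plain Java. All names here (ReuseSketch, CustomerService, OrderService, getAll, getOrdersFor) and the sample rows are hypothetical. The two hard-coded row lists stand in for two separate back ends that have to be stitched together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReuseSketch {
    static final class Customer {
        final String id;
        final String name;
        final String city;

        Customer(String id, String name, String city) {
            this.id = id;
            this.name = name;
            this.city = city;
        }
    }

    static final class Order {
        final String orderId;
        final String customerId;

        Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    static final class CustomerService {
        // Stand-ins for two back ends: CRM rows are {id, name}, billing rows are {id, city}.
        private final List<String[]> crmRows = Arrays.asList(
                new String[] {"c1", "Ada"}, new String[] {"c2", "Bob"});
        private final List<String[]> billingRows = Arrays.asList(
                new String[] {"c1", "London"}, new String[] {"c2", "Paris"});

        // The one private function that holds the integration/transformation logic.
        private List<Customer> getAll() {
            List<Customer> all = new ArrayList<>();
            for (String[] crm : crmRows) {
                for (String[] bill : billingRows) {
                    if (crm[0].equals(bill[0])) {
                        all.add(new Customer(crm[0], crm[1], bill[1]));
                    }
                }
            }
            return all;
        }

        // Public read functions are expressed as filters over getAll().
        public Customer getById(String id) {
            return getAll().stream().filter(c -> c.id.equals(id)).findFirst().orElse(null);
        }

        public List<Customer> getByCity(String city) {
            return getAll().stream().filter(c -> c.city.equals(city)).collect(Collectors.toList());
        }
    }

    // Separate data service for orders instead of nesting orders inside Customer.
    static final class OrderService {
        private final List<Order> orders = Arrays.asList(
                new Order("o1", "c1"), new Order("o2", "c1"), new Order("o3", "c2"));

        // Relationship function: the matching logic between Customer and Order lives here.
        public List<Order> getOrdersFor(Customer customer) {
            return orders.stream()
                    .filter(o -> o.customerId.equals(customer.id))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        CustomerService customers = new CustomerService();
        OrderService orders = new OrderService();
        Customer ada = customers.getById("c1");
        System.out.println(ada.name + " has " + orders.getOrdersFor(ada).size() + " orders");
    }
}
```

Note that this Java version builds every customer on each call and then filters. With a declarative query language the engine at least has the chance to optimize that, which I take to be part of the argument for XQuery here.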

So far mainly talked about read services declaratively specified using XQuery. But need write services as well. Could hand-code updates; automation is feasible through lineage analysis of read services. Full automation often achievable for SQL sources. Update overrides needed for web services and other sources. Programming model for writes? Disconnected model highly desirable. Want flexible optimistic concurrency options. SDO (from IBM & BEA) does this.
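A rough sketch of what a disconnected write with an optimistic concurrency check could look like, in plain Java. This is my own illustration and does not use the SDO API; CustomerSnapshot, CustomerStore, read, and submit are all hypothetical names. The client reads a copy that remembers the value it saw, changes it while disconnected, and the service applies the change only if the stored value is still the one that was read.

```java
import java.util.HashMap;
import java.util.Map;

class DisconnectedUpdateSketch {
    // Disconnected copy handed to the client; keeps the value as read for conflict detection.
    static final class CustomerSnapshot {
        final String id;
        final String originalCity;
        String city;

        CustomerSnapshot(String id, String city) {
            this.id = id;
            this.originalCity = city;
            this.city = city;
        }
    }

    // In-memory stand-in for the underlying data source.
    static final class CustomerStore {
        private final Map<String, String> cityById = new HashMap<>();

        void put(String id, String city) {
            cityById.put(id, city);
        }

        CustomerSnapshot read(String id) {
            return new CustomerSnapshot(id, cityById.get(id));
        }

        // Optimistic check: apply the change only if nobody changed the value since it was read.
        boolean submit(CustomerSnapshot snapshot) {
            String current = cityById.get(snapshot.id);
            if (current == null || !current.equals(snapshot.originalCity)) {
                return false; // conflict: the caller has to re-read and retry
            }
            cityById.put(snapshot.id, snapshot.city);
            return true;
        }
    }

    public static void main(String[] args) {
        CustomerStore store = new CustomerStore();
        store.put("c1", "London");

        // Two clients read the same customer and edit their copies independently.
        CustomerSnapshot first = store.read("c1");
        CustomerSnapshot second = store.read("c1");
        first.city = "Paris";
        second.city = "Berlin";

        System.out.println(store.submit(first));  // true: stored value still matches what was read
        System.out.println(store.submit(second)); // false: stored value changed in the meantime
    }
}
```

Comparing the values that were read is only one option; a version number or timestamp per entity would be another way to do the same check.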
