Daniel Jacobson's Blog

Content Modularity: More Than Just Data Normalization

By daniel_jacobson, December 10, 2012 8:10 pm

This post first appeared on ProgrammableWeb.com

As discussed in my previous post, COPE (Create Once, Publish Everywhere) is a fundamental philosophy that drives NPR’s digital publishing and distribution strategy and is the foundation of the NPR API. Supporting it all is a single system that manages all incoming content and funnels it out through a single distribution pipe, regardless of content type or destination. A key principle that supports COPE is ensuring that content is stored in a modular way.

Modular storage of content is more than just database normalization. It requires strategic design of the data model to ensure that discrete objects are stored in distinct locations. To create the right design, you must truly understand your system and the assets that it stores. That is, you need to be able to identify and represent the object (or series of objects) that is at the core of your system. For NPR, the core of the system is a story. We then attach “resources” to the story, each of which is its own object in the database (examples of resources include full text with each paragraph stored as a distinct record, audio, video, images, related links, and a range of other object types). Stories are then attached to lists, which are essentially a series of taxonomies that help our systems slice through the stories.

The diagram above is a basic entity diagram of how NPR manages data for a story, some of its related resources and the lists to which stories are assigned. This is a conceptual model that represents how these entities relate to each other, and it does not include all resource or list entities in the system. The physical model is much more complex. Click here for a larger and more complete view of this diagram (PDF).
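
As a rough illustration of the story/resources/lists breakdown, here is a minimal Python sketch. It is not NPR's actual schema: the class names, field names and resource kinds are all assumptions made for the example. The point it tries to show is that each resource, including each paragraph of text, is its own object rather than part of one text blob.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """One object attached to a story (hypothetical fields)."""
    resource_id: int
    kind: str      # assumed kinds: "paragraph", "image", "audio", "video", "link"
    value: str     # paragraph text, or a URL for media and links
    position: int  # ordering of the resource within its story


@dataclass
class Story:
    """The core object; everything else hangs off it."""
    story_id: int
    title: str
    resources: List[Resource] = field(default_factory=list)

    def resources_of(self, kind: str) -> List[Resource]:
        """Return this story's resources of one kind, in story order."""
        matching = [r for r in self.resources if r.kind == kind]
        return sorted(matching, key=lambda r: r.position)

    def text(self) -> str:
        """Reassemble the full text from the separately stored paragraphs."""
        return "\n\n".join(r.value for r in self.resources_of("paragraph"))


@dataclass
class StoryList:
    """A taxonomy entry (for example a topic) that stories are assigned to."""
    list_id: int
    name: str
    story_ids: List[int] = field(default_factory=list)
```

Because the text is assembled on demand from paragraph records, a destination that wants only images or only audio can ask for those resources without touching the text at all.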

NPR’s system is much more complicated than this, but the breakdown of story/resources/lists is the foundation of it all. Accordingly, storage of this information in the database needs to ensure that all of these objects can be manipulated independently. With this approach, NPR is able to create a list of all images in the system, or all stories that have video, or all stories in the News topic, or any number of other combinations of stories or resources. The power of this modularity is that we have tremendous control over what gets distributed to each destination. All of these scenarios are served by the same simple REST-based API, with no special coding required to generate the content for the different destinations.

The above is an excerpt of XML output from the NPR API. Clean, effective storage of the content makes it simpler and more flexible to manage that content differently as it gets distributed to different destinations. Click here to see an expanded view of the XML with annotation detailing how it maps to the entity diagram.
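
To make the earlier point about independent manipulation concrete, the sketch below stores stories, resources and lists in separate tables of an in-memory SQLite database and runs the three kinds of query mentioned above: all images, all stories that have video, and all stories in the News topic. The table names, column names and sample rows are invented for illustration and are not taken from NPR's database.

```python
import sqlite3

# Hypothetical schema: one table per kind of object, joined by ids.
SCHEMA = """
CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE resource (
    id INTEGER PRIMARY KEY,
    story_id INTEGER NOT NULL REFERENCES story(id),
    kind TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE TABLE lists (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE story_lists (
    story_id INTEGER NOT NULL REFERENCES story(id),
    list_id INTEGER NOT NULL REFERENCES lists(id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO story VALUES (?, ?)",
                     [(1, "Election recap"), (2, "Concert review")])
    conn.executemany("INSERT INTO resource VALUES (?, ?, ?, ?)", [
        (1, 1, "paragraph", "Voters went to the polls."),
        (2, 1, "image", "polls.jpg"),
        (3, 2, "video", "concert.mp4"),
        (4, 2, "image", "stage.jpg"),
    ])
    conn.executemany("INSERT INTO lists VALUES (?, ?, ?)",
                     [(1, "topic", "News"), (2, "topic", "Music")])
    conn.executemany("INSERT INTO story_lists VALUES (?, ?)", [(1, 1), (2, 2)])
    return conn


def all_images(conn):
    """Every image in the system, regardless of story."""
    rows = conn.execute(
        "SELECT value FROM resource WHERE kind = 'image' ORDER BY id")
    return [row[0] for row in rows]


def stories_with(conn, kind):
    """Titles of stories that have at least one resource of the given kind."""
    rows = conn.execute(
        "SELECT title FROM story "
        "WHERE id IN (SELECT story_id FROM resource WHERE kind = ?) "
        "ORDER BY id", (kind,))
    return [row[0] for row in rows]


def stories_in_list(conn, name):
    """Titles of stories assigned to the named list (for example a topic)."""
    rows = conn.execute(
        "SELECT s.title FROM story s "
        "JOIN story_lists sl ON sl.story_id = s.id "
        "JOIN lists l ON l.id = sl.list_id "
        "WHERE l.name = ? ORDER BY s.id", (name,))
    return [row[0] for row in rows]
```

Each query touches only the tables it needs, so a feed of images never has to read or parse story text.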

Conversely, web publishing tools (WPTs) tend to store objects in whatever way best enables the building of a web page. As a result, the content may be bundled together in database fields, with the actual references to images, video and audio stored entirely within the story content text. WPTs may still adhere to some form of data normalization in their storage techniques, but that does not mean that these systems are embracing COPE.

There are two significant problems with the WPT approach to data storage. First, the image references within the block of text will contain HTML and possibly other markup, making the text block dirty. Any distribution to other platforms could then require special treatment to prepare the content for that destination. Second, and more important, these same images are very difficult to repurpose because they are embedded in text. It would be quite a challenge to make a feed of images, to identify only those posts that contain images, to resize some or all images in the system, or to consistently restrict distribution of images that do not have their rights cleared.
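
The sketch below illustrates both problems, assuming a story body stored as a single HTML blob. The blob, the class name and the function name are hypothetical. Getting clean text for a non-web destination, or finding the images at all, requires parsing the markup of every story, using Python's standard html.parser module here.

```python
from html.parser import HTMLParser


class BlobSplitter(HTMLParser):
    """Pulls image URLs and plain text back out of an HTML story blob."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Image references are only discoverable by walking the markup.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_blob(html):
    """Return (clean_text, image_urls) recovered from one stored blob."""
    parser = BlobSplitter()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so the text is usable outside a web page.
    text = " ".join(" ".join(parser.text_parts).split())
    return text, parser.images


# A made-up story body of the kind a page-oriented tool might store.
blob = ('<p>Voters went to the polls.</p>'
        '<img src="polls.jpg" alt="Polling place">'
        '<p>Results came in late.</p>')
clean_text, image_urls = split_blob(blob)
```

Even this small recovery step has to run against every stored story, and it knows nothing about image rights or sizes, which is why a feed of images or a rights-based restriction is hard to build on top of embedded markup.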

Building systems that manage content in a modular way and separate it from display sets that content up well to be distributed on a range of platforms. The final piece of the puzzle, however, is content portability. Content portability ensures that the content can actually live and thrive on all platforms to which it gets distributed (even those that do not yet exist). Building a distribution channel, like an API, is simply not enough anymore. Content portability must be applied at the CMS level, which will be the topic of my next article.

APIs, Content Management, COPE, Engineering, NPR, Technology | API, CMS, COPE, Modularity, NPR
