Content Modularity: More Than Just Data Normalization

By , December 10, 2012 8:10 pm

This post first appeared on

As discussed in my previous post, COPE (Create Once, Publish Everywhere) is a fundamental philosophy that drives NPR’s digital publishing and distribution strategy and is the foundation of the NPR API. Supporting it all is a single system that manages all incoming content and funnels it out through a single distribution pipe, regardless of content type or destination. A key principle that supports COPE is ensuring that content is stored in a modular way.

Modular storage of content is more than just database normalization. It requires strategic design of the data model to ensure that discrete objects are stored in distinct locations. To create the right design, you must truly understand your system and the assets that it stores. That is, you need to be able to identify and represent the object (or series of objects) at the core of your system. For NPR, the core of the system is a story. We then attach “resources” to the story, each of which is its own object in the database (examples of resources include full text with each paragraph stored as a distinct record, audio, video, images, related links, and a range of other object types). Stories then get attached to lists, which are essentially a series of taxonomies that help our systems slice through the stories.
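To make the story/resource/list breakdown concrete, here is a minimal Python sketch of such a model. The class and field names (Story, Resource, StoryList, kind, sequence) are hypothetical illustrations, not NPR’s actual schema. The point is only that each paragraph, image or other resource is its own record linked to a story by ID, and that lists reference stories rather than containing them.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model; NOT NPR's real schema.

@dataclass
class Story:
    """The core object: story-level fields only, no embedded media."""
    story_id: int
    title: str

@dataclass
class Resource:
    """A separately stored object attached to a story by ID."""
    resource_id: int
    story_id: int
    kind: str          # e.g. "paragraph", "audio", "video", "image", "link"
    value: str         # paragraph text, or a reference such as a URL
    sequence: int = 0  # order within the story (e.g. paragraph number)

@dataclass
class StoryList:
    """A taxonomy entry (topic, program, etc.) that groups stories by ID."""
    list_id: int
    name: str
    story_ids: set[int] = field(default_factory=set)

def resources_for(story_id, resources, kind=None):
    """Return a story's resources in order, optionally of one kind only."""
    matches = [r for r in resources
               if r.story_id == story_id and (kind is None or r.kind == kind)]
    return sorted(matches, key=lambda r: r.sequence)

story = Story(1, "Example story")
resources = [
    Resource(10, 1, "paragraph", "Second paragraph.", sequence=2),
    Resource(11, 1, "paragraph", "First paragraph.", sequence=1),
    Resource(12, 1, "image", "https://example.org/photo.jpg"),
]
news = StoryList(100, "News", {story.story_id})

print([r.value for r in resources_for(1, resources, kind="paragraph")])
# ['First paragraph.', 'Second paragraph.']
```

Because the paragraphs are separate records, reordering, excerpting or dropping them for a given destination does not require touching any markup.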

NPR Entity Diagram

The diagram above is a basic entity diagram of how NPR manages data for a story, some related resources and the list to which the stories are assigned. This is a conceptual model that represents how these entities relate to each other and does not include all resource or list entities in the system. The physical model, obviously, is much more complex. Click here for a larger and more complete view of this diagram (PDF).

NPR’s full system is considerably more complicated than this, but the breakdown of story/resources/lists is the foundation of it all. Accordingly, the database must store this information so that each of these objects can be manipulated independently. With this approach, NPR is able to create a list of all images in the system, all stories that have video, all stories in the News topic, or any number of other combinations of stories and resources. The power of this modularity is that we have tremendous control over what gets distributed to each destination. And the distribution of content in all of these scenarios goes through the same simple REST-based API, requiring no special coding to generate the content for the different destinations.
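The queries described above fall out of that structure almost for free. Below is a hedged Python sketch using made-up in-memory data (the story titles, file names and the lists mapping are all hypothetical). In a real system these would be database queries, but the shape is the same: because resources and lists are stored separately, “all images,” “all stories with video” and “News stories with images” become simple filters and set intersections.

```python
from collections import namedtuple

# Hypothetical records: resources point at stories by story_id,
# and lists (topics) reference stories by ID.
Resource = namedtuple("Resource", "story_id kind value")

stories = {1: "Budget talks stall", 2: "New telescope images", 3: "Jazz profile"}
resources = [
    Resource(1, "paragraph", "Lawmakers met again on Monday."),
    Resource(2, "image", "telescope.jpg"),
    Resource(2, "video", "launch.mp4"),
    Resource(3, "audio", "interview.mp3"),
    Resource(3, "image", "portrait.jpg"),
]
lists = {"News": {1, 2}, "Music": {3}}

def all_of_kind(kind):
    """Every resource of one type, across all stories."""
    return [r for r in resources if r.kind == kind]

def stories_with(kind):
    """IDs of stories that have at least one resource of the given kind."""
    return {r.story_id for r in resources if r.kind == kind}

def stories_in(list_name):
    """IDs of stories assigned to a list, such as a topic."""
    return lists.get(list_name, set())

images = [r.value for r in all_of_kind("image")]  # ['telescope.jpg', 'portrait.jpg']
with_video = stories_with("video")                # {2}
news_with_images = stories_in("News") & stories_with("image")  # {2}
```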

NPR Sample XML Output

The above is an excerpt of XML output from the NPR API. Clean, effective storage of the content makes it simpler and more flexible to manage that content differently as it gets distributed to different destinations. Click here to see an expanded view of the XML with annotation detailing how it maps to the entity diagram.
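As a rough illustration of how separately stored pieces can be assembled into XML at distribution time, here is a Python sketch using the standard library’s xml.etree.ElementTree. The element and attribute names (story, text, paragraph, image, caption, topic) are hypothetical and are not the actual NPR API schema; the idea is only that the output is built from clean records rather than copied out of a pre-rendered blob.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; this is NOT the actual NPR API schema.
story = {"id": "1", "title": "New telescope images"}
paragraphs = ["First paragraph.", "Second paragraph."]
images = [{"src": "telescope.jpg", "caption": "The new telescope."}]
topics = ["News", "Science"]

def story_to_xml(story, paragraphs, images, topics):
    """Build one <story> element from separately stored pieces."""
    root = ET.Element("story", id=story["id"])
    ET.SubElement(root, "title").text = story["title"]
    text = ET.SubElement(root, "text")
    # Each stored paragraph becomes its own numbered element.
    for num, para in enumerate(paragraphs, start=1):
        ET.SubElement(text, "paragraph", num=str(num)).text = para
    # Images carry their own metadata instead of living inside the text.
    for img in images:
        image = ET.SubElement(root, "image", src=img["src"])
        ET.SubElement(image, "caption").text = img["caption"]
    # List membership (e.g. topics) is output as references.
    for name in topics:
        ET.SubElement(root, "topic").text = name
    return root

xml_out = ET.tostring(story_to_xml(story, paragraphs, images, topics),
                      encoding="unicode")
print(xml_out)
```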

Conversely, WPTs tend to store objects in whatever way best supports building a web page. As a result, the content may be bundled together in database fields, with the actual references to images, video and audio stored entirely within the story text. WPTs may still adhere to some form of data normalization in their storage techniques, but that does not mean that these systems embrace COPE.

There are two significant problems with the WPT approach to data storage. First, the image references within the block of text will contain HTML and possibly other markup, making the text block dirty. Any distribution to other platforms could then require special treatment to prepare the content for each destination. More important, however, is the fact that these same images are very difficult to repurpose because they are embedded in text. It would be quite a challenge to make a feed of images, to identify only those posts that contain images, to resize some or all images in the system, or to consistently restrict distribution of images whose rights have not been cleared.
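To illustrate the difference, here is a hedged Python sketch contrasting the two storage styles. The HTML body, file names and the rights_cleared flag are hypothetical. Pulling images out of a WPT-style blob means parsing markup (here with the standard library’s html.parser), and all that comes out is a file reference. When images are stored as their own records, the metadata needed for something like a rights check travels with them.

```python
from html.parser import HTMLParser

# A WPT-style body: image references live inside the story text itself.
body_html = (
    '<p>Lawmakers met again on Monday.</p>'
    '<p><img src="capitol.jpg" width="600"> The talks ran late.</p>'
)

class ImageFinder(HTMLParser):
    """Scrape <img> tags out of a blob of story HTML."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.srcs.append(dict(attrs).get("src"))

finder = ImageFinder()
finder.feed(body_html)
scraped = finder.srcs  # ['capitol.jpg'], with no caption, credit or rights data

# The modular alternative (hypothetical fields): images are records with
# their own metadata, so a rights check is a plain filter, not a markup rewrite.
images = [
    {"story_id": 1, "src": "capitol.jpg", "rights_cleared": True},
    {"story_id": 1, "src": "wire_photo.jpg", "rights_cleared": False},
]
distributable = [img["src"] for img in images if img["rights_cleared"]]
```

Restricting the uncleared image in the WPT version would additionally mean rewriting the stored HTML for every destination, which is exactly the special treatment described above.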

Building systems that manage content in a modular way and separate it from display sets that content up well to be distributed on a range of platforms. The final piece of the puzzle, however, is content portability. Content portability ensures that the content can actually live and thrive on all platforms to which it gets distributed (even those that do not yet exist). Building a distribution channel, like an API, is simply not enough anymore. Content portability must be applied at the CMS level, which will be the topic of my next article.

COPE: Create Once, Publish Everywhere

By , December 10, 2012 8:00 pm

This post first appeared on

The digital media world is in the process of dramatic change. For years, the Internet has been about web sites and browser-based experiences, and the systems that drove those sites generally matched those experiences. But now, the portable world is upon us and it is formidable. With the growing need and ability to be portable comes tremendous opportunity for content providers. But it also requires substantial changes to their thinking and their systems. It requires distribution platforms, APIs and other ways to get the content to where it needs to be. But having an API is not enough. In order for content providers to take full advantage of these new platforms, they will need to, first and foremost, embrace one simple philosophy: COPE (Create Once, Publish Everywhere).

NPR Architecture Diagram

The diagram above represents NPR’s content management pipeline and how it embraces these COPE principles. The basic principle is to have content producers and ingestion scripts funnel content into a single system (or series of closely tied systems). Once there, the distribution of all content can be handled identically, regardless of content type or its destinations (Click here for an enlargement of this diagram).
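The core idea of the pipeline (many inputs, one repository, one way out) can be sketched in a few lines of Python. Everything here is hypothetical: the ingest and distribute functions, the source labels and the item fields are illustrations, not NPR’s actual components. What the sketch shows is that producers and ingestion scripts store content the same way, and destinations differ only in what they ask for.

```python
# Hypothetical sketch of a COPE-style pipeline: every source writes into one
# repository, and one distribution function serves every destination.
repository = []

def ingest(item, source):
    """Store content the same way whether a producer or a script sent it."""
    repository.append({**item, "source": source})

def distribute(query):
    """One output path for all content types and all destinations."""
    return [item for item in repository if query(item)]

ingest({"type": "story", "title": "Budget talks stall", "topic": "News"},
       source="producer")
ingest({"type": "story", "title": "Wire report", "topic": "News"},
       source="wire_ingest_script")
ingest({"type": "audio", "title": "Morning newscast", "topic": "News"},
       source="audio_ingest_script")

# Destinations differ only in what they request, not in how content is produced.
for_podcast = distribute(lambda i: i["type"] == "audio")
for_news_feed = distribute(lambda i: i["topic"] == "News")
```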

Through COPE, our systems have enabled incredible growth even though we have a small staff and limited resources. Although the CMS is home-grown, COPE itself is agnostic as to the build or buy/integrate decision. Any system that adheres to these principles, whether it is a COTS product, home-grown, or anything in between, will see the benefits of content modularity and portability.

In this series of posts, I will be discussing these philosophies, as well as how NPR applied them and how we were able to do so much with so little (including our NPR API).

COPE is really a combination of several other closely related sub-philosophies, including:

  • Build content management systems (CMS), not web publishing tools (WPT)
  • Separate content from display
  • Ensure content modularity
  • Ensure content portability

These philosophies have a direct impact on API and distribution strategies as well. Creating an API on top of a COPE-less system will distribute the content, but there is still no guarantee that the content can actually live on any platform. COPE is dependent on these other philosophies to ensure that the content is truly portable.

Build CMS, not WPT
COPE is the key difference between content management systems and web publishing tools, although these terms are often used interchangeably in our industry. The goal of any CMS should be to gather enough information to present the content on any platform, in any presentation, at any time. WPTs capture content with the primary purpose of publishing web pages. As a result, they tend to manage the content in ways focused on delivering it to the web. Plug-ins are often available for distribution to other platforms, but layering tools on top of the native functions to manipulate the content for alternate destinations makes the system inherently unscalable. That is, for each new platform, a WPT needs a new plug-in to tailor the presentation markup to that platform. CMSs, on the other hand, store the content cleanly, enabling the presentation layers to focus on how to display the content, not on how to transform the markup embedded within it.

True CMSs are essentially content-capture tools that are completely agnostic as to how or where the content will be viewed, whether on a web page, mobile app, TV or radio display, etc. Additionally, a true CMS can serve platforms that don’t yet exist in ways that WPTs may not be able to (even with plug-ins). By applying COPE, NPR was able to adopt advancements over the years like RSS, podcasts, APIs and mobile platforms with relative ease. As an example, the public API took only about two developer-months to create, and most of that time was spent on user and rights management.

This presentation shows the same NPR story displayed in a wide range of platforms. The content, through the principles of COPE, is pushed out to all of these destinations through the NPR API. Each destination, meanwhile, uses the appropriate content for that presentation layer.

Separate Content from Display
Separating content from display is one of the key concepts supporting COPE. In the most basic form, this means that the presentation layer needs to be a series of templates that know how to pull in the content from the repository. This enables the presentation layer to care about how the content will look while the content can be display-agnostic, allowing it to appear on a web site, a mobile device, etc.
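Here is a small Python sketch of that idea, assuming hypothetical story fields and two illustrative templates (one web, one text-only). The stored content holds no markup; each presentation template decides how it looks, and adding a destination means adding a template, not changing the content.

```python
from html import escape

# Display-agnostic content as it might come out of the repository
# (hypothetical fields).
story = {
    "title": "Budget talks stall",
    "paragraphs": ["Lawmakers met again on Monday.", "The talks ran late."],
}

def render_web(story):
    """Web template: the presentation layer, not the content, adds the HTML."""
    body = "".join(f"<p>{escape(p)}</p>" for p in story["paragraphs"])
    return f"<h1>{escape(story['title'])}</h1>{body}"

def render_plain(story):
    """Text-only template, e.g. for a small device or a radio display."""
    return story["title"].upper() + "\n\n" + "\n\n".join(story["paragraphs"])

TEMPLATES = {"web": render_web, "plain": render_plain}

def present(story, destination):
    """Pick a template per destination; the stored content never changes."""
    return TEMPLATES[destination](story)
```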

But to truly separate content from display, the content repository also needs to avoid storing “dirty” content. Dirty content is content that contains any presentation layer information embedded in it, including HTML, XML, character encodings, microformats, and any other markup or rich formatting information. This separation is achieved through the two other principles, content modularity and content portability.
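A repository can guard against dirty content at the point of entry. The following Python sketch is one hedged way to flag it. The pattern names and the choice of heuristics (markup tags and character entities only) are assumptions; a production check would need to cover whatever markup a given CMS actually receives, such as microformat classes or wiki syntax.

```python
import re

# Hypothetical heuristics for spotting presentation information in stored text.
# Real content may need more patterns (e.g. wiki markup, microformat classes).
DIRTY_PATTERNS = {
    "html_or_xml_tag": re.compile(r"</?[A-Za-z][^>]*>"),
    "character_entity": re.compile(r"&(#\d+|#x[0-9A-Fa-f]+|[A-Za-z]+);"),
}

def dirty_reasons(text):
    """Return the names of any patterns that suggest the text is dirty."""
    return [name for name, pattern in DIRTY_PATTERNS.items() if pattern.search(text)]

clean = "Lawmakers met again on Monday."
dirty = "Lawmakers met <b>again</b> on Monday&nbsp;night."
```

A CMS could run such a check when content is saved or ingested and reject or clean the text, so that every later destination receives the same clean paragraphs.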

At a high level, many systems and organizations are applying the basics of COPE. They are able to distribute content to different platforms, separate content from display, etc. But to take some of these systems to the next level, enabling them to scale and adapt to our changing landscape, they will need to focus more on content modularity and portability. In my next post, I will go into more detail about NPR’s approach to content modularity and why our approach is more than just data normalization.

Skating Lessons

By , December 31, 2011 4:48 pm

Yesterday, the seven of us (me, Andrea, Maya, Adam, Jenn, Ryan and Allison) went to the ice rink for a little ice skating. Maya has never gone before, so I wasn’t sure how she would react. But Adam was a great coach and she was able to “skate” before too long (and had a great time along the way).

Maya Learning to Skate from Daniel Jacobson on Vimeo.

Meanwhile, Ryan was flying all over the ice! He was weaving in and out of people and deftly shifting gears to avoid collisions as we played chase. It was a lot of fun and I was amazed at what Ryan could do on the skates!

Ryan Skating Laps from Daniel Jacobson on Vimeo.

Thanks for a great day out on the ice, Adam and family!

Back on the Saddle Again

By , December 23, 2011 4:29 pm

It has been quite a while since my last blog post. That might lead someone to believe that I have completely abandoned the blog. That wouldn’t be an unreasonable assumption. But the reality is that all of my writing efforts have simply been dedicated to another project (a book). I will write about that project shortly, but in the meantime, this post is meant to inform the huge audience that I plan to write here at least once a week again in 2012.


It’s a Miracle!

By , March 14, 2011 8:22 pm

After nearly three months, FedEx (well, actually an external firm commissioned by FedEx) found our missing package! Fortunately, we put some cards in the massive, 62-pound box that had our current address on them. The firm found the box, dug through its contents (presumably while holding their noses due to the rotting lemons) and sent us a note in the mail telling us that they would send us the box if we could provide appropriate verification information. Of course, they would send it to us via FedEx, which was a bit concerning…

Anyhow, the result of all of this is that the package arrived at our house today. Again, it weighs 62 pounds, so it is sitting on our front stoop right now. Assuming it doesn’t get stolen, it should be interesting to dig through its contents to see what survived and what has been destroyed by rotting lemons. More on that later.

The Case of the Missing Box

By , March 5, 2011 3:05 pm
No FedEx

I have finally summoned the intestinal fortitude to write about our most recent shipping mishap. Here is the backstory…

On Christmas Day, the three of us were to fly from San Francisco to Ft. Lauderdale to visit family. The expectation was that Florida would be warm. From there, we were headed to DC where forecasts were projecting massive snow storms and cold weather. We were going to be gone a total of 10 days, heading back to the Bay Area after New Year’s.

Traveling with Maya is already less than ideal, especially given that she requires a ton of extra stuff, like a stroller, toys for the plane, etc. Consequently, we did not want to bring more bags with us on the plane carrying things like winter jackets, sweaters, boots, etc. Instead, we had the brilliant idea of shipping it all ahead to DC so it would be there by the time we arrived. Moreover, since we were already shipping stuff, we decided to also put a bunch of other stuff in the box, including a stereo receiver, hand-me-downs for Allison, some other electronics and a big bag of fresh lemons hand-picked from the lemon trees in our yard. Sounds like a great plan, doesn’t it?

We shipped this package on Friday, Christmas Eve, with an expected arrival date of Monday. We then left for Florida without a worry. A few days later, we touched down in DC and asked my parents if they had received the package (at this point, it was Tuesday). They hadn’t. A tad concerned, I called FedEx, expecting that the holidays had tied it up a little. We also looked at the tracking for the package. It turns out that the package left the Redwood City FedEx location on the 24th, went to the Menlo Park location, then carried on to Oakland the same day. That was the last stop. FedEx, meanwhile, has no record of what happened next and has no idea where the package is.

Let me take a moment to explain to you how amazing it is that they lost this package. It is not as if the box was branded and indicated that it could be some prize for someone to snag, like an Amazon package. And this was not a small package that could easily take a walk without anyone noticing or that could be hiding in some corner of the warehouse. Rather, this was a large 65-pound box that stood about 2.5 feet tall, was incredibly cumbersome to move, and looked like it had been shipped a few times already. The only thing that suggested that it was worth stealing was the fact that it was large, that it was shipped around the holidays, and that we elevated the insurance level a little. But it does seem clear to us that it went missing due to bad intent.

The only solace we have at this point is in imagining the look on that poor thief’s face when he opens his prize and sees a bunch of beat-up clothes, some crappy electronics, kids’ apparel and a bag of lemons coating all of the other items with an everlasting sour odor.

As a side note, after doing some research, it seems as though FedEx loses almost 1% of all packages! That is astonishing to me! UPS, while better, still has a high error rate of about 0.5%.

(I should also report that FedEx did pay out the insurance money pretty quickly and even refunded us the amount of shipping.)

Foot Fun!

By , January 8, 2011 7:13 pm

For those of you that might remember my episode with the glass in my foot about two years ago (culminating in multiple surgical procedures to get it out), last night was the reprise…

Maya and I were playing in the living room last night. Afterwards, I started walking into the other room when the big toe on my left foot caught a rogue toothpick that was stuck in the carpet. And by caught, I mean the toothpick went straight into my big toe about 1/4 inch and then snapped off inside.

Given the complications resulting from the glass episode, I decided it was better to spend three hours in Urgent Care in Palo Alto to get it removed professionally.

Again, better my foot than Maya’s. That said, better no foot than mine!

Our First Earthquake as CA Residents

By , January 8, 2011 7:49 am

We can now call ourselves California residents… Today, a magnitude 4.1 earthquake registered just southeast of San Jose. Interestingly, I did not feel the quake even though I was only a few miles away in Los Gatos. I was in a meeting at Netflix at the time. The topic of that conversation must have been very compelling, because almost everyone else in the building felt it.

One Small Difference Between DC and CA

By , December 12, 2010 5:10 pm

After a few weeks as California residents, we have already seen quite a few differences. I already wrote about the garbage/recycling/compost removal. Other differences include the harsh 60-degree California winter and the fact that everyone here will talk to you if you show a hint of interest in talking to them. All of these, however, were expected and well-advertised prior to moving here.

There is one difference that we didn’t expect… The number (and size) of the spiders! They are all over the place, with monster webs that stretch for yards connecting virtually any two things that don’t move much.

This nasty-looking orange spider was hanging out in a corner near our front door. It was about 2.5 inches including the legs and looked as though it could be poisonous (at least according to very, very novice eyes).

I found this spider, perfectly silhouetted against the sunset, just dangling on its web between two trees that were about 10 feet apart. My best guess is that this one is 2 inches.

When looking at vacant houses prior to the move, I can’t tell you how many massive spiders in huge webs I almost walked into in their backyards… Eeewww!

Hello Internet!

By , December 5, 2010 8:32 am

Earlier this week, our new house finally got Internet access (thanks to our friends at AT&T). It took them three weeks to set up, which is completely insane, especially when you consider the previous tenants used their service. Three weeks to flip a switch… In their defense, AT&T did tell us it wouldn’t be set up until December 1st and a major holiday did fall within that time frame. But seriously! Anyhow, what’s done is done and we are now connected.

First order of business with the new connection is to set up our TV-based technologies. For those that don’t know this, we haven’t had cable (or comparable) for about eight years. Instead, it has been rabbit ears and supplemental media (most recently Netflix, even before the change in jobs).

The rabbit ears actually took the longest time to set up. Finding the right spot in the room took multiple attempts, each of which required a 30-minute scan of the channels to see how good (or poor) the reception would be. Once that was set, I moved on to the TiVo, which didn’t take long. Finally, I set up our new AppleTV, which was provided courtesy of my great friends at NPR who gave it to me as a departing gift. That setup was a mixed bag.

Connecting the device to the TV was a breeze (after a trip to Best Buy for yet another HDMI cable). But connecting to the Wi-Fi was surprisingly inelegant for Apple. Not a big deal, but it really is annoying to scroll around looking for letters. It reminded me of entering my name in the arcade game Track and Field, the one with the roller ball. Anyhow, once it was set up, the disappointment continued. Yes, I can use the device to stream Netflix, which is great. But there is also some amazing content available there, most of which is pay on demand, similar to the iTunes model. I knew going into it that AppleTV is set up like that, but I expected a lot more free content. And I thought it would be much easier to find the free content. All of this provides more validation that the Netflix subscription approach is right on the money…

So, I am thrilled to be connected again! But so far, my AppleTV experience is subpar. More on this later after I play with it more.
