20 APIs in 20 Days: Feeds and Aggregation

This post is an entry in our 20 APIs in 20 Days series. Learn more about how best practices lead to sustainable development at www.trellon.com.

Aggregating news from RSS feeds has always been one of Drupal's great strengths. Unlike other CMS platforms, where feed aggregation is only available via third-party modules, aggregation has been built into Drupal core for some time now. This allows users to pull news in from numerous web sites, automatically categorize it, and present it alongside other site content.

While the aggregator is pretty useful on its own, various modules have been built to extend what it is able to do. The Aggregation module goes beyond basic syndication of content to provide authentication and the ability to import custom XML schemas. The FeedAPI module allows feed content to be imported directly into a Drupal site and integrated with other modules like Views, CCK and Organic Groups, and it can also handle other popular formats such as KML, iCal, and more. Together, these modules have provided important methods for extending the base aggregation functionality present in Drupal.

As with all good ideas, eventually a better one is going to come along. The Feeds module is the successor to FeedAPI, and it solves some issues that existed with earlier modules. It also offers a pretty comprehensive API for working with feeds containing very different types of data, and this article is going to focus on ways of working with it.

Benefits of the Feeds Module

Trellon uses the Feeds module to solve problems for our clients quite often, for reasons beyond just aggregating content from RSS feeds. It is not a simple XML parser, and it is important to understand the new ways Feeds can be used:

Aggregating Data from Different Sources

The Feeds module supports aggregating RSS feeds, and it can also import data from other sources such as CSV files, HTML pages and uploaded documents. This means you can set up the Feeds module as a tool for importing content you upload to the site, like dumps of data from other systems that need to make their way into Drupal.

Extending Feeds to Support New Use Cases

The Feeds module has a full API that allows you to extend its base functionality through custom modules. You can write custom fetchers, parsers and processors without much effort. This gives developers a uniform method for solving problems where data needs to be imported.

Queuing Feeds as Jobs

One of the big problems with FeedAPI showed up on sites with a large number of feeds, where Drupal could not efficiently process all of them at once (which could get sloppy). The Queue API allows calls to retrieve feeds to be stored as jobs, to be processed when Drupal is good and ready.

Defining Feeds for Portability

Feeds provides an API that allows module developers to store feeds as definitions and distribute them with their modules. For developers, this means that if your module depends on having a custom feed present, you can define it in your module and ensure it will be there. This makes feeds portable between different servers.
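
To make this more concrete, here is a minimal sketch of what a feed definition shipped in a module can look like. It assumes the CTools exportables pattern that Feeds uses (hook_ctools_plugin_api() plus hook_feeds_importer_default()). The importer id mymodule_custom_xml, the content type names and the MymoduleXMLParser plugin key are hypothetical, and the exact configuration keys vary between Feeds versions, so in practice it is safer to build the importer in the UI and paste its export into your module.

```php
<?php
/**
 * Implements hook_ctools_plugin_api().
 *
 * Tells CTools that this module ships default Feeds importers.
 */
function MYMODULE_ctools_plugin_api($module = '', $api = '') {
  if ($module == 'feeds' && $api == 'feeds_importer_default') {
    return array('version' => 1);
  }
}

/**
 * Implements hook_feeds_importer_default().
 *
 * Sketch only: the id, names and settings below are placeholders.
 */
function MYMODULE_feeds_importer_default() {
  $export = array();

  $feeds_importer = new stdClass();
  $feeds_importer->disabled = FALSE;
  $feeds_importer->api_version = 1;
  $feeds_importer->id = 'mymodule_custom_xml';
  $feeds_importer->config = array(
    'name' => 'Custom XML import',
    'description' => 'Imports items from a custom XML source.',
    // Each stage names the plugin that handles it.
    'fetcher' => array(
      'plugin_key' => 'FeedsHTTPFetcher',
      'config' => array(),
    ),
    'parser' => array(
      // Hypothetical custom parser provided by this module.
      'plugin_key' => 'MymoduleXMLParser',
      'config' => array(),
    ),
    'processor' => array(
      'plugin_key' => 'FeedsNodeProcessor',
      'config' => array(
        // Assumed content type for the imported items.
        'content_type' => 'feed_item',
        'mappings' => array(),
      ),
    ),
    // Assumed content type that the importer is attached to.
    'content_type' => 'feed',
    // How often to re-import, in seconds.
    'import_period' => '1800',
  );

  $export['mymodule_custom_xml'] = $feeds_importer;
  return $export;
}
```

Because the definition lives in code, it travels with the module to every server the module is deployed on.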

OOP and CTools

The Feeds module is fully integrated with CTools and uses OOP conventions in the code. While this is a challenge for developers who are not used to OOP, it does provide greater efficiency, in that developers can extend classes to create custom parsers in their code.

This set of features is an important toolkit for extending Drupal. Feeds is useful for regularly importing and processing any kind of content before displaying it on your site (not just RSS feeds). Having imports scheduled as jobs in a queue adds transactional support, where start and completion can be better monitored and manipulated. Being able to port feeds along with modules and between servers means that highly customized feeds can be distributed to multiple systems, introducing new strategies for how to consistently move data from one site to another.

Extending Feeds through Plugins

The most important advantage the Feeds module has over FeedAPI is that it was rewritten using OOP and the CTools module. This is also a very confusing point for developers, as it departs somewhat from the hooks model used to extend other parts of Drupal.

CTools provides prototypes for various aspects of loading data from a feed. These prototypes are what you use to create plugins, which is how you extend what Feeds is able to import, and it means you can extend the module without much effort. Feeds supports 3 types of plugins:

  • Fetcher - is responsible for getting data from various sources. FeedsHTTPFetcher and FeedsFileFetcher are included in the Feeds module. By implementing your own fetchers, you can have Feeds get data from other data sources. An example could be an external database fetcher. A fetcher should only get raw data.
  • Parser - this plugin type gets data from the fetcher and converts it from a supported raw format into a PHP structure. For example, an RSS parser gets data from an HTTP fetcher and parses the XML to get structured data about posts, like the title, body, author and when the post was published.
  • Processor - represents the last part of the processing stack in Feeds. It gets structured data from the parser and processes it. Processing could be any code, like creating new nodes or creating users. Feeds also ships with a Data processor which can fill SQL tables with parsed data.
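
The way the three stages hand data to each other can be sketched in plain PHP. The classes below are hypothetical stand-ins written for this illustration. They are not the Feeds base classes and do not use the Feeds API. They only show raw data going in, structured items coming out of the parser, and the processor acting on those items.

```php
<?php
// Hypothetical stand-ins for the three plugin roles (NOT Feeds classes).

// Fetcher: only returns raw, unparsed data.
class SketchStringFetcher {
  private $raw;

  public function __construct($raw) {
    $this->raw = $raw;
  }

  public function fetch() {
    return $this->raw;
  }
}

// Parser: converts raw CSV text into an array of items keyed by column name.
class SketchCsvParser {
  public function parse($raw) {
    $lines = preg_split('/\R/', trim($raw));
    $header = str_getcsv(array_shift($lines), ',', '"', '\\');
    $items = array();
    foreach ($lines as $line) {
      if ($line === '') {
        continue;
      }
      $items[] = array_combine($header, str_getcsv($line, ',', '"', '\\'));
    }
    return $items;
  }
}

// Processor: acts on the structured items. A real processor might create
// nodes or users; this one only records a label for each item.
class SketchArrayProcessor {
  public $created = array();

  public function process(array $items) {
    foreach ($items as $item) {
      $this->created[] = $item['title'] . ' by ' . $item['author'];
    }
    return count($this->created);
  }
}

// Runs the stack in order: fetch, then parse, then process.
function sketch_run_import($fetcher, $parser, $processor) {
  return $processor->process($parser->parse($fetcher->fetch()));
}

$fetcher = new SketchStringFetcher("title,author\nFirst post,alice\nSecond post,bob\n");
$processor = new SketchArrayProcessor();
$count = sketch_run_import($fetcher, new SketchCsvParser(), $processor);
```

Since each stage only depends on the output of the one before it, any of the three can be swapped without touching the others. That is the same idea that lets Feeds combine its plugins freely.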

Every new plugin has to inherit from a base class; for example, a custom parser must extend the FeedsParser class. Through inheritance, the Feeds module knows which type of plugin a module provides. This is how hook_feeds_plugins() should be defined:

function MYMODULE_feeds_plugins() {
  $info = array();
  $info['MymoduleXMLParser'] = array(
    'name' => 'Custom XML',
    'description' => 'Parses custom data from XML source.',
    'handler' => array(
      'parent' => 'FeedsParser', // A plugin needs to derive either directly or indirectly from FeedsFetcher, FeedsParser or FeedsProcessor.
      'class' => 'MymoduleXMLParser',
      'file' => 'MymoduleXMLParser.inc',
      'path' => drupal_get_path('module', 'MYMODULE'),
    ),
  );
  return $info;
}

This function returns an array that tells Feeds the module provides a custom MymoduleXMLParser class in the file MymoduleXMLParser.inc. This is the standard plugin definition format known from the CTools module. For this type of plugin, the custom class has to override two methods. Different methods have to be implemented for Fetcher and Processor plugins.

class MymoduleXMLParser extends FeedsParser {
  /**
   * Override parse method
   */
  public function parse(FeedsSource $source, FeedsFetcherResult $fetcher_result) {
    // This method should process the fetched data and build the result object.
    $result = new FeedsParserResult();

    $item = array('field1' => 'value');
    $result->addItem($item);

    // return results
    return $result;
  }

  /**
   * Provides list of fields that Parser returns
   */      
  public function getMappingSources() {
    return array(
      'field1' => array(
        'name' => t('Item URL (link)'),
        'description' => t('URL of the feed item.'),
      ),
    );
  }   
}

The Feeds module has a very flexible structure of plugins, and they can be combined to aggregate different types of data from various data sources.

Drupal 7 status

Feeds is the most flexible aggregation module in Drupal and a lot of sites rely on it. With the news that Drupal 7 has been released in beta, more people will now be using D7 for their production sites. The good news is that last week CTools and the Feeds module were ported to D7. They are currently in alpha. We expect that Feeds will continue to rule all data aggregation tasks.