Web::Microformats2 - Libraries for parsing Microformats2 metadata from HTML or JSON



 use Web::Microformats2;

 my $mf2_parser = Web::Microformats2::Parser->new;
 my $mf2_doc    = $mf2_parser->parse( $string_full_of_tasty_html );

 for my $item ( $mf2_doc->all_top_level_items ) {
    # Each $item is a Web::Microformats2::Item object.
    my $types_ref = $item->types;
    print "I see an MF2 item with these types set: @$types_ref\n";

    my $name = $item->get_property( 'name' );
    print "The value of the item's 'name' property is: '$name'\n";
 }

 my $serialized_mf2_doc = $mf2_doc->as_json;

 my $other_mf2_doc = Web::Microformats2::Document->new_from_json(
    $serialized_mf2_doc
 );

From the repository's own README:

The Web::Microformats2 modules provide Perl programs with a way to parse and analyze HTML documents containing Microformats2 metadata. They can pull Microformats2 information from a given HTML document, representing it as a queryable in-memory object. They can also serialize this object as JSON (using the Microformats2 rules for this), or read an already JSON-serialized Microformats2 structure for further analysis.
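To give a rough sense of the JSON-serialized structure mentioned above, here is a minimal sketch that walks a canonical Microformats2 JSON document using only the core JSON::PP module. It deliberately does not use the Web::Microformats2 API; the sample h-card, its property values, and the variable names are invented for illustration. It relies only on the general shape of Microformats2 JSON: a top-level "items" array whose entries carry a "type" array and a "properties" hash of arrays.

```perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14

# Hypothetical sample data: a single h-card in canonical Microformats2 JSON.
my $mf2_json = <<'JSON';
{
  "items": [
    {
      "type": ["h-card"],
      "properties": {
        "name": ["Alice Example"],
        "url":  ["https://example.com/"]
      }
    }
  ],
  "rels": {},
  "rel-urls": {}
}
JSON

my $doc = decode_json($mf2_json);

my @summaries;
for my $item ( @{ $doc->{items} } ) {
    my @types = @{ $item->{type} };

    # Property values are always arrays; take the first 'name', if any.
    my $name = $item->{properties}{name}[0] // '(unnamed)';

    push @summaries, "@types: $name";
}

print "$_\n" for @summaries;
```

Because every property value is an array, even single-valued properties such as "name" have to be indexed. That is the kind of bookkeeping a queryable in-memory object can hide from the caller.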

Why a new module, when CPAN already has three Microformats modules?

CPAN already has HTML::Microformats, Text::Microformat, and Data::Microformat. However, putting aside the fact that the most recent of these was last updated five years ago, they all deal with the original iteration of the Microformats standard. The module I propose, Web::Microformats2, specifically and exclusively addresses Microformats2, a wholly new specification.

Microformats2 is related to its similarly named predecessor in general intent, but its design philosophy and implementation are quite different. As such, software that parses Microformats2 metadata will necessarily be completely separate from that which parses Microformats(1).

Yes, this is a little confusing. I wish it were less so. But that's why my proposed name for this module is "Microformats2": it's simply and literally what the underlying standard calls itself, for good or ill.

Why "Web" when it deals primarily with HTML and JSON input/output?

My motivation for creating these modules is the IndieWeb movement, which uses Microformats2 as a common standard throughout its many proposals and specifications. I wish to implement certain IndieWeb standards in Perl, and that requires Perl to be able to parse Microformats2 metadata, both as found in HTML documents and as pre-processed JSON.

By filing these modules under the "Web" top-level namespace, I hope to signal their usefulness to the ideal of the open web, especially as espoused by the IndieWeb movement. In this sense, the specific technologies involved (including HTML and JSON) are less important than the philosophies that hope to bring about a more open web overall.

And, yes, more practically: were I to name this "HTML::Microformats2", I'm afraid that would imply it to be version 2 of Toby Inkster's HTML::Microformats, which it is not. It serves a wholly distinct purpose from those older modules, and I think it ought to have an appropriately distinct name within the CPAN.

