RDFa Extractor

I’m offering this service to help me test my RDFa parser and to support the development of the specification. Please give it a try and let me know if you have any test cases that reproduce bugs.

RDFa Specification and RDFLib's RDFa svn workspace. My site has some RDFa now.

 


 


Usage: If you want to use it in your semantic web application, simply point your SPARQL query to:

http://torrez.us/services/rdfa/[URL of HTML page]
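As a rough illustration of the usage note above, here is a minimal Python sketch that builds the service URL for a page and places it in the FROM clause of a SPARQL query. It assumes that the page URL is appended verbatim (no percent-encoding) and that "pointing" a query at the service means naming it in FROM. The helper names and the foaf:name query are hypothetical and are not part of the service.

```python
SERVICE_PREFIX = "http://torrez.us/services/rdfa/"


def extractor_url(page_url: str) -> str:
    """Append the page URL to the service prefix (assumed: no escaping)."""
    return SERVICE_PREFIX + page_url


def sparql_query(page_url: str) -> str:
    """Build an illustrative SPARQL query that reads from the extractor output."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?person ?name\n"
        f"FROM <{extractor_url(page_url)}>\n"
        "WHERE { ?person foaf:name ?name }"
    )


if __name__ == "__main__":
    # Print the query text; running it requires a SPARQL engine of your choice.
    print(sparql_query("http://www.snee.com/xml/rdfa/rdfa1.html"))
```

The query string can then be handed to whatever SPARQL engine the application already uses, provided that engine dereferences FROM URLs.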


Comments are closed

  1. Bob DuCharme 06.05.06 / 10pm
    This is great. I’d been meaning to try to write an RDFA parser and then pump some samples through it, but you’ve done the hard part, leaving the fun part for us. In preparation, I had put together an XHTML 2 document that tried to incorporate lots of examples from the primer, and it’s at http://www.snee.com/xml/rdfa/rdfa1.html. Your extractor seems to do well with it, although I haven’t had a chance to look over the output carefully.

    I will definitely be making more samples to try. Like my test document says, “Future RDF/A test documents could demonstrate more sample movie timetables, flight schedules, lists of restaurants with ratings, and so forth.” After more tests, I’ll write up something on fixing Movable Type templates to add RDFA to HTML (assuming the XHTML 1.0 stuff gets worked out), and you can do the WordPress one, and then just imagine all the triplets that will be possible.

    Bob
  2. Scotty 06.08.06 / 2pm
    This is very cool, however, it looks like the namespace prefixes aren’t coming over properly?
    It looks like it’s generating:

    xmlns:_3='http://xmlns.com/foaf/0.1/'

    instead of:

    xmlns:foaf='http://xmlns.com/foaf/0.1/'
  3. Elias Torres 06.08.06 / 8pm
    I’ll add something to the parser to make those prettier.
  4. AndrewJ 06.20.06 / 12am
    The script appears to be broken for any URI.
  5. Elias Torres 06.20.06 / 6am
    AndrewJ,

    I just tried it and it worked. Did you mean you clicked on the extract button or tried the script from source?
  6. AndrewJ 06.23.06 / 1am
    Hi Elias. On the default URI it works ok, but I tried it on this URI (http://torrez.us/rdfa) and it barfs with SAXParseException and a page full of Python stack trace. Maybe the XHTML or RDFa isn’t sufficiently well formed? It happens on other URIs as well, e.g. http://alphajuliet.com/. I suspect my RDFa is not actually well formed, so I’m trying this service as a validation.

    A.
  7. Elias Torres 06.23.06 / 6am
    Yes AndrewJ, you are right. My XHTML is broken on my site. I’ll fix it soon. It might not just be your RDFa, but also the fact that you don’t have a valid XML document either. Let me know if I can be of help.
  8. Daniel Lewis 06.25.06 / 4pm
    Hello, very nice tool.

    I’ve started working with RDF/a, and I’ve been trying to test my website with this parser, but it doesn’t seem to work. The direct URL is http://daniellewis.blackcurranthost.co.uk/vanirsystems (my other URL http://www.vanirsystems.com/ is a frame around it)

    The only website that I have seen it work with is Bob DuCharme’s: http://www.snee.com/xml/rdfa/rdfa1.html

    I’m not sure what’s wrong, but feel free to use my website if you need test data :)
  9. Elias Torres 06.25.06 / 5pm
    Daniel,

    You have two xml:lang attributes at the root element. If you remove one that should fix it.
  10. Daniel Lewis 06.26.06 / 3am
    Elias,

    Thanks, it works now. I fixed a few other SAX problems too while I was at it. Please do feel free to use my webpage as test data though.

    Thanks again,

    Daniel.
  11. Eric van der Vlist 08.11.06 / 5pm
    Elias,

    Nice service, thanks for that!

    When extracting RDF from a page such as http://geo.dyomedea.com/xhtml/CAN_1340, I would have expected to keep the xml:lang attributes and get a document similar to http://geo.dyomedea.com/rdf/CAN_1340 (the geo:nom element has a xml:lang attribute).

    Is this restriction from RDFa or from your extractor?

    Eric
  12. Elias Torres 08.11.06 / 8pm
    Eric,

    Thanks!

    Regarding xml:lang, I believe it only applies to triples that use the @content attribute. If you don’t use @content, the value is considered a literal of type rdf:XMLLiteral, and typed literals can’t have a language.
  13. Eric van der Vlist 08.12.06 / 2am
    Elias,

    Hmmm…

    The XHTML 2.0 latest WD seems to say that by default, the datatype should be xs:string:

    datatype = QName
    This attribute defines the datatype of the content metadata of the element. If the attribute is not specified, then the default value is string as defined by [XMLSCHEMA].


    Isn’t it the case?

    Anyway, if I used a datatype attribute with type xs:string or xs:token, would my xml:lang attribute be copied into the RDF?

    That would be much better than having to duplicate the element content in a @content attribute…

    Thanks.

    Eric
  14. Elias Torres 08.12.06 / 9am
    Eric,

    This is all in a state of flux. I have been going mostly by the RDFa syntax, but after looking at the XHTML 2.0 spec I believe it’s underspecified. You are correct that xsd:string is the default when no datatype is specified, but that doesn’t mean it’s a plain literal. In RDF only plain literals can have a language. If it’s of type xsd:string then it’s typed and it cannot have a language. However, the spec doesn’t say what to do if the content is xsd:string by default but the element contains child nodes. In the RDFa syntax the default is XMLLiteral, and if plaintext is specified the spec says to take only the text from the first level of children and concatenate it, or something like that. I’ll bring this up with the WG at our next meeting. Thanks!

    Overall I think what you are looking for is datatype="plaintext" which is barely mentioned in the RDFa syntax spec.
  15. Eric van der Vlist 08.13.06 / 8am
    Elias,

    I have updated these XHTML documents but your extractor doesn’t seem to take my datatype="plaintext" into account. Can you have a look?

    And many thanks for the RDFa tips!

    Eric
  16. WP_RDFa » Blog Archive » Assistants 08.14.06 / 11am
    [...] Elias Torres is now assisting the WP_RDFa project. He has kindly provided subversion space for the project, he’ll also be very handy for RDFa parsing issues as he made the RDFa Extractor. (Also thank you to Ben Adida for helping to conceive the wprdfa project, and for Mark Birbeck for sending me an email with just “Fantastic!” typed into it) [...]
  17. Al 08.16.06 / 1pm
    I just extracted the RDF created by your example, saved it to a file and opened it in the XML editor Conglomerate. I noticed that you do not have any of the xmlns:??? definitions for the namespaces that are used.

    Other than that this is a nice piece of work. Thanks
  18. Elias Torres 08.16.06 / 2pm
    Are you using Firefox? You need to view source and copy that instead of the HTML displayed in the browser.
  19. Michael Hausenblas 10.20.06 / 8am
    thanks for this excellent service. just a little question: I tried it with the URI pointing to our Wiki/Sandbox (http://www.w3.org/2005/Incubator/mmsem/wiki/WikiSandBox) and get a lot of errors in the script … is my RDFa not correct or do you know of any other problems? cheers, michael
  20. Steven Pemberton 10.26.06 / 8am
    Hi Elias,

    The “+” character in my telephone numbers is being dropped. Probably just a small parsing error.

    <a rel="foaf:phone" href="tel:+31-20-5924138">+31 20 592 4138</a>
  21. Jay Fienberg 02.16.07 / 10pm
    The parser is working with a live example RDFa page (vCalendar + vCard) that I just created, but not picking up child properties of the Vevents. It may be a mistake / issue on my end, but it’s a “real world” page with a bunch of RDFa, so it might work as a good test case for you too.

    (please contact me if you have any suggestions for changes to the RDFa on the page–thanks)
  22. Vaclav Synacek 02.23.07 / 7am
    Hi Elias,
    this is a really nice service, thanks.

    I decided to put some RDFa on my blog (sioc,dc). Your RDFa service does great job on posts with normal text. However the posts that contain default YouTube embedding HTML result in an error. So does any YouTube video page (I know YouTube is not RDFa enabled, but still it should not return an error, or should it?).

    The embedding code seems xml well formed, so I think this might be a bug of the Extractor.
  23. RDFa is on it´s way... « About the semantic social web 03.19.07 / 1pm
    [...] Posted by ablvienna on March 19th, 2007 Maybe a year old now, but still a really useful Web-Service is provided by Elias Torres: His RDFa Extractor also shows how to insert HTML containing RDFa into a Sparql-Query. [...]
  24. Daniel E. Renfer’s Blog » Blog Archive » Chock Full of RDFa Goodness 12.03.07 / 3pm
    [...] got a bit more data that I can still mark up properly, but for now you can grab your favorite RDFa extractor, highlighter bookmarklet, or Firefox extension and see the semantic goodness hidden just under the [...]
  25. david decraene 12.04.07 / 5am
    I may be wrong, but it doesn’t seem to handle xml namespace specifications very well, consider the following:







    ontology



    namespaces extracted are:
    xmlns:_7='rdf:'
    xmlns:_6='rdfs:'
    but should be:
    xmlns:_7='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:_6='http://www.w3.org/2000/01/rdf-schema#'
  26. Gökçe’nin Web Güncesi (gwg) - Getting started with blogging 03.29.08 / 9am
    [...] A blog is hard, of course. Still, when I took a quick look at what people had done on this topic, I saw that ways of doing the work by hand had been devised, such as RDF Tools and RDF Extractor. I think that later on, getting deep into it and experience [...]
  27. Trying to understand Microdata? RDFa? | Garbage Collection 08.14.09 / 12am
    [...] some tags. Didn’t use the primer, just looked at the example content from RDFa4Google. Used Elias Torres RDFa parser to test my results and validator.w3.org for my [...]
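
Several of the failures reported in the comments above (the SAXParseException traces, the duplicate xml:lang attribute on a root element) came from pages that were not well-formed XML rather than from the RDFa itself. The following is a minimal sketch of a local pre-check using Python's standard-library xml.etree.ElementTree. It is not the extractor's own code, and the function name well_formedness_error is hypothetical. One caveat: this parser does not load external DTDs, so named entities such as &nbsp; that are defined only in the XHTML DTD will also be reported as errors.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(markup: str) -> Optional[str]:
    """Return None if the markup parses as XML, else the parser's error message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    good = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><body/></html>'
    # Two xml:lang attributes on the root element, as in one case discussed above.
    bad = (
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xml:lang="en" xml:lang="en"><body/></html>'
    )
    print(well_formedness_error(good))
    print(well_formedness_error(bad))
```

A page that passes this check can still contain RDFa the extractor does not handle, so the check only rules out the XML-level failures.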

About

  • I’m married and father of three.
  • I’m a Christian and worship at CBC.
  • I co-founded Performable.
  • I’m a Java, Python, JavaScript hacker.
  • Here’s my FOAF file (and URI).
  • I’m an amateur photographer.
  • I work on the Web.
  • I participate in Open Source software development (Roller, Abdera, RDFLib, WordPress).
  • You may contact me (email or jabber/gtalk) at .
