BigTable saves the Semantic Web
Last time Adam Bosworth mentioned databases, he made a few statements implying that the Semantic Web was doomed because of its complexity and that Atom and RSS were going to be the way to structured data. For a moment, I thought Google had given up on the SW and RDF, but we could all tell from a couple of quotes in his talk that what he really wanted was an RDF store.
Adam Bosworth: If you build an open source stack that delivers globally available information, how do you massively distribute it and cause it to scale? Bosworth said you need to limit your queries to those that can be easily implemented by everybody and those that can be handled by a single machine. This requires that your queries run at the item level. This might feel odd to those used to dealing with databases, as this means you are not likely to perform joins, aggregations, or subqueries. There is plenty of SQL that cannot be supported…. Bosworth concluded his keynote by saying the potential is that “you guys can handle hundreds of millions of queries per day and scale up and out in ways that Oracle can only dream of. You will be able to effortlessly support hard questions.”
Now, in retrospect, we are starting to see that he wasn’t bluffing and that the demands he was making of the database community were already a reality at Google. I stumbled upon a post by Greg Linden on a talk given by Jeff Dean on BigTable.
BigTable is a system for storing and managing very large amounts of structured data. Data is organized into tables with rows and columns, but unlike a traditional database system, the row/column space can be sparse. Row keys and values are arbitrary strings, and the system allows each row/column cell to store not just a single value but a set of values with associated timestamps, simplifying analyses that examine how values have changed over time. Data in a single table is internally broken at arbitrary row boundaries to form contiguous regions of data called tablets.
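To make that data model concrete, here is a minimal Python sketch of a sparse table whose cells keep timestamped versions. It is only a toy in-memory dictionary, not Google's implementation: the class name `SparseTable`, its methods, and the row and column names are all made up for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse table with timestamped cell versions."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest first.
        # Cells that were never written simply do not exist (sparse).
        self._rows = defaultdict(dict)

    def put(self, row, column, value, timestamp):
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, at=None):
        """Return the newest value, or the newest one written at or before `at`."""
        for timestamp, value in self._rows.get(row, {}).get(column, []):
            if at is None or timestamp <= at:
                return value
        return None

    def history(self, row, column):
        """Return every (timestamp, value) pair for a cell, newest first."""
        return list(self._rows.get(row, {}).get(column, []))


# Hypothetical rows and columns, chosen only for the example.
table = SparseTable()
table.put("page-1", "contents:", "first draft", timestamp=1)
table.put("page-1", "contents:", "second draft", timestamp=2)
table.put("page-2", "anchor:page-1", "see also", timestamp=2)

latest = table.get("page-1", "contents:")          # newest version
earlier = table.get("page-1", "contents:", at=1)   # value as of timestamp 1
missing = table.get("page-2", "contents:")         # never written
```

Keeping every version in the cell is what makes "how has this value changed over time" a simple lookup rather than a separate audit table.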
This is excellent news for the Semantic Web. Google is building the RDF database we’ve been trying to build. To this date, even though conceptually we are on the right track, our implementations do not scale in ways that would even match standard relational models, which makes it very hard for real systems to adopt RDF as their platform. All of this is going to change with BigTable, but let’s pay attention to the details in the description and in a summary from Andrew Hitchcock:
- Storing and managing very large amounts of structured data
- Row/column space can be sparse
- Columns are in the form of “family:optional_qualifier”. RDF Properties, Yeah!
- Columns have type information
- Because of the design of the system, columns are easy to create (and are created implicitly)
- Column families can be split into locality groups (Ontologies!)
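The analogy in the list above can be sketched in a few lines of Python: the subject becomes the row key, the predicate becomes a “family:qualifier” column, and the objects become the set of values in that cell. This is my own toy illustration under those assumptions, not anything Google has described; the function names and the example URIs are hypothetical.

```python
from collections import defaultdict

# row key (subject) -> column "family:qualifier" (predicate) -> set of objects
store = defaultdict(lambda: defaultdict(set))


def add_triple(subject, predicate, obj):
    """Store one triple; the column is created implicitly on first write."""
    family, sep, _qualifier = predicate.partition(":")
    if not family or not sep:
        raise ValueError("predicate must look like 'family:optional_qualifier'")
    store[subject][predicate].add(obj)


def describe(subject):
    """Item-level query: everything known about one subject, with no joins."""
    return {col: sorted(vals) for col, vals in store.get(subject, {}).items()}


def columns_in_family(subject, family):
    """List the columns of one row that belong to a given family."""
    prefix = family + ":"
    return sorted(col for col in store.get(subject, {}) if col.startswith(prefix))


# Hypothetical data; the URIs are placeholders.
add_triple("http://example.org/adam", "foaf:name", "Adam")
add_triple("http://example.org/adam", "foaf:knows", "http://example.org/greg")
add_triple("http://example.org/adam", "foaf:knows", "http://example.org/jeff")
add_triple("http://example.org/adam", "dc:creator", "http://example.org/talk")

profile = describe("http://example.org/adam")
foaf_columns = columns_in_family("http://example.org/adam", "foaf")
```

Note that `describe` only ever touches one row, which is the kind of item-level query Bosworth was asking for; anything that needs to follow `foaf:knows` links across rows would be a join and falls outside this sketch.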