OK, I freely admit I’m fascinated by offbeat database engines. Maybe there is a support group for this. In any event, the highlight of my brief visit to the DAMA International Symposium and Wilshire Meta-Data Conference conference last month was a presentation by Joe Foley of illuminate Solutions , which marked the U.S. launch of his company’s iLuminate analytical database.
(Yes, the company name is “illuminate” and the product is “iLuminate”. And if you look inside the HTML tag, you’ll see the Internet domain is “i-lluminate.com”. Marketing genius or marketing madness? You decide.)
illuminate calls iLuminate a “correlation database”, a term they apparently invented to distinguish it from everything else. It does appear to be unique, although somewhat similar to other products I’ve seen over the years: Digital Archaeology (now deceased), Omnidex and even QlikView come to mind. Like iLuminate, these systems store links among data values rather than conventional database tables or columnar structures. iLuminate is the purest expression I’ve seen of this approach: it literally stores each value once, regardless of how many tables, columns or rows contain it in the source data. Indexes and algorithms capture the context of each original occurrence so the system can recreate the original records when necessary.
The company is rather vague on the details, perhaps on purpose. They do state that each value is tied to a conventional b-tree index that makes it easily and quickly accessible. What I imagine—and let me stress I may have this wrong—is that each value is then itself tied to a hierarchical list of the tables, columns and rows in which it appears. There would be a count associated with each level, so a query that asked how many times a value appears in each table would simply look at the pre-calculated value counts; a query of how many times the value appeared in a particular column could look down one level to get the column counts. A query that needed to know about individual rows would retrieve the row numbers. A query that involved multiple values would retrieve multiple sets of row numbers and compare them: so, say, a query looking for state = “NY” and eye color = “blue” might find that “NY” appears in the state column for records 3, 21 and 42, while “blue” appears in the eye color for records 5, 21 and 56. It would then return row=21 as the only intersection of the two sets. Another set of indexes would make it simple to retrieve the other components of row 21.
Whether or not that’s what actually happens under the hood, this does illustrate the main advantages of iLuminate. Specifically, it can import data in any structure and access it without formal database design; it can identify instances of the same value across different tables or columns; it can provide instant table and column level value distributions; and it can execute incremental queries against a previously selected set of records. The company also claims high speed and terabyte scalability, although some qualification is needed: initial results from a query appear very quickly, but calculations involving a large result set must wait for the system to assemble and process the full set of data. Foley also said that although the system has been tested with a terabyte of data, the largest production implementation is a much less impressive 50 gigabytes. Still, the largest production row count is 200 million rows – nothing to sneeze at.
The system avoids some of the pitfalls that sometimes trap analytical databases: load times are comparable to load times for comparable relational databases (once you include time for indexing, etc.); total storage (including the indexes, which take up most of the space) is about the same as relational source data; and users can write queries in standard SQL via ODBC. This final point is particularly important, because many past analytical systems were not SQL-compatible. This deterred many potential buyers. The new crop of analytical database vendors has learned this lesson: nearly all of the new analytical systems are SQL-accessible. Just to be clear, iLuminate is not an in-memory database, although it will make intelligent use of what memory is available, often loading the data values and b-tree indexes into memory when possible.
Still, at least from my perspective, the most important feature of iLuminate is its ability to work with any structure of input data—including structures that SQL would handle poorly or not at all. This is where users gain huge time savings, because they need not predict the queries they will write and then design a data structure to support them. In this regard, the system is even more flexible than QlikView, which it in many ways resembles: while QlikView links tables with fixed keys during the data load, iLuminate does not. Instead, like a regular SQL system, iLuminate can apply different relationships to one set of data by defining the relationships within different queries. (On the other hand, QlikView’s powerful scripting language allows much more data manipulation during the load process.)
Part of the reason I mention QlikView is that iLuminate itself uses QlikView as a front-end tool under the label of iAnalyze. This extracts data from iLuminate using ODBC and then loads it into QlikView. Naturally, the data structure at that point must include fixed relationships. In addition to QlikView, iAnalyze also includes integrated mapping. A separate illuminate product, called iCorrelated, allows ad hoc, incremental queries directly against iLuminate and takes full advantage of its special capabilities.
illuminate, which is based in Spain, has been in business for nearly three years. It has more than 40 iLuminate installations, mostly in Europe. Pricing is based on several factors but the entry cost is very reasonable: somewhere in the $80,000 to $100,000 range, including iAnalyze. As part of its U.S. launch, the company is offering no-cost proof of concept projects to qualified customers.
Wednesday, 2 April 2008
illuminate Systems' iLuminate May Be the Most Flexible Analytical Database Ever
Posted on 08:07 by Unknown
Subscribe to:
Post Comments (Atom)
0 comments:
Post a Comment