I spoke earlier this week with SQLStream, which offers software to execute queries against data streams such as stock market prices, Web logs and credit card transactions. These queries can include on-the-fly calculations such as moving averages, as well as scans for patterns like a sequence of failed log-in attempts. Typical applications include security monitoring, fraud detection, and general business activity monitoring. Marketers can use the queries to identify new leads and select cross-sell and upsell offers. Although the connection is a little less obvious, the system can also be used as an alternative to conventional batch data preparation methods for tasks like customer data integration.
SQLStream’s particular claim to fame is that its queries are almost identical to garden-variety SQL. Other vendors in this space apparently use more proprietary approaches. I say “apparently” because I haven’t researched the competition in any depth. A quick bit of poking around was enough to scare me off: there are many vendors in the space and it is a highly technical topic. It turns out that stream processing is one type of “complex event processing,” a field which has attracted some very smart but contentious experts. To see what I mean, check out Event Processing Thinking (Opher Etzion) and Cyberstrategics Complex Event Processing Blog (Tim Bass). This is clearly not a group to mess with.
That said, SQLStream’s more or less direct competitors seem to include Coral8, Truviso, Progress Apama, Oracle BAM, TIBCO BusinessEvents, KX Systems, StreamBase and Aleri. For a basic introduction to data stream processing, see this presentation from Truviso.
Back to SQLStream. As I said, it lets users write what are essentially standard SQL queries that are directed against a data stream rather than a static table. The data stream can be any JDBC-accessible data source, which includes most types of databases and file structures. The system can also accept streams of XML data over HTTP, which includes RSS feeds, Twitter posts and other Web sources. Its queries can also incorporate conventional (non-streaming) relational database tables, which is very useful when you need to compare streamed inputs against more or less static reference information. For example, you might want to check current activity against a customer’s six-month average bank balance or transaction rate.
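To make that concrete, here’s a rough sketch of what such a query might look like: a live transaction stream joined against a static reference table holding each customer’s six-month average balance. I’ve invented the stream and table names, and SQLStream’s exact dialect may differ in the details, but the flavor should be about right.

```sql
-- Illustrative sketch only: names and exact syntax are my own invention.
-- Flag any transaction larger than half the customer's six-month
-- average balance by joining a stream against an ordinary table.
SELECT STREAM
    t.customer_id,
    t.amount,
    r.avg_balance_6mo
FROM transactions_stream AS t
JOIN customer_reference AS r            -- conventional, non-streaming table
  ON t.customer_id = r.customer_id
WHERE t.amount > r.avg_balance_6mo * 0.5;
```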
The advantages of using SQL queries are that there are lots of SQL programmers out there and that SQL is relatively easy to write and understand. The disadvantage (in my opinion; not surprisingly, SQLStream didn’t mention this) is that SQL is really bad at certain kinds of queries, such as queries comparing subsets within the query universe and queries based on record sequence. Lack of sequencing may sound like a pretty big drawback for a stream processing system, but SQLStream compensates by letting queries specify a time “window” of records to analyze. This makes queries such as “more than three transactions in the past minute” quite simple. (The notion of “windows” is common among stream processing systems.) To handle subsets within queries, SQLStream mimics a common SQL technique of converting one complex query into a sequence of simple queries. In SQLStream terms, this means the output of one query can be a stream that is read by another query. These streams can be cascaded indefinitely in what SQLStream calls a “data flow architecture”. Queries can also call external services, such as address verification, and incorporate the results. Query results can be posted as records to a regular database table.
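Here’s how those two ideas might look in practice: the first query counts each customer’s transactions over a one-minute window, and the second reads the first query’s output stream and keeps only counts above three, cascading the two in the “data flow architecture” style. Again, the names are mine and the syntax is a guess based on standard SQL windowing (ROWTIME stands in for an assumed implicit per-record timestamp); treat it as a sketch, not SQLStream’s documented dialect.

```sql
-- Step 1: per-customer transaction counts over a one-minute window.
-- Illustrative only; names and window syntax are assumptions.
CREATE VIEW txn_counts AS
SELECT STREAM
    customer_id,
    COUNT(*) OVER (
        PARTITION BY customer_id
        ORDER BY ROWTIME               -- assumed implicit timestamp column
        RANGE INTERVAL '1' MINUTE PRECEDING
    ) AS txn_count
FROM transactions_stream;

-- Step 2: a second query reads the first query's output stream,
-- cascading the two queries into a simple data flow.
SELECT STREAM customer_id, txn_count
FROM txn_counts
WHERE txn_count > 3;
```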
SQLStream does its actual processing by holding all the necessary data in memory. It automatically examines all active queries to determine how long data must be retained: thus, if three different queries need a data element for one, two and three minutes, the system will keep that data in memory for three minutes. SQLStream can run on 64-bit servers, allowing effectively unlimited memory, at least in theory. In practice, it is bound by the physical memory available: if the stream feeds more data than the server can hold, some data will be lost. The vendor is working on strategies to solve this problem, probably by retaining the overflow data and processing it later. For now, the company simply recommends that users make sure they have plenty of extra memory available.
In addition to memory, system throughput depends on processing power. SQLStream currently runs on multi-core, single-server systems and is moving towards multi-node parallel processing. Existing systems process tens of thousands of records per second. By itself, this isn't a terribly meaningful figure, since capacity also depends on record size, query complexity, and data retention windows. In any case, the vendor is aiming to support one million records per second.
SQLStream was founded in 2002 and owns some basic stream processing patents. The product itself was launched only in 2008 and currently has about a dozen customers. Since the company is still seeking to establish itself, pricing is, in their words, “very aggressive”.
If you’re still reading this, you probably have a pretty specific reason for being interested in SQLStream or stream processing in general. But just in case you’re wondering “Why the heck is he writing about this in a marketing blog?” there are actually several reasons. The most obvious is that “real time analytics” and “real time interaction management” are increasingly prominent topics among marketers. Real time analytics provides insights into customer behaviors at either a group level (e.g., trends in keyword response) or for an individual (e.g., estimated lifetime value). Real time interaction management goes beyond insight to recommend individual treatments as the interaction takes place (e.g., which offer to make during a phone call). Both require the type of quick reaction to new data that stream processing can provide.
There is also increasing interest in behavior detection, sometimes called event-driven marketing. This monitors customer behaviors for opportunities to initiate an interaction. The concept is not widely adopted, even though it has proven successful again and again. (For example, Mark Holtom of Eventricity recently shared some very solid research that found event-based contacts were twice as productive as any other type. Unfortunately the details are confidential, but if you contact Mark via Eventricity perhaps he can elaborate.) I don’t think lack of stream processing technology is the real obstacle to event-based marketing, but perhaps greater awareness of stream processing would stir up interest in behavior detection in general.
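To connect this back to the technology: a behavior-detection rule could, at least in principle, be expressed as a streaming query whose results land in a regular table for the marketing team to act on. This is purely my speculation, with invented names and an arbitrary threshold, not anything SQLStream or Eventricity actually ships:

```sql
-- Purely speculative sketch: detect a "large deposit" event as a sales
-- trigger. All names and the 10x threshold are invented for illustration.
INSERT INTO marketing_triggers           -- results posted to a regular table
SELECT STREAM
    d.customer_id,
    'LARGE_DEPOSIT' AS event_type,
    d.amount
FROM deposits_stream AS d
JOIN customer_profile AS p
  ON d.customer_id = p.customer_id
WHERE d.amount > p.typical_deposit * 10;
```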
Finally, stream processing is important because so much attention has recently been focused on analytical databases that use special storage techniques such as columnar or in-memory structures. These require processing to put the data into the proper format. Some offer incremental updates, but in general the updates run as batch processes and the systems are not tuned for real-time or near-real-time reactions. So it’s worth considering stream processing systems as a complement that lets companies employ these other technologies without giving up quick response to new data.
I suppose there's one more reason: I think this stuff is really neat. Am I allowed to say that?