Frans Bouma said:
In theory, that's true. In practice, however, streaming data on the
fly is only useful if the data the provider receives for processing is
free (== not slowing things down) and doesn't eat extra memory. If
these two restrictions aren't met, streaming isn't worth the effort.
Yes - I think it would have to be a very specialised database which
actually streamed the results. Perhaps a managed embedded database
which already had the data in memory (in the CLR) but which still used
SQL to access it? At that point it's closer to LINQ to Objects in
reality, but still has the veneer of a database.
The thing with LINQ to Objects is that the data to process is already
in memory, so streaming the various data sources as if they're sequences
is useful.
Well, that depends on exactly what you mean by the first part. It's
certainly not true that the *whole* of the data to process has to be in
memory to start with. Each item of data has to be in memory to be
processed by each of the clauses, but streaming enables processing of
huge quantities of data, never holding more than one item in memory at
a time (depending on the query used).
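To make that concrete, here's a minimal sketch (my own example, nothing
from this thread) of a LINQ to Objects query over a lazily generated
sequence: however large the source is, only the current element is ever
held in memory.

using System;
using System.Collections.Generic;
using System.Linq;

class StreamingDemo
{
    // Lazily yields a billion values; nothing is materialised up front.
    static IEnumerable<int> HugeSource()
    {
        for (int i = 0; i < 1000000000; i++)
        {
            yield return i;
        }
    }

    static void Main()
    {
        // Where and Select are streaming operators: each element flows
        // through the query and is discarded before the next one is pulled.
        var query = HugeSource()
                        .Where(x => x % 1000 == 0)
                        .Select(x => x * 2);

        // Take(5) stops after the first five matches, so only a few
        // thousand source values are ever generated, let alone kept
        // in memory.
        foreach (var item in query.Take(5))
        {
            Console.WriteLine(item);
        }
    }
}
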
For a datasource which provides data not currently in
memory, it is likely the set of data to process is bigger than the end
result, which means that streaming isn't useful.
On the contrary - it means that streaming is the *only* way of working
in some cases. In those cases you can't use *any* buffering operations
(such as GroupBy or OrderBy), but such queries can still be useful. See
http://tinyurl.com/2qehcg (it's my blog; the full link would be long)
for an example of this.
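As a rough illustration of that buffering restriction (my own sketch,
not the example from the blog post): Where can happily stream over a
source that would never fit in memory, but an operator like
OrderByDescending has to buffer the entire sequence before it can yield
anything.

using System;
using System.Collections.Generic;
using System.Linq;

class BufferingDemo
{
    // A source far too large to buffer in one go.
    static IEnumerable<int> HugeSource()
    {
        for (int i = 0; i < int.MaxValue; i++)
        {
            yield return i;
        }
    }

    static void Main()
    {
        // Fine: First with a predicate streams, so memory use stays
        // constant and iteration stops at the first match.
        int firstMatch = HugeSource().First(x => x > 1000000);
        Console.WriteLine(firstMatch);

        // Not fine: OrderByDescending must see every element before it
        // can yield the largest one, so this would try to hold the
        // whole sequence in memory at once.
        // int largest = HugeSource().OrderByDescending(x => x).First();
    }
}
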
Unless, of course, you don't have a choice: if the data sources consumed
don't support any logic, e.g. you have a customer service and an
order service and you want all customers with an order in May 2006, you
can only do that by streaming the data to the client and doing the query
execution during the streaming. However, this isn't really an efficient
application: it might be a 'great' way to demonstrate Astoria and
whatnot, but it absolutely sucks as a system to use in production with a
lot of queries.
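As a concrete (if hypothetical) sketch of the scenario just described -
the service types and the GetCustomers/GetOrders calls are made-up
stand-ins, purely for illustration:

using System;
using System.Collections.Generic;
using System.Linq;

class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Order
{
    public int CustomerId { get; set; }
    public DateTime OrderDate { get; set; }
}

class ClientSideJoin
{
    // Stand-ins for the two services; in reality each call would stream
    // data across the wire from a remote system.
    static IEnumerable<Customer> GetCustomers() { yield break; }
    static IEnumerable<Order> GetOrders() { yield break; }

    static void Main()
    {
        // All customers with an order in May 2006. The join and filter
        // happen client-side, so potentially every order has to travel
        // over the wire - which is exactly why this is inefficient for
        // heavy production use.
        var customersWithMayOrders =
            (from c in GetCustomers()
             join o in GetOrders() on c.Id equals o.CustomerId
             where o.OrderDate.Year == 2006 && o.OrderDate.Month == 5
             select c).Distinct();

        foreach (var customer in customersWithMayOrders)
        {
            Console.WriteLine(customer.Name);
        }
    }
}
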
Just because something is streaming doesn't mean that all of the
processing is being done at the client.
Consider a database query which needs to do a lot of processing, but
will then return millions of rows - out of an initial source of
billions of rows. Depending on the database, it may be significantly
better to stream the results than to force the database to re-execute
the query and split it into batches. All the processing could still be
done at the database, but the results returned one at a time as they're
available, rather than all being fetched onto the client side in one
hit.
This would be a pretty specialised application of LINQ (and a pretty
specialised database, I suspect) but it's far from inconceivable.
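In code, the shape of that might look something like the sketch below -
the connection string, table and column are placeholders, and this is
just one way of exposing a data reader as a stream:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

class StreamedResults
{
    // Streams one row at a time: the database does all the heavy
    // processing, but the client never holds more than the current row.
    static IEnumerable<string> StreamNames(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Name FROM HugeResultSet", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return reader.GetString(0);
                }
            }
        }
    }

    static void Main()
    {
        // Each row is processed and discarded as it arrives, rather than
        // millions of rows being fetched onto the client in one hit.
        foreach (string name in StreamNames("your-connection-string-here"))
        {
            Console.WriteLine(name);
        }
    }
}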