LINQ - Removing elements in source list prevents enumeration of "c

  • Thread starter: Todd Beaulieu

Todd Beaulieu

Hello, I have a list of elements and I want to perform an operation on a
subset of them and then remove them from the list.

If I build a sequence of elements via LINQ to drive the enumeration, as soon
as I remove the first element the enumeration operation is broken.

This surprises me. I thought the LINQ'd sequence would be its own
collection, impervious to changes in the original source collection.
 
Todd Beaulieu said:
Hello, I have a list of elements and I want to perform an operation on a
subset of them and then remove them from the list.
If I build a sequence of elements via LINQ to drive the enumeration, as soon
as I remove the first element the enumeration operation is broken.

This surprises me. I thought the LINQ'd sequence would be its own
collection, impervious to changes in the original source collection.

No - LINQ streams the data (when it can), so it will still be asking
the original data source for data as it goes.
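Here is a minimal C# sketch of Jon's point for LINQ to Objects. The class and method names are hypothetical, and it assumes System.Linq on .NET. Where is lazy, so looping over the query actually loops over the List<int> itself. Removing an item mid-loop therefore invalidates the list's enumerator, and the next step throws InvalidOperationException. Calling ToList() first copies the matches into a separate list, which is one way to get the "own collection" Todd expected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RemovalDemo
{
    // Reproduces the problem: Where() is lazy, so looping over the query
    // loops over the List<int> itself, and Remove() invalidates that loop.
    public static bool RemoveWhileIteratingQueryThrows()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        IEnumerable<int> evens = numbers.Where(x => x % 2 == 0); // nothing runs yet
        try
        {
            foreach (int n in evens)
            {
                numbers.Remove(n); // modifies the list the query is reading
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // "Collection was modified; enumeration operation may not execute."
            return true;
        }
    }

    // One fix: ToList() copies the matches first, so the loop reads the
    // copy and the original list can be modified safely.
    public static List<int> RemoveAfterSnapshot()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        foreach (int n in numbers.Where(x => x % 2 == 0).ToList())
        {
            Console.WriteLine("Processing " + n); // stand-in for the real operation
            numbers.Remove(n);
        }
        return numbers; // 1, 3, 5
    }
}
```

If no per-item work is needed before removal, List<T>.RemoveAll(predicate) removes every matching item in one call. The ToList() snapshot is shown here because the question involves operating on each item first.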
 
Jon, do you need a place to live? We'll provide food, shelter and beer, in
exchange for a brain tap.

Thank you.
 
Nice shameless plug. ;-)

Looked at the link. So nice to see someone actually spending a chapter on
Lambda Expressions and another on extension methods. I looked at the sample
chapter and I like the way it flows. Don't necessarily need another book on
C#, but I will definitely put it on my list (I figure if I learn one new
thing, it is well worth the price of admission ;->).

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 
Cowboy (Gregory A. Beamer) said:
Nice shameless plug. ;-)

I'm trying to at least restrict them to responses to other people's
comments :)
Looked at the link. So nice to see someone actually spending a chapter on
Lambda Expressions and another on extension methods. I looked at the sample
chapter and I like the way it flows. Don't necessarily need another book on
C#, but I will definitely put it on my list (I figure if I learn one new
thing, it is well worth the price of admission ;->).

Well, you'd learn the names and ages of my sons and wife, along with
the names of my eldest son's friends. :)

In terms of C#, there's certainly plenty of stuff in the book that *I*
didn't know before I started. A few examples of things you may not know
(pretty obscure, admittedly):

1) Under C# 2, what's the result of compiling this code?

class Test
{
    static void Main()
    {
        int i=10;
        if (i==null)
        {
            System.Console.WriteLine("Eh?");
        }
    }
}

2) How does the following code behave under C# 1? And under C# 2?

delegate void StringAction(string x);

class Base
{
    internal void Foo(string x)
    {
        System.Console.WriteLine("Base.Foo");
    }
}

class Derived : Base
{
    void Foo(object x)
    {
        System.Console.WriteLine("Derived.Foo");
    }

    static void Main()
    {
        StringAction action = new StringAction(new Derived().Foo);
        action("Ooh");
    }
}

3) How many types are in the IL for the following C# 3 program?

class Test
{
    static void Main()
    {
        var t1 = new { Name="Jon", Age=10 };
        var t2 = new { Name="James", Age=10.5 };
        var t3 = new { Name=10, Age="Weird" };
    }
}



None of these is likely to be particularly helpful in real code - but
it gives some idea of the level of detail I go into :) I suspect there
are *some* useful things in there which you aren't aware of - I just
wouldn't like to guess which!
 
Jon said:
No - LINQ streams the data (when it can), so it will still be asking
the original data source for data as it goes.

Only outside a db. Any linq-access to a db isn't streamed but batched
(i.e.: fetch everything to the client)

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
Frans Bouma said:
Only outside a db. Any linq-access to a db isn't streamed but batched
(i.e.: fetch everything to the client)

Well, I suppose to be strictly accurate, it entirely depends on the
provider.

I suspect you're right that most DB providers will batch rather than
stream. I suspect many other providers will work in a similar way,
actually - web service providers, for instance.

I should have been clear that I was talking about LINQ to Objects,
mostly because that's what it sounded like Todd was talking about :)

It's interesting to note that the streaming/buffered distinction is
provider-specific, but whether execution is immediate or deferred
should almost always be consistent across LINQ providers, I believe.
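Here is a small LINQ to Objects sketch of the deferred/immediate distinction. The names are hypothetical, and it assumes C# 7 or later for the tuple return type. Where only records the query, so an item added to the source before enumeration shows up in the results. Count() runs the query at the moment it is called.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DeferredVsImmediate
{
    public static (int Immediate, int Deferred) Run()
    {
        var names = new List<string> { "Todd", "Jon" };

        // Deferred: this line only builds the query; nothing is filtered yet.
        IEnumerable<string> longNames = names.Where(s => s.Length > 3);

        // Immediate: Count() enumerates the list right now ("Todd" only).
        int immediate = names.Count(s => s.Length > 3);

        // Change the source after both have been defined.
        names.Add("Frans");

        // The deferred query runs only now, so it also sees "Frans".
        int deferred = longNames.Count();

        return (immediate, deferred); // (1, 2)
    }
}
```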
 
Jon said:
Well, I suppose to be strictly accurate, it entirely depends on the
provider.

I suspect you're right that most DB providers will batch rather than
stream. I suspect many other providers will work in a similar way,
actually - web service providers, for instance.

I should have been clear that I was talking about LINQ to Objects,
mostly because that's what it sounded like Todd was talking about :)

It's interesting to note that the streaming/buffered distinction is
provider-specific, but whether execution is immediate or deferred
should almost always be consistent across LINQ providers, I believe.

In theory, that's true. In practice, however, streaming data on the
fly is only useful if the data the provider receives for processing is
free (== not slowing things down) and doesn't eat extra memory. If
these two restrictions aren't met, streaming isn't worth the effort.
The thing with linq to objects is that the data to process is already
in memory, so streaming the various datasources as if they're sequences
is useful. For a datasource which provides data not currently in
memory, it is likely the set of data to process is bigger than the end
result, which means that streaming isn't useful.

Unless, of course you don't have a choice: if the datasources consumed
aren't supporting any logic, e.g. you have a customer service and an
order service and you want all customers with an order in May 2006: you
can only do that by streaming the data to the client and doing the query
execution during the streaming. However, this isn't really an efficient
application: it might be a 'great' way to demonstrate Astoria and what
not, but it absolutely sucks as a system to use in production with a lot
of queries.

FB

 
Frans Bouma said:
In theory, that's true. In practice, however, streaming data on the
fly is only useful if the data the provider receives for processing is
free (== not slowing things down) and doesn't eat extra memory. If
these two restrictions aren't met, streaming isn't worth the effort.

Yes - I think it would have to be a very specialised database which
actually streamed the results. Perhaps a managed embedded database
which already had the data in memory (in the CLR) but which still used
SQL to access it? At that point it's closer to LINQ to Objects in
reality, but still has the veneer of a database.
The thing with linq to objects is that the data to process is already
in memory, so streaming the various datasources as if they're sequences
is useful.

Well, that depends on exactly what you mean by the first part. It's
certainly not true that the *whole* of the data to process has to be in
memory to start with. Each item of data has to be in memory to be
processed by each of the clauses, but streaming enables processing of
huge quantities of data, never holding more than one item in memory at
a time (depending on the query used).
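Here is a sketch of streaming in LINQ to Objects. It uses an infinite iterator as a stand-in for data too large to hold in memory, such as a file read a line at a time. The names are illustrative only. Where, Select and Take each pass items along one at a time, so the query finishes after pulling only as many source items as it needs.

```csharp
using System.Collections.Generic;
using System.Linq;

static class StreamingDemo
{
    // How many items the query has pulled from the source so far.
    public static int ItemsPulled;

    // An unbounded source: it could never be held in memory as a whole,
    // but a streaming query only ever asks it for the next item.
    static IEnumerable<long> NaturalNumbers()
    {
        for (long n = 1; ; n++)
        {
            ItemsPulled++;
            yield return n;
        }
    }

    public static List<long> SquaresOfFirstThreeMultiplesOfSeven()
    {
        ItemsPulled = 0;
        return NaturalNumbers()
            .Where(n => n % 7 == 0) // passes matching items on one at a time
            .Select(n => n * n)     // transforms each item as it arrives
            .Take(3)                // stops asking for more after three results
            .ToList();              // 49, 196, 441; ItemsPulled ends at 21
    }
}
```

Only one source item is in flight at any moment. The ToList() at the end buffers just the three results, not the source.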
For a datasource which provides data not currently in
memory, it is likely the set of data to process is bigger than the end
result, which means that streaming isn't useful.

On the contrary - it means that streaming is the *only* way of working
in some cases. In those cases you can't use *any* buffering operations
(such as GroupBy or OrderBy) but they can still be useful.
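By contrast, a buffering operator has to read its whole input before it can produce the first result. The sketch below uses hypothetical names and LINQ to Objects. It wraps a 1,000-item source in a counting iterator and asks each query for just its first element. Where reads a single item. OrderByDescending reads all 1,000, because it cannot know which item sorts first until it has seen every one. GroupBy behaves the same way for the same reason.

```csharp
using System.Collections.Generic;
using System.Linq;

static class BufferingDemo
{
    // Mutable holder, because iterator methods cannot take ref parameters.
    public sealed class Counter
    {
        public int Value;
    }

    // Passes items through unchanged while counting how many were read.
    static IEnumerable<int> Counted(IEnumerable<int> source, Counter counter)
    {
        foreach (int item in source)
        {
            counter.Value++;
            yield return item;
        }
    }

    // Returns how many source items were read before the query's first result.
    public static int PulledBeforeFirstResult(bool buffered)
    {
        var counter = new Counter();
        IEnumerable<int> source = Counted(Enumerable.Range(1, 1000), counter);

        IEnumerable<int> query = buffered
            ? source.OrderByDescending(n => n) // buffering: must see every item first
            : source.Where(n => n % 2 == 1);   // streaming: yields the first match at once

        using (IEnumerator<int> e = query.GetEnumerator())
        {
            e.MoveNext(); // request only the first result
        }
        return counter.Value;
    }
}
```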

See http://tinyurl.com/2qehcg (it's my blog, but the link would be
long) for an example of this.
Unless, of course you don't have a choice: if the datasources consumed
aren't supporting any logic, e.g. you have a customer service and an
order service and you want all customers with an order in May 2006: you
can only do that by streaming the data to the client and doing the query
execution during the streaming. However, this isn't really an efficient
application: it might be a 'great' way to demonstrate Astoria and what
not, but it absolutely sucks as a system to use in production with a lot
of queries.

Just because something is streaming doesn't mean that all of the
processing is being done at the client.

Consider a database query which needs to do a lot of processing, but
will then return millions of rows - out of an initial source of
billions of rows. Depending on the database, it may be significantly
better to stream the results than to force the database to re-execute
the query and split it into batches. All the processing could still be
done at the database, but the results returned one at a time as they're
available, rather than all being fetched onto the client side in one
hit.

This would be a pretty specialized application of LINQ (and a pretty
specialized database, I suspect) but it's far from inconceivable.
 
Jon,

This is a very classic discussion with a very old answer: if you only have
streaming data, don't use a database, just use a tape.

(In my opinion, this sums up all the points you are discussing.)

Cor
 
Cor Ligthert said:
This is a very classic discussion with a very old answer: if you only have
streaming data, don't use a database, just use a tape.

(In my opinion, this sums up all the points you are discussing.)

Except it really doesn't, I'm afraid. It doesn't address the way that
some databases in some situations may be able to stream, whereas most
will just return the whole data from the query. It doesn't address the
fact that just because the *results* are streaming data, the *source*
may not be.

It doesn't address the difference between fetching from a database and
using LINQ to Objects; it doesn't address the difference between a data
source which is entirely in memory and a data source which is read from
disk, a line at a time.

In short, I'm afraid I can't see that your answer adds to anyone's
understanding of LINQ - whereas I certainly hope that others are
finding my discussion with Frans interesting.
 