Welcome to Shaun Luttin's public notebook. It contains rough, practical notes. The guiding idea is that, despite what marketing tells us, there are no experts at anything. Sharing our half-baked ideas helps everyone. We're all just muddling thru. Find out more about our work at bigfont.ca.

Query expressions

Tags: csharp-language-specification, c#

DRAFT DRAFT DRAFT DRAFT

These are my raw notes on section 7.16 of the C# Language Specification. Section 7.16 falls within section 7 on expressions.

Heuristic Model

  1. Notes. Add notes for each section.
  2. Definitions. Add definitions for the chapter.
  3. Examples. After adding definitions, then add examples.
  4. Edit. After adding examples, then edit for readability etc.

My Personal Conventions

  • terminology is italicized
  • code is in back ticks

Intro

query expression syntax is similar to that of relational and hierarchical query languages

  • begins with from clause
  • ends with either select or group clause
  • after the initial from can come zero or more of these clauses
    • from
    • let
    • where
    • join
    • orderby
  • each from clause is a generator and includes:
    • a range variable...
    • which ranges over the elements of a sequence
  • each let clause
    • introduces a range variable
    • representing a value computed by means of previous range variables
  • each where clause
    • is a filter
    • that excludes items from the result
  • each join clause
    • compares specified keys of the source sequence
    • with keys of another sequence
    • yielding matching pairs
  • each orderby clause
    • reorders items
    • according to specified criteria
  • the final select or group clause
    • specifies the shape of the result
    • in terms of the range variables
  • an into clause
    • can "splice" queries
    • by treating the results of one query
    • as a generator in a subsequent query

Ambiguities

How to use contextual keywords as identifiers.

  • from
  • where
  • join
  • on
  • equals
  • into
  • let
  • orderby
  • ascending
  • descending
  • select
  • group
  • by

The above are keywords when they occur anywhere within a query expression.

To use these words as identifiers within a query expression, prefix them with @

from @select
in (new string[] { "from", "select" })
select @select

A query expression is any expression that

  • starts with from identifier
  • followed by any token except ; = or ,
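
A minimal runnable sketch of the @ prefix described above, assuming a C# 9 or later compiler (top-level statements). The variable name words is my own; the query itself is the one from these notes.

```csharp
using System;
using System.Linq;

// "select" is a keyword inside a query expression, so the range variable
// has to be written as @select to be treated as an identifier.
var words = from @select in new[] { "from", "select" }
            select @select;

Console.WriteLine(string.Join(", ", words)); // from, select
```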

Translation

The steps for turning a query expression into fluent syntax.

  • C# does not specify the execution semantics of query expressions.
  • Rather, the compiler translates query expressions into invocations of these methods:
    • Where
    • Select
    • SelectMany
    • Join
    • GroupJoin
    • OrderBy
    • OrderByDescending
    • ThenBy
    • ThenByDescending
    • GroupBy
    • Cast
  • These methods must have particular
    • signatures
    • result types
  • These methods can be
    • instance methods of the object being queried, or
    • extension methods that are external to the object.
    • [I'd like to see an example of overriding the LINQ extension methods]
  • The translation:
    • is a syntactic mapping
    • occurs prior to any type binding or overload resolution
    • is guaranteed to be syntactically correct
    • is NOT guaranteed to produce semantically correct C# code
  • After the translation:
    • the resulting methods are invoked as regular methods
    • and this may result in normal method call errors
  • The compiler repeats the following translation until further reductions are impossible
    • the compiler applies each translation section in order
    • each section is applied exhaustively
    • once exhausted, a section is not later revisited in the same query
  • Two notes:
    • assignment to range variables is NOT allowed in a query expression, though this rule need not be strictly enforced in all C# implementations
    • certain translations inject range variables with transparent identifiers denoted by *
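
Because the translation is purely syntactic, the compiler will bind a where clause to any accessible method named Where. Here is a rough sketch of that, assuming C# 9 or later. CountingQuery and Calls are hypothetical names of mine. The file does not import System.Linq, so the only Where in scope is the one defined here.

```csharp
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3, 4 };

// The compiler rewrites this to numbers.Where(n => n % 2 == 0).
// No Select call is generated, because "select n" just repeats the range variable.
var evens = from n in numbers
            where n % 2 == 0
            select n;

Console.WriteLine(string.Join(", ", evens)); // 2, 4
Console.WriteLine(CountingQuery.Calls);      // 1

// Hypothetical stand-in for Enumerable.Where; the name is illustrative only.
static class CountingQuery
{
    public static int Calls;

    // Eager version of Where that also counts how often it is invoked.
    public static List<T> Where<T>(this List<T> source, Func<T, bool> predicate)
    {
        Calls++;
        var result = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item)) result.Add(item);
        }
        return result;
    }
}
```

The query never mentions CountingQuery; the binding happens only after the query has been rewritten into a method call.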

1. Select and groupby clauses with continuations

  • from ... into x ...
  • translates into
  • from x in ( from ... ) ...

Example

from c in customers group c by c.Country into g select new { Country = g.Key }

becomes

from g in ( from c in customers group c by c.Country ) select new { Country = g.Key }

then becomes

customers.GroupBy(c => c.Country).Select(g => new { Country = g.Key })
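
A runnable version of the continuation example above, as a sketch. The sample data and the names viaQuery and viaMethods are assumptions of mine. It compares the query form with the method form it should translate into.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Name = "Ann", Country = "UK" },
    new { Name = "Bob", Country = "UK" },
    new { Name = "Cy",  Country = "CA" },
};

// Query with a continuation (into g).
var viaQuery = from c in customers
               group c by c.Country into g
               select new { Country = g.Key };

// The method calls the notes say it becomes.
var viaMethods = customers.GroupBy(c => c.Country)
                          .Select(g => new { Country = g.Key });

// Anonymous types compare by value, so SequenceEqual works here.
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```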

2. Explicit range variable types

from

  • from T x in e
  • translates into
  • from x in (e).Cast<T>()

join

  • join T x in e on k1 equals k2
  • translates into
  • join x in ( e ).Cast<T>() on k1 equals k2

Example

from Customer c in customers where c.City == "London" select c

becomes

from c in customers.Cast<Customer>() where c.City == "London" select c

then becomes

customers.Cast<Customer>().Where(c => c.City == "London")

Note

The .Cast<T>() operates on each object in the collection (as opposed to casting the collection).
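
A small sketch of the explicit range variable type, assuming a non-generic ArrayList as the source (my choice, since its elements are typed as object). The sample data is made up.

```csharp
using System;
using System.Collections;
using System.Linq;

// ArrayList only implements the non-generic IEnumerable.
var cities = new ArrayList { "London", "Paris", "London" };

// "from string city" asks for a cast of each element to string.
var viaQuery = from string city in cities
               where city == "London"
               select city;

// The equivalent method calls: Cast<string>() runs per element.
var viaMethods = cities.Cast<string>().Where(city => city == "London");

Console.WriteLine(viaQuery.Count());                   // 2
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```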

3. Degenerate query expressions

A degenerate query expression is one that trivially selects the elements from the source.

  • from x in e select x
  • translates into
  • ( e ).Select(x => x)

Example

from c in customers select c

becomes

customers.Select(c => c)

Notes

  • if a query expression includes only a degenerate query,
    • then the translation appends a .Select()
  • that said, if there are further translations
    • a later phase of the translation
    • will replace the degenerate query with just its source
  • This happens because...
    • it is important to ensure that the result of a query expression is not the source
    • lest we reveal the type and identity of the source to the client of the query
    • [why would that be problematic?]
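
A sketch of what the appended Select buys us, assuming a List<int> source (my own example data). The result of the degenerate query is a different object from the list, so a caller cannot cast it back and mutate the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3 };

// Degenerate query: translated to source.Select(x => x).
var query = from x in source select x;

Console.WriteLine(ReferenceEquals(query, source)); // False
Console.WriteLine(query is List<int>);             // False
Console.WriteLine(string.Join(", ", query));       // 1, 2, 3
```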

4. From, let, where, join, and orderby clauses

A query expression with a...

second from clause followed by a

  • This is the SelectMany. It isn't a query continuation.
  • The select clause has access to the range variables from both the first and second from clauses.

select clause

  • from x1 in e1 from x2 in e2 select v
  • ( e1 ) . SelectMany ( x1 => e2, ( x1 , x2 ) => v )
  • from c in customers from o in c.Orders select new { c.Name, o.OrderId, o.Total }
  • customers.SelectMany(c => c.Orders, (c, o) => new { c.Name, o.OrderId, o.Total } )

something other than a select clause

  • from x1 in e1 from x2 in e2 ...
  • from * in ( e1 ) . SelectMany( x1 => e2 , ( x1, x2 ) => new { x1, x2 } )
  • from c in customers from o in c.Orders...
  • from * in customers.SelectMany( c => c.Orders, ( c, o ) => new { c, o } ) ...

Recall that the * is the transparent identifier. It stands in for a compiler-generated range variable that captures multiple range variables; later steps propagate it into anonymous functions and anonymous object initializers. In the above case, it holds new { x1, x2 }

Note, in both the above examples, the range variables of both from clauses stay in scope; that is, both are available in subsequent clauses.
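
A runnable sketch of the two-from case that ends in a select. The customers data is invented, and orders are plain ints to keep it short.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Name = "Ann", Orders = new[] { 10, 20 } },
    new { Name = "Bob", Orders = new[] { 30 } },
};

// Both c and o are in scope in the select clause.
var viaQuery = from c in customers
               from o in c.Orders
               select new { c.Name, Total = o };

// The SelectMany form from the notes.
var viaMethods = customers.SelectMany(c => c.Orders,
                                      (c, o) => new { c.Name, Total = o });

Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
Console.WriteLine(viaQuery.Count());                   // 3
```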

let clause

The variable defined within the let clause has access to the initial range variable and, along with it, is available through the rest of the query.

  • from x in e let y = f ...
  • from * in ( e ) . Select ( x => new { x, y = f } ) ...
  • from o in orders let t = o.Details.Sum(d => d.UnitPrice * d.Quantity) ...
  • from * in orders.Select(o => new { o, t = o.Details.Sum(d => d.UnitPrice * d.Quantity ) } ) ...
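
A sketch of the let translation with made-up order data. In the method form, the lambda parameter x is my stand-in for the transparent identifier *, which has no real name in C#.

```csharp
using System;
using System.Linq;

var orders = new[]
{
    new { Id = 1, Prices = new[] { 5m, 7m } },
    new { Id = 2, Prices = new[] { 100m } },
};

// t is computed from o, and both stay in scope afterwards.
var viaQuery = from o in orders
               let t = o.Prices.Sum()
               where t > 50m
               select new { o.Id, Total = t };

// Hand-written equivalent: the anonymous type carries o and t together.
var viaMethods = orders
    .Select(o => new { o, t = o.Prices.Sum() })
    .Where(x => x.t > 50m)
    .Select(x => new { x.o.Id, Total = x.t });

Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
Console.WriteLine(viaQuery.Single().Id);               // 2
```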

where clause

  • from x in e where f ...
  • from x in ( e ) . Where ( x => f ) ...
  • from o in orders where o.Id > 0 ...
  • from o in orders.Where(o => o.Id > 0) ...

join clause without an into followed by a

select clause

  • from x1 in e1 join x2 in e2 on k1 equals k2 select v
  • ( e1 ) . Join ( e2, x1 => k1, x2 => k2, ( x1, x2 ) => v )

something other than a select clause

In this case, the transparent identifier * holds the place of the anonymous new { x1, x2 }

  • from x1 in e1 join x2 in e2 on k1 equals k2
  • from * in ( e1 ) . Join ( e2, x1 => k1, x2 => k2, ( x1, x2 ) => new { x1, x2 } )
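
A sketch of the plain join followed by a select, using invented customers and orders. It checks the query form against the Join call it should become.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" },
};
var orders = new[]
{
    new { CustomerId = 1, Total = 10 },
    new { CustomerId = 1, Total = 20 },
    new { CustomerId = 3, Total = 99 }, // no matching customer
};

var viaQuery = from c in customers
               join o in orders on c.Id equals o.CustomerId
               select new { c.Name, o.Total };

var viaMethods = customers.Join(orders,
                                c => c.Id,
                                o => o.CustomerId,
                                (c, o) => new { c.Name, o.Total });

Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
Console.WriteLine(viaQuery.Count());                   // 2
```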

join clause with an into followed by a

The into makes the join into a group join.

select clause

The output here is the initial range variable x1 and the group formed from the second range variable x2. In other words, x1 remains in scope but x2 doesn't because it's behind g.

  • from x1 in e1 join x2 in e2 on k1 equals k2 into g select v
  • ( e1 ) . GroupJoin ( e2, x1 => k1, x2 => k2, ( x1, g ) => v )

something other than a select clause

  • from x1 in e1 join x2 in e2 on k1 equals k2 into g ...
  • from * in ( e1 ) . GroupJoin ( e2, x1 => k1, x2 => k2, ( x1, g ) => new { x1, g } )
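
A sketch of the group join, again with invented data. Note that only c and g are usable after the into; o is hidden behind g.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" },
};
var orders = new[]
{
    new { CustomerId = 1, Total = 10 },
    new { CustomerId = 1, Total = 20 },
};

var viaQuery = from c in customers
               join o in orders on c.Id equals o.CustomerId into g
               select new { c.Name, Count = g.Count() };

var viaMethods = customers.GroupJoin(orders,
                                     c => c.Id,
                                     o => o.CustomerId,
                                     (c, g) => new { c.Name, Count = g.Count() });

Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
Console.WriteLine(string.Join("; ", viaQuery.Select(r => $"{r.Name}={r.Count}"))); // Ann=2; Bob=0
```

Unlike the plain join, a customer with no orders (Bob) still shows up, with an empty group.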

orderby clause

  • from x in e orderby k1, k2, k3 ...
  • from x in ( e ) . OrderBy ( x => k1 ) . ThenBy ( x => k2 ) . ThenBy ( x => k3 ) ...

when each key is followed by descending

  • from x in ( e ) . OrderByDescending ( x => k1 ) . ThenByDescending ( x => k2 ) ...
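
A sketch of the orderby translation with one descending key and one ascending key. The people data is made up.

```csharp
using System;
using System.Linq;

var people = new[]
{
    new { Name = "Bob", Age = 30 },
    new { Name = "Ann", Age = 30 },
    new { Name = "Cy",  Age = 25 },
};

var viaQuery = from p in people
               orderby p.Age descending, p.Name
               select p.Name;

// First key uses OrderByDescending, the second key uses ThenBy.
var viaMethods = people.OrderByDescending(p => p.Age)
                       .ThenBy(p => p.Name)
                       .Select(p => p.Name);

Console.WriteLine(string.Join(", ", viaQuery));        // Ann, Bob, Cy
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```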

5. Select clauses

  • from x in e select v
  • ( e ) . Select ( x => v )

The => is a projection from each value of x into v. If v is simply the identifier x, then the translation is just ( e ).

6. Group by clauses

  • from x in e group v by k
  • ( e ) . GroupBy ( x => k , x => v )

The exception is when v is the identifier x, in which case the result is ( e ) . GroupBy ( x => k )
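
A sketch of the group by translation where v is not the range variable itself, so the two-argument GroupBy is used. The data and the Describe helper lambda are my own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var customers = new[]
{
    new { Name = "Ann", Country = "UK" },
    new { Name = "Bob", Country = "UK" },
    new { Name = "Cy",  Country = "CA" },
};

// Group the names (v = c.Name) by country (k = c.Country).
var viaQuery = from c in customers
               group c.Name by c.Country;

var viaMethods = customers.GroupBy(c => c.Country, c => c.Name);

// Hypothetical helper: flattens the groups into text so they can be compared.
Func<IEnumerable<IGrouping<string, string>>, string> describe =
    groups => string.Join(" | ", groups.Select(g => $"{g.Key}: {string.Join(", ", g)}"));

Console.WriteLine(describe(viaQuery));                         // UK: Ann, Bob | CA: Cy
Console.WriteLine(describe(viaQuery) == describe(viaMethods)); // True
```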

7. Transparent identifiers

  • some translations inject range variables with transparent identifiers
    • the * denotes these
    • they are NOT a proper language feature
    • rather, they exist only as an intermediate step during translation
  • further translation steps propagate the * into either
    • anonymous functions
    • anonymous object initializers
  • cases:
    • when a * occurs as a parameter in an anonymous function,
      • then the members of the associated anonymous type,
      • are automatically in scope in the anonymous function body
    • when a * occurs as a member of a declarator in an anonymous object initializer
      • then it introduces a member with a transparent identifier
  • As described above, transparent identifiers are always introduced together with anonymous types
  • the intent is to capture multiple range variables as members of a single object
  • a C# implementation is allowed to use a different mechanism to accomplish the same intent.
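
A sketch of how a transparent identifier plays out across several clauses. The data is invented, and the lambda parameter x is my stand-in for *. In the real translation the members c and o are simply in scope, with no x. prefix.

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new { Name = "Ann", Orders = new[] { 10, 40 } },
    new { Name = "Bob", Orders = new[] { 30 } },
};

// c and o are both needed after the second from, so they must travel together.
var viaQuery = from c in customers
               from o in c.Orders
               orderby o descending
               select new { c.Name, Total = o };

// Hand-expanded: new { c, o } is the anonymous object behind the *.
var viaMethods = customers
    .SelectMany(c => c.Orders, (c, o) => new { c, o })
    .OrderByDescending(x => x.o)
    .Select(x => new { x.c.Name, Total = x.o });

Console.WriteLine(viaQuery.SequenceEqual(viaMethods));                // True
Console.WriteLine(string.Join(", ", viaQuery.Select(r => r.Name)));   // Ann, Bob, Ann
```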

Pattern

  • Types can implement this pattern to support query expressions on those types.
  • Types have flexibility in how they implement query expressions.
    • implement as
      • instance methods or
      • extension methods,
      • because the invocation syntax is identical
    • can request
      • delegates or
      • expression trees,
      • because anonymous functions are convertible to both
  • The following is the recommended shape of a generic type C<T> that supports query expressions.
  • It's possible to implement this with a non-generic type.
  • See more details in Specification-QueryPattern.
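
A partial sketch of a type that supports query expressions through instance methods, assuming C# 9 or later. Box<T> is a hypothetical type of mine and covers only Where and Select. The shape recommended by the spec has more methods (SelectMany, Join, GroupJoin, OrderBy, GroupBy and so on), which I leave out here.

```csharp
using System;
using System.Collections.Generic;

var box = new Box<int>(new[] { 1, 2, 3, 4 });

// Translates to box.Where(n => n > 2).Select(n => n * 10),
// which binds to the instance methods below.
var result = from n in box
             where n > 2
             select n * 10;

Console.WriteLine(string.Join(", ", result.Items)); // 30, 40

// Hypothetical type: it does not implement IEnumerable<T>,
// so the System.Linq extension methods cannot apply to it.
class Box<T>
{
    public List<T> Items { get; }

    public Box(IEnumerable<T> items) => Items = new List<T>(items);

    // Keeps the items that satisfy the predicate.
    public Box<T> Where(Func<T, bool> predicate)
    {
        var kept = new List<T>();
        foreach (var item in Items)
        {
            if (predicate(item)) kept.Add(item);
        }
        return new Box<T>(kept);
    }

    // Projects each item into a new form.
    public Box<U> Select<U>(Func<T, U> selector)
    {
        var mapped = new List<U>();
        foreach (var item in Items)
        {
            mapped.Add(selector(item));
        }
        return new Box<U>(mapped);
    }
}
```

The parameters are delegates here; per the notes above, they could be expression trees instead, since lambdas convert to both.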

Terminology in Approximate Order of First Occurrence

  • query expression
    • any expression that starts with "from identifier"
    • followed by any token except:
      • ;
      • =
      • ,
    • to use a query keyword as an identifier within a query expression, prefix it with @
  • expression
    • a sequence of operators and operands
    • that evaluates to a value
  • clause
    • a part of a statement
    • that does not constitute a complete statement
  • generator
    • a special type of routine
      • that controls the iteration behavior of a loop
      • yields values one at a time
    • all generators are iterators
    • generators are similar to functions that return arrays
      • a generator has parameters
      • other code can call a generator
      • a generator generates a series of values
    • generators are different from functions that return arrays
      • because generators yield values one at a time
      • instead of returning all the values at once
    • a generator looks like a function but behaves like an iterator
    • https://en.wikipedia.org/wiki/Generator_(computer_programming)
  • range variable
    • create these in a from or let clause
    • stores each subsequent value that a generator yields
  • ranges
  • sequence
  • token
    • white space and comments are not tokens
    • the following are tokens
      • identifier
      • keyword
      • integer-literal
      • real-literal
      • character-literal
      • string-literal
      • operator-or-punctuator
  • clauses and keywords
    • from
    • select
    • group
    • by
    • let
    • where
    • join
    • on
    • equals
    • into
    • orderby
    • ascending
    • descending
  • splice
  • contextual keywords vs simple names
  • query expression translation
    • Where
    • Select
    • SelectMany
    • Join
    • GroupJoin
    • OrderBy
    • OrderByDescending
    • ThenBy
    • ThenByDescending
    • GroupBy
    • Cast
  • translation
    • first into another query
    • then into Methods
  • range variables
    • the variable immediately following the from
  • transparent identifier
    • represented with *
    • exists only as an intermediate step in query translation
    • later steps turn it into anonymous functions or anonymous object initializers
    • tend to capture multiple range variables as members of a single object
  • explicit range variable type
  • degenerate query expressions
    • trivially selects the elements of the source
    • [this prevents calling code from being able to modify the source]
  • identifier
  • member declarator