Showing posts with label linq. Show all posts

Tuesday, 20 May 2008

LINQ To SQL crippled prior to RTM

I was reading Frans' post on the marketing hype around Entity Framework this morning. He left an interesting comment there, quoting Matt Warren's post on writing an extensible LINQ to SQL DataContext. Here's a snippet from Matt's original post (emphasis mine):

"If only LINQ to SQL had a public provider model, I could simply plug a new one in and use it to intercept all interaction with the database. Oh, double irony, as there is no such provider model, at least not a public one. Grin.

LINQ to SQL was actually designed to be host to more types of back-ends than just SQL server. It had a provider model targeted for RTM, but was disabled before the release. Don’t ask me why. Be satisfied to know that is was not a technical reason. Internally, it still behaves that way."

Cue conspiracy theories! :)

Thursday, 20 March 2008

IEnumerable<T> and ForEach()

Why is there no ForEach() extension method defined for IEnumerable<T> in System.Linq.Enumerable for .NET 3.5? There is a ForEach() on Array and List<T>, but not on IEnumerable<T>, which would seem a fairly natural place for it. It is easy enough to add yourself:

public static class Extensions {
  public static void ForEach<T>(this IEnumerable<T> source, Action<T> action) {
    foreach (var item in source) {
      action(item);
    }
  }
}

For some reason, when I'm in LINQy mode, my natural reaction is to try ForEach() over foreach, and I'm always a bit surprised when it doesn't work for IEnumerable<T>:

//Lambda goodness:
parameters.ForEach(parameter => doSomethingTo(parameter));

//Old stylz:
foreach (var parameter in parameters) {
  doSomethingTo(parameter);
}

Bit nit-picky I know, but I noticed Daniel Cazzulino made the same observation:

"I added a ForEach extension method to IEnumerable. How come it's missing in .NET 3.5? :S"

So at least I'm not completely alone on this :-) Anyone else wonder about this?

Monday, 4 February 2008

Using LINQ to XML to migrate blog posts

During my recent dalliance with other blog hosts, I ended up playing around with LINQ to XML and was quite impressed with how much nicer it was than the pre-LINQ .NET classes for XML parsing. Here's a quick overview of how I migrated all my Blogger posts to Community Server (before deciding to stick with Blogger :)), with a focus on the LINQ to XML bits.

First my compulsory disclaimers: this was a rush job. This was the most disgusting, hacky code I have ever written. I have written C64 Basic programs with better separation of concerns. If you use any of this code for anything at all you'd have to be insane! I did write the code test-first, but definitely skipped the all-important "design" part of TDD. Ugly ugly ugly. With that said, let's try and get something useful out of the spaghetti.

Getting Blogger posts in XML format

First step was to retrieve all the posts in XML. You can get this directly from a Blogger url:

http://(your_blog).blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500

If you have more than 500 posts, adjust the querystring or use a couple of batches to get the results. You can download this using .NET's System.Net.WebClient (webClient.DownloadFile(url, feedFile);), or use the highly technical method of downloading from your browser :-) The resulting XML looks something like this:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'>
  <!-- Some nodes relating to your blog here. -->
  <entry>
    <id>...</id>
    <published>...</published>
    <updated>...</updated>
    <category scheme='http://www.blogger.com/atom/ns#' term='daves drivel'/>
    <category scheme='http://www.blogger.com/atom/ns#' term='shameless self promotion'/>    
    <title type='text'>...</title>
    <content type='html'>(blog post content here)</content>
    <link rel='alternate' type='text/html' href='http://davesquared.blogspot.com/2008/02/sample-post.html' title='Sample post'/>
    ...
  </entry>
  <!-- Lots more <entry /> nodes -->
</feed>    
    

Basic XML parsing using LINQ to XML

As you can see from the XML above, the posts are described by <entry /> nodes. Normally we'd be breaking out the XmlDocument or XmlTextReader or similar and working some XPath magic. Or we could just use LINQ to XML. Here's a simple example:

XDocument xml = XDocument.Load(new StringReader(feed));
XNamespace xmlns = "http://www.w3.org/2005/Atom";
var entries = from feedElements in xml.Descendants(xmlns + "entry")
              select new {
                Title = feedElements.Element(xmlns + "title").Value,
                Content = feedElements.Element(xmlns + "content").Value
              };                  
 

The first thing we do is load the XML into an XDocument (in System.Xml.Linq). Looking at the XML feed, the XML namespace used is "http://www.w3.org/2005/Atom". I haven't found an XmlNamespaceManager-style approach to handling namespaces in LINQ to XML, so I just put this in an xmlns variable and will prepend it as I go. If you know how to specify a default namespace for XDocuments, please let me know :)
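
One way to avoid hard-coding the namespace string, at least, is to ask the root element for its default namespace. The sketch below assumes the feed declares Atom as its default namespace on the root <feed/> node, as the sample above does; the class name and the tiny sample feed are made up for illustration. You still have to prefix each element name with the namespace, so this is only a partial answer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AtomNamespaceSketch {
  // Hypothetical sample feed; a real Blogger feed has many more nodes.
  public const string SampleFeed =
    "<feed xmlns='http://www.w3.org/2005/Atom'>" +
    "<entry><title type='text'>First post</title></entry>" +
    "<entry><title type='text'>Second post</title></entry>" +
    "</feed>";

  public static string[] GetTitles(string feed) {
    XDocument xml = XDocument.Parse(feed);
    // Read the default namespace from the root element instead of hard-coding it.
    XNamespace xmlns = xml.Root.GetDefaultNamespace();
    return xml.Descendants(xmlns + "entry")
              .Select(entry => entry.Element(xmlns + "title").Value)
              .ToArray();
  }

  public static void Main() {
    foreach (string title in GetTitles(SampleFeed)) {
      Console.WriteLine(title);
    }
  }
}
```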

The next step is selecting the title and content elements from each <entry/> node. The xml.Descendants(XName) method returns an IEnumerable<XElement>; passing it xmlns + "entry" restricts the results to the "entry" nodes. The entries variable will now contain an IEnumerable<> of an anonymous type. Each item in the enumeration will have a Title and Content property representing the blog title and content parsed from the XML. We can iterate over this, use entries.ToArray() to execute the query immediately, or use other LINQ goodness like entries.Take(5) (cue jazz) to further filter our results.

Notice how we specify all the node names as strings? They are actually of type XName (or XNamespace in the case of the xmlns variable). There is no public constructor to create an XName, but instead an implicit conversion from String is defined. This gives us the ease of using strings to specify node names (which is quite natural when working with XML), with the benefits of having strong typing around the name to access properties like LocalName and Namespace. In our query we have to prefix the XName, like entry, with the namespace xmlns, to make sure our nodes resolve properly, hence all the xmlns + "entry" style code.
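
To make the string conversions concrete, here is a small sketch of how XNamespace and XName behave. The class and method names are invented for the example; the conversions and properties are the ones from System.Xml.Linq described above.

```csharp
using System;
using System.Xml.Linq;

public static class XNameSketch {
  public static XName BuildEntryName() {
    // Implicit conversion from string to XNamespace.
    XNamespace xmlns = "http://www.w3.org/2005/Atom";
    // XNamespace + string yields a namespace-qualified XName.
    return xmlns + "entry";
  }

  public static void Main() {
    XName entry = BuildEntryName();
    Console.WriteLine(entry.LocalName);      // the "entry" part
    Console.WriteLine(entry.NamespaceName);  // the Atom namespace URI
    Console.WriteLine(entry);                // expanded form: {namespace}entry

    // A plain string converts to an XName with no namespace,
    // so it does not match the namespaced name.
    XName bare = "entry";
    Console.WriteLine(bare == entry);        // False
  }
}
```

The last line is the reason a query for plain "entry" finds nothing in an Atom feed: the names only match when both the local name and the namespace agree.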

Getting all the post data using LINQ to XML

Now let's get strongly typed objects from our XML feed.

public class BlogEntry {
  public String Title;
  public String Content;
  public String[] Categories;
  public String OriginalLink;
  public String Published;
  public String Updated;
}

I've been lazy here and am parsing the published and updated dates as simple strings (see, I told you this was hacky!). There are two field declarations of interest here (the rest have one-to-one relationships with XML elements). The first is the Categories array. These are specified in the XML as children of <entry/> nodes, with the term attribute holding the pertinent information:

<entry>
  ...
  <category scheme='http://www.blogger.com/atom/ns#' term='daves drivel'/>
  <category scheme='http://www.blogger.com/atom/ns#' term='shameless self promotion'/>    
  ... 
 

The other is the OriginalLink field. I wanted to put a link back to the original post from the new blog to be clear about the source and so that people could see any comments they made (I would have taken the comments over as well, but only had the MetaWeblog API to work with). So I needed the original post link, which I could parse out of one of the <link /> nodes that has a rel attribute value of "alternate":

<entry>
  ...
  <link rel='alternate' type='text/html' 
      href='http://davesquared.blogspot.com/2008/02/sample-post.html' title='Sample post'/>
  ...
 

Armed with this knowledge, let's tackle the new LINQ to XML query:

var entries = 
  from feedElements in xml.Descendants(xmlns + "entry")
  select new BlogEntry() {
    Title = feedElements.Element(xmlns + "title").Value,
    Content = feedElements.Element(xmlns + "content").Value,
    OriginalLink = feedElements.Elements(xmlns + "link")
             .Where(link => link.Attribute("rel").Value == "alternate")
             .Select(link => link.Attribute("href").Value)
             .First(),
    Published = feedElements.Element(xmlns + "published").Value,
    Updated = feedElements.Element(xmlns + "updated").Value,
    Categories = feedElements.Elements(xmlns + "category")
                   .Select(category => category.Attribute("term").Value)
                   .ToArray()
  };
 

The main changes from our original query are the BlogEntry initialiser and the nested queries for OriginalLink and Categories. First up, we are now working with strongly typed BlogEntry objects, rather than anonymous types. The entries variable is now an IEnumerable<BlogEntry>, which we can actually return from our parser method (vars only work locally).

We are also using nested queries to drag out the Categories and OriginalLink (you can do this using the sugary "from ... in ... select" style as well, but I found it easier to use the methods explicitly in this case). For categories we are simply selecting the term attributes from all the <category/> nodes in our entry. For the original link, we use .Where() to filter all the <link/> nodes to only include ones with a rel attribute equal to "alternate", select the value of the href attribute, and then take the .First() (there should only be one).
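
For comparison, here is a sketch of the same two nested queries written in the "from ... in ... select" style. It makes two substitutions that are not in the code above: it casts attributes to string instead of reading .Value (the cast returns null for a missing attribute rather than throwing), and it uses FirstOrDefault() instead of First(). The class name, helper methods and sample feed are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NestedQuerySketch {
  public static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

  // Hypothetical single-entry feed, trimmed to the nodes the queries touch.
  public const string SampleFeed =
    "<feed xmlns='http://www.w3.org/2005/Atom'>" +
    "<entry>" +
    "<category scheme='http://www.blogger.com/atom/ns#' term='daves drivel'/>" +
    "<category scheme='http://www.blogger.com/atom/ns#' term='linq'/>" +
    "<link rel='self' href='http://example.invalid/feeds/1'/>" +
    "<link rel='alternate' href='http://example.invalid/2008/02/sample-post.html'/>" +
    "</entry>" +
    "</feed>";

  public static string GetOriginalLink(XElement entry) {
    var links = from link in entry.Elements(Atom + "link")
                where (string)link.Attribute("rel") == "alternate"
                select (string)link.Attribute("href");
    // Null when the entry has no alternate link.
    return links.FirstOrDefault();
  }

  public static string[] GetCategories(XElement entry) {
    var terms = from category in entry.Elements(Atom + "category")
                select (string)category.Attribute("term");
    return terms.ToArray();
  }

  public static void Main() {
    XElement entry = XDocument.Parse(SampleFeed).Descendants(Atom + "entry").First();
    Console.WriteLine(GetOriginalLink(entry));
    Console.WriteLine(string.Join(", ", GetCategories(entry)));
  }
}
```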

Finishing up

The final steps in the migration were doing some regex-ing of the content to get the post id (slug for Wordpressors), and translating links to my own articles to point to the new site (so clicking on internal links on the new blog kept readers in the new blog). The last bit in particular was a bit tricky, as it needed a first pass to parse into a dictionary so I could look up the new urls as required. Here's the horribly hacky code if you're interested:

BloggerParser parser = new BloggerParser();
IDictionary<String, BlogEntry> entries = parser.ParseFeed(feed).ToDictionary(entry => entry.GetSlug());
foreach (BlogEntry entry in entries.Values) {
  entry.Content =
    Regex.Replace(entry.Content,
    @"http://davesquared.blogspot.com/\d{4}/\d{2}/(?<slug>.*?)\.html",
    delegate(Match match) {
      return entries[match.Groups["slug"].Value].GetNewLink();
    },
    RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
    

After parsing into a dictionary (parser.ParseFeed() returns our IEnumerable<BlogEntry> from our earlier LINQ to XML adventures), we try and replace any internal links in each post's content with the link to the new post, using the slug as a unique index to look up BlogEntry objects. Ugly but effective.
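
The GetSlug() and GetNewLink() helpers aren't shown here, so the following is only a self-contained sketch of the same idea. It uses a plain slug-to-URL dictionary with made-up example.invalid addresses in place of BlogEntry objects. It also differs from the code above in two ways: the dots in the host name are escaped, and a link whose slug is not in the dictionary is left untouched, whereas the dictionary indexer would throw a KeyNotFoundException.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkRewriteSketch {
  private static readonly Regex InternalLink = new Regex(
    @"http://davesquared\.blogspot\.com/\d{4}/\d{2}/(?<slug>.*?)\.html",
    RegexOptions.IgnoreCase);

  // newLinksBySlug is a hypothetical stand-in for the BlogEntry dictionary.
  public static string RewriteLinks(string content, IDictionary<string, string> newLinksBySlug) {
    return InternalLink.Replace(content, match => {
      string newLink;
      // Fall back to the original link when the slug is unknown.
      return newLinksBySlug.TryGetValue(match.Groups["slug"].Value, out newLink)
        ? newLink
        : match.Value;
    });
  }

  public static void Main() {
    var newLinks = new Dictionary<string, string> {
      { "sample-post", "http://example.invalid/blog/sample-post" }
    };
    string content =
      "See <a href='http://davesquared.blogspot.com/2008/02/sample-post.html'>this</a> and " +
      "<a href='http://davesquared.blogspot.com/2007/12/missing.html'>that</a>.";
    // Only the first link has an entry in the dictionary, so only it changes.
    Console.WriteLine(RewriteLinks(content, newLinks));
  }
}
```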

The final step in all of this was to post all the transformed posts to the new site using the MetaWeblog API, which, as far as I can tell, went remarkably well. :-)

So there you go. This was my first real experience working with LINQ to XML, and I found it a fair bit easier than XmlDocument tweaking and mumbling various XPath incantations. Hope this helps. :-)

Friday, 21 December 2007

Messing around with LinqToSql

This post is part of a small series on .NET ORM tools. You can find the rest of them here.

As part of my continuing efforts to flagrantly misuse a number of .NET ORM tools, here is my effort with LinqToSql. The usual proviso applies: all of this is really quick and hacky, as it is just to get a little familiarity with the tool rather than to uncover any "best practices" or similar.

Scene refresher

I have a table of suppliers, and a table of states (or provinces, territories, prefectures etc.). Both suppliers and states have names, which are stored as strings/varchars, and IDs, which are stored as Guids/uniqueidentifiers. Each supplier can service many states. So we have a simple many-to-many relationship between the two main entities. It looks a bit like this:

I am using Aussie states for my tests, so I have populated the State table with the following names: NSW, VIC, QLD, TAS, SA, WA, ACT, NT.

Setting up LinqToSql

I created a new C# class library project, then added a LinqToSql Classes project item, which I named WorkshopDb.dbml. In true Microsoft style, you simply drag and drop the tables from the database onto the designer, which generates the necessary classes for you:

This adds an app.config file to the project, containing the relevant connection string. We are now ready to go!

Populating the database

As per my previous posts, I'll use a test fixture to run the remainder of the code. After cleaning out my little database again, I'll add a method to encapsulate the process of creating a supplier and mapping the states it services:

private static void createSupplier(String name, String[] statesServiced) {
  WorkshopDbDataContext db = new WorkshopDbDataContext();

  Supplier supplier = new Supplier();
  supplier.SupplierId = Guid.NewGuid();
  supplier.Name = name;

  List<State> states = (
        from state in db.States
        where statesServiced.Contains(state.Name)
        select state
      ).ToList();

  states.ForEach(
    state => supplier.Supplier_StatesServiceds.Add(
      new Supplier_StatesServiced() {
        SupplierId = supplier.SupplierId,
        StateId = state.StateId
      }
    ));
    
  db.Suppliers.InsertOnSubmit(supplier);
  db.SubmitChanges();
}

As with SubSonic, LinqToSql does not automatically let me traverse the many-to-many relationship, but the new ForEach method makes it pretty easy to map each state to the supplier.Supplier_StatesServiceds collection (man, I really should have aliased that mapping table first).

We can now use that method to add the following test data:

createSupplier("Dave^2 Quality Tea", new string[] { "NSW", "VIC" });
createSupplier("ORMs'R'Us", new string[] { "NSW" });
createSupplier("Lousy Example", new string[] { "TAS", "VIC" });
createSupplier("Bridge Sellers", new string[] { "QLD" });

Querying the data

Let's run through the usual tests. First, getting a list of all suppliers.

[Test]
public void Should_be_able_to_get_all_suppliers() {
  WorkshopDbDataContext db = new WorkshopDbDataContext();
  var suppliers = from supplier in db.Suppliers select supplier;
  Assert.That(suppliers.Count(), Is.EqualTo(4));
}

So far so good. Now let's get all suppliers with an "s" in their name.

[Test]
public void Should_be_able_to_get_all_suppliers_with_s_in_their_name() {
  WorkshopDbDataContext db = new WorkshopDbDataContext();
  var suppliers = from supplier in db.Suppliers
          where supplier.Name.ToLower().Contains("s")
          select supplier;
  Assert.That(suppliers.Count(), Is.EqualTo(3));
}

And finally, let's navigate over the supplier-state relationship to get all suppliers that service NSW:

[Test]
public void Should_be_able_get_all_suppliers_that_service_NSW() {
  WorkshopDbDataContext db = new WorkshopDbDataContext();
  var suppliers = from supplier in db.Suppliers
          join servicedState in db.Supplier_StatesServiceds 
            on supplier.SupplierId 
            equals servicedState.SupplierId
          where servicedState.State.Name == "NSW"
          select supplier;
  Assert.That(suppliers.Count(), Is.EqualTo(2));
}

Pretty straightforward. This generates very similar SQL to the NHibernate example, but because I never get a list from the suppliers expression, the suppliers.Count() call uses SELECT COUNT(*) ... (I believe you can do similar queries in both NHibernate and SubSonic). The following is roughly what is executed via sp_executesql:

SELECT COUNT(*)
FROM [dbo].[Supplier]
INNER JOIN [dbo].[Supplier_StatesServiced] ON [Supplier].[SupplierId] = [Supplier_StatesServiced].[SupplierId]
INNER JOIN [dbo].[State] ON [State].[StateId] = [Supplier_StatesServiced].[StateId]
WHERE [State].[Name] = @p0

Vague semblance of a conclusion

LinqToSql was extremely easy to use, especially in the initial configuration department. Like NHibernate, the query syntax takes a bit of getting used to, but it is something that becomes familiar fairly quickly.

I should point out that both SubSonic and NHibernate currently target the .NET 2.0 world, so the "Language INtegrated Query" part of LinqToSql was always going to give LinqToSql a bit of an expressiveness advantage. If you are a big fan of LINQ queries, they may be coming to an ORM tool near you in the not-too-distant future now that .NET 3.5 has been released. That said, I still quite like how the query criteria works in the NHibernate example.

I also noticed that NHibernate was more domain model (i.e. classes) focused, whereas the LinqToSql query for retrieving all the suppliers that service NSW was more data-schema focused (using the JOIN construct rather than having a working knowledge of the relationship). This isn't meant as praise or criticism of either, just a difference in the approaches. As a quick side note, I believe the ADO.NET Entity Framework is meant to have more advanced support for many-to-many relationships.
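
To illustrate the difference between the two styles, here is a sketch of a relationship-focused version of the NSW query. It runs as LINQ to Objects over hypothetical plain classes that hold their states directly. These are not the classes the LinqToSql designer generates, and nothing here touches a database, so treat it as a picture of the query shape only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory classes, not the designer-generated ones.
public class State {
  public string Name;
}

public class Supplier {
  public string Name;
  public List<State> StatesServiced = new List<State>();
}

public static class RelationshipQuerySketch {
  public static Supplier CreateSupplier(string name, params string[] statesServiced) {
    var supplier = new Supplier { Name = name };
    supplier.StatesServiced.AddRange(statesServiced.Select(s => new State { Name = s }));
    return supplier;
  }

  // Same test data as the createSupplier calls earlier in the post.
  public static List<Supplier> CreateTestData() {
    return new List<Supplier> {
      CreateSupplier("Dave^2 Quality Tea", "NSW", "VIC"),
      CreateSupplier("ORMs'R'Us", "NSW"),
      CreateSupplier("Lousy Example", "TAS", "VIC"),
      CreateSupplier("Bridge Sellers", "QLD")
    };
  }

  // Filter on the relationship itself rather than joining on key columns.
  public static List<Supplier> ThatService(IEnumerable<Supplier> suppliers, string stateName) {
    var query = from supplier in suppliers
                where supplier.StatesServiced.Any(state => state.Name == stateName)
                select supplier;
    return query.ToList();
  }

  public static void Main() {
    foreach (Supplier supplier in ThatService(CreateTestData(), "NSW")) {
      Console.WriteLine(supplier.Name);
    }
  }
}
```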

So far I've actually felt SubSonic was the most difficult to use for this particular scenario, but this is largely a result of the contrived example I used. I have used SubSonic a few times and in general it is exceptionally straightforward to get working.

As I was mainly looking into how each tool tackled this particular scenario, I have not gone into the different architectural approaches of the tools (ActiveRecord vs. DataMapper, implications for testability and persistence ignorance etc.). It's definitely worth looking into this side of things if you are unfamiliar with the tools. Ian Cooper has a great post on some of these issues as applied to LinqToSql.

That's it for now. Good luck on your ORM travels!

Friday, 16 November 2007

Links to LINQ: Writing custom providers

Just storing a couple of links for a rainy day:

Thursday, 8 November 2007

LINQ-to-SQL logging via DataContext.Log

After playing around with LINQ-to-SQL today I noticed that the generated DataContext subclass exposes a Log property of type TextWriter.

I initially replaced this with a StringWriter backed by a local StringBuilder variable so I could read the output, but then decided to take advantage of the fact that the generated class is a partial class:

//Generated class: WidgetDb.designer.cs
public partial class WidgetDbDataContext : System.Data.Linq.DataContext {
  ...
}

//My partial implementation: WidgetDbDataContext.cs
public partial class WidgetDbDataContext {
  private StringBuilder logBuilder = new StringBuilder();
  
  public String GetLoggedInformation() {
    return logBuilder.ToString();
  }

  partial void OnCreated() {
    Log = new StringWriter(logBuilder);
  }
}

I could then perform a sample query or two in my ASPX:

private void doLinqStuff() {
  WidgetDbDataContext db = new WidgetDbDataContext();      
  var widgets = from w in db.Widgets select w;
  WidgetGrid.DataSource = widgets;
  WidgetGrid.DataBind();      
  WidgetCount.Text = widgets.Count().ToString();      
  StatusLog.Text += db.GetLoggedInformation().Replace("\n", "<br/>");
}

The output of StatusLog.Text from this was:

SELECT [t0].[WidgetId], [t0].[WidgetName], [t0].[WidgetDescription], [t0].[WidgetPrice], [t0].[IsActive]
FROM [dbo].[Widgets] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.20706.1

SELECT COUNT(*) AS [value]
FROM [dbo].[Widgets] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.20706.1 

This is potentially helpful for learning how your expressions translate to SQL, but you probably wouldn't want to do this in production :)

I then decided to search Google for "DataContext.Log" and found out that people far smarter than me have already come up with better solutions. Ah well, at least I learnt about extending the generated DataContext, as well as some smart ways of logging from others. :)

Friday, 14 September 2007

Rob Conery on text mining and analysis

Rob has posted part 1 of a series on mining unstructured data. He goes through some basics on ETL (Extraction, Transformation, Loading) and natural language parsing, implements a text miner using LinqToSql, and analyses the results. For the next part we are promised some data warehousing tricks, more analysis techniques, and some OLAP with Excel. It has been an interesting read for me as I am pretty unfamiliar with this area of IT.

Friday, 17 August 2007

LINQ enumeration gotcha

Rick Strahl has posted about enumerating over LINQ results. Essentially, each enumeration of a LINQ-to-SQL query fires off a new DB query. This means that modifying the result will not affect the resultset on the next enumeration (which probably sounds obvious, but is easy to code without noticing, especially if you are working on a result and then databinding as in Rick's example). It also means that repeated traversals will incur a repeated overhead, rather than repeating the work in-memory.

Good to watch out for. Rick suggests using ToList, ToArray, or ToDictionary methods if you want to grab a single, in-memory copy of the result (obviously being mindful of the result size :)).
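
The same behaviour can be seen without a database. The sketch below uses LINQ to Objects with a made-up ReadSource() iterator standing in for the database round trip; it counts how often the source is read when a query is enumerated twice, and again after taking a ToList() snapshot. The class and member names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch {
  public static int SourceReads;

  // Stands in for a database query: bumps the counter each time enumeration starts.
  public static IEnumerable<int> ReadSource() {
    SourceReads++;
    yield return 1;
    yield return 2;
    yield return 3;
  }

  // Returns the read count after defining the query, after two
  // enumerations, and after working with an in-memory snapshot.
  public static int[] CountReads() {
    SourceReads = 0;
    IEnumerable<int> query = ReadSource().Where(n => n > 1);
    int afterDefinition = SourceReads;        // nothing has been read yet

    query.Count();
    query.Count();
    int afterTwoEnumerations = SourceReads;   // one read per enumeration

    List<int> snapshot = query.ToList();      // one more read to fill the list
    int total = snapshot.Count + snapshot.Count;  // served from memory
    int afterSnapshot = SourceReads;

    return new[] { afterDefinition, afterTwoEnumerations, afterSnapshot };
  }

  public static void Main() {
    Console.WriteLine(string.Join(", ", CountReads()));  // 0, 2, 3
  }
}
```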

Tuesday, 3 July 2007

Rob Conery on using LINQ to query just about anything

Rob has a nice post on using LINQ to query just about anything. His example works through creating an IQueryable<> implementation to query an Amazon web service.

Monday, 11 June 2007

Persistence ignorance and TDD with LINQ to SQL

Ian Cooper has written an article on LINQ to SQL covering persistence ignorance and exercising LINQ to SQL code using unit and integration tests.

Staccato Signals: Being Ignorant with LINQ to SQL

Found courtesy of Jeremy Miller.

Monday, 30 April 2007

ADO.NET Entity Framework dropped from "Orcas"/.NET 3.5

Microsoft has dropped the Entity Framework, which includes LINQ for Entities, from the .NET 3.5 and VS.NET "Orcas" release. The MSDN Data blog has an informative post on Microsoft's Data Access Strategy, which spells out the differences between the Entity Framework and LINQ to SQL.

Frans has a theory on the motivation for this. I believe there was a lot of constructive criticism at the 2007 MVP Summit recently, and it seemed like the MS developers there were keen to address any issues the MVPs found. So maybe the reason is not just business-related as Frans speculates? Maybe they just want to get it right first go, instead of the usual MS approach of waiting until version 3 SP1 ;-).

Anyway, shame it is dropped from Orcas, but maybe it will give other ORM tools like NHibernate a chance to get some nice LINQ integration going in the meantime.

Wednesday, 11 April 2007

Example of querying collections in C# 2.0 and 3.0 with LINQ

LukeH has a good example of how simple and expressive LINQ queries can be compared to more traditional methods of filtering collections. His example uses reflection to get all the instance methods of System.String with particular ordering and grouping.

LukeH's WebLog : In-Memory Query with C#2.0 and C#3.0

Tuesday, 20 March 2007

Linq for Entities: MVP impressions from MVP Summit 2007

Jeremy Miller has a good summary of the general MVP impressions of Linq for Entities after the recent MVP Summit. Everyone seems impressed by the potential of the product, but I was very interested to hear other OR Mappers, specifically NHibernate, being preferred at this stage. Apparently there has been lots of feedback to the developers to make Linq for Entities a bit less invasive (so Plain Old CLR Objects can be Persistence Ignorant and require less configuration).

He also talks about the general feeling among MVPs that the WebForms model is lacking, and a growing surge of interest in MonoRail. ScottGu is apparently thinking along the same lines. Exciting times. :-)

Monday, 19 March 2007

101 LINQ Samples

MSDN has a collection of LINQ operation samples.

Saturday, 17 March 2007

Linq for NHibernate

Jeremy Miller has pointed out there is work afoot to use Linq as a front-end for NHibernate. ScottGu mentioned earlier that Linq was designed for this kind of interoperability (point #3 under the heading "Built-in System.Linq Extension Methods" which specifically mentions NHibernate et al). It's nice to see this kind of approach from MS becoming more prevalent.

Wednesday, 13 December 2006

LINQ, Lambda expressions and maths

One of the features that enables LINQ to work is lambda expressions. Lambda expressions are basically another way of representing a function, and are used in LINQ to pass function pointers/delegates around without the verbosity of declaring function blocks. For example:

public List<Employee> GetFirstAiders() {
  List<Employee> employees = getEmployees();
  return employees.FindAll(IsFirstAider);
}
public bool IsFirstAider(Employee employee) {
  return employee.IsFirstAider;
}

Using lambda expressions this becomes:

public List<Employee> GetFirstAiders() {
  List<Employee> employees = getEmployees();
  return employees.FindAll(e => e.IsFirstAider);
}

The (e => e.IsFirstAider) is the lambda expression that replaces the old bool IsFirstAider(Employee) method. The types used in the function are not explicitly mentioned, but are instead inferred from the signature of the List<Employee>.FindAll method, which takes a Predicate<Employee>.
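
As a runnable sketch of the same idea, the following passes an equivalent predicate to FindAll three ways: as a named method, as a C# 2.0 anonymous delegate, and as a lambda. The Employee class and the sample data are invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Employee type for illustration.
public class Employee {
  public string Name;
  public bool IsFirstAider;
}

public static class LambdaSketch {
  public static bool IsFirstAider(Employee employee) {
    return employee.IsFirstAider;
  }

  public static List<Employee> GetEmployees() {
    return new List<Employee> {
      new Employee { Name = "Alice", IsFirstAider = true },
      new Employee { Name = "Bob", IsFirstAider = false },
      new Employee { Name = "Carol", IsFirstAider = true }
    };
  }

  public static void Main() {
    List<Employee> employees = GetEmployees();

    // Named method converted to a Predicate<Employee>.
    List<Employee> viaMethod = employees.FindAll(IsFirstAider);
    // C# 2.0 anonymous delegate: the parameter type is spelled out.
    List<Employee> viaDelegate = employees.FindAll(delegate(Employee e) { return e.IsFirstAider; });
    // C# 3.0 lambda: the parameter type is inferred from Predicate<Employee>.
    List<Employee> viaLambda = employees.FindAll(e => e.IsFirstAider);

    Console.WriteLine(viaMethod.Count);    // 2
    Console.WriteLine(viaDelegate.Count);  // 2
    Console.WriteLine(viaLambda.Count);    // 2
  }
}
```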

Anyway, the initial point of this post was to mention that Wikipedia has information on the mathematical background of lambda calculus.

Tuesday, 12 December 2006

Anders Hejlsberg presentation on C# 3.0 language enhancements

David Hayden has linked to a presentation by Anders Hejlsberg on C# 3.0 and the features that enable LINQ to be implemented. Great video!