Introduction
I understand how malicious code can be created. Years ago I, figured out to embed the Michelangelo virus in the old command.com, have command.com open the speaker port and leave a permanent beep on, and even wrote a keyboard sniffer based on BIOS interrupts. This was almost twenty years ago and that’s where I left my curiosity. So, other then helping the occasional friend get rid of bots, worms, and popup cheese whiz, I am by no means an expert at malicious code. In fact, I wish the authors of such code would just get a life.
That said, a small percentage of people do malicious things. Maybe they do them because they are curious or they may be trying to steal from you or your employer. One such form of malice is the SQL injection attack. SQL injection goes something like this: A programmer writes some SQL that is filled in by some kind of user input and the SQL is passed to the database. The malicious user, instead of putting in an expected parameter, puts in a parameter and some home grown SQL tacked on the end.
In this article, you will learn how LINQ to SQL can support fill in the blank queries and block SQL injection, making LINQ a better alternative to passing SQL to the database. So, not only is LINQ to SQL easier than straight ADO.NET/SQL, LINQ to SQL as a by-product of its design foils SQL injections.
Reviewing Malicious SQL Injection
It’s pretty easy in some cases to figure out whether a site is using user input data to build SQL commands. Browse to a website, enter some goofy data like a single quote which will cause an un-terminated string error in SQL, and see if the website blows up. If it does, and you are really lucky and the website is deployed in debug mode, you will probably actually see the ADO.NET code.
Figure 1 shows a very simple website that lets the user look at company names by the customer id. (The Northwind database was used because most of you will have access to it.) However, if a malicious user enters a single quote, the site crashes and in debug mode gives the malicious user enough information to know his code is being passed to SQL Server.
Figure 1: Enter a valid customer id and get the company name back.
Figure 2: The source lines and the unclosed quotation error tell me that this site is passing user input to SQL Server without a lot of error checking.
Now, this person can craft some SQL instead of a customer id and start fishing. A great place to fish is the master table. First, Listing 1 contains the code for this sample website.
Listing 1: This code is subject to SQL injection attack because user input is sent right to SQL Server.
Imports System.Data.SqlClient Partial Public Class _Default Inherits System.Web.UI.Page Protected Sub Page_Load(ByVal sender As Object, _ ByVal e As System.EventArgs) Handles Me.Load End Sub Protected Sub Button1_Click(ByVal sender As Object, _ ByVal e As EventArgs) Handles Button1.Click Dim connectionString As String = _ "Data Source=.SQLExpress;Initial>span _ Catalog=northwind;Integrated Security=True" Dim sql As String = _ "SELECT CustomerID, CompanyName FROM Customers _ WHERE CustomerID = '{0}'" Using connection As SqlConnection = _ New SqlConnection(connectionString) connection.Open() Dim command As SqlCommand = _ New SqlCommand(String.Format(sql, _ TextBox1.Text), connection) Dim table As DataTable = New DataTable Dim adapter As SqlDataAdapter = New SqlDataAdapter(command) adapter.Fill(table) GridView1.DataSource = table GridView1.DataBind() End Using End Sub End Class
The sample literally reads the TextBox.Text value and plugs it into the SQL without error checking. (It is not recommended that you write code like this. Although there is code like this in production, this example is just a very straightforward way to demonstrate how injection works.)
Suppose next that what your unhappy guest does is, instead of entering a CustomerID, he enters the follow text:
"ALFKI' UNION SELECT 'database name' as dummy, name _ FROM master.sys.databases --"
The preceding text turns the harmless looking SQL from Listing 1 to:
SELECT CustomerID, CompanyName FROM Customers _ WHERE CustomerID = 'ALFKI' UNION _ SELECT 'database name' as dummy, name from master.sys.databases
The comment (–) characters at the end of the injection get rid of your single quote, replacing it with his single quote and a UNION SELECT. The UNION SELECT, if successful, shows the user all of the databases on your server. Now, your web page might produce something like the results shown in Figure 3.
Figure 3: A successful UNION SELECT turns an ionocuous web page into a malicious user’s snoop and destroy tool.
There are a lot of ways to fix this simple example. You could provide a drop down only. You could use regular expressions to screen the input value for unacceptable and potentially harmful values such as DELETE FROM, single quotes, and SQL comments (–), or you could use LINQ to SQL.
Eliminating SQL Injection with LINQ to SQL
To simplify the focus, the Web application sample is shown as a console application in Listing 2. The console application simply shows malicious SQL being injected into a SQL statement. (For the discussions going forward, it doesn’t matter how it got there. In the previous section, one way SQL might be injected was demonstrated, so it is enough to know that bad SQL can end up in undesirable places.)
Listing 2: SQL injection simplified for discussion purposes.
Imports System.Data.SqlClient Module Module1 Sub Main() SqlInjectionToDiscoverOtherDatabase() End Sub Sub SqlInjectionToDiscoverOtherDatabase() Dim connectionString As String = _ "Data Source=.SQLExpress;Initial _ Catalog=northwind;Integrated Security=True" Dim sql As String = _ "SELECT CustomerID, CompanyName FROM Customers _ WHERE CustomerID = '{0}'" _ Dim customerID = _ "ALFKI' UNION SELECT 'database name' as dummy, " + _ “name FROM master.sys.databases --" Using connection As SqlConnection = _ New SqlConnection(connectionString) connection.Open() Dim command As SqlCommand = _ New SqlCommand(String.Format(sql, customerID), connection) Dim reader As SqlDataReader = command.ExecuteReader While (reader.Read()) Dim i As Integer For i = 0 To reader.VisibleFieldCount - 1 Console.WriteLine(reader(i).ToString()) Next End While End Using Console.ReadLine() End Sub End Module
Listing 2 declares a customerID local variable representing the malicious SQL (input by a user). Now, let me talk about LINQ to SQL in general before demonstrating how LINQ to SQL eliminates SQL injection. If you are familiar with LINQ to SQL basics, you can skip the next section.
Understanding LINQ to SQL
LINQ, or Language Integrated Query, is a SQL-like language written with VB or C# code. LINQ to SQL is basically LINQ code that is converted to SQL queries by the LINQ to SQL technology.
To summarize, LINQ to SQL works like this: You define custom entity classes that match tables in your SQL database. The class is tagged with the TableAttribute and properties that represent columns are tagged with the ColumnAttribute. You can use SqlMetal to generate these classes for you. This literally means that only properties in your (or SqlMetal.exe’s) custom classes can be returned from SQL and only those properties that are associated with columns in tables by the ColumnAttribute. Then, you define a simple class that inherits from DataContext. The DataContext represents your connection to the database. That’s it. You don’t use ADO.NET directly. You don’t write SQL directly. LINQ to SQL takes care of the plumbing for you.
The net result is that only data matching your entities can get returned, your users are working with compiled code and not dynamic SQL strings, and nothing gets passed to SQL directly. The LINQ to SQL engine is writing your SQL for you.
Understanding How LINQ to SQL Stops SQL Injection
Dynamic SQL is potentially problematic because it’s a string of text that gets passed to the SQL engine. LINQ to SQL is compiled code. The SQL is generated by the LINQ to SQL technology.
The next critical key is that LINQ only supports Select queries. There is no Update, Delete, or Insert. This means that a malicious user can only read data. This could be a problem, but I’ll show you why it’s not.
LINQ is converted to method calls at compile time. This means the executable code is already compiled. A user can’t input new LINQ keywords at runtime to compromise your pre-defined queries. That is, a Select can’t become a join or a union at runtime.
Mild Caution: A user could input dynamic LINQ at runtime if your application was designed to use the CodeDOM to compile LINQ queries at runtime, but of course, this is a pretty technically advanced thing to do. Many people won’t know how to do it and there is almost no reason to include dynamic compilation in business applications.
To recap, LINQ to SQL eliminates injection because the user is dealing with compiled code, not dynamic queries. LINQ queries only perform reads; that is, Select exists but there is no Delete, Update, or Insert. Modifications to underlying data are made through method calls and not queries. Listing 3 shows how you can convert the code in Listing 2 (a console application) to perform the same task using LINQ to SQL.
Listing 3: Code to permit a user (represented by the local variable) to input a value to filter a result set.
Imports System.Linq Imports System.Data.Linq Imports System.Data.Linq.Mapping Module Module1 Sub Main() CorrectedVersionWithLINQToSQL() End Sub Sub CorrectedVersionWithLINQToSQL() Dim northwind As Northwind = New Northwind Dim customers As Table(Of Customer) = _ northwind.GetTable(Of Customer)() Dim param As String = "ALFKI" Dim results = From cust In customers _ Where cust.CustomerID = param _ Select cust For Each c In results Console.WriteLine(c.CustomerID) Console.WriteLine(c.CompanyName) Next Console.ReadLine() End Sub End Module Public Class Northwind Inherits DataContext Public Shared ReadOnly connectionString AsString = _ "Data Source=.SQLExpress;Initial _ Catalog=northwind;Integrated Security=True" Public Sub New() MyBase.New(connectionString) End Sub End Class <Table(Name:="Customers")> _ Public Class Customer Private FCustomerID As String <Column()> _ Public Property CustomerID() As String Get Return FCustomerID End Get Set(ByVal Value As String) FCustomerID = Value End Set End Property Private FCompanyName As String<Column()> _ Public Property CompanyName() As String Get Return FCompanyName End Get Set(ByVal Value As String) FCompanyName = Value End Set End Property End Class
The Northwind class represents everything you need to connect to the underlying SQL database. You don’t use ADO.NET directly. Note that Northwind inherits from DataContext and calls its base constructor with the connection string. The Customer class is called an entity class. It is mapped to the Customers table with the TableAttribute, and as designed, it will only return the CustomerID and CompanyName from the Customers table. The DataContext and Table class are incapable of returning any other data from anywhere else.
The Module contains the LINQ to SQL query. Simply create an instance of the Northwind class, request the Table(Of Customers), and write a LINQ query to select the desired data. As demonstrated by the LINQ sample, LINQ starts with a From clause to set the context and aid Intellisense and the Select clause comes last. (A little odd looking at first, but easy enough to get used to.)
Disclaimer: There seem to be many clever people working to wreak havoc, so no single approach is guaranteed to prevent all attacks. The likelihood of attack and the potential for damage must be weighed against the time, effort, and money required to protect information. And, security is a holistic problem that must consider the people inside and outside of your organization and how they behave.
I know all of you are very smart, so I challenge anyone interested to figure out how to inject a malicious value into param and get anything useful to work. The conditions are you can only change the value of param, and you are encouraged to publish your results if successful.
Summary
LINQ to SQL eliminates SQL injection because the user is, in effect, working with compiled code and not dynamic SQL. This means the end user can’t pass a query in masked as a parameter. LINQ to SQL eliminates destructive SQL injection because the query language only supports the Select construct—not Update, Delete, or Insert. All modifications are made using method calls that obviously can’t be injected. (The only way injection is likely to work is if you write the code using the CodeDOM to dynamically compile user-input strings.)
Finally, my favorite part about all of this is that LINQ to SQL is easier to write and use than writing all of that ADO.NET plumbing. If you really want to master LINQ, pick up my book LINQ Unleashed for C# from Amazon.com and fine bookstores everywhere. Even if you only know VB, the explanations in my book are in English, the code is downloadable, and if you can read VB you can read C#; they’re like fraternal twins.
About the Author
Paul Kimmel is the VB Today columnist for www.codeguru.com and has written several books on object-oriented programming and .NET. Check out his upcoming book LINQ Unleashed for C#; order your copy today at Amazon.com. Paul Kimmel is an Application Architect for EDS. You may contact him for technology questions at pkimmel@softconcepts.com.
Copyright © by Paul T. Kimmel 2008. All Rights Reserved.