Saturday, May 10, 2014

Working with XLSX files

It’s not unusual when writing applications, especially business applications, then you will have a need to work with Microsoft Excel files. If you are working with .XLSX format files there is an excellent open source library called EPPlus that makes it’s easy to both read and write these files. In my next few posts I want to show how to use this library to handle various tasks. I will start with a discussion of how to use EPPlus to read from existing XLSX files.
First you will need to download the library. The easiest way to do this is to open your Visual Studio Project and in the Package Manager Console type:
Install-Package EPPlus
For the purposes of these posts I will be using version 3.1.3. At the time of this writing Version 4.0 is in the works but it is currently Beta, so I am going to stick with the older stable version for now.
I am going to be doing this demo in a Console application but you could also use the library in a Winforms or even a web application. First you need to import the following namespace:

using OfficeOpenXml;

Now let’s say that I know I have a number in cell B2 of the spreadsheet, here is the code to read that value:

static void Main(string[] args)
  using (var package = new ExcelPackage(new System.IO.FileInfo(@"c:\temp\demo.xlsx"))) {
       var workbook = package.Workbook;
       var worksheet = workbook.Worksheets[1];

       double data = (double)worksheet.Cells["B2"].Value;


First we need to open the Excel file we want to read. We do this by creating a new ExcelPackage object. ExcelPackage won’t take a filename as a parameter so you have to either pass in an existing FileStream or, as I did in this case, a FileInfo object. I have put everything in a using block to be sure the ExcelPackage gets disposed properly.

Once we have the package open, we get the Workbook object. Workbook contains a collection of all the Worksheets in the file. In this case we will assume there is only one worksheet so we will get the first one into worksheet variable. Note that the Worksheet starts at 1 not 0. Now that we have the worksheet we want to work with we can access any cell by using the Cells property. Here we are getting the Value in cell B2 which will return an object type that in this case we case to a double. In a real application you will want to do some error checking to be sure that cell really contains the type you are expecting.

Another way you can access cells is to use their column and row coordinate like this:

double data = (double)worksheet.Cells[2,2].Value;

Again, note that the rows and columns starting numbering at 1 not 0. This method is useful when you need to iterate over a series of cells. You can also iterate over a specific range of cells like this:

var range = worksheet.Cells["B2:B5"];
foreach (var cell in range)

Up until this point we have been using the Value property of the cell which get the raw value. There is also a property called Text which returns what would be displayed in the cell as a string. For example if a cell contained the value .5 and it was formatted as a percentage, Value would return the double value .5, but Text would return the string “50%”. You can also access the formula in a cell by using the Formula property which will return a string containing the formula, or an empty string if the cell does not have a formula.

One final way of querying cells is to use a LINQ query:

var query = (from c in worksheet.Cells["B2:B13"]
                  where c.Value is double
                  && (double)c.Value > 7
                  select c);
foreach (var c in query) Console.WriteLine(c.Address + " " + c.Value.ToString());

This piece of code searches for all cells in the range of B2 to B13 that have a value greater than seven and then prints that address (e.g. B2) and the value. Note that in the where clause we first check if the value is a double in case the cell contains something that can’t be cast to a double.

No comments: