Sunday, August 7, 2016

Handling Whitespace

In this post I want to offer some tips for a very simple issue, but one that can cause you a lot of headaches if you don’t deal with it.

Let’s say we are validating user input and need to be sure the user has entered something in a specific field. This data may come from a form, an XML file, etc., so for demonstration purposes I will just use strings.

  string s1 = "";
  string s2 = null;
  string s3 = " ";
  string s4 = "\t";

  if (s1 == "") Console.WriteLine("s1 not entered");
  if (s2 == "") Console.WriteLine("s2 not entered");
  if (s3 == "") Console.WriteLine("s3 not entered");
  if (s4 == "") Console.WriteLine("s4 not entered");

If I am checking for user input in a field, I would want all four of these to be treated as invalid. But if I run the program, only the first one is caught; the other three look fine. This happens because we are comparing against a literal empty string, which is not the same as a null, a space, or a tab. Here is another way to do this check:
 
  if (string.IsNullOrEmpty(s1)) Console.WriteLine("s1 not entered");
  if (string.IsNullOrEmpty(s2)) Console.WriteLine("s2 not entered");
  if (string.IsNullOrEmpty(s3)) Console.WriteLine("s3 not entered");
  if (string.IsNullOrEmpty(s4)) Console.WriteLine("s4 not entered");

The string class provides a function called IsNullOrEmpty that helps with the second case. It returns true if the string is either empty or null. This is very useful right before you call a method on a string, such as Substring(), because doing so on a null string throws a NullReferenceException. This is a step in the right direction, but it still doesn’t help with the last two cases, since a string containing only a space or a tab isn’t considered empty. Let’s look at another function on the string class:
 
  if (string.IsNullOrWhiteSpace(s1)) Console.WriteLine("s1 not entered");
  if (string.IsNullOrWhiteSpace(s2)) Console.WriteLine("s2 not entered");
  if (string.IsNullOrWhiteSpace(s3)) Console.WriteLine("s3 not entered");
  if (string.IsNullOrWhiteSpace(s4)) Console.WriteLine("s4 not entered");

If you run this code, it will recognize all four strings as invalid. IsNullOrWhiteSpace works like IsNullOrEmpty, but it also returns true for a string made up entirely of whitespace characters: spaces, tabs, carriage returns, line feeds, and so on.
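Microsoft’s documentation describes IsNullOrWhiteSpace as similar to `string.IsNullOrEmpty(value) || value.Trim().Length == 0`, only faster. The sketch below checks that description against a few inputs, including a carriage return/line feed pair and a non-breaking space (U+00A0), which .NET also treats as whitespace. `ManualIsBlank` is a hypothetical name used only for this comparison; in real code you would just call IsNullOrWhiteSpace.

```csharp
using System;

// Hypothetical helper: the documented near-equivalent of string.IsNullOrWhiteSpace.
static bool ManualIsBlank(string value) =>
    string.IsNullOrEmpty(value) || value.Trim().Length == 0;

// Empty, null, space, tab, CR/LF, non-breaking space, then two real values.
string[] samples = { "", null, " ", "\t", "\r\n", "\u00A0", " OK ", "OK" };

for (int i = 0; i < samples.Length; i++)
{
    string s = samples[i];
    // Both columns should agree for every sample.
    Console.WriteLine($"sample {i}: IsNullOrWhiteSpace={string.IsNullOrWhiteSpace(s)}, manual={ManualIsBlank(s)}");
}
```

The first six samples are reported as blank by both checks; the last two are not, because they contain non-whitespace text.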
It’s a good idea to avoid comparing a string to an empty string and to use one of these functions instead.
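To apply this to the form scenario from the start of the post, a validation routine can run every required field through IsNullOrWhiteSpace. The field names and the `FindMissingFields` helper below are hypothetical, made up for this sketch; a field that was never submitted at all is treated the same as a blank one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the names of required fields that are
// missing, null, empty, or contain only whitespace.
static List<string> FindMissingFields(IDictionary<string, string> fields, IEnumerable<string> requiredFields)
{
    var missing = new List<string>();
    foreach (string name in requiredFields)
    {
        // TryGetValue leaves value as null when the key is absent.
        fields.TryGetValue(name, out string value);
        if (string.IsNullOrWhiteSpace(value))
            missing.Add(name);
    }
    return missing;
}

// Example input using the kinds of values that slipped past the == "" check.
var submitted = new Dictionary<string, string>
{
    ["FirstName"] = "Ada",
    ["LastName"] = " ",
    ["Email"] = "\t",
    ["Phone"] = null,
};

List<string> missingFields = FindMissingFields(submitted, new[] { "FirstName", "LastName", "Email", "Phone", "City" });
Console.WriteLine("Missing: " + string.Join(", ", missingFields));
```

Only FirstName passes; the space, the tab, the null, and the absent City field are all reported as missing.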
Whitespace can give you trouble even when you aren’t checking for empty input. Let’s look at this example:
  
  string s1 = "OK";
  string s2 = " OK ";
  string s3 = "\tOK\t";

  if (s1 == "OK") Console.WriteLine("s1 is OK");
  if (s2 == "OK") Console.WriteLine("s2 is OK");
  if (s3 == "OK") Console.WriteLine("s3 is OK");

If you run this, only the first case is found to be OK, because the other two have whitespace before and after the text. We can solve this by using the Trim function:
  
  if (!string.IsNullOrEmpty(s1) && s1.Trim() == "OK") Console.WriteLine("s1 is OK");
  if (!string.IsNullOrEmpty(s2) && s2.Trim() == "OK") Console.WriteLine("s2 is OK");
  if (!string.IsNullOrEmpty(s3) && s3.Trim() == "OK") Console.WriteLine("s3 is OK");

Since Trim is an instance method, calling it on a null string throws a NullReferenceException, so we first make sure each string isn’t null, using the IsNullOrEmpty function. Once we know the string isn’t null, we can call Trim(), which removes any leading or trailing whitespace. Now all three strings are found to be OK.
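Since C# 6, the null-conditional operator offers a shorter way to write the same check: `s?.Trim()` evaluates to null instead of throwing when `s` is null, and null simply fails to equal "OK". The `IsOk` helper below is a hypothetical name for this sketch. Keep in mind that Trim only removes whitespace from the ends of a string, so a value such as "O K" still won’t match.

```csharp
using System;

// Hypothetical helper: true when the value equals "OK" after removing
// leading and trailing whitespace. s?.Trim() yields null for a null input,
// so no separate null check is needed.
static bool IsOk(string s) => s?.Trim() == "OK";

string[] inputs = { "OK", " OK ", "\tOK\t", null, "", "O K" };

foreach (string input in inputs)
{
    // Brackets make leading/trailing whitespace visible in the output.
    string label = input == null ? "(null)" : "[" + input + "]";
    Console.WriteLine($"{label}: {(IsOk(input) ? "OK" : "not OK")}");
}
```

The first three inputs match; null, the empty string, and "O K" do not.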