This simple tutorial will show you how to create an application that counts the words in a string and also strips HTML tags if necessary.
Almost every programming language has an easy way of counting the words in a string. Some use RegEx (Regular Expressions), while others split the string into an array of words and then count its elements. With C# you can take either approach and come up with a satisfying result, but in this tutorial we’re going to use the latter. However, we will still use a little bit of RegEx for the tag stripping feature.
For those of you who are in a hurry, here is the C# method that counts the words and, if asked to, leaves tags (HTML, XHTML, XML, etc.) out of the count. But first, make sure you add the following using directive:
using System.Text.RegularExpressions;
public static int CountWords(string strText, bool stripTags)
{
    // If the stripTags argument was passed as true, remove the tags before counting
    if (stripTags)
    {
        // Define the tag form: a "<", one or more characters that are not ">", then a ">"
        Regex tagMatch = new Regex("<[^>]+>");
        // Replace each tag with a space so it is not counted and the words around it are not merged
        strText = tagMatch.Replace(strText, " ");
    }
    // Split on spaces, tabs and line breaks; RemoveEmptyEntries drops the empty
    // strings that repeated separators (or an empty input) would otherwise add to the count.
    // StringSplitOptions lives in the System namespace, which the form template already imports.
    char[] separators = new char[] { ' ', '\t', '\r', '\n' };
    // Return the number of words that were counted
    return strText.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
}
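For comparison, the RegEx approach mentioned at the start of this tutorial can be sketched as follows. This is only a minimal sketch and not part of the sample application: the class name WordCountSketch and the method name CountWordsRegex are made up for illustration, and it assumes that a "word" is any run of non-whitespace characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCountSketch
{
    // Hypothetical helper (not from the sample application): counts words with RegEx only
    public static int CountWordsRegex(string strText, bool stripTags)
    {
        if (stripTags)
        {
            // Same tag pattern as in the tutorial's method; a space keeps neighbouring words apart
            strText = Regex.Replace(strText, "<[^>]+>", " ");
        }
        // Assumption: every run of non-whitespace characters counts as one word
        return Regex.Matches(strText, @"\S+").Count;
    }

    public static void Main()
    {
        string sample = "<a href=\"index.html\" title=\"home\">Home page</a>";
        Console.WriteLine(CountWordsRegex(sample, true));  // prints 2
        Console.WriteLine(CountWordsRegex(sample, false)); // prints 4
    }
}
```

With the tags left in, the pieces of the opening tag are counted as words too, which is why the second call reports 4 instead of 2.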
Attached to this C# tutorial you can find a sample application that uses this method.
If you wish to learn how this simple application works, start a new Windows Application project in Visual Studio 2005 and add at least the following controls to the form: a TextBox txtContent that holds the text, a CheckBox chkStripTags that defines whether or not we want the tags stripped, a Button btnCount whose Click event calls the counting method, and a TextBox txtCount that shows the number of words counted.
Now double-click the button in Visual Studio’s form designer and you should be taken to its Click event handler. Inside it, add the following call to the method:
txtCount.Text = CountWords(txtContent.Text, chkStripTags.Checked).ToString();
And of course, place the method I defined earlier in the same class.
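Put together, the steps above might look roughly like the sketch below. It is a stand-in written under a few assumptions, not a copy of the attached sample: the controls are created in code rather than in the designer so that the listing is self-contained, the class name WordCountForm and the layout values are invented, and CountWords is a compact variant of the counting method.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Windows.Forms;

// Hypothetical form class; the control names match the ones used in this tutorial
public class WordCountForm : Form
{
    private TextBox txtContent = new TextBox();
    private CheckBox chkStripTags = new CheckBox();
    private Button btnCount = new Button();
    private TextBox txtCount = new TextBox();

    public WordCountForm()
    {
        Text = "Word counter";

        // Layout values are arbitrary; the designer would normally generate this part
        txtContent.Multiline = true;
        txtContent.SetBounds(10, 10, 260, 120);
        chkStripTags.Text = "Strip tags";
        chkStripTags.SetBounds(10, 140, 100, 24);
        btnCount.Text = "Count";
        btnCount.SetBounds(120, 140, 70, 24);
        txtCount.ReadOnly = true;
        txtCount.SetBounds(200, 140, 70, 24);

        Controls.Add(txtContent);
        Controls.Add(chkStripTags);
        Controls.Add(btnCount);
        Controls.Add(txtCount);

        // What double-clicking the button in the designer wires up for you
        btnCount.Click += new EventHandler(btnCount_Click);
    }

    // The Click event handler containing the call from this tutorial
    private void btnCount_Click(object sender, EventArgs e)
    {
        txtCount.Text = CountWords(txtContent.Text, chkStripTags.Checked).ToString();
    }

    // The counting method lives in the same class as the handler
    public static int CountWords(string strText, bool stripTags)
    {
        if (stripTags)
        {
            // Replace tags with a space so they are not counted
            strText = Regex.Replace(strText, "<[^>]+>", " ");
        }
        char[] separators = new char[] { ' ', '\t', '\r', '\n' };
        return strText.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new WordCountForm());
    }
}
```

Because CountWords is static and does not touch any control, it can also be moved to a separate helper class later without changing the handler much.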