C# 4.0 Parallel features performance comparison

To exploit the power of multicore processors, version 4.0 of the .NET Framework ships two Parallel features, PLinq and the Parallel class, which are easier alternatives to hand-written multithreaded code.

On my (old) ASUS PRO60E, a Core 2 Duo T8100 at 2.1 GHz with 4 GB of RAM and Windows 7 x64, I ran some tests to measure the performance difference between PLinq and the Parallel class. The tests were run with the tools shipped with Visual Studio 2010: MSTest, the Performance Wizard, and several useful inspection windows such as the Parallel Tasks window.

The solution consists of a class library containing the core logic (including the parallel code), a test library, and a Windows application that uses the main library. The purpose of the application is to solve anagrams by returning a list of possible English words for a given user input. The Anagram class receives an input string, computes and stores all the possible words that can be composed from its characters, and then intersects that list with a second one representing the English dictionary to find the matches. The iteration over the possible words, and the lookup of each word in the dictionary, is done both with and without the Parallel features. The dictionary can be accessed either by reading the file line by line through a stream or by buffering it once in a List<string>; this post covers only the second option. The methods under test are:

  • public IEnumerable<string> GetPossibleAnagramsParallelForeachThreadLocal(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
  • public IEnumerable<string> GetPossibleAnagramsForeach(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
  • public IEnumerable<string> GetPossibleAnagramsParallelForeach(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
  • public IEnumerable<string> GetPossibleAnagramsLinq(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
  • public IEnumerable<string> GetPossibleAnagramsPLinq(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
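To make the description above concrete, here is a minimal sketch of the two steps: generating the candidate words and intersecting them with a word list. The class and method names (AnagramSketch, Permutations, Match) are hypothetical and are not taken from the solution. It also assumes that only full-length permutations are generated and that the dictionary fits in memory; the real Anagram class may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramSketch
{
    // Returns every distinct ordering of the characters in input.
    public static IEnumerable<string> Permutations(string input)
    {
        if (input.Length <= 1)
            return new[] { input };

        var results = new HashSet<string>();
        for (int i = 0; i < input.Length; i++)
        {
            // Fix the i-th character and permute the remaining ones.
            string rest = input.Remove(i, 1);
            foreach (string tail in Permutations(rest))
                results.Add(input[i] + tail);
        }
        return results;
    }

    // Keeps only the permutations that appear in the word list (case-insensitive).
    public static List<string> Match(IEnumerable<string> permutations, IEnumerable<string> dictionary)
    {
        var words = new HashSet<string>(dictionary, StringComparer.OrdinalIgnoreCase);
        return permutations.Where(p => words.Contains(p)).ToList();
    }
}
```

The HashSet lookup is a design choice of this sketch; scanning a List<string> for every permutation, as the buffered option suggests, would give the same matches at a higher cost per lookup.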

Let’s see the results:

[Image: paralleldurations, the measured durations of the methods above]

The gap between PLinq and Parallel is substantial.
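For readers who want to reproduce this kind of measurement without MSTest or the Performance Wizard, a Stopwatch is enough for a rough comparison. The helper below is a hypothetical sketch, not the harness used for the numbers in this post; the warm-up call is an assumption meant to keep JIT compilation out of the timed runs.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs the action once to warm up, then returns the average duration of the timed runs.
    public static TimeSpan Measure(Action action, int runs)
    {
        action(); // warm-up run, not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            action();
        watch.Stop();

        return TimeSpan.FromTicks(watch.Elapsed.Ticks / runs);
    }
}
```

Each method under test could be wrapped in a lambda and passed to Measure, taking care to enumerate the returned sequence inside the lambda so that lazy queries are actually executed.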

Although both serve the same purpose, going parallel, PLinq is more a way to execute a query in parallel (achieved by simply adding the AsParallel() method to the query), while the Parallel class is better suited to independent operations: it distributes the invocations of the ForEach body across multiple threads. When threads share resources, it is good practice to use the Interlocked class or a simple lock for synchronization; the source code contains an example of it.
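As an illustration of the two synchronization options just mentioned, the following sketch updates a shared counter with Interlocked and a shared list under a lock. It is not the example from the solution; the class and method names are made up for this post.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SharedStateSketch
{
    // Counts the even numbers with Interlocked and collects them under a lock.
    public static int CountAndCollectEvens(IEnumerable<int> numbers, List<int> evens)
    {
        int count = 0;
        object sync = new object();

        Parallel.ForEach(numbers, n =>
        {
            if (n % 2 != 0)
                return; // skip odd numbers

            // Atomic increment: a single counter does not need a lock.
            Interlocked.Increment(ref count);

            // List<T> is not thread-safe, so writes are serialized.
            lock (sync)
            {
                evens.Add(n);
            }
        });

        return count;
    }
}
```

Interlocked is the lighter choice for a single numeric value, while a lock is needed as soon as the shared resource is a collection or several values must change together.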

Below are the parallel methods. CheckPermutationExistence verifies whether the permutation is contained in the dictionary and returns true if it is:

public IEnumerable<string> GetPossibleAnagramsPLinq(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
{
    // AsParallel() lets PLinq partition the source and run the filter on multiple threads.
    IEnumerable<string> possibleAnagrams = permutationsMatrix
        .AsParallel()
        .Where(x => CheckPermutationExistence(x, file, bufferFile));

    // Materialize the lazy query once and return the results, so callers do not re-run it.
    return possibleAnagrams.ToArray();
}

public IEnumerable<string> GetPossibleAnagramsParallelForeach(IEnumerable<string> permutationsMatrix, string file, bool bufferFile)
{
    List<string> possibleAnagrams = new List<string>();
    object sync = new object();

    Parallel.ForEach(permutationsMatrix, result =>
    {
        if (CheckPermutationExistence(result, file, bufferFile))
        {
            // List<T> is not thread-safe, so concurrent Add calls are serialized.
            lock (sync)
            {
                possibleAnagrams.Add(result);
            }
        }
    });

    return possibleAnagrams;
}
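The thread-local variant listed among the methods under test is not shown in this post. A possible shape for it, assuming it uses the Parallel.ForEach overload with localInit and localFinally delegates, is sketched below. The names ThreadLocalSketch and Filter are hypothetical, and the predicate parameter stands in for CheckPermutationExistence.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThreadLocalSketch
{
    // Filters the source in parallel, collecting matches in per-task lists first.
    public static List<string> Filter(IEnumerable<string> source, Func<string, bool> predicate)
    {
        var matches = new List<string>();
        object sync = new object();

        Parallel.ForEach<string, List<string>>(
            source,
            () => new List<string>(),          // localInit: one private list per task
            (item, loopState, local) =>        // body: no locking, the list is private
            {
                if (predicate(item))
                    local.Add(item);
                return local;
            },
            local =>                           // localFinally: merge once per task
            {
                lock (sync)
                {
                    matches.AddRange(local);
                }
            });

        return matches;
    }
}
```

The point of this design is that the lock is taken once per task rather than once per matching item, which reduces contention when many items match.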

The main difference between PLinq and Parallel lies in the number of threads the hosting process is allowed to spawn. PLinq needs an exact number to go parallel (it can be specified with WithDegreeOfParallelism()), while the Parallel class seems more liberal in its use of resources: ParallelOptions.MaxDegreeOfParallelism sets only the maximum number of threads we want to use, and in the meantime, even if that number is not reached, ForEach will use any thread that becomes available. The result can be seen in the Parallel Tasks window in Visual Studio 2010, PLinq first.

[Images: two screenshots of the Parallel Tasks window, PLinq first]

By default, PLinq tries to run as many tasks as there are processors, up to a limit: the value is calculated as Math.Min(ProcessorCount, MAX_SUPPORTED_DOP), where MAX_SUPPORTED_DOP is 64. This explains the 2 working threads on a dual-core machine. The degree of parallelism can also be set explicitly, for example to 5:

IEnumerable<string> possibleAnagrams = permutationsMatrix
    .AsParallel()
    .WithDegreeOfParallelism(5)
    .Where(x => CheckPermutationExistence(x, file, bufferFile));

In any case, it is generally better to leave resource management to the framework. Changing the parallelism value each time the hardware changes is definitely inelegant; it is much easier to rely on the thread management of Parallel.ForEach.

image
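If an upper bound is still wanted with the Parallel class, ParallelOptions.MaxDegreeOfParallelism can be passed to Parallel.ForEach. The sketch below is hypothetical and not part of the solution. It records the highest number of loop bodies seen running at the same time, which should never exceed the configured maximum; Thread.SpinWait only simulates work.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DegreeSketch
{
    // Runs a bounded parallel loop and returns the highest concurrency observed.
    public static int MaxObservedConcurrency(int items, int maxDegree)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        int current = 0;
        int peak = 0;
        object sync = new object();

        Parallel.ForEach(Enumerable.Range(0, items), options, i =>
        {
            int now = Interlocked.Increment(ref current);
            lock (sync)
            {
                if (now > peak)
                    peak = now;
            }

            Thread.SpinWait(10000); // simulated work

            Interlocked.Decrement(ref current);
        });

        return peak;
    }
}
```

The value returned depends on the machine and on the thread pool, so it is only guaranteed to stay between 1 and maxDegree.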

The solution also contains a small Windows application for solving word anagrams 🙂

image



