A Study of Cloud Haskell With Applications to MapReduce
In this thesis, we developed a MapReduce application that counts words in files. We compared the performance of implementations written with Control.Parallel (a Haskell library for parallel computation), Cloud Haskell (a distributed framework/DSL for Haskell) and the Java Streams API (with immutable data types). The Cloud Haskell implementation was run both locally and in a distributed configuration, giving four applications in total. We varied the input size from 300K to 3 million words and studied the performance of all four. For small inputs, the difference in running time between the Control.Parallel and Cloud Haskell implementations was negligible. To obtain more reliable measurements, we profiled each implementation 12 times for each input size, discarded the highest and lowest values, and averaged the remaining ten. As the number of words in the input files increased, Cloud Haskell took substantially less time than Control.Parallel. Even the distributed Cloud Haskell implementation was faster than the Control.Parallel implementation, although it was slightly slower than its local counterpart. Interestingly, the Java implementation using the Streams API was faster still than the local Cloud Haskell implementation. We also profiled the memory consumed by the Control.Parallel and Cloud Haskell implementations and found that Cloud Haskell uses memory more efficiently as the number of words increases.
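To make the two ideas in the abstract concrete, the following is a minimal Java sketch, not the thesis implementation. It shows one way a word count can be expressed as a map step and a reduce step with the Streams API, and how a measurement that drops the highest and lowest run before averaging can be computed. The class name WordCountSketch, the method names wordCount and trimmedMean, the tokenization rule (lowercasing and splitting on non-word characters) and the sample timings are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration; names and tokenization are assumptions, not taken from the thesis.
class WordCountSketch {

    // Map step: split the text into lowercase words.
    // Reduce step: group equal words together and count each group.
    static Map<String, Long> wordCount(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty()) // split can yield a leading empty token
                .collect(Collectors.groupingBy(
                        Function.identity(),     // key: the word itself
                        TreeMap::new,            // sorted map for stable printing
                        Collectors.counting())); // value: number of occurrences
    }

    // Average of the timings after discarding one lowest and one highest value,
    // mirroring the "12 runs, drop the extremes, average the rest" procedure.
    static double trimmedMean(double[] timings) {
        if (timings.length < 3) {
            throw new IllegalArgumentException("need at least three timings");
        }
        double[] sorted = timings.clone();
        Arrays.sort(sorted);
        // Skip index 0 (lowest) and the last index (highest).
        return Arrays.stream(sorted, 1, sorted.length - 1).average().getAsDouble();
    }

    public static void main(String[] args) {
        System.out.println(wordCount("The cat and the hat"));

        // Made-up timings in seconds for 12 runs of one configuration.
        double[] runs = {1.9, 2.1, 2.0, 2.2, 2.0, 5.4, 2.1, 1.2, 2.0, 2.1, 1.9, 2.2};
        System.out.println(trimmedMean(runs));
    }
}
```

Dropping the extremes before averaging keeps a single unusually slow or fast run (for example, one affected by a cold cache or a garbage collection pause) from dominating the reported time, which matters most for the small inputs where the implementations are close together.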