Nested Data Parallelism in Haskell
July 25, 2007 § Leave a comment
[More unedited notes from OSCON]
Appeared in the mid-1990′s, but not yet available in a mainstream language.
cf. Flat data parallel (MPI, map/reduce, *C, OpenMP)
-> chunking mechanism (not “N” threads)
-> great for distributed memory systems (MPI)
“do X to A[i]“
X is sequential
in NDP, X can be parallel
-> recursive and unbalanced (e.g., QuickSort)
-> still good cost model, but hard to implement
-> can’t statically predict: do we need millions of threads?
Programmers life is easier, but compiler work is harders
-> NESL; flattens parallelism
-> parallelizing array comphrehensions; no fork/join — or locks!
but, still communication
e.g., sparse vectors -> sparse matrixes
Tricky part: ill-balanced!
-> flatten sparse vector into a single huge (1-D) vector + index array
-> Guy Blelloch showed us how to do this systematically
-> price is extra bookeeping
-> works best if data always encoded in this manner
Also: general, aggressive fusion (e.g., multiply + add)
-> only possible in functional language with no side effects!
NESL-> Haskell (compiled high-order vs. interpreted first-order)
- non-parametric repsresentations
- aggressive fusion
Plus in a “mainstream” language, with libraries often useful for other things.
About 30% slower on uniprocessor, can be faster on > 2processors.
“Alpha” in top of tree of ghc, usable (bleeding-edge) in a few months.