Nested Data Parallelism in Haskell
July 25, 2007
[More unedited notes from OSCON]
Nested Data Parallelism in Haskell by
Simon Peyton Jones
Appeared in the mid-1990s, but not yet available in a mainstream language.
cf. Flat data parallel (MPI, map/reduce, *C, OpenMP)
-> chunking mechanism (not “N” threads)
-> great for distributed memory systems (MPI)
“do X to A[i]”
X is sequential
in NDP, X can be parallel
-> recursive and unbalanced (e.g., QuickSort)
-> still good cost model, but hard to implement
-> can’t statically predict: do we need millions of threads?
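A rough sketch of the recursive, unbalanced case (QuickSort) in ordinary Haskell lists — in Data Parallel Haskell these would be parallel arrays `[:a:]` and the comprehensions/recursive calls would run in parallel; the function name here is my own:

```haskell
-- Sequential model of the nested-parallel quicksort example.
-- The two comprehensions and the two recursive calls are the
-- nested parallelism: each is itself a parallel computation,
-- and the recursion depth/size is not statically predictable.
quicksort :: [Int] -> [Int]
quicksort []     = []
quicksort (p:xs) = quicksort smaller ++ [p] ++ quicksort larger
  where
    smaller = [x | x <- xs, x <  p]   -- one (parallel) comprehension
    larger  = [x | x <- xs, x >= p]   -- another, alongside the first
```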
Programmer’s life is easier, but the compiler’s work is harder
-> NESL; flattens parallelism
-> parallelizing array comprehensions; no fork/join — or locks!
but, still communication
e.g., sparse vectors -> sparse matrices
i.e., an F-Script-like array language carefully constrained for “easy” compiler parallelization
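A minimal sketch of the sparse-matrix example with plain lists (types and names are my own; the real DPH version uses parallel arrays): a sparse matrix is a vector of sparse rows, so matrix–vector multiply is a parallel loop over rows, each of which is itself a parallel dot product.

```haskell
type SparseVec = [(Int, Double)]   -- (column index, value) pairs
type SparseMat = [SparseVec]       -- one sparse row per matrix row

-- Dot product of one sparse row with a dense vector: inner parallelism.
sparseDot :: SparseVec -> [Double] -> Double
sparseDot row v = sum [x * (v !! i) | (i, x) <- row]

-- Matrix-vector product: outer parallelism over rows. Rows have
-- different lengths -- exactly the ill-balanced case below.
smvm :: SparseMat -> [Double] -> [Double]
smvm m v = [sparseDot row v | row <- m]
```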
Tricky part: ill-balanced!
Solution:
-> flatten sparse vector into a single huge (1-D) vector + index array
-> Guy Blelloch showed us how to do this systematically
-> price is extra bookkeeping
-> works best if data always encoded in this manner
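The flat encoding above can be sketched like this (a toy model with lists; the names are my own — the point is that the nested structure becomes one flat data vector plus a segment-length index array):

```haskell
-- Flatten a nested array into a single flat data vector plus
-- an index array of segment lengths (Blelloch-style encoding).
flatten :: [[Double]] -> ([Double], [Int])
flatten rows = (concat rows, map length rows)

-- The extra bookkeeping: recovering the nested view from the
-- flat data and the segment lengths.
unflatten :: ([Double], [Int]) -> [[Double]]
unflatten (_,  [])     = []
unflatten (xs, n:ns)   = seg : unflatten (rest, ns)
  where (seg, rest) = splitAt n xs
```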
Also: general, aggressive fusion (e.g., multiply + add)
-> only possible in functional language with no side effects!
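A sketch of what fusion buys (my own illustration): the two functions below are equivalent, and purity is what lets the compiler rewrite the first — which builds an intermediate vector of products — into the second, a single loop with no intermediate structure.

```haskell
-- Unfused: zipWith builds an intermediate list of products,
-- which sum then consumes.
dotUnfused :: [Double] -> [Double] -> Double
dotUnfused xs ys = sum (zipWith (*) xs ys)

-- What multiply+add fusion produces: one pass, no allocation.
-- With no side effects, the rewrite is guaranteed meaning-preserving.
dotFused :: [Double] -> [Double] -> Double
dotFused (x:xs) (y:ys) = x * y + dotFused xs ys
dotFused _      _      = 0
```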
NESL -> Haskell (compiled, higher-order vs. interpreted, first-order)
Key technologies:
- flattening
- non-parametric representations
- chunking
- aggressive fusion
Plus in a “mainstream” language, with libraries often useful for other things.
About 30% slower on a uniprocessor; can be faster on > 2 processors.
“Alpha” at top of tree of GHC, usable (bleeding-edge) in a few months.