This is a detailed overview of using parallelism to achieve more computation in the same amount of elapsed time, covering both "shared memory" and "distributed memory" designs. It concentrates on principles rather than details, to help attendees choose a suitable approach and start in the right direction. It is aimed at users with significant programming experience who need their programs to run faster than they do on a single CPU core.
The course is also designed for programmers and system administrators who already have some experience of using or supporting parallel codes; for them, it describes the range of practical options, together with their strengths, weaknesses and other important issues.
You are expected to know how arrays are used in science, though only at a very basic level. For some cheap and simple references, see Matrix Prerequisites.
First half of the course
Second half of the course