Tag FMM

GPGPU-7

Jee is back from GPGPU-7, where he presented new work on a hybrid CPU+GPU implementation of the kernel-independent fast multipole method, or FMM.

  • Jee Choi, Aparna Chandramowlishwaran, Kamesh Madduri, Richard Vuduc. “A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method.” In Proc. 7th Wkshp. General-Purpose Processing using GPUs (GPGPU-7), Salt Lake City, UT, March 2014. [Paper|Slides]

Jee and Naila (from a different research lab) representing Georgia Tech at GPGPU-7.

SHAX-C @ KAUST

Beacon @ KAUST

Aparna, Jee, and Rich are at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia this weekend. Well, it’s a work week for the Saudis. We are presenting and schmoozing with a full slate of amazing attendees at the [Scalable Hierarchical Algorithms for eXtreme Computing] (SHAX-C) workshop. UPDATE: Rich’s talk slides are available [here].

Two papers @ IPDPS’10

The HPC Garage will have two papers at the upcoming IPDPS conference, to be held here in Atlanta, April 19–23. Congratulations to Aparna, who led the two papers, as well as our colleagues at Intel (Kath Knobe) and Lawrence Berkeley National Laboratory (Sam Williams and Lenny Oliker) for their significant contributions!

Here’s a brief description of the papers …

The first paper is a detailed performance evaluation of the Concurrent Collections (CnC) parallel programming model on some of the latest multicore systems. This paper demonstrates the extraordinary potential of the CnC model and raises a number of questions about how the model should evolve for more complex programs.

The second paper is the first extensive multicore optimization and tuning experiment for the kernel-independent fast multipole method (KIFMM). Surprisingly, this paper shows that an Intel Nehalem-based multicore system delivers performance competitive with the GPU implementation we showcased in our best-paper-nominated SC’09 submission (led by Lashuk and Biros).

  • A. Chandramowlishwaran, K. Knobe, and R. Vuduc. Performance evaluation of Concurrent Collections on high-performance multicore computing systems. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010. (accepted).
  • A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010. (accepted).