Category Publications

SC’10 papers

We have two papers that will appear at this year’s ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis, a.k.a., “Supercomputing” or “SC”, to be held in New Orleans in November, including one by Aparna and another that is a Gordon Bell Prize Finalist!

The first paper is Aparna’s major improvement to the within-node multicore scalability of our fast multipole method (FMM), introduced in one of our IPDPS papers earlier this year. This paper includes detailed analyses that we expect will be useful both to other multicore software practitioners and to performance tool builders.

The second is a Gordon Bell Prize Finalist, led by Abtin Rahimian, Prof. George Biros, and the CSELa lab at GT, with an assist from HPC Garagers Aparna, Aashay, Logan, and Rich. This paper presents MoBo (“Moving Boundaries”), a blood flow modeling code that performs the largest and most physically accurate computer-based simulation of red blood cells flowing in plasma that has ever been run to date. It achieves 0.7 Petaflop/s on over 200k cores, and is portable to hybrid distributed memory multicore+GPU systems. It uses the scalable and tuned FMM as one of its two major computational components.

  • A. Chandramowlishwaran, K. Madduri, and R. Vuduc. “Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method.” In Proc. ACM/IEEE Int’l. Conf. HPC, Networking, Storage, and Analysis (SC), New Orleans, LA, USA, Nov. 2010. (to appear) [www | PDF]
  • A. Rahimian, I. Lashuk, A. Chandramowlishwaran, D. Malhotra, L. Moon, R. Sampath, A. Shringarpure, S. Veerapaneni, J. Vetter, R. Vuduc, D. Zorin, and G. Biros. “Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures.” In Proc. ACM/IEEE Int’l. Conf. HPC, Networking, Storage, and Analysis (SC), New Orleans, LA, USA, Nov. 2010. (to appear) [www]

Best paper @ IPDPS’10!

We just got word that Aparna’s, Kath Knobe’s (Intel), and my paper will get the Best Paper Award in the software track at the IEEE Int’l. Parallel and Distributed Processing Symposium (IPDPS), which will be held here in Atlanta in April. The paper is the first performance evaluation of a relatively new parallel programming model, called Concurrent Collections (CnC), on state-of-the-art multicore systems. It shows off CnC’s extraordinary potential, using examples from dense linear algebra.

Congratulations to the team. We owe many thanks to our friends and colleagues at Intel (Frank Schlimbach, Geoff Lowney, Shin Lee, and CK Luk) and Rice University (Vivek Sarkar and Zoran Budimlić) for their feedback, encouragement, and support during the course of this work.

ICSE’10 paper accepted

Sangmin Park’s paper on fault localization applied to concurrent programs has been accepted for publication! The paper will appear at the ACM/IEEE International Conference on Software Engineering (ICSE), to be held in Cape Town, South Africa in May 2010. This paper describes a novel technique for pinpointing the cause of program failure in multithreaded Java software. Sangmin designed and implemented this approach in a prototype tool called “Falcon.” This effort is a joint one between Prof. Mary Jean Harrold (Sangmin’s primary advisor) and the HPC Garage.

PPoPP’10 paper on GPU SpMV

Jee’s and Amik’s paper on autotuning sparse matrix-vector multiply for GPUs will appear at PPoPP 2010 in Bangalore, India. Great work, gang!

  • J. W. Choi, A. Singh, R. Vuduc. “Model-driven autotuning of sparse matrix-vector multiply on GPUs.” In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010. (accepted)

Top ParCo’09 Q1 download

Our co-authored paper on multicore optimizations for sparse matrix-vector multiply was the “hottest” (most downloaded) Journal of Parallel Computing (ParCo) article in Q1 2009.

[ParCo Top 25 List, Q1 2009] S. Williams, L. Oliker, R. Vuduc, J. Shalf, K. Yelick, J. Demmel. “Optimization of sparse matrix-vector multiplication on emerging multicore platforms.” J. Par. Comput. (ParCo) 35(3), pp. 178–194, March 2009. doi:10.1016/j.parco.2008.12.006

ICS’09 talk & paper

Sundar is speaking today at the ACM International Conference on Supercomputing (ICS) on his CPU/GPU stencil kernel work.

Citation & slides: Sundaresan Venkatasubramanian and Richard Vuduc. “Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU platforms.” In Proc. ACM Int’l. Conf. Supercomputing (ICS), Yorktown Heights, NY, USA, June 2009. [Paper | Slides]

When in Rome…

Seunghwa Kang, a friend of the Garage, is off to Rome, Italy, for the IEEE International Parallel and Distributed Processing Symposium (IPDPS) to give a talk on his paper, “Understanding the design trade-offs among current multicore platforms for numerical computations.”

Citation & PDF: Seunghwa Kang, David A. Bader, and Richard Vuduc. “Understanding the design trade-offs among current multicore platforms for numerical computations.” In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Rome, Italy, May 2009. [PDF]