Category Publications

SDM’12 Best Paper

Congratulations to Dongryeol Lee (advisor: Alex Gray) on winning the Best Paper Award at SIAM Data Mining 2012! This paper describes a hybrid distributed/shared memory implementation of kernel summations for high-dimensional data.

  • Dongryeol Lee, Richard Vuduc, Alexander G. Gray. “A distributed kernel summation framework for general-dimension machine learning.” In Proc. SIAM Int’l. Conf. Data Mining, Anaheim, CA, USA, April 2012. Winner, Best Paper. [PDF]


This week, Cong is presenting his work on synthesizing program inverses at the 21st International Conference on Compiler Construction (CC) 2012, in Tallinn, Estonia. A program inverse enables speculative parallelization, but also has applications in program debugging and software engineering (e.g., automatically creating “undo” functionality). His paper describes new compiler-based analysis and transformation techniques for the problem, specifically by introducing two new intermediate program representations: a value search graph (VSG), which extends static single assignment form to convert the synthesis problem into a graph search problem, and a route graph, which is a subset of the VSG that represents an inverse. Read the gory details below.


Rich is at the 2012 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) in New Orleans this week to show moral support for Jaewoong Sim and his advisor, Hyesoon Kim, as they present our work on a new performance model for understanding the performance of GPU programs.

  • Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, Richard Vuduc. “GPUPerf: A performance analysis framework for identifying performance benefits in GPGPU applications.” In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, USA, February 2012. [doi:10.1145/2145816.2145819]

CC’12 paper on program inversion

Congrats to Cong, George, Richard Fujimoto, and our collaborators at Livermore Lab (David Jefferson and Dan Quinlan) on their paper, “A new method for program inversion,” which will appear at the 2012 International Conference on Compiler Construction (CC’12) [www] this April. Roughly speaking, given a program P(In) → Out, the inverse P⁻¹(Out) reproduces In. This paper describes two new formalisms (the value search graph and the route graph) for generating such inverses. These formalisms directly express the amount of state needed to enable the inverse, thereby enabling algorithms that can try to minimize storage space. We have implemented this algorithm in Backstroke, which is built on top of the ROSE source-to-source compiler infrastructure [www].


C. Hou, G. Vulov, D. Jefferson, R. Fujimoto, R. Vuduc. “A new method for program inversion.” In Proc. Int’l. Conf. Compiler Construction (CC), Tallinn, Estonia, March 2012. (to appear)
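
To make the idea of a program inverse concrete, here is a minimal hand-written sketch in Python. It is not output from Backstroke and does not use the value search graph or route graph; the functions `forward` and `inverse` and the `saved` stack are hypothetical names chosen for illustration. It only shows the distinction the post describes: some updates can be undone by computation alone, while destructive ones need saved state.

```python
def forward(x, y, saved):
    """Hypothetical forward program P: takes (x, y) and returns (x + y, 0)."""
    # "x = x + y" can be undone without storage: x_old = x_new - y.
    x = x + y
    # "y = 0" destroys information, so the old value of y is pushed
    # onto a stack of saved state before it is overwritten.
    saved.append(y)
    y = 0
    return x, y


def inverse(x, y, saved):
    """Hypothetical inverse: undoes the steps of forward in reverse order."""
    # Restore the overwritten y from saved state.
    y = saved.pop()
    # Recompute the old x from the new x and the restored y.
    x = x - y
    return x, y


if __name__ == "__main__":
    saved = []
    out = forward(3, 4, saved)
    print(out)                    # output of the forward program
    print(inverse(*out, saved))   # recovers the original input
```

Only one value is saved here, because the addition can be reversed arithmetically. Choosing which values to store and which to recompute is the kind of storage trade-off the formalisms above are meant to expose.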

ICST’12 paper on Unicorn

Sangmin has a new paper on Unicorn, a tool to pinpoint the location of bugs in parallel and concurrent software programs. Unicorn generalizes his prior work on Falcon to include the case when a bug in a concurrent program involves more than one variable (so-called multivariable atomicity violations). The Unicorn paper will appear in the 2012 International Conference on Software Testing, Verification, and Validation (ICST). Congratulations to Sangmin and his advisor, Mary Jean Harrold!

Citation: S. Park, R. Vuduc, and M. J. Harrold. “A unified approach for localizing non-deadlock concurrency bugs.” In Proc. IEEE Int’l. Conf. Software Testing, Verification, and Validation (ICST), Montréal, Canada, April 2012. (accepted)

PPoPP’12 paper

Congrats to Jaewoong, Aniruddha, and Hyesoon on their new paper!


J. Sim, A. Dasgupta, H. Kim, R. Vuduc. Understanding benefits of GPGPU code optimizations using an analytical performance model. In Proc. Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, USA, Feb. 2012. (accepted)

POPL’12: A type theory for probability density functions

Check out Sooraj’s and Ashish’s cool new paper, to appear at POPL’12. Abstract, preprint, and short blurb at Ashish’s website.


Sooraj Bhat, Ashish Agarwal, Richard Vuduc, Alexander Gray. A type theory for probability density functions. In Proc. Principles and Practice of Programming Languages (POPL), Philadelphia, PA, USA, Jan. 2012. (to appear)

IEEE Micro article on balance, GPUs, FFTs, and exascale

HotPar’11: Balance principles for algorithm-architecture co-design

This just in — we’ll be presenting a paper at this year’s HotPar workshop, on the topic of “co-design”:

  • K. Czechowski, C. Battaglino, C. McClanahan, A. Chandramowlishwaran, R. Vuduc. “Balance principles for algorithm-architecture co-design.” In Proc. USENIX Workshop on Hot Topics in Parallelism (HotPar), May 2011. (accepted)

Two papers @ IPDPS’10

The HPC Garage will have two papers at the upcoming IPDPS conference, to be held here in Atlanta, April 19–23. Congratulations to Aparna, who led the two papers, as well as our colleagues at Intel (Kath Knobe) and Lawrence Berkeley National Laboratory (Sam Williams and Lenny Oliker) for their significant contributions!

Here’s a brief description of the papers …

The first paper is a detailed performance evaluation of the Concurrent Collections (CnC) parallel programming model on some of the latest multicore systems. This paper demonstrates the extraordinary potential of the CnC model, and raises a number of questions about how the model should evolve for more complex programs. The second paper is the first extensive multicore optimization and tuning experiment for the kernel-independent fast multipole method (KIFMM). Surprisingly, this paper shows competitive performance from an Intel Nehalem-based multicore system relative to the GPU implementation we showcased in our best paper-nominated SC’09 submission (led by Lashuk and Biros).

  • A. Chandramowlishwaran, K. Knobe, and R. Vuduc. Performance evaluation of Concurrent Collections on high-performance multicore computing systems. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010. (accepted).
  • A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros, and R. Vuduc. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010. (accepted).