Papers

For publication news, look here: http://hpcgarage.org/wp/?cat=9

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.


Jee Choi, Marat Dukhan, Xing Liu, and Richard Vuduc. Algorithmic time, energy, and power on candidate HPC building blocks. In Proc. IEEE Int’l. Parallel and Distributed Processing Symposium (IPDPS), Phoenix, AZ, USA, May 2014. (to appear).

Jee Choi, Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc. A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method. In Proc. 7th Wkshp. General-Purpose Programming using GPUs (GPGPU-7), Salt Lake City, UT, USA, March 2014. PDF | Slides

Marat Dukhan. PeachPy: A Python framework for developing high-performance assembly kernels. In Proc. Workshop on Python for High-Performance and Scientific Computing (PyHPC), co-located with SC’13, Denver, CO, USA, November 2013. PDF | Slides

Piyush Sao and Richard Vuduc. Self-stabilizing iterative solvers. In Proc. Workshop on Latest Advances in Scalable Algorithms for Large-scale Systems (ScalA), co-located with SC’13, Denver, CO, USA, November 2013. * PDF @ ACM DL

Marat Dukhan and Richard Vuduc. Methods for high-throughput computation of elementary functions. In Proc. Int’l. Conf. Parallel Processing and Applied Mathematics (PPAM), Warsaw, Poland, September 2013. (Camera-ready pending)

Sangmin Park, Mary Jean Harrold, and Richard Vuduc. Griffin: Grouping suspicious memory-access patterns to improve understanding of concurrency bugs. In Proc. Int’l. Symp. Software Testing and Analysis (ISSTA), Lugano, Switzerland, July 2013. PDF

Jee Choi and Richard Vuduc. A roofline model of energy. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Boston, MA, USA, May 2013. This paper is the short peer-reviewed conference version of our tech report, https://smartech.gatech.edu/xmlui/handle/1853/45737.
* PDF

Kenneth Czechowski and Richard Vuduc. A theoretical framework for algorithm-architecture co-design. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Boston, MA, USA, May 2013. (to appear).
* PDF

Jee Whan Choi and Richard Vuduc. A roofline model of energy. Technical Report GT-CSE-12-01, Georgia Institute of Technology, School of Computational Science and Engineering, Atlanta, GA, USA, December 2012. https://smartech.gatech.edu/xmlui/handle/1853/45737.
* BibTeX

Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu. Performance analysis and tuning for general purpose graphics processing units (GPGPU). Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, San Rafael, CA, USA, November 2012. doi:10.2200/S00451ED1V01Y201209CAC020
* DOI, BibTeX

William B. March, Kenneth Czechowski, Marat Dukhan, Thomas Benson, Dongryeol Lee, Andrew J. Connolly, Richard Vuduc, Edmond Chow, and Alexander G. Gray. Optimizing the computation of n-point correlations on large-scale astronomical data. In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2012. http://conferences.computer.org/sc/2012/papers/1000a033.pdf
* BibTeX

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros. A massively parallel adaptive fast multipole method on heterogeneous architectures. Communications of the ACM (CACM), 55(5):101-109, May 2012. doi:10.1145/2160718.2160740. Extends conference version: http://doi.acm.org/10.1145/1654059.1654118.
* PDF, DOI, BibTeX, Topics: n-body; GPGPU; parallel algorithms; performance optimization; performance analysis; fast multipole method; performance evaluation

Cong Hou, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc. Loop synthesis for program inversion. In Proc. 4th Wkshp. Reversible Computation, Copenhagen, Denmark, July 2012. http://www.reversible-computation.org/2012/cms.
* Talk, PDF, BibTeX

Aparna Chandramowlishwaran, Jee Whan Choi, Kamesh Madduri, and Richard Vuduc. Towards a communication optimal fast multipole method and its implications for exascale. In Proc. ACM Symp. Parallel Algorithms and Architectures (SPAA), Pittsburgh, PA, USA, June 2012. Brief announcement. doi:10.1145/2312005.2312039.
* PDF, DOI, BibTeX, Topics: performance analysis; performance modeling; performance optimization; fast multipole method; exascale; co-design; parallel algorithms; n-body

Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K. Yeung, and Richard Vuduc. On the communication complexity of 3D FFTs and its implications for exascale. In Proc. ACM Int’l. Conf. Supercomputing (ICS), San Servolo Island, Venice, Italy, June 2012. doi:10.1145/2304576.2304604.
* Talk, PDF, DOI, BibTeX, Topics: FFT; exascale; performance modeling; co-design

Richard Vuduc, Kenneth Czechowski, Aparna Chandramowlishwaran, and Jee Whan Choi.
Courses in high-performance computing for scientists and engineers.
In Proc. NSF/TCPP Wkshp. Parallel and Distributed Computing Education (EduPar), co-located with IPDPS’12, Shanghai, China, May 2012.
* Talk, PDF, BibTeX, Topic: education

Cong Hou, George Vulov, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc. A new method for program inversion. In Proc. Int’l. Conf. Compiler Construction (CC), Tallinn, Estonia, March 2012. http://www.cc.gatech.edu/~chou3/ProgramInversion.pdf.

* Talk, BibTeX, Topics: program inversion; compilers; reverse computation; parallel discrete-event simulation

Sangmin Park, Richard Vuduc, and Mary Jean Harrold.
A unified approach for localizing non-deadlock concurrency bugs.
In Proc. IEEE Int’l. Conf. Software Testing, Verification, and Validation (ICST), Montréal, Canada, April 2012.
* BibTeX

Dongryeol Lee, Richard Vuduc, and Alexander G. Gray.
A hybrid MPI/OpenMP kernel summation framework for large scale data analysis.
In Proc. SIAM Int’l. Conf. Data Mining (SDM), Anaheim, CA, USA, April 2012.
Winner, Best Paper.
* BibTeX

Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc.
GPUPerf: A performance analysis framework for identifying performance benefits in GPGPU applications.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, USA, February 2012.
(to appear).
* BibTeX

Jaekyu Lee, Hyesoon Kim, and Richard Vuduc.
When prefetching works, when it doesn’t, and why.
ACM Trans. Architecture and Code Optimization (TACO), 2012.
* BibTeX, Topics: microarchitecture; prefetching

Sooraj Bhat, Ashish Agarwal, Richard Vuduc, and Alexander Gray.
A type theory for probability density functions.
In ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages (POPL 2012), Philadelphia, PA, USA, January 2012.
* PDF, BibTeX

Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K. Yeung, and Richard Vuduc.
Prospects for scalable 3D FFTs on heterogeneous exascale systems.
In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2011.
(poster); extended version available as Georgia Tech report GT-CSE-11-02.
* BibTeX

Richard Vuduc and Kenneth Czechowski.
What GPU computing means for high-end systems.
IEEE Micro, 31(4):74-78, July/August 2011.
http://dx.doi.org/10.1109/MM.2011.78.
* DOI, BibTeX

Raghul Gunasekaran, David Dillow, Galen Shipman, Richard Vuduc, and Edmond Chow.
Characterizing application runtime behavior from system logs and metrics.
In Proc. Int’l. Wkshp. Characterizing Applications for Heterogeneous Exascale Systems (CACHES), Tucson, AZ, USA, June 2011.
* BibTeX

Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Aparna Chandramowlishwaran, and Richard Vuduc.
Balance principles for algorithm-architecture co-design.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar), Berkeley, CA, USA, May 2011.
* Talk, PDF, BibTeX, Topics: parallel algorithms; architecture; co-design; performance analysis

Sam Williams, Nathan Bell, Jee Choi, Michael Garland, Leonid Oliker, and Richard Vuduc.
Sparse matrix vector multiplication on multicore and accelerator systems.
In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore Processors and Accelerators. CRC Press, 2011.
* BibTeX

Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, and Richard Vuduc.
Hardware and software prefetching mechanisms for GPGPU applications.
In Proc. IEEE/ACM Int’l. Symp. Microarchitecture (MICRO), Atlanta, GA, USA, December 2010.
* PDF, BibTeX, Topics: GPGPU; performance evaluation; performance programming; prefetching; codesign

Abtin Rahimian, Ilya Lashuk, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Rahul Sampath, Aashay Shringarpure, Shravan Veerapaneni, Jeffrey Vetter, Richard Vuduc, Denis Zorin, and George Biros.
Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA, USA, November 2010.
Winner, Gordon Bell Prize.
* PDF, DOI, BibTeX, Topics: performance evaluation; performance programming; GPGPU; multicore; MPI; parallel algorithms; heterogeneous architectures

Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc.
Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA, USA, November 2010.
* PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization

Richard Vuduc, Aparna Chandramowlishwaran, Jee Whan Choi, Murat Efe Guney, and Aashay Shringarpure.
On the limits of GPU acceleration.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar), Berkeley, CA, USA, June 2010.
* PDF, BibTeX, Topics: multicore; GPGPU; sparse linear algebra; n-body

Sooraj Bhat, Ashish Agarwal, Alexander Gray, and Richard Vuduc.
Toward interactive statistical modeling.
Procedia Computer Science, 1(1):1829-1838, May-June 2010.
Proc. Int’l. Conf. Computational Science (ICCS), Wkshp. Automated Program Generation for Computational Science (APGCS).
* PDF, DOI, BibTeX, Topics: machine learning; algorithm derivation; interactive modeling; type theory

Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on high-performance multicore computing systems.
Technical Report GT-CSE-10-01, Georgia Institute of Technology, Atlanta, GA, USA, February 2010.
* BibTeX, Topics: parallel programming models; dense linear algebra; multicore

Aparna Chandramowlishwaran, Samuel Williams, Leonid Oliker, Ilya Lashuk, George Biros, and Richard Vuduc.
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.
In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010.
* PDF, BibTeX, Topics: n-body; multicore; performance analysis; performance optimization

Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on high-performance multicore computing systems.
In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010.
Winner, Best Paper (software track).
* PDF, DOI, BibTeX, Topics: CnC; parallel programming models; dense linear algebra; multicore

Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Applying the Concurrent Collections programming model to asynchronous parallel dense linear algebra.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010. (poster).
* PDF, DOI, BibTeX, Topics: parallel programming models; dense linear algebra; multicore

Sangmin Park, Richard W. Vuduc, and Mary Jean Harrold.
FALCON: Fault localization for concurrent programs.
In Proc. ACM/IEEE Int’l. Conf. Software Eng., Cape Town, South Africa, May 2010.
* PDF, DOI, BibTeX, Topics: testing; debugging; fault-localization; concurrency

Jee Whan Choi, Amik Singh, and Richard W. Vuduc.
Model-driven autotuning of sparse matrix-vector multiply on GPUs.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010.
* PDF, DOI, BibTeX, Topics: sparse linear algebra; autotuning; GPGPU; performance modeling; performance optimization

Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas.
Effective source-to-source outlining to support whole program empirical optimization.
In Proc. Int’l. Wkshp. Languages and Compilers for Parallel Computing (LCPC), volume LNCS, Newark, DE, USA, October 2009.
* DOI, BibTeX, Topics: compilers; autotuning; outlining

Nitin Arora, Ryan P. Russell, and Richard W. Vuduc.
Fast sensitivity computations for numerical optimizations.
In Proc. AAS/AIAA Astrodynamics Specialist Conference, AAS 09-435, Pittsburgh, PA, USA, August 2009.
* PDF, BibTeX, Topics: numerical optimization; sensitivity; GPGPU; astrodynamics

Manisha Gajbe, Andrew Canning, John Shalf, Lin-Wang Wang, Harvey Wasserman, and Richard Vuduc.
Auto-tuning distributed-memory 3-dimensional fast Fourier transforms on the Cray XT4.
In Proc. Cray User’s Group (CUG) Meeting, Atlanta, GA, USA, May 2009.
* BibTeX, Topics: autotuning; performance analysis; performance optimization; FFT

Sundaresan Venkatasubramanian and Richard W. Vuduc.
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU platforms.
In Proc. ACM Int’l. Conf. Supercomputing (ICS), New York, NY, USA, June 2009.
* PDF, DOI, BibTeX, Topics: asynchronous iteration; GPGPU; heterogeneous architectures; performance optimization

Nitin Arora, Aashay Shringarpure, and Richard Vuduc.
Direct n-body kernels for multicore platforms.
In Proc. Int’l. Conf. Parallel Processing (ICPP), Vienna, Austria, September 2009.
* PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization

Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros.
A massively parallel adaptive fast multipole method on heterogeneous architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009.
Finalist, Best Paper.
* PDF, DOI, BibTeX, Topics: n-body; multicore; GPGPU; MPI; parallel algorithms

Seunghwa Kang, David Bader, and Richard Vuduc.
Understanding the design trade-offs among current multicore systems for numerical computations.
In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Rome, Italy, May 2009.
* PDF, DOI, BibTeX, Topics: statistical models; n-body; multicore; performance analysis; performance optimization

Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and James Demmel.
Optimizing sparse matrix-vector multiply on emerging multicore platforms.
Parallel Computing (ParCo), 35(3):178-194, March 2009.
http://dx.doi.org/10.1016/j.parco.2008.12.006.
Extends conference version: http://dx.doi.org/10.1145/1362622.1362674.
* PDF, DOI, BibTeX, Topics: sparse linear algebra; multicore; autotuning

Aparna Chandramowlishwaran, Abhinav Karhu, Ketan Umare, and Richard Vuduc.
Numerical algorithms with tunable parallelism.
In Proc. Wkshp. Software Tools for Multicore Systems (STMCS), at IEEE/ACM Int’l. Symp. Code Generation and Optimization (CGO), Boston, MA, USA, April 2008.
* PDF, BibTeX, Topics: autotuning; asynchronous variational integration; asynchronous iteration

Thomas Panas, Dan Quinlan, and Richard Vuduc.
Tool support for inspecting the code quality of HPC applications.
In Proc. Wkshp. Software Eng. for High-Performance Computing Applications (SE-HPC), at ACM/IEEE Int’l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007.
* PDF, DOI, BibTeX, Topics: program visualization; software engineering

Thomas Panas, Dan Quinlan, and Richard Vuduc.
Analyzing and visualizing whole program architectures.
In Proc. Wkshp. Aerospace Software Engineering (AeroSE), at ACM/IEEE Int’l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007.
Also: Lawrence Livermore National Laboratory Technical Report UCRL-PROC-231453.
* PDF, BibTeX, Topics: program visualization; software engineering

Dan Quinlan, Richard Vuduc, and Ghassan Misherghi.
Techniques for specifying bug patterns.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing and Debugging (PADTAD), at Int’l. Symp. Software Testing and Analysis (ISSTA), Portland, ME, USA, July 2007.
Winner, Best Paper.
* PDF, DOI, BibTeX, Topics: software security; compilers; debugging

Sam Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James Demmel.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
In Proc. ACM/IEEE Conf. Supercomputing (SC), 2007.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; multicore; performance analysis; performance optimization

Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
When cache blocking sparse matrix vector multiply works and why.
Applicable Algebra in Engineering, Communication, and Computing: Special Issue on Computational Linear Algebra and Sparse Matrix Computations, March 2007.
* DOI, BibTeX, Topics: sparse linear algebra; performance analysis; performance optimization

Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan.
POET: Parameterized Optimizations for Empirical Tuning.
In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at IEEE Int’l. Par. Distrib. Processing Symp. (IPDPS), pages 1-8, Long Beach, CA, USA, March 2007.
* PDF, DOI, BibTeX, Topics: compilers; autotuning; program generation

Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi.
Annotating user-defined abstractions for optimization.
In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at IEEE Int’l. Par. Distrib. Processing Symp. (IPDPS), Rhodes, Greece, April 2006.
http://dx.doi.org/10.1109/IPDPS.2006.1639722.
* DOI, BibTeX, Topic: compilers

Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Härdtlein, and Andreas Sæbjørnsen.
Support for whole-program analysis and the verification of the one-definition rule in C++.
In Proc. Static Analysis Summit (SAS), volume NIST Special Publication 500-262, pages 27-35, 2006.
* PDF, BibTeX, Topics: program analysis; C++; one-definition rule; software security; compilers

Richard Vuduc, Martin Schulz, Dan Quinlan, and Bronis de Supinski.
Improving distributed memory applications testing by message perturbation.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing and Debugging (PADTAD), at Int’l. Symp. Software Testing and Analysis (ISSTA), Portland, ME, USA, July 2006.
Winner, Best Paper.
* PDF, DOI, BibTeX, Topics: MPI; testing; debugging; irritators

Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, and Richard Vuduc.
Parameterizing loop fusion for automated empirical tuning.
Technical Report UCRL-TR-217808, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, California, USA, December 2005.
* DOI, BibTeX, Topics: compilers; autotuning

Dan Quinlan, Shmuel Ur, and Richard Vuduc.
An extensible open-source compiler infrastructure for testing.
In Proc. IBM Haifa Verification Conf. (VC), volume LNCS 3875, pages 116-133, Haifa, Israel, November 2005. Springer Berlin / Heidelberg.
* PDF, DOI, BibTeX, Topics: compilers; testing

Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
OSKI: A library of automatically tuned sparse matrix kernels.
In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages 521-530, 2005.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization

Richard W. Vuduc and Hyun-Jin Moon.
Fast sparse matrix-vector multiplication by exploiting variable block structure.
In Proc. High-Performance Computing and Communications Conf. (HPCC), volume LNCS 3726, pages 807-816, Sorrento, Italy, September 2005. Springer.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization

James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick.
Self-adapting linear algebra algorithms and software.
Proc. IEEE, 93(2):293-312, February 2005.
* DOI, BibTeX, Topics: dense linear algebra; sparse linear algebra; autotuning

Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick.
Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply.
In Proc. Int’l. Conf. Parallel Processing (ICPP), Montreal, Canada, August 2004.
Winner, Best Paper.
* DOI, BibTeX, Topics: sparse linear algebra; performance modeling; autotuning

Eun-Jin Im, Katherine Yelick, and Richard Vuduc.
SPARSITY: Optimization framework for sparse matrix kernels.
Int’l. J. High Performance Computing Applications (IJHPCA), 18(1):135-158, February 2004.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling; performance optimization

Richard W. Vuduc.
Automatic performance tuning of sparse matrix kernels.
PhD thesis, University of California, Berkeley, CA, USA, January 2004.
http://bebop.cs.berkeley.edu/pubs/vuduc2003-dissertation.pdf.
* BibTeX, Topics: performance analysis; performance modeling; performance optimization; autotuning; sparse linear algebra; statistical models

Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical models for empirical search-based performance tuning.
Int’l. J. High Performance Computing Applications (IJHPCA), 18(1):65-94, 2004.
http://dx.doi.org/10.1177/1094342004041293.
Extends conference version: http://dx.doi.org/10.1007/3-540-45545-0_21.
* DOI, BibTeX, Topics: statistical models; autotuning; survey; dense linear algebra; performance analysis

Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick.
Memory hierarchy optimizations and bounds for sparse A^T*A*x.
In Proc. Wkshp. Parallel Linear Algebra (PLA), at Int’l. Conf. Computational Sci. (ICCS), volume LNCS 2659, pages 705-714, Melbourne, Australia, June 2003. Springer Berlin / Heidelberg.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling

Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, and Benjamin Lee.
Performance optimizations and bounds for sparse matrix-vector multiply.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Baltimore, MD, USA, November 2002.
Finalist, Best Student Paper.
http://portal.acm.org/citation.cfm?id=762822.
* BibTeX, Topics: sparse linear algebra; performance modeling; autotuning

Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and Katherine A. Yelick.
Automatic performance tuning and analysis of sparse triangular solve.
In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at ACM Int’l. Conf. Supercomputing (ICS), New York, USA, June 2002.
Winner, Best Presentation; Winner, Best Student Paper.
http://www.ece.lsu.edu/jxr/pohll-02/papers/vuduc.pdf.
* BibTeX, Topics: sparse linear algebra; performance modeling; autotuning

Richard Vuduc, James W. Demmel, and Jeff A. Bilmes.
Statistical models for empirical search-based performance tuning.
In Proc. Int’l. Conf. Computational Science (ICCS), volume LNCS 2073, pages 117-126, San Francisco, CA, USA, May 2001. Springer Berlin / Heidelberg.
Extends workshop version: http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
http://dx.doi.org/10.1007/3-540-45545-0_21.
* DOI, BibTeX, Topics: statistical models; autotuning; dense linear algebra

Richard Vuduc and James W. Demmel.
Code generators for automatic tuning of numerical kernels: Experiences with FFTW.
In Proc. Semantics, Applications, and Implementation of Program Generation (SAIG), at ACM SIGPLAN Int’l. Conf. Functional Programming (ICFP), Montréal, Canada, September 2000.
* PDF, DOI, BibTeX, Topics: program generation; signal processing; autotuning; FFT

Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical modeling of feedback data in an automatic tuning system.
In Proc. ACM Wkshp. Feedback-Directed Dynamic Optimization (FDDO), at Int’l. Symp. Microarchitecture (MICRO), Monterey, CA, USA, December 2000.
Winner, Best Presentation.
* BibTeX, Topics: dense linear algebra; statistical models; autotuning; performance modeling
