Papers
For publication news, look here: http://hpcgarage.org/wp/?cat=9
Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Jee Choi, Marat Dukhan, Xing Liu, and Richard Vuduc. Algorithmic time, energy, and power on candidate HPC building blocks. In Proc. IEEE Int’l. Parallel and Distributed Processing Symposium (IPDPS), Phoenix, AZ, USA, May 2014. (to appear).
Jee Choi, Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc. A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method. In Proc. 7th Wkshp. General-Purpose Programming using GPUs (GPGPU-7), Salt Lake City, UT, USA, March 2014. PDF | Slides
Marat Dukhan. PeachPy: A Python framework for developing high-performance assembly kernels. In Proc. Workshop on Python for High-Performance and Scientific Computing (PyHPC), co-located with SC’13, Denver, CO, USA, November 2013. PDF | Slides
Piyush Sao and Richard Vuduc. Self-stabilizing iterative solvers. In Proc. Workshop on Latest Advances in Scalable Algorithms for Large-scale Systems (ScalA), co-located with SC’13, Denver, CO, USA, November 2013. * PDF @ ACM DL
Marat Dukhan and Richard Vuduc. Methods for high-throughput computation of elementary functions. In Proc. Int’l. Conf. Parallel Processing and Applied Mathematics (PPAM), Warsaw, Poland, September 2013. (Camera-ready pending)
Sangmin Park, Mary Jean Harrold, and Richard Vuduc. Griffin: Grouping suspicious memory-access patterns to improve understanding of concurrency bugs. In Proc. Int’l. Symp. Software Testing and Analysis (ISSTA), Lugano, Switzerland, July 2013. PDF
Jee Choi and Richard Vuduc. A roofline model of energy. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Boston, MA, USA, May 2013. This paper is the short peer-reviewed conference version of our tech report, https://smartech.gatech.edu/xmlui/handle/1853/45737.
* PDF
Kenneth Czechowski and Richard Vuduc. A theoretical framework for algorithm-architecture co-design. In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Boston, MA, USA, May 2013. (to appear).
* PDF
Jee Whan Choi and Richard Vuduc. A roofline model of energy. Technical Report GT-CSE-12-01, Georgia Institute of Technology, School of Computational Science and Engineering, Atlanta, GA, USA, December 2012. https://smartech.gatech.edu/xmlui/handle/1853/45737.
* BibTeX
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu. Performance analysis and tuning for general purpose graphics processing units (GPGPU). Synthesis Lectures on Computer Architecture. Morgan & Claypool Publishers, San Rafael, CA, USA, November 2012. doi:10.2200/S00451ED1V01Y201209CAC020
* DOI, BibTeX
William B. March, Kenneth Czechowski, Marat Dukhan, Thomas Benson, Dongryeol Lee, Andrew J. Connolly, Richard Vuduc, Edmond Chow, and Alexander G. Gray. Optimizing the computation of n-point correlations on large-scale astronomical data. In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2012. http://conferences.computer.org/sc/2012/papers/1000a033.pdf
* BibTeX
Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros. A massively parallel adaptive fast multipole method on heterogeneous architectures. Communications of the ACM (CACM), 55(5):101-109, May 2012. doi:10.1145/2160718.2160740. Extends conference version: http://doi.acm.org/10.1145/1654059.1654118.
* PDF, DOI, BibTeX, Topics: n-body; GPGPU; parallel algorithms; performance optimization; performance analysis; fast multipole method; performance evaluation
Cong Hou, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc. Loop synthesis for program inversion. In Proc. 4th Wkshp. Reversible Computation, Copenhagen, Denmark, July 2012. http://www.reversible-computation.org/2012/cms.
* Talk, PDF, BibTeX
Aparna Chandramowlishwaran, Jee Whan Choi, Kamesh Madduri, and Richard Vuduc. Towards a communication optimal fast multipole method and its implications for exascale. In Proc. ACM Symp. Parallel Algorithms and Architectures (SPAA), Pittsburgh, PA, USA, June 2012. Brief announcement. doi:10.1145/2312005.2312039.
* PDF, DOI, BibTeX, Topics: performance analysis; performance modeling; performance optimization; fast multipole method; exascale; co-design; parallel algorithms; n-body
Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K. Yeung, and Richard Vuduc. On the communication complexity of 3D FFTs and its implications for exascale. In Proc. ACM Int’l. Conf. Supercomputing (ICS), San Servolo Island, Venice, Italy, June 2012. doi:10.1145/2304576.2304604.
* Talk, PDF, DOI, BibTeX, Topics: FFT; exascale; performance modeling; co-design
Richard Vuduc, Kenneth Czechowski, Aparna Chandramowlishwaran, and Jee Whan Choi.
Courses in high-performance computing for scientists and engineers.
In Proc. NSF/TCPP Wkshp. Parallel and Distributed Computing Education (EduPar), co-located with IPDPS’12, Shanghai, China, May 2012.
* Talk, PDF, BibTeX, Topic: education
Cong Hou, George Vulov, Daniel Quinlan, David Jefferson, Richard Fujimoto, and Richard Vuduc. A new method for program inversion. In Proc. Int’l. Conf. Compiler Construction (CC), Tallinn, Estonia, March 2012. http://www.cc.gatech.edu/~chou3/ProgramInversion.pdf.
* Talk, BibTeX, Topics: program inversion; compilers; reverse computation; parallel discrete-event simulation
Sangmin Park, Richard Vuduc, and Mary Jean Harrold.
A unified approach for localizing non-deadlock concurrency bugs.
In Proc. IEEE Int’l. Conf. Software Testing, Verification, and Validation (ICST), Montréal, Canada, April 2012.
* BibTeX
Dongryeol Lee, Richard Vuduc, and Alexander G. Gray.
A hybrid MPI/OpenMP kernel summation framework for large scale data analysis.
In Proc. SIAM Int’l. Conf. Data Mining (SDM), Anaheim, CA, USA, April 2012.
Winner, Best Paper
* BibTeX
Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc.
GPUPerf: A performance analysis framework for identifying performance benefits in GPGPU applications.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, USA, February 2012.
(to appear).
* BibTeX
Jaekyu Lee, Hyesoon Kim, and Richard Vuduc.
When prefetching works, when it doesn’t, and why.
ACM Trans. Architecture and Code Optimization (TACO), 2012.
* BibTeX, Topics: microarchitecture; prefetching
Sooraj Bhat, Ashish Agarwal, Richard Vuduc, and Alexander Gray.
A type theory for probability density functions.
In ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages (POPL 2012), Philadelphia, PA, USA, January 2012.
* PDF, BibTeX
Kenneth Czechowski, Chris McClanahan, Casey Battaglino, Kartik Iyer, P.-K. Yeung, and Richard Vuduc.
Prospects for scalable 3D FFTs on heterogeneous exascale systems.
In Proc. ACM/IEEE Conf. Supercomputing (SC), November 2011.
(poster); extended version available as Georgia Tech report GT-CSE-11-02.
* BibTeX
Richard Vuduc and Kenneth Czechowski.
What GPU computing means for high-end systems.
IEEE Micro, 31(4):74-78, July/August 2011.
http://dx.doi.org/10.1109/MM.2011.78.
* DOI, BibTeX
Raghul Gunasekaran, David Dillow, Galen Shipman, Richard Vuduc, and Edmond Chow.
Characterizing application runtime behavior from system logs and metrics.
In Proc. Int’l. Wkshp. Characterizing Applications for Heterogeneous Exascale Systems (CACHES), Tucson, AZ, USA, June 2011.
* BibTeX
Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Aparna Chandramowlishwaran, and Richard Vuduc.
Balance principles for algorithm-architecture co-design.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar), Berkeley, CA, USA, May 2011.
* Talk, PDF, BibTeX, Topics: parallel algorithms; architecture; co-design; performance analysis
Sam Williams, Nathan Bell, Jee Choi, Michael Garland, Leonid Oliker, and Richard Vuduc.
Sparse matrix vector multiplication on multicore and accelerator systems.
In Jakub Kurzak, David A. Bader, and Jack Dongarra, editors, Scientific Computing with Multicore Processors and Accelerators. CRC Press, 2011.
* BibTeX
Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, and Richard Vuduc.
Hardware and software prefetching mechanisms for GPGPU applications.
In Proc. IEEE/ACM Int’l. Symp. Microarchitecture (MICRO), Atlanta, GA, USA, December 2010.
* PDF, BibTeX, Topics: GPGPU; performance evaluation; performance programming; prefetching; co-design
Abtin Rahimian, Ilya Lashuk, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Rahul Sampath, Aashay Shringarpure, Shravan Veerapaneni, Jeffrey Vetter, Richard Vuduc, Denis Zorin, and George Biros.
Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA, USA, November 2010.
Winner, Gordon Bell Prize.
* PDF, DOI, BibTeX, Topics: performance evaluation; performance programming; GPGPU; multicore; MPI; parallel algorithms; heterogeneous architectures
Aparna Chandramowlishwaran, Kamesh Madduri, and Richard Vuduc.
Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method.
In Proc. ACM/IEEE Conf. Supercomputing (SC), New Orleans, LA, USA, November 2010.
* PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization
Richard Vuduc, Aparna Chandramowlishwaran, Jee Whan Choi, Murat Efe Guney, and Aashay Shringarpure.
On the limits of GPU acceleration.
In Proc. USENIX Wkshp. Hot Topics in Parallelism (HotPar), Berkeley, CA, USA, June 2010.
* PDF, BibTeX, Topics: multicore; GPGPU; sparse linear algebra; n-body
Sooraj Bhat, Ashish Agarwal, Alexander Gray, and Richard Vuduc.
Toward interactive statistical modeling.
Procedia Computer Science, 1(1):1829-1838, May-June 2010.
Proc. Int’l. Conf. Computational Science (ICCS), Wkshp. Automated Program Generation for Computational Science (APGCS).
* PDF, DOI, BibTeX, Topics: machine learning; algorithm derivation; interactive modeling; type theory
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on high-performance multicore computing systems.
Technical Report GT-CSE-10-01, Georgia Institute of Technology, Atlanta, GA, USA, February 2010.
* BibTeX, Topics: parallel programming models; dense linear algebra; multicore
Aparna Chandramowlishwaran, Samuel Williams, Leonid Oliker, Ilya Lashuk, George Biros, and Richard Vuduc.
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures.
In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010.
* PDF, BibTeX, Topics: n-body; multicore; performance analysis; performance optimization
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Performance evaluation of Concurrent Collections on high-performance multicore computing systems.
In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Atlanta, GA, USA, April 2010.
Winner, Best Paper (software track).
* PDF, DOI, BibTeX, Topics: CnC; parallel programming models; dense linear algebra; multicore
Aparna Chandramowlishwaran, Kathleen Knobe, and Richard Vuduc.
Applying the Concurrent Collections programming model to asynchronous parallel dense linear algebra.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010. (poster).
* PDF, DOI, BibTeX, Topics: parallel programming models; dense linear algebra; multicore
Sangmin Park, Richard W. Vuduc, and Mary Jean Harrold.
FALCON: Fault localization for concurrent programs.
In Proc. ACM/IEEE Int’l. Conf. Software Eng. (ICSE), Cape Town, South Africa, May 2010.
* PDF, DOI, BibTeX, Topics: testing; debugging; fault-localization; concurrency
Jee Whan Choi, Amik Singh, and Richard W. Vuduc.
Model-driven autotuning of sparse matrix-vector multiply on GPUs.
In Proc. ACM SIGPLAN Symp. Principles and Practice of Parallel Programming (PPoPP), Bangalore, India, January 2010.
* PDF, DOI, BibTeX, Topics: sparse linear algebra; autotuning; GPGPU; performance modeling; performance optimization
Chunhua Liao, Daniel J. Quinlan, Richard Vuduc, and Thomas Panas.
Effective source-to-source outlining to support whole program empirical optimization.
In Proc. Int’l. Wkshp. Languages and Compilers for Parallel Computing (LCPC), volume LNCS, Newark, DE, USA, October 2009.
* DOI, BibTeX, Topics: compilers; autotuning; outlining
Nitin Arora, Ryan P. Russell, and Richard W. Vuduc.
Fast sensitivity computations for numerical optimizations.
In Proc. AAS/AIAA Astrodynamics Specialist Conference, AAS 09-435, Pittsburgh, PA, USA, August 2009.
* PDF, BibTeX, Topics: numerical optimization; sensitivity; GPGPU; astrodynamics
Manisha Gajbe, Andrew Canning, John Shalf, Lin-Wang Wang, Harvey Wasserman, and Richard Vuduc.
Auto-tuning distributed-memory 3-dimensional fast Fourier transforms on the Cray XT4.
In Proc. Cray User’s Group (CUG) Meeting, Atlanta, GA, USA, May 2009.
* BibTeX, Topics: autotuning; performance analysis; performance optimization; FFT
Sundaresan Venkatasubramanian and Richard W. Vuduc.
Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU platforms.
In Proc. ACM Int’l. Conf. Supercomputing (ICS), New York, NY, USA, June 2009.
* PDF, DOI, BibTeX, Topics: asynchronous iteration; GPGPU; heterogeneous architectures; performance optimization
Nitin Arora, Aashay Shringarpure, and Richard Vuduc.
Direct n-body kernels for multicore platforms.
In Proc. Int’l. Conf. Parallel Processing (ICPP), Vienna, Austria, September 2009.
* PDF, DOI, BibTeX, Topics: multicore; n-body; performance analysis; performance optimization
Ilya Lashuk, Aparna Chandramowlishwaran, Harper Langston, Tuan-Anh Nguyen, Rahul Sampath, Aashay Shringarpure, Richard Vuduc, Lexing Ying, Denis Zorin, and George Biros.
A massively parallel adaptive fast multipole method on heterogeneous architectures.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009.
Finalist, Best Paper.
* PDF, DOI, BibTeX, Topics: n-body; multicore; GPGPU; MPI; parallel algorithms
Seunghwa Kang, David Bader, and Richard Vuduc.
Understanding the design trade-offs among current multicore systems for numerical computations.
In Proc. IEEE Int’l. Parallel and Distributed Processing Symp. (IPDPS), Rome, Italy, May 2009.
* PDF, DOI, BibTeX, Topics: statistical models; n-body; multicore; performance analysis; performance optimization
Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and James Demmel.
Optimizing sparse matrix-vector multiply on emerging multicore platforms.
Parallel Computing (ParCo), 35(3):178-194, March 2009.
http://dx.doi.org/10.1016/j.parco.2008.12.006.
Extends conference version:
http://dx.doi.org/10.1145/1362622.1362674.
* PDF, DOI, BibTeX, Topics: sparse linear algebra; multicore; autotuning
Aparna Chandramowlishwaran, Abhinav Karhu, Ketan Umare, and Richard Vuduc.
Numerical algorithms with tunable parallelism.
In Proc. Wkshp. Software Tools for Multicore Systems (STMCS), at IEEE/ACM Int’l. Symp. Code Generation and Optimization (CGO), Boston, MA, USA, April 2008.
* PDF, BibTeX, Topics: autotuning; asynchronous variational integration; asynchronous iteration
Thomas Panas, Dan Quinlan, and Richard Vuduc.
Tool support for inspecting the code quality of HPC applications.
In Proc. Wkshp. Software Eng. for High-Performance Computing Applications (SE-HPC), at ACM/IEEE Int’l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007.
* PDF, DOI, BibTeX, Topics: program visualization; software engineering
Thomas Panas, Dan Quinlan, and Richard Vuduc.
Analyzing and visualizing whole program architectures.
In Proc. Wkshp. Aerospace Software Engineering (AeroSE), at ACM/IEEE Int’l. Conf. Software Eng. (ICSE), Minneapolis, MN, USA, May 2007.
Also: Lawrence Livermore National Laboratory Technical Report UCRL-PROC-231453.
* PDF, BibTeX, Topics: program visualization; software engineering
Dan Quinlan, Richard Vuduc, and Ghassan Misherghi.
Techniques for specifying bug patterns.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing and Debugging (PADTAD), at Int’l. Symp. Software Testing and Analysis (ISSTA), Portland, ME, USA, July 2007.
Winner, Best Paper.
* PDF, DOI, BibTeX, Topics: software security; compilers; debugging
Sam Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James Demmel.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
In Proc. ACM/IEEE Conf. Supercomputing (SC), 2007.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; multicore; performance analysis; performance optimization
Rajesh Nishtala, Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
When cache blocking sparse matrix vector multiply works and why.
Applicable Algebra in Engineering, Communication, and Computing: Special Issue on Computational Linear Algebra and Sparse Matrix Computations, March 2007.
* DOI, BibTeX, Topics: sparse linear algebra; performance analysis; performance optimization
Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, and Dan Quinlan.
POET: Parameterized Optimizations for Empirical Tuning.
In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at IEEE Int’l. Par. Distrib. Processing Symp. (IPDPS), pages 1-8, Long Beach, CA, USA, March 2007.
* PDF, DOI, BibTeX, Topics: compilers; autotuning; program generation
Dan Quinlan, Markus Schordan, Richard Vuduc, and Qing Yi.
Annotating user-defined abstractions for optimization.
In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at IEEE Int’l. Par. Distrib. Processing Symp. (IPDPS), Rhodes, Greece, April 2006.
http://dx.doi.org/10.1109/IPDPS.2006.1639722.
* DOI, BibTeX, Topic: compilers
Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Härdtlein, and Andreas Sæbjørnsen.
Support for whole-program analysis and the verification of the one-definition rule in C++.
In Proc. Static Analysis Summit (SAS), volume NIST Special Publication 500-262, pages 27-35, 2006.
* PDF, BibTeX, Topics: program analysis; C++; one-definition rule; software security; compilers
Richard Vuduc, Martin Schulz, Dan Quinlan, and Bronis de Supinski.
Improving distributed memory applications testing by message perturbation.
In Proc. ACM Wkshp. Parallel and Distributed Systems: Testing and Debugging (PADTAD), at Int’l. Symp. Software Testing and Analysis (ISSTA), Portland, ME, USA, July 2006.
Winner, Best Paper.
* PDF, DOI, BibTeX, Topics: MPI; testing; debugging; irritators
Yuan Zhao, Qing Yi, Ken Kennedy, Dan Quinlan, and Richard Vuduc.
Parameterizing loop fusion for automated empirical tuning.
Technical Report UCRL-TR-217808, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, California, USA, December 2005.
* DOI, BibTeX, Topics: compilers; autotuning
Dan Quinlan, Shmuel Ur, and Richard Vuduc.
An extensible open-source compiler infrastructure for testing.
In Proc. IBM Haifa Verification Conf. (VC), volume LNCS 3875, pages 116-133, Haifa, Israel, November 2005. Springer Berlin / Heidelberg.
* PDF, DOI, BibTeX, Topics: compilers; testing
Richard Vuduc, James W. Demmel, and Katherine A. Yelick.
OSKI: A library of automatically tuned sparse matrix kernels.
In Proc. SciDAC, J. Physics: Conf. Ser., volume 16, pages 521-530, 2005.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization
Richard W. Vuduc and Hyun-Jin Moon.
Fast sparse matrix-vector multiplication by exploiting variable block structure.
In Proc. High-Performance Computing and Communications Conf. (HPCC), volume LNCS 3726, pages 807-816, Sorrento, Italy, September 2005. Springer.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance optimization
James Demmel, Jack Dongarra, Viktor Eijkhout, Erika Fuentes, Antoine Petitet, Richard Vuduc, R. Clint Whaley, and Katherine Yelick.
Self-adapting linear algebra algorithms and software.
Proc. IEEE, 93(2):293-312, February 2005.
* DOI, BibTeX, Topics: dense linear algebra; sparse linear algebra; autotuning
Benjamin C. Lee, Richard Vuduc, James Demmel, and Katherine Yelick.
Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply.
In Proc. Int’l. Conf. Parallel Processing (ICPP), Montreal, Canada, August 2004.
Winner, Best Paper.
* DOI, BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Eun-Jin Im, Katherine Yelick, and Richard Vuduc.
SPARSITY: Optimization framework for sparse matrix kernels.
Int’l. J. High Performance Computing Applications (IJHPCA), 18(1):135-158, February 2004.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling; performance optimization
Richard W. Vuduc.
Automatic performance tuning of sparse matrix kernels.
PhD thesis, University of California, Berkeley, CA, USA, January 2004.
http://bebop.cs.berkeley.edu/pubs/vuduc2003-dissertation.pdf.
* BibTeX, Topics: performance analysis; performance modeling; performance optimization; autotuning; sparse linear algebra; statistical models
Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical models for empirical search-based performance tuning.
Int’l. J. High Performance Computing Applications (IJHPCA), 18(1):65-94, 2004.
http://dx.doi.org/10.1177/1094342004041293.
Extends conference version:
http://dx.doi.org/10.1007/3-540-45545-0_21.
* DOI, BibTeX, Topics: statistical models; autotuning; survey; dense linear algebra; performance analysis
Richard Vuduc, Attila Gyulassy, James W. Demmel, and Katherine A. Yelick.
Memory hierarchy optimizations and bounds for sparse A^T*A*x.
In Proc. Wkshp. Parallel Linear Algebra (PLA), at Int’l. Conf. Computational Sci. (ICCS), volume LNCS 2659, pages 705-714, Melbourne, Australia, June 2003. Springer Berlin / Heidelberg.
* DOI, BibTeX, Topics: sparse linear algebra; autotuning; performance modeling
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, and Benjamin Lee.
Performance optimizations and bounds for sparse matrix-vector multiply.
In Proc. ACM/IEEE Conf. Supercomputing (SC), Baltimore, MD, USA, November 2002.
Finalist, Best Student Paper.
http://portal.acm.org/citation.cfm?id=762822.
* BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Richard Vuduc, Shoaib Kamil, Jen Hsu, Rajesh Nishtala, James W. Demmel, and Katherine A. Yelick.
Automatic performance tuning and analysis of sparse triangular solve.
In Proc. Wkshp. Performance Optimization of High-level Languages and Libraries (POHLL), at ACM Int’l. Conf. Supercomputing (ICS), New York, USA, June 2002.
Winner, Best Presentation; Winner, Best Student Paper.
http://www.ece.lsu.edu/jxr/pohll-02/papers/vuduc.pdf.
* BibTeX, Topics: sparse linear algebra; performance modeling; autotuning
Richard Vuduc, James W. Demmel, and Jeff A. Bilmes.
Statistical models for empirical search-based performance tuning.
In Proc. Int’l. Conf. Computational Science (ICCS), volume LNCS 2073, pages 117-126, San Francisco, CA, USA, May 2001. Springer Berlin / Heidelberg.
Extends workshop version:
http://www.eecs.harvard.edu/~smith/fddo3/papers/107.ps.
http://dx.doi.org/10.1007/3-540-45545-0_21.
* DOI, BibTeX, Topics: statistical models; autotuning; dense linear algebra
Richard Vuduc and James W. Demmel.
Code generators for automatic tuning of numerical kernels: Experiences with FFTW.
In Proc. Semantics, Applications, and Implementation of Program Generation (SAIG), at ACM SIGPLAN Int’l. Conf. Functional Programming (ICFP), Montréal, Canada, September 2000.
* PDF, DOI, BibTeX, Topics: program generation; signal processing; autotuning; FFT
Richard Vuduc, James Demmel, and Jeff Bilmes.
Statistical modeling of feedback data in an automatic tuning system.
In Proc. ACM Wkshp. Feedback-Directed Dynamic Optimization (FDDO), at Int’l. Symp. Microarchitecture (MICRO), Monterey, CA, USA, December 2000.
Winner, Best Presentation.
* BibTeX, Topics: dense linear algebra; statistical models; autotuning; performance modeling