- “*Improving Communication by Optimizing On-Node Data Movement with Data Layout*,” Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams, *Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)*, Mar. 2021.
- “*Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs*,” Khalid Ahmad, Hari Sundar, Mary Hall, *ACM Transactions on Architecture and Code Optimization*, 16(5), Dec. 2019.
- “*Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs*,” T. Zhao, S. Williams, M. Hall, H. Johansen, *International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, Nov. 2019.
- “*SWIRL: High-Performance Many-Core CPU Code Generation for Deep Neural Networks*,” Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong, *International Journal of High Performance Computing Applications*, 33(6), 2019.
- “*Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors*,” M. Mohammadi, K. Cheshmi, E. Davis, M. Hall, M. Dehnavi, P. Nandy, C. Olschanowsky, A. Venkat, T. Yuki, M. Strout, *Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)*, June 2019.
- “*The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code*,” M. M. Strout, M. Hall and C. Olschanowsky, *Proceedings of the IEEE*, 106(11):1921–1934, Nov. 2018.
- “*Autotuning in High-Performance Computing Applications*,” Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, *Proceedings of the IEEE*, 106(11):2068–2083, Nov. 2018.
- “*Automating Wavefront Parallelization for Sparse Matrix Codes*,” A. Venkat, M. Mohammadi, J. Park, R. Barik, H. Rong, M. Strout, M. Hall, *International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, Nov. 2016. **Best Paper Finalist**.

- “*Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs*,” Khalid Ahmad, Hari Sundar, Mary Hall, *ACM Transactions on Architecture and Code Optimization*, 16(5), Dec. 2019.
- “*SWIRL: High-Performance Many-Core CPU Code Generation for Deep Neural Networks*,” Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong, *International Journal of High Performance Computing Applications*, 33(6), 2019.
- “*The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code*,” M. M. Strout, M. Hall and C. Olschanowsky, *Proceedings of the IEEE*, 106(11):1921–1934, Nov. 2018.
- “*Autotuning in High-Performance Computing Applications*,” Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, *Proceedings of the IEEE*, 106(11):2068–2083, Nov. 2018.
- “*Student Cluster Competition 2017, Team University of Utah: Reproducing Vectorization of the Tersoff Multi-Body Potential on the Intel Broadwell and Intel Skylake Platforms*,” J. Lake, Q. Chao, H. Eyre, E. Ford, K. Parker, K. Savoie, H. Sundar, M. Hall, *Parallel Computing*, 79, July 2018.
- “*Reproducing ParConnect for SC16*,” Marek Baranowski, Braden Caywood, Hannah Eyre, Janaan Lake, Kevin Parker, Kincaid Savoie, Hari Sundar, Mary Hall, *Parallel Computing*, 70:18–21, Dec. 2017.
- “*Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers*,” P. Basu, S. Williams, B. Van Straalen, L. Oliker, P. Colella, and M. Hall, *Parallel Computing*, 64(C):50–64, May 2017.
- “*Designing a Tunable Nested Data-Parallel Programming System*,” S. Muralidharan, M. Garland, A. Sidelnik, M. Hall, *ACM Transactions on Architecture and Code Optimization*, 13(4), Dec. 2016.
- “*Towards Making Autotuning Mainstream*,” P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, *International Journal of High Performance Computing Applications*, 27(4), Nov. 2013.
- “*A script-based autotuning compiler system to generate high-performance CUDA code*,” M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame, *ACM Transactions on Architecture and Code Optimization*, 9(4), Jan. 2013.
- “*Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters*,” H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, *The Journal of Supercomputing*, 62(2):946–966, Dec. 2012.
- “*Auto-tuning Full Applications: A Case Study*,” A. Tiwari, C. Chen, C. Liao, J. Chame, J. Hollingsworth, M. Hall and D. Quinlan, *International Journal of High Performance Computing Applications*, 25(3):286–294, Aug. 2011.
- “*Domain-Specific Optimization of Signal Recognition Targeting FPGAs*,” M. Demertzi, P.C. Diniz, M.W. Hall, A.C. Gilbert and Y. Wang, *ACM Transactions on Reconfigurable Technology and Systems*, 4(2), May 2011.
- “*Parameterized specification, configuration and execution of data-intensive scientific workflows*,” V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz, *Cluster Computing*, Apr. 2010.
- “*Compiler Research: The Next Fifty Years*,” M. Hall, D. Padua and K. Pingali, *Communications of the ACM*, Feb. 2009.
- “*Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques*,” M. Hall, Y. Gil and R. Lucas, *Proceedings of the IEEE, Special Issue on Cutting-Edge Computing*, Vol. 96(5), May 2008.

- “*Improving Communication by Optimizing On-Node Data Movement with Data Layout*,” Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams, *Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)*, Mar. 2021.
- “*Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs*,” T. Zhao, S. Williams, M. Hall, H. Johansen, *International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, Nov. 2019.
- “*Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors*,” M. Mohammadi, K. Cheshmi, E. Davis, M. Hall, M. Dehnavi, P. Nandy, C. Olschanowsky, A. Venkat, T. Yuki, M. Strout, *Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)*, June 2019.
- “*Automating Wavefront Parallelization for Sparse Matrix Codes*,” A. Venkat, M. Mohammadi, J. Park, R. Barik, H. Rong, M. Strout, M. Hall, *International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, Nov. 2016. **Best Paper Finalist**.
- “*Synchronization Tradeoffs in GPU Implementations of Graph Algorithms*,” R. Kaleem, A. Venkat, S. Pai, M. Hall, K. Pingali, *Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS)*, May 2016.
- “*Architecture-Adaptive Code Variant Tuning*,” S. Muralidharan, A. Roy, M. Hall, M. Garland, and P. Rai, *Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, Apr. 2016.
- “*Generating Efficient Tensor Contractions for GPUs*,” T. Nelson, A. Rivera, P. Balaprakash, M. Hall, P.D. Hovland, E. Jessup, B. Norris, *Proceedings of the IEEE International Conference on Parallel Processing (ICPP)*, Sept. 2015.
- “*Loop and Data Transformations for Sparse Matrix Code*,” Anand Venkat, Mary Hall, Michelle Strout, *Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)*, June 2015.
- “*Nitro: A Framework for Adaptive Code Variant Tuning*,” S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, *Proceedings of the International Parallel and Distributed Processing Symposium*, May 2014.
- “*Non-Affine Extensions to Polyhedral Code Generation*,” A. Venkat, M. Shantharam, M. Hall, M. M. Strout, *Proceedings of the International Conference on Code Generation and Optimization*, Feb. 2014.
- “*Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid*,” P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, *High Performance Computing Conference (HiPC)*, Dec. 2013.
- “*Analyzing the effect of compiler optimizations on application reliability*,” M. Demertzi, M. Annavaram and M. Hall, *Proceedings of the IEEE International Symposium on Workload Characterization*, Nov. 2011.
- “*EigenCFA: Accelerating Flow Analysis with GPUs*,” T. Prabhu, S. Ramalingam, M. Might, M. Hall, *ACM SIGPLAN Principles of Programming Languages*, Jan. 2011.
- “*Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology*,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul Fischer, Paul D. Hovland, *International Conference on Supercomputing*, June 2010.
- “*An Integrated Framework for Parameter-based Optimization of Scientific Workflows*,” V. S. Kumar, P. Sadayappan, G. Mehta, K. Vahi, E. Deelman, V. Ratnakar, J. Kim, Y. Gil, M. Hall, T. Kurc, J. Saltz, *Proceedings of the International Symposium on High Performance Distributed Computing*, June 2009.
- “*Model-Guided Autotuning of High-Productivity Languages for Petascale Computing*,” H. Zima, M. Hall, C. Chen, J. Chame, *Proceedings of the International Symposium on High Performance Distributed Computing*, June 2009.
- “*A Scalable Autotuning Framework for Compiler Optimization*,” A. Tiwari, C. Chen, J. Chame, M. Hall and J. K. Hollingsworth, *Proceedings of the International Parallel and Distributed Processing Symposium*, May 2009.

- “*Polyhedral Compilation Support for C++ Features: A Case Study with CPPTRAJ*,” A. Roy, D. Roe, M. Hall, T. Cheatham, *Lecture Notes in Computer Science*, 2019, Volume 11403, Languages and Compilers for Parallel Computing 2017, Springer Verlag, Pages 26–35.
- “*Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC*,” M. Hall and P. Basu, *Lecture Notes in Computer Science*, 2017, Volume 10136, Languages and Compilers for Parallel Computing, Springer Verlag.
- “*Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action*,” K. Ahmad, A. Venkat and M. Hall, *Lecture Notes in Computer Science*, 2017, Volume 10136, Languages and Compilers for Parallel Computing 2016, Springer Verlag, Pages 221–231.
- “*A Programming Language Interface to Describe Transformations and Code Generation*,” G. Rudy, M. Khan, M. Hall, C. Chen and J. Chame, *Lecture Notes in Computer Science*, 2011, Volume 6548, Languages and Compilers for Parallel Computing, Springer Verlag, Pages 136–150.
- “*Languages and Compilers for Autotuning*,” M.W. Hall and J. Chame, in *Performance Tuning of Scientific Applications*, edited by David Bailey, Robert F. Lucas and Sam Williams, Taylor and Francis, Nov. 2010.
- “*Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology*,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, in *Software Automatic Tuning: From Concepts to State-of-the-Art Results*, edited by Keita Teranishi, John Cavazos, Ken Naono and Reiji Suda, Springer-Verlag, 2010, Pages 353–370.
- “*Loop Transformation Recipes for Code Generation and Auto-Tuning*,” Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin and Gabe Rudy, *Lecture Notes in Computer Science*, 2010, Volume 5898, Languages and Compilers for Parallel Computing, Springer-Verlag, Pages 50–64.

- “*Delivering performance-portable stencil computations on CPUs and GPUs using Bricks*,” Tuowen Zhao, Samuel Williams, Mary Hall, and Hans Johansen, *2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)*, Nov. 2018.
- “*A Novel Variable-Blocking Representation for Efficient Sparse Matrix-Vector Multiply on GPUs*,” T. Zhao, T. Rusira, K. Ahmad, and M. Hall, (poster), *SC16*, Nov. 2016.
- “*Improving High-Performance Sparse Libraries using Compiler-Assisted Specialization: A PETSc Case Study*,” Shreyas Ramalingam, M. Hall and C. Chen, *Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS)*, held in conjunction with the International Parallel and Distributed Processing Symposium, May 2012.
- “*Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures*,” G. S. Sachdev, K. Sudan, M. W. Hall, and R. Balasubramonian, (poster paper), *Proceedings of the International Conference on Parallel Architectures and Compilation Techniques*, Oct. 2011.
- “*Generating High Performance Libraries using CHiLL and Autotuning*,” S. Ramalingam and M. Hall, (poster), *International Workshop on Languages and Compilers for Parallel Computing*, Sept. 2011.
- “*Evaluating graph coloring on GPUs*,” P. Grosset, P. Zhu, S. Liu, S. Venkatasubramanian, and M. Hall, *Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP ’11)*, Feb. 2011. Received runner-up for Best Student Poster.
- “*CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries*,” M. Khan, G. Rudy, C. Chen, M. Hall, J. Chame, (poster), *SC’10*, Nov. 2010.
- “*Automatic High-Performance GPU Code Generation using CUDA-CHiLL*,” Malik Khan, Jacqueline Chame, Gabe Rudy, Chun Chen, Mary Hall, Mark Hall, (poster), *Nvidia GPU Technology Conference*, Sept. 2010.
- “*Takagi Factorization on GPU using CUDA*,” Gagandeep S. Sachdev, Vishay Vanjani and Mary W. Hall, (poster paper), *Symposium on Application Accelerators for High Performance Computing*, July 2010.
- “*GPU Accelerated Particle System for Triangulated Surface Meshes*,” B. Peterson, M. Datar, M. Hall and R. Whitaker, (poster paper), *Symposium on Application Accelerators for High Performance Computing*, July 2010.
- “*Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology*,” J. Shin, M. W. Hall, J. Chame, C. Chen, P. F. Fischer, P. D. Hovland, (poster), *SC’09*, Nov. 2009.
- “*Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology*,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, *International Workshop on Automatic Performance Tuning*, Oct. 2009.
- “*Assembling Large Mosaics of Electron Microscope Images using GPU*,” Kannan Venkataraju, Mark Kim, Dan Gerszewski, James R. Anderson, and Mary Hall, (poster paper), *Symposium on Application Accelerators for High Performance Computing*, July 2009.
- “*Computation reuse in domain-specific optimization of signal recognition*,” Melina Demertzi, Pedro C. Diniz, Mary W. Hall, Anna C. Gilbert, and Yi Wang, (poster paper), *Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’09)*, Feb. 2009, p. 281.
- “*Model-Guided Performance Tuning of Parameter Values: A Case Study with Molecular Dynamics Visualization*,” Y. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman, *Proceedings of the Workshop on High-Level Parallel Programming Models and Supportive Environments*, held in conjunction with IPDPS ’08, Apr. 2008.

- “*Abstractions and Strategies for Adaptive Programming*,” Saurav Muralidharan, Doctoral thesis, 2016.
- “*An Integrated Compiler and Runtime Framework for Sparse Matrix Codes*,” Anand Venkat, Doctoral thesis, 2016.
- “*Compiler Optimizations and Autotuning for Stencils and Geometric Multigrid*,” Protonu Basu, Doctoral thesis, 2016.
- “*Performance Modeling for Architectural and Program Analysis*,” Yu Jung Lo, Master’s thesis, 2015.
- “*Using Autotuning for Accelerating Tensor Contraction on Graphics Processing Units (GPUs)*,” Axel Y. Rivera, Master’s thesis, 2014.
- “*Improving High-Performance Sparse Libraries Using Compiler Assisted Specialization: A PETSc (Portable, Extensible Toolkit for Scientific Computation) Case Study*,” Shreyas Ramalingam, Master’s thesis, 2012.
- “*CUDA-CHiLL: A Programming Language Interface for GPGPU Optimizations and Code Generation*,” Gabe Rudy, Master’s thesis, 2010.

- “*Improving Communication by Optimizing On-Node Data Movement with Data Layout*,” Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams, *Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)*, Mar. 2021.
- “*Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs*,” Khalid Ahmad, Hari Sundar, Mary Hall, *ACM Transactions on Architecture and Code Optimization*, 16(5), Dec. 2019.
- “*Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs*,” T. Zhao, S. Williams, M. Hall, H. Johansen, *International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, Nov. 2019.
- “*SWIRL: High-Performance Many-Core CPU Code Generation for Deep Neural Networks*,” Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong, *International Journal of High Performance Computing Applications*, 33(6), 2019.
- “*Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors*,” M. Mohammadi, K. Cheshmi, E. Davis, M. Hall, M. Dehnavi, P. Nandy, C. Olschanowsky, A. Venkat, T. Yuki, M. Strout, *Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)*, June 2019.
- “*The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code*,” M. M. Strout, M. Hall and C. Olschanowsky, *Proceedings of the IEEE*, 106(11):1921–1934, Nov. 2018.
- “*Autotuning in High-Performance Computing Applications*,” Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, *Proceedings of the IEEE*, 106(11):2068–2083, Nov. 2018.
- “*Delivering performance-portable stencil computations on CPUs and GPUs using Bricks*,” Tuowen Zhao, Samuel Williams, Mary Hall, and Hans Johansen, *2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)*, Nov. 2018.
- “*Abstractions and Strategies for Adaptive Programming*,” Saurav Muralidharan, Doctoral thesis, 2016.
- “*An Integrated Compiler and Runtime Framework for Sparse Matrix Codes*,” Anand Venkat, Doctoral thesis, 2016.
- “*Automating Wavefront Parallelization for Sparse Matrix Codes*,” A. Venkat, M. Mohammadi, J. Park, R. Barik, H. Rong, M. Strout, M. Hall, *International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, Nov. 2016. **Best Paper Finalist**.
- “*A Novel Variable-Blocking Representation for Efficient Sparse Matrix-Vector Multiply on GPUs*,” T. Zhao, T. Rusira, K. Ahmad, and M. Hall, (poster), *SC16*, Nov. 2016.
- “*Compiler Optimizations and Autotuning for Stencils and Geometric Multigrid*,” Protonu Basu, Doctoral thesis, 2016.
- “*Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action*,” K. Ahmad, A. Venkat and M. Hall, *Lecture Notes in Computer Science*, 2017, Volume 10136, Languages and Compilers for Parallel Computing 2016, Springer Verlag, Pages 221–231.
- “*Loop and Data Transformations for Sparse Matrix Code*,” Anand Venkat, Mary Hall, Michelle Strout, *Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)*, June 2015.
- “*Performance Modeling for Architectural and Program Analysis*,” Yu Jung Lo, Master’s thesis, 2015.
- “*Nitro: A Framework for Adaptive Code Variant Tuning*,” S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, *Proceedings of the International Parallel and Distributed Processing Symposium*, May 2014.
- “*Using Autotuning for Accelerating Tensor Contraction on Graphics Processing Units (GPUs)*,” Axel Y. Rivera, Master’s thesis, 2014.
- “*Non-Affine Extensions to Polyhedral Code Generation*,” A. Venkat, M. Shantharam, M. Hall, M. M. Strout, *Proceedings of the International Conference on Code Generation and Optimization*, Feb. 2014.
- “*Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid*,” P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, *High Performance Computing Conference (HiPC)*, Dec. 2013.
- “*Towards Making Autotuning Mainstream*,” P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, *International Journal of High Performance Computing Applications*, 27(4), Nov. 2013.
- “*A script-based autotuning compiler system to generate high-performance CUDA code*,” M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame, *ACM Transactions on Architecture and Code Optimization*, 9(4), Jan. 2013.
- “*Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters*,” H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, *The Journal of Supercomputing*, 62(2):946–966, Dec. 2012.
- “*Understanding ACM’s Past*,” M. Hall, *Communications of the ACM*, 55(12), Dec. 2012.
- “*Improving High-Performance Sparse Libraries Using Compiler Assisted Specialization: A PETSc (Portable, Extensible Toolkit for Scientific Computation) Case Study*,” Shreyas Ramalingam, Master’s thesis, 2012.
- “*Improving High-Performance Sparse Libraries using Compiler-Assisted Specialization: A PETSc Case Study*,” Shreyas Ramalingam, M. Hall and C. Chen, *Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS)*, held in conjunction with the International Parallel and Distributed Processing Symposium, May 2012.
- “*Analyzing the effect of compiler optimizations on application reliability*,” M. Demertzi, M. Annavaram and M. Hall, *Proceedings of the IEEE International Symposium on Workload Characterization*, Nov. 2011.
- “*Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures*,” G. S. Sachdev, K. Sudan, M. W. Hall, and R. Balasubramonian, (poster paper), *Proceedings of the International Conference on Parallel Architectures and Compilation Techniques*, Oct. 2011.
- “*Generating High Performance Libraries using CHiLL and Autotuning*,” S. Ramalingam and M. Hall, (poster), *International Workshop on Languages and Compilers for Parallel Computing*, Sept. 2011.
- “*Auto-tuning Full Applications: A Case Study*,” A. Tiwari, C. Chen, C. Liao, J. Chame, J. Hollingsworth, M. Hall and D. Quinlan, *International Journal of High Performance Computing Applications*, 25(3):286–294, Aug. 2011.
- “*Domain-Specific Optimization of Signal Recognition Targeting FPGAs*,” M. Demertzi, P.C. Diniz, M.W. Hall, A.C. Gilbert and Y. Wang, *ACM Transactions on Reconfigurable Technology and Systems*, 4(2), May 2011.
- “*Evaluating graph coloring on GPUs*,” P. Grosset, P. Zhu, S. Liu, S. Venkatasubramanian, and M. Hall, *Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP ’11)*, Feb. 2011. Received runner-up for Best Student Poster.
- “*EigenCFA: Accelerating Flow Analysis with GPUs*,” T. Prabhu, S. Ramalingam, M. Might, M. Hall, *ACM SIGPLAN Principles of Programming Languages*, Jan. 2011.
- “*A Programming Language Interface to Describe Transformations and Code Generation*,” G. Rudy, M. Khan, M. Hall, C. Chen and J. Chame, *Lecture Notes in Computer Science*, 2011, Volume 6548, Languages and Compilers for Parallel Computing, Springer Verlag, Pages 136–150.
- “*Languages and Compilers for Autotuning*,” M.W. Hall and J. Chame, in *Performance Tuning of Scientific Applications*, edited by David Bailey, Robert F. Lucas and Sam Williams, Taylor and Francis, Nov. 2010.
- “*CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries*,” M. Khan, G. Rudy, C. Chen, M. Hall, J. Chame, (poster), *SC’10*, Nov. 2010.
- “*Automatic High-Performance GPU Code Generation using CUDA-CHiLL*,” Malik Khan, Jacqueline Chame, Gabe Rudy, Chun Chen, Mary Hall, Mark Hall, (poster), *Nvidia GPU Technology Conference*, Sept. 2010.
- “*CUDA-CHiLL: A Programming Language Interface for GPGPU Optimizations and Code Generation*,” Gabe Rudy, Master’s thesis, 2010.
- “*Takagi Factorization on GPU using CUDA*,” Gagandeep S. Sachdev, Vishay Vanjani and Mary W. Hall, (poster paper), *Symposium on Application Accelerators for High Performance Computing*, July 2010.
- “*GPU Accelerated Particle System for Triangulated Surface Meshes*,” B. Peterson, M. Datar, M. Hall and R. Whitaker, (poster paper), *Symposium on Application Accelerators for High Performance Computing*, July 2010.
- “*Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology*,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul Fischer, Paul D. Hovland, *International Conference on Supercomputing*, June 2010.
- “*Parameterized specification, configuration and execution of data-intensive scientific workflows*,” V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz, *Cluster Computing*, Apr. 2010.
- “*Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology*,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, in *Software Automatic Tuning: From Concepts to State-of-the-Art Results*, edited by Keita Teranishi, John Cavazos, Ken Naono and Reiji Suda, Springer-Verlag, 2010, Pages 353–370.
- “*Loop Transformation Recipes for Code Generation and Auto-Tuning*,” Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin and Gabe Rudy, *Lecture Notes in Computer Science*, 2010, Volume 5898, Languages and Compilers for Parallel Computing, Springer-Verlag, Pages 50–64.
- “*Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology*,” J. Shin, M. W. Hall, J. Chame, C. Chen, P. F. Fischer, P. D. Hovland, (poster), *SC’09*, Nov. 2009.
- “*GPU Acceleration of the Generalized Interpolation Material Point Method*,” W. Chiang, M. DeLisi, T. Hummel, T. Prete, K. Tew, M. Hall, P. Wallstedt, and J. Guilkey, *Symposium on Application Accelerators for High Performance Computing*, July 2009.
- “*Assembling Large Mosaics of Electron Microscope Images using GPU*,” Kannan Venkataraju, Mark Kim, Dan Gerszewski, James R. Anderson, and Mary Hall, (poster paper), *Symposium on Application Accelerators for High Performance Computing*, July 2009.
- “*An Integrated Framework for Parameter-based Optimization of Scientific Workflows*,” V. S. Kumar, P. Sadayappan, G. Mehta, K. Vahi, E. Deelman, V. Ratnakar, J. Kim, Y. Gil, M. Hall, T. Kurc, J. Saltz, *Proceedings of the International Symposium on High Performance Distributed Computing*, June 2009.
- “*Model-Guided Autotuning of High-Productivity Languages for Petascale Computing*,” H. Zima, M. Hall, C. Chen, J. Chame, *Proceedings of the International Symposium on High Performance Distributed Computing*, June 2009.
- “*A Scalable Autotuning Framework for Compiler Optimization*,” A. Tiwari, C. Chen, J. Chame, M. Hall and J. K. Hollingsworth, *Proceedings of the International Parallel and Distributed Processing Symposium*, May 2009.
- “*HPC and Grid Computing for Integrative Biomedical Research*,” T. Kurc, S. Hastings, V. Kumar, S. Langella, A. Sharma, T. Pan, S. Oster, D. Ervin, J. Permar, S. Narayanan, Y. Gil, E. Deelman, M. Hall, J. Saltz, *International Journal of High Performance Computing Applications*, 2009.
- “*Compiler Research: The Next Fifty Years*,” M. Hall, D. Padua and K. Pingali, *Communications of the ACM*, Feb. 2009.
- “*Computation reuse in domain-specific optimization of signal recognition*,” Melina Demertzi, Pedro C. Diniz, Mary W. Hall, Anna C. Gilbert, and Yi Wang, (poster paper), *Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’09)*, Feb. 2009, p. 281.
- “*Evaluating Compiler Technology for Control-Flow Optimizations for Multimedia Extension Architectures*,” J. Shin, M. Hall and J. Chame, *International Journal of Embedded Systems*, 2009. Invited award paper from MSP 7.
- “*PERI Auto-Tuning*,” David H. Bailey, Jacqueline Chame, Chun Chen, Jack Dongarra, Mary Hall, Jeffrey K. Hollingsworth, Paul Hovland, Shirley Moore, Keith Seymour, Jaewook Shin, Ananta Tiwari, Sam Williams, Haihang You, *Journal of Physics: Conference Series*, Vol. 125, 2008.
- “*Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques*,” M. Hall, Y. Gil and R. Lucas, *Proceedings of the IEEE, Special Issue on Cutting-Edge Computing*, Vol. 96(5), May 2008.
- “*Model-Guided Performance Tuning of Parameter Values: A Case Study with Molecular Dynamics Visualization*,” Y. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman, *Proceedings of the Workshop on High-Level Parallel Programming Models and Supportive Environments*, held in conjunction with IPDPS ’08, Apr. 2008.