Publications

Selected Publications

  1. Improving Communication by Optimizing On-Node Data Movement with Data Layout, Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams, Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar. 2021.
  2. Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs, Khalid Ahmad, Hari Sundar, Mary Hall, ACM Transactions on Architecture and Code Optimization, 16(5), Dec. 2019.
  3. Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs, T. Zhao, S. Williams, M. Hall, H. Johansen, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2019.
  4. SWIRL: High-Performance Many-Core CPU Code Generation for Deep Neural Networks, Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong, International Journal of High-Performance Computing Applications, 33(6), 2019.
  5. Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors, M. Mohammadi, K. Cheshmi, E. Davis, M. Hall, M. Dehnavi, P. Nandy, C. Olschanowsky, A. Venkat, T. Yuki, M. Strout, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2019.
  6. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code, M. M. Strout, M. Hall and C. Olschanowsky, Proceedings of the IEEE, 106(11):1921–1934, Nov. 2018.
  7. Autotuning in High-Performance Computing Applications, Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, Proceedings of the IEEE, 106(11):2068–2083, Nov. 2018.
  8. Automating Wavefront Parallelization for Sparse Matrix Codes, A. Venkat, M. Mohammadi, J. Park, R. Barik, H. Rong, M. Strout, M. Hall, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2016. Best Paper Finalist.
Journal Publications

  1. Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs, Khalid Ahmad, Hari Sundar, Mary Hall, ACM Transactions on Architecture and Code Optimization, 16(5), Dec. 2019.
  2. SWIRL: High-Performance Many-Core CPU Code Generation for Deep Neural Networks, Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong, International Journal of High-Performance Computing Applications, 33(6), 2019.
  3. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code, M. M. Strout, M. Hall and C. Olschanowsky, Proceedings of the IEEE, 106(11):1921–1934, Nov. 2018.
  4. Autotuning in High-Performance Computing Applications, Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, Proceedings of the IEEE, 106(11):2068–2083, Nov. 2018.
  5. Student Cluster Competition 2017, Team University of Utah: Reproducing Vectorization of the Tersoff Multi-Body Potential on the Intel Broadwell and Intel Skylake Platforms, J. Lake, Q. Chao, H. Eyre, E. Ford, K. Parker, K. Savoie, H. Sundar, M. Hall, Parallel Computing, 79, Jul. 2018.
  6. Reproducing ParConnect for SC16, Marek Baranowski, Braden Caywood, Hannah Eyre, Janaan Lake, Kevin Parker, Kincaid Savoie, Hari Sundar, Mary Hall, Parallel Computing, 70:18–21, Dec. 2017.
  7. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers, P. Basu, S. Williams, B. Van Straalen, L. Oliker, P. Colella, and M. Hall, Parallel Computing, 64(C):50–64, May 2017.
  8. Designing a Tunable Nested Data-Parallel Programming System, S. Muralidharan, M. Garland, A. Sidelnik, M. Hall, ACM Transactions on Architecture and Code Optimization, 13(4), December 2016.
  9. Towards Making Autotuning Mainstream, P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, International Journal of High Performance Computing Applications, 27(4), November 2013.
  10. A script-based autotuning compiler system to generate high-performance CUDA code, M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame, ACM Transactions on Architecture and Code Optimization, 9(4), January 2013.
  11. Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters, H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, The Journal of Supercomputing, 62(2):946–966, December 2012.
  12. Auto-tuning Full Applications: A Case Study, A. Tiwari, C. Chen, C. Liao, J. Chame, J. Hollingsworth, M. Hall and D. Quinlan, International Journal of High Performance Computing Applications, 25(3):286–294, Aug. 2011.
  13. Domain-Specific Optimization of Signal Recognition Targeting FPGAs, M. Demertzi, P.C. Diniz, M.W. Hall, A.C. Gilbert and Y. Wang, ACM Transactions on Reconfigurable Technology and Systems, 4(2), May 2011.
  14. Parameterized specification, configuration and execution of data-intensive scientific workflows, V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz, Cluster Computing, April 2010.
  15. Compiler Research: The Next Fifty Years, M. Hall, D. Padua and K. Pingali, Communications of the ACM, Feb. 2009.
  16. Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques, M. Hall, Y. Gil and R. Lucas, Proceedings of the IEEE, Special Issue on Cutting-Edge Computing, 96(5), May 2008.
Conference Publications

  1. Improving Communication by Optimizing On-Node Data Movement with Data Layout, Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams, Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar. 2021.
  2. Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs, T. Zhao, S. Williams, M. Hall, H. Johansen, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2019.
  3. Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors, M. Mohammadi, K. Cheshmi, E. Davis, M. Hall, M. Dehnavi, P. Nandy, C. Olschanowsky, A. Venkat, T. Yuki, M. Strout, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2019.
  4. Automating Wavefront Parallelization for Sparse Matrix Codes, A. Venkat, M. Mohammadi, J. Park, R. Barik, H. Rong, M. Strout, M. Hall, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov. 2016. Best Paper Finalist.
  5. Synchronization Tradeoffs in GPU Implementations of Graph Algorithms, R. Kaleem, A. Venkat, S. Pai, M. Hall, K. Pingali, Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2016.
  6. Architecture-Adaptive Code Variant Tuning, S. Muralidharan, A. Roy, M. Hall, M. Garland, and P. Rai, Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2016.
  7. Generating Efficient Tensor Contractions for GPUs, T. Nelson, A. Rivera, P. Balaprakash, M. Hall, P.D. Hovland, E. Jessup, B. Norris, Proceedings of the IEEE International Conference on Parallel Processing (ICPP), Sept. 2015.
  8. Loop and Data Transformations for Sparse Matrix Code, Anand Venkat, Mary Hall, Michelle Strout, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2015.
  9. Nitro: A Framework for Adaptive Code Variant Tuning, S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), May 2014.
  10. Non-Affine Extensions to Polyhedral Code Generation, A. Venkat, M. Shantharam, M. Hall, M. M. Strout, Proceedings of the International Conference on Code Generation and Optimization (CGO), Feb. 2014.
  11. Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid, P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, International Conference on High Performance Computing (HiPC), December 2013.
  12. Analyzing the effect of compiler optimizations on application reliability, M. Demertzi, M. Annavaram and M. Hall, Proceedings of the IEEE International Symposium on Workload Characterization, Nov. 2011.
  13. EigenCFA: Accelerating Flow Analysis with GPUs, T. Prabhu, S. Ramalingam, M. Might, M. Hall, ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), Jan. 2011.
  14. Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology, Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul Fischer, Paul D. Hovland, International Conference on Supercomputing (ICS), June 2010.
  15. An Integrated Framework for Parameter-based Optimization of Scientific Workflows, V. S. Kumar, P. Sadayappan, G. Mehta, K. Vahi, E. Deelman, V. Ratnakar, J. Kim, Y. Gil, M. Hall, T. Kurc, J. Saltz, Proceedings of the International Symposium on High Performance Distributed Computing, June 2009.
  16. Model-Guided Autotuning of High-Productivity Languages for Petascale Computing, H. Zima, M. Hall, C. Chen, J. Chame, Proceedings of the International Symposium on High Performance Distributed Computing, June 2009.
  17. A Scalable Autotuning Framework for Compiler Optimization, A. Tiwari, C. Chen, J. Chame, M. Hall and J. K. Hollingsworth, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), May 2009.
Book Chapters

  1. Polyhedral Compilation Support for C++ Features: A Case Study with CPPTRAJ, A. Roy, D. Roe, M. Hall, T. Cheatham, Lecture Notes in Computer Science, Volume 11403, Languages and Compilers for Parallel Computing 2017, Springer-Verlag, 2019, Pages 26–35.
  2. Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC, M. Hall and P. Basu, Lecture Notes in Computer Science, Volume 10136, Languages and Compilers for Parallel Computing, Springer-Verlag, 2017.
  3. Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action, K. Ahmad, A. Venkat and M. Hall, Lecture Notes in Computer Science, Volume 10136, Languages and Compilers for Parallel Computing 2016, Springer-Verlag, 2017, Pages 221–231.
  4. A Programming Language Interface to Describe Transformations and Code Generation, G. Rudy, M. Khan, M. Hall, C. Chen and J. Chame, Lecture Notes in Computer Science, Volume 6548, Languages and Compilers for Parallel Computing, Springer-Verlag, 2011, Pages 136–150.
  5. Languages and Compilers for Autotuning, M.W. Hall and J. Chame, in Performance Tuning of Scientific Applications, edited by David Bailey, Robert F. Lucas and Sam Williams, Taylor and Francis Publishers, Nov. 2010.
  6. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, in Software Automatic Tuning: From Concepts to State-of-the-Art Results, edited by Keita Teranishi, John Cavazos, Ken Naono and Reiji Suda, Springer-Verlag Publishers, 2010, Pages 353–370.
  7. Loop Transformation Recipes for Code Generation and Auto-Tuning, Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin and Gabe Rudy, Lecture Notes in Computer Science, Volume 5898, Languages and Compilers for Parallel Computing, Springer-Verlag, 2010, Pages 50–64.
Workshop Papers and Posters

  1. Delivering performance-portable stencil computations on CPUs and GPUs using Bricks, Tuowen Zhao, Samuel Williams, Mary Hall, and Hans Johansen, 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Nov. 2018.
  2. A Novel Variable-Blocking Representation for Efficient Sparse Matrix-Vector Multiply on GPUs, T. Zhao, T. Rusira, K. Ahmad, and M. Hall, (poster), SC16, November 2016.
  3. Improving High-Performance Sparse Libraries using Compiler-Assisted Specialization: A PETSc Case Study, Shreyas Ramalingam, M. Hall and C. Chen, Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), held in conjunction with the International Parallel and Distributed Processing Symposium, May 2012.
  4. Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures, G. S. Sachdev, K. Sudan, M. W. Hall, and R. Balasubramonian, (poster paper), Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Oct. 2011.
  5. Generating High Performance Libraries using CHiLL and Autotuning, S. Ramalingam and M. Hall, (poster), International Workshop on Languages and Compilers for Parallel Computing, Sept. 2011.
  6. Evaluating graph coloring on GPUs, P. Grosset, P. Zhu, S. Liu, S. Venkatasubramanian, and M. Hall, Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP ’11), Feb. 2011. Runner-up, Best Student Poster.
  7. CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries, M. Khan, G. Rudy, C. Chen, M. Hall, J. Chame, (poster), SC’10, Nov. 2010.
  8. Automatic High-Performance GPU code Generation using CUDA-CHiLL, (poster), Malik Khan, Jacqueline Chame, Gabe Rudy, Chun Chen, Mary Hall, Mark Hall, Nvidia GPU Technology Conference, Sept. 2010.
  9. Takagi Factorization on GPU using CUDA, (poster paper), Gagandeep S. Sachdev, Vishay Vanjani and Mary W. Hall, Symposium on Application Accelerators for High Performance Computing, July 2010.
  10. GPU Accelerated Particle System for Triangulated Surface Meshes, (poster paper), B. Peterson, M. Datar, M. Hall and R. Whitaker, Symposium on Application Accelerators for High Performance Computing, July 2010.
  11. Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology, (poster), J. Shin, M. W. Hall, J. Chame, C. Chen, P. F. Fischer, P. D. Hovland, SC’09, Nov. 2009.
  12. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, International Workshop on Automatic Performance Tuning, October 2009.
  13. Assembling Large Mosaics of Electron Microscope Images using GPU, (poster paper), Kannan Venkataraju, Mark Kim, Dan Gerszewski, James R. Anderson, and Mary Hall, Symposium on Application Accelerators for High Performance Computing, July 2009.
  14. Computation reuse in domain-specific optimization of signal recognition, (poster paper), Melina Demertzi, Pedro C. Diniz, Mary W. Hall, Anna C. Gilbert, and Yi Wang, Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’09), Feb. 2009, p. 281.
  15. Model-Guided Performance Tuning of Parameter Values: A Case Study with Molecular Dynamics Visualization, Y. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman, Proceedings of the Workshop on High-Level Parallel Programming Models and Supportive Environments, held in conjunction with IPDPS ’08, April 2008.
Theses

  1. Abstractions and Strategies for Adaptive Programming, Saurav Muralidharan, Ph.D. thesis, 2016.
  2. An Integrated Compiler and Runtime Framework for Sparse Matrix Codes, Anand Venkat, Ph.D. thesis, 2016.
  3. Compiler Optimizations and Autotuning for Stencils and Geometric Multigrid, Protonu Basu, Ph.D. thesis, 2016.
  4. Performance Modeling for Architectural and Program Analysis, Yu Jung Lo, Master’s thesis, 2015.
  5. Using Autotuning for Accelerating Tensor Contraction on Graphics Processing Units (GPUs), Axel Y. Rivera, Master’s thesis, 2014.
  6. Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc (Portable, Extensible Toolkit for Scientific Computation) Case Study, Shreyas Ramalingam, Master’s thesis, 2012.
  7. CUDA-CHiLL: A Programming Language Interface for GPGPU Optimizations and Code Generation, Gabe Rudy, Master’s thesis, 2010.
Additional Publications

  1. Understanding ACM’s Past, M. Hall, Communications of the ACM, 55(12), December 2012.
  2. GPU Acceleration of the Generalized Interpolation Material Point Method, W. Chiang, M. DeLisi, T. Hummel, T. Prete, K. Tew, M. Hall, P. Wallstedt, and J. Guilkey, Symposium on Application Accelerators for High Performance Computing, July 2009.
  3. HPC and Grid Computing for Integrative Biomedical Research, T. Kurc, S. Hastings, V. Kumar, S. Langella, A. Sharma, T. Pan, S. Oster, D. Ervin, J. Permar, S. Narayanan, Y. Gil, E. Deelman, M. Hall, J. Saltz, International Journal of High Performance Computing Applications, 2009.
  4. Evaluating Compiler Technology for Control-Flow Optimizations for Multimedia Extension Architectures, J. Shin, M. Hall and J. Chame, International Journal of Embedded Systems, 2009. Award paper invited from MSP 7.
  5. PERI Auto-Tuning, David H. Bailey, Jacqueline Chame, Chun Chen, Jack Dongarra, Mary Hall, Jeffrey K. Hollingsworth, Paul Hovland, Shirley Moore, Keith Seymour, Jaewook Shin, Ananta Tiwari, Sam Williams, Haihang You, Journal of Physics: Conference Series, Vol. 125, 2008.