1. Improving Communication by Optimizing On-Node Data Movement with Data Layout, Tuowen Zhao, Mary Hall, Hans Johansen, and Samuel Williams. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar. 2021.
  2. Data-Driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs, Khalid Ahmad, Hari Sundar, Mary Hall, ACM Transactions on Architecture and Code Optimization 16(5), Dec. 2019.
  3. Exploiting Reuse and Vectorization in Blocked Stencil Computations on CPUs and GPUs, T. Zhao, S. Williams, M. Hall, H. Johansen, International Conference on Supercomputing, Networking, Storage and Analysis (SC), Nov. 2019.
  4. SWIRL: High-Performance Many-Core CPU Code Generation for Deep Neural Networks, Anand Venkat, Tharindu Rusira, Raj Barik, Mary Hall, Leonard Truong, International Journal of High-Performance Computing Applications, 33(6), 2019.
  5. Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors, M. Mohammadi, K. Cheshmi, E. Davis, M. Hall, M. Dehnavi, P. Nandy, C. Olschanowsky, A. Venkat, T. Yuki, M. Strout, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2019.
  6. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code, M. M. Strout, M. Hall and C. Olschanowsky, Proceedings of the IEEE 106(11):1921–1934, Nov. 2018.
  7. Autotuning in High-Performance Computing Applications, Prasanna Balaprakash, Jack Dongarra, Todd Gamblin, Mary Hall, Jeffrey K. Hollingsworth, Boyana Norris, and Richard Vuduc, Proceedings of the IEEE 106(11):2068–2083, Nov. 2018.
  8. Delivering performance-portable stencil computations on CPUs and GPUs using Bricks, Tuowen Zhao, Samuel Williams, Mary Hall, and Hans Johansen. In 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Nov. 2018.
  9. “ABSTRACTIONS AND STRATEGIES FOR ADAPTIVE PROGRAMMING”, Saurav Muralidharan, Doctoral thesis, 2016.
  10. “AN INTEGRATED COMPILER AND RUNTIME FRAMEWORK FOR SPARSE MATRIX CODES”, Anand Venkat, Doctoral thesis, 2016.
  11. Automating Wavefront Parallelization for Sparse Matrix Codes, A. Venkat, M. Mohammadi, J. Park, R. Barik, H. Rong, M. Strout, M. Hall, International Conference on Supercomputing, Networking, Storage and Analysis (SC), Nov. 2016, Best Paper Finalist.
  12. A Novel Variable-Blocking Representation for Efficient Sparse Matrix-Vector Multiply on GPUs, T. Zhao, T. Rusira, K. Ahmad, and M. Hall, (poster), SC16, November, 2016.
  13. “COMPILER OPTIMIZATIONS AND AUTOTUNING FOR STENCILS AND GEOMETRIC MULTIGRID”, Protonu Basu, Doctoral thesis, 2016.
  14. Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action, K. Ahmad, A. Venkat and M. Hall, Lecture Notes in Computer Science, 2017, Volume 10136, Languages and Compilers for Parallel Computing 2016, Springer Verlag, Pages 221-231.
  15. “Loop and Data Transformations for Sparse Matrix Code,” Anand Venkat, Mary Hall, Michelle Strout, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June, 2015.
  16. “PERFORMANCE MODELING FOR ARCHITECTURAL AND PROGRAM ANALYSIS”, Yu Jung Lo, Master’s thesis, 2015.
  17. “Nitro: A Framework for Adaptive Code Variant Tuning,” S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, Proceedings of the International Parallel and Distributed Processing Symposium, May, 2014.
  18. “USING AUTOTUNING FOR ACCELERATING TENSOR CONTRACTION ON GRAPHICS PROCESSING UNITS (GPUS)”, Axel Y. Rivera, Master’s thesis, 2014.
  19. “Non-affine Extensions to Polyhedral Code Generation,” A. Venkat, M. Shantharam, M. Hall, M. M. Strout, Proceedings of the International Conference on Code Generation and Optimization, Feb. 2014.
  20. “Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid,” P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, High Performance Computing Conference (HiPC), December 2013.
  21. “Towards Making Autotuning Mainstream,” P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, International Journal of High Performance Computing Applications, 27(4), November 2013.
  22. “A script-based autotuning compiler system to generate high-performance CUDA code,” M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame. ACM Transactions on Architecture and Code Optimization, 9(4), January 2013.
  23. “Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters,” H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, The Journal of Supercomputing, 62(2):946-966, December 2012.
  24. “Understanding ACM’s Past,” M. Hall, Communications of the ACM, 55(12), December 2012.
  25. “IMPROVING HIGH-PERFORMANCE SPARSE LIBRARIES USING COMPILER ASSISTED SPECIALIZATION: A PETSC (PORTABLE, EXTENSIBLE TOOLKIT FOR SCIENTIFIC COMPUTATION) CASE STUDY”, Shreyas Ramalingam, Master’s thesis, 2012.
  26. “Improving High-Performance Sparse Libraries using Compiler-Assisted Specialization: A PETSc Case Study,” Shreyas Ramalingam, M. Hall and C. Chen, Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), held in conjunction with International Parallel and Distributed Processing Symposium, May 2012.
  27. “Analyzing the effect of compiler optimizations on application reliability,” M. Demertzi, M. Annavaram and M. Hall, Proceedings of the IEEE International Symposium on Workload Characterization, Nov., 2011.
  28. Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures, G. S. Sachdev, K. Sudan, M. W. Hall, and R. Balasubramonian, (poster paper), In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Oct. 2011.
  29. “Generating High Performance Libraries using CHiLL and Autotuning,” S. Ramalingam and M. Hall, (poster), International Workshop on Languages and Compilers for Parallel Computing, Sept. 2011
  30. “Auto-tuning Full Applications: A Case Study,” A. Tiwari, C. Chen, C. Liao, J. Chame, J. Hollingsworth, M. Hall and D. Quinlan, International Journal of High Performance Computing Applications, 25(3):286-294, Aug. 2011.
  31. “Domain-Specific Optimization of Signal Recognition Targeting FPGAs,” M. Demertzi, P.C. Diniz, M.W. Hall, A.C. Gilbert and Y. Wang, ACM Transactions on Reconfigurable Technology and Systems, 4(2), May, 2011.
  32. “Evaluating graph coloring on GPUs,” P. Grosset, P. Zhu, S. Liu, S. Venkatasubramanian, and M. Hall. In Proceedings of the 16th ACM symposium on Principles and practice of parallel programming (PPoPP ’11), Feb. 2011. Received runner-up for Best Student Poster.
  33. “EigenCFA: Accelerating Flow Analysis with GPUs,” T. Prabhu, S. Ramalingam, M. Might, M. Hall, In ACM SIGPLAN Principles of Programming Languages, Jan. 2011.
  34. “A Programming Language Interface to Describe Transformations and Code Generation,” G. Rudy, M. Khan, M. Hall, C. Chen and J. Chame, Lecture Notes in Computer Science, 2011, Volume 6548, Languages and Compilers for Parallel Computing, Springer Verlag, Pages 136-150.
  35. “Languages and Compilers for Autotuning,” M.W. Hall and J. Chame, In Performance Tuning of Scientific Applications, edited by David Bailey, Robert F. Lucas and Sam Williams. Taylor and Francis publishers, Nov. 2010.
  36. “CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries,” M. Khan, G. Rudy, C. Chen, M. Hall, J. Chame, (poster) SC’10, Nov. 2010
  37. “Automatic High-Performance GPU code Generation using CUDA-CHiLL”, (poster) Malik Khan, Jacqueline Chame, Gabe Rudy, Chun Chen, Mary Hall, Mark Hall, Nvidia GPU Technology Conference, Sept. 2010
  38. “CUDA-CHILL: A PROGRAMMING LANGUAGE INTERFACE FOR GPGPU OPTIMIZATIONS AND CODE GENERATION”, Gabe Rudy, Master’s thesis, 2010.
  39. “Takagi Factorization on GPU using CUDA,” (poster paper), Gagandeep S. Sachdev, Vishay Vanjani and Mary W. Hall, Symposium on Application Accelerators for High Performance Computing, July, 2010.
  40. “GPU Accelerated Particle System for Triangulated Surface Meshes,” (poster paper), B. Peterson, M. Datar, M. Hall and R. Whitaker, Symposium on Application Accelerators for High Performance Computing, July, 2010.
  41. “Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul Fischer, Paul D. Hovland, International Conference on Supercomputing, June, 2010.
  42. “Parameterized specification, configuration and execution of data-intensive scientific workflows,” V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz, Cluster Computing, April 2010.
  43. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, Software Automatic Tuning: from concepts to state-of-the-art results, edited by Keita Teranishi, John Cavazos, Ken Naono and Reiji Suda, Springer-Verlag Publishers, 2010, Pages 353-370.
  44. Loop Transformation Recipes for Code Generation and Auto-Tuning, Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin and Gabe Rudy, Lecture Notes in Computer Science, 2010, Volume 5898, Languages and Compilers for Parallel Computing, Springer-Verlag, Pages 50-64.
  45. Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology, (poster), J. Shin, M. W. Hall, J. Chame, C. Chen, P. F. Fischer, P. D. Hovland, SC’09, Nov. 2009.
  46. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, International Workshop on Automatic Performance Tuning, October, 2009.
  47. “GPU Acceleration of the Generalized Interpolation Material Point Method,” W. Chiang, M. DeLisi, T. Hummel, T. Prete, K. Tew, M. Hall, P. Wallstedt, and J. Guilkey, Symposium on Application Accelerators for High Performance Computing, July, 2009.
  48. “Assembling Large Mosaics of Electron Microscope Images using GPU,” (poster paper) Kannan Venkataraju, Mark Kim, Dan Gerszewski, James R. Anderson, and Mary Hall, Symposium on Application Accelerators for High Performance Computing, July, 2009.
  49. “An Integrated Framework for Parameter-based Optimization of Scientific Workflows,” V. S. Kumar, P. Sadayappan, G. Mehta, K. Vahi, E. Deelman, V. Ratnakar, J. Kim, Y. Gil, M. Hall, T. Kurc, J. Saltz, Proceedings of the International Symposium on High Performance Distributed Computing, June, 2009.
  50. “Model-Guided Autotuning of High-Productivity Languages for Petascale Computing,” H. Zima, M. Hall, C. Chen, J. Chame, In Proceedings of the International Symposium on High Performance Distributed Computing, June, 2009.
  51. “A Scalable Autotuning Framework for Compiler Optimization,” A. Tiwari, C. Chen, J. Chame, M. Hall and J. K. Hollingsworth, In Proceedings of the International Parallel and Distributed Processing Symposium, May, 2009.
  52. “HPC and Grid Computing for Integrative Biomedical Research,” T. Kurc, S. Hastings, V. Kumar, S. Langella, A. Sharma, T. Pan, S. Oster, D. Ervin, J. Permar, S. Narayanan, Y. Gil, E. Deelman, M. Hall, J. Saltz, International Journal of High Performance Computing Applications, 2009.
  53. “Compiler Research: The Next Fifty Years,” M. Hall, D. Padua and K. Pingali, Communications of the ACM, Feb. 2009.
  54. “Computation reuse in domain-specific optimization of signal recognition,” (poster paper) Melina Demertzi, Pedro C. Diniz, Mary W. Hall, Anna C. Gilbert, and Yi Wang, In Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays (FPGA ’09), Feb. 2009, p. 281.
  55. “Evaluating Compiler Technology for Control-Flow Optimizations for Multimedia Extension Architectures,” J. Shin, M. Hall and J. Chame. Award paper invited from MSP 7, International Journal of Embedded Systems, 2009.
  56. “PERI Auto-Tuning,” David H. Bailey, Jacqueline Chame, Chun Chen, Jack Dongarra, Mary Hall, Jeffrey K. Hollingsworth, Paul Hovland, Shirley Moore, Keith Seymour, Jaewook Shin, Ananta Tiwari, Sam Williams, Haihang You, Journal of Physics: Conference Series, Vol. 125, 2008.
  57. “Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques,” M. Hall, Y. Gil and R. Lucas. Proceedings of the IEEE, Special Issue on Cutting-Edge Computing, Vol. 96(5), May 2008.
  58. “Model-Guided Performance Tuning of Parameter Values: A Case Study with Molecular Dynamics Visualization,” Y. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman, Proceedings of the Workshop on High-Level Parallel Programming Models and Supportive Environments, held in conjunction with IPDPS ’08, April, 2008.