Publications

  1. “Loop and Data Transformations for Sparse Matrix Code,” Anand Venkat, Mary Hall, Michelle Strout, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June, 2015.
  2. “Compiler-Directed Transformation for Higher-Order Stencils,” Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Philip Colella, Mary Hall, International Parallel and Distributed Processing Symposium (IPDPS) 2015, May, 2014.
  3. “Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid,” Protonu Basu, Samuel Williams, Brian Van Straalen, Leonid Oliker, Mary Hall, Workshop on Optimizing Stencil Computations (WOSC) at SPLASH 2014, May, 2014.
  4. “Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid,” Protonu Basu, Samuel Williams, Brian Van Straalen, A. Venkat, Leonid Oliker, Mary Hall, Workshop on Optimizing Stencil Computations (WOSC) at SPLASH 2013, May, 2013.
  5. “Nitro: A Framework for Adaptive Code Variant Tuning,” S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, Proceedings of the International Parallel and Distributed Processing Symposium, May, 2014.
  6. “Non-affine Extensions to Polyhedral Code Generation,” A. Venkat, M. Shantharam, M. Hall, M. M. Strout, Proceedings of the International Conference on Code Generation and Optimization, Feb. 2014.
  7. “Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid,” P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, High Performance Computing Conference (HiPC), December 2013.
  8. “Towards Making Autotuning Mainstream,” P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, International Journal of High Performance Computing Applications, 27(4), November 2013.
  9. “A script-based autotuning compiler system to generate high-performance CUDA code,” M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame. ACM Transactions on Architecture and Code Optimization, 9(4), January 2013.
  10. “Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters,” H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, The Journal of Supercomputing, 62(2):946-966, December 2012.
  1. “Towards Making Autotuning Mainstream,” P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, International Journal of High Performance Computing Applications, 27(4), November 2013.
  2. “A script-based autotuning compiler system to generate high-performance CUDA code,” M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame. ACM Transactions on Architecture and Code Optimization, 9(4), January 2013.
  3. “Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters,” H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, The Journal of Supercomputing, 62(2):946-966, December 2012.
  4. “Understanding ACM’s Past,” M. Hall, Communications of the ACM, 55(12), December 2012.
  5. “Auto-tuning Full Applications: A Case Study,” A. Tiwari, C. Chen, C. Liao, J. Chame, J. Hollingsworth, M. Hall and D. Quinlan, International Journal of High Performance Computing Applications, 25(3):286-294, Aug. 2011.
  6. “Domain-Specific Optimization of Signal Recognition Targeting FPGAs,” M. Demertzi, P.C. Diniz, M.W. Hall, A.C. Gilbert and Y. Wang, ACM Transactions on Reconfigurable Technology and Systems, 4(2), May, 2011.
  7. “Parameterized specification, configuration and execution of data-intensive scientific workflows,” V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz, Cluster Computing, April 2010.
  8. “HPC and Grid Computing for Integrative Biomedical Research,” T. Kurc, S. Hastings, V. Kumar, S. Langella, A. Sharma, T. Pan, S. Oster, D. Ervin, J. Permar, S. Narayanan, Y. Gil, E. Deelman, M. Hall, J. Saltz, International Journal of High Performance Computing Applications, 2009.
  9. “Compiler Research: The Next Fifty Years,” M. Hall, D. Padua and K. Pingali, Communications of the ACM, Feb. 2009.
  10. “Evaluating Compiler Technology for Control-Flow Optimizations for Multimedia Extension Architectures,” J. Shin, M. Hall and J. Chame. Award paper invited from MSP 7 International Journal of Embedded Systems, 2009.
  11. “PERI Auto-Tuning,” David H. Bailey, Jacqueline Chame, Chun Chen, Jack Dongarra, Mary Hall, Jeffrey K. Hollingsworth, Paul Hovland, Shirley Moore, Keith Seymour, Jaewook Shin, Ananta Tiwari, Sam Williams, Haihang You, Journal of Physics: Conference Series, Vol. 125, 2008.
  12. “Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques,” M. Hall, Y. Gil and R. Lucas. Proceedings of the IEEE, Special Issue on Cutting-Edge Computing, Vol. 96(5), May 2008.
  1. “Loop and Data Transformations for Sparse Matrix Code,” Anand Venkat, Mary Hall, Michelle Strout, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June, 2015.
  2. “Nitro: A Framework for Adaptive Code Variant Tuning,” S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, Proceedings of the International Parallel and Distributed Processing Symposium, May, 2014.
  3. “Non-affine Extensions to Polyhedral Code Generation,” A. Venkat, M. Shantharam, M. Hall, M. M. Strout, Proceedings of the International Conference on Code Generation and Optimization, Feb. 2014.
  4. “Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid,” P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, High Performance Computing Conference (HiPC), December 2013.
  5. “Analyzing the effect of compiler optimizations on application reliability,” M. Demertzi, M. Annavaram and M. Hall, Proceedings of the IEEE International Symposium on Workload Characterization, Nov., 2011.
  6. “EigenCFA: Accelerating Flow Analysis with GPUs,” T. Prabhu, S. Ramalingam, M. Might, M. Hall, In ACM SIGPLAN Principles of Programming Languages, Jan. 2011.
  7. “Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul Fischer, Paul D. Hovland, International Conference on Supercomputing, June, 2010.
  8. “GPU Acceleration of the Generalized Interpolation Material Point Method,” W. Chiang, M. DeLisi, T. Hummel, T. Prete, K. Tew, M. Hall, P. Wallstedt, and J. Guilkey, Symposium on Application Accelerators for High Performance Computing, July, 2009.
  9. “An Integrated Framework for Parameter-based Optimization of Scientific Workflows,” V. S. Kumar, P. Sadayappan, G. Mehta, K. Vahi, E. Deelman, V. Ratnakar, J. Kim, Y. Gil, M. Hall, T. Kurc, J. Saltz, Proceedings of the International Symposium on High Performance Distributed Computing, June, 2009.
  10. “Model-Guided Autotuning of High-Productivity Languages for Petascale Computing,” H. Zima, M. Hall, C. Chen, J. Chame, In Proceedings of the International Symposium on High Performance Distributed Computing, June, 2009.
  11. “A Scalable Autotuning Framework for Compiler Optimization,” A. Tiwari, C. Chen, J. Chame, M. Hall and J. K. Hollingsworth, In Proceedings of the International Parallel and Distributed Processing Symposium, May, 2009.
  1. “A Programming Language Interface to Describe Transformations and Code Generation,” G. Rudy, M. Khan, M. Hall, C. Chen and J. Chame, Lecture Notes in Computer Science, 2011, Volume 6548, Languages and Compilers for Parallel Computing, Springer Verlag, Pages 136-150
  2. “Languages and Compilers for Autotuning,” M.W. Hall and J. Chame, In Performance Tuning of Scientific Applications, edited by David Bailey, Robert F. Lucas and Sam Williams. Taylor and Francis publishers, Nov. 2010.
  3. “Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, Software Automatic Tuning: from concepts to state-of-the-art results, edited by Keita Teranishi, John Cavazos, Ken Naono and Reiji Suda, Springer-Verlag Publishers, 2010, Pages 353-370
  4. “Loop Transformation Recipes for Code Generation and Auto-Tuning,” Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin and Gabe Rudy, Lecture Notes in Computer Science, 2010, Volume 5898, Languages and Compilers for Parallel Computing, Springer-Verlag, Pages 50-64
  1. “Improving High-Performance Sparse Libraries using Compiler-Assisted Specialization : A PETSc Case Study,” Shreyas Ramalingam, M. Hall and C. Chen, Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), held in conjunction with International Parallel and Distributed Processing Symposium, May 2012
  2. “Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures,” G. S. Sachdev, K. Sudan, M. W. Hall, and R. Balasubramonian, (poster paper), In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Oct. 2011
  3. “Generating High Performance Libraries using CHiLL and Autotuning,” S. Ramalingam and M. Hall, (poster), International Workshop on Languages and Compilers for Parallel Computing, Sept. 2011
  4. “Evaluating graph coloring on GPUs,” P. Grosset, P. Zhu, S. Liu, S. Venkatasubramanian, and M. Hall. In Proceedings of the 16th ACM symposium on Principles and practice of parallel programming (PPoPP ’11), Feb. 2011. Received runner-up for Best Student Poster.
  5. “CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries,” M. Khan, G. Rudy, C. Chen, M. Hall, J. Chame, (poster) SC’10, Nov. 2010
  6. “Automatic High-Performance GPU code Generation using CUDA-CHiLL”, (poster) Malik Khan, Jacqueline Chame, Gabe Rudy, Chun Chen, Mary Hall, Mark Hall, Nvidia GPU Technology Conference, Sept. 2010
  7. “Takagi Factorization on GPU using CUDA,” (poster paper), Gagandeep S. Sachdev, Vishay Vanjani and Mary W. Hall, Symposium on Application Accelerators for High Performance Computing, July, 2010
  8. “GPU Accelerated Particle System for Triangulated Surface Meshes,” (poster paper), B. Peterson, M. Datar, M. Hall and R. Whitaker, Symposium on Application Accelerators for High Performance Computing, July, 2010.
  9. “Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology,” (poster) J. Shin, M. W. Hall, J. Chame, C. Chen, P. F. Fischer, P. D. Hovland, SC’09, Nov. 2009
  10. “Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, International Workshop on Automatic Performance Tuning, October, 2009
  11. “Assembling Large Mosaics of Electron Microscope Images using GPU,” (poster paper) Kannan Venkataraju, Mark Kim, Dan Gerszewski, James R. Anderson, and Mary Hall, Symposium on Application Accelerators for High Performance Computing, July, 2009.
  12. “Computation reuse in domain-specific optimization of signal recognition,” (poster paper) Melina Demertzi, Pedro C. Diniz, Mary W. Hall, Anna C. Gilbert, and Yi Wang, In Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays (FPGA ’09), Feb. 2009, p. 281.
  13. “Model-Guided Performance Tuning of Parameter Values: A Case Study with Molecular Dynamics Visualization,” Y. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman, Proceedings of the Workshop on High-Level Parallel Programming Models and Supportive Environments, held in conjunction with IPDPS ’08, April, 2008.
  1. “Loop and Data Transformations for Sparse Matrix Code,” Anand Venkat, Mary Hall, Michelle Strout, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June, 2015.
  2. “Nitro: A Framework for Adaptive Code Variant Tuning,” S. Muralidharan, M. Shantharam, M. Hall, M. Garland, B. Catanzaro, Proceedings of the International Parallel and Distributed Processing Symposium, May, 2014.
  3. “Non-affine Extensions to Polyhedral Code Generation,” A. Venkat, M. Shantharam, M. Hall, M. M. Strout, Proceedings of the International Conference on Code Generation and Optimization, Feb. 2014.
  4. “Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid,” P. Basu, S. Williams, B. Van Straalen, A. Venkat, L. Oliker, M. Hall, High Performance Computing Conference (HiPC), December 2013.
  5. “Towards Making Autotuning Mainstream,” P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, International Journal of High Performance Computing Applications, 27(4), November 2013.
  6. “A script-based autotuning compiler system to generate high-performance CUDA code,” M. Khan, P. Basu, G. Rudy, M. Hall, C. Chen, and J. Chame. ACM Transactions on Architecture and Code Optimization, 9(4), January 2013.
  7. “Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters,” H. Dursun, M. Kunaseth, K. Nomura, J. Chame, R.F. Lucas, C. Chen, M. Hall, R.K. Kalia, A. Nakano, P. Vashishta, The Journal of Supercomputing, 62(2):946-966, December 2012.
  8. “Understanding ACM’s Past,” M. Hall, Communications of the ACM, 55(12), December 2012.
  9. “Improving High-Performance Sparse Libraries using Compiler-Assisted Specialization : A PETSc Case Study,” Shreyas Ramalingam, M. Hall and C. Chen, Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), held in conjunction with International Parallel and Distributed Processing Symposium, May 2012
  10. “Analyzing the effect of compiler optimizations on application reliability,” M. Demertzi, M. Annavaram and M. Hall, Proceedings of the IEEE International Symposium on Workload Characterization, Nov., 2011.
  11. “Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures,” G. S. Sachdev, K. Sudan, M. W. Hall, and R. Balasubramonian, (poster paper), In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Oct. 2011
  12. “Generating High Performance Libraries using CHiLL and Autotuning,” S. Ramalingam and M. Hall, (poster), International Workshop on Languages and Compilers for Parallel Computing, Sept. 2011
  13. “Auto-tuning Full Applications: A Case Study,” A. Tiwari, C. Chen, C. Liao, J. Chame, J. Hollingsworth, M. Hall and D. Quinlan, International Journal of High Performance Computing Applications, 25(3):286-294, Aug. 2011.
  14. “Domain-Specific Optimization of Signal Recognition Targeting FPGAs,” M. Demertzi, P.C. Diniz, M.W. Hall, A.C. Gilbert and Y. Wang, ACM Transactions on Reconfigurable Technology and Systems, 4(2), May, 2011.
  15. “Evaluating graph coloring on GPUs,” P. Grosset, P. Zhu, S. Liu, S. Venkatasubramanian, and M. Hall. In Proceedings of the 16th ACM symposium on Principles and practice of parallel programming (PPoPP ’11), Feb. 2011. Received runner-up for Best Student Poster.
  16. “EigenCFA: Accelerating Flow Analysis with GPUs,” T. Prabhu, S. Ramalingam, M. Might, M. Hall, In ACM SIGPLAN Principles of Programming Languages, Jan. 2011.
  17. “A Programming Language Interface to Describe Transformations and Code Generation,” G. Rudy, M. Khan, M. Hall, C. Chen and J. Chame, Lecture Notes in Computer Science, 2011, Volume 6548, Languages and Compilers for Parallel Computing, Springer Verlag, Pages 136-150
  18. “Languages and Compilers for Autotuning,” M.W. Hall and J. Chame, In Performance Tuning of Scientific Applications, edited by David Bailey, Robert F. Lucas and Sam Williams. Taylor and Francis publishers, Nov. 2010.
  19. “CUDA-CHiLL: Using Compiler-Based Autotuning to Generate High-Performance GPU Libraries,” M. Khan, G. Rudy, C. Chen, M. Hall, J. Chame, (poster) SC’10, Nov. 2010
  20. “Automatic High-Performance GPU code Generation using CUDA-CHiLL”, (poster) Malik Khan, Jacqueline Chame, Gabe Rudy, Chun Chen, Mary Hall, Mark Hall, Nvidia GPU Technology Conference, Sept. 2010
  21. “CUDA-CHILL: A PROGRAMMING LANGUAGE INTERFACE FOR GPGPU OPTIMIZATIONS AND CODE GENERATION”, Gabe Rudy, Master’s thesis, 2010.
  22. “Takagi Factorization on GPU using CUDA,” (poster paper), Gagandeep S. Sachdev, Vishay Vanjani and Mary W. Hall, Symposium on Application Accelerators for High Performance Computing, July, 2010
  23. “GPU Accelerated Particle System for Triangulated Surface Meshes,” (poster paper), B. Peterson, M. Datar, M. Hall and R. Whitaker, Symposium on Application Accelerators for High Performance Computing, July, 2010.
  24. “Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul Fischer, Paul D. Hovland, International Conference on Supercomputing, June, 2010.
  25. “Parameterized specification, configuration and execution of data-intensive scientific workflows,” V.S. Kumar, T. Kurc, V. Ratnakar, J. Kim, G. Mehta, K. Vahi, Y.L. Nelson, P. Sadayappan, E. Deelman, Y. Gil, M. Hall and J. Saltz, Cluster Computing, April 2010.
  26. “Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, Software Automatic Tuning: from concepts to state-of-the-art results, edited by Keita Teranishi, John Cavazos, Ken Naono and Reiji Suda, Springer-Verlag Publishers, 2010, Pages 353-370
  27. “Loop Transformation Recipes for Code Generation and Auto-Tuning,” Mary Hall, Jacqueline Chame, Chun Chen, Jaewook Shin and Gabe Rudy, Lecture Notes in Computer Science, 2010, Volume 5898, Languages and Compilers for Parallel Computing, Springer-Verlag, Pages 50-64
  28. “Autotuning and Specialization: Speeding up Nek5000 with Compiler Technology,” (poster) J. Shin, M. W. Hall, J. Chame, C. Chen, P. F. Fischer, P. D. Hovland, SC’09, Nov. 2009
  29. “Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology,” Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul D. Hovland, International Workshop on Automatic Performance Tuning, October, 2009
  30. “GPU Acceleration of the Generalized Interpolation Material Point Method,” W. Chiang, M. DeLisi, T. Hummel, T. Prete, K. Tew, M. Hall, P. Wallstedt, and J. Guilkey, Symposium on Application Accelerators for High Performance Computing, July, 2009.
  31. “Assembling Large Mosaics of Electron Microscope Images using GPU,” (poster paper) Kannan Venkataraju, Mark Kim, Dan Gerszewski, James R. Anderson, and Mary Hall, Symposium on Application Accelerators for High Performance Computing, July, 2009.
  32. “An Integrated Framework for Parameter-based Optimization of Scientific Workflows,” V. S. Kumar, P. Sadayappan, G. Mehta, K. Vahi, E. Deelman, V. Ratnakar, J. Kim, Y. Gil, M. Hall, T. Kurc, J. Saltz, Proceedings of the International Symposium on High Performance Distributed Computing, June, 2009.
  33. “Model-Guided Autotuning of High-Productivity Languages for Petascale Computing,” H. Zima, M. Hall, C. Chen, J. Chame, In Proceedings of the International Symposium on High Performance Distributed Computing, June, 2009.
  34. “A Scalable Autotuning Framework for Compiler Optimization,” A. Tiwari, C. Chen, J. Chame, M. Hall and J. K. Hollingsworth, In Proceedings of the International Parallel and Distributed Processing Symposium, May, 2009.
  35. “HPC and Grid Computing for Integrative Biomedical Research,” T. Kurc, S. Hastings, V. Kumar, S. Langella, A. Sharma, T. Pan, S. Oster, D. Ervin, J. Permar, S. Narayanan, Y. Gil, E. Deelman, M. Hall, J. Saltz, International Journal of High Performance Computing Applications, 2009.
  36. “Compiler Research: The Next Fifty Years,” M. Hall, D. Padua and K. Pingali, Communications of the ACM, Feb. 2009.
  37. “Computation reuse in domain-specific optimization of signal recognition,” (poster paper) Melina Demertzi, Pedro C. Diniz, Mary W. Hall, Anna C. Gilbert, and Yi Wang, In Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays (FPGA ’09), Feb. 2009, p. 281.
  38. “Evaluating Compiler Technology for Control-Flow Optimizations for Multimedia Extension Architectures,” J. Shin, M. Hall and J. Chame. Award paper invited from MSP 7 International Journal of Embedded Systems, 2009.
  39. “PERI Auto-Tuning,” David H. Bailey, Jacqueline Chame, Chun Chen, Jack Dongarra, Mary Hall, Jeffrey K. Hollingsworth, Paul Hovland, Shirley Moore, Keith Seymour, Jaewook Shin, Ananta Tiwari, Sam Williams, Haihang You, Journal of Physics: Conference Series, Vol. 125, 2008.
  40. “Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques,” M. Hall, Y. Gil and R. Lucas. Proceedings of the IEEE, Special Issue on Cutting-Edge Computing, Vol. 96(5), May 2008.
  41. “Model-Guided Performance Tuning of Parameter Values: A Case Study with Molecular Dynamics Visualization,” Y. Nelson, B. Bansal, M. Hall, A. Nakano, and K. Lerman, Proceedings of the Workshop on High-Level Parallel Programming Models and Supportive Environments, held in conjunction with IPDPS ’08, April, 2008.