Title |
Advanced Computing & Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers |
Abstract |
In this talk, we present our ongoing research project, whose objective is to develop advanced computing and optimization infrastructures for extremely large-scale graphs on post peta-scale supercomputers. We describe our challenge to the Graph500 and Green Graph 500 benchmarks, which are designed to measure the performance of a computer system for applications that require irregular memory and network access patterns. The first Graph500 list was released in November 2010. The Graph500 benchmark measures the performance of a supercomputer performing a BFS (Breadth-First Search) in terms of traversed edges per second (TEPS). In 2012, we implemented the world’s first GPU-based BFS on the TSUBAME 2.0 supercomputer at Tokyo Institute of Technology. The Green Graph 500 list collects TEPS-per-watt metrics. In 2014, our project team won the 8th Graph500 and the 3rd Green Graph 500 benchmarks. We also present our parallel implementation for large-scale SDP (SemiDefinite Programming) problems. We solved the largest SDP problem to date (over 2.33 million constraints), thereby setting a new world record. Our implementation also achieved 1.774 PFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs on the TSUBAME 2.5 supercomputer.
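For readers unfamiliar with the metric, the following minimal Python sketch (a toy in-memory graph, not the GPU-based TSUBAME implementation described above) runs a level-synchronous BFS and reports a TEPS figure as the number of edges scanned divided by elapsed time; the official Graph500 rule counts the edges of the traversed component, which this simplifies.

import time
from collections import deque

def bfs_teps(adj, source):
    # Level-synchronous BFS over an adjacency list; returns the BFS parent
    # map and a TEPS figure (edges scanned / elapsed seconds).
    parent = {source: source}
    frontier = deque([source])
    scanned = 0
    start = time.perf_counter()
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            scanned += 1              # count every scanned edge (simplified rule)
            if w not in parent:       # first visit: record a BFS tree edge
                parent[w] = v
                frontier.append(w)
    elapsed = time.perf_counter() - start
    return parent, scanned / elapsed

# Tiny undirected example graph stored as an adjacency list.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parents, teps = bfs_teps(adj, source=0)
print(parents, round(teps), "TEPS")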
|
Speaker |
Katsuki Fujisawa |
Affiliation |
Institute of Mathematics for Industry, Kyushu University |
Biography |
Katsuki Fujisawa has been a Full Professor at the Institute of Mathematics for Industry (IMI) of Kyushu University and a research director of the JST (Japan Science and Technology Agency) CREST (Core Research for Evolutional Science and Technology) project on post-peta-scale high performance computing. He received his Ph.D. from the Tokyo Institute of Technology in 1998. The objective of the JST CREST project is to develop an advanced computing and optimization infrastructure for extremely large-scale graphs on post peta-scale supercomputers. His project team has challenged the Graph500 and Green Graph500 benchmarks, which are designed to measure the performance of a computer system for applications that require irregular memory and network access patterns. In 2014 and 2015, his project team won the 8th and 10th Graph500 and the 3rd and 5th Green Graph500 benchmarks, respectively. |
Title |
Inevitable Convergence of Big Data and HPC |
Abstract |
Big data applications such as healthcare, systems biology, social networks, business intelligence, and electric power grids require fast and scalable data analytics capability, posing significant opportunities for HPC, as evidenced by recent attention to the Graph500 and Green Graph500 lists. In order to cope with the massive capacity requirements of such big data applications, emerging NVM (Non-Volatile Memory) devices, such as Flash, offer low cost and high energy efficiency compared to conventional DRAM devices, at the expense of lower throughput and higher latency, requiring a deepening of the memory hierarchy.
Our recent projects "EBD (Extreme Big Data)" and "ScaleGraph" aim to come up with a big data / HPC convergence architecture that provides the necessary algorithms and abstractions. In particular, we devised a novel graph data offloading technique using NVMs for the hybrid BFS (Breadth-First Search) algorithm widely used in the Graph500 benchmark (a simplified sketch of the hybrid algorithm follows below).
We show that our approach can achieve performance competitive with the conventional DRAM-only approach while aggressively extending the memory footprint onto NVMs. We are also developing scalable graph and other big data libraries, based on languages such as X10, as instances of abstraction for extreme-scale computing systems with massive parallelism and deep memory hierarchies, in co-design with key applications such as genomics/HPC MapReduce, weather/data assimilation, social simulation/graphs, and image recognition/scalable deep learning. The results will be incorporated into our next-generation TSUBAME3.0, a big-data supercomputer to be commissioned by November 2016. |
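The following plain-Python sketch illustrates the direction-optimizing (hybrid) BFS idea that the offloading technique targets: the search switches between top-down frontier expansion and bottom-up parent search depending on the frontier size. The switching threshold alpha is a simplification of the usual edge-count heuristic, and the graph is kept entirely in memory here; this is an illustration under those assumptions, not the EBD/ScaleGraph code.

def hybrid_bfs(adj, source, alpha=14):
    # adj: list of neighbor lists indexed 0..n-1, held in DRAM here; the
    # talk's technique additionally offloads parts of this structure to NVM.
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    frontier = {source}
    while frontier:
        next_frontier = set()
        if len(frontier) * alpha > n:
            # Bottom-up step: each unvisited vertex searches the frontier for a
            # parent, which is cheaper than pushing when the frontier is large.
            for v in range(n):
                if parent[v] == -1:
                    for w in adj[v]:
                        if w in frontier:
                            parent[v] = w
                            next_frontier.add(v)
                            break
        else:
            # Top-down step: frontier vertices claim their unvisited neighbors.
            for v in frontier:
                for w in adj[v]:
                    if parent[w] == -1:
                        parent[w] = v
                        next_frontier.add(w)
        frontier = next_frontier
    return parent

# Example on a small undirected graph.
print(hybrid_bfs([[1, 2], [0, 3], [0, 3], [1, 2]], source=0))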
Speaker |
Satoshi Matsuoka |
Affiliation |
GSIC, Tokyo Institute of Technology |
Biography |
Satoshi Matsuoka has been a Full Professor at the Global Scientific Information and Computing Center (GSIC), a Japanese national supercomputing center hosted by the Tokyo Institute of Technology, since 2001. He received his Ph.D. from the University of Tokyo in 1993. He is the leader of the TSUBAME series of supercomputers, including TSUBAME2.0, which was the first supercomputer in Japan to exceed petaflop performance and became the 4th fastest in the world on the Top500 in Nov. 2010, as well as the recent TSUBAME-KFC, which became #1 in the world for power efficiency on both the Green 500 and Green Graph 500 lists in Nov. 2013. He is also currently leading several major supercomputing research projects, such as the MEXT Green Supercomputing, JSPS Billion-Scale Supercomputer Resilience, and JST-CREST Extreme Big Data projects. He has written over 500 articles according to Google Scholar and has chaired numerous ACM/IEEE conferences, most recently as the overall Technical Program Chair of the ACM/IEEE Supercomputing Conference (SC13) in 2013. He is a fellow of the ACM and the European ISC and has won many awards, including the JSPS Prize from the Japan Society for the Promotion of Science in 2006, awarded by His Highness Prince Akishino, the ACM Gordon Bell Prize in 2011, the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2012, and most recently the 2014 IEEE-CS Sidney Fernbach Memorial Award, the highest prestige in the field of HPC. |
Title |
Towards A Runtime Framework for Dynamic Adaptive Applications |
Abstract |
Static programming methods based on MPI and OpenMP have successfully served a broad applications community for over two decades. But with core counts numbering in the millions and multi-scale, multi-physics applications becoming ever more prevalent, dynamic application algorithms are requiring greater runtime support from HPC systems even as the community sets its sights on exascale. ParalleX is a conceptual model for the support of dynamic adaptive execution. This presentation describes the HPX-5 runtime system, which supports guided computing through a reduction to practice of the ParalleX execution model. New results will be presented for such widely used codes as LULESH and the HPCG benchmark. A feature of this implementation is that it is performance portable, with no changes to code required as the scale of target systems changes. Nonetheless, runtime system overheads can impede efficiency and scaling even as the runtime is intended to improve both. This tradeoff will be discussed as part of this presentation. |
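As a generic illustration of the contrast with bulk-synchronous execution (this is ordinary Python with concurrent.futures, not ParalleX semantics or the HPX-5 API), the sketch below drives an adaptive quadrature with an event-driven pool of futures: new tasks are spawned only where the local error estimate demands refinement, so the task graph is discovered at run time rather than fixed in advance.

import math
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def estimate(f, a, b):
    # Crude trapezoid and midpoint estimates of the integral on [a, b];
    # their difference serves as a local error indicator.
    trap = 0.5 * (b - a) * (f(a) + f(b))
    mid = (b - a) * f(0.5 * (a + b))
    return trap, mid

def adaptive_integrate(f, a, b, tol=1e-6):
    # Work is generated dynamically: each finished task either contributes to
    # the total or spawns two refinement tasks, an irregular pattern that a
    # dynamic runtime is meant to schedule efficiently.
    total = 0.0
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {pool.submit(estimate, f, a, b): (a, b)}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                a0, b0 = pending.pop(fut)
                trap, mid = fut.result()
                if abs(trap - mid) < tol * (b0 - a0):
                    total += mid                  # converged: accumulate
                else:                             # refine: spawn two child tasks
                    m = 0.5 * (a0 + b0)
                    pending[pool.submit(estimate, f, a0, m)] = (a0, m)
                    pending[pool.submit(estimate, f, m, b0)] = (m, b0)
    return total

print(adaptive_integrate(math.sin, 0.0, math.pi))  # approximately 2.0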
Speaker |
Thomas Sterling |
Affiliation |
Indiana University, Caltech, ORNL |
Biography |
Dr. Thomas Sterling holds the position of Professor of Intelligent Systems Engineering at the Indiana University (IU) School of Informatics and Computing as well as Chief Scientist and Associate Director of the Center for Research in Extreme Scale Technologies (CREST). Since receiving his Ph.D. from MIT in 1984 as a Hertz Fellow, Dr. Sterling has engaged in applied research in fields associated with parallel computing system structures, semantics, and operation in industry, government labs, and academia. Dr. Sterling is best known as the "father of Beowulf" for his pioneering research in commodity/Linux cluster computing. He was awarded the Gordon Bell Prize in 1997 with his collaborators for this work. He was the PI of the HTMT Project sponsored by NSF, DARPA, NSA, and NASA to explore advanced technologies and their implication for high-end system architectures. Other research projects included the DARPA DIVA PIM architecture project with USC-ISI, the Cray Cascade Petaflops architecture project sponsored by the DARPA HPCS Program, and the Gilgamesh high-density computing project at NASA JPL. Thomas Sterling is currently engaged in research associated with the innovative ParalleX execution model for extreme-scale computing to establish the foundation principles guiding the co-design of future-generation exascale computing systems by the end of this decade. ParalleX is currently the conceptual centerpiece of the XPRESS project as part of the DOE X-stack program and has been demonstrated in proof-of-concept in the HPX runtime system software. Dr. Sterling is the co-author of six books and holds six patents. He was the recipient of the 2013 Vanguard Award. In 2014, he was named a fellow of the American Association for the Advancement of Science. |
Title |
Application-Centric Overlay Cloud Utilizing Inter-Cloud |
Abstract |
In this talk, we present an overlay cloud architecture for building virtual clouds on the Inter-Cloud. In the overlay cloud architecture, the middleware, Virtual Cloud Provider (VCP), overlays virtualized computing resources on physical cloud computing resources using Linux container technology and overlays virtual network resources on networks such as the Internet. VCP enables users to automatically build customized computing environments for their applications. We also present our R&D plan for the technologies that realize the overlay cloud architecture, including middleware development, resource selection strategies, applications (Genome Sequencing and Fluid Acoustic Simulation), and testbed operation. |
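The hypothetical Python sketch below conveys the flavor of what VCP automates, using the Docker SDK for Python as a stand-in for the container layer; the network driver, image, and names are illustrative assumptions, not the actual VCP interfaces (a real Inter-Cloud deployment would span multiple providers over an overlay or VPN network rather than a local bridge).

import docker  # Docker SDK for Python, standing in for VCP's container layer

def build_virtual_node(client, image, network_name, node_name):
    # Attach one application container to a shared virtual network.
    # A local bridge network keeps the sketch runnable on one host.
    if not client.networks.list(names=[network_name]):
        client.networks.create(network_name, driver="bridge")
    return client.containers.run(
        image, "sleep infinity",
        name=node_name, network=network_name, detach=True,
    )

# Hypothetical two-node virtual cluster; in practice the image would be the
# user's application environment (e.g., a genome-sequencing pipeline).
client = docker.from_env()
nodes = [build_virtual_node(client, "ubuntu:22.04", "vcp-net", f"vcp-worker-{i}")
         for i in range(2)]
print([n.name for n in nodes])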
Speaker |
Kento Aida |
Affiliation |
National Institute of Informatics |
Biography |
Kento Aida received his Dr. Eng. in electrical engineering from Waseda University in 1997. He joined the Tokyo Institute of Technology as a research scientist in 1997, becoming an assistant professor in 1999 and an associate professor in 2003. He is now a professor at the National Institute of Informatics and, since 2007, a visiting professor at the Department of Information Processing, Tokyo Institute of Technology. |
Title |
Data-intensive and Simulation-intensive Computing: Parallel or Perpendicular |
Abstract |
To extend the scientific benefits of high performance computing to the exascale, the hardware requirements are daunting, but the algorithmic adaptations required to migrate today’s successful “bulk synchronous” open-source parallel scientific software base to the anticipated exascale environment are more so. They include: (1) reducing synchronization scope and frequency, (2) reducing memory traffic per core, (3) exploiting more instruction-uniform concurrency (as in GPUs), and (4) relying more on algorithms for fault tolerance than on expensive hardware redundancy. We briefly recap the architectural constraints and map onto these directions some ongoing research being jointly pursued in PhD theses at the Extreme Computing Research Center (ECRC) at KAUST and the Global Scientific Information and Computing Center (GSIC) at Tokyo Tech.
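As a toy illustration of direction (1), the NumPy sketch below runs a Jacobi iteration that evaluates the global residual only every few sweeps; in a distributed-memory code that check is a global reduction, so deferring it reduces synchronization frequency. This is an illustrative sketch, not code drawn from the ECRC or GSIC theses.

import numpy as np

def jacobi(A, b, tol=1e-8, check_every=10, max_iter=10000):
    # Jacobi sweeps with the convergence test (a global reduction in a
    # distributed setting) performed only every `check_every` iterations.
    D = np.diag(A)
    R = A - np.diagflat(D)
    x = np.zeros_like(b)
    for k in range(1, max_iter + 1):
        x = (b - R @ x) / D
        if k % check_every == 0 and np.linalg.norm(b - A @ x) < tol:
            break
    return x

# Small diagonally dominant test system.
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b))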
|
Speaker |
David Keyes |
Affiliation |
Extreme Computing Research Center, KAUST |
Biography |
David Keyes directs the Extreme Computing Research Center at KAUST. He earned a BSE in Aerospace and Mechanical Sciences from Princeton in 1978 and a PhD in Applied Mathematics from Harvard in 1984. Keyes works at the interface between parallel computing and the numerical analysis of PDEs, with a focus on scalable implicit solvers. He helped name and popularize the Newton-Krylov-Schwarz (NKS), Additive Schwarz Preconditioned Inexact Newton (ASPIN), and Algebraic Fast Multipole (AFM) methods. Before joining KAUST as a founding dean in 2009, he led scalable solver software projects in the SciDAC and ASCI programs of the US DOE, ran university collaboration programs at LLNL’s ISCR and NASA’s ICASE, and taught at Columbia, Old Dominion, and Yale Universities. He is a Fellow of SIAM and AMS, and has been awarded the ACM Gordon Bell Prize, the IEEE Sidney Fernbach Award, and the SIAM Prize for Distinguished Service to the Profession.
|