Tuesday, Nov. 13
11:30 ~ 12:00 | Dynamic Load Balance of Large-scale Applications on a GPU-supercomputer TSUBAME2.0 | Takayuki Aoki (GSIC, Tokyo Institute of Technology)
13:30 ~ 14:00 | Physis: An Implicitly Parallel Stencil Framework for Heterogeneous Machines | Naoya Maruyama (RIKEN AICS)
15:30 ~ 16:00 | High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems | Katsuki Fujisawa (Chuo University & JST CREST)

Wednesday, Nov. 14
11:00 ~ 11:30 | Ultra-fast Metagenomic Data Analysis Pipeline on TSUBAME2.0 | Masanori Kakuta (Graduate School of Information Science and Engineering, Tokyo Institute of Technology)
13:00 ~ 13:30 | Using Hardware Design Trends to Inform Development of Future Energy Efficient Programming Models | John Shalf (Department Head for Computer Science, Computing Research Division, Lawrence Berkeley National Laboratory)
14:00 ~ 14:30 | ULPHPC: Ultra Low Power Supercomputing to Achieve 1000-fold Power Efficiency Improvement in 10 Years | Satoshi Matsuoka (GSIC, Tokyo Institute of Technology)
15:00 ~ 15:30 | New Directions in Extreme-Scale Operating Systems and Runtime Software | Pete Beckman (Argonne National Laboratory)
Title | Dynamic Load Balance of Large-scale Applications on a GPU-supercomputer TSUBAME2.0 |
Abstract | Stencil applications such as CFD (Computational Fluid Dynamics), structural analysis, and wave propagation, as well as particle simulations based on short-range interactions, have local memory access patterns. Domain decomposition is applied to them to minimize data communication between neighboring nodes. However, AMR (Adaptive Mesh Refinement) changes the resolution non-uniformly, and particle simulations change their spatial distributions dynamically. To keep the computation equal across nodes, dynamic load balancing must be introduced. Recent progress on TSUBAME 2.0 will be shown for large-scale applications. (A minimal stencil sketch follows this entry.)
Speaker | Takayuki Aoki |
Affiliation | GSIC, Tokyo Institute of Technology |
Biography | Takayuki Aoki received a BSc in Applied Physics (1983), an MSc in Energy Science, and a Dr.Sci. (1989) from Tokyo Institute of Technology. He was a Visiting Fellow at Cornell University and at the Max Planck Institute in Germany for one year, has been a professor at Tokyo Institute of Technology since 2001, and has served as deputy director of the Global Scientific Information and Computing Center since 2009. He received the Computational Mechanics Achievement Award from the Japan Society of Mechanical Engineers, along with many other awards and honors in GPU computing, scientific visualization, and related fields. His team won the 2011 Gordon Bell Prize (Special Achievement in Scalability and Time-to-Solution), and he was recognized as a CUDA Fellow by NVIDIA in 2012.
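The abstract above notes that stencil applications touch only neighboring cells, which is what makes domain decomposition with halo exchange effective. As a rough illustration (not code from the talk; the kernel name, grid layout, and coefficients are assumptions), a 5-point Jacobi stencil in CUDA might look like the following:

    // Minimal 2D Jacobi (5-point) stencil sketch: each output cell reads only
    // its four nearest neighbors, i.e. purely local memory accesses.
    __global__ void jacobi5pt(const float *in, float *out, int nx, int ny)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;

        // Update interior points only; the one-cell-deep boundary (halo) is
        // refreshed each step by exchanging data with neighboring nodes,
        // which is the domain-decomposition step mentioned in the abstract.
        if (i > 0 && i < nx - 1 && j > 0 && j < ny - 1) {
            out[j * nx + i] = 0.25f * (in[j * nx + (i - 1)] + in[j * nx + (i + 1)] +
                                       in[(j - 1) * nx + i] + in[(j + 1) * nx + i]);
        }
    }

Because each update depends only on nearest neighbors, splitting the grid into per-node subdomains requires exchanging only a thin halo per step; when AMR or particle motion makes the subdomains' costs unequal, the decomposition itself has to be adjusted at run time, which is the dynamic load balancing the talk addresses.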
Title | Physis: An Implicitly Parallel Stencil Framework for Heterogeneous Machines |
Abstract | Physis is an application framework for stencil computations that is designed to achieve both performance and productivity in large-scale parallel computing systems, in particular heterogeneous GPU-accelerated supercomputers. The framework consists of an implicitly parallel domain-specific language for stencil computations and its translators for various target parallel architectures. This talk presents the current status of the framework and its performance studies using several application benchmarks. |
Speaker | Naoya Maruyama |
Affiliation | RIKEN AICS |
Biography | Naoya Maruyama is a Team Leader at the Advanced Institute for Computational Science, RIKEN, where he leads the HPC Programming Framework Research Team. Since receiving his Ph.D. from Tokyo Institute of Technology in 2008, he has worked in various research fields related to high performance computing. His team at AICS currently focuses on high performance software stacks that simplify the development of highly optimized, fault-tolerant computational science applications on current and future supercomputers, notably the K computer at RIKEN. He is also the PI of a five-year JST CREST research project, "Highly Productive, High Performance Application Frameworks for Post Petascale Computing," and a co-PI of a Future HPC Infrastructure Feasibility Study, where his main role is to establish a benchmark set for next-generation leading supercomputers. He has published peer-reviewed papers in major international conferences and journals on high performance computing, particularly on heterogeneous computing, fault tolerance, and low power computing.
Title | High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems |
Abstract | Semidefinite programming (SDP) is one of the most important classes of optimization problems at present. It is relevant to a wide range of fields such as combinatorial optimization, structural optimization, control theory, economics, quantum chemistry, sensor network localization, and data mining. The capability to solve extremely large-scale SDP problems will have a significant effect on the current and future applications of SDP. In 1995, Fujisawa et al. started the SDPA (SemiDefinite Programming Algorithm) Project, aimed at solving large-scale SDP problems with high numerical stability and accuracy. SDPA is one of the main codes for solving general SDPs. SDPARA is a parallel version of SDPA for distributed-memory multiprocessors; it replaces the two major bottlenecks of SDPA (the generation of the Schur complement matrix and its Cholesky factorization) with parallel implementations. In particular, it has been successfully applied to combinatorial optimization and truss topology optimization. The new version of SDPARA (7.5.0-G), running on the large-scale supercomputer TSUBAME 2.0 at the Tokyo Institute of Technology, has been used to solve the largest SDP problem to date (over 1.48 million constraints), setting a new world record. Our implementation has also achieved 533 TFlops in double precision for large-scale Cholesky factorization using 2,720 CPUs and 4,080 GPUs. (The standard SDP formulation is sketched after this entry for reference.)
Speaker | Katsuki Fujisawa |
Affiliation | Chuo University & JST CREST |
Biography | Katsuki Fujisawa is a Professor at Chuo University, Tokyo, Japan. He is also the leader of a JST (Japan Science and Technology Agency) CREST project whose objective is to develop an advanced computing and optimization infrastructure for extremely large-scale graphs on post-petascale supercomputers. Semidefinite programming (SDP) is one of the most important classes of optimization problems at present. In 1995, he started the SDPA (SemiDefinite Programming Algorithm) Project, aimed at solving large-scale SDP problems with high numerical stability and accuracy. Last April, the new SDP solver, running on the large-scale supercomputer TSUBAME 2.0 at the Tokyo Institute of Technology, was used to solve the largest SDP problem to date (over 1.48 million constraints), setting a new world record.
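For reference, and as a textbook formulation rather than material from the talk, the primal-dual SDP pair that solvers such as SDPA and SDPARA target can be written as follows, with symmetric data matrices C and A_k and m equality constraints (about 1.48 million in the record run mentioned above):

    % Standard primal-dual semidefinite programming pair (textbook form).
    % X and S are symmetric positive semidefinite matrices; y is the dual vector.
    \begin{align*}
    \text{(P)}\quad & \min_{X} \; \langle C, X \rangle
      && \text{s.t.}\ \langle A_k, X \rangle = b_k,\ k = 1,\dots,m,\quad X \succeq 0,\\
    \text{(D)}\quad & \max_{y,\,S} \; b^{\mathsf{T}} y
      && \text{s.t.}\ \sum_{k=1}^{m} y_k A_k + S = C,\quad S \succeq 0.
    \end{align*}

In interior-point methods, each iteration builds an m-by-m Schur complement matrix from these constraints and then solves the resulting linear system by Cholesky factorization; these are exactly the two bottleneck steps that SDPARA parallelizes, as described in the abstract.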
Title | Ultra-fast Metagenomic Data Analysis Pipeline on TSUBAME2.0 |
Abstract | Metagenomic analysis is the study of genomes extracted directly from environmental samples; it requires sensitive sequence similarity searches, which demand large amounts of computation time for gene prediction and metabolic reconstruction. Moreover, metagenomic data have rapidly grown in size and number in recent years owing to improvements in DNA sequencers, and computational analysis has become the bottleneck of metagenomic studies. To solve this problem, we developed an ultra-fast metagenomic data analysis pipeline on TSUBAME 2.0. To accelerate the most time-consuming step in this pipeline (i.e., sequence similarity search), we also developed GHOSTM, a GPU-accelerated homology search tool for metagenomics. This tool is sensitive enough for metagenomic analysis and fast enough to handle huge metagenomic data sets. The pipeline also provides an interactive interface with a visual summary: users can easily submit jobs, check their status, cancel them, and grasp the main points of the results.
Speaker | Masanori Kakuta |
Affiliation | Graduate School of Information Science and Engineering, Tokyo Institute of Technology |
Biography | Masanori Kakuta is a researcher at Tokyo Institute of Technology. He received his Ph.D. from the University of Tokyo in 2010. His current research interests are in the area of bioinformatics, in particular metagenomic data analysis. |
Title | Using Hardware Design Trends to Inform Development of Future Energy Efficient Programming Models |
Abstract | Our underlying machine model has presumed modest growth in parallelism, adherence to Amdahl ratios for speeds-and-feeds, weak scaling for parallelism, uniform distance between processing elements within a node and between nodes in a system, and conservation of the all-important FLOP in our algorithm and software design. However, the new reality is that parallelism (particularly on-chip) is growing exponentially, bandwidth and memory capacity are increasingly scarce and force us towards strong scaling, distance as measured by power and performance is increasingly non-uniform at all levels of the machine, and flops are now so heavily discounted that wasting any other resource to save a few flops is a losing proposition. The enormous mismatch between these emerging hardware realities and our ability to express concepts of locality and implicit parallelism in existing programming models subjects our codes to huge inefficiencies and reduced portability. Future programming models need to be designed to accommodate the first derivative of these emerging hardware trends (exponential parallelism growth, non-uniformity, SIMD width growth, and bandwidth conservation). The talk will discuss the evolution of this mismatch between hardware and abstract machine models, and a path to develop hardware constructs that enable more efficient implementation of core programming model semantics (for example, PGAS-on-chip and fast global reduction networks). It will also explore software constructs that better reflect the real costs and layout of an algorithm on the underlying machine architecture (for example, hierarchical data layout directives and programmable iteration spaces).
Speaker | John Shalf |
Affiliation | Department Head for Computer Science, Computing Research Division, Lawrence Berkeley National Laboratory |
Biography | John Shalf is Department Head for Computer Science in the Computing Research Division of Lawrence Berkeley National Laboratory and Chief Technology Officer of the National Energy Research Scientific Computing Center (NERSC).
Title | ULPHPC: Ultra Low Power Supercomputing to Achieve 1000-fold Power Efficiency Improvement in 10 Years |
Abstract | - |
Speaker | Satoshi Matsuoka |
Affiliation | GSIC, Tokyo Institute of Technology |
Biography | Satoshi Matsuoka is a Professor at the Global Scientific Information and Computing Center (GSIC) of Tokyo Institute of Technology. He is the leader of the TSUBAME series of supercomputers, which became the 4th fastest in the world on the Top500 and was awarded the "Greenest Production Supercomputer in the World" by the Green 500 in November 2010 and June 2011. He also co-led the Japanese national grid project NAREGI from 2003 to 2007, and is currently leading various projects such as the JST-CREST Ultra Low Power HPC and the JSPS Billion-Scale Supercomputer Resilience projects. He has authored over 500 papers according to Google Scholar and has chaired many ACM/IEEE conferences, including serving as Technical Papers Chair, Community Chair, and the upcoming Program Chair for the Supercomputing conferences SC09, SC11, and SC13, respectively. He is a fellow of the ACM and of the European ISC, and has won many awards, including the JSPS Prize from the Japan Society for the Promotion of Science in 2006, awarded by His Highness Prince Akishinomiya, the ACM Gordon Bell Prizes for 2011, and the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2012.
Title | New Directions in Extreme-Scale Operating Systems and Runtime Software |
Abstract | For more than a decade, extreme-scale operating systems and runtime software have been evolving very slowly. Today's large-scale systems use slightly retooled "node" operating systems glued together with ad hoc local agents to handle I/O, job launch, and management. These extreme-scale systems are only slightly more tightly integrated than are generic Linux clusters with InfiniBand. As we look forward to a new era for large-scale HPC systems, we see that power and fault management will become key design issues. Software management of power and support for resilience must now be part of the whole-system design. Extreme-scale operating systems and runtime software will not be simply today's node code with a few control interfaces, but rather a tightly integrated "global OS" that spans the entire platform and works cooperatively across portions of the machine in order to manage power and provide resilience. |
Speaker | Pete Beckman |
Affiliation | Argonne National Laboratory |
Biography | Pete Beckman is the founder and director of the Exascale Technology and Computing Institute at Argonne National Laboratory and the co-director of the Northwestern-Argonne Institute for Science and Engineering. From 2008 to 2010 he was the director of the Argonne Leadership Computing Facility, where he led the Argonne team working with IBM on the design of Mira, a 10-petaflops Blue Gene/Q, and helped found the International Exascale Software Project. Pete joined Argonne in 2002, serving first as director of engineering and later as chief architect for the TeraGrid, where he led the design and deployment team that created the world's most powerful Grid computing system for linking production HPC computing centers for the National Science Foundation. After the TeraGrid became fully operational, Pete started a research team focusing on petascale high-performance system software. As an industry leader, he founded a Turbolinux-sponsored research laboratory in 2000 that developed the world's first dynamic provisioning system for cloud computing and HPC clusters. The following year, Pete became vice president of Turbolinux's worldwide engineering efforts, managing development offices in the U.S., Japan, China, Korea, and Slovenia. Dr. Beckman has a Ph.D. in computer science from Indiana University (1993) and a B.A. in Computer Science, Physics, and Math from Anderson University (1985).