Training ImageNet on Thousands of GPUs
Rio Yokota / Tokyo Institute of Technology
Accelerated Deep Learning Advances in HPC
William Tang / Princeton University
A fast low-order implicit unstructured finite-element solver for earthquake problems using GPU-based computers
Takuma Yamaguchi (Gordon Bell Finalist) / The University of Tokyo
VDI as Visualization Server for TSUBAME Supercomputer
Atsushi Okawa / Tokyo Institute of Technology
Scientific Application Development and Early Results on Summit
Tjerk Straatsma / Oak Ridge National Laboratory
Title | Training ImageNet on Thousands of GPUs |
Speaker | Rio Yokota / Tokyo Institute of Technology |
Abstract | ImageNet has become a common benchmark for large-scale distributed deep learning, where teams at Facebook, UC Berkeley, and Preferred Networks have independently performed training runs on thousands of GPUs. The current state of the art trains ResNet-50 on ImageNet for 90 epochs in about 15 minutes. However, data-parallel implementations at this scale require very large batch sizes, which have a detrimental effect on both optimization and generalization. We are currently investigating alternative optimization methods that are less sensitive to the increase in batch size. Large-scale runs have been conducted on TSUBAME3.0 using 2,048 GPUs. |
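One widely used mitigation for the large-batch problem the abstract describes is the linear learning-rate scaling rule with gradual warmup (Goyal et al., 2017). The sketch below illustrates that general idea only; it is not the speaker's method, and the function names `scaled_lr` and `warmup_lr` are illustrative.

```python
# Minimal sketch of linear learning-rate scaling with warmup,
# a common recipe for large-batch ImageNet training (assumption:
# this is illustrative, not the TSUBAME3.0 runs' actual code).

def scaled_lr(base_lr, batch_size, base_batch_size=256):
    """Linear scaling rule: grow the LR in proportion to batch size."""
    return base_lr * batch_size / base_batch_size

def warmup_lr(target_lr, epoch, warmup_epochs=5):
    """Ramp the LR linearly over the first few epochs to aid stability."""
    if epoch < warmup_epochs:
        return target_lr * (epoch + 1) / warmup_epochs
    return target_lr

# Example: 2,048 GPUs x 32 images each = a 65,536-image global batch.
lr = scaled_lr(0.1, batch_size=65536)
for epoch in range(90):
    current_lr = warmup_lr(lr, epoch)
    # ... one epoch of synchronous SGD at current_lr ...
```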
Title | Accelerated Deep Learning Advances in HPC |
Speaker | William Tang / Princeton University |
Abstract | Recent HPC-relevant advances in the deployment of deep learning recurrent nets have been demonstrated in exciting scaling studies of Princeton's new deep learning code, FRNN (Fusion Recurrent Neural Net), on modern GPU systems. This is clearly a "big data" project in that it has direct access to the huge EUROFUSION/JET disruption database of over half a petabyte to drive these studies [1]. FRNN implements a distributed data-parallel synchronous stochastic gradient approach with TensorFlow [2] and Theano [3] libraries at the backend and MPI for communication. This deep learning software has recently demonstrated excellent scaling up to 6,000 GPUs on Titan, enabled by a 2017 Oak Ridge Leadership Computing Facility (OLCF) Director's Discretionary Award. This has enabled stimulating progress toward the goal of establishing the practical feasibility of using leadership-class supercomputers to greatly enhance the training of neural nets and enable transformational impact on key discovery-science application domains such as fusion energy science. Powerful systems targeted for near-future deployment of our deep learning software include: (1) Japan's new Tsubame 3 system with 3,000 P100 GPUs; (2) Switzerland's Piz Daint Cray XC50 system with its 4,500 P100 GPUs; and (3) OLCF's Summit-Dev system. In summary, statistical deep learning software trained on very large data sets holds exciting promise for delivering much-needed predictive tools capable of accelerating scientific knowledge discovery in HPC. The associated creative methods being developed also have significant potential for cross-cutting benefit to a number of important application areas in science and industry. |
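The abstract names the general pattern FRNN uses: distributed data-parallel synchronous SGD with MPI for communication. The sketch below shows that pattern with mpi4py and NumPy as an illustration only; the actual FRNN code uses TensorFlow and Theano backends, and `local_gradient` and `sync_sgd_step` are hypothetical names.

```python
# Minimal sketch of synchronous data-parallel SGD with MPI gradient
# averaging (assumption: illustrative pattern, not the FRNN code).
# Run with e.g.: mpirun -np 4 python sync_sgd.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def local_gradient(weights, x, y):
    """Toy least-squares gradient on this rank's shard of the data."""
    return 2.0 * x.T @ (x @ weights - y) / len(y)

def sync_sgd_step(weights, x, y, lr=0.01):
    grad = local_gradient(weights, x, y)
    avg = np.empty_like(grad)
    comm.Allreduce(grad, avg, op=MPI.SUM)  # sum gradients across ranks...
    avg /= comm.Get_size()                 # ...then average them
    return weights - lr * avg              # identical update on every rank
```

Because every rank applies the same averaged gradient, the model replicas stay in lockstep; the effective global batch is the sum of the per-rank batches, which is exactly why batch size grows with GPU count.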
Title | A fast low-order implicit unstructured finite-element solver for earthquake problems using GPU-based computers |
Speaker | Takuma Yamaguchi (Gordon Bell Finalist) / The University of Tokyo |
Abstract | To understand earthquake generation and propagation processes and to reduce damage, it is important to conduct numerical simulations that account for complex geometry and material heterogeneity. To handle the massive computational cost arising from large domains and high resolution, we propose a fast low-order finite-element solver accelerated by GPUs. In finite-element solvers, sparse matrix-vector multiplication is computationally expensive. The dense computation introduced by a time-parallel algorithm reduces random data access in the Element-by-Element method and attains a 2.2-fold speedup per vector over the original kernel when combined with the use of shared memory. Our proposed solver on Piz Daint is 2.79 times faster than the SC14 Gordon Bell finalist solver. We demonstrate a crustal deformation computation with a 2,403,562,056 degree-of-freedom finite-element model targeting the Eastern Mediterranean crust and mantle. The same techniques are applicable to earthquake simulation in urban environments, and this framework forms the basis of the solver we propose as a 2018 Gordon Bell Prize finalist. |
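The Element-by-Element (EBE) method the abstract mentions avoids assembling a global sparse matrix: the matrix-vector product is computed by applying small dense per-element matrices and scatter-adding the results. A minimal serial NumPy sketch of that pattern follows; it is not the GPU kernel, and `ebe_matvec` and its arguments are illustrative.

```python
# Minimal sketch of an Element-by-Element matrix-vector product
# (assumption: illustrative serial pattern, not the solver's GPU code).
import numpy as np

def ebe_matvec(element_mats, connectivity, x):
    """Compute y = A @ x without assembling the global matrix A.

    element_mats : (n_elem, k, k) dense per-element stiffness matrices
    connectivity : (n_elem, k) global DOF indices for each element
    x            : (n_dof,) input vector
    """
    y = np.zeros_like(x)
    for Ke, dofs in zip(element_mats, connectivity):
        # Gather local values, apply the small dense block,
        # then scatter-add into the global result (duplicates summed).
        np.add.at(y, dofs, Ke @ x[dofs])
    return y
```

On GPUs the per-element products map naturally onto dense, regular work, while the scatter-add is the source of the random data access the abstract says the time-parallel reformulation reduces.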
Title | VDI as Visualization Server for TSUBAME Supercomputer |
Speaker | Atsushi Okawa / Tokyo Institute of Technology |
Abstract | TSUBAME VDI (Virtual Desktop Infrastructure) is an experimental VDI system for evaluating high-performance VDI for supercomputers. Supercomputers generate large files of simulation results and other data on their storage systems. In general, users have to download those files to their local environments before checking or visualizing the results. A VDI system instead allows low-latency access to them in place. This talk includes a demonstration of remote access to the actual VDI system in Tokyo. |
Title | Scientific Application Development and Early Results on Summit |
Speaker | Tjerk Straatsma / Oak Ridge National Laboratory |
Abstract | Summit, the world's fastest supercomputer, located in the Oak Ridge Leadership Computing Facility at DOE's Oak Ridge National Laboratory, will provide unprecedented computational resources for open science supported by the DOE user programs. The unique aspects of its GPU-accelerated architecture are reviewed in this presentation. The collaborative efforts to prepare scientific modeling and simulation as well as data-intensive computing applications to take advantage of Summit's architectural features are highlighted, and early scientific results enabled by the porting and development work are presented. |