DORAS | DCU Research Repository

Resilient neural network training for accelerators with computing errors

Xu, Dawen, Xing, Kouzi, Liu, Cheng, Wang, Ying, Dai, Yulin, Cheng, Long (ORCID: 0000-0003-1638-059X), Li, Huawei and Zhang, Lei (2019) Resilient neural network training for accelerators with computing errors. In: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 15-17 July 2019, New York, USA.

Abstract
With the advancement of neural networks, customized accelerators are increasingly adopted in a large number of AI applications. To gain higher energy efficiency or performance, hardware design optimizations such as near-threshold logic or overclocking can be used. In these cases, computing errors may occur, and such errors are difficult to capture with conventional training on general-purpose processors (GPPs). Directly applying offline-trained neural network models to accelerators with errors may lead to a considerable loss of prediction accuracy. To address this problem, we explore the resilience of neural network models and relax the accelerator design constraints to enable aggressive design options. First, we propose to train the neural network models using the accelerators’ forward computing results so that the models can learn both the data and the computing errors. In addition, we observe that some neural network layers are more sensitive to computing errors than others. Based on this observation, we schedule the most sensitive layer to the attached GPP to reduce the negative influence of the computing errors. Experiments show that the neural network models obtained from the proposed training significantly outperform the original models when the CNN accelerators are affected by computing errors.
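The first idea in the abstract, training on the accelerator's own (erroneous) forward results, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the "network" is a single linear unit y = w*x + b, the accelerator's computing error is modeled hypothetically as a fixed gain of 0.9 on every output, and the gradient update is written as if the forward pass were exact (only the forward result carries the error, as when forward runs on the accelerator and updates run on a GPP). All function names are hypothetical.

```python
def ideal_forward(w, b, x):
    """Exact computation, as on a general-purpose processor (GPP)."""
    return w * x + b


def accelerator_forward(w, b, x, gain=0.9):
    """Hypothetical faulty accelerator: a systematic error scales each output by `gain`."""
    return gain * (w * x + b)


def train(data, forward, epochs=200, lr=0.05):
    """Fit y ~ w*x + b with SGD, using whatever result `forward` returns.

    The update uses the ideal model's derivatives (d out / d w = x, d out / d b = 1),
    so the computing error enters only through the forward result.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = forward(w, b, x) - y  # residual seen on the (possibly faulty) hardware
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b


def mse(data, forward, w, b):
    """Mean squared error of the model (w, b) when evaluated with `forward`."""
    return sum((forward(w, b, x) - y) ** 2 for x, y in data) / len(data)


# Toy dataset: y = 2x + 1 sampled on a small grid.
data = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

w_gpp, b_gpp = train(data, ideal_forward)        # conventional offline training
w_acc, b_acc = train(data, accelerator_forward)  # training on accelerator forward results

mse_gpp = mse(data, accelerator_forward, w_gpp, b_gpp)  # GPP-trained model deployed on faulty hardware
mse_acc = mse(data, accelerator_forward, w_acc, b_acc)  # error-aware model deployed on faulty hardware
print(mse_gpp)  # roughly 0.03
print(mse_acc)  # close to 0
```

In this toy setting the conventionally trained parameters converge to w = 2, b = 1, which the faulty unit then shrinks by 10%, while the error-aware run converges to w = 2/0.9, b = 1/0.9, which compensates for the assumed gain. Real accelerator errors are generally not a simple fixed gain, so this only shows the mechanism, not the magnitude of any benefit.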
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Uncontrolled Keywords:Two-dimensional displays; Hafnium; resilient training; CNN accelerator; relaxed design constraint; fault tolerance
Subjects:UNSPECIFIED
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE.
Publisher:IEEE
Official URL:http://dx.doi.org/10.1109/ASAP.2019.00-23
Copyright Information:© 2019 IEEE
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
Funders:National Natural Science Foundation of China (No. 61874124), Chinese Academy of Sciences STS (No. KFJ-STS-SCYD-226), European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement (No. 799066)
ID Code:24293
Deposited On:20 Mar 2020 11:28 by Long Cheng. Last Modified 20 Mar 2020 11:28
Documents

Full text available as: 2019-ASAP.pdf (PDF, 250kB)