Xu, Dawen, Xing, Kouzi, Liu, Cheng, Wang, Ying, Dai, Yulin, Cheng, Long ORCID: 0000-0003-1638-059X, Li, Huawei and Zhang, Lei (2019) Resilient neural network training for accelerators with computing errors. In: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 15-17 July 2019, New York, USA.
Abstract
With the advancement of neural networks, customized accelerators are increasingly adopted in massive AI applications. To gain higher energy efficiency or performance, many hardware design optimizations such as near-threshold logic or overclocking can be utilized. In these cases, computing errors may occur, and these errors are difficult to capture with conventional training on general-purpose processors (GPPs). Directly applying offline-trained neural network models to accelerators with errors may lead to considerable loss of prediction accuracy.
To address this problem, we explore the resilience of neural network models and relax the accelerator design constraints to enable aggressive design options. First, we propose training the neural network models using the accelerators' forward computing results so that the models learn both the data and the computing errors. In addition, we observe that some neural network layers are more sensitive to computing errors than others. Based on this observation, we schedule the most sensitive layer to the attached GPP to reduce the negative influence of the computing errors. According to the experiments, the neural network models obtained from the proposed training significantly outperform the original models when the CNN accelerators are affected by computing errors.
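
The first idea in the abstract, training on the accelerator's forward results so that the model absorbs the computing errors, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the "accelerator" is a one-neuron linear model whose output is occasionally corrupted by a fixed additive offset (`p_err`, `offset`), the names `exact_forward`, `faulty_forward`, `train`, and `mse_on_accelerator` are hypothetical, and the backward pass is assumed to run error-free on the GPP while treating the accelerator output as if it were the exact `w * x + b`.

```python
import random

TRUE_W, TRUE_B = 1.5, 0.5  # hypothetical ground-truth function to learn


def exact_forward(w, b, x, rng):
    """Error-free forward pass, as on a GPP (rng is unused)."""
    return w * x + b


def faulty_forward(w, b, x, rng, p_err=0.3, offset=2.0):
    """Simulated accelerator: with probability p_err the result is corrupted.

    The additive offset is an assumed error model, not the paper's.
    """
    y = w * x + b
    if rng.random() < p_err:
        y += offset
    return y


def train(forward, steps=5000, lr=0.01, seed=0):
    """SGD where the loss is computed on whatever `forward` returns.

    The gradient is computed exactly (as if on the GPP), treating the
    forward output as w * x + b.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        target = TRUE_W * x + TRUE_B
        y = forward(w, b, x, rng)
        grad = 2.0 * (y - target)  # d(loss)/dy for squared error
        w -= lr * grad * x
        b -= lr * grad
    return w, b


def mse_on_accelerator(w, b, n=5000, seed=1):
    """Mean squared error when the model is deployed on the faulty accelerator."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        target = TRUE_W * x + TRUE_B
        total += (faulty_forward(w, b, x, rng) - target) ** 2
    return total / n


offline_model = train(exact_forward)      # conventional offline training
resilient_model = train(faulty_forward)   # training with errors in the loop

loss_offline = mse_on_accelerator(*offline_model)
loss_resilient = mse_on_accelerator(*resilient_model)
print("offline model on faulty accelerator:  ", round(loss_offline, 3))
print("resilient model on faulty accelerator:", round(loss_resilient, 3))
```

In this toy setting the model trained through the faulty forward pass learns a lower bias that partly compensates for the expected offset, so its error on the faulty accelerator is lower than that of the offline-trained model. Real accelerator errors (for example timing errors under overclocking) are more complex than a fixed offset, so this only illustrates the training loop structure.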
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Two dimensional displays; Hafnium; resilient training; CNN accelerator; relaxed design constrain; fault tolerance |
Subjects: | UNSPECIFIED |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Published in: | 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors. ASAP. IEEE. |
Publisher: | IEEE |
Official URL: | http://dx.doi.org/10.1109/ASAP.2019.00-23 |
Copyright Information: | © 2019 IEEE |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | National Natural Science Foundation of China (No. 61874124), Chinese Academy of Sciences STS (No. KFJ-STS-SCYD-226), European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement (No. 799066) |
ID Code: | 24293 |
Deposited On: | 20 Mar 2020 11:28 by Long Cheng. Last Modified 20 Mar 2020 11:28 |
Documents
Full text available as: PDF (250kB)