Efficient Implementation of Deep Neural Networks on Resource-constrained Devices

Author: Maedeh Hemmat
Release: 2022
ISBN-10: OCLC:1343909718

Book Synopsis Efficient Implementation of Deep Neural Networks on Resource-constrained Devices by Maedeh Hemmat

Download or read book Efficient Implementation of Deep Neural Networks on Resource-constrained Devices, written by Maedeh Hemmat and released in 2022. Available in PDF, EPUB and Kindle. Book excerpt:

In recent years, Deep Neural Networks (DNNs) have emerged as impressively successful models for complicated tasks including object classification, speech recognition, and autonomous driving. To provide better accuracy, state-of-the-art neural network models are designed to be deeper (i.e., to have more layers) and larger (i.e., to have more parameters within each layer). This has increased the computational and memory costs of DNNs, mandating their efficient hardware implementation, especially on resource-constrained devices such as embedded systems and mobile devices. The challenge has two aspects: computation and storage. On one hand, state-of-the-art DNNs require billions of operations for each inference, while the computational power of embedded systems is tightly limited. On the other hand, DNN models require several megabytes of parameter storage, which cannot fit in the on-chip memory of these devices. More importantly, these systems are usually battery-powered, with a limited energy budget for memory accesses and computation.

This dissertation makes three contributions towards improving the efficiency of DNN deployments on resource-constrained devices.

First, we propose an iterative framework that enables dynamic reconfiguration of an already-trained Convolutional Neural Network (CNN) in hardware during inference. The reconfiguration enables input-dependent approximation of the CNN at run-time, leading to significant energy savings without significant degradation in classification accuracy. Our framework breaks each inference into several iterations, fetching only a fraction of the weights from off-chip memory at each iteration to perform the computations. Based on the difficulty of the received input, it then decides either to terminate the inference or to fetch more weights and continue. The termination condition can also be adjusted at run-time to trade off classification accuracy against energy consumption (a sketch of this loop appears below).

Second, we exploit the user-dependent behavior of DNNs and propose a personalized inference framework that prunes an already-trained neural network model based on the preferences of individual users, without the need to retrain the network. Our key observation is that an individual user may encounter only a tiny fraction of the trained classes on a regular basis, so storing trained models (pruned or not) for all possible classes on local devices is costly and unnecessary. Our personalized framework minimizes the memory, computation, and energy consumption of the network on the local device because it processes neurons on a need basis, i.e., only when the user expects to encounter a specific output class (also sketched below).
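To make the first contribution concrete, here is a minimal sketch of input-dependent early termination. The function names (magnitude_chunks, iterative_inference), the confidence-threshold stopping rule, and the magnitude-ordered weight partition are illustrative assumptions, not the dissertation's actual mechanism; a toy linear classifier stands in for the CNN, and each sparse weight piece models one fetch of a weight fraction from off-chip memory.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def magnitude_chunks(W, n_chunks):
        # Split W into n_chunks same-shape sparse pieces, largest-magnitude
        # weights first, so the most significant fraction is fetched first.
        order = np.argsort(-np.abs(W), axis=None)
        pieces = []
        for idx in np.array_split(order, n_chunks):
            piece = np.zeros_like(W)
            piece.flat[idx] = W.flat[idx]
            pieces.append(piece)
        return pieces

    def iterative_inference(x, pieces, threshold=0.9):
        # Accumulate partial logits one piece at a time (one off-chip fetch
        # per iteration); stop as soon as the prediction is confident enough.
        logits = np.zeros(pieces[0].shape[0])
        used = 0
        for piece in pieces:
            used += 1
            logits += piece @ x
            probs = softmax(logits)
            if probs.max() >= threshold:   # "easy" input: terminate early
                break
        return int(probs.argmax()), used

    rng = np.random.default_rng(0)
    W = rng.standard_normal((10, 64))      # toy linear stand-in for a CNN
    x = rng.standard_normal(64)
    label, fetches = iterative_inference(x, magnitude_chunks(W, n_chunks=4))
    print(label, fetches)

Lowering the threshold ends inference earlier on average (fewer off-chip fetches, less energy) at some cost in accuracy, mirroring the run-time accuracy/energy knob described above.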
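The second contribution admits a similarly small sketch: restrict the output layer to the classes the user actually encounters, then drop any hidden neurons that no longer feed a kept class. This is a simplification for a two-layer fully connected network under assumed details; personalize and the near-zero test are hypothetical names, and, as in the dissertation, no retraining is involved.

    import numpy as np

    def personalize(W_hidden, b_hidden, W_out, b_out, user_classes, eps=1e-3):
        # Keep only the output rows for classes this user actually encounters.
        W_out, b_out = W_out[user_classes], b_out[user_classes]
        # Drop hidden neurons whose outgoing weights to the kept classes are
        # all (near) zero: they can never influence a retained output.
        keep = np.abs(W_out).sum(axis=0) > eps
        return W_hidden[keep], b_hidden[keep], W_out[:, keep], b_out

    rng = np.random.default_rng(0)
    Wh, bh = rng.standard_normal((128, 64)), np.zeros(128)
    Wo, bo = rng.standard_normal((10, 128)), np.zeros(10)
    Wo[:, 64:] = 0.0            # pretend half the hidden units feed no class
    Wh_p, bh_p, Wo_p, bo_p = personalize(Wh, bh, Wo, bo, user_classes=[2, 5])
    print(Wo_p.shape)           # (2, 64): two kept classes, 64 live neurons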
Third, we propose a framework for distributed inference of DNNs across multiple edge devices to reduce communication and latency overheads. The framework utilizes many parallel, independently running edge devices that communicate only once with a single 'back-end' device (itself an edge device), which aggregates their predictions into the result of the inference. To achieve this distributed implementation, the framework first partitions the classes of the complex DNN into subsets assigned across the available edge devices, taking the computational resources of each device into account. The DNN is then aggressively pruned for each device down to its set of assigned classes. Each resulting smaller DNN (SNN) is further configured to return 'Don't Know' when it encounters an input from an unassigned class. Every SNN is generated from the complex DNN up front and loaded onto its corresponding edge device, without any retraining. At inference time, each SNN runs on the received input, and the back-end device combines their answers into the final prediction (a sketch follows).
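Here is a minimal sketch of the partition-and-aggregate scheme, under stated assumptions: snn_predict, aggregate, and the rejection threshold that produces 'Don't Know' are illustrative stand-ins, and each SNN is abstracted as the probability vector it outputs over its assigned classes.

    import numpy as np

    DONT_KNOW = None   # sentinel: "this input is none of my assigned classes"

    def snn_predict(probs, assigned_classes, reject_threshold=0.6):
        # One pruned sub-network (SNN). `probs` is its output distribution
        # over its own assigned classes; low confidence means abstain.
        i = int(np.argmax(probs))
        if probs[i] < reject_threshold:
            return DONT_KNOW
        return assigned_classes[i], float(probs[i])

    def aggregate(answers):
        # Back-end device: collect one answer per SNN (a single round of
        # communication) and keep the most confident non-abstaining vote.
        votes = [a for a in answers if a is not DONT_KNOW]
        return max(votes, key=lambda v: v[1])[0] if votes else DONT_KNOW

    # Toy run: ten classes partitioned across three devices (4/3/3).
    partitions = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    rng = np.random.default_rng(1)
    answers = [snn_predict(rng.dirichlet(np.ones(len(p))), p) for p in partitions]
    print(aggregate(answers))   # may print None if every SNN abstained

The single call to aggregate models the one round of communication with the back-end device; everything before it runs independently and in parallel on each edge device.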


Efficient Implementation of Deep Neural Networks on Resource-constrained Devices Related Books

Efficient Implementation of Deep Neural Networks on Resource-constrained Devices
Language: en
Pages: 0
Authors: Maedeh Hemmat
Type: BOOK - Published: 2022

In recent years, Deep Neural Networks (DNNs) have emerged as impressively successful models for complicated tasks including object classification, speech recognition…
Embedded Deep Learning
Language: en
Pages: 216
Authors: Bert Moons
Categories: Technology & Engineering
Type: BOOK - Published: 2018-10-23 - Publisher: Springer

This book covers algorithmic and hardware implementation techniques to enable embedded deep learning. The authors describe synergetic design approaches…
Efficient Processing of Deep Neural Networks
Language: en
Pages: 254
Authors: Vivienne Sze
Categories: Technology & Engineering
Type: BOOK - Published: 2022-05-31 - Publisher: Springer Nature

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently…
Towards Safe and Efficient Application of Deep Neural Networks in Resource-constrained Real-time Embedded Systems
Language: en
Pages: 0
Resource Constrained Neural Architecture Design
Language: en
Pages: 0
Authors: Yunyang Xiong
Type: BOOK - Published: 2021

Deep neural networks have been highly effective for a wide range of applications in computer vision, natural language processing, speech recognition, medical imaging…