Fast and Low-Power Quantized Fixed Posit High-Accuracy DNN Implementation
Source
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ISSN
1063-8210
Date Issued
2022-01-01
Author(s)
Abstract
This brief compares quantized floating-point representations in posit and fixed-posit formats for a wide variety of pre-trained deep neural networks (DNNs). We observe that the fixed-posit representation is far more suitable for DNNs, as it yields faster and lower-power computation circuits. We show that accuracy remains within 0.3% and 0.57% of the top-1 accuracy for posit and fixed-posit quantization, respectively. We further show that the posit-based multiplier requires higher power-delay product (PDP) and area, whereas the fixed-posit multiplier reduces PDP and area consumption by 71% and 36%, respectively, compared to (Devnath et al., 2020) for the same bit-width.
Subjects
Convolutional neural net (CNN) | deep neural network (DNN) | fixed-posit representation | posit number system | quantization
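For reference, the sketch below decodes a standard posit value in software, since the posit format underpins the fixed-posit scheme discussed in the abstract. It is a minimal illustration, not code from the brief: the word size n = 8 and exponent size es = 1 are assumed parameters, and the fixed-posit variant (which, as described in the fixed-posit literature, constrains the regime field to a fixed width instead of a variable-length run) is not reproduced here; only the baseline posit decode is shown.

    # Minimal sketch: software decode of a standard n-bit posit
    # (sign | regime | exponent | fraction). Illustrative only; n and es
    # are assumed parameters, not values fixed by the brief.

    def decode_posit(bits: int, n: int = 8, es: int = 1) -> float:
        """Return the real value encoded by an n-bit posit with es exponent bits."""
        mask = (1 << n) - 1
        bits &= mask
        if bits == 0:
            return 0.0
        if bits == 1 << (n - 1):
            return float("nan")          # NaR (not a real)

        sign = -1.0 if (bits >> (n - 1)) & 1 else 1.0
        if sign < 0:
            bits = (-bits) & mask        # decode the two's-complement magnitude

        # Regime: run of identical bits after the sign, terminated by its complement.
        pos = n - 2
        lead = (bits >> pos) & 1
        run = 0
        while pos >= 0 and ((bits >> pos) & 1) == lead:
            run += 1
            pos -= 1
        if pos >= 0:
            pos -= 1                     # skip the terminating bit
        regime = run - 1 if lead else -run

        # Exponent: next es bits, zero-padded if the word runs out.
        exp = 0
        for _ in range(es):
            exp <<= 1
            if pos >= 0:
                exp |= (bits >> pos) & 1
                pos -= 1

        # Fraction: remaining bits with an implicit leading one.
        frac_bits = pos + 1
        frac = bits & ((1 << frac_bits) - 1) if frac_bits else 0
        fraction = 1.0 + (frac / (1 << frac_bits) if frac_bits else 0.0)

        useed = 1 << (1 << es)           # useed = 2^(2^es)
        return sign * (useed ** regime) * (2.0 ** exp) * fraction

    # Example: for (n=8, es=1), 0b01000000 decodes to 1.0 and 0b01100000 to 4.0.
    assert decode_posit(0b01000000) == 1.0
    assert decode_posit(0b01100000) == 4.0

Because the regime run length varies from word to word in a standard posit, the decoder above must scan for the terminating bit before it can locate the exponent and fraction; fixing the regime width removes that variable-length step, which is the hardware simplification the brief's PDP and area savings rest on.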
