

In recent years, a great deal of research has been conducted on interpreting Bangladeshi Sign Language (BdSL) so that general people can communicate with people having a hearing impairment and the verbal gap between them can be reduced. Computer Vision plays a vital role in this regard by enabling sustainable systems that understand signs for machine translation. To obtain optimal performance, along with a state-of-the-art CNN model, the need for a high-quality sign language dataset cannot be overlooked. In this paper, we introduce OkkhorNama, a new image dataset for fingerspelled Bangladeshi Sign Language covering all 46 signs with over 12K images. In each image, the bounding boxes are carefully annotated and labeled.
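Since the annotations are bounding boxes over fingerspelled signs, a small sketch of how one such annotation might be parsed is given below. The paper does not specify the label format, so the YOLO-style normalized layout, the file path, and the helper name `load_boxes` are illustrative assumptions only.

```python
# Minimal sketch of reading one annotation file for an OkkhorNama image,
# ASSUMING YOLO-style labels: "class x_center y_center width height",
# all coordinates normalized to [0, 1]. Paths and names are hypothetical.
from pathlib import Path

def load_boxes(label_path: Path, img_w: int, img_h: int):
    """Convert normalized YOLO-style rows into pixel-space boxes:
    (class_id, x_min, y_min, x_max, y_max)."""
    boxes = []
    for line in label_path.read_text().splitlines():
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
        x_min = (xc - w / 2) * img_w   # left edge in pixels
        y_min = (yc - h / 2) * img_h   # top edge in pixels
        boxes.append((int(cls), x_min, y_min,
                      x_min + w * img_w, y_min + h * img_h))
    return boxes

# Hypothetical usage on a 416x416 training image:
# boxes = load_boxes(Path("labels/sign_0001.txt"), 416, 416)
```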

Sign language is the non-verbal language used by people with hearing and speaking disabilities, known as the deaf and mute, to bridge the communication gap with others. Being a visual means of communication, however, it deprives mute people of the ability to communicate with people having a visual impairment. A medium that recognizes sign language and converts it into text and speech could fill this gap. Recently, a lot of research has addressed Bangla Sign Language (BdSL) classification, but not Bangla sentence and speech generation; most existing works classify either digits or alphabets and also suffer from time delay. This paper proposes a system for BdSL recognition that can interpret BdSL from a sequence of images or a video stream and generate both textual sentences and speech in real time. We also propose three new signs for the sentence generation task and build a dataset consisting of 12.5K BdSL images of 49 different classes, where 39 are Bangla alphabets, 10 are Bangla digits, and three are the newly proposed signs. We use YOLOv4 as the object detection model.
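To make the detect-then-generate pipeline concrete, the sketch below runs a YOLOv4 network over a webcam stream, accumulates detected characters into a sentence, and speaks it aloud. This is a minimal illustration under stated assumptions, not the paper's implementation: the OpenCV-DNN backend, the `pyttsx3` speech engine, the weight/config file names, the thresholds, and the `SPEAK` control sign are all assumptions introduced here.

```python
# Minimal sketch of real-time BdSL recognition with a YOLOv4 detector,
# ASSUMING OpenCV's DNN module and pyttsx3 for text-to-speech.
# File names, thresholds, and the "SPEAK" trigger sign are hypothetical.
import cv2
import pyttsx3

net = cv2.dnn.readNetFromDarknet("yolov4-bdsl.cfg", "yolov4-bdsl.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

classes = open("bdsl.names").read().splitlines()  # 49 class labels
tts = pyttsx3.init()
sentence = []                                     # detected characters so far

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, scores, boxes = model.detect(
        frame, confThreshold=0.5, nmsThreshold=0.4)
    for cid in class_ids:
        label = classes[int(cid)]
        if label == "SPEAK":            # hypothetical control sign
            tts.say("".join(sentence))  # speak the accumulated sentence
            tts.runAndWait()
            sentence.clear()
        else:
            sentence.append(label)      # append recognized character
    if cv2.waitKey(1) == 27:            # Esc key quits
        break
cap.release()
```

In a sketch like this, the three proposed non-alphabet signs would plausibly act as sentence-level controls (e.g., word separation or a speak trigger), which is why a dedicated `SPEAK` class is shown; the actual semantics of the proposed signs are defined in the paper itself.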
