|
Abstract:
|
American Sign Language (ASL) is the dominant sign language of deaf people in the United States and parts of Canada. Between 500,000 and 2 million people use ASL as their primary language in the United States. ASL uses the hands, face, and body, with constantly changing movements and orientations. Since the language is based on gestures rather than a printed alphabet, it is difficult to look up the meaning of a sign given only a video of it. Multimedia tools and dictionaries exist that show a sign video for a given word, but no dictionary returns the corresponding word given a sign video. This gap motivated the development of video-based lookup in an ASL dictionary. The vision is a system in which a user can look up the meaning of an ASL sign simply by performing the gesture in front of a video camera connected to a computer. The computer compares the unknown sign with a database of signs to identify the most likely matches. In the existing ASL lexicon project, a user submits a query sign video and the application finds the most similar signs in the system database. The existing system evaluates the similarity between the query video and every sign video in the database using the Dynamic Time Warping (DTW) distance. DTW is an algorithm for measuring the similarity between two sequences (e.g., time series) that may vary in time or speed; it finds an optimal match between the two sequences under certain restrictions. The sequences are warped non-linearly in the time dimension to determine a measure of their similarity that is independent of certain non-linear variations in time. The existing ASL lexicon project uses the similarity in hand locations and orientations to look up a gesture in a dictionary of ASL signs. The ability of DTW to handle temporal misalignments helps us recognize signs that differ in time or speed. ASL periodic signs are signs in which an action is repeated.
The number of times the action is repeated is at the signer's discretion. In such cases DTW will still attempt to align the input video with the dictionary sign video. The alignment is not meaningful if the number of repetitions in the input video differs from that in the dictionary sign video. Because DTW then produces a non-meaningful association, it ultimately yields poor similarity results. This paper attempts to correct this problem of 'Incorrect Period Matching'. The paper contributes a protocol for annotating periodic signs and introduces a method for improving system accuracy on such signs. It builds an informative database of periodic sign videos on an ASL lexicon dataset of 1113 unique signs, capturing the temporal information (start and end of each period) for all periods executed in each periodic sign video. The paper also provides a mechanism to generate periods synthetically: we use the temporal information of the last period to create subsequent periods for the training video. By synthetically generating periods, we successfully corrected the problem that arose when using DTW on periodic signs. |
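The period-mismatch problem and the effect of appending synthetic periods can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: a 1-D sinusoid stands in for the hand location and orientation features, the local cost is an absolute difference, and the names `dtw_distance`, `one_period`, and `repeat_last_period` are hypothetical. It is a toy illustration of the idea, not the system's implementation.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def one_period(length=20):
    """One cycle of a toy up-and-down movement (assumed stand-in feature)."""
    return [math.sin(2 * math.pi * t / length) for t in range(length)]


def repeat_last_period(seq, start, end, extra):
    """Append `extra` copies of seq[start:end], the annotated last period."""
    return seq + seq[start:end] * extra


period = one_period()
dictionary_sign = period * 2  # dictionary video: action repeated twice
query_sign = period * 4       # query video: signer repeats it four times

# Mismatched period counts force DTW into a non-meaningful alignment.
d_before = dtw_distance(query_sign, dictionary_sign)

# Assume the last period of the dictionary sign is annotated as frames [20, 40).
extended = repeat_last_period(dictionary_sign, 20, 40, extra=2)
d_after = dtw_distance(query_sign, extended)

print(f"DTW before synthetic periods: {d_before:.3f}")
print(f"DTW after synthetic periods:  {d_after:.3f}")
```

In this toy setting the extended dictionary sequence has the same number of periods as the query, so the alignment cost drops. With real video features the match would not be exact, and how many periods to synthesize is a design choice this sketch does not address.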