Peer Evaluation activity

Trusted by 1
Reviews 1
Downloads 14198
Views 59
Full text requests 2
Followed by 1
Following... 3
Funded by 4

    David has...

    Trusted 1
    Reviewed 0
    Emailed 0
    Shared/re-used 0
    Discussed 0
    Invited 0
    Collected 9

    This was brought to you by:

    David M. W. Powers, Trusted member

    Professor / David.Powers@flinders.edu.au

    Flinders University, School of Computer Science, Engineering and Mathematics, Adelaide, South Australia
    KUB/Tilburg University, ITK/Institute for Language and Knowledge Technology, Tilburg, Brabant, Holland
    FB Informatik/Faculty of Computer Science, University of Kaiserslautern, Germany
    Telecom Paris/ENST, Paris, France
    Macquarie University, Sydney, NSW, Australia
    UNSW/University of New South Wales, Sydney, NSW, Australia
    Sydney University, Sydney, NSW, Australia
    Cardiff University, Linguistics Department, Cardiff, Wales, UK
    Beijing Municipal Lab for Multimedia & Intelligent Software, Beijing University of Technology, Beijing, China

    Spatial and Temporal Visual Speech Feature for Chinese Phonemes

    This paper proposes a practical set of features for representing the visual speech of Chinese phonemes. The state, and hence the visibility, of the teeth and tongue plays an important role in pronunciation, but discriminating them in images or video is difficult. This paper introduces the concept of inner appearance features based on structural analysis. Our experimental results give preliminary evidence that describing the pixel distribution of the upper and lower inner mouth separately can improve the ability to discriminate useful facial features as well as individual phonemes. The Chinese phonemes defined in the SAPI Speech Interface generally correspond to one character or morpheme, and our dynamic feature is based on the traditional division of these syllabic phonemes into a consonant-like onset and a vowel- and/or nasal-like coda. Features are established by combining a series of frames and identifying the most salient change frame as the key frame, to provide an objective framework for phoneme onset recognition. Our work provides a basis for bimodal audiovisual Chinese speech recognition as well as unimodal visual speech reading, and is also targeted at audiovisual speaking face/talking head synthesis.
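
The two ideas in the abstract (describing the upper and lower inner mouth separately, and taking the most salient change frame as the key frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the pixel distribution or the frame-to-frame change is measured, so the intensity histogram, the L1 distance, the half-height split, and every function name below are assumptions made purely for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale crop of the inner mouth, rows of 0..255 values.
Frame = List[List[int]]


def histogram(pixels: Sequence[int], bins: int = 4) -> List[float]:
    """Normalised intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [c / total for c in counts]


def inner_mouth_feature(frame: Frame, bins: int = 4) -> List[float]:
    """Describe upper and lower halves separately, then concatenate (assumed split)."""
    mid = len(frame) // 2
    upper = [p for row in frame[:mid] for p in row]
    lower = [p for row in frame[mid:] for p in row]
    return histogram(upper, bins) + histogram(lower, bins)


def key_frame_index(frames: Sequence[Frame], bins: int = 4) -> int:
    """Index of the frame whose feature changes most from the previous frame.

    The change measure (L1 distance between features) is an assumption.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure change")
    feats = [inner_mouth_feature(f, bins) for f in frames]
    changes = [
        sum(abs(a - b) for a, b in zip(feats[i], feats[i - 1]))
        for i in range(1, len(feats))
    ]
    return 1 + max(range(len(changes)), key=changes.__getitem__)
```

For example, in a toy sequence where bright pixels (standing in for teeth) appear in the upper half at the third frame, `key_frame_index` selects that frame, since only the upper-half histogram changes there.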

    Description

    Title : Spatial and Temporal Visual Speech Feature for Chinese Phonemes
    Author(s) : Xibin Jia, Yanfang Han, David Powers, Xiyuan Bao
    Keywords : Chinese Visemes, Spatiotemporal Learning

    Subject : AudioVisual Speech Recognition
    Area : Computer Science
    Language : English
    Affiliations : Flinders University, School of Computer Science, Engineering and Mathematics, Adelaide, South Australia
    Beijing Municipal Lab for Multimedia & Intelligent Software, Beijing University of Technology, Beijing, China
    Journal : Journal of Information and Computational Science

    Funded by 4
    • Australian Speech Science Infrastructure: An Audio-Video Speech Corpus of Australian English, Grant Number LE0989734 / Year 2009
    • From Talking Heads to Thinking Heads: A Research Platform for Human Communication Science, Grant Number TS0669874 / Year 2006
    • Heterodensity neuroimaging techniques for spatiotemporal identification and localization, Grant Number DP0988686 / Year 2009
    • Enhanced brain and muscle signal separation verified by electrical scalp recordings from paralysed awake humans, Grant Number DP110101473 / Year 2011
