* eSpeak - Introduction
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup
I've just found this mailing list from a comment on the
espeak.sourceforge.net forum. It's good to see people using (or trying
to use) eSpeak.
eSpeak is not new, I originally wrote it about 10 years ago for the
Acorn/RISC OS platform. The Linux version is a re-write of the code
which I've been doing over the past few months. I'm definitely not an
expert on Linux system programming or system administration, so I'll
probably need to ask for help in those areas if the need arises.
eSpeak uses the "sinusoidal" technique of synthesis. Basically it
makes vowels and sonorant consonants, eg. [r,l,m,n,w] by adding
together the sine waves of harmonics in varying proportions.
Unvoiced consonants such as [h,t,s,f,k] are simply recorded sound
samples, while voiced consonants eg [v,z,d,g] are a mixture of these
two methods.
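The additive method described above can be sketched roughly as follows. This is an illustrative toy, not eSpeak's actual code: the harmonic amplitudes and frequencies are made-up values, and a real synthesizer varies them over time to shape each phoneme.

```python
import math

# Hypothetical sketch of the "sinusoidal" method described above: a
# vowel-like sound is built by summing sine waves at integer multiples
# of a fundamental frequency, each harmonic weighted differently.
# The amplitude values here are illustrative, not eSpeak's real data.

def synthesize_vowel(f0, harmonic_amps, duration, sample_rate=22050):
    """Return samples of summed harmonics of f0, normalized to [-1, 1]."""
    n_samples = int(duration * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # harmonic k+1 has frequency (k+1)*f0 and amplitude harmonic_amps[k]
        s = sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t)
                for k, a in enumerate(harmonic_amps))
        samples.append(s)
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]

# 120 Hz fundamental with strong low harmonics, roughly vowel-like
wave = synthesize_vowel(120.0, [1.0, 0.6, 0.3, 0.15], duration=0.05)
```

Varying the harmonic weights over time is what distinguishes one vowel or sonorant from another.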
It should be interesting to see what the specific needs of a TTS
engine for use with speakup are, and how eSpeak can be improved to
match them better.
I've been using eSpeak on Linux mostly with the KDE TTS system (KTTS).
I don't know anything about speakup or speech-dispatcher yet, other
than a quick look at the speakup user guide.
A problem has been reported with eSpeak, where it locks up when it
finds the sound device is already in use. I've made a fix for this for
the next release, which I'll probably make in a few days. Perhaps
there are some other changes that will be useful too. Let me know :-)
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: eSpeak - Introduction
From: Lorenzo Taylor @ UTC (permalink / raw)
To: speakup

Ah! You found me! I am the one who reported the lockup problem. I am
one of the people here using Speakup's interface to Speech Dispatcher
to connect with eSpeak. Your speech synthesizer has been getting a lot
of mention on this list recently, as it is quickly replacing Festival
and Flite for many of us who use software speech on a regular basis.
I am sure many of us will be able to help you with any issues you may
have with Linux and Linux programming, just as you have greatly helped
us by writing the first good-sounding and fast software speech
synthesizer for use with Speakup.

Thanks for a great program,
Lorenzo
--
They have been at a great feast of languages, and stolen the scraps.
		-- William Shakespeare, "Love's Labour's Lost"
* Re: eSpeak - Introduction
From: Willem van der Walt @ UTC (permalink / raw)
To: Jonathan Duddington; +Cc: speakup

Hi Jonathan,

Thanks for a much-needed, useful free software synthesizer. The
clarity and accuracy of the speech is quite good.

I am interested in the process of creating a new language using
eSpeak. Where can I get more detail on that?

I think there is a bug in the espeak program when reading long text
files. To me it is no problem, as I am using Speech Dispatcher, which
sends smaller chunks of text at a time, but others might be using the
file feature. The program segfaults after a time.

Thanks again for a great program.
Regards, Willem
_______________________________________________
Speakup mailing list
Speakup@braille.uwo.ca
http://speech.braille.uwo.ca/mailman/listinfo/speakup
* Re: eSpeak - Introduction
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup

In article <Pine.LNX.4.62.0604130852320.28756@localhost.localdomain>,
   Willem van der Walt <wvdwalt@csir.co.za> wrote:

> I think there is a bug in the espeak program when reading long text
> files. To me it is no problem as I am using speech-dispatcher which
> sends smaller chunks of text at a time, but others might be using
> the file feature. The program segfaults after a time.

That's puzzling, and worrying. I often speak large text files with
   speak -f textfile
and I've not had any problem like that. Were you using any other
options?

> I am interested in the process of creating a new language using
> espeak. Where can I get more detail on that?

Firstly read the Documents section from the web site
(http://espeak.sourceforge.net): Dictionary, Phonemes, Phoneme Tables
(and download phonsource.zip referenced from there).

There is another program which I haven't released yet which compiles
the Phoneme Tables, together with sound recordings of consonants and
formant specifications of vowels. It also includes an editor for the
vowels. The interface needs tidying up a bit, but the biggest job is
writing user instructions so that others can use it. I hope to do this
though. If you want to try it out without instructions, I could make
it available fairly soon :-)

It would be very interesting if someone did do another language
implementation. It would help to identify language-dependent features.
Depending on which language, I might need to add some new features to
the speech engine.

Firstly you need to get a phonological description of your language
(eg. what phonemes it uses). Looking up "yourlanguage language" in
Wikipedia might give some useful information. It may be that, as a
first approximation, you can use already provided phonemes from the
list in the Phonemes.html document.
You can try out example words in your new language by giving phoneme
codes enclosed within double square brackets, eg:
   speak "[[h@l'oU w'3:ld]]"
would say "hello world" in English, and
   speak "[[g'yt@n t'A:g]]"
would say "güten tag" in German, using the [y] phoneme, which isn't
used in English, but which is already provided in eSpeak. Perhaps you
can find a set of passable phonemes for your language (you can
implement more accurate versions later). A Bantu language would be
more of a challenge (eg. tonal language, click consonants).

Then you can start constructing pronunciation rules in the
<language>_rules file. The <language>_list file gives exceptions and
also those common words which are usually unstressed ("the", "is",
"in", etc). See the "data/" directory in eSpeak's "source.zip"
download package for examples. Hopefully your language's spelling
rules won't be as difficult as English!

Set up a Voice in espeak-data/voices for your language (specifying
your language's dictionary, but keeping the default phoneme set for
now) and compile the dictionary files with
   speak --compile=yourvoice
That should give you a very rudimentary implementation of your
language. It might be intelligible :-)

eSpeak is written in C++. You can write a new Translator class for
your language which can keep the functions of the base Translator
class, or can set options to vary their effect (eg. which syllable of
a word usually takes the main stress, or the difference in length
between stressed and unstressed syllables), or can override them with
replacement functions (eg. a new sentence intonation algorithm).

Now your language should be sounding better. As you listen to it
speaking, notice problems and make adjustments to rules, phoneme
realizations, and various tuning parameters.

If you're serious about implementing a language, then I'll be happy
to help with support, program features, information and documentation.
* Re: eSpeak - Introduction
From: ace @ UTC (permalink / raw)
To: speakup

Hello,

This is a very interesting conversation. If you don't mind me asking,
Jonathan, have you had any courses in linguistics? This is my
particular area of study, and Spanish is my second language. I would
be more than happy to help design a Spanish language for your
synthesizer; however, I am not a coder. How much knowledge of C/C++
coding is necessary to create a language? I think helping to create a
language for a synthesizer would be good practice for my interest in
linguistics.

Thanks for a good synthesizer. I am having a slight issue but I will
address that in another message.

Robby
* Re: eSpeak - Introduction
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup

In article <20060413175344.GA26811@amelia.voyager.net>,
   ace <ace@freedomchat.org> wrote:

> This is a very interesting conversation. If you don't mind me asking,
> Jonathan, have you had any courses in linguistics?

No, just reading books and websites and trial and error :-)

> This is my particular area of study and Spanish is my second
> language. I would be more than happy to help design a Spanish
> language for your synthesizer; however, I am not a coder. How much
> knowledge of C/C++ coding is necessary to create a language?

Probably not much for Spanish. I doubt there's much to change in the
program other than adjusting numbers.
* Re: eSpeak - Introduction
From: Kirk Reiser @ UTC (permalink / raw)
To: speakup

Hello and welcome Jonathan:

eSpeak is a refreshing addition to the available software synthesis
programs of today. If you are interested in extending its capabilities
to some degree, inline control parameters for the common adjustable
settings would be nice. Being able to modify speed, pitch, intonation
and the like is very useful for producing features such as pitch
change on capital letters, for example. The DoubleTalk hardware
synthesizer from RC Systems has a nice and concise command language
set you may want to look at as an example. You can find documentation
for various synths at
ftp://linux-speakup.org/pub/linux/goodies/synths-documentation. It
isn't important to support the entire command set, but the basic
commands would make (e)speak more flexible for the blind computer
user.

A very nice job on your synth, in my opinion!

Kirk
--
Kirk Reiser                     The Computer Braille Facility
e-mail: kirk@braille.uwo.ca     University of Western Ontario
phone: (519) 661-3061
* Re: eSpeak - Introduction
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup

In article <x71ww13hg0.fsf@speech.braille.uwo.ca>,
   Kirk Reiser <kirk@braille.uwo.ca> wrote:

> If you are interested in extending its capabilities to some degree,
> inline control parameters for the common adjustable settings would
> be nice. Being able to modify speed, pitch, intonation and the like
> is very useful for producing features such as pitch change on
> capital letters for example. The DoubleTalk hardware synthesizer
> from RC Systems has a nice and concise command language set you may
> want to look at as an example.

Yes, I think embedded control commands within the text stream would be
a useful addition. This should be possible with some work.

I'm not sure exactly which document you meant. I looked at
"DoubleTalk-developers.txt", which has a section "Interrogating
DoubleTalk" that mentions various parameters (including tone,
articulation, expression, punc level) but doesn't define them. I
didn't find a section on setting parameters from within the text
stream.

Perhaps better would be if you could start by specifying exactly what
you would like ideally: what parameters, and the syntax of how they
could be specified within the text. Should they be based on some
established speech mark-up language? I'm not familiar with that topic.

Would you need to write a speakup module specific to eSpeak rather
than using a generic one?

Is the pitch variation meant to be the same voice adjusting its pitch,
or does a different pitch imply different voice characteristics?
eSpeak's "female" voice is a variation on the standard male voice, but
it needed adjustments to the formant frequencies as well as the pitch
to sound anything like reasonable.
* Re: eSpeak - Introduction
From: Lorenzo Taylor @ UTC (permalink / raw)
To: speakup

The idea of embedded commands sounds good, but the current
Speech-dispatcher module is using eSpeak's command line interface.
Allowing embedded control commands would, I think, require a specific
eSpeak module, which wouldn't at all be a bad thing, and in fact is
probably needed anyway, but the sound device fix would no longer work
properly. At this point, eSpeak is going to exit when the sound device
is in use. If we implement embedded control commands, it would require
that eSpeak keep running no matter what until Speech-dispatcher is
stopped. In that case, if the sound device is in use, eSpeak will need
to simply skip speaking rather than exit completely. Overall, I really
like the idea, and the DoubleTalk command set, if you can find it, :)
would be a good place to start.

Regarding your pitch adjustment question, the pitch adjustment should
be plus or minus the base pitch of the voice in question. For example,
the base pitch could be 0, a negative number would lower the pitch,
and a positive number could raise the pitch. Another way to implement
this option would be to set your default pitch in the middle and
define a highest and lowest pitch. Then 0 could map to the lowest
pitch and 100 or whatever could map to the highest pitch for a
specific voice. I personally prefer the plus/minus implementation, but
most other hardware and software synthesizers available today seem to
use the default-to-middle approach. It's your call. In either case,
the formant frequencies probably don't need to be changed along with
the pitch; at this point you can try just changing the pitch itself.

Thanks much and hope this helps some,
Lorenzo
--
You will inherit millions of dollars.
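As a rough illustration of the two conventions Lorenzo describes, here is a minimal sketch; the Hz values and the per-unit step size are invented for the example, not taken from eSpeak.

```python
# A sketch of the two pitch conventions described above. The Hz ranges
# and step size are made-up illustrative values, not eSpeak voice data.

def pitch_relative(base_hz, offset, step_hz=2.0):
    """Plus/minus convention: 0 keeps the base pitch; each unit of
    offset shifts it up or down by step_hz."""
    return base_hz + offset * step_hz

def pitch_scaled(setting, low_hz, high_hz):
    """Default-to-middle convention: 0 maps to the lowest pitch,
    100 to the highest, linearly in between."""
    setting = max(0, min(100, setting))  # clamp out-of-range settings
    return low_hz + (high_hz - low_hz) * setting / 100.0

base   = pitch_relative(120.0, 0)        # unchanged base pitch
raised = pitch_relative(120.0, +5)       # five steps above base
middle = pitch_scaled(50, 80.0, 240.0)   # halfway through the range
```

Either mapping can sit behind the same inline command syntax; only the interpretation of the parameter differs.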
* Re: eSpeak - Introduction
From: Kirk Reiser @ UTC (permalink / raw)
To: speakup

Jonathan Duddington <jsd@clara.co.uk> writes:

> Yes, I think embedded control commands within the text stream would
> be a useful addition. This should be possible with some work.
>
> I'm not sure exactly which document you meant. I looked at
> "DoubleTalk-developers.txt" which has a section "Interrogating
> DoubleTalk" which mentions various parameters (including tone,
> articulation, expression, punc level) but doesn't define them. I
> didn't find a section on setting parameters from within the text
> stream.

Oops! For some reason the document containing the DoubleTalk command
set and a lot of other data was not in that directory. I have now
placed it there, called DoubleTalk-rc8650.txt. There is a complete
section on the dtlk's command set.

> Perhaps better would be if you could start by specifying exactly
> what you would like ideally. What parameters, and the syntax of how
> they could be specified within the text.

Typically a command within the text stream starts with a control-A,
followed by a one- or two-character parameter and a single command
letter, such as 'S' for speed. The parameters are usually a number
between 0-9, or + and - for relative movement. So, for example,
ctrl-a8S would give a fairly fast speech rate.

> Should they be based on some established speech mark-up language?
> I'm not familiar with that topic.

That's why I recommended looking at the dtlk command set. It is
complete yet concise, which, say, the DECtalk command set is not.

> Would you need to write a speakup module specific to eSpeak rather
> than using a generic one?
Currently speakup has a software synth output module which uses a
subset of the dtlk command set. It can be accessed directly by opening
a device, /dev/softsyn, or through the speech dispatcher system with a
stub program, speechd-up, which opens the device and feeds output to
speech dispatcher.

> Is the pitch variation meant to be the same voice adjusting his
> pitch, or does a different pitch imply different voice
> characteristics; eSpeak's "female" voice is a variation on the
> standard male voice but it needed adjustments to the formant
> frequencies as well as the pitch to sound anything like reasonable.

You can use either method. Some folks are happy with a pitch shift,
while others prefer a totally different voice, and yet others prefer
tones or words like 'cap ', so it's an individual choice thing. The
basic set of commands you would want would be rate 'S', pitch 'P',
number and punctuation handling 'B', voice 'O' and maybe volume 'V'.
A switch to handle language change and a number of other options would
be useful as well.

Kirk
--
Kirk Reiser                     The Computer Braille Facility
e-mail: kirk@braille.uwo.ca     University of Western Ontario
phone: (519) 661-3061
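The inline command framing Kirk outlines (a Ctrl-A byte, a short parameter, then a command letter) could be parsed along these lines. This is a hypothetical sketch of only the framing he describes; the actual DoubleTalk command letters and their semantics live in the document he mentions.

```python
# Sketch of splitting a text stream into plain text and DoubleTalk-style
# inline commands: Ctrl-A, then a one- or two-character parameter (a
# digit, or +/- for relative changes), then a single command letter,
# e.g. "\x018S" for a fairly fast speech rate. The framing follows the
# mailing-list description above; command meanings are not modeled.

CTRL_A = "\x01"

def parse_stream(text):
    """Return a list of ('text', chunk) and ('cmd', param, letter) items."""
    items, buf, i = [], [], 0
    while i < len(text):
        if text[i] == CTRL_A:
            if buf:                      # flush pending plain text
                items.append(("text", "".join(buf)))
                buf = []
            i += 1
            param = ""
            while (i < len(text) and len(param) < 2
                   and (text[i].isdigit() or text[i] in "+-")):
                param += text[i]
                i += 1
            if i < len(text) and text[i].isalpha():
                items.append(("cmd", param, text[i]))
                i += 1
        else:
            buf.append(text[i])
            i += 1
    if buf:
        items.append(("text", "".join(buf)))
    return items

parse_stream("\x018SHello \x01+2Pworld")
# → [('cmd', '8', 'S'), ('text', 'Hello '), ('cmd', '+2', 'P'), ('text', 'world')]
```

A synthesizer driver would apply each `cmd` item to its settings before speaking the following text chunk.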
* eSpeak - embedded commands
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup

In article <x7d5flh7xw.fsf@speech.braille.uwo.ca>,
   Kirk Reiser <kirk@braille.uwo.ca> wrote:

> I have now placed it there called DoubleTalk-rc8650.txt. There is a
> complete section on the dtlk's command set.

Thanks. I've been working on this and I've put up the results so far,
in file test-1.09b-linux at the Downloads page at
http://espeak.sourceforge.net/ ... so you can try it out and see if
I've got it right :-)

I've implemented Pitch, Speed, Volume, and Reverberation (OK, the last
probably isn't very useful, but it was easy). "Speed" should only be
used as an adjustment, not as the overall master speed setting.

Also I've added a -p option to the command line to adjust the pitch
from there (and renamed the previous -p and -P phoneme-output options
as -x and -X).
* eSpeak -- features wish list
From: Hynek Hanke @ UTC (permalink / raw)
To: jsd; +Cc: speakup

Hello Jonathan,

I'm happy people like eSpeak so much, and it seems it is a very good
technology. I'm going to add the config script for Speech Dispatcher
to the official distribution in the next release. You inquired about
what features you could add to make it more usable for accessibility
purposes.

I'm the main developer of Speech Dispatcher, a project that tries to
unify the access of free software accessibility tools to speech
synthesis engines. Basically, what we want to do right now is to split
Speech Dispatcher into two parts: message dispatching (prioritization
etc.) and a TTS API (access to synthesizers). For that purpose, we
developed a requirements document for the API, which also more or less
defines the capabilities we expect from the synthesizers. You might
want to look at the requirements document:
http://lists.freedesktop.org/archives/accessibility/2006-March/000078.html

It is still a draft and there will be some changes to it, but the
sub-part about SSML deals with the synthesis settings capabilities
which the users want or would like to have. Of course I'm posting the
link to this document merely as a potential guideline for you. This
API will be implemented by some layer above the engine drivers, and
missing MUST HAVE and SHOULD HAVE capabilities can still be emulated
either in the engine drivers or in the covering layer. This API is
being worked on by Brailcom (Speech Dispatcher), KDE and Gnome. In
fact, KDE is going to use Speech Dispatcher soon.

The things that would most help currently are:

1) Being able to return audio data, not play it itself. (This would
enable us to write a native driver for Dispatcher or the TTS API,
which could be a good improvement. Also, it would instantly solve the
audio problems.)
2) Settings for punctuation and capital letters signalization. (See
the TTS API requirements draft above, section SSML. This doesn't mean
this functionality needs to be implemented with SSML or embedded
markup. It can be a static setting given to the binary, e.g.
espeak --punctuation="all".)

3) Some way of communication other than running the binary again for
each message (which is more CPU expensive). See for example how Flite
works with Dispatcher (linking a library) or how Festival works
(providing a TCP/IP interface).

I hope I didn't scare you very much :) Of course these are wishes, and
some of them rather longer-term wishes. I think you have done great
work!

Thank you,
Hynek Hanke
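Wish (3), a long-running server in the style of Festival's TCP interface, might look roughly like the toy sketch below. The protocol here (one line of text in, raw audio bytes back) and the stub synthesizer are invented for illustration; it also returns audio data instead of playing it, as in wish (1).

```python
import socket
import threading

# Toy sketch: a persistent synthesis server reachable over TCP, so a
# client does not spawn the binary for every message. fake_synthesize
# is a stand-in; a real driver would return PCM audio data.

def fake_synthesize(text):
    # Placeholder: pretend each character costs 100 samples of audio.
    return b"\x00" * (100 * len(text))

def serve_once(sock):
    conn, _ = sock.accept()
    with conn:
        text = conn.makefile("r").readline().strip()
        conn.sendall(fake_synthesize(text))

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(server,))
thread.start()

# Client side: send one sentence, read the "audio" back until EOF.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello world\n")
client.shutdown(socket.SHUT_WR)
audio = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    audio += chunk
client.close()
thread.join()
server.close()
```

A library-linking interface (as Flite offers) achieves the same goal without sockets; the TCP version just shows the process-per-message cost going away.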