* eSpeak - Introduction
@ Jonathan Duddington
` Lorenzo Taylor
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup
I've just found this mailing list from a comment on the
espeak.sourceforge.net forum. It's good to see people using (or trying
to use) eSpeak.
eSpeak is not new, I originally wrote it about 10 years ago for the
Acorn/RISC OS platform. The Linux version is a re-write of the code
which I've been doing over the past few months. I'm definitely not an
expert on Linux system programming or system administration, so I'll
probably need to ask for help in those areas if the need arises.
eSpeak uses the "sinusoidal" technique of synthesis. Basically it
makes vowels and sonorant consonants, eg. [r,l,m,n,w] by adding
together the sine waves of harmonics in varying proportions.
Unvoiced consonants such as [h,t,s,f,k] are simply recorded sound
samples, while voiced consonants eg [v,z,d,g] are a mixture of these
two methods.
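The additive method Jonathan describes can be sketched in a few lines. This is a toy illustration of summing harmonics in varying proportions, not eSpeak's actual code; the harmonic amplitudes, fundamental frequency, sample rate, and duration below are arbitrary assumptions.

```python
import math

def additive_vowel(harmonic_amps, f0=110.0, sample_rate=22050, duration=0.05):
    """Sum sine waves at integer multiples of the fundamental f0.

    harmonic_amps: relative amplitude of each harmonic (index 0 = f0).
    Returns a list of float samples normalized into [-1.0, 1.0].
    """
    n_samples = int(sample_rate * duration)
    peak = sum(harmonic_amps) or 1.0
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t)
                for k, a in enumerate(harmonic_amps))
        samples.append(s / peak)  # the amplitude sum bounds the waveform
    return samples

# A vowel-like spectrum: strong low harmonics, weaker high ones.
wave = additive_vowel([1.0, 0.8, 0.6, 0.3, 0.1])
```

Changing the relative amplitudes changes the vowel quality while the pitch (f0) stays the same, which is the property the synthesis method relies on.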
It should be interesting to see what are the specific needs of a TTS
engine for use with speakup, and how eSpeak can be improved to give a
better match to them.
I've been using eSpeak on Linux mostly with the KDE TTS system (KTTS).
I don't know anything about speakup or speech-dispatcher yet, other
than a quick look at the speakup user guide.
A problem has been reported with eSpeak, where it locks up when it
finds the sound device is already in use. I've made a fix for this for
the next release, which I'll probably make in a few days. Perhaps
there are some other changes that will be useful too. Let me know :-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
eSpeak - Introduction Jonathan Duddington
@ ` Lorenzo Taylor
` Willem van der Walt
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Lorenzo Taylor @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ah! You found me! I am the one who reported the lockup problem. I am one of
the people here who are using Speakup's interface to Speech-dispatcher to
connect with eSpeak. Your speech synthesizer has been getting a lot of mention
here on this list recently, as it is quickly replacing Festival and Flite for
many of us who are using software speech on a regular basis. I am sure many of
us will be able to help you with any issues you may have with Linux and Linux
programming, just as you have greatly helped us by writing the first
good-sounding, fast software speech synthesizer for use with Speakup.
Thanks for a great program,
Lorenzo
- --
They have been at a great feast of languages, and stolen the scraps.
-- William Shakespeare, "Love's Labour's Lost"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
iD8DBQFEPbdJG9IpekrhBfIRAgs1AJ9/+PudmQ2LyJZymhbZBnX6g3w4UgCglkXN
99ZjmGEN3YWyuS/oaD9rL/4=
=Et6y
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
eSpeak - Introduction Jonathan Duddington
` Lorenzo Taylor
@ ` Willem van der Walt
` Jonathan Duddington
` Kirk Reiser
` eSpeak -- features wish list Hynek Hanke
3 siblings, 1 reply; 13+ messages in thread
From: Willem van der Walt @ UTC (permalink / raw)
To: Jonathan Duddington; +Cc: speakup
Hi Jonathan
Thanks for a much-needed, useful free software synthesizer.
The clarity and accuracy of the speech is quite good. I am interested in
the process of creating a new language using espeak.
Where can I get more detail on that?
I think there is a bug in the espeak program when reading long text files:
the program segfaults after a time. For me it is no problem, as I am using
speech-dispatcher, which sends smaller chunks of text at a time, but others
might be using the file feature.
Thanks again for a great program.
Regards, Willem
On Thu, 13 Apr 2006, Jonathan Duddington wrote:
> I've just found this mailing list from a comment on the
> espeak.sourceforge.net forum. It's good to see people using (or trying
> to use) eSpeak.
>
> eSpeak is not new, I originally wrote it about 10 years ago for the
> Acorn/RISC OS platform. The Linux version is a re-write of the code
> which I've been doing over the past few months. I'm definitely not an
> expert on Linux system programming or system administration, so I'll
> probably need to ask for help in those areas if the need arises.
>
> eSpeak uses the "sinusoidal" technique of synthesis. Basically it
> makes vowels and sonorant consonants, eg. [r,l,m,n,w] by adding
> together the sine waves of harmonics in varying proportions.
>
> Unvoiced consonants such as [h,t,s,f,k] are simply recorded sound
> samples, while voiced consonants eg [v,z,d,g] are a mixture of these
> two methods.
>
> It should be interesting to see what are the specific needs of a TTS
> engine for use with speakup, and how eSpeak can be improved to give a
> better match to them.
>
> I've been using eSpeak on Linux mostly with the KDE TTS system (KTTS).
> I don't know anything about speakup or speech-dispatcher yet, other
> than a quick look at the speakup user guide.
>
> A problem has been reported with eSpeak, where it locks up when it
> finds the sound device is already in use. I've made a fix for this for
> the next release, which I'll probably make in a few days. Perhaps
> there are some other changes that will be useful too. Let me know :-)
>
>
> _______________________________________________
> Speakup mailing list
> Speakup@braille.uwo.ca
> http://speech.braille.uwo.ca/mailman/listinfo/speakup
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
eSpeak - Introduction Jonathan Duddington
` Lorenzo Taylor
` Willem van der Walt
@ ` Kirk Reiser
` Jonathan Duddington
` eSpeak -- features wish list Hynek Hanke
3 siblings, 1 reply; 13+ messages in thread
From: Kirk Reiser @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
Hello and welcome, Jonathan: eSpeak is a refreshing addition to the
available software synthesis programs of today. If you are interested
in extending its capabilities to some degree, inline control
parameters for the common adjustable settings would be nice. Being
able to modify speed, pitch, intonation and the like is very useful
for producing features such as a pitch change on capital letters, for
example. The DoubleTalk hardware synthesizer from RC Systems has a
nice and concise command language set you may want to look at as an
example. You can find documentation for various synths at
ftp://linux-speakup.org/pub/linux/goodies/synths-documentation. It
isn't important to support the entire command set, but the basic
commands would make (e)speak more flexible for the blind computer
user.
A very nice job on your synth in my opinion!
Kirk
--
Kirk Reiser The Computer Braille Facility
e-mail: kirk@braille.uwo.ca University of Western Ontario
phone: (519) 661-3061
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
` Willem van der Walt
@ ` Jonathan Duddington
` ace
0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Duddington @ UTC (permalink / raw)
To: speakup
In article <Pine.LNX.4.62.0604130852320.28756@localhost.localdomain>,
Willem van der Walt <wvdwalt@csir.co.za> wrote:
> I think there is a bug in the espeak program when reading long text
> files. To me it is no problem as I am using speech-dispatcher which
> sends smaller chunks of text at a time, but others might be using
> the file feature. The program segfaults after a time.
That's puzzling, and worrying. I often speak large text files with
speak -f textfile
and I've not had any problem like that. Were you using any other
options?
> I am interested in the process of creating a new language using
> espeak. Where can I get more detail on that?
Firstly read the Documents section from the web site
(http://espeak.sourceforge.net): Dictionary, Phonemes, Phoneme Tables,
(and download phonsource.zip referenced from there).
There is another program which I haven't released yet which compiles
the Phoneme Tables, together with sound recordings of consonants and
formant specifications of vowels. It also includes an editor for the
vowels. The interface needs tidying up a bit, but the biggest job is
writing user instructions so that others can use it. I hope to do this
though. If you want to try it out without instructions, I could make
it available fairly soon :-)
It would be very interesting if someone did do another language
implementation. It would help to identify language-dependent features.
Depending on which language, I might need to add some new features to
the speech engine.
Firstly you need to get a phonological description of your language
(eg. what phonemes it uses). Looking up "yourlanguage language" in
wikipedia might give some useful information.
It may be that, as a first approximation, you can use already provided
phonemes from the list in the Phonemes.html document. You can try out
example words in your new language by giving phoneme codes enclosed
within double square brackets, eg:
speak "[[h@l'oU w'3:ld]]"
would say "hello world" in English,
speak "[[g'yt@n t'A:g]]"
would say "güten tag" in German, using the [y] phoneme, which isn't
used in English, but which is already provided in eSpeak.
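A small wrapper makes the double-bracket convention explicit. The `speak` binary name is as used throughout this thread; the `-v` voice option and the helper itself are assumptions for illustration, not a documented API.

```python
def phoneme_command(phonemes, voice=None):
    """Build an argv list asking the `speak` binary to say raw phonemes.

    eSpeak treats text enclosed in double square brackets as phoneme
    codes, e.g. [[h@l'oU]].  If `voice` is given it is passed via -v
    (an assumed flag for selecting the language voice).
    """
    argv = ["speak"]
    if voice:
        argv += ["-v", voice]
    argv.append("[[" + phonemes + "]]")
    return argv

# e.g. subprocess.run(phoneme_command("h@l'oU w'3:ld"))  # "hello world"
```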
Perhaps you can find a set of passable phonemes for your language (you
can implement more accurate versions later). A Bantu language would be
more of a challenge (eg. tonal language, click consonants).
Then you can start constructing pronunciation rules in the
<language>_rules file. The <language>_list file gives exceptions and
also those common words which are usually unstressed ("the", "is",
"in", etc). See the "data/" directory in eSpeak's "source.zip"
download package for examples. Hopefully your language's spelling
rules won't be as difficult as English!
Set up a Voice in espeak-data/voices for your language (specifying
your language's dictionary, but keeping the default phoneme set for
now) and compile the dictionary files with
speak --compile=yourvoice
That should give you a very rudimentary implementation of your
language. It might be intelligible :-)
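The division of labour between the `<language>_rules` and `<language>_list` files can be modelled crudely as "whole-word exceptions first, then longest-match letter-to-sound rules". This is not eSpeak's actual rule format or matching algorithm, just a sketch with hypothetical Spanish-like rules.

```python
def letters_to_phonemes(word, rules, exceptions=None):
    """Toy model of a <language>_list / <language>_rules lookup.

    exceptions: whole words with fixed pronunciations (the _list file).
    rules: spelling -> phoneme string, tried longest-first at each
    position (a crude stand-in for real contextual rule matching).
    """
    exceptions = exceptions or {}
    if word in exceptions:
        return exceptions[word]
    out = []
    i = 0
    ordered = sorted(rules, key=len, reverse=True)
    while i < len(word):
        for spelling in ordered:
            if word.startswith(spelling, i):
                out.append(rules[spelling])
                i += len(spelling)
                break
        else:
            i += 1  # no rule matched: skip the letter
    return "".join(out)

# Hypothetical Spanish-like rules: each spelling maps to one phoneme code.
rules = {"ch": "tS", "c": "k", "a": "a", "s": "s", "o": "o"}
print(letters_to_phonemes("casa", rules))   # kasa
print(letters_to_phonemes("chosa", rules))  # tSosa
```

The longest-first matching is why "ch" wins over "c" in the second example; real rule files also condition on surrounding letters and stress, which this sketch omits.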
eSpeak is written in C++. You can write a new Translator class for
your language which can keep the functions of the base Translator
class, or can set options to vary their effect (eg. which syllable of a
word usually takes the main stress, or the difference in length between
stressed and unstressed syllables), or can override them with
replacement functions (eg. a new sentence intonation algorithm). Now
your language should be sounding better. As you listen to it speaking,
notice problems and make adjustments to rules, phoneme realizations,
and various tuning parameters.
If you're serious about implementing a language, then I'll be happy to
help with support, program features, information and documentation.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
` Kirk Reiser
@ ` Jonathan Duddington
` Lorenzo Taylor
` Kirk Reiser
0 siblings, 2 replies; 13+ messages in thread
From: Jonathan Duddington @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
In article <x71ww13hg0.fsf@speech.braille.uwo.ca>,
Kirk Reiser <kirk@braille.uwo.ca> wrote:
> If you are interested in extending it's capabilities to some degree,
> inline control parametres for the common adjustable settings would be
> nice. Being able to modify speed, pitch, intonation and the like is
> very useful for producing features such as pitch change on capital
> letters for example. The doubletalk hardware synthesizer from RC
> Systems has a nice and concise command language set you may want to
> look at as an example.
Yes, I think embedded control commands within the text stream would be
a useful addition. This should be possible with some work.
I'm not sure exactly which document you meant. I looked at
"DoubleTalk-developers.txt" which has a section "Interrogating
DoubleTalk" which mentions various parameters (including tone,
articulation, expression, punc level) but doesn't define them. I
didn't find a section on setting parameters from within the text stream.
Perhaps better would be if you could start by specifying exactly what
you would like ideally. What parameters, and the syntax of how they
could be specified within the text.
Should they be based on some established speech mark-up language? I'm
not familiar with that topic.
Would you need to write a speakup module specific to eSpeak rather than
using a generic one?
Is the pitch variation meant to be the same voice adjusting his pitch,
or does a different pitch imply different voice characteristics;
eSpeak's "female" voice is a variation on the standard male voice but
it needed adjustments to the formant frequencies as well as the pitch
to sound anything like reasonable.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
` Jonathan Duddington
@ ` Lorenzo Taylor
` Kirk Reiser
1 sibling, 0 replies; 13+ messages in thread
From: Lorenzo Taylor @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
The idea of embedded commands sounds good, but the current Speech-dispatcher
module is using eSpeak's command-line interface. Allowing embedded control
commands would, I think, require a specific eSpeak module, which wouldn't be
a bad thing at all, and in fact is probably needed anyway, but the sound
device fix would no longer work properly. At this point, eSpeak is going to
exit when the sound device is in use. If we implement embedded control
commands, eSpeak would have to keep running no matter what until
Speech-dispatcher is stopped. In that case, if the sound device is in use,
eSpeak will need to simply skip speaking rather than exit completely.
Overall, I really like the idea, and the DoubleTalk command set, if you can
find it :), would be a good place to start.
Regarding your pitch adjustment question, the pitch adjustment should be plus or
minus the base pitch of the voice in question. For example, the base pitch
could be 0, a negative number would lower the pitch, and a positive number could
raise the pitch. Another way to implement this option would be to set your
default pitch in the middle and define a highest and lowest pitch. Then 0 could
map to the lowest pitch and 100 or whatever could map to the highest pitch for a
specific voice. I personally prefer the plus/minus implementation, but most
other hardware and software synthesizers available today seem to use the default
to middle approach. It's your call. In either case, the formant frequencies
probably don't need to be changed along with the pitch; at this point you can
try just changing the pitch itself.
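The two schemes described above can be written out concretely. The 80-260 Hz range, 120 Hz base pitch, and 2 Hz step below are illustrative assumptions, not eSpeak's actual values.

```python
def pitch_from_percent(value, low=80.0, high=260.0):
    """The 'default to middle' scheme: 0 maps to the voice's lowest
    pitch, 100 to its highest, so 50 lands on the base pitch."""
    value = max(0, min(100, value))   # clamp out-of-range requests
    return low + (high - low) * value / 100.0

def pitch_from_offset(offset, base=120.0, step=2.0):
    """The plus/minus scheme: 0 keeps the base pitch, +-n shifts it
    by n steps relative to the voice's default."""
    return base + offset * step
```

With the percent scheme every voice needs a calibrated low/high pair; with the offset scheme only a base pitch and a step size, which is one reason the plus/minus form is simpler to implement per voice.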
Thanks much and hope this helps some,
Lorenzo
- --
You will inherit millions of dollars.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
iD8DBQFEPl4BG9IpekrhBfIRAuOJAKDCzyMYAgylV8xjZLZX4/GtMf62YwCgknhI
UOx26AOVqWP5anAj61J8scg=
=vf4q
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
` Jonathan Duddington
` Lorenzo Taylor
@ ` Kirk Reiser
` eSpeak - embedded commands Jonathan Duddington
1 sibling, 1 reply; 13+ messages in thread
From: Kirk Reiser @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
Jonathan Duddington <jsd@clara.co.uk> writes:
> Yes, I think embedded control commands within the text stream would be
> a useful addition. This should be possible with some work.
>
> I'm not sure exactly which document you meant. I looked at
> "DoubleTalk-developers.txt" which has a section "Interrogating
> DoubleTalk" which mentions various parameters (including tone,
> articulation, expression, punc level) but doesn't define them. I
> didn't find a section on setting parameters from within the text stream.
Oops! For some reason the document containing the DoubleTalk command
set and a lot of other data was not in that directory. I have now
placed it there as DoubleTalk-rc8650.txt. There is a complete
section on the dtlk's command set.
> Perhaps better would be if you could start by specifying exactly what
> you would like ideally. What parameters, and the syntax of how they
> could be specified within the text.
Typically a command within the text stream starts with a control-A,
followed by a one- or two-character parameter and a single command
letter such as 'S' for speed. The parameters are usually a number
between 0-9, or + and - for relative movement. So, for example,
ctrl-a8S would give a fairly fast speech rate.
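A minimal tokenizer for that syntax might look like the following. It is a sketch of the shape just described (control-A, optional numeric or +/- parameter, one command letter), not the actual dtlk grammar.

```python
def split_inline_commands(text, marker="\x01"):
    """Split a text stream into (commands, plain_text).

    Each command is the marker (control-A) followed by a numeric or
    +/- parameter and a single command letter, e.g. "\x018S" for
    speed 8.  Returns a list of (parameter, letter) pairs plus the
    text left over for the synthesizer to speak.
    """
    commands = []
    plain = []
    i = 0
    while i < len(text):
        if text[i] == marker:
            i += 1
            param = ""
            while i < len(text) and (text[i].isdigit() or text[i] in "+-"):
                param += text[i]
                i += 1
            if i < len(text) and text[i].isalpha():
                commands.append((param, text[i]))
                i += 1
        else:
            plain.append(text[i])
            i += 1
    return commands, "".join(plain)

cmds, spoken = split_inline_commands("\x018S\x01+2Phello")
# cmds == [("8", "S"), ("+2", "P")], spoken == "hello"
```

A real implementation would apply each command at the point in the stream where it occurs, rather than collecting them up front as this sketch does.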
> Should they be based on some established speech mark-up language? I'm
> not familiar with that topic.
That's why I recommended looking at the dtlk command set. It is
complete yet concise, which, say, the DECTalk command set is not.
> Would you need to write a speakup module specific to eSpeak rather than
> using a generic one?
Currently speakup has a software synth output module which uses a
subset of the dtlk command set. It can be accessed directly by opening
the device /dev/softsyn, or through the speech dispatcher system with
a stub program, speechd-up, which opens the device and feeds output to
speech dispatcher.
> Is the pitch variation meant to be the same voice adjusting his pitch,
> or does a different pitch imply different voice characteristics;
> eSpeak's "female" voice is a variation on the standard male voice but
> it needed adjustments to the formant frequencies as well as the pitch
> to sound anything like reasonable.
You can use either method: some folks are happy with a pitch shift,
while others prefer a totally different voice, and yet others prefer
tones or words like 'cap ', so it's an individual choice.
The basic set of commands you would want would be rate 'S', pitch 'P',
number and punctuation handling 'B', voice 'O' and maybe volume 'V'.
A switch to handle language change and a number of other options would
be useful as well.
Kirk
--
Kirk Reiser The Computer Braille Facility
e-mail: kirk@braille.uwo.ca University of Western Ontario
phone: (519) 661-3061
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
` Jonathan Duddington
@ ` ace
` Jonathan Duddington
0 siblings, 1 reply; 13+ messages in thread
From: ace @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello,
This is a very interesting conversation. If you don't mind me asking,
Jonathan, have you had any courses in linguistics? This is my
particular area of study and Spanish is my second language. I would be
more than happy to help design a Spanish language for your synthesizer;
however, I am not a coder. How much knowledge of C/C++ coding is
necessary to create a language? I think helping to create a language
for a synthesizer would be good practice for my interest in linguistics.
Thanks for a good synthesizer. I am having a slight issue but I will
address that in another message.
Robby
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
iD8DBQFEPpAowamDI1Gy2GIRAnB9AKDx+269tPgeNBxPg3FQvZASHxes8ACgjqBr
qxiLuos3IHhDv8CT7BncsxA=
=aKRp
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 13+ messages in thread
* eSpeak -- features wish list
eSpeak - Introduction Jonathan Duddington
` (2 preceding siblings ...)
` Kirk Reiser
@ ` Hynek Hanke
3 siblings, 0 replies; 13+ messages in thread
From: Hynek Hanke @ UTC (permalink / raw)
To: jsd; +Cc: speakup
Hello Jonathan,
I'm happy people like eSpeak so much; it seems to be a very good
technology. I'm going to add the config script for Speech Dispatcher
to the official distribution in the next release. You inquired about
what features you could add to it to make it more usable for
accessibility purposes.
I'm the main developer of Speech Dispatcher, a project that
tries to unify the access of free software accessibility tools
to speech synthesis engines.
Basically, what we want to do right now is to split Speech
Dispatcher into two parts: message dispatching (prioritization etc.)
and a TTS API (access to synthesizers). For that purpose, we
developed a requirements document for the API, which also
more or less defines the capabilities we expect from the
synthesizers. You might want to look at the requirements document
http://lists.freedesktop.org/archives/accessibility/2006-March/000078.html
It is still a draft and there will be some changes to it.
But the sub-part about SSML deals with the synthesis settings
capabilities which the users want or would like to have.
Of course I'm posting the link to this document merely as
a potential guideline for you. This API will be implemented by some
layer above the engine drivers and missing MUST HAVE and SHOULD HAVE
capabilities can still be emulated either in the engine drivers or in
the covering layer.
This API is being worked on by Brailcom (Speech Dispatcher), KDE and
Gnome. In fact, KDE is going to use Speech Dispatcher soon.
The things that would most help currently are:
1) Be able to return audio data, not play it itself.
(This would enable us to write a native driver for Dispatcher
or the TTS API, which could be a good improvement. It would also
instantly solve the audio problems.)
2) Settings for punctuation and capital-letter signalization.
(See the TTS API requirements draft above, section SSML. This
doesn't mean this functionality needs to be implemented with
SSML or embedded markup; it can be a static setting passed to the
binary (espeak --punctuation="all").)
3) Some way of communicating other than running the binary
again for each message (which is more CPU expensive). See for
example how Flite works with Dispatcher (linking a library) or
how Festival works (it provides a TCP/IP interface).
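Wish 3, a persistent Festival-style TCP interface, can be sketched as a one-shot line-based server. The "OK ..." reply protocol is invented purely for illustration; a real server would loop over many messages and return audio data rather than an acknowledgement.

```python
import socket
import threading

def serve_once(host="127.0.0.1", port=0):
    """Accept one connection, read one line of text, and reply as if
    synthesis had been queued.  A toy stand-in for a Festival-style
    TCP interface.  Returns the port the OS assigned."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    port = srv.getsockname()[1]

    def handler():
        conn, _ = srv.accept()
        with conn:
            line = conn.makefile("r").readline().strip()
            conn.sendall(("OK %d chars\n" % len(line)).encode())
        srv.close()

    threading.Thread(target=handler, daemon=True).start()
    return port

# Client side: send one sentence, read the acknowledgement.
port = serve_once()
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"hello world\n")
reply = cli.makefile("r").readline().strip()
cli.close()
```

The point of such an interface is that the engine process stays resident, so per-message cost is one round trip instead of one fork/exec of the binary.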
I hope I didn't scare you very much :) Of course these are wishes,
and some of them rather longer-term wishes. I think you have done
great work!
Thank you,
Hynek Hanke
^ permalink raw reply [flat|nested] 13+ messages in thread
* eSpeak - embedded commands
` Kirk Reiser
@ ` Jonathan Duddington
0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Duddington @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
In article <x7d5flh7xw.fsf@speech.braille.uwo.ca>,
Kirk Reiser <kirk@braille.uwo.ca> wrote:
> I have now placed it there called DoubleTalk-rc8650.txt. There is a
> complete section on the dtlk's command set.
Thanks.
I've been working on this and I've put up the results so far, in file
test-1.09b-linux
at the Downloads page at
http://espeak.sourceforge.net/
... so you can try it out and see if I've got it right :-)
I've implemented Pitch, Speed, Volume, and Reverberation (OK, the last
probably isn't very useful, but it was easy).
"Speed" should only be used as an adjustment, not as the overall master
speed setting.
Also I've added a -p option to the command line to adjust the pitch
from there (and renamed the previous -p and -P phoneme-output options
as -x and -X).
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
` ace
@ ` Jonathan Duddington
0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Duddington @ UTC (permalink / raw)
To: Speakup is a screen review system for Linux.
In article <20060413175344.GA26811@amelia.voyager.net>,
ace <ace@freedomchat.org> wrote:
> This is a very interesting conversation. If you don't mind me asking,
> Jonathan, have you had any courses in linguistics?
No, just reading books and websites and trial and error :-)
> This is my particular area of study and Spanish is my second
> language. I would be more than happy to help design a Spanish
> language for your synthesizer; however, I am not a coder. How much
> knowledge of C/C++ coding is necessary to create a language?
Probably not much for Spanish. I doubt there's much to change in the
program other than adjusting numbers.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: eSpeak - Introduction
[not found] <mailman.967.1144930700.625.speakup@braille.uwo.ca>
@ ` Michael Whapples
0 siblings, 0 replies; 13+ messages in thread
From: Michael Whapples @ UTC (permalink / raw)
To: speakup
As far as speech mark-up languages go, SSML seems to be the one currently
in use. The Speech-dispatcher documentation mentions it in the section
about making output modules. I think if you were to make use of SSML from
speech-dispatcher, it would need to come from a specific module; the
documentation seems to say this, and it seems to indicate that there is a
file with the basic parts of the module already written, so that a
developer only has to add the synth-specific parts. More information about
speech-dispatcher output modules for developers is in the speech-dispatcher
documentation, chapter 4.2, and the use of the generic module configuration
file is explained in chapter 2.
If you would like to see some examples of what a mark-up language can do
with speech, I could send you a sample file from the ETI Eloquence synth I
have in Windows, and the sound it produces (as MP3, Ogg Vorbis, etc.).
Before anyone says Eloquence does not use SSML: it might not, but it will
give an idea of what mark-up can do.
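A minimal SSML fragment of the kind an output module might hand to a synthesizer can be built in a few lines. The prosody pitch/rate attributes are standard SSML; the helper function itself is hypothetical, and real SSML offers many more elements (emphasis, break, voice, ...).

```python
import xml.etree.ElementTree as ET

def ssml_sentence(text, pitch=None, rate=None):
    """Wrap text in a minimal SSML <speak>/<prosody> fragment.

    pitch and rate use SSML's relative notation, e.g. "+20%" or
    "-10%".  With neither set, the text goes directly in <speak>.
    """
    speak = ET.Element("speak")
    if pitch or rate:
        node = ET.SubElement(speak, "prosody")
        if pitch:
            node.set("pitch", pitch)
        if rate:
            node.set("rate", rate)
    else:
        node = speak
    node.text = text
    return ET.tostring(speak, encoding="unicode")

print(ssml_sentence("CAPITAL", pitch="+20%"))
```

This is, for instance, how "pitch change on capital letters" could be expressed declaratively rather than with synth-specific control codes.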
From
Michael Whapples
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~ UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
eSpeak - Introduction Jonathan Duddington
` Lorenzo Taylor
` Willem van der Walt
` Jonathan Duddington
` ace
` Jonathan Duddington
` Kirk Reiser
` Jonathan Duddington
` Lorenzo Taylor
` Kirk Reiser
` eSpeak - embedded commands Jonathan Duddington
` eSpeak -- features wish list Hynek Hanke
[not found] <mailman.967.1144930700.625.speakup@braille.uwo.ca>
` eSpeak - Introduction Michael Whapples