From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 7723 invoked by uid 0); 25 Sep 1996 23:18:08 -0000
MBOX-Line: From a.howell@student.qut.edu.au Thu Sep 26 01:16:45 1996
Received: (qmail 6266 invoked by uid 504); 25 Sep 1996 21:22:27 -0000
Received: (qmail 6257 invoked from smtpd); 25 Sep 1996 21:22:21 -0000
Received: from cublx2.cube.net (root@194.97.64.61) by goldfish.cube.net with SMTP; 25 Sep 1996 21:22:15 -0000
Received: from melia.qut.edu.au ([131.181.127.2]) by cublx2.cube.net with ESMTP id <24627-378>; Wed, 25 Sep 1996 13:10:05 +0100
Received: from sparrow.qut.edu.au (n1896814@sparrow.qut.edu.au) by melia.qut.edu.au (PMDF V5.0-7 #13254) id <01I9WROM6C8000O4G4@melia.qut.edu.au> for blinux-list@goldfish.cube.net; Wed, 25 Sep 1996 21:13:19 +1000
Received: from localhost (n1896814@localhost) by sparrow.qut.edu.au (8.7.6/8.7.3) with SMTP id VAA27960 for ; Wed, 25 Sep 1996 21:10:23 +1000 (EST)
Date: Wed, 25 Sep 1996 21:10:23 +1000 (EST)
From: AARON HOWELL
Subject: Re: Speech in Linux and X-windows (fwd)
To: blinux-list@goldfish.cube.net
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Content-transfer-encoding: 7BIT
List-Id:

This came through on another list I am subscribed to. I thought some of
you, particularly those of you with unix/x programming experience might
find this very interesting.

Aaron

-----------------------------------------------------------------------------
Aaron Howell.                  Q.U.T Equity Department,
Technical Support/Training.    work: a.howell@qut.edu.au
Linux/Networking Support.      home: a.howell@student.qut.edu.au
phone +61-19-956-467           www: http://www.cnl.com.au/~aaron
irc: DaRkAnGeL
MODULE disclaimer;
FROM STextIO IMPORT WriteString,WriteLn;
BEGIN;
WriteString("The opinions herein are mine, and do not in any way");
WriteString(" Reflect those of Q.U.T."); WriteLn;
END disclaimer.

---------- Forwarded message ----------
Date: Tue, 24 Sep 1996 14:31:51 -0700 (PDT)
From: Tim Noonan
Reply-To: Keith Edwards
To: GUISPEAK@LISTSERV.NODAK.EDU
Subject: Re: Speech in Linux and X-windows (fwd)

Originally From: Keith Edwards

Hello,

I'm not a regular subscriber to this list, but a message from it
recently got forwarded to me and I felt that I should follow up.

In a previous life, I was one of the developers of the UltraSonix X
screenreader software at Georgia Tech (which was called Sonic X before
that, and Mercator even before that...trademark issues and all). I was
the project manager for part of that time.

The status of the project is that work on the screenreader system at
Georgia Tech officially ended back in December of last year. Most of the
original developers have since gone their own way. My work now is
unrelated to access.

I've been in contact with the licensing folks at Tech, however, and
we've gotten clearance to make the code available to anyone who wants it
for non-commercial use. I've been *extremely* bogged down at work and
haven't actually managed to jump through all the legal hoops to make
this happen yet, but I'm going to try to do it as soon as I can. I'll
forward an announcement to this list when it's done.

Before I respond to some of Jim's specific comments, a bit of project
history might be helpful. We started this project way back in 1991, with
the goal of making a system that required no modifications to either the
X client-side libraries or the X server. Basically this system was a
"pseudo-server" -- a program that sits between client applications and
the "real" X server and interprets the X protocol as it flies back and
forth.
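
[Illustration, not part of the original message.] To give a feel for what "interpreting the X protocol" means at the byte level, here is a minimal Python sketch of the decoding step a pseudo-server might perform on client traffic. It is not the Georgia Tech code. The function names are hypothetical; the request layout (one-byte opcode, one data byte, a 16-bit length counted in 4-byte units) and the opcode numbers come from the core X11 protocol. The sketch assumes a little-endian client and ignores replies, events, and extensions.

```python
import struct

# A handful of core X11 request opcodes; the real protocol defines many more.
OPCODES = {1: "CreateWindow", 8: "MapWindow", 74: "PolyText8", 76: "ImageText8"}


def split_requests(stream, little_endian=True):
    """Yield (opcode, data_byte, body) for each complete request in stream."""
    header = "<BBH" if little_endian else ">BBH"
    pos = 0
    while pos + 4 <= len(stream):
        opcode, data, units = struct.unpack_from(header, stream, pos)
        if units == 0:
            raise ValueError("zero-length request is not handled in this sketch")
        end = pos + 4 * units  # request length is counted in 4-byte units
        if end > len(stream):
            break  # incomplete request; a real proxy would wait for more bytes
        yield opcode, data, stream[pos + 4:end]
        pos = end


def describe(opcode, data, body, little_endian=True):
    """Return a one-line description of a request (hypothetical helper)."""
    name = OPCODES.get(opcode, "opcode %d" % opcode)
    if opcode == 76:  # ImageText8: drawable, gc, x, y, then `data` text bytes
        order = "<" if little_endian else ">"
        drawable, gc, x, y = struct.unpack_from(order + "IIhh", body, 0)
        text = body[12:12 + data].decode("latin-1")
        return "%s drawable=0x%x at (%d,%d): %r" % (name, drawable, x, y, text)
    return name


def make_map_window(window):
    """Build a little-endian MapWindow request (test data for the sketch)."""
    return struct.pack("<BBHI", 8, 0, 2, window)


def make_image_text8(drawable, gc, x, y, text):
    """Build a little-endian ImageText8 request (test data for the sketch)."""
    raw = text.encode("latin-1")
    pad = (-len(raw)) % 4  # requests are padded to a multiple of 4 bytes
    units = (16 + len(raw) + pad) // 4
    head = struct.pack("<BBHIIhh", 76, len(raw), units, drawable, gc, x, y)
    return head + raw + b"\x00" * pad


def snoop(stream):
    """Describe every complete request found in a captured byte stream."""
    return [describe(op, data, body) for op, data, body in split_requests(stream)]


if __name__ == "__main__":
    captured = make_map_window(0x400001) + make_image_text8(
        0x400001, 0x400002, 10, 20, "Hello")
    for line in snoop(captured):
        print(line)
```

All the snooper learns from such a request is that some text was drawn at pixel coordinates in some window. Nothing in it says whether that text is a button label, a menu entry, or part of a document.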
This early version of the system had three great benefits for us: (1) it
gave us a working prototype screenreader for X that we could play with,
(2) it gave us some experience with the range of design possibilities
for building X screenreaders, and (3) it generated attention about
access issues inside the X Consortium.

On item (2) above, we learned that it is *extremely* difficult to build
a screenreader for X that works by just snooping X protocol traffic --
the X protocol is simply too low-level to build a reliable screenreader.
This is actually something we knew before we started, but because of our
design constraint of requiring no modifications to X, we were forced to
go with it.

A benefit of item (3) is that we were able to convince the X Consortium
that a set of standard modifications could be made to the X platform
that would make it easy for screenreaders (and other applications with
similar needs) to work in the X environment.

We built a second prototype screenreader based on a small set of
modifications to X and worked with the X Consortium to get these built
in to the base X distribution. As of X11R6.1, most of these "hooks" for
screenreaders are present in the system. Unfortunately, one remaining
piece of the puzzle is still lacking, and has not been standardized by
the Consortium. (I won't go into all the grungy technical details on
this piece here.)

The current screenreader is based on the standardized hooks that are
there, but because of the missing piece, it still requires a few
modifications to the X libraries (not the server) to work.

We think that our big successes with this project were

1. Actually getting the base X platform changed to incorporate hooks
   for accessibility.
2. Coming up with some novel interaction techniques for navigating
   through 2D graphical interfaces.

...and to a lesser degree...

3. Producing a system that uses the hooks to actually provide access to
   some X apps.

I certainly won't claim that the system is bullet-proof, but the basic
machinery is all there and (IMHO) I think that quite a bit of it is
fairly novel and interesting (but I'm biased of course :-). Our goal
with the non-commercial release is to get the code out into the hands of
seasoned X and Unix hackers and let them bring the system to the next
level of robustness and usability.

On to Jim's specific comments...

jrebman@NETCOM.COM writes:
> They are
> integrating a text manipulation module (probably EMACSpeak) and a
> refreshable braille output module they developed called, DOTSCREEN. They
> are using a DEC Express, and because most of the development is specific
> to this, as is EMACSpeak, I assume that this will be the only synthesizer
> supported. The next most likely candidate would be the DEC PC, because
> all that would be needed to support this synth would be a unix device
> driver, and that would be much more likely to happen than a port of
> textassist.

By "they" I'm not sure if you mean the Georgia Tech folks, or the
military intelligence folks, but I'll say a bit about how I/O works in
UltraSonix.

The basic idea with the system is that UltraSonix is built around a
completely device-independent I/O framework that we did in-house. This
framework lets you plug in a variety of devices--including speech
synthesizers, external keypads, braille keyboards, and even speech
recognizers--and, with a very minimal amount of programming, have them
"just work" with the system.

In essence, the I/O framework translates low-level device-specific
events into higher level internal events used by UltraSonix. So if you
can wrap a bit of code around a device to make it do this event
translation, you're set to use it in UltraSonix.
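
[Illustration, not part of the original message.] The event-translation idea can be sketched in a few lines of Python. This is only a guess at the general shape, not the actual UltraSonix framework. Every name here (InternalEvent, KeypadAdapter, RecognizerAdapter, EventRouter), the key codes, the spoken words, and the action names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternalEvent:
    """Device-independent event (hypothetical) that the core would consume."""
    action: str
    source: str


class KeypadAdapter:
    """Wraps an imaginary keypad that sends one byte per key press."""
    name = "keypad"
    KEYMAP = {0x31: "next-object", 0x32: "previous-object", 0x33: "read-current"}

    def translate(self, raw: bytes) -> Optional[InternalEvent]:
        action = self.KEYMAP.get(raw[0]) if raw else None
        return InternalEvent(action, self.name) if action else None


class RecognizerAdapter:
    """Wraps an imaginary recognizer that sends one spoken word per line."""
    name = "recognizer"
    WORDS = {"next": "next-object", "back": "previous-object", "read": "read-current"}

    def translate(self, raw: bytes) -> Optional[InternalEvent]:
        word = raw.decode("ascii", errors="ignore").strip().lower()
        action = self.WORDS.get(word)
        return InternalEvent(action, self.name) if action else None


class EventRouter:
    """Holds the plugged-in adapters and collects the translated events."""

    def __init__(self):
        self._adapters = {}
        self.handled = []

    def plug_in(self, adapter):
        self._adapters[adapter.name] = adapter

    def feed(self, device: str, raw: bytes) -> Optional[InternalEvent]:
        # Device-specific bytes go in; a device-independent event comes out.
        event = self._adapters[device].translate(raw)
        if event is not None:
            self.handled.append(event)
        return event


if __name__ == "__main__":
    router = EventRouter()
    router.plug_in(KeypadAdapter())
    router.plug_in(RecognizerAdapter())
    print(router.feed("keypad", b"1"))
    print(router.feed("recognizer", b"Next\n"))
```

In this sketch a key press and a spoken word both end up as the same "next-object" action, so the rest of the program never needs to know which device produced it. Adding a device means writing one more adapter with a translate method.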

Having said that, we've built device support for the Dectalk DTC01,
Dectalk Express, Entropic TruTalk (software-only synthesis), Alva 3/20
and 3/80 Braille terminals, a Genovations external keypad, and the IN3
speech recognition software (although support for this has fallen quite
a bit out-of-date).

The code to support these isn't in the form of Unix device drivers; it's
in the form of small shared object modules that get dynamically loaded
into the running UltraSonix executable as needed.

> Even when all of this work is done, it will probably only be a mediocre
> screen reader at best because it will most likely only support a very
> limited number of apps. Explanation: For Ultrasonic to work effectively
> with an application, it assumes some factors in application development
> such as the use of motif, x-intrinsics toolkit, x-test extensions,
> dynamically-linked libraries, etc., etc... In other words, applications
> developers must follow a fairly rigidly-defined style guide. As with
> Windows 3.1, some developers will do this, many will not, and at the
> present time, there are no kernel-level accessibility extensions being
> proposed, at least not any that either myself, or anybody else I know,
> knows of. This is probably where the focus of the X-access project
> should have been, but hindsight is 20/20.

Some of the "factors in application development" that you talk about are
simply the *way* that people develop in X. They're not really
style-guide issues. For example, nearly all commercial (and much
non-commercial) X development is done in the Motif toolkit. If you're
using Motif, you're using the Xt Intrinsics layer on top of which it's
built. This is not an optional thing that developers can do or not do,
and it's certainly not a style guide issue.

Likewise, the X Test extension is not something developers have a choice
over. Pretty much 100% of the R5 and later X servers have this extension
built-in, so UltraSonix can rely on this being there.
There's nothing developers can do to turn it off.

The accessibility extensions aren't "kernel level" (and probably
shouldn't be, since Unix kernels typically know nothing of the window
system anyway), but they are "X toolkit level," which is about as good
as you can get. Anyone using the X toolkit from the X Consortium already
has most of these extensions in their applications, and they don't have
to "do" anything to use them. In fact, they would have to go out of
their way to make their applications *not* use these extensions. This
was the whole point of our work with the X Consortium.

There are a few factors that must be addressed by the app developers
though, specifically dynamic linking. Right now, to work with
UltraSonix, apps must be dynamically linked. This doesn't require any
code changes on the parts of developers though.

I should point out that even the dynamic linking issue exists only
because not all of the required hooks are present in the standard X
release. If (when?) these are present, then dynamic linking won't be
required.

> You can get the code from the technology office at Georgia Tech, but the
> license is probably in the multi-thousands of dollars, and unless you
> are, or have access to, a unix/X expert, it will probably be nothing but
> frustration.

I'll try to make the legal department at Georgia Tech happy and get this
code out the door as soon as possible. Unfortunately I may have to agree
with you about having a Unix/X expert handy, though. The software is a
bit painful to port and requires a bit of mastery to get the modified X
libraries running. And some things, particularly basic text navigation,
are not as usable as they could be (building full-scale text-handling
code a la commercial screenreaders is very time-consuming).

Thanks for wading through all this. If anyone has any followups, please
CC me on them since I won't get mail sent just to GUISPEAK.

-keith

----
keith edwards
xerox palo alto research center