public inbox for speakup@linux-speakup.org
* accessing .docx documents
From: Chuck Hallenbeck
  To: Speakup is a screen review system for Linux.

Lately I have been sent some documents with the file suffix ".docx" and
am at a loss to access them with command line tools. Unzip unzips them
okay, but the result is three or four directories of things, with the
bulk of the document in a huge two line file containing an XML header
on the first line, and the entire balance of the document on the second
line.

Has anybody figured out what to do with these things? Suggestions
appreciated.

Chuck
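
In a typical .docx archive, the large file described above is
word/document.xml, which holds the document body as WordprocessingML.
As a rough sketch only (nothing like it was posted in this thread), the
Python below reads that member straight from the archive and joins the
text runs of each paragraph. The function name docx_to_text is made up
for illustration, and the sketch assumes the main body lives at
word/document.xml; it ignores headers, footnotes, and table layout.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx file.

    `source` may be a file name or a binary file-like object.
    Assumption: the main body is stored at word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W + "p"):
        parts = []
        for node in para.iter():
            if node.tag == W + "t" and node.text:
                parts.append(node.text)   # a run of literal text
            elif node.tag == W + "tab":
                parts.append("\t")        # explicit tab
            elif node.tag == W + "br":
                parts.append("\n")        # explicit line break
        paragraphs.append("".join(parts))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    import sys
    print(docx_to_text(sys.argv[1]))
```

Saved under a hypothetical name such as docx_text.py, it could be run
as "python docx_text.py letter.docx | less", which keeps everything on
the console.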

-- 
The Moon is Waxing Crescent (6% of Full)
                  My web site: www.hallenbeck.ftml.net
                      Microblog: http://identi.ca
                                --------
	If builders built buildings the way programmers wrote programs,
	then the first woodpecker that came along would destroy civilization.


* Re: accessing .docx documents
From: Chris Brannon
  To: Speakup is a screen review system for Linux.

Shane W wrote:
> On Tue, May 26, 2009 at 10:38:14AM -0400, Chuck Hallenbeck wrote:
> > Lately I have been sent some documents with the file suffix ".docx" and
> 
> They are ms-word 2007 documents. Unfortunately, I don't
> know of a Linux solution to open these. You can just use
> Word 2007 itself or I believe there is a conversion plugin

There is a tool called unoconv.  It will handle any file format that
OpenOffice can open or produce.  I think .docx is in the list of supported
formats.  It should be possible to convert these to plain text or html.

-- Chris
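
A minimal sketch of driving unoconv from Python, assuming the unoconv
executable is on the PATH and that "txt" and "html" are format names
the installed version accepts for its -f option (running
"unoconv --show" should list them, if that option is present). The
helper names unoconv_command and convert are hypothetical, and the
expectation that the output lands next to the input file with the new
extension is also an assumption to verify locally.

```python
import subprocess


def unoconv_command(path, fmt="txt"):
    """Build the argument list for one unoconv conversion.

    Assumption: `fmt` is a format name known to the local unoconv.
    """
    return ["unoconv", "-f", fmt, path]


def convert(path, fmt="txt"):
    """Run unoconv and raise CalledProcessError if it fails."""
    subprocess.run(unoconv_command(path, fmt), check=True)


if __name__ == "__main__":
    import sys
    convert(sys.argv[1])
```

Building the argument list in its own function makes it easy to print
the command and try it by hand before letting the script run it.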


* Re: accessing .docx documents
From: Chuck Hallenbeck
  To: Speakup is a screen review system for Linux.

On Tue, May 26, 2009 at 10:37:46AM -0500, Chris Brannon wrote:
> 
> There is a tool called unoconv.  It will handle any file format that
> OpenOffice can open or produce.  I think .docx is in the list of supported
> formats.  It should be possible to convert these to plain text or html.
> 
> -- Chris

Pacman couldn't find that package, but aursh did. Unfortunately it
seems to need a java runtime support package which is not installed
here. I might look into that.


Chuck



> _______________________________________________
> Speakup mailing list
> Speakup@braille.uwo.ca
> http://speech.braille.uwo.ca/mailman/listinfo/speakup



* Re: accessing .docx documents
From: Jason White
  To: speakup

Chris Brannon  <speakup@braille.uwo.ca> wrote:

>There is a tool called unoconv.  It will handle any file format that
>OpenOffice can open or produce.  I think .docx is in the list of supported
>formats.  It should be possible to convert these to plain text or html.

There is even a Debian package for it. I haven't installed or tested it,
however.

Package: unoconv
New: yes
State: not installed
Version: 0.3-5
Priority: extra
Section: text
Maintainer: Vincent Bernat <bernat@debian.org>
Uncompressed Size: 102k
Depends: python, python-uno
Recommends: openoffice.org-headless
Suggests: openoffice.org
Conflicts: odt2txt (<= 0.3-1)
Description: converter between OpenOffice.org document formats
 This package provides a commandline utility which can convert from any document
 format that OpenOffice can import to any document format it can export. It uses
 OpenOffice's UNO bindings for non-interactive conversion of documents. 
 
 Supported document formats include Open Document format, MS Word, MS Office
 Open/MS OOXML, PDF, HTML, XHTML, RTF, Docbook, and more.
Homepage: http://dag.wieers.com/home-made/unoconv/




* Re: accessing .docx documents
From: Chuck Hallenbeck
  To: Speakup is a screen review system for Linux.

Hi Jason,

On Fri, May 29, 2009 at 10:59:48AM +0000, Jason White wrote:
> 
> There is even a Debian package for it. I haven't installed or tested it,
> however.
> 
> Package: unoconv
> New: yes
> State: not installed
> Version: 0.3-5

Unoconv on archlinux does the job, but it was tricky to get running.
Although it is a command line tool, it does require that an X display
be available when it is run.  It does in fact convert .docx documents,
although that extension does not appear in the list of supported
formats. 

Cheryl Hodiak pointed me to a Perl script which does just as well as
unoconv with .docx files, if not better, and does not require any X
support. However, it is a single format converter, unlike unoconv.
Cheryl's script is called, cryptically, docx2txt, and is available from
SourceForge.

Chuck
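
The trade-off described above (docx2txt needs no X, unoconv does) can
be sketched as a small chooser in Python. Everything here is an
assumption to check locally: the executable names docx2txt and
docx2txt.pl, the convention that "-" as the output name sends text to
standard output, the "txt" format name for unoconv, and the presence of
xvfb-run to supply a virtual display. The function name pick_converter
is hypothetical.

```python
import os
import shutil


def pick_converter(path, env=None, which=shutil.which):
    """Return an argument list for converting `path` to plain text.

    Prefers docx2txt (no X needed); otherwise falls back to unoconv,
    wrapped in xvfb-run when no DISPLAY is set and xvfb-run exists.
    """
    env = os.environ if env is None else env

    # Assumed executable names; distributions may install either one.
    for name in ("docx2txt", "docx2txt.pl"):
        if which(name):
            return [name, path, "-"]   # "-" assumed to mean stdout

    cmd = ["unoconv", "-f", "txt", path]
    if not env.get("DISPLAY") and which("xvfb-run"):
        # -a lets xvfb-run pick a free display number.
        cmd = ["xvfb-run", "-a"] + cmd
    return cmd


if __name__ == "__main__":
    import subprocess
    import sys
    subprocess.run(pick_converter(sys.argv[1]), check=True)
```

Passing env and which as parameters keeps the decision logic testable
on a machine where none of these tools are installed.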





