Convert docx files

  • rjkantor
    Visitor
    • Feb 2014
    • 4

    Convert docx files

    I am trying to convert docx files to text before loading them into BC4 under Ubuntu 13.10. On load, I am using the conversion command below. The resulting filename.txt never gets loaded, since the output file is not part of the conversion statement. How do I load the converted txt file rather than the docx?

    /usr/bin/soffice --headless --convert-to txt:Text %s

    Thank you,
    Rob
  • Aaron
    Team Scooter
    • Oct 2007
    • 16000

    #2
    Hello,

    You would need to wrap the conversion into another, larger script which then takes the converted text file and places it into the %t variable. Where does your script currently put the target/converted .txt file? Can you then write that into a %t target?
    Aaron P Scooter Software

    • rjkantor
      Visitor
      • Feb 2014
      • 4

      #3
      Aaron -

      /usr/bin/soffice --headless --convert-to txt:Text %s

      The txt in the command above tells soffice to convert the file named by %s and change its extension to .txt. So the utility itself decides the output filename based on that designation.

      In general, how do I assign a value to %t? Do you have an example?

      Rob

      • Aaron
        Team Scooter
        • Oct 2007
        • 16000

        #4
        Hi Rob,

        I installed OpenOffice to play around with this. The --convert-to option can be combined with an output directory (--outdir), but not with a target file. We would need a target file somehow.

        The surrounding script would run the conversion with --outdir /bcconvert and then look for the result in /bcconvert/*.txt. Then, assuming the directory is otherwise empty:
        cat /bcconvert/*.txt >> "$2"

        where $2 is the %t argument when the script is called as script.bash %s %t from BC3's external conversion. My own Bash scripting skills are not quite up to writing the whole conversion myself. If there were a way to make soffice write to a specific filename instead of a directory, we could use that more directly.
        Aaron P Scooter Software

        • rjkantor
          Visitor
          • Feb 2014
          • 4

          #5
          new extension variable

          Is it possible to add a new variable which allows one to specify the new extension (txt) of the same filename?

          It seems I can echo out the resultant destination file, but I don't know how to assign it back to the %t which is not set until after my script would run. It is a chicken/egg issue. I can run my script but it needs to know the %t prior to execution.

          If you supported a destination extension variable, you could then look for the source name with the new extension.

          Rob
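
          For reference, the "same filename, new extension" path can be worked out inside a wrapper script without any new variable. This is only a minimal sketch: converted_path is a hypothetical helper name, and it assumes soffice names its output by swapping the last extension for .txt, as described earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of soffice or Beyond Compare):
# print the path soffice is assumed to write for a given source file
# and output directory, i.e. same base name with a .txt extension.
converted_path() {
    src=$1
    outdir=$2
    base=$(basename "$src")
    # ${base%.*} strips only the last extension, e.g. report.docx -> report
    printf '%s/%s.txt\n' "$outdir" "${base%.*}"
}
```

          A wrapper called as script %s %t could then copy "$(converted_path "$1" "$outdir")" to "$2" after the conversion has run, where outdir is whatever folder was passed to --outdir.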

          • Aaron
            Team Scooter
            • Oct 2007
            • 16000

            #6
            Thanks for the suggestion. I'll add this to our wishlist and alternatively see if one of our developers can come up with a better wrapper script. If you are able to create one, please do let us know and we'll take a look at it.
            Aaron P Scooter Software

            • dr_barnowl
              Expert
              • Apr 2008
              • 71

              #7
              Make the file move to %t

              We can't pass a new value of %t back to bcompare, so we have to make the output end up at %t instead.

              Code:
              #!/bin/sh
              #
              # docx-to-txt
              #
              # Converts MS Office Open XML (MOO-XML) DOCX to txt using soffice
              #
              # Moves output from a temporary batch folder to target
              #
              # Adrian Wilkins 2014 : licensed free for any use
              #
              # params 
              #
              # $1 = source file
              # $2 = target file
              
              SOURCE="$1"
              TARGET="$2"

              # Make some folders
              OUTPUTFOLDER=$(mktemp -d)
              TEMPHOME=$(mktemp -d)

              # Set a new HOME variable for this process or soffice will refuse to start another instance
              # (without this, the conversion fails silently if you have a document window open)
              export HOME="$TEMPHOME"

              # Convert
              soffice --headless --convert-to txt:Text --outdir "$OUTPUTFOLDER" "$SOURCE"

              # Move the converted file to the supplied target location
              # (quoted so that paths containing spaces survive)
              OUTFILE=$(ls "$OUTPUTFOLDER")
              mv "$OUTPUTFOLDER/$OUTFILE" "$TARGET"

              # Clean up the temporary folders
              rm -rf "$TEMPHOME" "$OUTPUTFOLDER"
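
              One possible hardening of the "move the output to the target" step, in case the conversion produces nothing (or more than one file): check the temporary folder before moving. This is only a sketch. move_single_output is a hypothetical helper name, not something from soffice or Beyond Compare, and it assumes the output folder holds nothing but the converted file.

```shell
#!/bin/sh
# Hypothetical helper: move the single file found in folder $1 to path $2.
# Returns non-zero if the folder does not hold exactly one regular file.
move_single_output() {
    outdir=$1
    target=$2
    # Expand the folder contents into the function's positional parameters
    set -- "$outdir"/*
    # With no match the glob stays literal, so also check the file exists
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one converted file in $outdir" >&2
        return 1
    fi
    mv -- "$1" "$target"
}
```

              In a script like the one above, this could stand in for the ls/mv pair, for example move_single_output "$OUTPUTFOLDER" "$TARGET" || exit 1, so that a failed conversion is reported rather than silently moving nothing.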

              • Aaron
                Team Scooter
                • Oct 2007
                • 16000

                #8
                Thanks for the script! We'll take a look at this and see if we can incorporate it or something similar onto our Formats page.
                Aaron P Scooter Software

                • rjkantor
                  Visitor
                  • Feb 2014
                  • 4

                  #9
                  Thanks, this worked well.
