[TAG] (Fwd: [Apertium-stuff] script teseracc2+apertium)
Jimmy O'Regan
joregan at gmail.com
Sun Jun 28 18:24:52 MSD 2009
This is a 2-cent tip from the Apertium list. My Spanish is rubbish,
but the message is something like 'This is what worked for me. Maybe
it needs a few filters to replace characters'
---------- Forwarded message ----------
From: studesoteric spain <stuesoteric at gmail.com>
To: TAG <tag at lists.linuxgazette.net>
Date: 2009/6/28
Subject: [Apertium-stuff] script teseracc2+apertium
To: apertium-stuff at lists.sourceforge.net
This is what I did ....
and, more or less, maybe add some filters to replace the characters .....
#!/bin/bash
# Usage: pass the PDF's file name without the ".pdf" extension as $1
echo "################## Conv. PS ###################"
# Render each PDF page to an 800 dpi G4 TIFF: NAME_0001.tif, NAME_0002.tif, ...
gs -sDEVICE=tiffg4 -r800x800 -sPAPERSIZE=a4 -sOutputFile="${1}_%04d.tif" \
   -dNOPAUSE -dBATCH "${1}.pdf"
echo "################### OCRing ####################"
i=1
for page in "${1}"_*.tif; do
    echo -n "Pagina: $i - "
    # tesseract appends ".txt" to the output base, so this writes $page.txt
    tesseract "$page" "$page"
    echo "------------Principio Página : $i ---" >> "${1}.txt"
    cat "$page.txt" >> "${1}.txt"
    echo "------------Final Página: $i ---" >> "${1}.txt"
    i=$((i + 1))
done
echo "################ limpieza !!! ################"
# Extract the embedded images from the PDF, then remove the temporary files
pdfimages -j "${1}.pdf" "${1}_img"
rm "${1}"_*.tif
rm "${1}"_*.tif.txt
echo "################ traducir !!! ################"
# The translation is written as plain text, despite the .odt extension
apertium en-es "${1}.txt" "${1}_tra.odt"
gedit "${1}_tra.odt"
echo "############################### Terminado #########"
Thanks :)
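
The message only says that some character-replacing filters might be
needed, without giving any. As a rough sketch of what one could look like,
the function below pipes text through sed and normalises a few characters
that OCR output often contains. The name clean_ocr and the particular
substitutions (ligatures, curly quotes, trailing whitespace) are
assumptions chosen for illustration, not something from the original
script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: a stdin-to-stdout
# filter that replaces a few characters before the text is translated.
clean_ocr() {
    sed -e 's/ﬁ/fi/g' \
        -e 's/ﬂ/fl/g' \
        -e "s/‘/'/g" \
        -e "s/’/'/g" \
        -e 's/“/"/g' \
        -e 's/”/"/g' \
        -e 's/[[:space:]]*$//'
}
```

It could be slotted in just before the translation step, for example
clean_ocr < "$1.txt" > "$1_clean.txt", with apertium then reading
"$1_clean.txt" instead. Which substitutions are actually worth having
depends on the scans and on what tesseract produces for them.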