wok diff tesseract-ocr/description.txt @ rev 25556

created recipes for nted and nted-lang
author Hans-Günter Theisgen
date Sat Apr 22 14:54:15 2023 +0100
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/tesseract-ocr/description.txt	Sat Apr 22 14:54:15 2023 +0100
     1.3 @@ -0,0 +1,28 @@
     1.4 +This package contains an OCR engine - libtesseract and
     1.5 +a command line program - tesseract.
     1.6 +
     1.7 +Tesseract 4 adds a new neural net (LSTM) based OCR engine
     1.8 +which is focused on line recognition, but also still
     1.9 +supports the legacy Tesseract OCR engine of Tesseract 3
    1.10 +which works by recognizing character patterns.
    1.11 +Compatibility with Tesseract 3 is enabled by using the
    1.12 +Legacy OCR Engine mode (--oem 0).
    1.13 +It also needs traineddata files which support the legacy
    1.14 +engine, for example those from the tessdata repository.
    1.15 +
    1.16 +The lead developer is Ray Smith. The maintainer is Zdenko
    1.17 +Podobny.
    1.18 +For a list of contributors see AUTHORS and GitHub's log
    1.19 +of contributors.
    1.20 +
    1.21 +Tesseract has unicode (UTF-8) support, and can recognize
    1.22 +more than 100 languages "out of the box".
    1.23 +
    1.24 +Tesseract supports various output formats: plain text,
    1.25 +hOCR (HTML), PDF, invisible-text-only PDF, TSV.
    1.26 +The main branch also has experimental support for ALTO
    1.27 +(XML) output.
    1.28 +
    1.29 +You should note that in many cases, in order to get better
    1.30 +OCR results, you'll need to improve the quality of the
    1.31 +image you are giving Tesseract.
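
The description above mentions the legacy engine mode (--oem 0) and several output formats. As a rough illustration only, and not part of this changeset, the Python sketch below assembles a matching tesseract command line without running it. The helper name build_tesseract_command and the file names are hypothetical. The config names (txt, hocr, pdf, tsv, alto) and the textonly_pdf variable are assumed to be present in the installed Tesseract 4.x; ALTO in particular needs a release that ships that config.

```python
import shlex

# Output-format config names assumed to ship with Tesseract 4.x.
OUTPUT_CONFIGS = {"txt", "hocr", "pdf", "tsv", "alto"}


def build_tesseract_command(image, output_base, lang="eng", legacy=False,
                            output_format="txt", text_only_pdf=False):
    """Return the argv list for one tesseract run; nothing is executed."""
    if output_format not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {output_format}")
    if text_only_pdf and output_format != "pdf":
        raise ValueError("text_only_pdf requires output_format='pdf'")

    cmd = ["tesseract", image, output_base, "-l", lang]
    if legacy:
        # Legacy engine; needs traineddata that supports it (e.g. tessdata).
        cmd += ["--oem", "0"]
    if text_only_pdf:
        # Invisible-text-only PDF: keep the text layer, drop the image.
        cmd += ["-c", "textonly_pdf=1"]
    # The output format is selected by a trailing config file name.
    cmd.append(output_format)
    return cmd


if __name__ == "__main__":
    argv = build_tesseract_command("scan.png", "scan", legacy=True,
                                   output_format="hocr")
    print(shlex.join(argv))
```

The returned list can be passed to subprocess.run on a system where the tesseract binary and the required traineddata files are installed.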