wok view tesseract-ocr/description.txt @ rev 25632

Up gtklife (5.3)
author Pascal Bellard <pascal.bellard@slitaz.org>
date Sat Dec 23 14:06:29 2023 +0000 (11 months ago)

This package contains an OCR engine (libtesseract) and
a command line program (tesseract).

Tesseract 4 adds a new neural net (LSTM) based OCR engine
which is focused on line recognition, but it still
supports the legacy OCR engine of Tesseract 3,
which works by recognizing character patterns.
Compatibility with Tesseract 3 is enabled by using the
Legacy OCR Engine mode (--oem 0).
This mode also needs traineddata files which support the
legacy engine, for example those from the tessdata
repository.
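
As a rough illustration of the legacy mode described above, the following Python sketch assembles a tesseract command line that selects the legacy engine with --oem 0. The helper name legacy_ocr_command, the file names and the tessdata path are hypothetical; the sketch only builds the argument list and leaves running it to the caller.

```python
import shlex


def legacy_ocr_command(image, output_base, lang="eng", tessdata_dir=None):
    """Build (but do not run) a tesseract command that uses the legacy engine.

    image, output_base and tessdata_dir are placeholder values chosen for
    this sketch; they are not taken from the package.
    """
    # --oem 0 selects the Legacy OCR Engine mode.
    cmd = ["tesseract", image, output_base, "--oem", "0", "-l", lang]
    if tessdata_dir is not None:
        # Point tesseract at traineddata files that support the legacy engine.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


cmd = legacy_ocr_command("scan.png", "scan", tessdata_dir="/path/to/tessdata")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run on a system where tesseract and suitable traineddata files are installed.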

The lead developer is Ray Smith. The maintainer is Zdenko
Podobny.
For a list of contributors see AUTHORS and GitHub's log
of contributors.

Tesseract has Unicode (UTF-8) support, and can recognize
more than 100 languages "out of the box".
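
Language selection can be sketched as follows. The -l option accepts several language codes joined with "+"; the helper names and the example codes below are assumptions made for illustration, and each code needs a matching traineddata file to be installed.

```python
def language_argument(languages):
    """Join language codes into the form expected by -l, e.g. eng+deu."""
    if not languages:
        raise ValueError("at least one language code is required")
    return "+".join(languages)


def multilingual_command(image, output_base, languages):
    """Build (but do not run) a tesseract command for one or more languages."""
    return ["tesseract", image, output_base, "-l", language_argument(languages)]


print(multilingual_command("page.png", "page", ["eng", "deu"]))
```

Running tesseract --list-langs shows which languages are available on a given system.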

Tesseract supports various output formats: plain text,
hOCR (HTML), PDF, invisible-text-only PDF, TSV.
The main branch also has experimental support for ALTO
(XML) output.
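
The output formats listed above are selected on the command line by naming a config after the output base. A minimal sketch of that mapping, assuming the config names txt, hocr, pdf, tsv and alto and the textonly_pdf variable; the dictionary keys and the helper name are made up for this example:

```python
# Trailing arguments per output format. Options such as -c come before
# the config name on the tesseract command line.
OUTPUT_ARGS = {
    "text": ["txt"],
    "hocr": ["hocr"],
    "pdf": ["pdf"],
    "textonly_pdf": ["-c", "textonly_pdf=1", "pdf"],
    "tsv": ["tsv"],
    "alto": ["alto"],  # experimental, per the description above
}


def output_command(image, output_base, fmt="text"):
    """Build (but do not run) a tesseract command for one output format."""
    if fmt not in OUTPUT_ARGS:
        raise ValueError(f"unknown output format: {fmt}")
    return ["tesseract", image, output_base] + OUTPUT_ARGS[fmt]


for fmt in OUTPUT_ARGS:
    print(" ".join(output_command("page.png", "page", fmt)))
```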

In many cases, getting better OCR results requires
improving the quality of the image you give Tesseract.
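
Binarization is one common way to clean up an image before OCR. The sketch below implements Otsu's method in plain Python on a flat list of 8-bit grayscale values. It is an illustration of the idea only, not what this package does: the function names and sample pixels are invented here, Tesseract applies its own thresholding as well, and a real pipeline would normally use an imaging library.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray levels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (dark) or 255 (light) around the threshold."""
    return [255 if p > threshold else 0 for p in pixels]


# Hypothetical sample: three dark "ink" pixels and three light "paper" pixels.
sample = [10, 12, 11, 200, 210, 205]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
```

Other preprocessing steps, such as rescaling, noise removal or deskewing, may matter more than thresholding for a particular scan.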