wok annotate tesseract-ocr/description.txt @ rev 25430

Up os-prober (1.81)
author Pascal Bellard <pascal.bellard@slitaz.org>
date Fri Aug 19 06:26:39 2022 +0000 (2022-08-19)
This package contains an OCR engine (libtesseract) and
a command-line program (tesseract).

Tesseract 4 adds a new neural net (LSTM) based OCR engine
which is focused on line recognition, but also still
supports the legacy Tesseract OCR engine of Tesseract 3
which works by recognizing character patterns.
Compatibility with Tesseract 3 is enabled by using the
Legacy OCR Engine mode (--oem 0).
This mode also needs traineddata files which support the
legacy engine, for example those from the tessdata repository.

The lead developer is Ray Smith. The maintainer is Zdenko
Podobny.
For a list of contributors see AUTHORS and GitHub's log
of contributors.

Tesseract has Unicode (UTF-8) support, and can recognize
more than 100 languages "out of the box".

Tesseract supports various output formats: plain text,
hOCR (HTML), PDF, invisible-text-only PDF, TSV.
The main branch also has experimental support for ALTO
(XML) output.

Note that in many cases you will need to improve the
quality of the input image to get better OCR results.
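The legacy engine mode and the output formats described above are
selected on the tesseract command line. The following is a minimal
sketch in Python that only assembles such a command line; it does not
run OCR itself. The helper name build_tesseract_command and the file
names scan.png and out are hypothetical, chosen for illustration. The
sketch assumes the usual "tesseract IMAGE OUTPUTBASE [options]
[configfile...]" argument order and that the txt, hocr, pdf, tsv and
alto output configs are available in the installed version.

```python
from typing import List, Optional, Sequence

# Output config names assumed to be available; "alto" needs a
# sufficiently recent Tesseract.
KNOWN_FORMATS = ("txt", "hocr", "pdf", "tsv", "alto")


def build_tesseract_command(
    image: str,
    output_base: str,
    lang: str = "eng",
    oem: Optional[int] = None,
    formats: Sequence[str] = ("txt",),
    text_only_pdf: bool = False,
) -> List[str]:
    """Assemble (but do not run) a tesseract command line.

    This helper is hypothetical and not part of Tesseract.
    """
    for fmt in formats:
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unknown output format: {fmt}")

    cmd = ["tesseract", image, output_base, "-l", lang]

    # --oem 0 selects the legacy engine; it needs traineddata files
    # that still contain the legacy model.
    if oem is not None:
        cmd += ["--oem", str(oem)]

    # textonly_pdf=1 asks the PDF renderer for the invisible text only.
    if text_only_pdf:
        cmd += ["-c", "textonly_pdf=1"]

    # Output configs go last; several may be given in one call.
    cmd += list(formats)
    return cmd
```

For example, build_tesseract_command("scan.png", "out", oem=0,
formats=("txt", "hocr")) yields the argument list for a legacy-engine
run that writes out.txt and out.hocr. Passing the list to
subprocess.run(cmd, check=True) would execute it, provided tesseract
and suitable traineddata files are installed.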