wok annotate tesseract-ocr/description.txt @ rev 25430
Up os-prober (1.81)
author | Pascal Bellard <pascal.bellard@slitaz.org> |
---|---|
date | Fri Aug 19 06:26:39 2022 +0000 (2022-08-19) |
rev | line source |
---|---|
Hans-Günter@25351 | 1 This package contains an OCR engine - libtesseract and |
Hans-Günter@25351 | 2 a command line program - tesseract. |
Hans-Günter@25351 | 3 |
Hans-Günter@25351 | 4 Tesseract 4 adds a new neural net (LSTM) based OCR engine |
Hans-Günter@25351 | 5 which is focused on line recognition, but also still |
Hans-Günter@25351 | 6 supports the legacy Tesseract OCR engine of Tesseract 3 |
Hans-Günter@25351 | 7 which works by recognizing character patterns. |
Hans-Günter@25351 | 8 Compatibility with Tesseract 3 is enabled by using the |
Hans-Günter@25351 | 9 Legacy OCR Engine mode (--oem 0). |
Hans-Günter@25351 | 10 It also needs traineddata files which support the legacy |
Hans-Günter@25351 | 11 engine, for example those from the tessdata repository. |
Hans-Günter@25351 | 12 |
Hans-Günter@25351 | 13 The lead developer is Ray Smith. The maintainer is Zdenko |
Hans-Günter@25351 | 14 Podobny. |
Hans-Günter@25351 | 15 For a list of contributors see AUTHORS and GitHub's log |
Hans-Günter@25351 | 16 of contributors. |
Hans-Günter@25351 | 17 |
Hans-Günter@25351 | 18 Tesseract has Unicode (UTF-8) support, and can recognize |
Hans-Günter@25351 | 19 more than 100 languages "out of the box". |
Hans-Günter@25351 | 20 |
Hans-Günter@25351 | 21 Tesseract supports various output formats: plain text, |
Hans-Günter@25351 | 22 hOCR (HTML), PDF, invisible-text-only PDF, TSV. |
Hans-Günter@25351 | 23 The main branch also has experimental support for ALTO |
Hans-Günter@25351 | 24 (XML) output. |
Hans-Günter@25351 | 25 |
Hans-Günter@25351 | 26 In many cases, getting better OCR results requires |
Hans-Günter@25351 | 27 improving the quality of the image that is given to |
Hans-Günter@25351 | 28 Tesseract. |
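
The description mentions the Legacy OCR Engine mode (--oem 0) and several output formats. As a rough illustration only, the following Python sketch assembles an argument list for the tesseract command line program. The helper name build_tesseract_command, its parameters, and the file names scan.png / scan are hypothetical and not part of the package; the sketch assumes the usual invocation order of input image, output base name, options, then output config names such as txt, hocr, pdf or tsv. It only builds the command and does not run it.

```python
from typing import List, Optional, Sequence


def build_tesseract_command(
    image: str,
    output_base: str,
    lang: str = "eng",
    legacy_engine: bool = False,
    output_configs: Sequence[str] = ("txt",),
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) a tesseract command line.

    Hypothetical helper for illustration; not shipped with the package.
    """
    # Input image and output base name come first, then the language.
    cmd = ["tesseract", image, output_base, "-l", lang]

    # Optional directory holding the traineddata files. Legacy mode needs
    # files that support the legacy engine (e.g. from the tessdata repository).
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]

    # --oem 0 selects the Legacy OCR Engine mode for Tesseract 3 compatibility.
    if legacy_engine:
        cmd += ["--oem", "0"]

    # Output config names go last; one output file is written per name.
    cmd += list(output_configs)
    return cmd


# Example: legacy engine, writing text, hOCR, PDF and TSV output.
example = build_tesseract_command(
    "scan.png",
    "scan",
    legacy_engine=True,
    output_configs=("txt", "hocr", "pdf", "tsv"),
)
print(" ".join(example))
```

The resulting list could be passed to subprocess.run on a system where tesseract and suitable traineddata files are installed; whether a given output config is available depends on the installed Tesseract version.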