wok diff tramys/stuff/README @ rev 25062

Up foomatic-db-nonfree (20210824)
author Pascal Bellard <pascal.bellard@slitaz.org>
date Tue Jun 07 10:29:31 2022 +0000 (2022-06-07)
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/tramys/stuff/README	Tue Jun 07 10:29:31 2022 +0000
     1.3 @@ -0,0 +1,306 @@
     1.4 +tramys - TRAnslate MY Slitaz
     1.5 +Tool for managing translation files for SliTaz GNU/Linux
     1.6 +
     1.7 +Aleksei Bobylev <al.bobylev@gmail.com>, 2014
     1.8 +
     1.9 +
    1.10 +Some random notes about tramys development.
    1.11 +
    1.12 +
    1.13 +The idea
    1.14 +========
    1.15 +
    1.16 +I like to use applications translated into my language. On the other hand,
    1.17 +I like the fact that SliTaz is not overloaded with unnecessary files.
    1.18 +
    1.19 +Some packages have “twins” that contain only translations. But note that all
    1.20 +the translations for GIMP take about 30 MB! And for Wesnoth, about 90 MB!!!
    1.21 +I don't need ALL those translations. Really. I need only one.
    1.22 +
    1.23 +Now we have some language packs in the SliTaz repository. These packs contain
    1.24 +translations of several packages for several chosen locales. Not bad, but
    1.25 +what should I do if I need translations for other packages not listed there?
    1.26 +
    1.27 +We set up ftp.slitaz.org. Then I thought it would be a good idea to find and
    1.28 +fetch only the translation files you want. And what about automating it?
    1.29 +
    1.30 +We'll find the translations in the “install” sub-folders. Something like this:
    1.31 +ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
    1.32 +for the “nano” package with the “ru” locale.
    1.33 +
    1.34 +We can also download the same file through the cooker's download feature:
    1.35 +http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/
    1.36 +ru/LC_MESSAGES/nano.mo
    1.37 +
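Both URL layouts above follow the same pattern, so they can be built from a
package name and a locale. A minimal sketch, assuming the translation sits at
the default gettext path and is named after the package; the function names
are made up for illustration and are not part of tramys:

```shell
#!/bin/sh
# Relative path of a gettext catalog: mo_path <package> <locale>
# Assumption: the file is called <package>.mo and lives in the default tree.
mo_path() {
	printf 'usr/share/locale/%s/LC_MESSAGES/%s.mo' "$2" "$1"
}

# FTP location inside the package's "install" sub-folder.
ftp_url() {
	printf 'ftp://cook.slitaz.org/%s/install/%s\n' "$1" "$(mo_path "$1" "$2")"
}

# The same file through the cooker's download parameter.
cooker_url() {
	printf 'http://cook.slitaz.org/cooker.cgi?download=../wok/%s/install/%s\n' \
		"$1" "$(mo_path "$1" "$2")"
}
```

For example, "ftp_url nano ru" prints the FTP address shown above.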
    1.38 +
    1.39 +About gettext's locale search order
    1.40 +===================================
    1.41 +
    1.42 +I know it looks easy at first glance. My locale is “ru_UA” (I speak Russian
    1.43 +and live in Ukraine). Gettext searches for “ru_UA” translations first. If it
    1.44 +doesn't find them, it drops the country from the locale and looks for “ru”
    1.45 +translations.
    1.46 +
    1.47 +I also know that a full locale definition can additionally contain an encoding
    1.48 +and a variant:
    1.49 +  language_COUNTRY.ENCODING@variant
    1.50 +All parts except the language are optional. I couldn't figure out the search
    1.51 +order from the official docs [[ https://www.gnu.org/software/gettext/manual/gettext.html ]]
    1.52 +So I experimented with the following piece of code:
    1.53 +
    1.54 +VAR0=$(LC_ALL="LL_CC.EE@VV" strace -o /tmp/s -e trace=file gettext test test; \
    1.55 +grep -F '/usr/lib/locale/' /tmp/s | \
    1.56 +grep -v locale-archive | \
    1.57 +grep -F '/LC_IDENTIFICATION' | \
    1.58 +sed 's|.*/usr/lib/locale/\([^/]*\).*|\1|g' | \
    1.59 +tr '\n' ' ')
    1.60 +echo "${VAR0#test}"
    1.61 +
    1.62 +Here I use a special non-existent value for the LC_ALL variable in order to see
    1.63 +all the search variants (gettext stops searching once it finds one of them).
    1.64 +Try it with different LC_ALL values such as "LL@VV" or "LL.EE", etc.
    1.65 +
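The language_COUNTRY.ENCODING@variant format described above can be taken
apart with plain shell parameter expansion. A minimal sketch; the function
name and the "|"-separated output are assumptions for illustration, and it
says nothing about the order in which gettext tries the combinations:

```shell
#!/bin/sh
# split_locale <locale>
# Prints "language|COUNTRY|ENCODING|variant"; missing parts are left empty.
split_locale() {
	loc=$1
	variant= encoding= country=
	# Peel the optional parts off from right to left.
	case $loc in *@*) variant=${loc#*@}; loc=${loc%%@*} ;; esac
	case $loc in *.*) encoding=${loc#*.}; loc=${loc%%.*} ;; esac
	case $loc in *_*) country=${loc#*_}; loc=${loc%%_*} ;; esac
	printf '%s|%s|%s|%s\n' "$loc" "$country" "$encoding" "$variant"
}
```

For example, "split_locale ru_UA.UTF-8" prints "ru|UA|UTF-8|".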
    1.66 +
    1.67 +About preferred languages
    1.68 +=========================
    1.69 +
    1.70 +This is a nice feature I found in the Gettext docs, but tramys doesn't use it
    1.71 +yet. When Gettext doesn't find any matching translations, it shows untranslated
    1.72 +(=English) messages. Some of us didn't learn English at school; it could have
    1.73 +been German or French. So I want to see translations into my native language,
    1.74 +or (if they don't exist), let's say, French translations. All of this is done
    1.75 +by setting the LANGUAGE environment variable.
    1.76 +
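LANGUAGE holds a colon-separated priority list, for example LANGUAGE=ru:fr.
A minimal sketch of how a tool like tramys might pick the first preferred
language that a package actually provides; the function name and the
"available locales" argument are assumptions for illustration:

```shell
#!/bin/sh
# pick_language <LANGUAGE value> <space-separated available locales>
# Prints the first preferred language that is available; fails if none is.
pick_language() {
	prefs=$1
	available=$2
	for lang in $(printf '%s' "$prefs" | tr ':' ' '); do
		case " $available " in
			*" $lang "*) printf '%s\n' "$lang"; return 0 ;;
		esac
	done
	return 1
}
```

With LANGUAGE set to "ru_UA:ru:fr" and a package that ships only "de fr it",
the function prints "fr".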
    1.77 +
    1.78 +About LC_ALL
    1.79 +============
    1.80 +
    1.81 +In some cases it is not correct to set this variable. And if we want to mimic
    1.82 +Gettext's behavior more closely, we need to check several locale environment
    1.83 +variables in a certain order.
    1.84 +
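For message catalogs the gettext manual gives the priority as LANGUAGE, then
LC_ALL, then LC_MESSAGES, then LANG. A minimal sketch of the last three
(LANGUAGE is a list and needs separate handling); the function name is
hypothetical:

```shell
#!/bin/sh
# Print the locale that applies to messages, checking the variables in
# priority order: LC_ALL, then LC_MESSAGES, then LANG, else "C".
messages_locale() {
	printf '%s\n' "${LC_ALL:-${LC_MESSAGES:-${LANG:-C}}}"
}
```

An empty variable is treated the same as an unset one here.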
    1.85 +
    1.86 +About traffic saving
    1.87 +====================
    1.88 +
    1.89 +Currently tramys downloads all the localization files again and again.
    1.90 +It has no way of knowing whether your existing localization file is current or
    1.91 +outdated; it just re-downloads it.
    1.92 +
    1.93 +At the moment we have no simple solution. All the solutions touch both the
    1.94 +client and the server side.
    1.95 +
    1.96 +1. Using GNU wget
    1.97 +-----------------
    1.98 +
    1.99 +It has the '-N' option:
   1.100 +
   1.101 +  -N,  --timestamping            don't re-retrieve files unless newer than
   1.102 +                                 local.
   1.103 +
   1.104 +But: a) by default SliTaz uses BusyBox's wget, which has no '-N' option, and
   1.105 +b) the SliTaz HTTP server doesn't return the file's timestamp:
   1.106 +
   1.107 +$ curl -I 'http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/sh
   1.108 +are/locale/ru/LC_MESSAGES/nano.mo'
   1.109 +HTTP/1.1 200 OK
   1.110 +Content-Type: application/octet-stream
   1.111 +Content-Length: 55436
   1.112 +Content-Disposition: attachment; filename=nano.mo
   1.113 +Date: Wed, 06 Aug 2014 20:53:37 GMT
   1.114 +Server: lighttpd (SliTaz GNU/Linux)
   1.115 +
   1.116 +Our FTP server returns "Last-Modified", but neither wget makes use of it :(
   1.117 +
   1.118 +$ curl -I 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/na
   1.119 +no.mo'
   1.120 +Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT
   1.121 +Content-Length: 55436
   1.122 +Accept-ranges: bytes
   1.123 +
   1.124 +$ busybox wget -O /tmp/nano1.mo 'ftp://cook.slitaz.org/nano/install/usr/share/lo
   1.125 +cale/ru/LC_MESSAGES/nano.mo'
   1.126 +Connecting to cook.slitaz.org (37.187.4.13:21)
   1.127 +nano1.mo             100% |*******************************| 55436   0:00:00 ETA
   1.128 +
   1.129 +$ ls -l /tmp/nano1.mo
   1.130 +-rw-r--r-- 1 tux users 55436 Aug  6 22:01 /tmp/nano1.mo
   1.131 +
   1.132 +$ wget -O /tmp/nano2.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/
   1.133 +LC_MESSAGES/nano.mo'
   1.134 +
   1.135 +$ ls -l /tmp/nano2.mo
   1.136 +-rw-r--r-- 1 tux users 55436 Aug  6 22:03 /tmp/nano2.mo
   1.137 +
   1.138 +
   1.139 +2. Write new client-server infrastructure
   1.140 +-----------------------------------------
   1.141 +
   1.142 +We can write a script instead of using the two-byte solution (-N) :D
   1.143 +It uses BusyBox's wget on the client.
   1.144 +The script logic is as follows. The client sends a request to the server: the
   1.145 +filename and date. The server returns a newer file or returns nothing... It
   1.146 +needs to establish only one connection per file.
   1.147 +
   1.148 +
   1.149 +3. Using curl
   1.150 +-------------
   1.151 +
   1.152 +What about curl? Yes, it works:
   1.153 +
   1.154 +$ curl -R -o /tmp/nano.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/r
   1.155 +u/LC_MESSAGES/nano.mo'
   1.156 +  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
   1.157 +                                 Dload  Upload   Total   Spent    Left  Speed
   1.158 +100 55436  100 55436    0     0  30898      0  0:00:01  0:00:01 --:--:-- 32361
   1.159 +
   1.160 +$ ls -l /tmp/nano.mo
   1.161 +-rw-r--r-- 1 tux users 55436 Apr 10 20:34 /tmp/nano.mo
   1.162 +
   1.163 +Also, curl can ask the server for gzipped content to save traffic, and
   1.164 +transparently decompress it for you:
   1.165 +
   1.166 +$ curl -h
   1.167 +     --compressed    Request compressed response (using deflate or gzip)
   1.168 +
   1.169 +And wget can send any specified header to the server:
   1.170 +
   1.171 +$ wget -h
   1.172 +       --header=STRING         insert STRING among the headers.
   1.173 +
   1.174 +
   1.175 +Small note about date
   1.176 +=====================
   1.177 +
   1.178 +Do you remember the server's answer?
   1.179 +
   1.180 +Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT
   1.181 +
   1.182 +We can get a file's date in this format using the following code:
   1.183 +
   1.184 +LC_ALL=C date -Rur ./nano.mo
   1.185 +Thu, 10 Apr 2014 20:34:37 UTC
   1.186 +
   1.187 +We only need to strip the trailing “GMT” and “UTC”, and then we can compare
   1.188 +the two strings that contain the dates:
   1.189 +if [ "${SERVER_DATE% GMT}" != "${LOCAL_DATE% UTC}" ]; then ...
   1.190 +
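The comparison above depends on the exact zone suffix. BusyBox date prints
“UTC” as shown, while GNU date -R prints a numeric “+0000” zone, so a slightly
more tolerant sketch strips all three. The function names are hypothetical:

```shell
#!/bin/sh
# Strip a trailing " GMT", " UTC" or " +0000" from an RFC-style date string.
normalize_date() {
	d=$1
	d=${d% GMT}
	d=${d% UTC}
	d=${d% +0000}
	printf '%s\n' "$d"
}

# is_outdated <server date> <local date>
# Succeeds when the two dates differ, i.e. the file should be fetched again.
is_outdated() {
	[ "$(normalize_date "$1")" != "$(normalize_date "$2")" ]
}
```

Note that this only detects a difference; it does not tell which of the two
dates is newer.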
   1.191 +
   1.192 +About lists format
   1.193 +==================
   1.194 +
   1.195 +There are three kinds of localization: GNU gettext's mo-files, Qt's qm-files,
   1.196 +and other techniques (not supported yet).
   1.197 +
   1.198 +Gettext is more standardized: most often the translation file is named after
   1.199 +the package, and it lives in a hierarchical directory tree under
   1.200 +/usr/share/locale/<locale name>/LC_MESSAGES
   1.201 +Most often, but not always.
   1.202 +
   1.203 +On the other hand, Qt often uses one directory for all package translations.
   1.204 +Something like /usr/share/<package>/translations/<package>_<locale>.qm
   1.205 +Again, not always.
   1.206 +
   1.207 +We could save all the filenames with full paths in one archive, as is done in
   1.208 +the tazpkg file /var/lib/tazpkg/files.list.lzma, and get a 1.4 MiB file (48 KiB
   1.209 +in LZMA). But I prefer lists in a special format: I think a plain list contains
   1.210 +too much redundant info, and in some cases it's too hard to determine which
   1.211 +part of the filename is the locale.
   1.212 +
   1.213 +So, let me describe the list format.
   1.214 +There are one or more lines for each package that supports localization. Each
   1.215 +line has four tab-delimited fields. The first two are mandatory, and the next
   1.216 +two are optional.
   1.217 +
   1.218 +1: package name
   1.219 +2: locale name
   1.220 +3: name of file that contains translations
   1.221 +4: full path to the directory that contains that file
   1.222 +
   1.223 +For the “nano” package (30 lines):
   1.224 +nano	bg	nano.mo	/usr/share/locale/bg/LC_MESSAGES
   1.225 +...
   1.226 +nano	zh_TW	nano.mo	/usr/share/locale/zh_TW/LC_MESSAGES
   1.227 +
   1.228 +Now let's use some rules to make the list smaller.
   1.229 +
   1.230 +RULE. Use “%” as placeholder for locale name in the path:
   1.231 +      /usr/share/locale/%/LC_MESSAGES
   1.232 +RULE. Combine several locales into one space-separated list:
   1.233 +      bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr
   1.234 +      sv tr uk vi zh_CN zh_TW
   1.235 +RULE. Remove “.mo” from the end of filenames:
   1.236 +      nano
   1.237 +RULE. Remove the filename completely if it equals the package name.
   1.238 +RULE. Remove the default path “/usr/share/locale/%/LC_MESSAGES” completely.
   1.239 +RULE. The 3rd and/or 4th fields may be left empty:
   1.240 +      empty 3:   field1 tab field2 tab tab field4
   1.241 +      empty 4:   field1 tab field2 tab field3
   1.242 +      empty 3&4: field1 tab field2
   1.243 +
   1.244 +So now the list entry for the “nano” package is very simple (one line):
   1.245 +nano	bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw 
   1.246 +sr sv tr uk vi zh_CN zh_TW
   1.247 +
   1.248 +And a few more rules to compress the list further.
   1.249 +RULE. Combine several mo-files into one space-separated field if they have
   1.250 +      an identical list of locales.
   1.251 +      The “gtk+” package has “gtk20 gtk20-properties” in the third field.
   1.252 +      We can also combine several paths into one space-separated list.
   1.253 +RULE. Use shell-syntax constants to save a few more bytes:
   1.254 +      US="/usr/share"
   1.255 +      LC="LC_MESSAGES"
   1.256 +      PY="/usr/lib/python2.7/site-packages"
   1.257 +      R="/usr/lib/R/library"
   1.258 +      RT="$R/translations/%/$LC"
   1.259 +
   1.260 +In some situations we have a choice:
   1.261 +
   1.262 +lcdnurse	es fr he nl pt_BR ru th tr zh_CN		$US/$P/locale/%/$LC
   1.263 +lcdnurse	es fr nl pt_BR ru tr zh_CN	wxstd	$US/$P/locale/%/$LC
   1.264 +
   1.265 +or:
   1.266 +
   1.267 +lcdnurse	he th		$US/$P/locale/%/$LC
   1.268 +lcdnurse	es fr nl pt_BR ru tr zh_CN	$P wxstd	$US/$P/locale/%/$LC
   1.269 +
   1.270 +Both variants work, and neither is wrong. Also, the second variant is shorter
   1.271 +by 24 bytes :)
   1.272 +
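The rules above can be reversed to turn one list line back into file paths.
A minimal sketch under several assumptions: $P stands for the package name
(as the lcdnurse lines suggest), only mo-files are handled, the constants are
expanded with eval (so the list must come from a trusted source), and the
function name is made up for illustration:

```shell
#!/bin/sh
# Constants as defined in the rule above.
US="/usr/share"
LC="LC_MESSAGES"

# expand_entry <list line> <locale>
# Prints one path per translation file; fails if the locale is not listed.
expand_entry() {
	line=$1
	want=$2
	# cut keeps empty fields, unlike "read" with a tab in IFS.
	P=$(printf '%s\n' "$line" | cut -f1)
	locales=$(printf '%s\n' "$line" | cut -f2)
	files=$(printf '%s\n' "$line" | cut -f3)
	paths=$(printf '%s\n' "$line" | cut -f4)

	case " $locales " in
		*" $want "*) ;;
		*) return 1 ;;
	esac

	# Apply the defaults for omitted fields.
	[ -n "$files" ] || files=$P
	[ -n "$paths" ] || paths='$US/locale/%/$LC'

	# Expand $US, $LC, $P inside the fields (trusted input only).
	eval "files=\"$files\"; paths=\"$paths\""

	for dir in $paths; do
		for f in $files; do
			# Put the locale in place of the "%" placeholder.
			printf '%s/%s.mo\n' "$(printf '%s' "$dir" | sed "s|%|$want|")" "$f"
		done
	done
}
```

For the one-line “nano” entry and the “ru” locale this prints
/usr/share/locale/ru/LC_MESSAGES/nano.mo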
   1.273 +
   1.274 +Lists: to be or not...
   1.275 +======================
   1.276 +
   1.277 +While I was developing tramys, my lists slowly became more and more outdated.
   1.278 +Many new packages in the wok, many upgrades... It seems the tramys lists
   1.279 +need to be released very frequently. And I can't write all the sophisticated
   1.280 +rules needed to automate the process.
   1.281 +
   1.282 +It wouldn't be bad to attach the localization info to the package! As a file,
   1.283 +like the description? No. The more files our filesystem contains, the slower it
   1.284 +is. Better to put it in the package receipt. In that case we don't need the
   1.285 +first field. Something like:
   1.286 +
   1.287 +L10N="he th		$US/$P/locale/%/$LC
   1.288 +es fr nl pt_BR ru tr zh_CN	$P wxstd	$US/$P/locale/%/$LC"
   1.289 +
   1.290 +Off topic: I think it's better to put the description in the receipt too:
   1.291 +description()
   1.292 +{
   1.293 +	cat << EOT
   1.294 +Description here
   1.295 +EOT
   1.296 +}
   1.297 +
   1.298 +And for compatibility: read the info both from the receipt (if any) and from the lists.
   1.299 +
   1.300 +
   1.301 +TODO
   1.302 +====
   1.303 +
   1.304 +- Remove all translation files from all existing packages.
   1.305 +- Migrate lists to receipts.
   1.306 +- Support preferred languages from the LANGUAGE variable.
   1.307 +- Write server-side script to get only changed/newer translation files.
   1.308 +- Add tazpkg hook to get translations after package install (if user wants).
   1.309 +- ...