wok annotate tramys/stuff/README @ rev 17647

Up: binutils 2.25
author Alexander Medvedev <devl547@gmail.com>
date Sat Feb 14 22:11:20 2015 +0000 (2015-02-14)
tramys - TRAnslate MY Slitaz
Tool for managing translation files for SliTaz GNU/Linux

Aleksei Bobylev <al.bobylev@gmail.com>, 2014


Some random notes about tramys development.


The idea
========

I like to use applications translated into my language. On the other hand,
I like the fact that SliTaz is not overloaded with unnecessary files.

Some packages have “twins” that contain only translations. But note that all
translations for the GIMP take about 30 MB! And for Wesnoth, ~90 MB!!!
I don't need ALL those translations. Really. I need only one.

Now we have some language packs in the SliTaz repository. These packs contain
translations of several packages for several chosen locales. Not bad, but
what should I do if I need translations for other packages not listed there?

We have set up ftp.slitaz.org. So I thought it would be a good idea to find and
fetch just the translation files you want. And what about automating it?

We will find translations in the “install” sub-folders. Something like this:
    ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
for the “nano” package with the “ru” locale.

We can also download the same file through the cooker:
    http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
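
The two URL patterns above are regular enough to be built from a package name
and a locale. A minimal sketch, assuming the layout seen in the nano examples
also holds for other packages; the names mo_path, ftp_url and cooker_url are
made up for illustration and are not part of tramys:

```shell
# Host name and URL layout are copied from the two nano examples above.
COOK_HOST="cook.slitaz.org"

# mo_path PACKAGE LOCALE [FILE]
# Relative path of a mo-file inside the package's "install" tree.
# FILE defaults to the package name (assumption: true for nano,
# not necessarily for every package).
mo_path() {
    echo "$1/install/usr/share/locale/$2/LC_MESSAGES/${3:-$1}.mo"
}

# Same arguments as mo_path; print the FTP or the cooker download URL.
ftp_url() {
    echo "ftp://$COOK_HOST/$(mo_path "$@")"
}
cooker_url() {
    echo "http://$COOK_HOST/cooker.cgi?download=../wok/$(mo_path "$@")"
}
```

For example, `ftp_url nano ru` prints the FTP address shown above.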


About gettext's locale search order
===================================

At first glance it looks easy. My locale is “ru_UA” (I speak Russian and live
in Ukraine). Gettext searches for “ru_UA” translations first. When it does not
find them, it drops the country from the locale and looks for “ru”
translations.

I also know that a full locale definition can additionally contain an encoding
and a variant:
    language_COUNTRY.ENCODING@variant
All parts except the language are optional. I couldn't figure out the search
order from the official docs [[ https://www.gnu.org/software/gettext/manual/gettext.html ]]
so I experimented with the following piece of code:

    # Trace the files gettext touches under a made-up locale and collect
    # the locale directory names it tries, in order.
    VAR0=$(LC_ALL="LL_CC.EE@VV" strace -o /tmp/s -e trace=file gettext test test; \
        grep -F '/usr/lib/locale/' /tmp/s | \
        grep -v locale-archive | \
        grep -F '/LC_IDENTIFICATION' | \
        sed 's|.*/usr/lib/locale/\([^/]*\).*|\1|g' | \
        tr '\n' ' ')
    # gettext itself prints the untranslated "test" first; strip it.
    echo "${VAR0#test}"

It uses a special non-existent value for the LC_ALL variable in order to see
all the search variants (gettext stops searching as soon as it finds one of
them). Try it with different LC_ALL values such as "LL@VV" or "LL.EE", etc.
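
To play with the locale syntax described above, the four parts can be split
with plain shell parameter expansion. This is only a sketch: split_locale is a
hypothetical helper, and it says nothing about the order in which gettext
really tries the combinations.

```shell
# split_locale LOCALE
# Print "language country encoding variant" for a locale written as
# language_COUNTRY.ENCODING@variant; a missing part is printed as "-".
# The body runs in a subshell, so the helper variables stay local.
split_locale() (
    rest=$1
    country=-; encoding=-; variant=-
    # Peel the optional parts off from the right-hand side.
    case $rest in *@*) variant=${rest##*@};  rest=${rest%@*}  ;; esac
    case $rest in *.*) encoding=${rest##*.}; rest=${rest%%.*} ;; esac
    case $rest in *_*) country=${rest#*_};   rest=${rest%%_*} ;; esac
    echo "$rest $country $encoding $variant"
)
```

`split_locale ru_UA` prints "ru UA - -", and `split_locale LL@VV` prints
"LL - - VV".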


About preferred languages
=========================

This is a nice feature I found in the Gettext docs, but it is not applied yet.
When Gettext finds no matching translations, it shows untranslated (= English)
messages. Some of us did not learn English at school; it could have been German
or French. So I want to see translations in my native language or, if they do
not exist, in, let's say, French. All of this is done by setting the LANGUAGE
environment variable.
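
The GNU gettext manual describes LANGUAGE as a colon-separated priority list,
for example LANGUAGE=ru:fr. A small sketch of how a script could walk that
list; preferred_languages is a hypothetical name, not an existing tramys
function:

```shell
# preferred_languages [LIST]
# Print one language per line, in priority order. LIST defaults to the
# LANGUAGE environment variable. Empty entries are skipped.
preferred_languages() (
    IFS=:      # split on colons only
    set -f     # do not glob-expand the entries
    for lang in ${1-${LANGUAGE-}}; do
        if [ -n "$lang" ]; then
            echo "$lang"
        fi
    done
)
```

With LANGUAGE=ru_UA:ru:fr the function prints ru_UA, ru and fr on separate
lines, so a downloader could try each one until a translation is found.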


About LC_ALL
============

In some cases it is not correct to set this variable. And if we want to mimic
Gettext's behavior more closely, we need to check several locale environment
variables in a certain order.
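
For message catalogs the GNU gettext manual gives the order LC_ALL, then
LC_MESSAGES, then LANG (LANGUAGE, when set, is consulted before all of them
unless the locale is "C"). A sketch of that lookup, with messages_locale as a
made-up helper name:

```shell
# messages_locale
# Print the first non-empty value among LC_ALL, LC_MESSAGES and LANG,
# or "C" when none of them is set.
messages_locale() {
    for value in "${LC_ALL-}" "${LC_MESSAGES-}" "${LANG-}"; do
        if [ -n "$value" ]; then
            echo "$value"
            return 0
        fi
    done
    echo "C"
}
```

This reads the variables without modifying any of them, which avoids forcing
LC_ALL on the user.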


About traffic saving
====================

At the moment tramys downloads all the localization files again and again.
It has no knowledge of whether your existing localization file is current or
outdated; it just re-downloads it.

Actually, at the moment we have no simple solution. All solutions touch both
the client and the server side.

1. Using GNU wget
-----------------

It has the '-N' option:

    -N, --timestamping      don't re-retrieve files unless newer than
                            local.

But: a) SliTaz by default uses BusyBox's wget, which has no '-N' option, and
     b) the SliTaz HTTP server does not return the file's timestamp:

    $ curl -I 'http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
    HTTP/1.1 200 OK
    Content-Type: application/octet-stream
    Content-Length: 55436
    Content-Disposition: attachment; filename=nano.mo
    Date: Wed, 06 Aug 2014 20:53:37 GMT
    Server: lighttpd (SliTaz GNU/Linux)

Our FTP server returns "Last-Modified", but neither wget makes use of it :(
Both downloaded files get the download time (Aug 6) instead of the server's
timestamp (Apr 10):

    $ curl -I 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
    Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT
    Content-Length: 55436
    Accept-ranges: bytes

    $ busybox wget -O /tmp/nano1.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
    Connecting to cook.slitaz.org (37.187.4.13:21)
    nano1.mo 100% |*******************************| 55436 0:00:00 ETA

    $ ls -l /tmp/nano1.mo
    -rw-r--r-- 1 tux users 55436 Aug 6 22:01 /tmp/nano1.mo

    $ wget -O /tmp/nano2.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'

    $ ls -l /tmp/nano2.mo
    -rw-r--r-- 1 tux users 55436 Aug 6 22:03 /tmp/nano2.mo


2. Write new client-server infrastructure
-----------------------------------------

We can write a script instead of using the two-byte solution (-N) :D
It uses BusyBox's wget on the client.
The script logic is as follows. The client sends a request to the server: file
name and date. The server returns the newer file or returns nothing... Only one
connection per file needs to be established.
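
The server half of that logic could look roughly like the sketch below. It is
purely hypothetical: it assumes the client sends its file date as seconds
since the epoch, that the server's date supports "-r FILE" (as the GNU and
BusyBox versions do), and the name send_if_newer is invented here.

```shell
# send_if_newer FILE CLIENT_EPOCH
# Write FILE to stdout only when its modification time is newer than
# CLIENT_EPOCH (seconds since the epoch); otherwise print nothing.
send_if_newer() {
    file=$1
    client_epoch=$2
    [ -f "$file" ] || return 1
    # Modification time of the server's copy, in seconds since the epoch.
    server_epoch=$(date -r "$file" +%s) || return 1
    if [ "$server_epoch" -gt "$client_epoch" ]; then
        cat "$file"
    fi
}
```

A client that has no local copy yet can simply send 0 as the date and will
always receive the file.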


3. Using curl
-------------

What about curl? Yes, it works: with '-R' the local file gets the server's
timestamp.

    $ curl -R -o /tmp/nano.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 55436  100 55436    0     0  30898      0  0:00:01  0:00:01 --:--:-- 32361

    $ ls -l /tmp/nano.mo
    -rw-r--r-- 1 tux users 55436 Apr 10 20:34 /tmp/nano.mo
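
curl also has the '-z' ('--time-cond') option, which asks for a file only if
it is newer than a given date or than a local file. Combined with '-R' it
could give the behaviour of wget's '-N'. A sketch only: it has not been tried
against the SliTaz servers, fetch_if_newer is a made-up name, and the CURL
variable exists just to allow a dry run (CURL=echo).

```shell
# fetch_if_newer URL DEST
# Download URL to DEST, keeping the server's timestamp (-R). When DEST
# already exists, add a time condition (-z DEST) so that curl transfers
# the file only if the remote copy is newer than the local one.
fetch_if_newer() {
    url=$1
    dest=$2
    if [ -f "$dest" ]; then
        ${CURL:-curl} -s -R -z "$dest" -o "$dest" "$url"
    else
        ${CURL:-curl} -s -R -o "$dest" "$url"
    fi
}
```

Running `CURL=echo fetch_if_newer URL DEST` prints the curl arguments instead
of downloading anything, which is handy for checking what would be run.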

Also, curl can ask the server for gzipped content to save traffic, and
transparently ungzip it for you:

    $ curl -h
     --compressed    Request compressed response (using deflate or gzip)

And wget can send any specified header to the server:

    $ wget -h
     --header=STRING    insert STRING among the headers.


Small note about date
=====================

Do you remember the server's answer?

    Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT

We can get the date of a local file in the same format using the following
code:

    $ LC_ALL=C date -Rur ./nano.mo
    Thu, 10 Apr 2014 20:34:37 UTC

We only need to remove the trailing “GMT” and “UTC”, and then we can compare
the two date strings:

    if [ "${SERVER_DATE% GMT}" != "${LOCAL_DATE% UTC}" ]; then ...


About lists format
==================

There are three kinds of localization here: GNU gettext's mo-files, Qt's
qm-files, and other techniques (not supported yet).

Gettext is more standardized: most often the translation file is named after
the package, and it uses a hierarchical tree structure in
    /usr/share/locale/<locale name>/LC_MESSAGES
Most often, but not always.

On the other hand, Qt frequently uses one directory for all of a package's
translations. Something like
    /usr/share/<package>/translations/<package>_<locale>.qm
Not always, either.

We could save all file names with full paths into one archive, as is done in
the tazpkg file /var/lib/tazpkg/files.list.lzma, and get a 1.4 MiB file (48 KiB
in LZMA). But I prefer lists in a special format: I think a plain list contains
too much redundant info, and in some cases it's too hard to determine which
part of the file name is the locale.

So, let me describe the lists format.
There are one or more lines for each package that supports localization. Each
line has four tab-delimited fields. The first two are mandatory, and the next
two are optional.

    1: package name
    2: locale name
    3: name of the file that contains translations
    4: full path to that file

For the “nano” package (30 lines):
    nano	bg	nano.mo	/usr/share/locale/bg/LC_MESSAGES
    ...
    nano	zh_TW	nano.mo	/usr/share/locale/zh_TW/LC_MESSAGES

Now let's use some rules to make the list smaller.

RULE. Use “%” as a placeholder for the locale name in the path:
    /usr/share/locale/%/LC_MESSAGES
RULE. Combine several locales into one space-separated list:
    bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr sv tr uk vi zh_CN zh_TW
RULE. Remove “.mo” from the end of file names:
    nano
RULE. Remove the file name completely if it equals the package name.
RULE. Remove the default path “/usr/share/locale/%/LC_MESSAGES” completely.
RULE. We can leave the 3rd and/or 4th fields empty:
    empty 3:   field1 tab field2 tab tab field4
    empty 4:   field1 tab field2 tab field3
    empty 3&4: field1 tab field2

So now the rule for the “nano” package is very simple (one line):
    nano	bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr sv tr uk vi zh_CN zh_TW
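
The rules so far can be undone mechanically. Below is a sketch that expands
one compact entry back into full file names. It takes the four fields as
separate arguments, and expand_entry and DEFAULT_PATH are names invented for
this example rather than taken from tramys.

```shell
# Default directory from the rules above; "%" stands for the locale name.
DEFAULT_PATH="/usr/share/locale/%/LC_MESSAGES"

# expand_entry PACKAGE "LOCALES" [FILE] [PATH]
# Print the full name of the mo-file for every locale in the
# space-separated LOCALES list. An empty FILE defaults to the package
# name, an empty PATH to DEFAULT_PATH.
expand_entry() (
    pkg=$1
    locales=$2
    file=${3:-$pkg}
    path=${4:-$DEFAULT_PATH}
    for locale in $locales; do
        # Put ".mo" back and replace the "%" placeholder with the locale.
        echo "$path/$file.mo" | sed "s|%|$locale|"
    done
)
```

`expand_entry nano "bg zh_TW"` prints the two nano.mo paths under
/usr/share/locale/bg and /usr/share/locale/zh_TW. Note that splitting a list
line with `IFS=<tab> read` merges adjacent tabs, so an empty third field would
be lost; a real parser would need something like `awk -F'\t'` instead.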

And a few more rules to compress the list further.
RULE. Combine several mo-files into one space-separated field if they have
      an identical list of locales.
      The “gtk+” package contains “gtk20 gtk20-properties” in the third field.
      We can also combine several paths into one space-separated list.
RULE. Use shell-syntax constants to save a few more bytes:
    US="/usr/share"
    LC="LC_MESSAGES"
    PY="/usr/lib/python2.7/site-packages"
    R="/usr/lib/R/library"
    RT="$R/translations/%/$LC"

In some situations we have a choice:

    lcdnurse	es fr he nl pt_BR ru th tr zh_CN		$US/$P/locale/%/$LC
    lcdnurse	es fr nl pt_BR ru tr zh_CN	wxstd	$US/$P/locale/%/$LC

or:

    lcdnurse	he th		$US/$P/locale/%/$LC
    lcdnurse	es fr nl pt_BR ru tr zh_CN	$P wxstd	$US/$P/locale/%/$LC

Both variants work, and neither is wrong. Also, the second variant is shorter
by 24 bytes :)


Lists: to be or not...
======================

While I was developing tramys, my lists slowly became more and more outdated.
Many new packages in the wok, many upgrades... It seems that tramys lists
need to be released very frequently. And I can't write all the sophisticated
rules needed to automate the process.

It would be nice to attach the localization info to the package itself! As a
separate file, like the description? No. The more files our filesystem
contains, the slower it is. Better to attach it to the package receipt. In this
case we do not need the first field. Something like:

    L10N="he th		$US/$P/locale/%/$LC
    es fr nl pt_BR ru tr zh_CN	$P wxstd	$US/$P/locale/%/$LC"

Off topic. I think it's better to place the description in the receipt too:

    description()
    {
        cat << EOT
    Description here
    EOT
    }

And for compatibility: read the info both from the receipt (if any) and from
the lists.
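
That compatibility lookup could be sketched as follows. Everything here is an
assumption made for illustration: the L10N variable is only the proposal made
above, the receipt is assumed to be a shell file that is safe to source, and
get_l10n is a hypothetical name.

```shell
# get_l10n PACKAGE RECEIPT LIST
# Print the localization info for PACKAGE: the L10N value from RECEIPT
# when the receipt exists and defines it, otherwise the matching lines
# of the tab-delimited LIST with the first field (package name) removed.
# The body runs in a subshell, so sourcing the receipt cannot leak
# variables into the caller.
get_l10n() (
    pkg=$1
    receipt=$2
    list=$3
    L10N=
    if [ -f "$receipt" ]; then
        . "$receipt"
    fi
    if [ -n "$L10N" ]; then
        printf '%s\n' "$L10N"
    else
        awk -F'\t' -v p="$pkg" '$1 == p { sub(/^[^\t]*\t/, ""); print }' "$list"
    fi
)
```

Either way the caller gets lines in the same three-field form (locales, file,
path), so the rest of the processing does not need to know where the data came
from.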


TODO
====

- Remove all translation files from all existing packages.
- Migrate lists to receipts.
- Support preferred languages in the LANGUAGE variable.
- Write a server-side script to get only changed/newer translation files.
- Add a tazpkg hook to get translations after package install (if the user wants).
- ...