wok-current view tramys/stuff/README @ rev 17079
mpd: curl-dev for http stream
author | Richard Dunbar <mojo@slitaz.org> |
---|---|
date | Sun Aug 24 12:55:58 2014 +0000 (2014-08-24) |
tramys - TRAnslate MY Slitaz
Tool for managing translation files for SliTaz GNU/Linux

Aleksei Bobylev <al.bobylev@gmail.com>, 2014


Some random notes about tramys development.

The idea
========

I like to use applications translated into my language. On the other hand, I
like the fact that SliTaz is not overloaded with unnecessary files.

Some packages have “twins” that contain only translations. But note that all
the translations for GIMP take about 30 MB, and for Wesnoth about 90 MB!
I don't need ALL of those translations. Really. I need only one.

We now have some language packs in the SliTaz repository. These packs contain
translations of several packages for several chosen locales. Not bad, but
what should I do if I need translations for packages that are not listed there?

We set up ftp.slitaz.org. Then I thought it would be a good idea to look up
and fetch just the translation files you want. And what about automation?

Translations can be found in the “install” sub-folders. For example:
  ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
for the “nano” package and the “ru” locale.

The same file can also be downloaded through the cooker:
  http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
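
The two URL patterns above can be assembled from a package name and a locale.
This is a minimal sketch, assuming the default gettext layout and a file named
after the package (neither is always true); mo_path, ftp_url and cooker_url
are hypothetical helper names, not part of tramys.

```shell
#!/bin/sh
# Sketch only: build download URLs for one translation file.
# Assumes /usr/share/locale/<locale>/LC_MESSAGES/<package>.mo.

# mo_path PACKAGE LOCALE -> path of the .mo file inside the package tree
mo_path() {
	printf 'usr/share/locale/%s/LC_MESSAGES/%s.mo\n' "$2" "$1"
}

# ftp_url PACKAGE LOCALE -> URL on the FTP server
ftp_url() {
	printf 'ftp://cook.slitaz.org/%s/install/%s\n' "$1" "$(mo_path "$1" "$2")"
}

# cooker_url PACKAGE LOCALE -> URL served by cooker.cgi
cooker_url() {
	printf 'http://cook.slitaz.org/cooker.cgi?download=../wok/%s/install/%s\n' \
		"$1" "$(mo_path "$1" "$2")"
}

ftp_url nano ru
cooker_url nano ru
```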

About gettext's locale search order
===================================

At first glance it looks easy. My locale is “ru_UA” (I speak Russian and live
in Ukraine). Gettext searches for “ru_UA” translations first. When it does not
find them, it drops the country from the locale and looks for “ru”
translations.

I also know that a full locale definition can additionally contain an encoding
and a variant:
  language_COUNTRY.ENCODING@variant
All parts except the language are optional. I could not figure out the search
order from the official docs
[[ https://www.gnu.org/software/gettext/manual/gettext.html ]],
so I experimented with the following piece of code:

  VAR0=$(LC_ALL="LL_CC.EE@VV" strace -o /tmp/s -e trace=file gettext test test; \
  grep -F '/usr/lib/locale/' /tmp/s | \
  grep -v locale-archive | \
  grep -F '/LC_IDENTIFICATION' | \
  sed 's|.*/usr/lib/locale/\([^/]*\).*|\1|g' | \
  tr '\n' ' ')
  echo "${VAR0#test}"

A deliberately non-existent value is used for LC_ALL here in order to see all
the search variants (gettext stops searching as soon as it finds one of them).
Try it with different LC_ALL values such as "LL@VV", "LL.EE", etc.
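
The simple “ru_UA, then ru” fallback from the start of this section can be
sketched as a small helper. It drops the encoding and the variant, which is a
simplification and not the full search order the experiment above explores;
locale_candidates is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: list the locale names to try for a given locale, following the
# simple "ru_UA, then ru" fallback. Encoding and variant are dropped here.

# locale_candidates LOCALE -> candidates, one per line, most specific first
locale_candidates() {
	loc=${1%%@*}      # drop "@variant"
	loc=${loc%%.*}    # drop ".ENCODING"
	lang=${loc%%_*}   # drop "_COUNTRY"
	echo "$loc"
	# Print the bare language only if it differs from the first candidate.
	[ "$lang" = "$loc" ] || echo "$lang"
}

locale_candidates ru_UA.UTF-8
```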

About preferred languages
=========================

This is a nice feature I found in the Gettext docs, but it is not applied yet.
When Gettext does not find any matching translations, it shows untranslated
(i.e. English) messages. Some of us did not learn English at school; it may
have been German or French. So I want to see translations into my native
language or, if they do not exist, let's say, French translations. All of this
is done by setting the LANGUAGE environment variable.
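
LANGUAGE holds a colon-separated list of languages in priority order, for
example "ru:fr". A minimal sketch of how a tool could turn that value into an
ordered list follows; preferred_languages is a hypothetical helper, not part
of tramys.

```shell
#!/bin/sh
# Sketch only: split a LANGUAGE value such as "ru:fr" into an ordered list.

# preferred_languages VALUE -> one language per line, in priority order
preferred_languages() {
	# Turn colons into newlines and drop empty entries (from "::" or a
	# trailing ":").
	printf '%s\n' "$1" | tr ':' '\n' | sed '/^$/d'
}

preferred_languages "ru:fr"
```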

About LC_ALL
============

Setting this variable is not correct in some cases. And if we want to mimic
Gettext's behavior more closely, we need to check several locale environment
variables in a certain order.
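
As a rough sketch of that order: for messages, the Gettext manual gives the
precedence LC_ALL, then LC_MESSAGES, then LANG (with LANGUAGE consulted before
all of them when the resulting locale is not "C"). The helper below covers only
the three locale variables; messages_locale is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: pick the locale used for messages from the environment.
# Order: LC_ALL, then LC_MESSAGES, then LANG; "C" if none is set.
# LANGUAGE is deliberately not handled here.
messages_locale() {
	if [ -n "$LC_ALL" ]; then
		echo "$LC_ALL"
	elif [ -n "$LC_MESSAGES" ]; then
		echo "$LC_MESSAGES"
	elif [ -n "$LANG" ]; then
		echo "$LANG"
	else
		echo C
	fi
}

messages_locale
```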

About traffic saving
====================

Currently tramys downloads all the localization files again and again.
It has no way of knowing whether your existing localization file is up to date
or outdated; it just re-downloads it.

At the moment we have no simple solution. All the solutions touch both the
client side and the server side.

1. Using GNU wget
-----------------

It has the '-N' option:

  -N,  --timestamping     don't re-retrieve files unless newer than
                          local.

But: a) SliTaz uses BusyBox's wget by default, which has no '-N' option, and
b) the SliTaz HTTP server does not return the file's timestamp:

  $ curl -I 'http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 55436
  Content-Disposition: attachment; filename=nano.mo
  Date: Wed, 06 Aug 2014 20:53:37 GMT
  Server: lighttpd (SliTaz GNU/Linux)

Our FTP server returns "Last-Modified", but neither wget makes use of it :(

  $ curl -I 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
  Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT
  Content-Length: 55436
  Accept-ranges: bytes

  $ busybox wget -O /tmp/nano1.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
  Connecting to cook.slitaz.org (37.187.4.13:21)
  nano1.mo 100% |*******************************| 55436 0:00:00 ETA

  $ ls -l /tmp/nano1.mo
  -rw-r--r-- 1 tux users 55436 Aug  6 22:01 /tmp/nano1.mo

  $ wget -O /tmp/nano2.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'

  $ ls -l /tmp/nano2.mo
  -rw-r--r-- 1 tux users 55436 Aug  6 22:03 /tmp/nano2.mo

2. Write new client-server infrastructure
-----------------------------------------

We can write a script instead of using the two-byte solution (-N) :D
The client keeps using BusyBox's wget.
The script logic is as follows. The client sends a request to the server: the
file name and date. The server returns a newer file or returns nothing. Only
one connection per file needs to be established.
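
The server-side decision could look roughly like this. It is only a sketch
under stated assumptions: the client is assumed to send the modification time
of its copy as seconds since the epoch (0 when it has no copy), and
send_if_newer is a hypothetical name. How the request reaches the script (CGI
parameters, etc.) is left open.

```shell
#!/bin/sh
# Sketch only: server-side decision for one request.
# Assumption: CLIENT_MTIME is given in seconds since the epoch.

# send_if_newer FILE CLIENT_MTIME -> writes FILE to stdout only if it is newer
send_if_newer() {
	[ -f "$1" ] || return 1
	# Modification time of the server's copy, in seconds since the epoch.
	server_mtime=$(date -r "$1" +%s)
	if [ "$server_mtime" -gt "$2" ]; then
		cat "$1"
	fi
	# Otherwise nothing is printed and the client keeps its copy.
}
```

An empty response then means "your file is current", so one request per file
is enough.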

3. Using curl
-------------

What about curl? Yes, it works:

  $ curl -R -o /tmp/nano.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100 55436  100 55436    0     0  30898      0  0:00:01  0:00:01 --:--:-- 32361

  $ ls -l /tmp/nano.mo
  -rw-r--r-- 1 tux users 55436 Apr 10 20:34 /tmp/nano.mo

Curl can also ask the server for gzipped content to save traffic, and
transparently decompress it for you:

  $ curl -h
  --compressed     Request compressed response (using deflate or gzip)

And wget can send any specified header to the server:

  $ wget -h
  --header=STRING  insert STRING among the headers.
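
Since '-R' makes the local file carry the server's timestamp, curl's
'-z' ('--time-cond') option looks like a natural companion: it asks for the
file only if the remote copy is newer than a given local file. The sketch
below is not something these notes tested; whether the SliTaz servers honor
the condition is an assumption. fetch_if_newer and the CURL override variable
are hypothetical names.

```shell
#!/bin/sh
# Sketch only: fetch a file with curl, skipping the transfer when the local
# copy is not older than the remote one (untested against SliTaz servers).

# fetch_if_newer URL DEST
fetch_if_newer() {
	url=$1 dest=$2
	if [ -f "$dest" ]; then
		# -z FILE: transfer only if the remote file is newer than FILE
		# -R:      give the downloaded file the remote timestamp
		"${CURL:-curl}" -s -R -z "$dest" -o "$dest" "$url"
	else
		# No local copy yet, so there is nothing to compare against.
		"${CURL:-curl}" -s -R -o "$dest" "$url"
	fi
}

# Example (needs network access):
# fetch_if_newer 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo' /tmp/nano.mo
```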

Small note about date
=====================

Do you remember the server's answer?

  Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT

We can get the date of a local file in the same format using the following
code:

  LC_ALL=C date -Rur ./nano.mo
  Thu, 10 Apr 2014 20:34:37 UTC

We only need to remove the trailing “GMT” and “UTC”, and then we can compare
the two strings that contain the dates:
  if [ "${SERVER_DATE% GMT}" != "${LOCAL_DATE% UTC}" ]; then ...
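
The comparison above can be wrapped into two small helpers. This is a sketch
only; strip_zone and same_date are hypothetical names, and the extra "+0000"
suffix is an assumption added here because some 'date -R' implementations
print a numeric zone instead of "UTC".

```shell
#!/bin/sh
# Sketch only: compare the server's Last-Modified value with a local date
# string by ignoring the trailing zone name.

# strip_zone DATE -> DATE without a trailing " GMT", " UTC" or " +0000"
strip_zone() {
	d=${1% GMT}
	d=${d% UTC}
	d=${d% +0000}
	echo "$d"
}

# same_date SERVER_DATE LOCAL_DATE -> succeeds if both strings match
same_date() {
	[ "$(strip_zone "$1")" = "$(strip_zone "$2")" ]
}

same_date 'Thu, 10 Apr 2014 20:34:37 GMT' 'Thu, 10 Apr 2014 20:34:37 UTC' \
	&& echo "up to date"
```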

About lists format
==================

There are three kinds of localization: GNU gettext's mo-files, Qt's qm-files,
and other techniques (not supported yet).

Gettext is more standardized: most often the translation file is named after
the package, and it uses a hierarchical tree structure in
  /usr/share/locale/<locale name>/LC_MESSAGES
Most often, but not always.

On the other hand, Qt frequently uses one directory for all of a package's
translations. Something like
  /usr/share/<package>/translations/<package>_<locale>.qm
Again, not always.

We could save all the file names with full paths into one archive, as is done
in the tazpkg file /var/lib/tazpkg/files.list.lzma, and get a 1.4 MiB file
(48 KiB in LZMA). But I prefer lists with a special format: I think a plain
list contains too much redundant info, and in some cases it is too hard to
determine which part of the file name is the locale.

So, let me describe the lists format.
There are one or more lines for each package that supports localization. Each
line has four tab-delimited fields. The first two are mandatory, and the next
two are optional.

  1: package name
  2: locale name
  3: name of the file that contains translations
  4: full path to that file

For the “nano” package (30 lines):
  nano	bg	nano.mo	/usr/share/locale/bg/LC_MESSAGES
  ...
  nano	zh_TW	nano.mo	/usr/share/locale/zh_TW/LC_MESSAGES

Now let's use some rules to make the list smaller.

RULE. Use “%” as a placeholder for the locale name in the path:
  /usr/share/locale/%/LC_MESSAGES
RULE. Combine several locales into one space-separated list:
  bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr
  sv tr uk vi zh_CN zh_TW
RULE. Remove “.mo” from the end of file names:
  nano
RULE. Remove the file name completely if it equals the package name.
RULE. Remove the default path “/usr/share/locale/%/LC_MESSAGES” completely.
RULE. Empty 3rd and/or 4th fields can be left out:
  empty 3:   field1 tab field2 tab tab field4
  empty 4:   field1 tab field2 tab field3
  empty 3&4: field1 tab field2

So now the rule for the “nano” package is very simple (one line):
  nano	bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr sv tr uk vi zh_CN zh_TW
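
To make the rules above concrete, here is a minimal sketch that expands one
compressed list line back into file paths for a given locale. expand_line is a
hypothetical helper, not the tramys implementation; it handles only gettext
mo-files and does not expand shell constants.

```shell
#!/bin/sh
# Sketch only: expand one compressed list line into .mo file paths for a
# given locale, applying the default file name and default path rules.

DEFAULT_PATH='/usr/share/locale/%/LC_MESSAGES'

# expand_line LINE LOCALE -> one path per line; fails if LOCALE is not listed
expand_line() {
	pkg=$(printf '%s\n' "$1" | cut -f1)
	locales=$(printf '%s\n' "$1" | cut -f2)
	files=$(printf '%s\n' "$1" | cut -f3)
	paths=$(printf '%s\n' "$1" | cut -f4)

	# Is the requested locale in the space-separated second field?
	case " $locales " in
		*" $2 "*) ;;
		*) return 1 ;;
	esac

	# Empty or missing fields fall back to the defaults from the rules.
	for path in ${paths:-$DEFAULT_PATH}; do
		# Replace the "%" placeholder with the locale name.
		dir=$(printf '%s\n' "$path" | sed "s|%|$2|")
		for file in ${files:-$pkg}; do
			echo "$dir/$file.mo"
		done
	done
}

expand_line "$(printf 'nano\tbg ru uk')" ru
```

cut is used instead of splitting on IFS because a tab in IFS would collapse
the two adjacent tabs that mark an empty third field.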

And a few more rules to compress the list further.
RULE. Combine several mo-files into one space-separated field if they have
  an identical list of locales.
  The “gtk+” package contains “gtk20 gtk20-properties” in the third field.
  Several paths can likewise be combined into one space-separated list.
RULE. Use shell-syntax constants to save a few more bytes:
  US="/usr/share"
  LC="LC_MESSAGES"
  PY="/usr/lib/python2.7/site-packages"
  R="/usr/lib/R/library"
  RT="$R/translations/%/$LC"

In some situations we have a choice:

  lcdnurse	es fr he nl pt_BR ru th tr zh_CN		$US/$P/locale/%/$LC
  lcdnurse	es fr nl pt_BR ru tr zh_CN	wxstd	$US/$P/locale/%/$LC

or:

  lcdnurse	he th		$US/$P/locale/%/$LC
  lcdnurse	es fr nl pt_BR ru tr zh_CN	$P wxstd	$US/$P/locale/%/$LC

Both variants work, and neither is wrong. Also, the second variant is shorter
by 24 bytes :)

Lists: to be or not...
======================

While I was developing tramys, my lists slowly became more and more outdated.
Many new packages in the wok, many upgrades... It seems that the tramys lists
would need to be released very frequently. And I can't write all the
sophisticated rules needed to automate the process.

It would not be bad to attach the localization info to the package itself!
Like the description file? No. The more files our filesystem contains, the
slower it is. It is better to put it into the package receipt. In this case we
do not need the first field. Something like:

  L10N="he th		$US/$P/locale/%/$LC
  es fr nl pt_BR ru tr zh_CN	$P wxstd	$US/$P/locale/%/$LC"

Off topic: I think it would be better to put the description into the package
too:
  description()
  {
  	cat << EOT
  Description here
  EOT
  }

And for compatibility: read the info both from the receipt (if any) and from
the lists.
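
For the compatibility case, a receipt's L10N value can be turned back into
list lines by putting the package name in front of every line. This is a
sketch only: L10N is just a proposal in these notes, and l10n_to_list is a
hypothetical helper.

```shell
#!/bin/sh
# Sketch only: rebuild tab-delimited list lines from a receipt's L10N value
# by prefixing each non-empty line with the package name.

# l10n_to_list PACKAGE L10N_VALUE -> one list line per L10N line
l10n_to_list() {
	printf '%s\n' "$2" | while IFS= read -r line; do
		if [ -n "$line" ]; then
			printf '%s\t%s\n' "$1" "$line"
		fi
	done
}

l10n_to_list lcdnurse "$(printf 'he th\nes fr nl\twxstd')"
```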

TODO
====

- Remove all translation files from all existing packages.
- Migrate lists to receipts.
- Support preferred languages from the LANGUAGE variable.
- Write a server-side script to get only changed/newer translation files.
- Add a tazpkg hook to get translations after package install (if the user
  wants).
- ...