tramys - TRAnslate MY Slitaz
Tool for managing translation files for SliTaz GNU/Linux

Aleksei Bobylev <al.bobylev@gmail.com>, 2014


Some random notes about tramys development.


The idea
========

I like to use applications translated into my language. On the other hand, I
like the fact that SliTaz is not overloaded with unnecessary files.

Some packages have “twins” that contain only translations. But note that all
the translations for the GIMP take about 30 MB! And for Wesnoth ~ 90 MB!!!
I don't need ALL those translations. Really. I need only one.

Now we have some language packs in the SliTaz repository. These packs contain
translations of several packages for several chosen locales. Not bad, but
what do I do if I need translations for other packages not listed there?

We set up ftp.slitaz.org. Then I thought it would be a good idea to seek out
and take only the translation files you want. And what about automation?

We'll find the translations in the “install” sub-folders. Something like this:
ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
for the “nano” package with the “ru” locale.

We can also download the same file through the cooker:
http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo
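
A minimal sketch of how such an address could be assembled in a script. The
function name mo_url and its arguments are made up for illustration; only the
URL layout is taken from the FTP example above.

```shell
# Print the FTP URL of a gettext catalog on the cooker.
# $1: package name, $2: locale, $3: catalog name (assumed to default to $1)
mo_url() {
    pkg=$1
    locale=$2
    domain=${3:-$1}
    echo "ftp://cook.slitaz.org/$pkg/install/usr/share/locale/$locale/LC_MESSAGES/$domain.mo"
}
```

For example, "mo_url nano ru" prints the FTP address given above.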


About gettext's locale search order
===================================

I know, it looks easy at first glance. My locale is “ru_UA” (I speak Russian
and live in Ukraine). Gettext searches for “ru_UA” translations first. When it
doesn't find them, it drops the country from the locale and looks for “ru”
translations.

I also know that a full locale definition can contain an encoding and a
variant in addition:
language_COUNTRY.ENCODING@variant
All parts except the language are optional. I couldn't figure out the search
order from the official docs
[[ https://www.gnu.org/software/gettext/manual/gettext.html ]]
so I experimented with the following piece of code:

# Trace the files gettext tries to open for a fake locale and collect the
# locale directory names in the order they were tried.
VAR0=$(LC_ALL="LL_CC.EE@VV" strace -o /tmp/s -e trace=file gettext test test; \
grep -F '/usr/lib/locale/' /tmp/s | \
grep -v locale-archive | \
grep -F '/LC_IDENTIFICATION' | \
sed 's|.*/usr/lib/locale/\([^/]*\).*|\1|g' | \
tr '\n' ' ')
# gettext itself prints the untranslated "test" first; strip it.
echo "${VAR0#test}"

It uses a special non-existent value for the LC_ALL variable in order to see
all the search variants (gettext stops searching as soon as it finds one of
the variants).
Try it with different LC_ALL values such as "LL@VV" or "LL.EE", etc.
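
A rough sketch of how tramys could produce such a list of fallback variants
without strace. The function name locale_candidates is made up, and the order
in which parts are dropped (encoding first, then country, then variant) is an
assumption; it should be checked against the output of the experiment above.

```shell
# Split language[_COUNTRY][.ENCODING][@variant] and print fallback
# candidates, most specific first. The order is an assumption.
locale_candidates() {
    loc=$1
    variant=
    encoding=
    country=
    case $loc in *@*) variant=${loc#*@}; loc=${loc%%@*} ;; esac
    case $loc in *.*) encoding=${loc#*.}; loc=${loc%%.*} ;; esac
    case $loc in *_*) country=${loc#*_}; loc=${loc%%_*} ;; esac
    lang=$loc
    for v in "${variant:+@$variant}" ''; do
        for c in "${country:+_$country}" ''; do
            for e in "${encoding:+.$encoding}" ''; do
                echo "$lang$c$e$v"
            done
        done
    done | awk '!seen[$0]++'    # drop duplicates, keep the first occurrence
}
```

With this sketch, "locale_candidates ru_UA" prints “ru_UA” and then “ru”.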


About preferred languages
=========================

This is a nice feature I found in the Gettext docs, but it is not applied yet.
When Gettext doesn't find any matching translation, it shows untranslated
(= English) messages. Some of us didn't learn English at school; it could have
been German or French. So I want to see translations into my native language
or (if they don't exist), let's say, French translations. All this is done by
setting the LANGUAGE environment variable.
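
A small sketch of how a script might walk such a colon-separated list (for
example LANGUAGE="ru_UA:ru:fr") and pick the first language that has a
catalog. The function name first_available and its arguments are hypothetical;
this is not what tramys does today.

```shell
# Print the first language from a colon-separated list for which
# LOCALEDIR/<language>/LC_MESSAGES/DOMAIN.mo exists.
# $1: locale directory, $2: catalog name, $3: list such as "ru_UA:ru:fr"
first_available() {
    localedir=$1
    domain=$2
    langs=$3
    old_ifs=$IFS
    IFS=:
    for l in $langs; do
        if [ -f "$localedir/$l/LC_MESSAGES/$domain.mo" ]; then
            IFS=$old_ifs
            echo "$l"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Usage could look like: first_available /usr/share/locale nano "$LANGUAGE"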


About LC_ALL
============

In some cases it is not correct to set this variable. And if we want to mimic
Gettext's behavior more closely, we need to check several locale environment
variables in a certain order.
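
As a sketch of that order: for messages, the usual precedence is LC_ALL, then
LC_MESSAGES, then LANG (LANGUAGE, when set, is consulted before all of them).
The function name messages_locale is made up, and falling back to “C” when
nothing is set is an assumption of this sketch.

```shell
# Print the locale that applies to messages, checking the environment
# variables in order of precedence: LC_ALL, LC_MESSAGES, LANG.
messages_locale() {
    echo "${LC_ALL:-${LC_MESSAGES:-${LANG:-C}}}"
}
```

An empty variable is treated the same as an unset one here.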


About traffic saving
====================

Currently tramys downloads all the localization files, again and again.
It has no knowledge of whether your existing localization file is current or
outdated; it just re-downloads it.

At the moment we have no simple solution. Every solution touches both the
client and the server side.

1. Using GNU wget
-----------------

It has the '-N' option:

-N,  --timestamping   don't re-retrieve files unless newer than
                      local.

But: a) by default SliTaz uses BusyBox's wget, which has no '-N' option, and
     b) the SliTaz HTTP server doesn't return the file's timestamp:

$ curl -I 'http://cook.slitaz.org/cooker.cgi?download=../wok/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 55436
Content-Disposition: attachment; filename=nano.mo
Date: Wed, 06 Aug 2014 20:53:37 GMT
Server: lighttpd (SliTaz GNU/Linux)

Our FTP server returns "Last-Modified", but neither wget makes use of it :(

$ curl -I 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT
Content-Length: 55436
Accept-ranges: bytes

$ busybox wget -O /tmp/nano1.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
Connecting to cook.slitaz.org (37.187.4.13:21)
nano1.mo 100% |*******************************| 55436 0:00:00 ETA

$ ls -l /tmp/nano1.mo
-rw-r--r-- 1 tux users 55436 Aug 6 22:01 /tmp/nano1.mo

$ wget -O /tmp/nano2.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'

$ ls -l /tmp/nano2.mo
-rw-r--r-- 1 tux users 55436 Aug 6 22:03 /tmp/nano2.mo


2. Write new client-server infrastructure
-----------------------------------------

We can write a script instead of using the two-byte solution (-N) :D
The client uses BusyBox's wget.
The script logic is as follows. The client sends a request to the server: the
file name and date. The server returns a newer file or returns nothing... Only
one connection per file needs to be established.
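
The server-side decision could be sketched like this. The function name
is_newer is hypothetical, and passing the client's date as seconds since the
epoch is an assumption made to keep the comparison simple; "date -r FILE" is
used as in GNU and BusyBox date.

```shell
# Succeed if FILE on the server was modified after the client's copy.
# $1: file on the server
# $2: modification time of the client's copy, seconds since the epoch
is_newer() {
    server_time=$(date -r "$1" +%s) || return 2
    [ "$server_time" -gt "$2" ]
}
```

A server script could then send the file when is_newer succeeds and an empty
answer otherwise.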


3. Using curl
-------------

What about curl? Yes, it works:

$ curl -R -o /tmp/nano.mo 'ftp://cook.slitaz.org/nano/install/usr/share/locale/ru/LC_MESSAGES/nano.mo'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55436  100 55436    0     0  30898      0  0:00:01  0:00:01 --:--:-- 32361

$ ls -l /tmp/nano.mo
-rw-r--r-- 1 tux users 55436 Apr 10 20:34 /tmp/nano.mo

Also, curl can ask the server for gzipped content to save traffic, and
transparently ungzip it for you:

$ curl -h
--compressed Request compressed response (using deflate or gzip)

And wget can send any specified header to the server:

$ wget -h
--header=STRING insert STRING among the headers.
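
Since curl keeps the server's timestamp with -R, its -z (--time-cond) option
can use the local file's date to request only a newer copy. A sketch, not
tested against the SliTaz servers; the function name fetch_if_newer and the
CURL override variable are hypothetical.

```shell
# Download URL to DEST, keeping the server's timestamp (-R). When DEST
# already exists, ask only for a copy newer than it (-z DEST).
# CURL is a hypothetical override, handy for dry runs (CURL=echo).
fetch_if_newer() {
    url=$1
    dest=$2
    if [ -f "$dest" ]; then
        set -- -R -z "$dest" -o "$dest" "$url"
    else
        set -- -R -o "$dest" "$url"
    fi
    ${CURL:-curl} "$@"
}
```

Running it twice in a row should transfer the file only the first time,
provided the server reports modification times.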


Small note about date
=====================

Do you remember the server's answer?

Last-Modified: Thu, 10 Apr 2014 20:34:37 GMT

We can get the date of a file in this format using the following code:

LC_ALL=C date -Rur ./nano.mo
Thu, 10 Apr 2014 20:34:37 UTC

We only need to remove “GMT” and “UTC” respectively, and then we can compare
the two date strings:
if [ "${SERVER_DATE% GMT}" != "${LOCAL_DATE% UTC}" ]; then ...
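
A sketch of how the two strings could be obtained and compared in a script.
The function names last_modified and date_differs are made up; the headers are
expected on standard input, for example from "curl -I".

```shell
# Read response headers on stdin and print the Last-Modified value
# (carriage returns from the network are removed first).
last_modified() {
    tr -d '\r' | sed -n 's/^[Ll]ast-[Mm]odified: *//p'
}

# Succeed when the server date ($1) and the local date ($2) differ,
# ignoring the trailing "GMT" / "UTC" zone names.
date_differs() {
    server=${1% GMT}
    localdate=${2% UTC}
    [ "$server" != "$localdate" ]
}
```

Note that this only tells us the dates differ, not which one is newer.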


About lists format
==================

There are three kinds of localization: GNU gettext's mo-files, Qt's qm-files,
and other techniques (not supported yet).

Gettext is more standardized: most often the translation file is named after
the package, and it lives in the hierarchical tree structure under
/usr/share/locale/<locale name>/LC_MESSAGES
Most often, but not always.

Qt, on the other hand, frequently uses one directory for all of a package's
translations. Something like
/usr/share/<package>/translations/<package>_<locale>.qm
Again, not always.

We could save all the file names with full paths in one archive, as is done
in the tazpkg file /var/lib/tazpkg/files.list.lzma, and we would get a 1.4 MiB
file (48 KiB in LZMA). But I prefer lists in a special format: I think a plain
list contains too much redundant info, and in some cases it's too hard to
determine which part of the file name is the locale.

So, let me describe the lists format.
There are one or more lines for each package that supports localization. Each
line has four tab-delimited fields. The first two are mandatory, and the last
two are optional.

1: package name
2: locale name
3: name of file that contains translations
4: full path to that file

For the “nano” package (30 lines):
nano bg nano.mo /usr/share/locale/bg/LC_MESSAGES
...
nano zh_TW nano.mo /usr/share/locale/zh_TW/LC_MESSAGES

Now let's use some rules to make the list smaller.

RULE. Use “%” as a placeholder for the locale name in the path:
/usr/share/locale/%/LC_MESSAGES
RULE. Combine several locales into one space-separated list:
bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr
sv tr uk vi zh_CN zh_TW
RULE. Remove “.mo” from the end of file names:
nano
RULE. Remove the file name completely if it equals the package name.
RULE. Remove the default path “/usr/share/locale/%/LC_MESSAGES” completely.
RULE. Empty 3rd and/or 4th fields are written like this (trailing empty
fields are dropped):
empty 3: field1 tab field2 tab tab field4
empty 4: field1 tab field2 tab field3
empty 3&4: field1 tab field2

So now the rule for the “nano” package is very simple (one line):
nano bg ca cs da de es eu fi fr ga gl hu id it ms nb nl nn pl pt_BR ro ru rw sr sv tr uk vi zh_CN zh_TW
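
To check my own understanding of the rules, here is a sketch of a reader for
such lines. The function name expand_entry is made up. It handles mo-files
only, does not expand shell-style constants, and pairs every listed file with
every listed path, which is an assumption.

```shell
# expand_entry LOCALE
# Read tab-delimited list lines on stdin and print the mo-file paths
# for LOCALE. An empty 3rd field means "file named after the package",
# an empty 4th field means the default gettext path.
expand_entry() {
    awk -F'\t' -v want="$1" '
    {
        files = ($3 == "") ? $1 : $3
        paths = ($4 == "") ? "/usr/share/locale/%/LC_MESSAGES" : $4
        nl = split($2, loc, " ")
        found = 0
        for (i = 1; i <= nl; i++) if (loc[i] == want) found = 1
        if (!found) next
        nf = split(files, f, " ")
        np = split(paths, p, " ")
        for (j = 1; j <= np; j++) {
            dir = p[j]
            gsub(/%/, want, dir)    # put the locale in place of "%"
            for (k = 1; k <= nf; k++) print dir "/" f[k] ".mo"
        }
    }'
}
```

Fed the one-line “nano” rule and the locale “ru”, it prints
/usr/share/locale/ru/LC_MESSAGES/nano.mo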

And a few more rules to compress the list further.
RULE. Combine several mo-files into one space-separated field if they have
an identical list of locales.
The “gtk+” package contains “gtk20 gtk20-properties” in the third field.
We can also combine several paths into one space-separated list.
RULE. Use shell-syntax constants to save a few more bytes:
US="/usr/share"
LC="LC_MESSAGES"
PY="/usr/lib/python2.7/site-packages"
R="/usr/lib/R/library"
RT="$R/translations/%/$LC"

In some situations we have a choice:

lcdnurse es fr he nl pt_BR ru th tr zh_CN $US/$P/locale/%/$LC
lcdnurse es fr nl pt_BR ru tr zh_CN wxstd $US/$P/locale/%/$LC

or:

lcdnurse he th $US/$P/locale/%/$LC
lcdnurse es fr nl pt_BR ru tr zh_CN $P wxstd $US/$P/locale/%/$LC

Both variants work, and neither is wrong. Also, the second variant is shorter
by 24 bytes :)
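
A sketch of how a path with such constants could be expanded. The function
name expand_path is made up, and treating $P as the package name is an
assumption: it is used in the lines above but not defined in the constants
list.

```shell
US="/usr/share"
LC="LC_MESSAGES"

# expand_path PACKAGE PATTERN LOCALE
# Expand the shell-style constants in PATTERN, then put LOCALE in place
# of "%". eval executes whatever the pattern contains, so this only
# suits lists from a trusted source.
expand_path() {
    P=$1    # assumption: $P stands for the package name
    eval "echo \"$2\"" | sed "s|%|$3|g"
}
```

For example, expand_path lcdnurse '$US/$P/locale/%/$LC' ru
prints /usr/share/lcdnurse/locale/ru/LC_MESSAGES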


Lists: to be or not...
======================

While I was developing tramys, my lists slowly became more and more outdated.
Many new packages in the wok, many upgrades... It seems the tramys lists
need to be released very frequently. And I can't write all the sophisticated
rules needed to automate the process.

It wouldn't be bad to attach the localization info to the package itself! Like
the description file? No. The more files our filesystem contains, the slower
it is. Better to attach it to the package receipt. In this case we don't need
the first field. Something like:

L10N="he th $US/$P/locale/%/$LC
es fr nl pt_BR ru tr zh_CN $P wxstd $US/$P/locale/%/$LC"

Off topic. I think it's better to put the description into the package too:
description()
{
cat << EOT
Description here
EOT
}

And for compatibility: read the info both from the receipt (if any) and from
the lists.
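
Reading such a variable could be sketched as follows. The function name
receipt_l10n is hypothetical, and so is the L10N variable itself until
receipts actually carry it. Because the receipt is sourced, constants such as
$US would have to be defined before the call, or they expand to nothing.

```shell
# Print the L10N value of a receipt (an empty line when it has none).
# The receipt is sourced in a subshell so its variables do not leak
# into the caller.
receipt_l10n() {
    (
        L10N=
        . "$1" >/dev/null 2>&1 || exit 1
        printf '%s\n' "$L10N"
    )
}
```

When the output is empty, a caller could fall back to the lists.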


TODO
====

- Remove all translation files from all existing packages.
- Migrate lists to receipts.
- Support preferred languages in the LANGUAGE variable.
- Write a server-side script to get only changed/newer translation files.
- Add a tazpkg hook to get translations after package install (if the user
  wants).
- ...