[TAG] compressed issues of LG

Ben Okopnik ben at linuxgazette.net
Fri Nov 2 16:02:47 MSK 2007


On Fri, Nov 02, 2007 at 02:13:10PM +1100, Minh Nguyen wrote:
> So far, issues of LG have been compressed using tar and gzip. Is there
> any intention to use tar with bzip2 for future issues? Since most of
> the files in each issue are text files, bzip2 is more efficient (in
> terms of the size of the compressed file) than gzip. Here is a
> comparison of bzip2 and gzip using the current issue; i.e. November
> 2007 (#144):
> 
> 1028042 lg-144.tar.bz2
> 1045337 lg-144.tar.gz
> 
> IMHO, providing a bzip2 compressed format of LG issues would save some
> download time.

As I recall, we had a similar discussion here in TAG quite a while back
(digging through my 'Sent_mail' says 2002 - but I can't find it in LG.
Annoying, that.) In any case, here's the comparison that I ran then:

``
	OK, I'm the curious type... Here's a bunch of files from many walks of
	life; let's see who does what.
	
	-rw-r--r--    1 ben      ben       1474560 May 20 05:51 test.bin
	-rw-rw-r--    1 ben      ben        102970 Sep 19  2000 test.bmp
	-rw-rw-r--    1 ben      ben        121880 Sep 19  2000 test.gif
	-rw-rw----    1 ben      ben        939783 Jun 17 15:29 test.jpg
	-rw-r--r--    1 ben      ben       1727320 Oct  6 15:51 test.mov
	-rw-r--r--    1 ben      ben       1048576 Oct 16 20:59 test.nulls
	-rw-r--r--    1 ben      ben       1048576 Oct 16 21:03 test.ones
	-rw-r--r--    1 ben      ben        490765 Sep  1  2001 test.pbm
	-rw-r--r--    1 ben      ben        197029 Oct 12 13:53 test.ps
	-rw-rw-r--    1 ben      ben       1995119 May 29  2001 test.txt
	-rw-r--r--    1 ben      ben      36354922 Oct 16 20:29 test.wav
	
	# So then, I was like, "Dude, check out some of *this* stuff:"
	
	rar a ../rar.rar *      # Very slow
	zip ../zip.zip *
	tar czf ../tgz.tgz *    # Uses gzip as compressor
	tar cjf ../tbz2.tbz2 *  # Uses bzip2 as compressor, slowest of all
	tar cf - * | compress > ../Z.Z  # Uses compress (LZW)
	
	# And the winnah and champeen is...
	
	-rw-r--r--    1 ben      ben      26653542 Oct 16 21:09 rar.rar
	-rw-r--r--    1 ben      ben      33171830 Oct 16 21:26 tbz2.tbz2
	-rw-r--r--    1 ben      ben      36128937 Oct 16 21:10 zip.zip
	-rw-r--r--    1 ben      ben      36132733 Oct 16 21:14 tgz.tgz
	-rw-r--r--    1 ben      ben      43458125 Oct 16 21:21 Z.Z
	
	I'll be darned. Looks like "rar" is it. Whodathunk? 
''
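
For anyone who wants to repeat this kind of comparison on their own
files using only the free tools, here is a minimal sketch. It is not
the script I ran back then: 'compare_sizes' is a made-up helper name,
and it assumes a POSIX shell with tar and wc, plus whichever of gzip
and bzip2 happen to be installed (missing ones are simply skipped).

```shell
#!/bin/sh
# compare_sizes DIR: print the size in bytes of DIR's tar stream, first
# uncompressed and then piped through each available compressor.
# Hypothetical helper for illustration only.
compare_sizes() {
    dir=$1
    # Uncompressed tar stream, as the baseline.
    printf '%s\t%s\n' raw "$(tar cf - -C "$dir" . | wc -c | tr -d ' ')"
    for tool in gzip bzip2; do
        # Skip any compressor that is not installed.
        command -v "$tool" >/dev/null 2>&1 || continue
        # -9 asks for maximum compression, -c writes to stdout.
        printf '%s\t%s\n' "$tool" \
            "$(tar cf - -C "$dir" . | "$tool" -9c | wc -c | tr -d ' ')"
    done
}
```

Run it as 'compare_sizes /path/to/files'; nothing is written to disk,
so it is safe to point at any directory you can read.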

Unfortunately, the only method that shows an appreciable saving in size
- 'rar', that is - uses a proprietary algorithm.

Given that there's no appreciable gain to be had by changing, and that
a change may occasion problems (e.g., it would break any automated
scripts that download and decompress the monthly archives), I don't see
it changing any time soon. I'm usually pretty reluctant to change things
like this without a really compelling reason.
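
To put a number on "no appreciable gain", the saving can be worked out
directly from the sizes quoted above. This is only an illustrative
sketch: 'pct_saved' is a made-up helper name, and it assumes a POSIX
shell with awk available.

```shell
#!/bin/sh
# pct_saved OLD NEW: percentage by which NEW is smaller than OLD,
# printed to two decimal places. Hypothetical helper for illustration.
pct_saved() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (old - new) * 100 / old }'
}

pct_saved 1045337 1028042     # lg-144.tar.gz vs. lg-144.tar.bz2
pct_saved 36132733 26653542   # 2002 test set: tgz.tgz vs. rar.rar
```

For issue #144, that works out to about 1.65% (17295 bytes) in favor
of bzip2, against roughly 26% for 'rar' over gzip in the older test.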


-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



