Base 256 Compression


Fillo Farber


I'm looking for a way to compress 7-digit floats that range from 0.000001 to 9.999999. I ran across the following Base 256 compression script in the wiki. I thought I could adapt it to my use, but I can't seem to get it to work correctly as shown in the wiki. The return values I get all have a leading 0 (zero) in front of each character. *EDIT* Also, any ideas as to how to convert this to use with a base 10 input would be greatly appreciated. I don't need those hyphens inserted, but I need the decimal point. *END EDIT* Thanks in advance!

 

string lslBase256Chars = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";
string lslHexChars = "0123456789abcdef";
 
string compressKey(key _key) {
    string k = llToLower((string)llParseString2List((string)_key, ["-"], []));
    integer l = llStringLength(k);
    integer j;
    integer v;
 
    string out = "";
    integer i = 0;
    do {
        j = i + 1;
        v = (llSubStringIndex(lslHexChars, llGetSubString(k, i, i)) << 4) | 
            llSubStringIndex(lslHexChars, llGetSubString(k, j, j));
        out = (out = "") + out + llGetSubString(lslBase256Chars, v, v);
    } while ((i += 2) < l);
 
    return (out = k = "") + out;
}
 
key decompressKey(string k) {
    integer l = llStringLength(k);
    string out = "";
    integer v; integer j; integer x;
    integer i = 0;
 
    do {
        v = llSubStringIndex(lslBase256Chars, llGetSubString(k, i, i));
        j = v >> 4; x = v & 0xF;
        out = (out = "") + out + 
            llGetSubString(lslHexChars, j, j) + 
            llGetSubString(lslHexChars, x, x);
    } while ((++i) < l);
 
    return (key)llInsertString(
        llInsertString(
            llInsertString(
                llInsertString((out = "") + out, 8,"-"),
                13,
                "-"
            ),
            18,
            "-"
        ),
        23,
        "-"
    );
}

I think I found the issue with the leading zeros. I changed one line in the decompressKey function from this:

out = (out = "") + out + llGetSubString(lslHexChars, j, j) + llGetSubString(lslHexChars, x, x);

to this:

out = (out = "") + out +  llGetSubString(lslHexChars, x, x);

 


Although I fail to see the point, you can compress a float with a base-256 algorithm.

Here are 2 functions to convert a float to a 4-char string and back:

This should work (assuming there are no typos).

 

string Base256Chars = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";

string uuCompress(float f)
{
    integer k = (integer)f;
    string result = llGetSubString(Base256Chars, k, k);
    f -= (float)k;
    string flt = (string)f;
    integer i = 2;
    for (; i <= 6; i += 2)
    {
        k = (integer)llGetSubString(flt, i, i + 1);
        result += llGetSubString(Base256Chars, k, k);
    }
    return result;
}

float uuDecompress(string zz)
{
    string result;
    integer i = 1;
    for (; i <= 3; ++i)
    {
        integer k = llSubStringIndex(Base256Chars, llGetSubString(zz, i, i));
        result += llGetSubString("0" + (string)k, -2, -1);
    }
    return (float)llSubStringIndex(Base256Chars, llGetSubString(zz, 0, 0))
            + (float)("0." + result);
}

default
{
    touch_start(integer total_number)
    {
        float ff = llFrand(256.0);
        string comp = uuCompress(ff);
        llOwnerSay((string)ff + " --> " + comp + " --> " + (string)uuDecompress(comp));
    }
}

Note that these functions handle only positive numbers from 0 to 255.999999. (Handling negative numbers is possible with a reduction of the range of the supported numbers.)
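The arithmetic of those two functions is easy to test outside LSL. Here is a rough Python sketch of the same scheme (the function names are illustrative, not from the script): one code for the integer part, then the six fractional digits split into three pairs.

```python
# Python sketch of the scheme above (not LSL; names are illustrative):
# one value 0..255 for the integer part, then the six fractional digits
# split into three pairs, each a value 0..99.
def compress(f):
    int_part = int(f)                                  # 0..255
    frac = f"{f - int_part:.6f}"[2:8]                  # six fractional digits
    return [int_part] + [int(frac[i:i + 2]) for i in (0, 2, 4)]

def decompress(codes):
    int_part, p1, p2, p3 = codes
    return int_part + int(f"{p1:02d}{p2:02d}{p3:02d}") / 1_000_000

# Each value indexes one character of Base256Chars, but the three pair
# values never exceed 99, which is why only part of the table gets used.
```

This also makes the limitation concrete: the three fractional codes are capped at 99, so most of the 256-char table goes unused.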

 


Kaluura, I cannot get that script to work properly. I fixed one line to read "f -= (float)k;" as I assume that's what you intended. I put the base 256 char set in a notecard because it can't be read properly after being compiled within the script. With these changes it appears to work with inputs up to 99.000000, then something goes wrong. Except for the integer portion of the input number, it appears that your compression is limited to an output that represents a range of up to 99 per position, using only the first 99 chars of the 256-char set. (There's that number 99 again, hmm.) Just guessing, but might it be possible to represent a float as large as 10.000000 with just 3 base 256 chars by using the full set throughout?


it isn't pretty but this should work for the range 0.0-15.99999

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";

string uFlt2Str256( float vFltNum )
{
    integer vIntTmp;
    return
      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 16), vIntTmp ) +
      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 4096) & 255, vIntTmp ) +
      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 1048576) & 255, vIntTmp );
}

float u256Str2Flt( string vStrFlt )
{
    return
      ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) << 16) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) << 8) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1048576.0;
}
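Void's three multiplications amount to a 4.20 fixed-point encoding: the float times 2^20 is a 24-bit integer that splits into three bytes. A hypothetical Python sketch of that idea (not the LSL code itself):

```python
# Python sketch (not LSL) of the fixed-point idea behind uFlt2Str256:
# f * 2**20 yields a 24-bit integer (4 integer bits + 20 fraction bits)
# that splits into three bytes, one per base-256 character.
def flt_to_bytes(f):                       # assumes 0.0 <= f < 16.0
    n = int(f * 1048576) & 0xFFFFFF
    return [(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]

def bytes_to_flt(b):
    return ((b[0] << 16) | (b[1] << 8) | b[2]) / 1048576.0
```

Values whose fraction is not an exact multiple of 1/2^20 (most decimal fractions) get truncated, which is the loss the thread runs into shortly.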

 



Void Singer wrote:

it isn't pretty but this should work for ranges 0.0-15.99999
string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";

string uFlt2Str256( float vIntFlt )
{
    integer vIntTmp;
    return
      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 16), vIntTmp ) +
      llGetSubString( gStr256, vIntTmp = (integer)(vIntNum * 4096) & 255, vIntTmp ) +
      llGetSubString( gStr256, vIntTmp = (integer)(vIntNum * 1048576) & 255, vIntTmp );
}

float u256Str2Flt( string vStrFlt )
{
    return
      ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) << 24) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) << 16) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1048576.0;
}

 

This script seems to have the potential to do what I'm looking for
:)
  The variable vIntNum I assume should have been vIntFlt? I made that fix but it doesn't seem to work. I'm getting good results on inputs from 0.000001 through 0.000010, but after that it goes wrong. Here are some examples:

Input = : 0.000010

Flt2Str Result = : 00a

u256Str2Flt Result = : 0.000010

 

Input = : 0.000011

Flt2Str Result = : 00b

u256Str2Flt Result = : 0.000010

 

Input = : 0.000012

Flt2Str Result = : 00c

u256Str2Flt Result = : 0.000011

Input = : 1.000000

Flt2Str Result = : g00

u256Str2Flt Result = : 256.000000

The Flt2Str function might be working fine, but u256Str2Flt is returning the same value for 00a and 00b as shown above. I don't have an understanding of the bitwise operators, so I'm not sure what's going on. Any ideas?

 


* Update * Not sure if you read before this edit, but here's an update: I'm still finding errors. The first two examples show that the 2nd function returns the same result for 00w & 00x; then in the next two examples the 1st function seems to put out just two chars at times. Hope this is enough info for you to spot the problem. I really appreciate your assistance... Thank you!

Input = : 0.000031
Flt2Str Result = : 00w
u256Str2Flt Result = : 0.000031


Input = : 0.000032
Flt2Str Result = : 00x
u256Str2Flt Result = : 0.000031

Input = : 9.913982
Flt2Str Result = : ÜÝê
u256Str2Flt Result = : 9.913982


Input = : 9.913983
Flt2Str Result = : ÜÝ
u256Str2Flt Result = : 9.913818


just tested by copy-paste; your original results (the 9.x range) are showing a different string than the copied function...

the lower inputs are showing truncation errors, which llCeil seems to fix at first blush. if we could bit-hack floats directly it'd be a lot easier (instead of going through some madness like Strife's FUI/IUF functions)

try this instead...

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";

string uFlt2Str256( float vFltNum )
{
    integer vIntTmp;
    return
      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 16), vIntTmp ) +
      llGetSubString( gStr256, vIntTmp = (integer)(vFltNum * 4096) & 255, vIntTmp ) +
      llGetSubString( gStr256, vIntTmp = llCeil(vFltNum * 1048576) & 255, vIntTmp );
}

float u256Str2Flt( string vStrFlt )
{
    return
      ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) << 16) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) << 8) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1048576.0;
}

 

 ETA:
that doesn't actually match the full precision (7.3) of a float in that range, but the tail was always iffy... if it fails or needs perfect precision then it'll need to be 4 characters, and multiply/shift the ranges by another 256/8 bits so that the tail gets properly converted.


I said "assuming there are no typos". :P

Anyway, I fixed the function and made a real script around my functions. It works, even for numbers greater than 99.

 

[05:42]  Object: 160.565100 --> ã?:9 --> 160.565100
[05:43]  Object: 150.431700 --> Ù(h™ --> 150.431700
[05:43]  Object: 120.982200 --> »¥l~ --> 120.982200
[05:43]  Object: 149.425500 --> Ø'=S --> 149.425500
[05:43]  Object: 115.898900 --> ¶``& --> 115.898900
[05:43]  Object: 238.102100 --> ıal' --> 238.102100
[05:43]  Object: 117.986400 --> ¸¥F: --> 117.986400

(Those numbers are weird. I do not know why they all end up with "00". llFrand() should not do that.)

It is however true that if you do not use numbers greater than 99.999999, my functions will use only the first 100 chars of 'Base256Chars'. We can do better with 3 chars only, but the range of compressible numbers falls to 0 to 7.999999.

Here is the script (guaranteed without typos and working):

string Base256Chars = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";

string uuCompress(float f)
{
    integer k = (integer)f;
    f -= (float)k;
    k = (k & 0x7) << 21; // from 0 to 7
    string flt = (string)f;
    integer b = 14;
    integer i = 2;
    for (; i <= 6; i += 2)
    {
        k = k | ((integer)llGetSubString(flt, i, i + 1) << b);
        b -= 7;
    }
    string result;
    for (b = 16; b >= 0; b -= 8)
    {
        i = (k & (0xFF << b)) >> b;
        result += llGetSubString(Base256Chars, i, i);
    }
    return result;
}

float uuDecompress(string zz)
{
    integer k = 0;
    integer b = 16;
    integer i = 0;
    for (; i <= 2; ++i)
    {
        k = k | (llSubStringIndex(Base256Chars, llGetSubString(zz, i, i)) << b);
        b -= 8;
    }
    float flt = (float)((k & 0xE00000) >> 21);
    string result = "0.";
    for (b = 14; b >= 0; b -= 7)
    {
        result += llGetSubString("0" + (string)((k & (0x7F << b)) >> b), -2, -1);
    }
    return flt + (float)result;
}

default
{
    touch_start(integer total_number)
    {
        float ff = llFrand(8.0);
        string comp = uuCompress(ff);
        llOwnerSay((string)ff + " --> " + comp + " --> " + (string)uuDecompress(comp));
    }
}

 Some testing in-world:

[07:14]  Object: 5.265731 --> éßâ --> 5.265731
[07:14]  Object: 4.209079 --> È*V --> 4.209079
[07:15]  Object: 7.435775 --> ĭğĎ --> 7.435775
[07:15]  Object: 0.858818 --> l¯i --> 0.858818
[07:15]  Object: 3.041920 --> ¤9× --> 3.041920
[07:15]  Object: 6.091103 --> ąLÆ --> 6.091103

 It works! :D

Some binary babble (if you understand that kind of language): To encode a number from 0 to 99 in binary, we need only 7 bits. So we split the "tail" behind the decimal point into 3 parts and shift them to fill the 21 lowest bits of an integer (32 bits). The integer part of the number is shifted into bits 21-23. So we have an integer with 24 significant bits that we split/shift into 3 numbers of 8 bits (from 0 to 255), which give us indices to concatenate 3 chars from 'Base256Chars'.

Decompression simply works the other way around. From the indices of the 3 chars, we rebuild the integer with 24 significant bits, which we can then split/shift to rewrite the float.
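The bit layout described above can be sketched in Python (hypothetical helper names; the real script builds strings from Base256Chars instead of returning raw byte values):

```python
# Python sketch (not LSL) of the packing described above: 3 bits for the
# integer part (0..7) in bits 21-23, then three 7-bit fields holding the
# decimal digit pairs (each 0..99), for 24 significant bits in total.
def pack(f):
    digits = f"{f - int(f):.6f}"[2:8]              # six fractional digits
    k = (int(f) & 0x7) << 21
    for shift, i in zip((14, 7, 0), (0, 2, 4)):
        k |= int(digits[i:i + 2]) << shift
    return [(k >> 16) & 0xFF, (k >> 8) & 0xFF, k & 0xFF]

def unpack(codes):
    k = (codes[0] << 16) | (codes[1] << 8) | codes[2]
    pairs = [(k >> s) & 0x7F for s in (14, 7, 0)]
    return ((k >> 21) & 0x7) + int("".join(f"{p:02d}" for p in pairs)) / 1_000_000
```

Because the digit pairs go through text rather than binary fixed point, every printable 7-digit value in 0 to 7.999999 round-trips exactly.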

 


Well, I'm glad I found a challenge for you folks! I've come up with my own method which "appears" to work with floats as high as 10 using 3 base-256 chars with no loss. I was getting losses in the least significant digit, so I manipulated that one as a string before converting to an integer. The weird thing is that when I added a comparison of the input vs. output variables, it reports that they are different although they seem to be identical, i.e. it is telling me that 0.000010 does not equal 0.000010. Here's the script that includes the test routine. Excuse my kindergarten methodology, but I find it easier to deal with small lines of code at a time. I tend to get lost in the parentheses. (I should add that I plagiarized part of the code from Void's work.)

 

string gStr256 = "0123456789abcdefghijklmnopqrstuvwxyz!\"#$%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`{|}~¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅ";

string uFlt2Str256( float vFltNum )
{
    string vStr = (string)vFltNum;
    integer index = llSubStringIndex(vStr, ".");
    vStr = (string)llDeleteSubString(vStr, index, index);
    integer vIntTmp = (integer)vStr;
    integer vIntTmp1 = llFloor(vIntTmp / 65536);
    integer vIntTmp2 = llFloor((vIntTmp - (vIntTmp1 * 65536)) / 256);
    integer vIntTmp3 = vIntTmp - ((vIntTmp1 * 65536) + (vIntTmp2 * 256));
    return
      llGetSubString(gStr256, vIntTmp1, vIntTmp1) +
      llGetSubString(gStr256, vIntTmp2, vIntTmp2) +
      llGetSubString(gStr256, vIntTmp3, vIntTmp3);
}

float u256Str2Flt( string vStrFlt )
{
    return
      ((llSubStringIndex( gStr256, llGetSubString( vStrFlt, 0, 0 )) * 65536) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 1, 1 )) * 256) +
       (llSubStringIndex( gStr256, llGetSubString( vStrFlt, 2, 2 )))) / 1000000.0;
}

default
{
    state_entry()
    {
        float i;
        for (i = 0; i < 10000; i = i + 0.000001)
        {
            string testIn = uFlt2Str256(i);
            float testOut = u256Str2Flt(testIn);
            if (testOut != i)
            {
                llOwnerSay((string)testOut + " " + (string)i);
            }
        }
    }
}
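The core trick here, restated in Python for clarity (hypothetical names), is that dropping the decimal point turns any 0.000000 to 9.999999 value into an integer 0..9999999, which fits in three base-256 digits because 256^3 = 16,777,216:

```python
# Python sketch (not LSL) of the same idea: drop the decimal point to get
# an integer 0..9999999, then write it in base 256 (256**3 = 16777216).
def to_digits(f):
    n = int(f"{f:.6f}".replace(".", ""))           # e.g. 0.000010 -> 10
    return [n // 65536, (n // 256) % 256, n % 256]

def from_digits(d):
    return (d[0] * 65536 + d[1] * 256 + d[2]) / 1_000_000
```

Going through the printed string before converting sidesteps the float truncation that plagued the multiply-and-floor versions, at the cost of extra string handling.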

 



Fillo Farber wrote:

The weird thing is that when I added a comparison of the input vs output variables it reports that they are different although they seem to be identical. i.e. it is telling me that 0.000010 does not equal 0.000010

this is because of how floats are stored... all those leading zeros are ignored, so you can get fractional parts when using straight multiplication to raise the value. which is what you saw in the original 31/32 comparison using mine, and what you are seeing in the comparison values in your recent test... the test display truncates the trailing digits past 7 places, so you don't see that they don't match.
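The 00w/00x collision from the earlier test can be reproduced with plain Python arithmetic: the two decoded values genuinely differ, but both print as 0.000031 at six decimal places, so the displayed round trip looks lossy in one direction and identical in the other.

```python
# Python check (not LSL): indices 32 ('w') and 33 ('x') decode to
# different numbers, but both display as 0.000031 at six decimals.
w = 32 / 1048576.0       # about 0.0000305
x = 33 / 1048576.0       # about 0.0000315
print(f"{w:.6f}", f"{x:.6f}")    # both print 0.000031
```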

a pure float-union-integer and then a conversion to base 256 would perfectly encode a float of any size and perfectly decode it in 4 characters, but the code is a bit bulky in lsl... in most other languages it's built in because they directly convert the bits from one type to the other.

please note that for storage purposes, any string representation in LSL of a float or integer is actually going to take more memory than the actual numerical value... in fact, even for most transport purposes.


You said they have leading zeros and trailing digits. Are there both? Can we strip those to format the reconstituted float to match the original input value? I understand that this does not save memory. The purpose of this function is to store up to 85 floats in an object description, which is pretty cool :)


the native format of floats strips leading zeros until it gets to the first set bit, which it then peels off and assumes is there, and the number of binary places is encoded as a small bit field. at 4 characters it's possible to duplicate.
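What Void describes is the IEEE 754 layout; a quick Python look at the raw bits (32-bit floats, as LSL uses) shows the sign/exponent/mantissa split with the leading 1 bit omitted:

```python
# Python illustration (not LSL) of the float layout described above:
# sign bit, 8-bit exponent, and 23 mantissa bits with the leading 1
# "peeled off" and assumed.
import struct

def float_bits(f):
    (n,) = struct.unpack(">I", struct.pack(">f", f))
    return n >> 31, (n >> 23) & 0xFF, n & 0x7FFFFF

print(float_bits(1.0))    # (0, 127, 0): +1.0 * 2**(127 - 127)
print(float_bits(-2.0))   # (1, 128, 0): -1.0 * 2**(128 - 127)
```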

but there may be a small problem with your plans... the description field is limited by bytes, not characters... and not all the characters in that series are 1 byte (optimally they could be, but that would mean using the ASCII set, and not all of those will print to the description)
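The byte-count issue is easy to verify in Python, assuming the field stores UTF-8: the ASCII part of the table is 1 byte per character, but everything from ¡ onward takes 2.

```python
# Python check (assuming UTF-8 storage): ASCII characters encode to
# 1 byte, but the accented part of the base-256 table is 2 bytes each.
assert len("0".encode("utf-8")) == 1    # first char of the set
assert len("¡".encode("utf-8")) == 2    # first non-ASCII char of the set
assert len("Ņ".encode("utf-8")) == 2    # last char of the set
```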

Kaluura's code is more in that direction, and should have the room to encode the shifts for sub-normal numbers (below 1) for perfect conversion of printed values (although you may still end up with 0.0 != 0.0). alternatively, if you throw in a cast to and from string, your values will always match up


"and not all the characters in that series are 1 byte" ewww, I didn't realize that... I was just working on casting to string as you mentioned, which did work. I found another issue though: when we reach the number 98 we seem to hit a problem with that char in the char set. It doesn't pick it up for some reason.

Yes, Kaluura's is workable, but since that script is using the full 256-char set as well, I'll have to reserve at least 2 bytes of memory for each char. (Also the same issue as with the 99th char.)

Do you know of any issues using llIntegerToBase64?


Doing some tests with the base 64 function. An output range of base 64 chars from AAAAAA== to AA///w== represents integers 0 to 1048575. Within that range I can strip the leading AA and trailing == to reduce them to 4-char representations of the inputs. Above 1048575 we need 5 chars, and so on. I can work within the range of 4 chars for my purpose. Unless there are some other issues, I should be able to store 254/4 floats in an object name and description combined, correct? Too bad about the base 256 issues because that would have been very compact.
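The stripping trick can be sketched in Python: llIntegerToBase64 encodes the 32-bit integer big-endian, so for values up to 1048575 the top 12 bits are zero and the first two base64 chars are always "AA" (helper names here are mine, not LSL):

```python
# Python sketch of the trick described above: encode a 32-bit integer as
# base64, then strip the constant "AA" prefix (12 zero bits) and "=="
# padding while the value stays in 0..1048575.
import base64
import struct

def int_to_4chars(n):                 # valid only for 0 <= n <= 0xFFFFF
    s = base64.b64encode(struct.pack(">I", n)).decode()   # 8 chars
    return s[2:6]

def chars_to_int(c4):
    return struct.unpack(">I", base64.b64decode("AA" + c4 + "=="))[0]
```

With the fixed "AA"/"==" restored on decode, four chars per value gives the packing density discussed above, and every character is plain 1-byte ASCII.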

I really appreciate the help from both of you thus far. I'm a bitwise'er about this topic now.


  • 2 weeks later...

I tried using these functions for compressing some floating point data, but unfortunately the functions aren't able to handle negative numbers.  Trying to compress the float -0.707107, for example, results in 16.662240.  Can these functions be adapted to compress negative floats as well as positives?

 


at that level of resolution, you'd only get one or the other... however, Strife has two functions, FUI and IUF, which convert floats to and from integers; the integer can easily be converted to hex at 8 characters or base64 at 6 characters and still hold an exact 32-bit float. compressing further requires assumptions and limiting the range
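The FUI/IUF approach mentioned here is what other languages call type punning; a Python equivalent with struct shows why 8 hex characters always suffice for an exact 32-bit float, negatives included (function names are mine):

```python
# Python sketch of the float-union-integer idea: reinterpret a 32-bit
# float's bits as an integer, print 8 hex chars, and reverse it exactly.
import struct

def float_to_hex8(f):
    (n,) = struct.unpack(">I", struct.pack(">f", f))
    return f"{n:08X}"

def hex8_to_float(h):
    (f,) = struct.unpack(">f", struct.pack(">I", int(h, 16)))
    return f
```

Unlike the decimal schemes above, this round-trips every 32-bit float bit for bit, including -0.707107; the only loss is whatever the initial cast to 32-bit precision discards.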

