                      BYTER AND DBYTE

            (BYTING OFF MORE THAN YOU CAN CHEW)

                        RNH 5/5/78


     OVERVIEW
     THE CODES
     TEST RESULTS
     USING BYTER
     USING DBYTE
     RESULTS


IN THE COURSE OF DISCUSSING SEEDIS DATA TRANSFER, HARVARD HOLMES
SUGGESTED A METHOD OF DATA COMPRESSION OTHER THAN NYBBLE (Q.V.).
IT IS BASED ON BYTES RATHER THAN NYBBLES AND MAY BE MORE
COMPATIBLE WITH THE VAX MACHINE.  THE FOLLOWING CONCEPTS WERE
DISCUSSED.

     1. RATHER THAN CONVERT INTEGERS INTO A DECIMAL FORM, USE THE
        BINARY FORM AND ONLY TRANSFER MULTIPLES OF 8 BITS.

     2. ENOUGH BYTES TO INCLUDE A SIGN BIT MUST BE USED SO THAT
        NEGATIVE NUMBERS MAY EASILY BE RECOGNIZED.

     3. AN OP-CODE AND NUMBER OF BYTES WILL BE PACKED INTO ONE
        BYTE.

     4. SINCE THE CENSUS DATA INCLUDES VALUES NO LARGER THAN 16
        DIGITS, THE MAXIMUM NUMBER OF BYTES NEEDED IS 7 FOR THE
        VALUE AND ONE FOR THE OP-CODE/COUNT.

     5. DISPLAY CODED VALUES WOULD BE CONVERTED TO 8-BIT ASCII.

     6. GETTING LOST IN A DATA STREAM MAY BE A PROBLEM.  THIS MAY
        BE SOLVED BY THE USE OF A TABLE AT THE BEGINNING OF A
        RECORD OR BY USING THE NEXT PROBABLE OP-CODE/COUNT.

WITH THIS IN MIND, THE FOLLOWING OP-CODES ARE SUGGESTED.

     1. 00B  -- START OF RECORD, TOTAL BYTE COUNT
     2. 10B  -- VALUES BYTE COUNT
     3. 11B  -- REPEAT COUNT BYTES
     4. 12B  -- SHIFT TO ASCII MODE
     5. 13B  -- SKIP BYTE COUNT
     6. 14B  -- EXPONENT PART OF FLOATING POINT NUMBER
     7. 15B  -- MANTISSA OF FLOATING POINT NUMBER
     8. 377B -- (IN ASCII MODE) END OF RECORD


1. START OF RECORD (00B/05B)
   ----- -- ------

THE BEGINNING OF EACH RECORD WILL HAVE THIS CODE WITH THE NUMBER
OF TOTAL BYTES IN THE RECORD.

     00B/05B NN NN NN NN NN

THE OP-CODE WILL ALWAYS HAVE A COUNT OF 5, WITH THE FIVE BYTES
BEING THE NUMBER OF BYTES IN THE RECORD EXCLUDING THESE SIX.

2. VALUE CODE (10B/NB)
   ----- ----

A SINGLE VALUE WILL HAVE AN OP-CODE OF 10B FOLLOWED BY A FOUR-BIT
BYTE COUNT FOR THE VALUE.
IF THE BYTE COUNT IS ZERO, THEN THE VALUE IS ZERO AND NO VALUE
FOLLOWS.  THE NUMBER OF BYTES USED MUST BE LARGE ENOUGH TO
INCLUDE A SIGN BIT (LEFTMOST BIT).  THE VALUE ITSELF WOULD BE
RIGHT JUSTIFIED.  FOR EXAMPLE, CONSIDER THE DECIMAL NUMBER 255
(377 OCTAL).  THIS OCCUPIES ONLY ONE BYTE.  CONVERTING THIS WOULD
LOOK LIKE

     10B/01B 377   OR   201 377

BUT SINCE THE SIGN BIT IS OCCUPIED, IT WOULD BE TRANSLATED BACK
TO -127 (177B PLUS SIGN BIT).  SO WE MUST USE TWO BYTES LIKE THIS

     202 000 377

THIS WOULD GET TRANSLATED BACK PROPERLY SINCE THE SIGN BIT IS
ZERO.

3. REPEAT CODE (11B)
   ------ ----

USING 11B AS A REPEAT CODE FOLLOWED BY 4 BITS OF BYTE COUNT WOULD
COMPRESS THE DATA EVEN FURTHER (AS IN NYBBLE).  THE LARGEST ARRAY
TO BE HANDLED IN THE CENSUS DATA IS 1344 WORDS IN THE SECOND
COUNT, SO THE COUNT NEEDS NO MORE THAN 2 BYTES.  FOR EXAMPLE, IF
THE WHOLE SECOND COUNT ARRAY IS ZERO, THE BYTE PATTERN WOULD BE

     11B/02B 005 100 10B/00   OR   222 005 100 200

THE WHOLE ARRAY IS REDUCED TO 4 BYTES.  THE FIRST BYTE INCLUDES
THE OP-CODE FOR REPEAT (11B) AND THE BYTE COUNT OF THE NUMBER OF
REPEATED VALUES (02B).  THE NEXT TWO BYTES ARE THE NUMBER OF
REPEATS (005 100), 2500B OR 1344 DECIMAL.  THE LAST BYTE IS THE
OP-CODE FOR ZERO VALUE (10B/00B).

4. SHIFT TO ASCII (12B)
   ----- -- -----

THIS WOULD BE REPRESENTED BY 12B/NBC, WHERE NBC IS THE BYTE COUNT
FOR THE NUMBER OF ASCII CHARACTERS TO FOLLOW.  THE VALUE 377B CAN
BE USED AS AN END-OF-RECORD INDICATOR AND AS FILLER TO FILL OUT A
60-BIT WORD WHEN WRITING THE RECORD OUT.

5. SKIP CODE (13B/10B)
   ---- ----

THIS IS USED AS A KLUGE TO GET THE TOTAL NUMBER OF WORDS WHEN
READING THE TAPE ON THE 6600.  IT WILL DIRECTLY FOLLOW THE
BEGINNING-OF-RECORD CODE (00B) AND OCCUPY NINE BYTES.  BOTH CODES
ARE DESIGNED TO FIT IN TWO 6600 WORDS (15 BYTES) SO THAT WHEN
READING ON A BYTE MACHINE (VAX, E.G.) THE FIRST SIX BYTES WOULD
GIVE THE TOTAL BYTE COUNT IN THE RECORD AND THE SKIP CODE WOULD
SKIP THE NEXT 8 BYTES.
WHEN READING ON THE 6600, THE FIRST WORD IS SKIPPED AND THE NEXT
WORD WOULD BE THE TOTAL WORD COUNT.  SO THE FIRST TWO WORDS ON
THE RECORD WOULD LOOK LIKE

     00/05 NN NN NN NN NN 13/10 00 00 00 00 00 WW WW WW
     ^                          ^
     I                          I
     ---WORD 1                  ---WORD 2

SO ON THE 6600, THE SECOND WORD IS A VALID INTEGER.

6. AND 7. FLOATING POINT NUMBERS (14B AND 15B)
          -------- ----- -------

IN CONVERTING FLOATING POINT NUMBERS, THE EXPONENT IS UNBIASED
AND THE MANTISSA IS SHIFTED TO REMOVE TRAILING ZEROES.  THE
RESULTING TWO NUMBERS ARE THEN TREATED LIKE INTEGERS IN GETTING
THE BYTE COUNT AND PACKING IT IN THE LOWER FOUR BITS AFTER THE
CODE.  IF THE EXPONENT IS ZERO THEN NO ZERO BYTE IS PUT IN (AS IN
INTEGERS).  FOR EXAMPLE, THE NUMBER 1.0 WOULD LOOK LIKE

     14B/00B 15B/01B 001B

SO THAT EXP=0 AND MANT=1, AND IN CONVERTING BACK TO FLOATING, WE
WOULD HAVE

     F = MAN*(2**EXP) = 1.0

ZERO FLOATING POINT NUMBERS ARE TREATED AS INTEGERS, I.E.,

     10B/00

NEGATIVE NUMBERS HAVE A NEGATIVE MANTISSA IN THIS SCHEME.

8. END OF RECORD (377B)
   --- -- ------

AT THE END OF A RECORD, A SHIFT-TO-ASCII-MODE IS PUT IN WITH
ENOUGH BYTE COUNT TO FILL OUT A COMPLETE WORD WITH 377B.


TEST RESULTS
---- -------

TEST RUNS WERE MADE USING THE ABOVE SCHEME ON FOURTH AND SIXTH
COUNT CENSUS DATA.  THE FOLLOWING TABLE SUMMARIZES THE RESULTS.

                          CP       MR      CU     PERCENT
                         TIME                     COMPRESS.
                        ------    ----    ----    ------
     4TH COUNT TRACTS    19.0      348      30      18
      (20 RECORDS)      (26.5)*   (348)    (37)    (22)

     4TH COUNT STATE     27.2      379      39      38
      (20 RECORDS)      (43.7)    (404)    (56)    (43)

     6TH COUNT PLACE     17.4      267      28      18
      (91 RECORDS)      (24.6)    (280)    (57)    (22)

     6TH COUNT STATE     25.1      293      38      35
      (91 RECORDS)      (45.7)    (301)    (62)    (41)

     *THE VALUES IN PARENTHESES ARE NYBBLE VALUES FOR THE SAME RUN


USING THE ROUTINES
----- --- --------

THE COMPRESSION ROUTINES ARE STORED IN PSS LIBRARY COMPRESS,
SUBSET BYTERPL.  THEY ARE IN FORTRAN AND SHOULD BE COMPILED WITH
RUN76.  FIVE ROUTINES MAY BE CALLED.
FIRST, AT THE START OF A RECORD, ALWAYS

     CALL STARTR ( OUT , N )

TO INITIALIZE POINTERS AND COUNTERS.  THE ARRAY OUT IS TO BE
DIMENSIONED BY THE USER AS LARGE AS NEEDED TO CONTAIN THE
COMPRESSED DATA.  THE VALUE N WILL CONTAIN THE TOTAL WORD COUNT
AFTER THE RECORD HAS BEEN COMPRESSED.  THEN ONE OF THREE ROUTINES
MAY BE CALLED DEPENDING ON THE TYPE OF VALUES THAT NEED
COMPRESSION.  FOR INTEGERS

     CALL COMPRSS ( IN , NW , OUT , N )

WHERE IN IS THE ARRAY OF INTEGERS TO BE CONVERTED AND NW IS THE
NUMBER OF VALUES TO BE CONVERTED.  FOR DISPLAY CODE

     CALL STUFF ( IN , NC , OUT , N )

WHERE IN IS A DISPLAY CODED ARRAY WITH NC CONSECUTIVE CHARACTERS.
FOR FLOATING POINT NUMBERS

     CALL FLOATS ( IN , NW , OUT , N )

WHERE IN IS AN ARRAY OF NW NUMBERS.  TO END THE RECORD PROPERLY,
PUTTING IN END-OF-RECORD MARKS AND WORD AND BYTE COUNTS,
--ALWAYS--

     CALL ENDR ( OUT , N )

AT THIS POINT THE RECORD IS READY TO BE WRITTEN OUT.  AS AN
EXAMPLE --

     READ(1) (IN(I),I=1,NW)
     CALL STARTR ( OUT , N )
     CALL COMPRSS ( IN , NW , OUT , N )
     CALL ENDR ( OUT , N )
     WRITE(2) (OUT(I),I=1,N)

WILL CONVERT THE ARRAY OF INTEGERS TO BYTE MODE.  ALL TYPES OF
VALUES MAY BE MIXED IN THE SAME OUT ARRAY, SO COMPRSS, STUFFA
AND/OR FLOATS MAY BE CALLED IN ANY ORDER AND AS MANY TIMES AS
NEEDED.


DECOMPRESSION ROUTINES
------------- --------

A SET OF SUBROUTINES TO DECOMPRESS A RECORD PREVIOUSLY COMPRESSED
BY BYTER IS ON PSS LIBRARY COMPRESS, SUBSET DBYTEPL.  THEY ARE IN
FORTRAN AND SHOULD BE COMPILED USING RUN76.  THERE IS ONLY ONE
ROUTINE TO CALL.  FOR EXAMPLE,

     READ(1) DUM , NW , (IN(I),I=1,NW)

WILL READ IN A BYTE MODE RECORD.  THEN

     CALL DCMPRSS ( IN , NW , OUT , N )

WHERE OUT WILL CONTAIN N EXPANDED WORDS.  IT IS UP TO THE USER TO
KNOW THE TYPES OF VALUES RETURNED.


VERIFY RESULTS
------ -------

A TEST RECORD WAS RUN THROUGH BYTER AND THEN DBYTE.  IT CONTAINED
A MIXTURE OF INTEGERS, DISPLAY AND FLOATING VALUES.
THE INPUT RECORD WAS COMPARED TO THE OUTPUT RECORD USING
VERIFY( ... ).  THE ONLY FAILURE WAS THAT AN INPUT OF MINUS ZERO
(-0) WAS CONVERTED BACK TO PLUS ZERO.  THIS MAY CAUSE A PROBLEM
ON LOGICAL VALUES BUT CAN BE CORRECTED IF NECESSARY.
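
THE ROUND TRIP CAN BE ILLUSTRATED FOR INTEGERS WITH A SHORT
SKETCH.  IT IS WRITTEN IN PYTHON FOR BREVITY AND IS NOT THE
FORTRAN OF BYTER OR DBYTE.  ALL FUNCTION NAMES ARE HYPOTHETICAL.
THREE THINGS ARE ASSUMED RATHER THAN TAKEN FROM THE WRITEUP:
NEGATIVE VALUES ARE STORED AS SIGN PLUS MAGNITUDE (INFERRED FROM
THE -127 EXAMPLE UNDER THE VALUE CODE), THE REPEAT COUNT IS
PACKED THE SAME WAY AS A VALUE, AND A REPEAT CODE IS EMITTED FOR
ANY RUN LONGER THAN ONE VALUE.  THE START-OF-RECORD, SKIP, ASCII
AND FLOATING POINT CODES ARE LEFT OUT.

```python
OP_VALUE = 0o10   # 10B: value code
OP_REPEAT = 0o11  # 11B: repeat code


def pack_int(op, value):
    """Op-code/count byte, then the fewest bytes that hold value plus a sign bit."""
    mag = abs(value)
    # One extra bit keeps the leftmost bit free for the sign; zero needs no bytes.
    n = 0 if mag == 0 else (mag.bit_length() + 8) // 8
    body = bytearray(mag.to_bytes(n, "big"))
    if value < 0:
        body[0] |= 0x80  # ASSUMPTION: sign-magnitude representation
    return bytes([(op << 4) | n]) + bytes(body)


def unpack_int(data, pos, n):
    """Decode n sign-magnitude bytes starting at data[pos]."""
    if n == 0:
        return 0
    chunk = data[pos:pos + n]
    mag = int.from_bytes(bytes([chunk[0] & 0x7F]) + chunk[1:], "big")
    return -mag if chunk[0] & 0x80 else mag


def compress(values):
    """Encode a list of integers with value codes and repeat codes."""
    out = bytearray()
    i = 0
    while i < len(values):
        run = 1
        while i + run < len(values) and values[i + run] == values[i]:
            run += 1
        if run > 1:  # ASSUMPTION: any run longer than one gets a repeat code
            out += pack_int(OP_REPEAT, run)
        out += pack_int(OP_VALUE, values[i])
        i += run
    return bytes(out)


def decompress(data):
    """Expand the byte stream produced by compress() back into integers."""
    values, pos, repeat = [], 0, 1
    while pos < len(data):
        op, n = data[pos] >> 4, data[pos] & 0x0F
        number = unpack_int(data, pos + 1, n)
        pos += 1 + n
        if op == OP_REPEAT:
            repeat = number          # applies to the next value code
        elif op == OP_VALUE:
            values.extend([number] * repeat)
            repeat = 1
        else:
            raise ValueError(f"op-code {op:o}B is not handled in this sketch")
    return values


def octal(data):
    """Show bytes as three-digit octal, the notation used in this writeup."""
    return " ".join(f"{b:03o}" for b in data)


print(octal(compress([255])))       # 202 000 377
print(octal(compress([0] * 1344)))  # 222 005 100 200
```

THE TWO PRINTED LINES REPRODUCE THE BYTE PATTERNS GIVEN FOR THE
VALUE CODE AND THE REPEAT CODE.  PYTHON INTEGERS HAVE NO MINUS
ZERO, SO THE -0 FAILURE NOTED ABOVE CANNOT BE SHOWN WITH THIS
SKETCH.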