Can someone explain Reed Solomon ECC to me?

AMD_Man

Splendid
Jul 3, 2001
Since it's summer and I'm bored, I've decided to develop a universally transferable file format. Basically, this format works similarly to compressed ZIPs, where you can have multiple files/folders stored in a single file. What makes my format unique is that it can be stored not only on a hard drive, floppy, or CD like conventional formats, but also physically on paper. It converts the 0s and 1s into symbols that can be printed by a printer. The page can later be scanned in again and converted back into a machine-readable format. This format will combine compression, encryption and ECC. That way it can meet all the demands people have made of a file format: it would produce a small file, but be secure and highly reliable, so that even under bad corruption the data can be salvaged. I've played around with compression and encryption before, but ECC is relatively new to me. Anyone know how Reed-Solomon ECC works?

I know this isn't exactly the best place to ask, but I know there are many knowledgeable people in here.
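For readers with the same question, here is a rough sketch of the core idea behind Reed-Solomon codes: treat the k data values as the coefficients of a polynomial, send its value at n > k points, and rebuild the polynomial from any k points that survive. This is only an illustration under simplifying assumptions. It works in the prime field GF(257) instead of the GF(2^8) used by practical implementations, it only recovers erasures (values known to be missing) and does not locate errors at unknown positions, and the names `rs_encode` and `rs_recover` are made up for this example.

```python
P = 257  # small prime field; big enough to hold any byte value 0..255


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x, mod P."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % P
    return result


def rs_encode(message, n):
    """Use the k message values as coefficients; emit the polynomial's
    value at the n points x = 0..n-1 (n - k of them are redundancy)."""
    return [poly_eval(message, x) for x in range(n)]


def rs_recover(points, k):
    """Rebuild the k coefficients from any k surviving (x, y) pairs
    by Lagrange interpolation mod P."""
    if len(points) < k:
        raise ValueError("need at least k surviving points")
    points = points[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis = [1]   # running product of (x - xj) for j != i
        denom = 1     # running product of (xi - xj) for j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):
                new[d] = (new[d] - b * xj) % P
                new[d + 1] = (new[d + 1] + b) % P
            basis = new
            denom = (denom * (xi - xj)) % P
        # Divide by denom using Fermat's little theorem (P is prime).
        scale = yi * pow(denom, P - 2, P) % P
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + b * scale) % P
    return coeffs


message = list(b"ECC!")                 # k = 4 data values
codeword = rs_encode(message, 6)        # n = 6, so 2 values may be lost
survivors = [(x, y) for x, y in enumerate(codeword) if x not in (1, 4)]
print(bytes(rs_recover(survivors, 4)))
```

Note that a codeword value can be 256 in this toy field, which does not fit in a byte. That is one reason real implementations use GF(2^8) arithmetic instead.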

Intelligence is not merely the wealth of knowledge but the sum of perception, wisdom, and knowledge.
 
I'm no expert on the matter, but if, say, you had a zip file with tight compression that filled one floppy disk, then surely the amount of paper needed to print out the code would run into hundreds of pages, especially if ECC was used? Not to mention the time needed to scan the pages into a machine and the need for a second application to convert the image into characters again.

I think something similar was done decades ago, before disks became common - data used to be stored on punch cards.

Nice idea though, but I think it would only be suitable for very small files like an address book. It will still be a good programming skills exercise though. Good luck.

Vorsprung durch Dontwerk.....as they say at VIA
 
I'm no expert on the matter, but if, say, you had a zip file with tight compression that filled one floppy disk, then surely the amount of paper needed to print out the code would run into hundreds of pages, especially if ECC was used? Not to mention the time needed to scan the pages into a machine and the need for a second application to convert the image into characters again.
Well, normally, you'd be right, but with today's high resolution printers and scanners, I may be able to get ~50KB-100KB per page (~75% of which will be real data, 25% will be parity). Yes, it's good only for small files, but it's still fun to try to store the same digital information on both conventional media and paper. Besides, I have nothing else to do this summer, lol.
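As a quick sanity check on those numbers, the page count for a full floppy can be worked out directly. This is just arithmetic on the figures quoted in the post, assuming 1 KB = 1024 bytes and that the 50-100 KB per page figure includes the 25% parity; the function name is made up for the example.

```python
import math


def pages_needed(file_bytes, page_capacity_bytes, data_fraction=0.75):
    """Pages required when only `data_fraction` of each page is real data."""
    usable_per_page = page_capacity_bytes * data_fraction
    return math.ceil(file_bytes / usable_per_page)


FLOPPY_BYTES = 1440 * 1024  # a "1.44 MB" floppy holds 1,474,560 bytes

for capacity_kb in (50, 100):
    pages = pages_needed(FLOPPY_BYTES, capacity_kb * 1024)
    print(f"{capacity_kb} KB/page -> {pages} pages")
# 50 KB/page -> 39 pages
# 100 KB/page -> 20 pages
```

So under these assumptions a full floppy would take a few dozen pages, not hundreds.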

 
Thanks! I found this little bit (pun intended, :tongue: ) interesting:

"Current implementations of Reed-Solomon codes in CD technology are able to cope with error bursts as long as 4000 consecutive bits."
No wonder scratched up CDs can still work! Hmm, I wonder how much room ECC takes up on a CD. You could probably fit well over a gigabyte on a CD without any ECC.

 
http://meltingpot.fortunecity.com/croatia/891/research.html

I like this one (below), and the page after it, because they didn't forget to cover how the error correction is spread around.

www.cdrinfo.com/Sections/Articles/Specific.asp?ArticleHeadline=Writing+Quality&index=5
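To illustrate what "spreading the error correction around" buys you, here is a minimal block interleaver. It is a sketch, not the cross-interleaved scheme CDs actually use, and the function names are made up. The idea is that a burst of damage in the stored stream lands on a different codeword each time, so each codeword only has to fix a small error.

```python
def interleave(data, depth):
    """Lay `data` out as `depth` rows (one codeword per row), then
    read it out column by column."""
    if len(data) % depth:
        raise ValueError("data length must be a multiple of depth")
    width = len(data) // depth
    return bytes(data[r * width + c] for c in range(width) for r in range(depth))


def deinterleave(data, depth):
    """Undo interleave(): put every byte back into its original row."""
    if len(data) % depth:
        raise ValueError("data length must be a multiple of depth")
    width = len(data) // depth
    return bytes(data[c * depth + r] for r in range(depth) for c in range(width))


# Four 8-byte "codewords" stored back to back.
original = bytes(range(32))
stored = bytearray(interleave(original, depth=4))

# A scratch wipes out 4 consecutive bytes of the stored stream.
stored[8:12] = b"\xff" * 4

restored = deinterleave(bytes(stored), depth=4)
errors_per_codeword = [
    sum(a != b for a, b in zip(original[i:i + 8], restored[i:i + 8]))
    for i in range(0, 32, 8)
]
print(errors_per_codeword)  # [1, 1, 1, 1]
```

Without interleaving, the same scratch would put all four bad bytes into a single codeword, which may be more than its parity can repair.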

Anyway, you get the idea. Google: ECC "Reed-Solomon"

The loving are the daring!
 
Well, if anyone is even remotely interested in my progress on this pathetic excuse for a project, I've been partially successful so far.

Converting the binary code of a file to symbols was relatively easy. I quickly moved on to adding compression (the file is compressed in memory before being converted) and a primitive file system. This means I can already store multiple files in a single symbol file. However, as of now it is a sequential file system, which means that information about a file cannot be extracted without loading the entire symbol file into RAM. This, of course, is poor programming because, unlike disk space, which is huge, RAM is limited and the file may not always fit into it. My goal is to store the info about every file in a header at the top of the file. This won't take too much doing, but I'm too tired to work on it today. I've yet to tackle encryption or ECC, although the format does currently use CRC32 (CRC can detect corruption, but it can't actually "correct" corrupted bits).
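A header-at-the-top layout like the one described could look roughly like this. It is a sketch under assumptions, not the poster's actual format: the `PAPR` magic tag, the field layout, and the function names are all invented for illustration. Each index entry records where a file lives and its CRC32, so one file can be pulled out by seeking to it, and corruption is detected (though not repaired) on read.

```python
import io
import struct
import zlib

MAGIC = b"PAPR"  # hypothetical format tag, made up for this sketch


def pack(files):
    """files: dict of name -> bytes. Returns index header + file data."""
    entries, body = [], b""
    for name, data in files.items():
        entries.append((name.encode("utf-8"), len(body), len(data), zlib.crc32(data)))
        body += data
    header = MAGIC + struct.pack("<I", len(entries))
    for name, offset, size, crc in entries:
        header += struct.pack("<H", len(name)) + name
        header += struct.pack("<III", offset, size, crc)
    return header + body


def read_index(stream):
    """Read only the header. Returns (index, start of the data area)."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a container file")
    (count,) = struct.unpack("<I", stream.read(4))
    index = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        index[name] = struct.unpack("<III", stream.read(12))
    return index, stream.tell()


def read_file(stream, name):
    """Seek straight to one file and verify its CRC32."""
    index, data_start = read_index(stream)
    offset, size, crc = index[name]
    stream.seek(data_start + offset)
    data = stream.read(size)
    if zlib.crc32(data) != crc:
        raise ValueError("CRC32 mismatch: data is corrupted")
    return data


blob = pack({"a.txt": b"hello", "b.bin": b"\x00\x01\x02"})
print(read_file(io.BytesIO(blob), "b.bin"))
```

With a real file opened in binary mode in place of `io.BytesIO`, only the header and the requested file's bytes are read into memory.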

 
Hmm, using only 8 different characters, each character could be 1 byte, no? You can fit a lot of characters on a sheet of paper, and that's not even a compressed format!

Watts mean squat if you don't have quality!
 
Hmm, using only 8 different characters, each character could be 1 byte, no? You can fit a lot of characters on a sheet of paper, and that's not even a compressed format!
If you want one character to represent an entire byte, then you'd need 256 possibilities for that character. That's not possible with ASCII because many of the ASCII characters are control characters and other non-printable characters. Unicode might be possible though, because Unicode has 2^16 possible characters. However, recognizing that many different characters would be a nightmare, to say the least. Recognition accuracy would drop to as low as maybe 60%. I tried it, and I wasted a whole day on a fruitless exercise. The best way to get accurate representations that can be easily recognized is to use simple characters like slashes. Also, I made each character represent a single bit, not a byte. Obviously, without compression that would make the file 8x larger.
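The one-symbol-per-bit scheme can be sketched in a few lines. The choice of "/" for 1 and "\" for 0, the most-significant-bit-first order, and the function names are assumptions made for this example; the post only says that simple characters like slashes were used.

```python
ONE, ZERO = "/", "\\"  # assumed mapping: "/" = 1 bit, "\" = 0 bit


def bytes_to_symbols(data):
    """Emit one slash per bit, most significant bit first."""
    return "".join(
        ONE if (byte >> bit) & 1 else ZERO
        for byte in data
        for bit in range(7, -1, -1)
    )


def symbols_to_bytes(symbols):
    """Rebuild the bytes from a string of slashes."""
    if len(symbols) % 8:
        raise ValueError("symbol count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(symbols), 8):
        byte = 0
        for ch in symbols[i:i + 8]:
            if ch not in (ONE, ZERO):
                raise ValueError(f"unrecognized symbol: {ch!r}")
            byte = (byte << 1) | (ch == ONE)
        out.append(byte)
    return bytes(out)


encoded = bytes_to_symbols(b"Hi")
print(encoded)
print(len(encoded))               # 16 symbols for 2 bytes: the 8x growth
print(symbols_to_bytes(encoded))  # b'Hi'
```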

 
Wouldn't you need a range of 256 characters for 1 character to represent 1 byte?

Then a line of 1024 characters to represent 1 KB

That's over a million characters on paper in order to store the same data as a floppy disk.

OK, that's a primitive way of looking at it, but as I'm no programmer it seems logical to me. I am interested though and will keep reading.

 
For your printing on a piece of paper you could develop all kinds of symbols. Eventually getting into character recognition (OCR??) (fingerprint determination?)

A symbol for a word, a sentence, an idea, a whole paragraph or a whole book ( 🙂 Now that's real compression). Since the computer isn't limited by poor memory the way we humans are, an ideograph/pictograph (something like a Chinese/Japanese/shorthand character) could represent a common phrase or word.

There are only a couple of hundred thousand words in the English language, with maybe an average of three meanings each, plus a few (tens of) thousand common phrases. Maybe two or three million symbols. Might take a while to define them all though.

What are words but abstractions of thought? And what are the arrangements of the letters but a representation of that abstraction?

I'm getting another beer!

 
A symbol for a word, a sentence, an idea, a whole paragraph or a whole book ( 🙂 Now that's real compression). Since the computer isn't limited by poor memory the way we humans are, an ideograph/pictograph (something like a Chinese/Japanese/shorthand character) could represent a common phrase or word.

There are only a couple of hundred thousand words in the English language, with maybe an average of three meanings each, plus a few (tens of) thousand common phrases. Maybe two or three million symbols. Might take a while to define them all though.

What are words but abstractions of thought? And what are the arrangements of the letters but a representation of that abstraction?
Genius! That's an excellent idea. However, .exe files, for example, don't use actual words, but they do have certain patterns. Now all I have to do is construct a pattern dictionary and I could compress them quite significantly! Haha, here I was thinking at the binary level; I totally forgot about representing patterns with a couple of bytes rather than hundreds. Thanks!
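One existing way to try the pattern-dictionary idea without writing a compressor from scratch is zlib's preset dictionary: the compressor and decompressor share a block of common byte patterns in advance, so matches against it cost only a few bits. This is a sketch of that technique, not necessarily what the poster went on to build. The dictionary contents and the sample data below are placeholders chosen for illustration.

```python
import zlib

# Placeholder "pattern dictionary": byte strings expected to recur in the
# files being stored. A real one would be built from sample files.
PATTERNS = b"This program cannot be run in DOS mode."


def compress_with_dict(data, dictionary=PATTERNS):
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob, dictionary=PATTERNS):
    # The same dictionary must be supplied, or decompression fails.
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


sample = b"MZ\x90\x00This program cannot be run in DOS mode.\r\r\n$"
plain = zlib.compress(sample, 9)
with_dict = compress_with_dict(sample)

print(len(sample), len(plain), len(with_dict))
assert decompress_with_dict(with_dict) == sample
```

The gain is largest on small inputs, where ordinary compression has not yet seen enough data to find repeats on its own. That fits a format limited to tens of kilobytes per page.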
