Talk:Development ideas
Data Integrity
- The idea of checksumming each allocation in 64k chunks is what's termed a hash list,
- which can be extended to a multi-level hash tree.
- One could imagine a 3-level tree
- a hash for each lowest-common-unit, say the commonly used 1K
- a single top hash for each IBP allocation
- a single top hash for the file
- or more levels
- It seems to be common in distributed systems to use a binary hash tree rather than a hash list -- even on individual data blocks.
- Tiger Tree Hash (TTH) is probably the most popular at present, since it provides high security at a relatively low CPU cost
- http://en.wikipedia.org/wiki/Tiger_%28hash%29
- http://en.wikipedia.org/wiki/Hash_tree#Tiger_tree_hash
- http://en.wikipedia.org/wiki/List_of_hash_functions#Computational_costs_of_CRCs_vs_Hashes
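To make the hash list versus hash tree distinction concrete, here is a minimal sketch of the 3-level idea above (1K leaves, one top hash per allocation, one top hash for the file), built as a binary hash tree. This is an illustration under stated assumptions, not a proposed IBP format: SHA-256 stands in for Tiger because Python's hashlib does not provide Tiger, the function names are made up for this example, and the 64k allocation size is borrowed from the chunk size mentioned above. The 0x00/0x01 prefixes and the promotion of an unpaired node follow the convention used by TTH.

```python
import hashlib

LEAF_SIZE = 1024         # lowest common unit from the discussion (1K)
ALLOC_SIZE = 64 * 1024   # assumed allocation size, for illustration only


def _h(data: bytes) -> bytes:
    # Assumption: SHA-256 as a stand-in; real TTH uses the Tiger hash.
    return hashlib.sha256(data).digest()


def leaf_hashes(data: bytes, leaf_size: int = LEAF_SIZE) -> list:
    """Hash list: one hash per fixed-size block (0x00 marks a leaf)."""
    blocks = [data[i:i + leaf_size] for i in range(0, len(data), leaf_size)]
    if not blocks:
        blocks = [b""]  # empty input is treated as a single empty leaf
    return [_h(b"\x00" + block) for block in blocks]


def tree_root(hashes: list) -> bytes:
    """Binary hash tree: combine hashes pairwise until one top hash remains."""
    level = list(hashes)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # 0x01 marks an internal node built from two children
            nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2:
            nxt.append(level[-1])  # unpaired node is promoted unchanged
        level = nxt
    return level[0]


def allocation_roots(data: bytes, alloc_size: int = ALLOC_SIZE) -> list:
    """Middle level: one top hash per allocation-sized piece of the file."""
    chunks = [data[i:i + alloc_size] for i in range(0, len(data), alloc_size)]
    if not chunks:
        chunks = [b""]
    return [tree_root(leaf_hashes(chunk)) for chunk in chunks]


def file_root(data: bytes) -> bytes:
    """Top level: a single hash covering every allocation of the file."""
    return tree_root(allocation_roots(data))
```

With this layout a corrupted block changes only the top hash of its own allocation and the file hash, so a client could localize the damage to one allocation and then to one 1K block without re-reading the rest of the file.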
If the IBP protocol were extended to support a single specific checksum method, it would possibly be something like TTH. In that case, TTH usage would become part of the IBP protocol specification, and both the IBP client and the IBP depot would be required to implement it. The next-level tools and APIs such as LoRS, libxio, and lstcp would not be required to know anything about IBP-internal checksums, but they could optionally use them. If checksumming were optional, additional IBP commands to turn it on and off would be needed, and interoperability with non-checksumming depots would become an issue.
- Dan 1/24/08