Talk:Development ideas
Data Integrity
- The idea of 64 KB chunks for checksums (subdividing an IBP allocation) is what's termed a hash list
  - http://en.wikipedia.org/wiki/Hash_list
- which can be extended to a multi-level hash tree
- One could imagine a 3-level tree
  - a hash for each lowest-common-unit, say the commonly used 1 KB
  - a single top hash for each IBP allocation
  - a single top hash for the file
    - This is problematic, since a file can be replicated with different allocation boundaries, leading to a different top hash for the same file.
- It seems to be common in distributed file systems to use a binary hash tree rather than a hash list -- even on individual data blocks (see the sketch after this list).
  - Tiger Tree Hash (TTH) is currently perhaps the most popular, since it provides high security at relatively low CPU cost
  - http://en.wikipedia.org/wiki/Tiger_%28hash%29
  - http://en.wikipedia.org/wiki/Hash_tree#Tiger_tree_hash
  - http://en.wikipedia.org/wiki/List_of_hash_functions#Computational_costs_of_CRCs_vs_Hashes
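To make the hash-list vs. hash-tree distinction concrete, here is a minimal Python sketch that computes both a hash-list top hash and a binary hash-tree (Merkle/TTH-style) root over the 64 KB chunks of an allocation. SHA-256 stands in for Tiger, since Python's standard library has no Tiger implementation; the chunk size, function names, and toy data are assumptions for illustration only and are not part of IBP.

    # Illustrative sketch only: SHA-256 stands in for Tiger (hashlib has no
    # Tiger), and nothing here is part of the IBP protocol.
    import hashlib

    CHUNK = 64 * 1024  # the 64 KB checksum unit discussed above

    def chunk_hashes(data):
        """Hash each 64 KB chunk of an allocation (the list/tree leaves)."""
        return [hashlib.sha256(data[i:i + CHUNK]).digest()
                for i in range(0, len(data), CHUNK)]

    def hash_list_top(data):
        """Hash list: the top hash is the hash of the concatenated leaf hashes."""
        return hashlib.sha256(b"".join(chunk_hashes(data))).hexdigest()

    def hash_tree_top(data):
        """Binary hash tree (Merkle/TTH-style): pair leaves up to a single root."""
        level = chunk_hashes(data) or [hashlib.sha256(b"").digest()]
        while len(level) > 1:
            nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:
                nxt.append(level[-1])   # lone node promoted unchanged, as in TTH
            level = nxt
        return level[0].hex()

    if __name__ == "__main__":
        allocation = b"x" * (256 * 1024)   # a toy 256 KB IBP allocation
        print("hash list top:", hash_list_top(allocation))
        print("hash tree top:", hash_tree_top(allocation))

With only a handful of 64 KB leaves the two schemes cost about the same; the tree mainly pays off when individual chunks need to be verified or re-fetched without rehashing the whole allocation.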
If the IBP protocol were extended to support a single specific checksum method, it would likely be something like TTH. In that case, TTH usage becomes part of the IBP protocol specification, and both the IBP client and the IBP depot would be required to implement it. The next-level tools and APIs, such as LoRS, libxio, and lstcp, would not be required to know anything about IBP-internal checksums, but they could optionally use them.
- If checksumming were optional, additional IBP commands to turn it on and off would be needed, and interoperability with non-checksumming depots becomes an issue (a rough sketch of such a negotiation follows this list).
- Do the hashes go into the exnode, or are they only stored in the depot allocation tables?
- Since the IBP client needs to do the checksumming, it needs to be emphasized that
  - the checksum method becomes part of the IBP protocol itself and is not a hidden internal operation
  - the checksum method is exposed at least to the IBP client
  - this may have significant implications for the scalability of the IBP protocol, and these scalability implications may need to be explicitly addressed.
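As a way of visualizing the optional-checksumming concern, here is a small, purely hypothetical Python sketch of a client that asks a depot whether it supports checksumming and falls back when it does not. The command strings (IBP_CHKSUM_QUERY, IBP_CHKSUM_ON, IBP_STORE) and the Depot class are invented for this sketch; they are not existing IBP protocol commands or APIs.

    # Hypothetical sketch: none of these command names exist in the current
    # IBP protocol; they only illustrate the negotiation/fallback problem.

    class Depot:
        """Stand-in for a connection to an IBP depot."""
        def __init__(self, supports_checksums):
            self.supports_checksums = supports_checksums

        def send(self, command):
            # A real client would speak the IBP wire protocol here.
            if command.startswith("IBP_CHKSUM"):
                return "OK" if self.supports_checksums else "NAK"
            return "OK"

    def store_with_optional_checksum(depot, data):
        """Turn checksumming on when the depot supports it, otherwise fall back."""
        checksummed = depot.send("IBP_CHKSUM_QUERY") == "OK"
        if checksummed:
            depot.send("IBP_CHKSUM_ON")    # would also carry the method, e.g. TTH
        depot.send("IBP_STORE %d" % len(data))
        return checksummed                 # caller knows whether the data is verified

    if __name__ == "__main__":
        print(store_with_optional_checksum(Depot(True), b"data"))   # True
        print(store_with_optional_checksum(Depot(False), b"data"))  # False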
- Dan 1/24/08
NFU checksum approach
- It has also been suggested that NFUs could accommodate any generic hashing approach the end user wants to implement (a rough sketch follows this list).
  - The NFU operation would read the data from disk via localhost (can the NFU talk to the depot internally?) or via mmap for NFU servers integrated into the IBP depot.
- This may be slower than an algorithm integrated into the depot.
- A depot-internal checksum doesn't preclude the use of NFU checksums.
  - Excessive NFU usage could hamper depot performance.
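Here is a rough Python sketch of what an NFU checksum operation might look like in the two cases above. SHA-256 stands in for whatever hash the user uploads, the loopback READ request and port are hypothetical, and the on-disk path for the mmap case is an assumption; none of these details come from the actual NFU interface.

    # Rough sketch only: the loopback request format, the port, and the on-disk
    # layout are assumptions; SHA-256 stands in for the user's chosen hash.
    import hashlib
    import mmap
    import socket

    def nfu_checksum_via_localhost(host, port, capability):
        """NFU beside the depot: pull the allocation over loopback and hash it."""
        digest = hashlib.sha256()
        with socket.create_connection((host, port)) as sock:
            sock.sendall(("READ %s\n" % capability).encode())
            while True:
                block = sock.recv(64 * 1024)
                if not block:
                    break
                digest.update(block)
        return digest.hexdigest()

    def nfu_checksum_via_mmap(path):
        """NFU server integrated into the depot: hash the allocation file in place."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mem:
                digest.update(mem)
        return digest.hexdigest()

The mmap path avoids copying the allocation through a socket, which is part of why an NFU integrated into the depot (or a depot-internal checksum) would likely be faster than the localhost route.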
- Dan 1/28/08