Talk:Development ideas

From ReddNet
Revision as of 09:14, 29 January 2008

Data Integrity

  • One could imagine a tree of three (or more) levels:
    • a hash for each lowest common unit, say the commonly used 1 KB
    • a single top hash for each IBP allocation
    • a single top hash for the file
      • This is problematic, since a file can be replicated with different allocation sizes and boundaries.
  • It seems to be common in distributed systems to use a binary hash tree rather than a hash list, even on individual data blocks.
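The tree levels above, and the boundary problem they run into, can be sketched as a binary hash tree built over 1 KB leaves. This is a minimal illustration, not a proposed IBP format: it borrows the THEX/TTH conventions (a 0x00 prefix on leaf hashes, a 0x01 prefix on internal nodes, an unpaired node promoted unchanged), but substitutes SHA-256 for Tiger because Python's hashlib does not guarantee Tiger. The helper names are hypothetical.

```python
import hashlib

LEAF_SIZE = 1024  # the "commonly used 1 KB" lowest common unit


def _h(data: bytes) -> bytes:
    # SHA-256 stands in for Tiger in this sketch.
    return hashlib.sha256(data).digest()


def leaf_hashes(data: bytes, leaf_size: int = LEAF_SIZE) -> list[bytes]:
    """Hash each leaf-sized unit with a 0x00 prefix (THEX-style)."""
    if not data:
        return [_h(b"\x00")]
    return [_h(b"\x00" + data[i:i + leaf_size])
            for i in range(0, len(data), leaf_size)]


def root_from_leaves(level: list[bytes]) -> bytes:
    """Hash pairs with a 0x01 prefix; an unpaired last node is promoted as-is."""
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def tree_root(data: bytes) -> bytes:
    """Top hash of a byte range (one allocation, or the whole file)."""
    return root_from_leaves(leaf_hashes(data))


def allocation_roots(data: bytes, sizes: list[int]) -> list[bytes]:
    """Per-allocation top hashes when the file is split into the given sizes."""
    roots, offset = [], 0
    for size in sizes:
        roots.append(tree_root(data[offset:offset + size]))
        offset += size
    return roots


data = bytes(range(256)) * 40                    # 10 KB, i.e. ten 1 KB leaves
split_a = allocation_roots(data, [4096, 6144])   # one replica's allocation layout
split_b = allocation_roots(data, [8192, 2048])   # another replica's layout
```

The two replicas end up with different per-allocation top hashes for identical data. Combining the allocation hashes reproduces the file's top hash for the 8 KB + 2 KB split but not for the 4 KB + 6 KB split, because the tree shape depends on where the boundaries fall. The 1 KB leaf hashes, by contrast, come out the same for any split on 1 KB boundaries.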

If the IBP protocol were extended to support a single, specific checksum method, a likely candidate would be something like TTH (Tiger Tree Hash). In that case, TTH would become part of the IBP protocol specification, and both the IBP client and the IBP depot would be required to implement it. Higher-level tools and APIs such as LoRS, libxio and lstcp would not need to know anything about IBP-internal checksums, but they could optionally use them.

  • If checksumming were optional, additional IBP commands would be needed to turn it on and off, and interoperability with non-checksumming depots would become an issue.
  • Do the hashes go into the exnode, or are they stored only in the depot allocation tables?
  • Because the IBP client must also compute checksums, it should be emphasized that the checksum method becomes part of the IBP protocol itself and is not a hidden internal operation.
    • The checksum method is exposed at least to the IBP client.
    • This may have significant implications for the scalability of the IBP protocol, and these implications may need to be addressed explicitly.
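To make the point about the client concrete, here is a rough sketch of the two sides of such an extension: the depot records one hash per 1 KB unit at write time, and the client recomputes those hashes on a read and compares. Everything here is hypothetical (the function names, the idea that the depot hands the stored hashes back with the data, and the use of SHA-256 with a THEX-style 0x00 leaf prefix in place of Tiger); it only shows why both ends must implement exactly the same method.

```python
import hashlib

LEAF_SIZE = 1024  # unit size both sides would have to agree on


def leaf_hash(block: bytes) -> bytes:
    # SHA-256 substituted for Tiger; 0x00 leaf prefix as in THEX/TTH.
    return hashlib.sha256(b"\x00" + block).digest()


def depot_store(data: bytes) -> list[bytes]:
    """Hypothetical depot side: record one hash per 1 KB unit on write."""
    return [leaf_hash(data[i:i + LEAF_SIZE])
            for i in range(0, len(data), LEAF_SIZE)]


def client_verify(offset: int, data: bytes, stored: list[bytes],
                  alloc_len: int) -> list[int]:
    """Hypothetical client side: recheck a read against the depot's hashes.

    The read must start on a unit boundary and end on one (or at the end
    of the allocation, whose last unit may be short). Returns the indices
    of units whose recomputed hash does not match.
    """
    end = offset + len(data)
    if offset % LEAF_SIZE or (end % LEAF_SIZE and end != alloc_len):
        raise ValueError("read must cover whole 1 KB units to be checked")
    first = offset // LEAF_SIZE
    bad = []
    for k, i in enumerate(range(0, len(data), LEAF_SIZE)):
        if leaf_hash(data[i:i + LEAF_SIZE]) != stored[first + k]:
            bad.append(first + k)
    return bad


data = bytes(range(256)) * 12          # 3 KB allocation, three units
stored = depot_store(data)
corrupt = bytearray(data)
corrupt[1500] ^= 0xFF                  # flip a byte inside unit 1
```

A client that chose a different unit size, hash function or prefix would compute different values for identical data and reject it, which is why the method cannot stay a hidden depot-internal detail.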

- Dan 1/24/08

NFU checksum approach

  • It has also been suggested that NFUs could accommodate any hashing approach the end user wants to implement.
    • The NFU operation would read data from disk via localhost (can an NFU talk to the depot internally?).
  • This would be much slower than an algorithm integrated into the depot.
  • A depot-internal checksum does not preclude the use of NFU checksums.
    • However, excessive NFU usage could hamper depot performance.
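The "any hashing approach the end user wants" idea amounts to streaming the allocation's bytes through a user-selected hash. The sketch below is only an illustration under stated assumptions: `read_allocation` is a hypothetical stand-in for the NFU reading from the depot via localhost (it just slices an in-memory buffer), and `nfu_checksum` is a hypothetical NFU operation that accepts any algorithm name Python's `hashlib.new` understands.

```python
import hashlib
from typing import Iterable, Iterator


def read_allocation(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical stand-in for an NFU reading an allocation from the
    depot via localhost; here it just yields slices of a local buffer."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def nfu_checksum(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Hypothetical NFU operation: hash a stream of chunks with whatever
    algorithm the end user names (any name accepted by hashlib.new)."""
    h = hashlib.new(algorithm)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


data = bytes(range(256)) * 40
user_choice = nfu_checksum(read_allocation(data, chunk_size=1000), "sha1")
default = nfu_checksum(read_allocation(data))
```

In this arrangement every byte is copied out of the depot before it is hashed, which is one plausible reason to expect it to be slower than a checksum computed inside the depot.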

- Dan 1/28/08