Talk:TSSP Framework
My understanding of the TSSP task is to identify and help standardize a sequence of steps that are essential to each of the data operations listed (or eventually requested). (Please redirect me if my interpretation is off; I'd hate to spend time generating information that only makes sense to me!) So I'm starting with a minimalistic set that answers the question "Given a logistical network, what can I do with it?" with "Following this protocol/spec, you can at least achieve [these operations]."
- Hunter 19/20/07
"Here are some comments on your specification page:
1. I think it would be good to list all of the possible actors and their roles at the beginning of the description. Some of the actors you have listed (such as the "route generator") may not exist yet, which means that perhaps we should not include them in this document.
- Right. In both the routing scenarios the path is manually determined. There's currently no automated agent for either dynamic or static.
2. You should come up with another name/description for the IBP client, one which describes its function, not its implementation. "Data source" and "Data destination" might be two such roles (notice that there are two roles even though a single protocol is being used).
- This is what I was referring to when I asked about the scope of the actors. I only listed the components embodied by the TSSP implementation and not the external entities it will communicate with. It seems like the presence of a protocol's client implies the participation of its service, but I can make that explicit.
3. Your description of the "Steps" for the various operations is very high level, and some protocols that fit the description definitely would not work, certainly not achieving acceptable performance. For instance, the multistream TCP download implemented in LoRS is quite complex, with a number of optimization parameters. Now, we might not want to standardize on a protocol with all that complexity but on the other hand we might want to. If not, then how much complexity do we want to require? And do we want to specify the full protocol as an optional variant of a simpler one?
4. It's not clear to me why you refer to storage space as bandwidth. This is not standard terminology even when dealing with communication buffers, as far as I know. It certainly doesn't fit the scenario when storage allocations are used to implement persistent files (unless you mean to characterize a file as a "channel through time" which is a nice image, but one that few people outside of Claxton Hall will understand).
- Well taken. Maybe we could talk about channel capacity?
5. Your Teardown operation seems to be the only way to truncate the duration of storage allocations. What if a user simply deletes the exNode without doing the truncation? Remember, this may happen due to client error, so a protocol that rules it out cannot be implemented reliably.
- This also speaks to my understanding of the TSSP assignment. If TSSP represents the standard sequence of steps that are essential to each of the data operations, then premature deletion of the metadata seems like an implementation fault. Or removal of the metadata can be the responsibility of some other operation?
- The management of exNodes is at this point unspecified. We could decide to include in the TSSP a set of rules as well as protocols for managing them, but we certainly don't have such rules now. When a client obtains an exNode from a directory service, what obligations does it have, if any, in how that exNode is managed? How about when a client creates a new exNode? Is there an obligation to register it with a directory and to delete its own copy? If a copy of the exNode is lost, how robust must the directory service and IBP service be (this goes to the question of how long allocation durations are)?
- It's easy to say that losing the exNode is an implementation fault, but it's not clear that it can be avoided, given that faults can occur in the network and/or operating system serving and connecting the components. An implementation fault is not simply a runtime fault, it is one that can/should be avoided.
7. I think that some open issues include what sorts of up/download algorithms are considered acceptable. If there is a single-stream implementation that gets really terrible performance and sometimes doesn't succeed when more sophisticated algorithms succeed, is that an acceptable implementation (because when it does succeed it delivers the correct bits)?
- Once again, I'm unsure of how a sub-routine would be integrated into a protocol description. This is just part of having an IS guy try to work this out. The exchange would be something like - "I've followed the TSSP spec and can upload/download, but it sucks!", response "The TSSP spec is a minimalist representation of how to do those operations. If you want better results, try parallelizing."
- The best analogy is TCP. There are many possible implementations of a TCP-like protocol, but most of the ones that have been documented are not known to work very well. Particular TCP variants are documented and approved by the IETF, others are not.
- If someone comes forward with an implementation of TSSP which "meets the specification" but has bad performance or other behavior, would we want to put our stamp of approval on it? Arguably, it would not be a truly interoperable alternative to existing implementations because applications that used it expecting high performance would be disappointed. It's all a matter of what we want "interoperable" to mean.
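The "parallelize for better results" answer in the exchange above can be sketched concretely. This is not the LoRS implementation; all names here (fetch_slice, the slice-tuple layout) are hypothetical, and the point is only that a multi-stream download must deliver the same bytes as a single-stream one, just faster:

```python
# Hypothetical sketch of a multi-stream download of exNode slices.
# fetch_slice stands in for an IBP load of one slice from a depot.
from concurrent.futures import ThreadPoolExecutor

def fetch_slice(slice_id, offset, length):
    # Placeholder for an IBP load; here we just fabricate bytes.
    return bytes([slice_id]) * length

def download(slices, max_streams=4):
    """slices: list of (slice_id, offset, length) from the exNode."""
    with ThreadPoolExecutor(max_workers=max_streams) as pool:
        futures = {pool.submit(fetch_slice, sid, off, ln): off
                   for sid, off, ln in slices}
        parts = sorted((futures[f], f.result()) for f in futures)
    # Reassemble in offset order, regardless of completion order.
    return b"".join(data for _, data in parts)

slices = [(0, 0, 4), (1, 4, 4), (2, 8, 4)]
print(download(slices))
```

Whether a slow single-stream variant of the same interface still "meets the specification" is exactly the interoperability question raised above.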
8. The relationship between resources on a single depot needs to be specified. Are they necessarily independent, or can they be linked (e.g., having resource zero represent the union of the other resources)?
- Even local resources of different types? What's the significance of the same-depot resource relationship with regards to the operations listed?
- Good question. This issue came up because LoRS cannot make allocations on resources other than zero, but Alan wanted to run depots with multiple resources. His way of dealing with this was to make resource zero represent the union of all other resources, with an allocation made on resource zero returning a capability pointing to one of the other resources. It was a "work-around", but if this sort of thing is going to be allowable, that fact should be specified as part of the standard. "
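The work-around described above can be sketched in a few lines. The data structures and function names are invented for illustration; the one behavior being shown is the redirect: an allocate on resource zero lands on a real resource, and the returned capability names that real resource, not zero:

```python
# Sketch of "resource zero as the union of the others" (hypothetical API).
def allocate(depot, resource_id, size):
    if resource_id == 0:
        # Redirect to the first real resource with enough free space.
        for rid, free in depot["resources"].items():
            if rid != 0 and free >= size:
                resource_id = rid
                break
        else:
            raise RuntimeError("no resource has room")
    depot["resources"][resource_id] -= size
    # The capability points at the real resource, not at resource zero.
    return {"resource": resource_id, "size": size}

depot = {"resources": {1: 100, 2: 500}}
cap = allocate(depot, 0, 200)
print(cap["resource"])
```

If this redirect is allowable, the standard would need to say whether a client may rely on the returned capability naming a resource other than the one it asked for.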
- Micah/Hunter 12/20/07
Checksums:
TCP generates a checksum at the sender and then re-calculates a checksum at the receiver.
TSSP could have a verify option (or requirement) that
- generates a checksum per slice
- stores it in the exNode
- The 2nd checksum would have to be calculated on the depot
- re-reading the data from the media, of course
- otherwise bandwidth is wasted re-sending the data to the client for checksum calculation
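The upload half of the verify option above can be sketched as a single pass over the data. The exNode layout here is invented for illustration; the point is just that the per-slice checksum is computed once at the client and recorded alongside the slice metadata:

```python
# Sketch of generating and storing a checksum per slice at upload time.
import hashlib

def upload_with_checksums(data, slice_size):
    exnode = []
    for off in range(0, len(data), slice_size):
        chunk = data[off:off + slice_size]
        exnode.append({
            "offset": off,
            "length": len(chunk),
            "checksum": hashlib.sha1(chunk).hexdigest(),
        })
        # ... the IBP store of `chunk` to a depot would happen here ...
    return exnode

exnode = upload_with_checksums(b"some file contents", slice_size=8)
for s in exnode:
    print(s["offset"], s["checksum"][:8])
```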
- There could be a checksum-reverify option
- the client sends checksums to the depots slice by slice.
- The checksum is re-calculated at the depot
- re-scanning the media
- the depot only sends success/failure back (not the re-calculated checksum)
- But how is the 2nd checksum calculated on the depot?
- part of the IBP manage protocol?
- NFU-type operation?
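The shape of that reverify exchange can be sketched as follows. Whether the depot-side recomputation rides on IBP manage or an NFU-type operation is exactly the open question above; this sketch (all names invented) just shows that only checksums and success/failure flags cross the wire, never the data:

```python
# Sketch of the checksum-reverify exchange (hypothetical interfaces).
import hashlib

def depot_reverify(stored_slices, slice_id, expected_checksum):
    # Depot side: re-read the slice from media and recompute locally.
    data = stored_slices[slice_id]
    return hashlib.sha1(data).hexdigest() == expected_checksum

# Client side: walk the exNode slice by slice, sending only checksums.
stored = {0: b"good bytes", 1: b"rotted byt"}   # slice 1 is corrupt
exnode = {0: hashlib.sha1(b"good bytes").hexdigest(),
          1: hashlib.sha1(b"fresh byte").hexdigest()}
results = {sid: depot_reverify(stored, sid, cks)
           for sid, cks in exnode.items()}
print(results)  # {0: True, 1: False}
```

Note the depot returns only the boolean, not the recomputed checksum, matching the bandwidth-saving point above.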
- Replication can be made more robust by deallocating the corrupt slices from the depots, removing them from the exNode, and replacing them with good copies
- Data "warming" can be made a more complete operation that includes checksum verification and data rebuilding.
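The two points above combine into a repair loop that a warming pass could run per slice: verify each replica against the stored checksum, drop the corrupt copies, and copy a good replica in their place. The data structures here are invented; in a real system dropping a copy would also deallocate it on the depot and update the exNode:

```python
# Sketch of warming extended to checksum verification and rebuilding.
import hashlib

def warm_slice(replicas, expected_checksum):
    """replicas: dict depot_name -> slice bytes. Returns (repaired, bad)."""
    good = {d: b for d, b in replicas.items()
            if hashlib.sha1(b).hexdigest() == expected_checksum}
    bad = set(replicas) - set(good)
    if not good:
        raise RuntimeError("no intact replica left to rebuild from")
    source = next(iter(good.values()))
    # Replace each corrupt replica with a fresh copy of a good one.
    for depot in bad:
        replicas[depot] = source
    return replicas, bad

replicas = {"depotA": b"slice data", "depotB": b"slice dat?"}
fixed, repaired = warm_slice(
    replicas, hashlib.sha1(b"slice data").hexdigest())
print(sorted(repaired))  # ['depotB']
```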
The end-user is probably most comfortable using a checksum for the whole file as verification -- and CMS requires matching with the file checksum. But the slice-by-slice feature provides a lot of flexibility for replica management, and it seems we should be doing this from the start.
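One practical note on combining the two: for ordinary hashes like SHA-1 or MD5, the whole-file checksum cannot be reconstructed from the per-slice checksums, but both can be produced in a single pass over the data at upload, so recording both in the exNode costs nothing extra. A minimal sketch, with invented field layout:

```python
# Compute the whole-file checksum and per-slice checksums in one pass.
import hashlib

def checksums(data, slice_size):
    whole = hashlib.sha1()
    per_slice = []
    for off in range(0, len(data), slice_size):
        chunk = data[off:off + slice_size]
        whole.update(chunk)               # running whole-file digest
        per_slice.append(hashlib.sha1(chunk).hexdigest())
    return whole.hexdigest(), per_slice

whole, slice_sums = checksums(b"end-to-end file contents", slice_size=8)
print(whole == hashlib.sha1(b"end-to-end file contents").hexdigest())  # True
```

This would let the end-user (and CMS) verify against the file checksum while replica management works slice by slice.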
- Dan 1/15/08