Operations and Deployment

From ReddNet

Revision as of 09:40, 28 January 2008

Deployment

  • Bring current deployment up to date (estimated completion: mid-February)
    • Build new image for 2GB internal USB memory
      • may be done by All-Hands
    • Design and implement a depot recovery process
      • This process will be vetted on currently deployed hardware
      • Initial process may be done by All-Hands or soon thereafter
      • Begin with SFASU for initial vetting
    • Send recovery keys out to sites and update the depots
      • early February
      • need to recruit one person at each site to assist
    • Set Nagios back up
      • need a more stable system before this makes sense
      • turned off during this transition period to avoid a flood of diagnostic messages
      • needs a day or two of work; could be done now, depending on priority
  • Prepare additional existing hardware for deployment (19 nodes)
    • Update image on internal USB (will use for testing of the above recovery process)
    • Send 6 depots to SFASU with additional PDU
    • Find new collaborators/sites for the remaining 13
      • Ideas?
  • Develop an MOU for current deployment (timescale uncertain?)
    • Longer-term project
  • Define a standard set of software tools for depots
    • The following tools already exist
      • Iperf
      • Nagios
      • mtr
    • Other tools to be added?
      • investigating new tools now
  • Gain experience with existing deployment
  • Proposed multi-tiered system for sites for discussion
    • Tier 1: Sites that run their own LServer and Chord ring
    • Tier 2: Sites that manage their own REDDnet depots
    • Tier 3: Sites that use their own storage resources as depots
    • Develop an MOU for each tier
  • Investigate new monitoring and management tools
    • rsync or similar (short term)
    • Perceus (long term)

Monitoring

  • Use StorCore, Nagios, iperf, and visualization tools from SC07
    • Have a statistics page that gathers test results and presents them cleanly
    • Define support for REDDnet
  • Create a REDDnet status site using Google Maps
  • Create an RT (Request Tracker) site to track and resolve user issues

Validation Framework

  • Stress and WAN testing on production REDDnet
    • Automated testing with Clyde
    • Real world use
  • QA testing on Test REDDnet required before moving into production REDDnet
    • A stringent set of tests exercising the hardware, OS, IBP, and LStore as thoroughly as possible (primarily with Clyde)
    • Allow users to test using this system