Main Page

Latest revision as of 21:04, 22 April 2009


REDDnet: Enabling Data Intensive Science in the Wide Area


REDDnet (Research and Education Data Depot network) is an NSF-funded infrastructure project designed to provide a large distributed storage facility for data-intensive collaboration among the nation's researchers and educators in a wide variety of application areas. Its mission is to provide "working storage" to help manage the logistics of moving and staging large amounts of data in the wide area network. Typical users are collaborating researchers who need to move data from one collaborator (person or institution) to another, or who want to share large data sets for limited periods of time (ranging from a few hours to a few months) while they work on them. REDDnet is not designed or intended to replace reliable archival or long-term personal storage. Users must make separate arrangements to ensure that the data they share via REDDnet's "best effort" storage is also preserved independently with stronger guarantees.

One example comes from the CMS collaboration, a high-energy physics experiment that will soon be taking data at the Large Hadron Collider (LHC) at CERN. Groups of researchers, distributed across the country and the world, will want to use data products derived from the raw data produced by collisions in the LHC for a variety of tasks, from calibrating the detector to searching for new physics. They will want the newest data products available for anywhere from a month to a few months, after which those products can be archived to make way for the next batch of data. Although all the data will be stored long term at CERN and Fermilab, these researchers would benefit greatly if the data could be made more readily available for processing on their distributed computing infrastructure, especially on the Open Science Grid. REDDnet is the kind of resource needed to deal with the data logistics of this application.

Another example, from the AmericaView project, might occur in the aftermath of an earthquake in California or a hurricane on the Gulf Coast, when researchers across the country will want access to the geospatial image data from satellites covering the affected region. For a few months after the event, this data could be uploaded to REDDnet and made available to this community with much higher levels of performance and availability.

Initially, REDDnet will deploy more than 700 terabytes of distributed storage with an emphasis on scalability, speed, and fault tolerance. As of spring 2008, roughly 160 TB are deployed. Both speed and fault tolerance have already been demonstrated: at the Supercomputing 2006 conference in Tampa, Florida, REDDnet sustained transfers at a rate of 10 gigabits per second between Caltech and the convention floor, a rate limited only by the bandwidth of the network connection. At the same conference, REDDnet demonstrated fault tolerance by striping data across thirty depots and then successfully reading the data even after nine of these depots had been turned off.

Research Projects Using REDDnet

  • AmericaView - Satellite remote sensing data and technologies in support of applied research, K-16 education, workforce development, and technology transfer.
  • CMS - Elementary Particle Physics at the CERN Large Hadron Collider.
  • Structural Biology - Image reconstruction of large macromolecular assemblies through a collaborative effort of Vanderbilt and Lawrence Berkeley National Laboratory researchers.
  • Retinopathy - Diabetic eye disease screening in Peru and Bolivia.


Collaborators

Core Institutions


  • Vanderbilt
  • Tennessee
  • Stephen F. Austin
  • ORNL
  • Nevoa Networks
  • N. C. State
  • Delaware


Collaborating Host Institutions


  • USP (São Paulo)
  • UERJ (Rio de Janeiro)
  • Michigan
  • Florida
  • Fermilab
  • Caltech
  • AMPATH
  • FIU
  • Library of Congress
  • SDSC
  • Stanford
  • UCSB

Support

This work is supported by NSF Grant PHY-0619847 and by the Vanderbilt Center for the Americas.