Proposal for the LEP Data Archive “museum system”

 

Version 1.0

 

Author: Andreas Pfeiffer, PH/SFT

Last Change: 4 Sep. 04

 

 

1.     Introduction

2.     The initial (prototype) setup

3.     Changes to the Configuration

4.     Responsibles from the experiments and IT

5.     Security aspects

6.     Time table

7.     Location of the “museum system”

Appendix A: Scenarios

a)     News

b)     Introduction

c)     Related Documents

d)     From a mail from J.J.Blaising after the October Focus meeting

e)     Policies on access to the experiment's data

f)      General issues and questions asked to the experiments

g)     Experiments' issues

h)     Scenarios for the "museum system"

i)      Contact people

Appendix B: Castor issues

a)     Initial meeting on 27 Nov. 2003

Appendix C: Response from IT


Change history

Author, Date    | Version | Reason for change
AP, 13-Apr-04   | 0.1     | Added change history table
AP, 4-Sep-04    | 1.0     | Moved to version 1.0 after including response from IT, updated contact list

 

 

 


 

1.    Introduction

This document describes the hardware and software setup of the system intended for the long-term archiving of the LEP data at CERN (the “museum system”). The support for the LEP collaborations in this matter is described in the memorandum to the Research Board (RB), available at http://cern.ch/Committees/RB/RC.html; it has been approved by the RB (see http://cern.ch/Committees/RB/RBMinutes157.html), and no later meeting has changed this decision. A small working group was set up in late 2003 to define the system. More information on the issues discussed is available in Appendices A and B; the response from IT is reproduced in Appendix C.

 

2.    The initial (prototype) setup

Following various discussions with the people responsible in the experiments and in IT, the following configuration is proposed for the initial prototype:

 

 

 

 

A Gateway machine controls access to the worker nodes of the archive system; the same machine is also used to access the data (in Castor) from within the archive network. For performance reasons, the Gateway may later be split into an access gateway and a Castor gateway machine. The OS on this machine will be of a “standard” IT farm type to minimize maintenance effort and security problems; however, clients for the “old” system (e.g., the ssh client and, potentially, “old” Castor clients) will be available on this machine.

 

All machines inside the museum system network see only the internal network (preferably in a non-routable private IP address space) and will access the data in Castor through the Gateway. An NFS file server (with RAIDed disks) will hold the various s/w packages needed by the experiments, both common ones (e.g., CERNLIB) and experiment-specific ones. The whole s/w tree will “mirror” the present file hierarchy on lxplus (“/afs/cern.ch/…”) to minimize the changes needed in the existing experiment-specific s/w. The present password/group files for the experiments will be copied over and then “frozen”; updates will be made by the responsible manager at the request of a small number of people per experiment (see the section on “Changes to the Configuration” below).
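As a minimal sketch of how the “mirror” could be kept, assume the NFS server exports the s/w tree under some root (the paths and area names used here are hypothetical, not an agreed layout) and that each node exposes it under the familiar /afs/cern.ch prefix through symlinks, as suggested among the general issues in Appendix A. The Python below creates one link per top-level area:

```python
import os
from pathlib import Path

def mirror_afs_layout(nfs_root: Path, afs_root: Path, areas: list[str]) -> list[Path]:
    """Create symlinks afs_root/<area> -> nfs_root/<area>; return the links created.

    Hypothetical helper: correct existing links are left alone, anything
    unexpected raises instead of being overwritten.
    """
    created = []
    for area in areas:
        target = nfs_root / area
        link = afs_root / area
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            if os.readlink(link) == str(target):
                continue  # already points to the right place
            raise FileExistsError(f"{link} points elsewhere: {os.readlink(link)}")
        if link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(target, target_is_directory=True)
        created.append(link)
    return created
```

The function can be re-run safely, which suits a “frozen” system where silent changes to the layout are undesirable.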

 

A user of the system therefore first has to log in to the Gateway machine using his/her “normal” credentials (PLUS) and then connect via ssh to one of the worker nodes (initially there will be no automatic load balancing; this may be added later). He/she can then work as on lxplus today, accessing the data remotely. To ease management, the worker nodes could use the Gateway for authentication (security implications to be checked).
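The two-step login can be scripted as shown below. This is only a sketch: the host names are hypothetical, and it assumes standard OpenSSH clients, where `-t` forces a terminal on the Gateway so that the second, interactive ssh session runs on the Gateway itself (using the ssh client kept there) rather than on the user's own machine.

```python
import shlex
import subprocess

def museum_login_command(user: str, gateway: str, worker: str) -> list[str]:
    """Build the argv for: log in to the Gateway, then ssh from there to a worker node."""
    # -t allocates a terminal on the Gateway so the inner ssh can be interactive.
    return ["ssh", "-t", f"{user}@{gateway}", "ssh", worker]

def login(user: str, gateway: str, worker: str) -> int:
    """Run the two-hop login interactively and return ssh's exit status."""
    return subprocess.call(museum_login_command(user, gateway, worker))

if __name__ == "__main__":
    # Hypothetical names; without load balancing the user chooses the worker node.
    print(shlex.join(museum_login_command("jdoe", "museum-gw.cern.ch", "museum-wn01")))
```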

 

3.    Changes to the Configuration

 

The foreseen changes to the configuration of the museum system fall into two categories: changes to the s/w configuration that affect the physics s/w, and changes that do not. The latter include, for example, additions to or deletions from the list of users allowed to access the system (i.e., changes to /etc/passwd and /etc/group). Such changes will be initiated by the corresponding body of the experiment, and the “system manager in charge” will carry them out and log them.
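To illustrate what carrying out and logging such a change could look like, here is a minimal Python sketch that adds one account line to a copy of a passwd file and records who requested it. It works on text rather than on the live /etc/passwd, the log format is a hypothetical choice, and the example entry uses a dedicated account name of the kind mentioned in IT's response (Appendix C) with made-up UID/GID values.

```python
from datetime import datetime, timezone

PASSWD_FIELDS = 7  # name:password:UID:GID:GECOS:home:shell

def add_passwd_entry(passwd_text: str, entry: str, requested_by: str,
                     log: list[str]) -> str:
    """Return passwd_text with *entry* appended, recording the change in *log*."""
    fields = entry.split(":")
    if len(fields) != PASSWD_FIELDS:
        raise ValueError(f"expected {PASSWD_FIELDS} fields, got {len(fields)}")
    name = fields[0]
    existing = {line.split(":", 1)[0] for line in passwd_text.splitlines() if line}
    if name in existing:
        raise ValueError(f"user {name!r} already present")
    # Only log once the entry has been validated.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    log.append(f"{stamp} add {name} requested-by={requested_by}")
    if passwd_text and not passwd_text.endswith("\n"):
        passwd_text += "\n"
    return passwd_text + entry + "\n"

log: list[str] = []
new = add_passwd_entry("root:x:0:0:root:/root:/bin/bash\n",
                       "aleph1:x:5001:1500:ALEPH museum account:/home/aleph1:/bin/tcsh",
                       "ALEPH archiving coordinator", log)
```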

 

Changes affecting the physics s/w include, for example, updates to the s/w on the Gateway that become necessary to adapt to changes in the Castor s/w (e.g., proxies in case the protocol changes). Any such change has to be announced to and discussed with all the experiments (or their responsible bodies) in advance (how far in advance is still to be defined), and validation checks have to be performed after the change has been introduced.

 

4.    Responsibles from the experiments and IT

 

The list of people responsible for the project from the experiments and IT is the following:

 

Aleph

 

Delphi

 

L3

 

Opal

 

IT

 

 

The “system manager in charge” will be appointed by the IT responsible.

 

5.    Security aspects

With the exception of the Gateway machine(s), the machines comprising the “museum system” will run a “frozen” OS (CERN RedHat 7.3.3 or a slightly newer version) so that the experiments' s/w can be used unchanged whenever needed in the future. This has a severe impact on security, as these machines cannot be patched. To minimize potential problems, the whole “museum” will be on a network of “private” (non-routable) IP addresses, accessible only through the Gateway machine, which runs an up-to-date OS.
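A simple consistency check for this requirement is sketched below: given an inventory of the museum nodes (names and addresses here are hypothetical), it lists any node whose address falls outside the RFC 1918 private ranges. The ranges are spelled out explicitly because Python's `ipaddress` `is_private` also returns True for some non-RFC 1918 blocks, such as the 192.0.2.0/24 documentation range. The Gateway, which also has an outside interface, would be checked separately.

```python
import ipaddress

# The three RFC 1918 blocks, which are not routed on the public Internet.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def outside_private_space(nodes: dict[str, str]) -> list[str]:
    """Return the names of nodes whose IPv4 address is not in an RFC 1918 block."""
    bad = []
    for name, addr in sorted(nodes.items()):
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in RFC1918):
            bad.append(name)
    return bad

inventory = {  # hypothetical museum nodes behind the Gateway
    "museum-nfs01": "10.10.0.2",
    "museum-wn01": "10.10.0.11",
    "museum-wn02": "192.0.2.12",  # documentation address, deliberately wrong
}
print(outside_private_space(inventory))  # ['museum-wn02']
```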

 

6.    Time table

An initial prototype with a Gateway, one (?) NFS File Server and two (?) worker nodes will be set up by IT by end May 2004 (is earlier realistic/possible?). Once this prototype is available, the people responsible in each experiment will run a validation suite and create a reference data set for future (regular) validations. After all experiments have validated the prototype, new nodes will be added and validated as the machines become available (i.e., after their retirement from the IT computing farms).
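One way to make the reference data set usable for later (regular) validations is to record a checksum for every output file of the validation suite and compare against it after each change, as sketched below. This assumes the suite writes its results to a directory and that the outputs are bit-for-bit reproducible; where outputs contain timestamps or floating-point results that may legitimately differ, a tolerance-based comparison of the physics quantities would be needed instead. The function names are illustrative only.

```python
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def make_reference(output_dir: Path) -> dict[str, str]:
    """Map each output file (relative path) to its checksum."""
    return {p.relative_to(output_dir).as_posix(): digest(p)
            for p in sorted(output_dir.rglob("*")) if p.is_file()}

def compare_to_reference(output_dir: Path, reference: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing, unexpected, or changed with respect to the reference."""
    current = make_reference(output_dir)
    return {
        "missing": sorted(set(reference) - set(current)),
        "unexpected": sorted(set(current) - set(reference)),
        "changed": sorted(k for k in reference.keys() & current.keys()
                          if reference[k] != current[k]),
    }
```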

 

7.    Location of the “museum system”

The “museum system” will be located in the Computer Centre (B513) (to be confirmed; the exact location is still to be specified).

 

 

Appendix A: Scenarios

This appendix refers to $Revision: 1.14 $ of the “scenarios.html” document.

a)     News

  • March 20: added Aleph's statement on data access
  • March 12: added Opal's statement on data access, added minutes of Castor meeting (Nov 03)
  • March 10: added Delphi's statement on data access

b)    Introduction

This document is meant as a draft for further discussions. It reflects the current understanding and the issues which have been brought up by the experiments' contacts (see below).

So far, only very preliminary and informal discussions with IT have taken place, as the IT representative was nominated only very recently; a meeting is foreseen for early October.

c)     Related Documents

d)    From a mail from J.J.Blaising after the October Focus meeting

CERN support for the LEP collaborations is organized according to the memorandum to the RB. It has been approved by the Research Board. No further meeting has changed this decision.

EP will continue to support the LEP LEE group (D. Plane).

Concerning the LEP museum system: In the minutes of the June Focus meeting it is written: "The chairperson asked IT to set up a small working group with one representative of each LEP experiment, a representative of EP/SFT and another one of IT. "

Up to now there are no resources available in EP division to support the implementation of such a system.

e)     Policies on access to the experiment's data

The following is taken from various e-mails from the experiments.

Aleph

The Aleph statement on the use of archived data is available at http://cern.ch/aleph/alpub/archive_data.pdf

The document is now public (March 19, 2004).

As you probably know, Aleph has already set up its own archiving system, an isolated system based on laptops (Monade); see http://tenchini.home.cern.ch/tenchini/Status_Archiving_6_Mar_2003.pdf

The Aleph archiving project is very close to completion. If a common LEP museum system is set up, the way for us to participate would be with a clone of our present system. For technical details please contact our archiving coordinator, Marcello Maggi (Marcello.Maggi@cern.ch).

Delphi

Update: A statement on DELPHI's rules for future access to DELPHI archived data, as approved by the Collaboration, was received on Dec. 3, 2003. It is available as PostScript and PDF documents.

In DELPHI we do not yet have a written document on our data access policy. We have discussed this once in a collaboration board meeting and will have another discussion soon.

Our present opinions are:

  1. we think that future analyses will almost certainly need detailed event information, and need the data in its present format as stored on Castor. We therefore do not plan to store an additional copy of the data as 4-vectors only.
  2. access to our data would be granted to outside people only in collaboration with one or more (expert) DELPHI members.
    • a) the acting DELPHI Spokesperson has the right to veto access if it is considered to be against the interest of DELPHI or science in general.
    • b) a resulting paper would also be signed by these participating DELPHI persons.
    • c) as at present, there would be an 'internal' DELPHI referee, who reads the paper as a journal referee would.
  3. this access policy would become applicable one year after the vast majority of the presently planned papers have been published.

In practice, such requests for access would have to be made through the Spokesperson. If the DELPHI computing account group continues to exist, such person(s) could then be added to it.

L3

L3 does not yet have a document describing its data access policy; nonetheless, I should be able to provide the relevant information.

The most likely scenario is one similar to the present access policy, i.e. only members of the collaboration will be granted access to the data. In the recent past we have had cases of individuals interested in pursuing a particular search, which they have been allowed and even encouraged to do, but in collaboration with an experienced member of the collaboration.

A simplified version of the implementation of this "status quo" scenario in the access to the museum system is that our archived data (castor) and working space (/afs/cern.ch/l3) should remain accessible only to accounts of the group "xv", as they are now. New accounts in the group "xv" can of course, as now, be created after some L3 internal consultation mechanism and appropriate sponsorship by a member of the collaboration.
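The “status quo” rule amounts to a simple property of the archived areas: they belong to group "xv" and grant no access to other users. A minimal Python check of that property is sketched below; on a Unix system the numeric group id could be obtained with `grp.getgrnam("xv").gr_gid`. Which directories to check is left to the experiment, and the function name is hypothetical.

```python
import os
import stat

def access_problems(path: str, expected_gid: int) -> list[str]:
    """List reasons why *path* is not restricted to the expected group."""
    st = os.stat(path)
    problems = []
    if st.st_gid != expected_gid:
        problems.append(f"group id is {st.st_gid}, expected {expected_gid}")
    # Any read, write or search permission for "other" users breaks the rule.
    if st.st_mode & (stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH):
        problems.append("accessible to users outside the group")
    return problems
```

Because it only reads file metadata, such a check could run periodically without touching the data.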

As for "how" the museum service should be accessible, we envisage a structure that is essentially what an "xv" user sees now from lxplus, with Castor access to the L3 data and AFS work space.

Opal

Opal's policy for future access to its data is available as a PDF document.

f)       General issues and questions asked to the experiments

  1. The s/w of the expts (this includes CERNLIB and other s/w used by the expts, not just the s/w written for the analysis) cannot be ported to future platforms. Therefore, a "museum system" needs to be set up based on a platform that is current "today".
  2. Security: at some point in the future, the system will only be available locally (as security updates will no longer exist), so access will be limited to a few selected accounts, which will have to log in to the machines through a Gateway.
  3. AFS: a shared filesystem is needed, but not necessarily AFS. Preferably, the present AFS "directory layout" should be kept for compatibility, e.g., through symlinks.
  4. Castor: Is there an interest in moving all data from Castor to (shared, but local to the system) disk (e.g., in 10 yrs)? This raises an issue with the s/w, which needs the rfiod/Castor system (needed anyway?).
    "The next release of castor will no longer support the instream data conversion from ebcdic to ascii, as a rhetorical question will you please preserve the current version." (Steve)
    Do we need a Proxy/Gateway for Castor in the future (or: how to read the data in 10 yrs from now if Castor has evolved ...)
    What is the total size of the data, s/w and documentation for each experiment?
    Update: A meeting with the CASTOR team took place on 27 Nov. 2003; the minutes are given in Appendix B.
  5. GPHIGS: can this be used on a "frozen" system without a special (or any ?) license ? Answer: yes, the license is permanent.
  6. Which packages to freeze: tms, fatmen, hepdb, (opcal for OPAL) cgi-bin web tools ? (Steve)
  7. create _two_ museums in different places (risk of fire destroying one) ?
  8. Budget ? Re-use (some) LCG machines once they get replaced ? Re-use disks from the LCG farms for data store as long as the old h/w can deal with them.
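Regarding Steve's remark under item 4: if the in-stream EBCDIC-to-ASCII conversion disappears from Castor, the conversion could be done on the client side. The Python sketch below assumes the files contain character data in code page 037 (Python ships `cp037` and `cp500` codecs; which code page the experiments' files actually use would have to be checked) and converts a stream chunk by chunk. Binary fields, such as IBM floating-point numbers, cannot be converted this way.

```python
import io

def ebcdic_stream_to_ascii(src, dst, codepage: str = "cp037",
                           chunk_size: int = 1 << 16) -> int:
    """Copy *src* (EBCDIC bytes) to *dst* as ASCII bytes; return bytes read.

    The EBCDIC code pages used here are single-byte, so a chunk boundary
    can never split a character. Characters with no ASCII equivalent
    are written as '?'.
    """
    total = 0
    while chunk := src.read(chunk_size):
        dst.write(chunk.decode(codepage).encode("ascii", errors="replace"))
        total += len(chunk)
    return total

src = io.BytesIO(bytes([0xC1, 0xC2, 0xC3, 0x40, 0xF1]))  # "ABC 1" in EBCDIC
dst = io.BytesIO()
ebcdic_stream_to_ascii(src, dst)
print(dst.getvalue())  # b'ABC 1'
```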

g)    Experiments' issues

The situation concerning data analysis of the various LEP experiments is very different.

OPAL and DELPHI still have very active analysis programs for at least two or three more years. On the other hand, the ALEPH collaboration will disappear next summer at the latest, as soon as the 8 to 10 pending publications are completed; there has been no general software evolution in ALEPH for 6 months, apart from very minor modifications for RH 7.3. The situation for L3 is not yet known.

Therefore, it seems very difficult to have a _common_ archiving policy NOW for experiments which are in such different situations!

On the other hand, all expts rely on several common services ("infrastructure") like CERNLIB, Castor, and others which are not expected to change (at least not in a non-backward compatible way) in the next 2-3 years (*** to be checked with IT ***).

h)    Scenarios for the "museum system"

A) Set up a couple of homogeneous boxes in a corner with RedHat 7.3 and freeze the existing state of analysis s/w from all experiments. This system nevertheless needs maintenance to deal with the evolution of Castor.

B) Set up a few boxes now for the common infrastructure components and add a few boxes, set up and validated, for the already "frozen" experiments. Once the activities in OPAL and DELPHI come to an end, the system will be extended for them. In this scenario, there is no need for a homogeneous system for the "experiment machines"; the only requirement will be compatibility, again with the maintenance aspect for Castor.

C) Run all the museum system(s) in a virtual environment like VMware. This assumes that the environment (called "VMware" in the following for simplicity) will be available for the expected time (5-20 years), and that it will always provide a version emulating today's "obsolete" hardware! If an open-source package were available, this could probably be guaranteed with some (small) effort of porting to the new platforms. However, it is not clear whether a commercial company will do this (or can be persuaded/paid to do it), and whether the company will live long enough and remain affordable. The big advantage of this approach is that the museum system will be independent of the underlying h/w.

D) In a combination of B) and C), one could envisage starting with B) and following the development of virtual environments over the next few years. Once an adequate environment is available, the whole system could be migrated. In this context it might be useful to contact the people of the "Digital Curation Centre", who contacted us in the initial phase (see an email from early November 2003 and the link to their emulation project CAMiLEON).

Common issues for A) and B): a homogeneous set of boxes is needed for compatibility. To reduce costs, a "staged" system should be envisaged, in which the initial set of 5-10 boxes for the museum system is bought as part of a regular farm upgrade, and boxes from the same batch are added to the system once they are taken out of the farms, e.g., adding 20(-50?) boxes after three years, another 20(-50?) after four years, etc. (These numbers are only guesses; real numbers should be estimated based on experience with the present farms.) Similar arguments apply to hard disks (at some point, new disks will not work in the old h/w), although some can probably be re-used from spare boxes.
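The staged build-up can be tabulated with a few lines of Python. The sketch below simply plugs in the guesses from the paragraph above (5-10 initial boxes, 20-50 added after three years and again after four years); it does not model box failures or retirements, which real planning would need.

```python
def boxes_per_year(initial: int, additions: dict[int, int], years: int) -> list[int]:
    """Cumulative number of museum boxes at the end of each year 0..years."""
    total = initial
    counts = []
    for year in range(years + 1):
        total += additions.get(year, 0)
        counts.append(total)
    return counts

low = boxes_per_year(5, {3: 20, 4: 20}, years=4)
high = boxes_per_year(10, {3: 50, 4: 50}, years=4)
print(low)   # [5, 5, 5, 25, 45]
print(high)  # [10, 10, 10, 60, 110]
```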

i)       Contact people

  • Jacques.Boucrot@cern.ch (ALEPH)
  • Joel.Closier@cern.ch (ALEPH)
  • Maria.Kienzle@cern.ch (L3)
  • Steve.O'Neale@cern.ch (OPAL)
  • Thorsten.Wengler@cern.ch (OPAL)
  • Richard.Gokieli@cern.ch (DELPHI)
  • Tony.Cass@cern.ch (IT)

 

Appendix B: Castor issues

This appendix refers to $Revision: 1.1 $ of the “Castor.html” document.

a)     Initial meeting on 27 Nov. 2003

These are the minutes of the meeting on Castor for the LEP Data Archive (27-Nov-2003; Olof Barring, Jean-Damien Durand, AP)

In an initial discussion of how to access the LEP experiments' data stored in Castor from the LEP Data Archive (aka "museum system"), the following issues were raised:

  • it is assumed that Castor will be available and maintained during the whole life of the LEP Data Archive. Even if at some point the "mainstream" data storage system at CERN is replaced and Castor "frozen" (thus becoming part of the "museum system"), maintenance of the hardware and media migration have to continue.
  • on a less critical level, it is expected that the underlying transport protocols (TCP/IP version 4) will be available and compatible during the life-time of the museum (at least through a proxy/gateway).
  • it is preferable to have a "proxy" system to translate the present "old" rfio protocol to any newly developed protocol used in Castor. This avoids having to maintain additional hardware for a "frozen" version of Castor. Some effort to maintain this proxy/gateway is expected to be needed for the whole life-time of the museum system.
  • it needs to be clarified which of the protocols are in fact used by the expts: only rfio (the preferred solution) or also the stager and/or nameserver protocols. This has an impact on the amount of maintenance work for the future, although it might be possible to replace the stager and/or nameserver by a "dummy" implementation and use only the rfio protocol (with a potentially small performance drop).
  • as the TMS system is going to be phased out this year and will therefore not be available in the museum system, the experiments have to make sure they do not need it in their s/w (not even "hidden" in special parts).
  • on a side-issue, it needs to be verified that an ssh client is always available (and ported) on the firewall/gateway machine to allow (indirect) remote access to the museum system, even when the "mainstream" ssh becomes incompatible (e.g., due to security issues).
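The proxy idea from these minutes can be illustrated with a small adapter. In the Python sketch below, the museum side talks to a frozen open/read/close interface, while a replaceable backend object speaks whatever protocol the production Castor uses at the time, so only the backend has to be rewritten when Castor evolves. A dummy stager that treats every file as already available stands in for the stager, as suggested above. All class and method names are hypothetical (this is not the RFIO API), the file path is made up, and the in-memory backend only serves to make the example runnable.

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """Adapter for whatever protocol production Castor speaks at a given time."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes:
        """Return up to *length* bytes of *path* starting at *offset*."""

class LegacyAccessProxy:
    """Frozen, rfio-like interface offered to the museum nodes."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend
        self._files: dict[int, list] = {}  # handle -> [path, current offset]
        self._next_handle = 0

    def open(self, path: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._files[handle] = [path, 0]
        return handle

    def read(self, handle: int, n: int) -> bytes:
        path, offset = self._files[handle]
        data = self.backend.read(path, offset, n)
        self._files[handle][1] = offset + len(data)
        return data

    def close(self, handle: int) -> None:
        del self._files[handle]

class DummyStager:
    """Stand-in stager: every file is treated as already on disk."""

    def stage_in(self, path: str) -> str:
        return path  # no tape recall; the proxy reads the file directly

class InMemoryBackend(StorageBackend):
    """Toy backend used only to make the example runnable."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files

    def read(self, path: str, offset: int, length: int) -> bytes:
        return self.files[path][offset:offset + length]

proxy = LegacyAccessProxy(InMemoryBackend({"/castor/cern.ch/example/run1.dat": b"abcdef"}))
h = proxy.open(DummyStager().stage_in("/castor/cern.ch/example/run1.dat"))
print(proxy.read(h, 4), proxy.read(h, 4))  # b'abcd' b'ef'
proxy.close(h)
```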

Concerning the time-planning it is foreseen to set up a prototype "museum system" in the first half of 2004; clarification of the outstanding open issues with the experiments and IT should be done by end Jan 2004.

 

Appendix C: Response from IT

On September 2nd the following response from IT was received from Tony Cass:

 

This was a timely message as I have discussed the issue with Wolfgang recently. The manpower required to establish a museum service has been reviewed and the needs can be met from existing IT resources. However, there are some points to consider with respect to your memo. Key points are:

  • we would not reproduce the /afs/cern.ch/user tree within the museum system, nor maintain the existing user entries in /etc/passwd. Only a limited number of dedicated user accounts (e.g. aleph1, aleph2, ...) would be available in the cluster.
  • a key element of the museum cluster is the gateway to allow castor requests from the cluster to be transferred to the production CASTOR services outside and for the data to be passed back. Development of this gateway cannot start until after the release of the new CASTOR software. We do not expect this gateway to be available before Easter 2005.
  • special installation procedures are needed for the museum cluster nodes. Developing these procedures can start only after the completion of the migration to SLC3, so a minimal cluster (without any access to data) would not be available before mid-February 2005.

 

If we are to go ahead with the setup of the cluster, we would clearly need commitment from the LEP experiments to test the environment extensively.