LEP Data Archive

$Revision: 1.14 $

News

Introduction

This document is meant as a draft for further discussion. It reflects the current understanding and the issues which have been brought up by the experiments' contacts (see below).

So far, only very preliminary and informal discussions with IT have taken place, as the IT representative was nominated only very recently; a meeting is foreseen for early October.

Related Documents

From a mail from J.J.Blaising after the October Focus meeting

CERN support for LEP collaborations is organized according to the memorandum to the RB. It has been approved by the Research Board. No further meeting changed this decision.

EP will continue to support the LEP LEE group (D. Plane).

Concerning the LEP museum system: In the minutes of the June Focus meeting it is written: "The chairperson asked IT to set up a small working group with one representative of each LEP experiment, a representative of EP/SFT and another one of IT."

Up to now there are no resources available in EP division to support the implementation of such a system.

Policies on access to the experiments' data

The following is taken from various e-mails from the experiments.

Aleph

Please find the Aleph Statement on the use of archived data on http://cern.ch/aleph/alpub/archive_data.pdf

The document is now public (March 19, 2004).

As you probably know, Aleph has already set up its own archiving system, an isolated system based on laptops (Monade), see http://tenchini.home.cern.ch/tenchini/Status_Archiving_6_Mar_2003.pdf

The Aleph archiving project is very close to completion. If a common LEP museum system is going to be set up, the way for us to participate would be with a clone of our present system. For technical details please contact our archiving coordinator, Marcello Maggi (Marcello.Maggi@cern.ch).

Delphi

Update: A statement on DELPHI's rules for future access to DELPHI archived data, as approved by the Collaboration, was received on Dec. 3, 2003. It is available as a PostScript and as a PDF document.

In DELPHI we do not yet have a written document on our data access policy. We have discussed this once in a collaboration board meeting and will have another discussion soon.

Our present opinions are:

  1. We think that future analyses will almost certainly need detailed event information and will need the data in its present format as stored on Castor. We are therefore not planning to also store the data as just 4-vectors.
  2. Access to our data would be granted to outside people only in collaboration with one or more (expert) DELPHI members.
    a) The acting DELPHI Spokesperson has the right to veto access if it is considered to be against the interest of DELPHI or of science in general.
    b) A resulting paper would also be signed by these participating DELPHI persons.
    c) As at present, there would be an 'internal' DELPHI referee, who reads the paper as a journal referee would.
  3. This access policy would become applicable one year after the vast majority of the presently planned papers are published.

In practice, such requests for access would have to be made through the Spokesperson. If the DELPHI computing account continues to exist, such person(s) could then be added to this account group.

L3

L3 does not yet have a document with data access policies; nonetheless, I shall be able to provide the relevant information.

The most likely scenario is one similar to the present access policy, i.e. only members of the collaboration will be granted access to the data. In the recent past we have had cases of individuals with an interest in pursuing some particular search, which they have been allowed and even encouraged to do, but in collaboration with some experienced member of the collaboration.

A simplified version of the implementation of this "status quo" scenario in the access to the museum system is that our archived data (castor) and working space (/afs/cern.ch/l3) should remain accessible only to accounts of the group "xv", as they are now. New accounts in the group "xv" can of course, as now, be created after some L3 internal consultation mechanism and appropriate sponsorship by a member of the collaboration.

As for "how" the museum service should be accessible, we would think of a structure which is essentially the one a "xv" user would see now from lxplus, with castor access to the l3 data and afs work space.

Opal

Opal's policy for future access to their data is available as a PDF document.

General issues and questions asked to the experiments

  1. The s/w of the expts (that includes CERNLIB and other s/w used by the expts, not just the s/w written for the analysis) cannot be ported to future platforms. Therefore, a "museum system" needs to be set up based on a platform that is current "today".
  2. Security: at some point in the future, the system will only be available locally (as security updates will no longer exist), so access will be limited to a few selected accounts, which will have to log in to the machines through a gateway.
  3. AFS: a shared filesystem is needed, but not necessarily AFS. Preferably, the present AFS "directory layout" should be kept for compatibility, e.g., through symlinks.
  4. Castor: Is there an interest in moving all data from Castor to (shared, but local to the system) disk (e.g. in 10 yrs) ? ... issue with s/w ... needs rfiod/castor system (needed anyway ?)
    "The next release of castor will no longer support the instream data conversion from ebcdic to ascii, as a rhetorical question will you please preserve the current version." (Steve)
    Do we need a Proxy/Gateway for Castor in the future (or: how to read the data in 10 yrs from now if Castor has evolved ...) ?
    What is the total size of data, s/w, and documentation for each experiment ?
    Update: A meeting with the CASTOR team took place on 27 Nov. 2003; minutes of that meeting are available.
  5. GPHIGS: can this be used on a "frozen" system without a special (or any ?) license ? Answer: yes, the license is permanent.
  6. Which packages to freeze: tms, fatmen, hepdb, (opcal for OPAL) cgi-bin web tools ? (Steve)
  7. create _two_ museums in different places (risk of fire destroying one) ?
  8. Budget ? Re-use (some) LCG machines once they get replaced ? Re-use disks from the LCG farms for data store as long as the old h/w can deal with them.
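
Item 3 suggests keeping the present AFS directory layout through symlinks. A minimal sketch of that idea in Python follows. The helper name `link_legacy_path`, the `root` argument and the local storage directory are hypothetical; only the path `/afs/cern.ch/l3` is taken from the L3 statement above. On a real museum box the link would live under `/` and need root privileges, so the sketch takes a `root` directory standing in for `/` and can be tried in a scratch area.

```python
import os
from pathlib import Path


def link_legacy_path(root, legacy_path, local_dir):
    """Make legacy_path (e.g. /afs/cern.ch/l3) resolve to local_dir under root.

    root        -- directory standing in for "/" (hypothetical; "/" on a real box)
    legacy_path -- absolute path that the frozen software expects to find
    local_dir   -- existing directory on the museum's own shared disk (hypothetical)
    """
    link = Path(root) / legacy_path.lstrip("/")
    # Create the intermediate directories (afs/cern.ch/) if they are missing.
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        # Replace a stale link; a real directory is never removed here,
        # so os.symlink below fails loudly if one is in the way.
        link.unlink()
    os.symlink(os.path.abspath(local_dir), link, target_is_directory=True)
    return link
```

With such a link in place, frozen software that opens files below the old AFS path would read them from the local shared disk without any change to its configuration; whether every hard-coded path in the experiments' s/w is covered this way would still have to be checked.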

Experiments' issues

The data analysis situation differs considerably between the LEP experiments.

OPAL and DELPHI still have a very active analysis program, for at least another two or three years. On the other hand, the ALEPH collaboration will disappear next Summer at the latest, as soon as the 8 to 10 pending publications are completed; there has been no general software evolution in ALEPH for the past 6 months, apart from very minor modifications for RH 7.3. The situation for L3 is not yet known.

Therefore it seems very difficult to have NOW a _common_ archiving policy for experiments which are in such different situations !

On the other hand, all expts rely on several common services ("infrastructure") like CERNLIB, Castor, and others which are not expected to change (at least not in a non-backward compatible way) in the next 2-3 years (*** to be checked with IT ***).

Scenarios for the "museum system"

A) Set up a couple of homogeneous boxes in a corner with RedHat 7.3 and freeze the existing state of analysis s/w from all experiments. This system nevertheless needs maintenance to deal with the evolution of Castor.

B) Set up a few boxes now for the common infrastructure components and add a few boxes set up and validated for the already "frozen" experiments. Once the activities in OPAL and DELPHI come to an end, the system will be extended for them. In this scenario, there is no need for a homogeneous system for the "experiment machines"; the only requirement will be compatibility. Again, the maintenance aspect for Castor remains.

C) Run all the museum system(s) in a virtual environment like VMware. This assumes that the environment (called "VMware" in the following for simplicity) will be available for the expected time (5-20 years), and that it will always provide a version emulating the "obsolete" hardware of today ! If there were an open-source package, this could probably be guaranteed with some (small) effort of porting to the new platforms. However, it is not so clear whether a commercial company will do this (or can be persuaded/paid to do it), or whether the company will live long enough and remain affordable. The big advantage of this scenario is that the museum system will be independent of the underlying h/w.

D) In a combination of B) and C), one could envisage starting with B) and following the development of virtual environments over the next few years. Once an adequate environment is available, the whole system could be migrated. In this context it might be useful to contact the people of the "Digital Curation Centre", who contacted us in the initial phase (see an email from early November 2003 and the link to their emulation project CAMiLEON).

Common issues for A) and B): The need to have a homogeneous set of boxes for compatibility. To reduce cost, a "staged" system should be envisaged, where -- as part of a regular farm upgrade -- the initial set of 5-10 boxes for the museum system will be bought, and boxes from the same batch will be added to the system once they are taken out of the farms, e.g., adding 20(-50?) boxes after three years, another 20(-50?) after four years etc. (These numbers are only guesses; real numbers should be estimated based on experience with the present farms.) Similar arguments apply to hard disks (at some point, new disks will not work in the old h/w), although some can probably be re-used from spare boxes.
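
The staged growth described above is simple arithmetic, sketched below in Python. The figures are the guesses quoted in the text (5-10 initial boxes, 20 to 50 more after three years and again after four years). The function name and the schedule format are assumptions made for illustration, and boxes that fail and drop out of the system are not modelled.

```python
def boxes_in_service(initial, additions, years):
    """Return the cumulative number of museum boxes at the end of each year.

    initial   -- boxes bought at year 0
    additions -- dict mapping year -> boxes handed over from the farms that year
    years     -- last year to report (inclusive)
    """
    totals = []
    count = initial
    for year in range(years + 1):
        count += additions.get(year, 0)
        totals.append(count)
    return totals


# Low and high ends of the guesses quoted in the text.
low = boxes_in_service(5, {3: 20, 4: 20}, years=5)
high = boxes_in_service(10, {3: 50, 4: 50}, years=5)
for year, (n_low, n_high) in enumerate(zip(low, high)):
    print(f"year {year}: {n_low}-{n_high} boxes")
```

Replacing the schedule with numbers taken from experience with the present farms would give a first estimate of the floor space and spare parts the museum system needs over time.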

Contact people


Andreas PFEIFFER
Version information: $Id: Scenarios.html,v 1.14 2004/04/13 09:12:03 pfeiffer Exp $ Last modified: Thu Apr 08 10:07:04 Western Europe (summer time) 2004