
Old LEPP SupportedHardware and Network Status 2004-2010


This page contains information on issues that were under investigation in 2004-2010. If you know of or are experiencing any problems now, please contact the Computer Group.

For information on issues that were under investigation in 2011-2012, see NetworkStatusArchive2.

Current issues are described on the page NetworkStatus.

December 25, 2010 (Saturday)

LNX212 failed again around 6am. Its services were moved to a new system by 10:00am. Any systems that were accessing lnx212's filesystems when it died will need to be rebooted. Please submit a ServiceRequest with any problems or questions.

December 24, 2010 (Friday)

/nfs/linux/cleo, /nfs/linux/cern, and /nfs/cern are currently down after a failure on lnx212, and may not be available for a few days. Please submit a ServiceRequest with any questions or concerns.
  • lnx212 was recovered around 8:3pm

December 14, 2010 (Tuesday)

Accserv (lnx209) was back up and healthy by 12:15pm.

December 13, 2010 (Monday)

After three drive failures, the RAID array on accserv (lnx209) has been damaged. The system is currently up while we attempt to recover the array, but a restore from backups may be necessary. Use of accserv should be avoided where possible. Accserv serves /nfs/acc/libs and the Accelerator SVN repository.

December 2, 2010 (Thursday)

There was a campus-wide power outage last night from about 10:25 pm, December 1, until about 1:15 am, December 2. Although many of LEPP's computer services are up again, it will be many hours before they all are working. The compute farms are likely to be down for at least a day or so.

(There were three bright flashes of light and loud bangs from the East Hill substation on Maple Avenue when the power failed.)

November 30, 2010 (Tuesday)

The CESR Online filesystem was unavailable from approximately 12:00 to 12:10pm.

November 9, 2010 (Tuesday)

ACCFS (lnx113) and lnx186c will be down from 10:00am until approximately 12:00pm. During this time, the following filesystems will be unavailable:
  • /nfs/acc/user (\\accfs\user, \\samba\accuser)
  • /nfs/acc/srf (\\accfs\srf, \\samba\accsrf)
  • /nfs/acc/vacuum (\\accfs\vacuum, \\samba\vacuum)
  • /nfs/acc/temp (\\accfs\temp, \\samba\acctemp)
  • /nfs/cesrta (\\accfs\cesrta, \\samba\cesrta)
  • /nfs/acc/cesrtacam

October 26, 2010 (Tuesday)

LNS61 will be down for about a half-hour starting at 5PM today to disconnect a failing disk enclosure. We're sorry for the short notice.

September 27, 2010 (Monday)

One of the lab's main circuit breakers tripped at about 9:30 AM causing, among other things, loss of all cooling to W221. The Feynman cluster has been shut down. If some form of cooling cannot be restored, we may have to shut down the central "infrastructure" servers, too.

Partial cooling (campus chilled water) was restored at about 10:30.

Full cooling (Liebert) was restored at about 14:50.

September 10, 2010 (Friday)

At about 19:45 Thursday evening, CIT had to block traffic to off campus sites from LEPP's "protected subnet" because someone on the network has been providing access to copyrighted material. We have disabled the connection for the offending computer and have notified CIT. Hopefully they'll be restoring external access soon.

In the meantime, people can use one of the computers in the "public terminal room" to access sites off campus.

External network connectivity was restored at about 10:00 AM on Friday.

August 20, 2010 (Friday)

There will be a power outage at Newman Lab from midnight to 4:00AM on Friday morning. Occupants of Newman Lab should be sure to log out before going home on Thursday evening. There is no need to turn off computers.

August 17, 2010 (Tuesday)

There will be a 15-minute outage in the connection to the campus and the rest of the world, between 9AM and 10AM, while the Lab's network connection to the campus backbone is upgraded.

August 10 - August 14, 2010 (Tuesday through Saturday)

Scheduled power outages at Wilson Lab will cause most computers to be down or unusable. For details, see Computer Power Outage: August 10-14, 2010.

March 5, 2010 (Friday)

Lnx117 (samba) is currently unavailable as the system is recovering from an automount failure.

February 6, 2010 (Saturday)

Lnx112's /nfs/cor/an2 filesystem filled up, bringing the server to a halt. Some space has now been cleared, but more must be opened up to avoid a repeat.

January 5, 2010 (Tuesday)

To fix bugs specific to SL4.5, and for compatibility with newer hardware, the LEPP Computer Group tentatively plans on upgrading all of our SL4 systems to SL4.8 starting at 4am on Tuesday, 1/5/2010. With few exceptions, most of our SL4 systems are currently running SL4.5. For hardware compatibility reasons, ilc320 and ilc321 are already running SL4.8. Please contact the Computer Group with any questions or concerns.

November 25, 2009 (Wednesday)

  • Virtual Servers are currently down (PC570A-D)
  • VPN endpoint is down.
  • The Virtual Servers and VPN endpoint were back up as of 10AM.

November 24, 2009 (Tuesday)

  • A localized power glitch brought down some of the ILC farm as well as a couple of CMS nodes.

November 20, 2009 (Friday)

  • webdb (ELOG), lnx156, lnx128, and part of the linux farm are down following a localized power outage. They will be back up ASAP.
    • These systems were up by 8:30pm

November 4, 2009 (Wednesday)

  • The LEPP Web Proxy seems to be having a problem.
  • Replicon is not available via the normal link. We are investigating.

November 1, 2009 (Sunday)

  • A power glitch at 2AM October 31 took some systems offline, including a CLEO data/mc disk server (lnx114) and the Unix mail/imap/smtp server (lnscu5). The disk server and mail server have been rebooted. Report any remaining problems via the usual channels.

October 26, 2009 (Monday)

  • Due to a compromised personal laptop, Cornell has blocked our protected subnet from reaching off campus, so any laptop users will be unable to browse off campus. The LEPP Computer Group is working to resolve this situation.
  • Connectivity from the protected subnet was restored at approximately 2:30pm.

October 2, 2009 (Friday)

  • lnx243, lnx7107, lnx7108, and lnx7228 were down from approximately 11:00am - 12:30pm while being upgraded to SL5.

September 16, 2009 (Wednesday)

  • A test installation of Matlab 2009b can be run on lnx201 using the command, matlab -r 2009b.

September 8, 2009 (Tuesday)

  • LogMeIn made changes and an upgrade over the weekend; this seems to have broken our links for Remote Access and our Virtual Windows systems. We are investigating.
  • We have found the new links and are updating our Wiki page. If you use remote access to PCs other than the public Virtual Servers, please contact service for an updated link.

September 7, 2009 (Monday)

  • lnx180c, the main Accelerator control system server, was down due to a hardware failure from approximately 10:00 to 10:50. For a clean start, other critical control system nodes were rebooted as needed up to 11:30.

August 24, 2009 (Monday)

  • LEPP experienced an approximately 10 minute power outage at 3:40PM. Most systems went down hard. Recovery will be ongoing at least through Tuesday.
    • Most desktops should restart automatically. You may experience a slowdown logging in, as many users are logging back in at once. If your PC doesn't restart automatically or doesn't appear to be working properly, you may reboot it.
    • CESR Alpha/Tru64 compute nodes CESR63 and CESR73 are down.
    • Most of the Solaris farm will be down at least until Tuesday.

August 12, 2009 (Wednesday)

  • Many of the Linux RAID disk servers are down due to an overnight power glitch
    • 9AM: the Linux RAID disk servers have been recovered

July 15, 2009 (Wednesday)

  • The LEPP Wiki server (wiki.lepp.cornell.edu), serving all of LEPP's wikis, will be unavailable from 5:00 to 5:30pm EDT.
    • This maintenance is now complete. Please contact the Computer Group with any problems. The fulltext search of attachments will not work (and some delays may be seen) until a background task has finished indexing attachments.

July 10, 2009 (Friday)

  • 9:15AM - There appears to be an issue with lnx113, which affects the /nfs/acc* filesystems and SVN. More details to follow. At this time, attempts to access directories served from lnx113 will "hang".
  • 9:40AM - lnx113 is back up.

July 6, 2009 (Monday)

  • While filesystem problems are being corrected, users will notice delays accessing /nfs/acc/user, /nfs/acc/temp, /nfs/acc/srf, /nfs/acc/cesrtacam, and /nfs/cesrta. Please see NetworkedFilesystems for quick steps that can be taken to minimize the effect of these delays.
    • This maintenance was initiated at 10:30am, and completed by 12:20pm.

June 27, 2009 (Saturday)

  • Power will be off in Newman Lab from 6AM until about 6PM. For recommendations about powering down your computer ahead of time, please see the Wiki Page PowerOutage090627

June 25, 2009 (Thursday)

  • Please report any problems with the CESR control system by submitting a service report. There are various issues with the new operating system which are being fixed as they get reported.
    • DHCP service to some laptops was unavailable until about 3:50 Wednesday afternoon.
    • SSH access to many of the CESR computers was restored at about 9:25 Thursday morning

June 24, 2009 (Wednesday)

  • The CESR control system VMS cluster will be down from 10AM until 1AM to attempt to upgrade its operating system.

June 23, 2009 (Tuesday)

  • The CESR control system VMS cluster was down from 10AM until Noon to attempt to upgrade its operating system. Unfortunately, the attempt was unsuccessful due to failing VME software. Another attempt will be made tomorrow.

June 18, 2009 (Thursday)

  • The default mathematica version on linux is now 7.0. Previous versions can be started using the -v option, for example, mathematica -v 6.0.

June 16, 2009 (Tuesday)

  • Our mathematica license server has now been upgraded to version 7.0. Until we make mathematica version 7.0 our default version, it may be run on Linux using the command, mathematica -v 7.0. Previous versions can be run using mathematica -v 6.0, mathematica -v 5.2, etc.

June 8, 2009 (Monday)

  • At 7 AM, LNS61 is once more handling mail. USER$DISK2 has been restored to its state as of June 1, 2009. We will be contacting affected individuals about additional file recovery. (Only a handful of people are affected.)
  • At 8:30AM, Virtual systems PC570A-D and VPN19 are back up. VPN services from LEPP are restored.

June 7, 2009 (Sunday)

  • As of 1:30AM, most Windows, VMS, Tru64 and Linux systems are up, as well as "critical" Solaris servers. The Solaris compute farm will be left down until Monday.
  • One of the user disks on LNS61 has failed. LNS61's mail services will be off until files have been restored from tape. We hope that it will be up Sunday afternoon. Printing services should be unaffected.

June 6, 2009 (Saturday)

  • There was a brief power outage at Wilson and Newman Labs at about 9:30 PM Saturday. We are in the process of recovering from it. The outage also affected West Campus and the Arts Quad.

May 29, 2009 (Thursday)

  • LNX1625 (an AMD cpu) has been shut down temporarily.

May 21, 2009 (Thursday)

  • LEPP's primary Kerberos server had problems starting at about 9:15 AM. It was up again at about 10:00 AM
  • LEPP's nx server, LNX200, had problems starting at about 6:00 AM Wednesday. It was rebooted at about 2:15 PM Thursday. It seems to be OK for the moment.

May 8, 2009 (Friday)

  • Kly101 will be retired at 10:00 am. This will require approximately 30 minutes of downtime for /nfs/erl/sandbox and /nfs/erl/rf. Details of the work required can be found in the ERL Private wiki.
    • This maintenance was complete by 10:30am.

April 21, 2009 (Tuesday)

  • 1:42PM - we have completed maintenance on the virtual machines. All access should now be operational.
  • We will be taking down the Virtual Servers and VPN at 10AM for VMWare maintenance and upgrade. At that time, PC570A-D and VPN access will be unavailable.

April 20, 2009 (Monday)

  • Newman Lab is experiencing a major power distribution problem resulting in power outages. Maintenance to correct this problem will be ongoing until at least 10:00 PM. During this time, printing, networking, and the loft server will be unavailable.

April 18, 2009 (Saturday)

  • A rack of network gear at Newman Lab lost power at around 2:45 Saturday morning. Network connectivity was restored at 13:09 by plugging the rack into another power outlet.

April 11, 2009 (Saturday)

  • The Unix/Linux home disks were offline from 4:10 AM to 8:40 AM.

April 3, 2009 (Friday)

  • The CLASSE Electronic Document Management System will be unavailable starting at 10AM. For more details on this maintenance, please see EdmsUpgrade.
    • This maintenance was complete by Friday afternoon.

March 24, 2009 (Tuesday)

  • The CLEO Restricted Document Database will be unavailable starting at 10AM until later Tuesday or Wednesday.
    • This maintenance was complete by 5:30pm Wednesday 3/25.

February 24, 2009 (Tuesday)

  • The networking software of the CESR VMS Control Cluster has been upgraded from Multinet v4.4 to Multinet v5.2

February 17, 2009 (Tuesday)

  • Unfortunately, the CESR VMS Control Cluster upgrade to V8.3 was not successful. Another attempt with more limited objectives, upgrading only the network software, will be made on Tuesday, February 24.

February 11, 2009 (Wednesday)

  • The CESR Control cluster will be down starting at 10AM, Tuesday, February 17, 2009, to upgrade its operating system from VMS V7.3-2 to V8.3. Downtime is expected to be about one hour.

February 2, 2009 (Monday)

  • LNS101 is up. Please report any configuration problems.

January 28, 2009 (Wednesday)

  • LNS101 is still down due to various hardware, software and filesystem problems. Hopefully it'll be up in the next day or so.

January 26, 2009 (Monday)

  • LNS101 went down over the weekend with a failed system disk. The computer will be down most of Monday while the content of the disk is recreated from scratch. Network-mapped disk access from Windows to Linux disk shares will be unavailable while lns101 is down.
    • We have configured a new server to access Unix home disk shares from Windows. Users should be able to update their existing mappings from, for example, \\lns101\dab66 to \\samba\home\dab66.
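
Updating a mapping amounts to swapping the server prefix of the share path. The sketch below is illustrative only: it assumes every old share has the form \\lns101\<username> and moves to \\samba\home\<username>, as in the dab66 example, and the function name is hypothetical.

```python
def remap_home_share(old_path: str) -> str:
    r"""Rewrite a \\lns101\<user>[\...] share path to \\samba\home\<user>[\...].

    Hypothetical helper; assumes the old path always starts with \\lns101\.
    """
    old_prefix = "\\\\lns101\\"  # the literal text \\lns101\
    # UNC server names are case-insensitive, so compare in lower case.
    if not old_path.lower().startswith(old_prefix):
        raise ValueError(f"not an lns101 share: {old_path!r}")
    rest = old_path[len(old_prefix):]  # e.g. "dab66" or "dab66\docs"
    return "\\\\samba\\home\\" + rest
```

For example, remap_home_share(r"\\lns101\dab66") gives \\samba\home\dab66; anything after the username is carried over unchanged.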

January 23, 2009 (Friday)

  • The LEPP ELOG server will be unavailable for approximately 30 minutes starting at 10:00am. During this time, the ELOG software will be updated and moved to a more reliable server. All links will be updated to point to this new server, so all old URLs will continue to work. An update will be posted once the maintenance is complete.
    • This maintenance was complete by 10:30am. All old configurations, user accounts, and URLs should continue to work as they did. Please contact the Computer Group with any questions or concerns.

Please email service-lepp@cornell.edu with any problems.

January 14, 2009 (Wednesday)

  • We have identified a failure in the new VPN server. If you cannot connect, please contact service-lepp@cornell.edu for a fix. Note that VPN214 and Hamachi are deprecated, and will be turned off soon.

January 13, 2009 (Tuesday)

  • The LEPP Wiki server was down from approximately 4:45 to 5:00pm after a hardware glitch.

December 1, 2008 (Monday)

  • Incoming mail through the VMS Mail server (LNS61) was delayed between about 8:15 and 10:00 am this morning due to a problem in the spam scanner. That problem has been resolved. No mail was lost.

November 25, 2008 (Tuesday)

  • lnx200 and lnx201 both went down this morning at approximately 5:15. We are investigating. There is no ETA for repair; we will update here as information becomes available.
    • These nodes were back up by 10:15am.

November 21, 2008 (Friday)

  • Users of our Linux Farm are now limited to having 30 jobs running at a time. For more information, please see GridEngine.

November 11, 2008 (Tuesday)

  • Our Virtual Server status page may not correctly show which virtual systems are available. We are investigating. Please try connecting to any virtual computer; note that only one person can be connected to each one at a time.

November 10, 2008 (Monday)

  • At 4PM, Vault access seems to be functional for 32-bit clients. Content Center may still have issues on 64-bit computers.
  • Starting at 12:24 PM, the Vault, and specifically Content Center, is unavailable. The extent of the outage and its specific causes are not currently known. We are working on a resolution; however, there is no current ETA.

November 9, 2008 (Sunday)

  • Starting at approximately 8:30 this morning, the connection from the CESR Private subnet to the rest of the lab went down.
  • This issue was resolved by 10:45am.

October 30, 2008 (Thursday)

  • The LEPP connection to the campus network failed at about 15:00 this afternoon. We are working on the problem.
  • The main LEPP router was restored to operation at about 16:00. There are still some relatively minor problems we're working on.

October 27, 2008 (Monday)

  • At least 1000 computers on campus are infected with malware carried by USB memory sticks. See USB-BOT information for more details. Take your USB thumb drives to CIT to have them inspected. Reboot your Windows computer when requested. LEPP hopes to have a scanning station available this afternoon.
  • 10:30AM - LEPP has a mobile scanning station available. Please contact service@mail.lepp.cornell.edu if you need to have your USB devices scanned.
  • LEPP users can now test USB devices from any LEPP Linux system.
  • 3:05PM - Windows systems have had mitigation patches pushed to them. We will be contacting users who appear to have infections over the next day or so to try and resolve the issues.
  • 3:50PM - some scripts for home users running XP can be found at the bottom of this page: https://wiki.lepp.cornell.edu/lepp/bin/view/Computing/OutsideSupport

October 20, 2008 (Monday)

  • Replicon will be unavailable while it is upgraded to a newer version to handle newer browsers like Firefox 3.
  • 4PM - Replicon is now available for use with Firefox 3.

October 4, 2008 (Saturday)

  • The "public" network connection between Wilson and the LEPP trailer failed at about 14:30. Many farm and server systems were unavailable. Rich got it working again at 16:10. We will have to schedule maintenance on the central LEPP network switch, which will cause a significant outage of all LEPP networks.

October 3, 2008 (Friday)

  • The linux Grid Engine master node was down from approximately 10:00 to 11:15am after suffering from a hardware failure.

October 1, 2008 (Wednesday)

  • The LEPP Wiki server (lnx18) was down from approximately 1:10pm to 1:20pm after suffering from a hardware failure.

September 30, 2008 (Tuesday)

  • Replicon 7.5 (our Web Timesheet software) is not compatible with newer releases of Firefox. Until we are able to upgrade Replicon, please use a different browser when working with Replicon: IE6 on Windows, SeaMonkey on Linux, or Safari on Mac OS X (Safari v3.1.2, which requires OS X v10.4 or higher).

September 27-28, 2008 (Saturday and Sunday)

  • LNX117 and its fileshares will be very heavily used by the Inventor 2009/Vault deployment. Access to other filesystems shared from LNX117 may be degraded significantly.

September 26, 2008 (Friday)

  • The Vault will be taken down for upgrades to version 2009 at 3PM. It will be accessible by noon on Monday, September 29th.

August 28, 2008 (Thursday)

  • SSH key-based authentication: For heightened security, users can use ssh-keygen to create authentication keys for use on our Unix systems. However, users must also set a passphrase when generating the key, so that if somebody obtains a copy of your private key, the risk of them gaining access to your account is reduced. For more information, please see the US-CERT (Computer Emergency Response Team) warning for SSH Key-based Attacks on Linux-based services.

August 21, 2008 (Thursday)

  • The CESR VMS Control cluster will be down from 11AM until about 2PM to replace the shared user disk. All of the CESR 2x computers will be unavailable.
  • The CESR VMS Control cluster's user disk has been replaced. The new disk is connected as CESR29$DKB200:. The previous disk is still connected as CESR29$DKC100:.

August 20, 2008 (Wednesday)

  • Performance of the CESR VMS Control cluster will be poor while the contents of the shared user disk are copied to a new disk.

August 5, 2008 (Tuesday)

  • All Unix home disks were down from approximately 5:30pm-6:00pm after the home disk server's RAID Controller locked up.

July 25, 2008 (Friday)

  • LNX201 has replaced lnx102 as LEPP's general-use interactive machine. Please begin using lnx201 instead of lnx102.

July 22, 2008 (Tuesday)

  • At 10:30am, /nfs/axp/cleo3 will be unavailable for up to one hour while the filesystem moves to a linux RAID filesystem. Please note that this filesystem contains the CLEO cvs repository. Any processes accessing /nfs/axp/cleo3 or the CLEO CVS repository should be killed before 10am Tuesday. Updates will be posted here when this maintenance is complete.
    • This maintenance was complete by 11:00am. Any systems that were accessing /nfs/axp/cleo3 when the transfer occurred may be unable to see the new filesystem until all processes trying to access the old filesystem have been killed. Alternatively, a quick reboot should be the easiest fix. If you require assistance killing processes or rebooting a system, please email service@lepp.cornell.edu and let the Computer Group know which system you are having trouble on.
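
Before killing anything, it can help to see which processes still hold the old filesystem. Standard tools such as fuser or lsof do this; the following is only a minimal Python sketch, assuming a Linux system with /proc, and the function name is hypothetical. It lists process IDs and does not kill anything.

```python
import os


def pids_using(prefix: str) -> set[int]:
    """Return PIDs whose cwd, root, executable, or open files lie under prefix.

    Linux-only sketch that reads the symlinks in /proc/<pid>/. Without root,
    other users' processes are usually unreadable and are silently skipped.
    """
    # abspath/normpath are purely textual, so they never stat() a hung NFS mount.
    prefix = os.path.normpath(os.path.abspath(prefix)).rstrip("/") + "/"
    found: set[int] = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join("/proc", entry)
        links = [os.path.join(base, name) for name in ("cwd", "root", "exe")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # fd closed meanwhile, or permission denied
            if (target + "/").startswith(prefix):
                found.add(int(entry))
                break  # one match is enough for this process
    return found
```

For example, sorted(pids_using("/nfs/axp/cleo3")) lists candidate processes to inspect before deciding whether to kill them or simply reboot.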

June 30, 2008 (Monday)

  • /cdat/linux/tem filled around 12:00am, taking down lnx106 and the filesystems it serves (/cdat/linux/tem, /nfs/grp/syr, /nfs/grp/uk, /nfs/cms/mc1, /nfs/grp/blep, /nfs/lcg/internal, /nfs/grp/ill, /nfs/grp/cesr, /nfs/grp/srf2005, and /nfs/cleoc/mc6). lnx106 and its disks were restored as of 10:00am, and the appropriate users have been notified. Please notify the Computer Group of any remaining problems.

June 26, 2008 (Thursday)

  • Lnx102 and /nfs/linux/opt were unavailable from 10:50 - 10:55 after we were forced to conduct an emergency reboot of lnx102. Please report any remaining problems to service.

June 24, 2008 (Tuesday)

  • An update to freetype pushed by Red Hat has broken OpenOffice on all SL3 and SL4 systems. A workaround is now in place on lnx102, so until an official fix is released, all OpenOffice work should be done on lnx102.
    • The fix for this bug was pushed to all SL3/4 nodes June 25.
  • At 10:00am, /nfs/cleoc/mc15, /nfs/c3mc/mc4, and /nfs/cleoiii/data4 will be unavailable for up to one hour while the filesystems are moved from lnx117 to lnx118. Any processes accessing these filesystems should be killed before 10am Tuesday. Updates will be posted here when this maintenance is complete.
    • This maintenance was complete by 11:15am. Any systems that were accessing these filesystems when the transfer occurred may be unable to see the new filesystems until all processes trying to access the old filesystems have been killed. Alternatively, a quick reboot should be the easiest fix. If you require assistance killing processes or rebooting a system, please email service@lepp.cornell.edu and let the Computer Group know which system you are having trouble on.

June 13, 2008 (Friday)

  • lnx102 exhibited lingering problems from the June 4th power glitch. The system was rebooted at 2:00pm.
  • lnx134, the CLEO libraries server (/nfs/cleo3), was also rebooted at 2:00pm.

June 4, 2008 (Wednesday)

  • Several servers in LT107 crashed suddenly around 7:00 due to a line voltage glitch. All critical disk servers were back online by 7:30pm, but the LEPP HyperNews server will be down until Thursday morning.
    • By 9:20am June 5, the LEPP HyperNews server and any remaining batch systems were back online.

June 3, 2008 (Tuesday)

  • Starting at 10:00am EDT, edms will become unavailable for approximately 10 minutes while its operating system is upgraded to Scientific Linux 5. During this time, LEPP's Indico installation (used for conference and workshop management) will also be unavailable.
    • This maintenance is complete, and edms is once again available.

May 17, 2008 (Saturday)

  • lnx106 (serving /cdat/linux/tem, /nfs/grp/syr, /nfs/grp/uk, /nfs/cms/mc1, /nfs/grp/blep, /nfs/lcg/internal, /nfs/grp/ill, /nfs/grp/cesr, /nfs/grp/srf2005, and /nfs/cleoc/mc6) is currently inaccessible. The Computer Group is aware of the problem and working to resolve the issue.
  • /cdat/linux/tem had filled, causing this interruption in service. lnx106 and its disks are once again available, and the appropriate users have been notified.

May 14, 2008 (Wednesday)

  • lnx235, which serves /nfs/cms/sw, will be unavailable from approximately 10:00am to 2:00pm EDT Wednesday, May 14. During this time, lnx235 will be upgraded from Scientific Linux 3 to Scientific Linux 4. A followup will be posted when this maintenance is complete, and please email the Computer Group with any questions or concerns.
  • Lnx235 has been upgraded to SL4 and is now ready for use. Before logging in, you will need to remove the lines containing "lnx235" from your ~/.ssh/known_hosts file in order to avoid receiving the "man-in-the-middle attack" warning. For more details, please see ManInTheMiddleAttack. If any systems are unable to access /nfs/cms/sw, a quick reboot should be the easiest fix. If this is not possible or you encounter a different problem, please email service@lepp.cornell.edu and let the Computer Group know which system you are having trouble on.

May 6, 2008 (Tuesday)

  • Lnx209 (accserv), which serves /nfs/cesr/libs and our central Accelerator subversion repository, will be upgraded to SL4 and unavailable from approximately 10:00am to 2:00pm EDT Tuesday May 6. During this time, nothing can be checked in or checked out of the repository. However, users will be able to continue working with anything that has already been checked-out of the repository. A followup will be posted when this maintenance is complete, and please email the Computer Group with any questions or concerns.
  • Lnx209 has been upgraded to SL4 and is now ready for use. Before logging in, you will need to remove the lines containing "lnx209" or "accserv" from your ~/.ssh/known_hosts file in order to avoid receiving the "man-in-the-middle attack" warning. For more details, please see ManInTheMiddleAttack. If any systems are unable to access /nfs/acc/libs, a quick reboot should be the easiest fix. If this is not possible or you encounter a different problem, please email service@lepp.cornell.edu and let the Computer Group know which system you are having trouble on.
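
The standard OpenSSH way to drop a stale host key is ssh-keygen -R lnx209 (repeated for each name), which also handles hashed entries. As an illustration of the "remove the lines containing the hostname" step itself, here is a minimal Python sketch; the function names are hypothetical, it keeps a backup copy as known_hosts.old, and it cannot match hashed ("|1|...") entries or lines listed only by IP address.

```python
import shutil


def _host_names(field: str) -> set[str]:
    """Expand a comma-separated known_hosts host field into comparable names."""
    names: set[str] = set()
    for entry in field.lower().split(","):
        if entry.startswith("[") and "]" in entry:
            entry = entry[1:entry.index("]")]  # "[lnx209]:2222" -> "lnx209"
        names.add(entry)
        names.add(entry.split(".")[0])  # "lnx209.lepp.cornell.edu" -> "lnx209"
    return names


def remove_known_host_lines(path: str, hostnames: list[str]) -> int:
    """Drop known_hosts lines naming any of hostnames; return how many were dropped.

    A copy of the original file is kept at path + ".old". Hashed entries
    (lines starting with "|1|") contain no plain hostnames and are left alone.
    """
    wanted = {h.lower() for h in hostnames}
    with open(path) as fh:
        lines = fh.readlines()
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0].startswith("@"):
            fields = fields[1:]  # skip markers such as @cert-authority
        if fields and not fields[0].startswith("#") and _host_names(fields[0]) & wanted:
            continue  # stale key for one of the rebuilt hosts
        kept.append(line)
    shutil.copy2(path, path + ".old")
    with open(path, "w") as fh:
        fh.writelines(kept)
    return len(lines) - len(kept)
```

For example, remove_known_host_lines(os.path.expanduser("~/.ssh/known_hosts"), ["lnx209", "accserv"]) (after import os) would cover this notice; the same call with ["lnx235"] or ["lnx180c"] covers the other SL4 upgrades on this page.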

April 30, 2008 (Wednesday)

  • LNS61, the VMS mail server, was unresponsive for much of the morning due to a flood of external failing SSH login attempts tying up all of its network resources while Selden was away at meetings. It should be working as of about 12:30PM.

April 22, 2008 (Tuesday)

  • As of 10:30AM we are doing some maintenance on the Virtual Machines PC570A-D. They will not be available until this is complete.
  • At 10:50AM, the maintenance is complete. All Virtual Machines should be accessible.

April 7, 2008 (Monday)

  • At approximately 2:45 am, the /cdat/linux/tem filesystem served from lnx106 filled and effectively hung the system. This took the following filesystems offline from approximately 2:45am until 10:45am: /nfs/grp/syr, /nfs/grp/uk, /nfs/grp/blep, /nfs/grp/ill, /nfs/grp/cesr, /nfs/grp/srf2005, /nfs/cms/mc1, /nfs/cleoc/mc6, /nfs/lcg/internal. If you still have trouble accessing any of these filesystems, a quick reboot should fix the problem. If you are unable to reboot the system, please email service@lepp.cornell.edu for assistance.

March 31, 2008 (Monday)

  • lnx209 (accserv) will be rebooted at 4:30pm EDT.

March 29, 2008 (Saturday)

  • sol501 is down with a power problem. Parts of data24, data26, data28, and data31-33 are unavailable.

March 25, 2008 (Tuesday)

  • Cornell mail system problems caused a delay in LEPP mail delivery which seems to have been fixed this afternoon.
    1. A CIT mail server handling mail coming into the campus (walnut.mail.cornell.edu) had configuration problems on Friday, causing about a half-day of delay.
    2. A CIT mail server handling both internal and external mail (hermes30.mail.cornell.edu) had a failure in its virus scanner and was not accepting mail. Its problem seems to have started Friday afternoon, was noticed by us and reported Tuesday morning and was resolved Tuesday afternoon.

March 18, 2008 (Tuesday)

  • lnx180c was down for scheduled maintenance from 10:00am to 1:00pm EDT. During this time, /nfs/cesr/[data,instr,xbsm,cbsm,cbpm,cesr_const,cesr_meas,cesr_save] were unavailable. As lnx180c's operating system was upgraded from SL3 to SL4, you will need to update your ssh known_hosts file.

March 11, 2008 (Tuesday)

  • lnx114 was down for scheduled maintenance from 10:00 to 10:30am EDT. During this time, /nfs/cleoc/{data5,data6,data7,mc7,mc9} and /nfs/c3mc/mc2 were also unavailable.

February 20, 2008 (Wednesday)

  • PC570A is currently down for maintenance.
  • 12:15PM - PC570A is back up. Should be faster now.

February 12, 2008 (Tuesday)

  • 1PM: LNS180 - LNS184 are up. Feynman is still down.

February 9, 2008 (Saturday)

  • The Liebert Air Conditioner in W221 (the main computer room) is having problems. In order to reduce the heat load, the CHESS "Feynman" compute cluster has been powered down, as have the members of the Cornell DAF farm (LNS180-LNS184). We hope they'll be up late Monday or Tuesday.

January 23, 2008

  • The Unix mail server is overloaded, which is having a disproportionate effect on access to large mailboxes. There is work in progress to reduce the load and eventually upgrade the current mail server, but that won't happen immediately. In addition to the steps already taken by the Computer Group, the load can be mitigated by archiving older mail in folders. Users with large folders are impacted much more than other users, and also contribute disproportionately to the problem.

January 15, 2008

  • Replicon is having some issues. Specifically, trying to save hour input gives an error that starts with:
    Server Error

    System.Data.SqlClient.SqlException: Could not allocate space for object
    Replicon will not work until this is fixed. We do not have an estimated time for a fix.
  • 1/15/2008 10:47AM - Replicon is now functional.

January 7, 2008

  • PC570a is currently down. We are working on restoring full functionality.
  • 1/8/08 - 11AM: We have restored PC570a, however it is currently not available for use. We are currently testing the system.
  • 2/10/2008 - PC570a is now available for use via LogMeIn for testing. Please follow the instructions on the VNCStatus WikiPage. Please report any issues you have to service@mail.lepp.cornell.edu.

December 9, 2007 (Sunday)

  • The Computer Group is currently working to fix a problem with our Replicon Web TimeSheet, and will send a followup when it has been resolved. Until then, users will not be able to access the Web TimeSheet.
    • This issue was resolved as of Tuesday, December 11. If you have trouble accessing the timesheet, you may need to clear your web browser's cache.

November 30, 2007 (Friday)

  • LNS61 had to be rebooted at about 9:30 AM to recover a disk which was not responding. Mail and print jobs were delayed for about 15 minutes.

November 20, 2007 (Tuesday)

  • The main farm subnet switch will be upgraded starting at 9:00am. Any running jobs will pause and are likely to crash during this upgrade. We hope to have the upgrade complete Tuesday afternoon, at which point any failed jobs may be resubmitted as needed.

October 26, 2007 (Friday)

  • The mail system on LNS61 will be down from 6PM until 7PM this evening while some user directories are being moved from USER$DISK1: to USER$DISK4:. Mail delivery will be delayed. No mail will be lost. We're sorry about the short notice, but space on USER$DISK1 has become critically short.

October 23, 2007 (Tuesday)

  • At approximately 6:25am a UPS failure in our Computer Room brought down our unix home disk server in addition to various infrastructure servers and switches. Service was restored by 9am, although various repercussions from the failure are still being felt. If you experience any difficulty accessing a home disk area, please send an email to service.

October 17, 2007 (Wednesday)

  • LNS61, the VMS mail and print server which provides spam scanning and print services, will be down for about two hours starting at about 6PM this evening. Hopefully it'll be available by 8PM. No incoming mail will be lost: it'll be held temporarily on the LEPP Unix mail server.

LNS61 has been having problems over the past week as a side effect of the system disk having filled up at one point: it keeps running out of file headers despite frequent deletions. The system disk is currently being copied to a larger disk drive which will also have a much larger index file. Downtime will be required this evening in order to copy the mail database and quarantined spam messages. We're sorry for the short notice, but the situation has been getting worse.

  • LNS61 was up by about 7:10PM. As of 8:15PM things still seem to be running smoothly. We can hope that it'll stay that way...

September 5, 2007 (Wednesday)

  • From about 11:45 until about 12:15, Wilson Lab's connection to most of the campus and to off campus networks was down. According to the notice from CIT, "All subnets fed out of the CCC router have no network connectivity at this time. CIT network engineers are currently working to resolve this problem."

August 30, 2007 (Thursday)

  • LNS61 mail service had crashed. It is now up, with a large backlog. Mail should be available.
  • LNS61 is currently not accepting IMAP connections. If you use LNS61 for e-mail, your e-mail will not be accessible.

August 16, 2007 (Thursday)

  • Many LEPP computing services (including web and mail servers) were down starting at about 3:20PM due to an electrical problem in Wilson 221. Central LEPP servers were up again by 4:00 PM.

August 5, 2007 (Sunday)

  • Many central Cornell computing services will be unavailable on Sunday, August 5, 2007, from 5AM until 2PM while their network hardware is upgraded. This includes Mail, Cornell University Library online services, CUinfo, Blackboard, the Electronic Directory, Cornell mailing lists, PeopleSoft, and many others. For details see http://www.cit.cornell.edu/computer/news/articles.html#aug5outage

  • LEPP computing services, including Mail, should be unaffected. Mail addressed to mail.lepp.cornell.edu or lepp.cornell.edu will be delivered as usual. Mail addressed to cornell.edu will be delayed until after Cornell's central services are available again.

July 20, 2007 (Friday)

  • Oracle Calendar is down. Its expected up time is around 1pm. CIT is restoring the calendar program from its 3am backup.

July 20, 2007 (Friday)

  • The w320_xrx_5500 printer is up.

July 17, 2007 (Tuesday)

  • The w320_xrx_5500 printer is down. Service has been called and they should be here Thursday morning. I will try to deploy a second printer Wednesday morning for third floor printing.

July 17, 2007 (Tuesday)

  • Replicon is up. Thanks for your patience.

July 16, 2007 (Monday)

  • Replicon is down. We are aware of the issue and are working to resolve it. A notice will be posted here when it is back up.

July 10, 2007 (Tuesday)

  • All Scientific Linux 4 systems have been updated to Scientific Linux 4.5. Systems will need to be rebooted (at the user's convenience) to pick up the latest kernel. This primarily affects Linux desktops.

July 1, 2007 (Sunday)

  • LEPP's primary kerberos server was down (until Monday morning) and caused delays with various authentication services.

June 27, 2007 (Wednesday)

  • An nfs lockd problem on our home disk server is preventing linux (at least) clients from obtaining locks. We're looking into ways to correct this. If there's no better solution, we'll have to reboot the home server.
  • The LEPP Unix home disk server was rebooted at 5PM Wednesday because things were continuing to get worse. Unfortunately, it did not solve the problem. Unix and Linux systems are still unusable. We are continuing to investigate. We apologize for this serious inconvenience.
  • By 8:30pm, the Unix home disk server was back in service.

June 25, 2007 (Monday)

  • The primary LEPP Kerberos server was once more functional as of 11 AM and Pine is once more working on Tru64 systems. The network switch port it was plugged into had failed. It is now plugged into a different port.

June 23, 2007 (Saturday)

  • The primary LEPP Kerberos server is down. Unix and Linux logins will be slow. This also causes Pine to fail on Tru64 systems, which are computers with names that start with LNS. Pine does work on Linux systems, so you can login on LNX102 to run Pine to read your mail. Theorists should use LNXTH3.

June 6, 2007

  • lnx108 was unavailable from 10:00-10:30am for scheduled system maintenance.

May 28, 2007

  • lnx108 was recovered, and maintenance has been scheduled for June 6.

May 27, 2007

  • lnx108 became unavailable at approximately 12:00am. Until the system is recovered, the following filesystems will be unavailable: /cdat/linux/tem2, /nfs/lcg/catalogs, /nfs/cleoc/data4, /nfs/grp/etab, /nfs/grp/reu, /nfs/cleoc/mc1, /nfs/cleoc/mc5 .

May 16, 2007 (Wednesday)

  • Monday's computer outage schedule has been finalized to the extent that we can. Please read PowerOutageSummary2007.

May 10, 2007

  • There will be scheduled power outages in Wilson Lab starting at 8 AM on Monday, May 21st. Building power will be off starting at 10 AM and back on before 10:30. Trailer and chilled water power will be off starting at 10:30 AM and back on before 11 AM. Details of computer outages will be added shortly at PowerOutageSummary2007.

April 24, 2007

  • The webserver is back up.

April 23, 2007

March 28, 2007

LNS101 was down for about 30 minutes at Noon to replace its external hot-swap disk enclosure. The enclosure's fan had failed and the disks were cooking.

March 27, 2007

LNS61 and LNS62 were down from about 6:30 until 7:00 pm. Updated files were copied to the new system disk while the system was up, then the system was shutdown and the disk replaced. There were some printer queue problems after the reboot, but they seem to have been resolved.

March 26, 2007

LNS61 and LNS62 will be down tomorrow, Tuesday, March 27, from 5PM until approximately 7PM EDT for system software upgrades. Printers will be unavailable and some mail will be delayed until LNS61 is up again.

March 11, 2007

Daylight Saving Time Change - With the Daylight Saving Time change, various interruptions and anomalies may be expected. The computer group has made every effort to ensure systems are properly updated, where possible relying on vendor-supplied patches.
  • All SL4, RH9, and RH7.3 Linux systems were properly updated for the new DST rules. SL3 and RHEL3 systems required manual intervention that went into effect on Monday, March 12.

March 2, 2007

9AM: The LEPP Web server was down overnight due to a disk failure. Files on the Web server itself which were updated during the past week will have to be updated again by their authors: the most recent backup was done on February 23rd. (The backup schedule for the Web server now has been changed to be done daily.) Files in personal public_html directories (in ~ areas) and/or on the Wiki server were not affected.

March 1, 2007

6PM: The following systems have been moved:
  • LNS133
  • LNSVA2
  • LNS526 (Kerberos server)
  • LNSCU5 (Mail server)
  • LNSCU7 (News server)
  • LNS123 (Xterm server)

LNSTH1, LNSTH2 and LNS107 will be moved on Friday.

LNS111 was down for a while due to an intermittent power cable.

February 28, 2007, et seq.

Several systems, including some central LEPP servers, will have to be moved in the next few days, which will require some downtime. The precise schedule has yet to be determined.
  • LNS133
  • LNSTH1, LNSTH2
  • LNSVA2
  • LNS526 (Kerberos server)
  • LNSCU5 (Mail server)
  • LNSCU7 (News server)
  • LNS123 (Xterm server)
  • LNS107 (Tape server)

February 23, 2007

Lnx111 will be unavailable from 10-11am EST. During this time, /nfs/cleoc/data[5,6,7] and /nfs/cleoc/mc7 will move from lnx111 to lnx114. A notice has been sent to the CompEnv CLEO HyperNews forum.
  • This maintenance is complete; /nfs/cleoc/data[5,6,7] and /nfs/cleoc/mc7 were available by 11:00am EST.

February 16, 2007

Maintenance is scheduled for lnx106 from 10:00 to 11:00am EST. A notice was sent to everyone owning a top-level directory on an affected filesystem and to the CompEnv CLEO HyperNews forum.
  • The scheduled maintenance is complete, and lnx106 was back up by 10:30am EST.

February 6, 2007

LNS61 and LNS62 will be down from 5PM until approximately 7PM EST for system software upgrades.

January 19, 2007

Maintenance is scheduled for lnx107 from 10:00 to 11:00am EST. A notice was sent to everyone owning a top-level directory on an affected filesystem and to the CompEnv CLEO HyperNews forum.
  • The scheduled maintenance is complete, and lnx107 was back up by 10:30am EST.

January 3, 2007

LNS61 hung due to disk I/O problems at about 7:30 PM or so. After cycling its power and rebooting, it seems to have recovered.

December 29, 2006

Replicon is unavailable until 8am December 30 for scheduled maintenance.

December 22, 2006

  • A physical volume failure (lnx112) requires emergency maintenance. lnx112 and its disks (/nfs/cor/an1, /nfs/cor/an2, /nfs/cor/temp, /nfs/cor/user, /nfs/cleoc/mc8) will be unavailable until further notice.
    • lnx112 and its filesystems have been restored. In total, the system was unavailable from 12:45pm-1:30pm EST.

November 12, 2006

  • Critical systems in LT107 lost power for 30 minutes around 7 am; many systems needed to be rebooted afterwards.
    • Unix/Linux home disk service was restored at 8:20 am
    • Access to cleo3/c solaris releases was restored at 10 am

November 9, 2006

  • Hardware problems on the primary Kerberos KDC caused authentication problems from 6:30 to 8 am. Services affected were Unix/Linux logins, restricted access web pages, and the Unix mail server smtp/imap/mail.lepp.cornell.edu.

September 13, 2006

  • LNS62 will no longer accept mail delivery from outside the lab on its port 25 (SMTP). It's a slow VAX and the flood of SPAM was getting to be too much for it to handle. Mail addressed to LNS62 is being automatically redirected to LNS61 for delivery. There should be no noticeable difference to anyone using the VMS mail servers.

September 6, 2006

  • lnx107 and its disks were unavailable from 10:00 - 11:00am. This downtime was necessary to replace a failed system disk.

August 22, 2006

  • /nfs/cesr/temp was unavailable from 10:00 - 10:15am. During this time, the volume serving /nfs/cesr/temp was moved to a new server.

July 31, 2006

  • There will be a power outage at Newman Lab on Thursday, August 3rd, from 2AM to 8AM.

July 28, 2006

  • Lnx108 was down from 10:45 to 11:00 am to recover a failed cpu.

July 20, 21 & 24 (Thursday, Friday and Monday)

  • ~5AM to 7AM -- Cornell network switches will have their firmware upgraded. Connections to LEPP may be intermittent on any of these mornings.

July 7, 2006 (Friday)

  • ~2PM until 3:15PM -- Cornell routing problems again. There was no connection to the outside world.

June 29, 2006 (Thursday)

  • lnx754 (lnxcon - the CLEO Constants server) has crashed three times over the past 24 hours from apparent hardware failures. At approximately 12:00, we moved lnx754's disks into a new chassis. The system is currently up and healthy.

June 26, 2006 (Monday)

  • 3 PM: Cornell network problems. No or intermittent connection to the outside world for much of the afternoon.
  • 6:15 PM: Cornell network problems resolved
  • Not all Farm systems up yet.
  • A few systems still down due to disk and other failures.
  • lnx6166 and lnx162[1-3] were upgraded to SL3.

June 24, 2006 (Saturday)

  • 7:30 AM: Wilson Lab without power: no network
  • 1:15 PM: power restored
  • 4PM: critical servers up

June 23, 2006 (Friday)

  • 6 AM: many systems shut down in preparation for power outage
  • LEPP & CLEO trailers without power. Wilson Lab with limited chilled water. Unix home disk down, only critical servers up.
  • 3 PM: power restored.

June 15, 2006

  • More information about the impact of the scheduled power outage is available on the page PowerOutageSummary

June 9, 2006

  • Power outages are scheduled for the week of June 19-24 while work is performed on the transformer pad. The LEPP and CLEO trailers will be without power on Friday, June 23rd: the Unix home disk and compute farms will be unavailable. Wilson Lab will be without power on Saturday, June 24th: all compute services will be unavailable. Additional information will be available next week.

May 31, 2006

  • LEPP nameservers were having problems this afternoon. This caused some off-campus systems to be unreachable. The problem (a firewall configuration error) was resolved at about 14:30.
  • wiki.lepp.cornell.edu was unavailable from 16:00 - 18:00 EDT for scheduled maintenance.

May 12, 2006

  • lnx180c was down for scheduled maintenance from 10:15 to 10:30am

May 11, 2006

  • A power controller tripped and the Linux Raid servers went down at around 10:50 AM EDT.
  • It seems the power controller couldn't support an additional new server. The new server was removed and all Linux Raid servers have been rebooted.

May 10, 2006

  • The LEPP network experienced problems between 1:30 and 2:00 PM EDT. This issue has been resolved.
  • Cornell's central router (and thus connection to the commodity Internet) was down for scheduled maintenance intermittently between 5 and 7AM EDT. [added 22May06] Additional info is available in the Usenet newsgroup cornell.announce.networks

May 1, 2006

  • Cornell's connection to the commodity Internet was down for several hours due to problems with Cornell's central router. [added 22May06] Additional info is available in the Usenet newsgroup cornell.announce.networks

August 17, 2004

  • The RAID fileserver has been restored to a stable and "protected" state, and all filesystems are once again available. We are still investigating the symptoms of this failure with multiple hardware manufacturers.

August 12, 2004

  • The following filesystems are not available:
    • /cdat/linux/tem
    • /nfs/grp/blep
    • /nfs/grp/cesr
    • /nfs/grp/ill
    • /nfs/grp/syr
  • The RAID fileserver holding them is having severe hardware problems. We are in contact with the manufacturer but have no idea when the problem will be resolved.

June 1, 2004

  • Many "non-critical" systems in Wilson 221 have been shut down again. Only the "central servers" are running. Air conditioning has had to be shut down because of leaks in the 85 degree Experimental water system. We hope it will be possible to turn systems on again this afternoon.

May 27, 2004

  • The NFS locking problem which was affecting graphical Linux logins has been resolved. A patch was applied at 11:30 this morning which fixed the problems we've been seeing since the power outage. Please contact the Computer Group with any remaining issues.

May 26, 2004

  • We continue to have problems with our Linux desktops. All Linux systems running RedHat Enterprise Linux and RedHat 9 are experiencing severely long delays when starting (or logging into) the graphical interface.
  • Water flow has been restored to the air conditioner in W221. Most systems have been turned back on.

May 25, 2004

  • All Linux systems running RedHat Enterprise Linux and RedHat 9 are experiencing severely long delays when starting (or logging into) the graphical interface. This appears to be a locking problem with our home disk server and is under investigation.
  • sol301 hung
  • lns717 has died along with utb tape mounting

-- SeldenBall - 26 Oct 2007
Topic revision: r9 - 15 Mar 2020, DevinBougie