Tuesday, August 9, 2011

Solaris: Maintenance Mode

Sometimes a Solaris system will not boot all the way up into multiuser mode because some service has entered the maintenance state.  This can be very frustrating since you more than likely will be able to ping the host, and for all practical purposes it's up, but it's essentially in single user mode until the problem is properly resolved.  So break out the crash cart, or KVM, and log in, and we'll look at how to properly troubleshoot it.


Note: Sun Support has told me that sudo is not supported (and while Oracle now owns Solaris, it will always be SUN to me).  sudo works just fine, always has, but it is understood that it is not the desired way of doing things (a topic for later discussion).


I may update this example later, but for now this one is readily available.  I'm fairly certain that this example would not prevent the system from coming all the way up on reboot, but the diagnosis would be about the same.  As it goes, I was editing the sshd_config file on a system, and as is my habit with such changes, I wanted to validate the changes before logging off.  What I found out was that no one could log in to the system.  I couldn't understand it, as the change I made was just adding authorization for logins to be tied to another group (e.g., AllowGroups).  I removed the entry I had added, but was frustrated to find that the problem persisted.  After a few other changes to the file, I decided that the conf file was not the problem, and went to see what the state of the service was:


$ sudo svcs status ssh
STATE          STIME    FMRI
online         Jul_20   svc:/network/nfs/status:default
maintenance     9:50:12 svc:/network/ssh:default


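Aha!  Maintenance mode.  (The nfs/status line shows up only because svcs treats each argument as a service name pattern, so "status" matched the NFS status monitor; svcs ssh alone would have been enough.)  So what do you do now?  Well, let's get some details: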


$ sudo svcs -xv status ssh
svc:/network/nfs/status:default (NFS status monitor)
 State: online since July 20, 2011 09:44:45 AM CDT
   See: man -M /usr/share/man -s 1M statd
   See: /var/svc/log/network-nfs-status:default.log
Impact: None.

svc:/network/ssh:default (SSH server)
 State: maintenance since July 28, 2011 09:50:12 AM CDT
Reason: Start method failed repeatedly, last exited with status 255.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M sshd
   See: /var/svc/log/network-ssh:default.log
Impact: This service is not running.
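
As an aside, on a box with several broken services it can help to pull out just the log paths for the ones in maintenance.  The following is a minimal sketch, not anything Sun ships: maintenance_logs is a hypothetical helper name, and it assumes the svcs -xv layout shown above (a State: line followed by See: lines).  It reads from stdin so it can be tried against pasted output without a Solaris host.

```shell
# Hypothetical helper: print the log file for each service that
# svcs -xv style output reports as being in maintenance.
maintenance_logs() {
    awk '
        # Remember whether the current service block is in maintenance.
        /^ *State:/ { in_maint = ($2 == "maintenance") }
        # Only print See: entries that point at an SMF log file.
        in_maint && /^ *See: \/var\/svc\/log\// { print $2 }
    '
}

# Sample input trimmed from the output above.
sample='svc:/network/nfs/status:default (NFS status monitor)
 State: online since July 20, 2011 09:44:45 AM CDT
   See: /var/svc/log/network-nfs-status:default.log
Impact: None.

svc:/network/ssh:default (SSH server)
 State: maintenance since July 28, 2011 09:50:12 AM CDT
Reason: Start method failed repeatedly, last exited with status 255.
   See: /var/svc/log/network-ssh:default.log
Impact: This service is not running.'

printf '%s\n' "$sample" | maintenance_logs
```

On a live system the same function could be fed from svcs -xv directly.  Run without any service names, svcs -x explains only the services that are enabled but not running (or that are blocking another enabled service), which is usually the list you want anyway.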


Well, the svcs -xv output still doesn't tell me enough, but notice how it tells me which log file to look in?  Let's look there.


$ cat /var/svc/log/network-ssh:default.log
...
[ Jul 28 09:50:11 Rereading configuration. ]
[ Jul 28 09:50:11 Executing refresh method ("/lib/svc/method/sshd restart"). ]
[ Jul 28 09:50:11 Method "refresh" exited with status 0. ]
[ Jul 28 09:50:11 Stopping because all processes in service exited. ]
[ Jul 28 09:50:11 Executing stop method (:kill). ]
[ Jul 28 09:50:11 Executing start method ("/lib/svc/method/sshd start"). ]
/etc/ssh/sshd_config: line 147: Bad configuration option: llow
/etc/ssh/sshd_config: terminating, 1 bad configuration options
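
These SMF logs can get long, and the interesting part is usually what the method itself printed rather than the bracketed restarter messages.  Here is a small sketch of one way to separate the two.  The helper name method_output is made up for illustration, and it assumes that restarter entries are always wrapped in "[ ... ]" as they are in the log above.

```shell
# Hypothetical helper: drop the bracketed "[ ... ]" restarter entries
# and keep only the lines written by the service's own method.
method_output() {
    grep -v '^\[ .* ]$'
}

# Sample input copied from the tail of the log above.
sample='[ Jul 28 09:50:11 Executing stop method (:kill). ]
[ Jul 28 09:50:11 Executing start method ("/lib/svc/method/sshd start"). ]
/etc/ssh/sshd_config: line 147: Bad configuration option: llow
/etc/ssh/sshd_config: terminating, 1 bad configuration options'

printf '%s\n' "$sample" | method_output
```

Against a real log this would look something like method_output < /var/svc/log/network-ssh:default.log, which leaves just the complaints from sshd.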


This log points out the line in the sshd_config file that was causing my problems.  I had somehow deleted the first character of AllowGroups so that it read llowGroups.  After fixing the faux pas, I then had to clear the event in order to get the service out of maintenance mode:

$ sudo svcadm clear ssh

Once cleared, I was able to reread the configuration file, and everything was right again with the world:

$ sudo svcadm refresh ssh

$ sudo svcs status ssh
STATE          STIME    FMRI
online         Jul_20   svc:/network/nfs/status:default
online         9:59:59  svc:/network/ssh:default

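Now, I'm not sure whether it reread the configuration file when I cleared the event; I should have validated it.  As far as I understand it, clearing a service in maintenance tells the restarter the problem has been repaired, so it runs the start method again, and a freshly started sshd would read the corrected file, which would make the refresh redundant.  Regardless, in the event that your system is stuck in single user mode, it will NOT continue to boot up until you have fixed the underlying problem and cleared the event.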
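
Since I skipped the validation step, here is a rough sketch of the check I could have scripted.  The function name is_online is hypothetical; the svcs flags are real (-H drops the header line and -o state prints only the STATE column), but the function assumes the name you pass matches exactly one service instance.

```shell
# Hypothetical helper: succeed only if the given service reports "online".
# Assumes svcs is on the PATH and that "$1" matches a single instance.
is_online() {
    state=$(svcs -H -o state "$1" 2>/dev/null)
    [ "$state" = "online" ]
}

# Example use after fixing the underlying problem (not run here):
#   sudo svcadm clear ssh && is_online ssh && echo "ssh is back"
```

Keep in mind that a service can pass through a transitional state for a moment right after the clear, so a single immediate check may report failure even though the service comes up a second later.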








6 comments:

  1. Excellent job posting this, man. I had just modified the sshd_config file (previous backup) to test some changes and the ssh service just failed at start. The Sun Server has Solaris 10, so your topic rules.

    I checked that log too, but never saw the same line you highlighted in yellow. I rechecked and it perfectly matches the changes I made.

    Thanks a lot

  2. Kudos on this article! Very concise and informative. Really helped me cut through to the issue that knocked sshd offline. I always struggle when I swing over to Solaris from the various Linux builds I normally admin.

  3. thank you very much. explanation is very clear

  4. root@solaris:~# svcs -xv
    svc:/application/database/mysql:version_55 (MySQL Database Management System)
    State: maintenance since August 12, 2017 02:05:10 PM EDT
    Reason: Restarting too quickly.
    See: http://support.oracle.com/msg/SMF-8000-L5
    See: man -M /usr/share/man -s 1 MySQL 5.5
    See: http://dev.mysql.com/doc
    See: /var/svc/log/application-database-mysql:version_55.log
    Impact: This service is not running.

    this command does not clear it
    root@solaris:~# svcadm clear svc:/application/database/mysql:version_55

    the log:
    /usr/mysql/5.5/bin/mysqld_safe --defaults-file=/etc/mysql/5.5/my.cnf --user=mysql --datadir=/var/mysql/5.5/data --pid-file=/var/mysql/5.5/data/solaris.pid
    [ Jul 4 11:25:26 Method "start" exited with status 0. ]
    [ Jul 4 11:25:27 Stopping because all processes in service exited. ]
    [ Jul 4 11:25:27 Executing stop method ("/lib/svc/method/mysql_55 stop 116"). ]
    [ Jul 4 11:25:27 Method "stop" exited with status 0. ]
    [ Jul 4 11:25:27 Executing start method ("/lib/svc/method/mysql_55 start"). ]
    /usr/mysql/5.5/bin/mysqld_safe --defaults-file=/etc/mysql/5.5/my.cnf --user=mysql --datadir=/var/mysql/5.5/data --pid-file=/var/mysql/5.5/data/solaris.pid
    [ Jul 4 11:25:27 Method "start" exited with status 0. ]
    [ Jul 4 11:25:27 Stopping because all processes in service exited. ]
    [ Jul 4 11:25:27 Executing stop method ("/lib/svc/method/mysql_55 stop 122"). ]
    [ Jul 4 11:25:27 Method "stop" exited with status 0. ]
    [ Jul 4 11:25:27 Executing start method ("/lib/svc/method/mysql_55 start"). ]
    /usr/mysql/5.5/bin/mysqld_safe --defaults-file=/etc/mysql/5.5/my.cnf --user=mysql --datadir=/var/mysql/5.5/data --pid-file=/var/mysql/5.5/data/solaris.pid
    [ Jul 4 11:25:27 Method "start" exited with status 0. ]
    [ Jul 4 11:25:28 Stopping because all processes in service exited. ]
    [ Jul 4 11:25:28 Executing stop method ("/lib/svc/method/mysql_55 stop 126"). ]
    [ Jul 4 11:25:28 Method "stop" exited with status 0. ]
    [ Jul 4 11:25:28 Restarting too quickly, changing state to maintenance. ]
    [ Jul 9 21:07:18 Leaving maintenance because disable requested. ]
    [ Jul 9 21:07:18 Disabled. ]
    [ Jul 9 21:09:30 Executing start method ("/lib/svc/method/mysql_55 start"). ]
    /usr/mysql/5.5/bin/mysqld_safe --defaults-file=/etc/mysql/5.5/my.cnf --user=mysql --datadir=/var/mysql/5.5/data --pid-file=/var/mysql/5.5/data/solaris.pid
    [ Jul 9 21:09:30 Method "start" exited with status 0. ]
    [ Jul 9 21:09:33 Stopping because all processes in service exited. ]
    [ Jul 9 21:09:33 Executing stop method ("/lib/svc/method/mysql_55 stop 94"). ]
    [ Jul 9 21:09:33 Method "stop" exited with status 0. ]
    [ Jul 9 21:09:33 Executing start method ("/lib/svc/method/mysql_55 start"). ]

    contact me -> shoot@ccskeet.com

  5. how to exit from maintenance mode in Solaris V240

  6. Hi,

    I am having the same maintenance issue, but for smtp:sendmail. I have done the steps above; however, it continuously goes back into maintenance mode :(.

    Can anyone advise me on this?

    Thanks
    CLaire
