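I recently upgraded the system with yum update, and afterwards the jps command no longer worked. The cause turned out to be that every link for jps and java had become invalid. After I manually changed the location, jps worked again, but HBase still called the old location, and fixing each reference one by one was too tedious. Recreating the link with ln jps /usr/lib/jvm/java/bin/jps solved the problem on that machine, but several other machines still failed; reinstalling Java with yum remove java-1.8.0-openjdk* and yum install java-1.8.0-openjdk* fixed everything there. A few other machines had no such problem after yum update and worked normally, and I don't know why. All of them were upgraded from CentOS 6.5 to CentOS 6.7.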
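To spot this kind of breakage quickly on each machine, a small check can test whether a command path still resolves to an existing file. On CentOS these commands are usually a symlink chain through /etc/alternatives, so a package upgrade can leave the chain pointing at a JDK directory that no longer exists. This is a minimal sketch; the function name check_link is hypothetical, and the paths you pass it (for example /usr/bin/java and /usr/bin/jps) are assumptions about where the links live on your machines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a path (usually a symlink) resolves
# to a file that still exists. Returns 0 if it does, 1 if the link is broken.
check_link() {
  local path="$1"
  # [ -e ] follows symlinks, so it is false for a dangling link.
  if [ -e "$path" ]; then
    echo "ok: $path -> $(readlink -f "$path")"
    return 0
  fi
  # readlink -m (GNU) canonicalizes even when components are missing,
  # so we can still show where the broken link points.
  echo "broken: $path -> $(readlink -m "$path")"
  return 1
}
```

For example, running check_link /usr/bin/java and check_link /usr/bin/jps on every node shows which machines need the links repaired or Java reinstalled.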
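Then, when I ran HBase, it reported a pile of errors, roughly saying that ZooKeeper could not connect to the host. Running jps on the master node showed no HMaster process. After I started HMaster with xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh start master, jps showed the HMaster process, but when I ran jps again a moment later, HMaster had already shut itself down.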
Checking the log with cat /root/xyhadoop/hbase-1.0.1.1/bin/../logs/hbase-root-master-192-168-137-2.out:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/xyhadoop/hbase-1.0.1.1/lib/phoenix-4.7.0-HBase-1.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/xyhadoop/hbase-1.0.1.1/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/xyhadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
OpenJDK 64-Bit Server VM warning: You have loaded library /root/xyhadoop/hadoop-2.7.1/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
<strong>0 [main] ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - ZooKeeper create failed after 4 attempts</strong>
<strong>3 [main] ERROR org.apache.hadoop.hbase.master.HMasterCommandLine - Master exiting</strong>
<strong>java.lang.RuntimeException: Failed construction of Master</strong>: class org.apache.hadoop.hbase.master.HMaster
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1988)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:203)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2002)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:512)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:491)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1256)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createWithParents(ZKUtil.java:1234)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:174)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:167)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:531)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:333)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1983)
    ... 5 more
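Look at the parts I bolded: the ZooKeeper create call failed after 4 attempts, HMaster seemed to already exist, a Java RuntimeException was thrown, and the Master could not be constructed. The underlying cause reported at the bottom of the trace is a ConnectionLoss for the /hbase znode.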
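Because the .out file mixes SLF4J and JVM warnings with the actual failure, it can help to filter for the error lines before reading the whole trace. A minimal sketch, assuming GNU grep; the function name key_lines is hypothetical, and the pattern is just a reasonable first filter, not an exhaustive one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) only the lines of an
# HBase .out log that usually matter for diagnosis: ERROR log lines,
# exception headers, and "Caused by:" lines. Stack frames are skipped.
key_lines() {
  local logfile="$1"
  grep -nE '(ERROR|Caused by:|Exception)' "$logfile"
}
```

For this incident, key_lines /root/xyhadoop/hbase-1.0.1.1/logs/hbase-root-master-192-168-137-2.out would surface the bolded lines and the ConnectionLoss cause without the SLF4J noise.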
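The suggestion that HMaster already existed puzzled me: jps clearly showed that it had already shut down, and netstat showed neither the process nor its ports in use. Searching online turned up two suggestions: reformat the NameNode, or suspect that the nodes' clocks or data were out of sync. I synchronized the clocks with ntp, but the problem remained. Backing up the data on HDFS, rebuilding HDFS and copying the data back would have been too disruptive, so I did not try it.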
Later it suddenly occurred to me: since the error said HMaster already existed, why not stop it first and then start it again?
[root@192-168-137-2 ~]# xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh stop master
no master to stop because kill -0 of pid 4383 failed with status 1
[root@192-168-137-2 ~]# xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh stop master
no master to stop because no pid file /tmp/hbase-root-master.pid
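Note the two messages: the first says kill -0 of pid 4383 failed, the second says the pid file was not found. So although the first stop did not succeed, it did delete the corresponding pid file, and the second message also shows where that file lives (/tmp/hbase-root-master.pid).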
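The two messages fit a pid-file check of the following shape: read the pid, probe it with kill -0 (which sends no signal and only tests whether the process exists), and remove the pid file whatever the outcome. This is a simplified sketch reconstructed from the messages above, not the actual hbase-daemon.sh code; the function name stop_by_pidfile is hypothetical, and the real script also waits for the process to exit after sending the signal.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical sketch of a pid-file based "stop", modeled on
# the messages printed by hbase-daemon.sh stop master. Not the real script.
stop_by_pidfile() {
  local pidfile="$1" pid status
  if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    # kill -0 only checks that the pid exists and can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
      echo "stopping pid $pid"
      kill "$pid"
    else
      status=$?   # exit status of the failed kill -0 (1 for a dead pid)
      echo "no master to stop because kill -0 of pid $pid failed with status $status"
    fi
  else
    echo "no master to stop because no pid file $pidfile"
  fi
  # The pid file is removed in every branch, so a stale file left behind by
  # a crashed HMaster is cleaned up even when nothing was actually stopped.
  rm -f "$pidfile"
}
```

Run twice against a stale pid file, this prints the kill -0 failure first and the missing pid file second, which matches the transcript.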
Now start HMaster and see:
[root@192-168-137-2 ~]# xyhadoop/hbase-1.0.1.1/bin/hbase-daemon.sh start master
starting master, logging to /root/xyhadoop/hbase-1.0.1.1/bin/../logs/hbase-root-master-192-168-137-2.out
[root@192-168-137-2 ~]# jps
2048 AmbariServer
5124 HMaster
4309 HQuorumPeer
3595 SecondaryNameNode
5197 Jps
3758 ResourceManager
3374 NameNode
[root@192-168-137-2 ~]# jps
2048 AmbariServer
5124 HMaster
4309 HQuorumPeer
5433 Jps
3595 SecondaryNameNode
3758 ResourceManager
3374 NameNode
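It started successfully and jps showed the HMaster process. After waiting a while and running jps again, HMaster was still there. In the hbase shell everything worked normally, and list showed all the tables. With that the problem was solved, and I've written down the process to share the experience.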
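Since the original symptom was an HMaster that showed up in jps and then vanished a moment later, a single jps call is not enough to confirm the fix. The repeated check can be scripted as below. This is a minimal sketch; the function name still_running and its default of 3 checks 5 seconds apart are hypothetical choices, and it assumes jps prints "pid name" per line as in the transcript above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a JVM process (matched by its jps name)
# is still alive across several checks spaced a few seconds apart.
# Usage: still_running NAME [CHECKS] [INTERVAL_SECONDS]
still_running() {
  local name="$1" checks="${2:-3}" interval="${3:-5}" i
  for ((i = 1; i <= checks; i++)); do
    # jps prints "<pid> <name>"; match the name column exactly.
    if ! jps | awk '{print $2}' | grep -qx "$name"; then
      echo "$name gone at check $i"
      return 1
    fi
    if [ "$i" -lt "$checks" ]; then
      sleep "$interval"
    fi
  done
  echo "$name still running after $checks checks"
  return 0
}
```

For example, still_running HMaster 6 10 watches the master for about a minute after hbase-daemon.sh start master.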
References:
http://www.aboutyun.com/thread-5882-1-1.html
http://www.aboutyun.com/thread-5883-1-1.html
http://blog.chinaunix.net/xmlrpc.php?r=blog/article&id=4008535&uid=26275986