
[Server Failure] JVM Out-of-Memory Failure on Linux

– Environment: Linux, RHEL 7.5
– Tableau Server: 2019.1
Today a customer asked about a problem: when accessing the admin backend on port 8850, the following error appeared:
 
Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback. There was an unexpected error (type=Internal Server Error, status=500).
com.tableausoftware.tabadmin.webapp.exceptions.RestException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss

1. Troubleshooting and description

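First, I had the customer check the server status with the tsm command. The status output reported an error whose key message was "Native memory allocation (mmap) failed to map".

That gives us a useful lead, and the log below helps narrow down the cause.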
 
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 702021632 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2640), pid=6197, tid=0x00007fca569c1700
#
# JRE version:  (8.0_181-b13) (build )
# Java VM: OpenJDK 64-Bit Server VM (25.181-b13 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
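
To put a number on the failure, the requested size can be pulled out of the log header and compared with what the host currently has free. The following is only a minimal sketch: the function names are my own, and treating MemAvailable plus SwapFree from /proc/meminfo as the available headroom is a rough heuristic, not how the JVM or the kernel actually decides whether an mmap succeeds.

```python
import re

# Header line HotSpot writes when a native allocation fails
# (the mmap variant says "failed to map", the malloc variant "failed to allocate").
ALLOC_FAILURE = re.compile(
    r"Native memory allocation \((?P<kind>\w+)\) "
    r"failed to (?:map|allocate) (?P<bytes>\d+) bytes"
)


def failed_allocation_bytes(log_text):
    """Return the size in bytes of the first failed native allocation, or None."""
    match = ALLOC_FAILURE.search(log_text)
    return int(match.group("bytes")) if match else None


def parse_meminfo(meminfo_text):
    """Parse /proc/meminfo-style text into a dict of field name -> value.

    Fields reported in kB are converted to bytes; plain counters are kept as-is.
    """
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue
        number = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            number *= 1024
        values[name.strip()] = number
    return values


def could_satisfy(requested_bytes, meminfo):
    """Heuristic (an assumption of this sketch): enough MemAvailable + SwapFree?"""
    headroom = meminfo.get("MemAvailable", 0) + meminfo.get("SwapFree", 0)
    return headroom >= requested_bytes


if __name__ == "__main__":
    header = ("# Native memory allocation (mmap) failed to map "
              "702021632 bytes for committing reserved memory.")
    requested = failed_allocation_bytes(header)
    print(f"JVM asked for {requested / 2**20:.1f} MiB")  # JVM asked for 669.5 MiB
```

On the affected host, `could_satisfy(requested, parse_meminfo(open("/proc/meminfo").read()))` would give a quick yes/no on whether a request of that size (about 669.5 MiB here) fits into the memory and swap that are currently free.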

 

2. Resolution

 
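The first recommendation for this kind of problem is to restart the server with the tsm restart command. After the restart, however, none of the processes were usable: the tsm status showed error, and none of the background services had started.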
 
 
At this point the log files need a closer look. Tableau Server keeps its logs under the following path:

/var/opt/tableau/tableau_server/data/tabsvc/logs/

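I first checked the tabadmincontroller_node1-0 log, which covers the tsm commands. Everything looked normal there, with no error-level entries. Because the original error mentioned zookeeper, I then checked the coordination service log, appzookeeper_node1-0.log, where the following error was recorded over and over: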
 
2019-07-14 21:01:37.130 +0800 29300 main : ERROR org.apache.zookeeper.server.quorum.QuorumPeerConfig – Invalid configuration, only one server specified (ignoring)
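
Spotting an error that repeats is easier with a count than by scrolling. Here is a small sketch that tallies entries by logger and message. The regular expression is an assumption derived only from the single line quoted above (date, time, timezone, pid, thread, level, logger, a one-character dash separator, message), and `summarize_errors` is a name I made up, so other Tableau logs may need a different pattern.

```python
import re
from collections import Counter

# Assumed layout, inferred from the one quoted line:
# "<date> <time> <tz> <pid> <thread> : <LEVEL> <logger> <dash> <message>"
# The separator is matched as any single non-space character, because the
# dash may be rendered as "-" or as a typographic dash.
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \S+ \S+ \d+ (?P<thread>.+?) : "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) \S (?P<message>.*)$"
)


def summarize_errors(lines, level="ERROR"):
    """Count log entries of the given level, keyed by (logger, message)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == level:
            counts[(match.group("logger"), match.group("message"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2019-07-14 21:01:37.130 +0800 29300 main : ERROR "
        "org.apache.zookeeper.server.quorum.QuorumPeerConfig - "
        "Invalid configuration, only one server specified (ignoring)",
    ] * 3
    for (logger, message), count in summarize_errors(sample).most_common():
        print(count, logger, message)
```

To run it against a real file, pass an open handle for appzookeeper_node1-0.log (wherever it sits under the logs directory above) as `lines`. Lines that do not fit the assumed layout are skipped silently.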
 
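Since the fault was in the coordination service, it might be related to a lower-level problem in the system, so I advised the customer to reboot the operating system.
Only after the OS reboot did tsm return to normal.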
 
The underlying cause is still being investigated with the engineers; I will update this post once they report back.
 
 
 
 
Jul 14, 2019
 
吴玉朋
Tableau partner
