2018-11-10 Hadoop – GC overhead

Author: 四火流年 | Published 2018-11-10 17:13

    Reposted from: https://chawlasumit.wordpress.com/2016/03/14/hadoop-gc-overhead-limit-exceeded-error/

    In our Hadoop setup, we ended up with more than 1 million files in a single folder. The folder held so many files that any hdfs dfs command on it, such as -ls or -copyToLocal, failed with the following error:

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
            at java.util.Arrays.copyOf(Arrays.java:2367)
            at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
            at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
            at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
            at java.lang.StringBuffer.append(StringBuffer.java:237)
            at java.net.URI.appendAuthority(URI.java:1852)
            at java.net.URI.appendSchemeSpecificPart(URI.java:1890)
            at java.net.URI.toString(URI.java:1922)
            at java.net.URI.<init>(URI.java:749)
            at org.apache.hadoop.fs.Path.initialize(Path.java:203)
            at org.apache.hadoop.fs.Path.<init>(Path.java:116)
            at org.apache.hadoop.fs.Path.<init>(Path.java:94)
            at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:230)
            at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:263)
            at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:732)
            at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
            at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
            at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
            at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
            at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
            at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
            at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
            at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
            at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
            at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
            at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
            at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
            at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
            at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
            at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
            at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
            at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
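
    For context, the failing invocations were ordinary shell commands along the lines of the sketch below; the path is a hypothetical stand-in for our folder with over a million files. As the stack trace shows, the directory listing (DistributedFileSystem.listStatus) is built inside the client JVM, which is what exhausts its memory.

    # Hypothetical examples of the commands that failed; /data/incoming stands in
    # for the directory that had grown past one million files.
    hdfs dfs -ls /data/incoming
    hdfs dfs -copyToLocal /data/incoming /tmp/incoming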
    

    After doing some research, we added the following environment variable to update the Hadoop runtime options. The -XX:-UseGCOverheadLimit flag disables the JVM check that aborts with a "GC overhead limit exceeded" error when the process spends almost all of its time in garbage collection.

    export HADOOP_OPTS="-XX:-UseGCOverheadLimit"
    

    Adding this option fixed the GC overhead error, but since the flag does not give the JVM any more memory, the command then started failing with the following error, citing a lack of Java heap space.

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
            at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1351)
            at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
            at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
            at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
            at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
            at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
            at com.sun.proxy.$Proxy15.getListing(Unknown Source)
            at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
            at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
            at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
            at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
            at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
            at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
            at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
            at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
            at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
            at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
            at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
            at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
            at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
            at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
            at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
            at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
            at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
            at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
            at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
            at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    

    We modified the export above and tried the following instead. Note that we needed to set HADOOP_CLIENT_OPTS rather than HADOOP_OPTS to fix this error, because all of the hadoop shell commands run as clients. HADOOP_OPTS is meant for modifying the actual Hadoop runtime, while HADOOP_CLIENT_OPTS modifies the runtime of the Hadoop command-line client.

    export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
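
    As a quick sanity check, you can confirm that the option actually reaches the client JVM; the snippet below is a minimal sketch, and the grep pattern and path are illustrative. FsShell is the main class of the command-line client, as the stack traces above show.

    # Export the larger heap, then start a long-running client command.
    export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
    hdfs dfs -ls -R /data/incoming > /dev/null &

    # From another terminal, check that the FsShell process picked up the option.
    ps -ef | grep FsShell | grep -- "-Xmx4096m"

    To make the setting persistent for every client session, the same export can also be added to hadoop-env.sh (typically under $HADOOP_HOME/etc/hadoop/ or /etc/hadoop/conf/, depending on the distribution); we are assuming here that a 4 GB client heap is acceptable on the machine running the commands.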
    
