The error stack trace is as follows:
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hive@TEST.COM
at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389) ~[?:?]
at org.apache.hadoop.security.User.<init>(User.java:48) ~[?:?]
at org.apache.hadoop.security.User.<init>(User.java:43) ~[?:?]
at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:197) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) ~[?:1.8.0_181]
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) ~[?:1.8.0_181]
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) ~[?:1.8.0_181]
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) ~[?:1.8.0_181]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) ~[?:1.8.0_181]
at javax.security.auth.login.LoginContext.login(LoginContext.java:588) ~[?:1.8.0_181]
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1135) ~[?:?]
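This error typically occurs in two situations:
- hadoop.security.auth_to_local is misconfigured. This property defines the rules that map an authenticated principal name to a short name, and the error above is raised when none of the rules match. In practice this property is usually left unset.
- A single process connects, one after another, to Kerberos clusters that use different realms. This is the case analyzed here.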
Analyzing the cause
The stack trace shows that the exception is thrown from getShortName. Its source code:
public String getShortName() throws IOException {
  String[] params;
  if (hostName == null) {
    // if it is already simple, just return it
    if (realm == null) {
      return serviceName;
    }
    params = new String[]{realm, serviceName};
  } else {
    params = new String[]{realm, serviceName, hostName};
  }
  for (Rule r : rules) {
    String result = r.apply(params);
    if (result != null) {
      return result;
    }
  }
  LOG.info("No auth_to_local rules applied to {}", this);
  return toString();
}
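So the error means that, after iterating over every rule, none of them returned a non-null result. (The excerpt shown here logs a message and falls back to the full name; the Hadoop version that produced the stack trace evidently throws NoMatchingRule at this point instead.)
rules is a static field. Next we trace how rules is set; some code is omitted.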
// The following code is from org.apache.hadoop.security.authentication.util.KerberosName
/**
 * A pattern for parsing a auth_to_local rule.
 */
private static final Pattern ruleParser =
    Pattern.compile("\\s*((DEFAULT)|(RULE:\\[(\\d*):([^\\]]*)](\\(([^)]*)\\))?"+
                    "(s/([^/]*)/([^/]*)/(g)?)?))/?(L)?");

private static List<Rule> rules;

public static void setRules(String ruleString) {
  rules = (ruleString != null) ? parseRules(ruleString) : null;
}

static List<Rule> parseRules(String rules) {
  List<Rule> result = new ArrayList<Rule>();
  String remaining = rules.trim();
  while (remaining.length() > 0) {
    Matcher matcher = ruleParser.matcher(remaining);
    if (!matcher.lookingAt()) {
      throw new IllegalArgumentException("Invalid rule: " + remaining);
    }
    if (matcher.group(2) != null) {
      // "DEFAULT" matched: add the default rule
      result.add(new Rule());
    } else {
      result.add(new Rule(Integer.parseInt(matcher.group(4)),
                          matcher.group(5),
                          matcher.group(7),
                          matcher.group(9),
                          matcher.group(10),
                          "g".equals(matcher.group(11)),
                          "L".equals(matcher.group(12))));
    }
    remaining = remaining.substring(matcher.end());
  }
  return result;
}
// The following code is from org.apache.hadoop.security.HadoopKerberosName
public static void setConfiguration(Configuration conf) throws IOException {
  final String defaultRule;
  switch (SecurityUtil.getAuthenticationMethod(conf)) {
    case KERBEROS:
    case KERBEROS_SSL:
      defaultRule = "DEFAULT";
      break;
    // default branch omitted in this excerpt
  }
  String ruleString = conf.get(HADOOP_SECURITY_AUTH_TO_LOCAL, defaultRule);
  setRules(ruleString);
}
// The following code is from org.apache.hadoop.security.UserGroupInformation
public static void setConfiguration(Configuration conf) {
  initialize(conf, true);
}

private static synchronized void initialize(Configuration conf, boolean overrideNameRules) {
  if (overrideNameRules || !HadoopKerberosName.hasRulesBeenSet()) {
    try {
      HadoopKerberosName.setConfiguration(conf);
    } catch (IOException ioe) {
      throw new RuntimeException(
          "Problem with Kerberos auth_to_local name configuration", ioe);
    }
  }
}
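To make the parsing step concrete, here is a minimal standalone sketch that reuses the ruleParser pattern quoted above and prints what each capture group holds for a sample rule string. The class name RuleParserDemo, the describe helper, and the sample rule are illustrative assumptions, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expression is taken from KerberosName.
final class RuleParserDemo {
    private static final Pattern RULE_PARSER =
        Pattern.compile("\\s*((DEFAULT)|(RULE:\\[(\\d*):([^\\]]*)](\\(([^)]*)\\))?"
            + "(s/([^/]*)/([^/]*)/(g)?)?))/?(L)?");

    // Walks the rule string the same way parseRules does and describes each rule.
    static List<String> describe(String ruleString) {
        List<String> out = new ArrayList<>();
        String remaining = ruleString.trim();
        while (remaining.length() > 0) {
            Matcher m = RULE_PARSER.matcher(remaining);
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Invalid rule: " + remaining);
            }
            if (m.group(2) != null) {
                out.add("DEFAULT");
            } else {
                out.add(String.format(
                    "components=%s format=%s match=%s from=%s to=%s global=%b lower=%b",
                    m.group(4), m.group(5), m.group(7), m.group(9), m.group(10),
                    "g".equals(m.group(11)), "L".equals(m.group(12))));
            }
            remaining = remaining.substring(m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample rule string (an assumption for illustration): one explicit rule, then DEFAULT.
        String rules = "RULE:[1:$1@$0](.*@TEST\\.COM)s/@.*// DEFAULT";
        for (String line : describe(rules)) {
            System.out.println(line);
        }
    }
}
```

Note that the string "DEFAULT" produces a rule with no parameters at all, which is why its behavior depends entirely on state held elsewhere in KerberosName.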
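As the UserGroupInformation and HadoopKerberosName code shows, calling UserGroupInformation#setConfiguration(Configuration conf) resets KerberosName#rules. When hadoop.security.auth_to_local is not set in conf, rules contains a single Rule created with the no-argument constructor. Next we look at Rule; some code is omitted.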
// The following code is from org.apache.hadoop.security.authentication.util.KerberosName.Rule
Rule() {
  isDefault = true;
}

String apply(String[] params) throws IOException {
  String result = null;
  if (isDefault) {
    // the default rule only matches principals in the cached default realm
    if (defaultRealm.equals(params[0])) {
      result = params[1];
    }
  } else if (params.length - 1 == numOfComponents) {
    String base = replaceParameters(format, params);
    if (match == null || match.matcher(base).matches()) {
      if (fromPattern == null) {
        result = base;
      } else {
        result = replaceSubstitution(base, fromPattern, toPattern, repeat);
      }
    }
  }
  if (result != null && nonSimplePattern.matcher(result).find()) {
    LOG.info("Non-simple name {} after auth_to_local rule {}",
        result, this);
  }
  if (toLowerCase && result != null) {
    result = result.toLowerCase(Locale.ENGLISH);
  }
  return result;
}
When isDefault == true, the rule compares KerberosName#defaultRealm with the realm part of the principal and returns serviceName if the two are equal. So next we trace how defaultRealm is assigned.
// The following code is from org.apache.hadoop.security.authentication.util.KerberosName
private static String defaultRealm;

static {
  defaultRealm = KerberosUtil.getDefaultRealm();
}

@VisibleForTesting
public static void resetDefaultRealm() {
  defaultRealm = KerberosUtil.getDefaultRealm();
}
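defaultRealm is initialized in a static block, so it holds the default_realm from the configuration file that java.security.krb5.conf pointed to when the class was initialized, and resetDefaultRealm() is not called from anywhere. This explains why, when connecting to Kerberos clusters with different realms, only the first connection succeeds: when connecting to the second cluster, defaultRealm still holds the first cluster's realm.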
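The effect of the cached static value can be reproduced without Hadoop. The following is a minimal model under stated assumptions: DefaultRealmModel and currentKrb5Realm are hypothetical names, currentKrb5Realm stands in for whatever KerberosUtil.getDefaultRealm() would return at that moment, and FIRST.COM is an invented realm for the first cluster.

```java
// Hypothetical model of the caching behavior; not Hadoop code.
final class DefaultRealmModel {
    // Stand-in for the default_realm of the krb5 configuration active right now.
    static String currentKrb5Realm = "FIRST.COM";

    // Cached once at class initialization, like KerberosName.defaultRealm.
    private static String defaultRealm = currentKrb5Realm;

    // Analogous to KerberosName.resetDefaultRealm(): re-read the active value.
    static void resetDefaultRealm() {
        defaultRealm = currentKrb5Realm;
    }

    // Mirrors the isDefault branch of Rule.apply: match only the cached realm.
    static String applyDefaultRule(String realm, String serviceName) {
        return defaultRealm.equals(realm) ? serviceName : null;
    }

    public static void main(String[] args) {
        // First cluster: the cached realm matches, so the short name is returned.
        System.out.println(applyDefaultRule("FIRST.COM", "hive"));

        // Switch to the second cluster's realm without refreshing the cache:
        // the default rule no longer matches and returns null.
        currentKrb5Realm = "TEST.COM";
        System.out.println(applyDefaultRule("TEST.COM", "hive"));

        // After refreshing the cached value the rule matches again.
        resetDefaultRealm();
        System.out.println(applyDefaultRule("TEST.COM", "hive"));
    }
}
```

In this model the null returned for the second cluster corresponds to the point where getShortName runs out of rules for hive@TEST.COM.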
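Solutions
- Call KerberosName#resetDefaultRealm() after calling UserGroupInformation#setConfiguration(Configuration conf). During setConfiguration, UGI reloads the configuration file that java.security.krb5.conf currently points to, and resetDefaultRealm() then picks up the newly loaded value.
- Define hadoop.security.auth_to_local in the conf passed to setConfiguration so that it reproduces the default behavior and maps the principal directly to serviceName. For the rule syntax, see Mapping Kerberos Principals to Short Names.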
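To illustrate why the second option sidesteps the problem, the sketch below models how an explicit rule set such as RULE:[1:$1] RULE:[2:$1] would be applied. This is a simplified stand-in, not Hadoop's implementation: ExplicitRuleSketch, expand, apply, and shortName are hypothetical names, and the rule string is only one possible choice. As in getShortName, params[0] is the realm and params[1] is the service name, so $1 expands to the service name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of explicit auth_to_local rules; not Hadoop code.
final class ExplicitRuleSketch {
    private static final Pattern PARAM = Pattern.compile("\\$(\\d+)");

    // Replaces $0, $1, ... in the format with the corresponding params entry.
    static String expand(String format, String[] params) {
        Matcher m = PARAM.matcher(format);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(params[index]));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // A rule applies only when the principal has the expected number of components.
    static String apply(int numOfComponents, String format, String[] params) {
        if (params.length - 1 != numOfComponents) {
            return null;
        }
        return expand(format, params);
    }

    // Models "RULE:[1:$1] RULE:[2:$1]": try the one-component rule, then the two-component rule.
    static String shortName(String serviceName, String hostName, String realm) {
        String[] params = (hostName == null)
            ? new String[] {realm, serviceName}
            : new String[] {realm, serviceName, hostName};
        for (int components : new int[] {1, 2}) {
            String result = apply(components, "$1", params);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(shortName("hive", null, "TEST.COM"));
        System.out.println(shortName("hive", "node1.example.com", "OTHER.COM"));
    }
}
```

Because these rules never consult the cached default realm, the result does not depend on which cluster was connected first. The trade-off is that such a rule maps principals from any realm to the bare service name, so it is only appropriate when every realm involved is trusted; a match expression on the realm can narrow it.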