Brave (a Zipkin-based distributed tracing client)

Author: 乌耳 | Published: 2017-12-17 22:38

Brave

Brave is a library used to capture and report latency information about distributed operations to Zipkin. Zipkin is based on Dapper.

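What's included

Brave's dependency-free tracer library works against JRE6+. This is the underlying API that instrumentation uses to time operations and add tags that describe them. This library also includes code that parses X-B3-TraceId headers.

Most users won't write tracing code directly. Rather, they reuse instrumentation that others have written. Before rolling your own, check the existing instrumentation and Zipkin's list: common tracing libraries for JDBC, Servlet, Spring and others already exist.

If you are trying to trace legacy applications, you may be interested in Spring XML Configuration. This allows you to set up tracing without writing any code.

If you want to put trace IDs into your log files, or change thread-local behavior, look at the context libraries, which integrate with logging tools such as SLF4J.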

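Brave API (v4)

Brave is a library used to capture and report latency information about distributed operations to Zipkin. Most users don't use Brave directly; they use libraries or frameworks that employ Brave on their behalf.

This module includes a tracer that creates and joins spans that model the latency of potentially distributed work. It also includes libraries to propagate the trace context over network boundaries, for example via http headers.

Setup

Most importantly, you need a Tracer that is configured to report to Zipkin.

Here's an example setup that sends trace data (spans) to Zipkin over http (as opposed to Kafka).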

// Configure a reporter, which controls how often spans are sent to Zipkin
//   (the dependency is io.zipkin.reporter2:zipkin-sender-okhttp3)
sender = OkHttpSender.create("http://127.0.0.1:9411/api/v2/spans");
spanReporter = AsyncReporter.create(sender);

// Create a tracing component with the service name you want to see in Zipkin
tracing = Tracing.newBuilder()
                 .localServiceName("my-service")
                 .spanReporter(spanReporter)
                 .build();

// Tracing exposes objects you might need, most importantly the tracer
tracer = tracing.tracer();

// Failing to close resources can result in dropped spans! When tracing is no
// longer needed, close the components you made in reverse order. This might be
// a shutdown hook for some users.
tracing.close();
spanReporter.close();
sender.close();

Zipkin v1 setup

If you need to connect to an older version of the Zipkin API, you can use the following. See zipkin-reporter for more information.

sender = URLConnectionSender.create("http://localhost:9411/api/v1/spans");
reporter = AsyncReporter.builder(sender)
                        .build(SpanBytesEncoder.JSON_V1);

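Tracing

The tracer creates and joins spans that model the latency of potentially distributed work. It can employ sampling to reduce overhead in process, to reduce the amount of data sent to Zipkin, or both.

Spans that belong to a trace report data to Zipkin upon completion, or do nothing if unsampled. After starting a span, you can annotate events of interest or add tags containing details or lookup keys.

Spans have a context which includes trace identifiers that place the span at the correct spot in the tree representing the distributed operation.

Local Tracing

When tracing local code, just run it inside a span.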

Span span = tracer.newTrace().name("encode").start();
try {
  doSomethingExpensive();
} finally {
  span.finish();
}

In the above example, the span is the root of the trace. In many cases, the span you create is part of an existing trace. When this is the case, call newChild instead of newTrace.

Span span = tracer.newChild(root.context()).name("encode").start();
try {
  doSomethingExpensive();
} finally {
  span.finish();
}

Customizing spans

Once you have a span, you can add tags to it, which can be used as lookup keys or details. For example, you might add a tag with your runtime version.

span.tag("clnt/finagle.version", "6.36.0");

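When exposing the ability to customize spans to third parties, prefer brave.SpanCustomizer as opposed to brave.Span. The former is simpler to understand and test, and does not expose span lifecycle hooks to users.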

interface MyTraceCallback {
  void request(Request request, SpanCustomizer customizer);
}

Since brave.Span implements brave.SpanCustomizer, it is just as easy to pass a span to users.

For example:

for (MyTraceCallback callback : userCallbacks) {
  callback.request(request, span);
}

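Looking up the current span

Sometimes you won't know whether a trace is in progress, and you don't want users to do null checks. brave.CurrentSpanCustomizer handles this by adding data to any span that is in progress, or dropping the data otherwise.

For example: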

// Some DI configuration wires up the current span customizer
@Bean SpanCustomizer currentSpanCustomizer(Tracing tracing) {
  return CurrentSpanCustomizer.create(tracing);
}

// user code can then inject this without a chance of it being null.
@Inject SpanCustomizer span;

void userCode() {
  span.annotate("tx.started");
  ...
}

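RPC tracing

Before writing your own RPC instrumentation, check the instrumentation already written for Brave and Zipkin's list.

RPC tracing is often done automatically by interceptors. Behind the scenes, they add tags and events that relate to their role in an RPC operation.

Here's an example of a client span:

// before you send a request, add metadata that describes the operation
span = tracer.newTrace().name("get").kind(Span.Kind.CLIENT);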
span.tag("clnt/finagle.version", "6.36.0");
span.tag(TraceKeys.HTTP_PATH, "/api");
span.remoteEndpoint(Endpoint.builder()
    .serviceName("backend")
    .ipv4(127 << 24 | 1)
    .port(8080).build());

// when the request is scheduled, start the span
span.start();

// if you have callbacks for when data is on the wire, note those events
span.annotate(Constants.WIRE_SEND);
span.annotate(Constants.WIRE_RECV);

// when the response is complete, finish the span
span.finish();

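One-way tracing

Sometimes you need to model an asynchronous operation where there is a request but no response. In normal RPC tracing, you use span.finish() to indicate that the response was received. In one-way tracing, you use span.flush() instead, as you don't expect a response.

Here's how a client might model a one-way operation: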

// start a new span representing a client request
oneWaySend = tracer.newSpan(parent).kind(Span.Kind.CLIENT);

// Add the trace context to the request, so it can be propagated in-band
tracing.propagation().injector(Request::addHeader)
                     .inject(oneWaySend.context(), request);

// fire off the request asynchronously, totally dropping any response
request.execute();

// start the client side and flush instead of finish
oneWaySend.start().flush();

And here's how a server might handle it:

// pull the context out of the incoming request
extractor = tracing.propagation().extractor(Request::getHeader);

// convert that context to a span which you can name and add tags to
oneWayReceive = nextSpan(tracer, extractor.extract(request))
    .name("process-request")
    .kind(SERVER)
    ... add tags etc.

// start the server side and flush instead of finish
oneWayReceive.start().flush();

// you should not modify this span anymore as it is complete. However,
// you can create children to represent follow-up work.
next = tracer.newSpan(oneWayReceive.context()).name("step2").start();

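Sampling

Sampling may be employed to reduce the data collected and reported out of process. When a span isn't sampled, it adds no overhead (it is a noop).

Sampling is an up-front decision, meaning that the decision to report data is made at the first operation in a trace, and that decision is propagated downstream.

By default, there is a global sampler that applies a single rate to all traced operations. Tracing.Builder.sampler is how you indicate this, and it defaults to tracing every request.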
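
To make the idea of one global rate concrete, here is a minimal, self-contained sketch. It does not use Brave's own sampler classes: RateSampler and isSampled are hypothetical names, and deriving the decision from the trace ID is an assumption made here so that the same trace always gets the same answer.

```java
/** Hypothetical sketch of a rate-based sampler; not Brave's implementation. */
final class RateSampler {
  // Number of buckets, out of 10,000, whose traces are sampled.
  private final long boundary;

  RateSampler(double rate) {
    if (rate < 0.0 || rate > 1.0) {
      throw new IllegalArgumentException("rate must be between 0 and 1");
    }
    this.boundary = Math.round(rate * 10_000);
  }

  /** Makes the up-front decision from the trace ID, so it is stable for the whole trace. */
  boolean isSampled(long traceId) {
    // floorMod keeps the bucket in 0..9999 even for negative IDs.
    long bucket = Math.floorMod(traceId, 10_000L);
    return bucket < boundary;
  }

  public static void main(String[] args) {
    RateSampler tenPercent = new RateSampler(0.1);
    System.out.println(tenPercent.isSampled(42L));    // bucket 42 is below the boundary of 1000
    System.out.println(tenPercent.isSampled(5_000L)); // bucket 5000 is not
  }
}
```

A rate of 1.0 samples everything, which mirrors the default of tracing every request; a rate of 0.0 samples nothing.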

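Declarative sampling

Some need to sample based on the type or annotations of a Java method.

Most users will use a framework interceptor which automates this sort of policy. Here's how they might work internally: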

// derives a sample rate from an annotation on a java method
DeclarativeSampler<Traced> sampler = DeclarativeSampler.create(Traced::sampleRate);

@Around("@annotation(traced)")
public Object traceThing(ProceedingJoinPoint pjp, Traced traced) throws Throwable {
  Span span = tracing.tracer().newTrace(sampler.sample(traced))...
  try {
    return pjp.proceed();
  } finally {
    span.finish();
  }
}

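Custom sampling

You may want to apply different policies depending on what the operation is. For example, you might not want to trace requests to static resources such as images, or you might want to trace all requests to a new API.

Most users will use a framework interceptor which automates this sort of policy. Here's how they might work internally: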

Span newTrace(Request input) {
  SamplingFlags flags = SamplingFlags.NONE;
  if (input.url().startsWith("/experimental")) {
    flags = SamplingFlags.SAMPLED;
  } else if (input.url().startsWith("/static")) {
    flags = SamplingFlags.NOT_SAMPLED;
  }
  return tracer.newTrace(flags);
}

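Note: the above is the basis for the built-in http sampler.

Propagation

Propagation is needed to ensure that activity originating from the same root is collected together in the same trace. The most common way to propagate is to copy a trace context from a client sending an RPC request to the server receiving it.

For example, when a downstream http call is made, its trace context is sent along with it, encoded as request headers: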

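[Figure: a client span's trace context encoded as B3 request headers (for example X-B3-TraceId) and decoded into the server span's trace context]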

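The names above come from B3 Propagation, which is built into Brave and has implementations in many languages and frameworks.

Here's what client-side propagation might look like: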

// configure a function that injects a trace context into a request
injector = tracing.propagation().injector(Request.Builder::addHeader);

// before a request is sent, add the current span's context to it
injector.inject(span.context(), request);

Here's what server-side propagation might look like:

// configure a function that extracts the trace context from a request
extractor = tracing.propagation().extractor(Request::getHeader);

// when a server receives a request, it joins or starts a new trace
span = tracer.nextSpan(extractor.extract(request));
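
What injection and extraction amount to on the wire can be sketched without Brave at all. The example below is a stand-alone illustration, not Brave's code: B3HeaderSketch and its methods are hypothetical names, the carrier is assumed to be a plain Map with exact-case keys, and only 64-bit IDs are handled. The header names are the B3 ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of B3-style header propagation; not Brave's code. */
final class B3HeaderSketch {
  /** Client side: writes the current span's identifiers into outgoing request headers. */
  static void inject(long traceId, long spanId, boolean sampled, Map<String, String> headers) {
    headers.put("X-B3-TraceId", toLowerHex(traceId));
    headers.put("X-B3-SpanId", toLowerHex(spanId));
    headers.put("X-B3-Sampled", sampled ? "1" : "0");
  }

  /** Server side: reads the trace ID back, or returns null if the header is absent. */
  static Long extractTraceId(Map<String, String> headers) {
    String hex = headers.get("X-B3-TraceId");
    return hex == null ? null : Long.parseUnsignedLong(hex, 16);
  }

  /** Encodes a 64-bit ID as 16 lower-hex characters, left-padded with zeros. */
  static String toLowerHex(long id) {
    String hex = Long.toHexString(id);
    return "0000000000000000".substring(hex.length()) + hex;
  }

  public static void main(String[] args) {
    Map<String, String> headers = new HashMap<>();
    inject(0x463ac35c9f6413adL, 0xa2fb4a1d1a96d312L, true, headers);
    System.out.println(headers);
    System.out.println(toLowerHex(extractTraceId(headers)));
  }
}
```

Real http headers are case-insensitive and a real extractor also reads the span ID, parent ID and sampling state; the sketch only shows the round trip of one identifier.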

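Propagating extra fields

Sometimes you need to propagate extra fields, such as a request ID or an alternate trace context. For example, if you are in a Cloud Foundry environment, you may want to pass the request ID: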

// when you initialize the builder, define the extra field you want to propagate
tracingBuilder.propagationFactory(
  ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "x-vcap-request-id")
);

// later, you can tag that request ID or use it in log correlation
requestId = ExtraFieldPropagation.current("x-vcap-request-id");

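Extracting a propagated context

TraceContext.Extractor<C> reads trace identifiers and sampling status from an incoming request or message. The carrier is usually a request object or headers.

This utility is used in standard instrumentation like HttpServerHandler, but it can also be used for custom RPC or messaging code.

TraceContextOrSamplingFlags is usually only used with Tracer.nextSpan(extracted), unless you are sharing span IDs between a client and a server.

Sharing span IDs between client and server

A normal instrumentation pattern is to create a span representing the server side of an RPC. Extractor.extract might return a complete trace context when applied to an incoming client request.
Tracer.joinSpan attempts to continue this trace, using the same span ID if supported, or creating a child span if not.

Here's an example of B3 propagation: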

[Figure: B3 propagation in which the client span and the server span share the same span ID]

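Some propagation systems only forward the parent span ID, which is detected when Propagation.Factory.supportsJoin() == false. In this case, a new span ID is always provisioned and the incoming context determines the parent ID.

Note: some span reporters do not support sharing span IDs. For example, if you set Tracing.Builder.spanReporter(amazonXrayOrGoogleStackdrive), disable join via Tracing.Builder.supportsJoin(false). This forces a new child span on Tracer.joinSpan().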
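
The join-or-child choice described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Brave's implementation: JoinSketch, Ids and joinOrChild are hypothetical names, and the new span ID is passed in rather than generated so the behavior is easy to follow.

```java
/** Hypothetical sketch of the join-or-child decision; not Brave's implementation. */
final class JoinSketch {
  /** Minimal stand-in for a trace context. */
  static final class Ids {
    final long traceId;
    final Long parentId; // null for a root span
    final long spanId;

    Ids(long traceId, Long parentId, long spanId) {
      this.traceId = traceId;
      this.parentId = parentId;
      this.spanId = spanId;
    }
  }

  /** Continues the trace described by the incoming (client) identifiers on the server side. */
  static Ids joinOrChild(Ids incoming, boolean supportsJoin, long newSpanId) {
    if (supportsJoin) {
      // Join: the server span reuses the client's span ID and parent.
      return new Ids(incoming.traceId, incoming.parentId, incoming.spanId);
    }
    // No join: provision a new span ID; the incoming span becomes the parent.
    return new Ids(incoming.traceId, incoming.spanId, newSpanId);
  }

  public static void main(String[] args) {
    Ids fromClient = new Ids(1L, null, 2L);
    Ids joined = joinOrChild(fromClient, true, 3L);
    Ids child = joinOrChild(fromClient, false, 3L);
    System.out.println("joined span ID: " + joined.spanId);
    System.out.println("child span ID: " + child.spanId + ", parent: " + child.parentId);
  }
}
```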

Implementing Propagation

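TraceContext.Extractor<C> is implemented by a Propagation.Factory plugin. Internally, this code creates the union type TraceContextOrSamplingFlags with one of the following:

TraceContext, if trace and span IDs were present.
TraceIdContext, if a trace ID was present, but no span ID.
SamplingFlags, if no identifiers were present.

Some Propagation implementations carry extra data from the point of extraction (for example, reading incoming headers) to injection (for example, writing outgoing headers). For example, an implementation might carry a request ID. When implementations have extra data, here is how they handle it:

If a TraceContext was extracted, add the extra data as TraceContext.extra().
Otherwise, add it as TraceContextOrSamplingFlags.extra(), which Tracer.nextSpan handles.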
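
The three-way outcome listed above can be illustrated with a small stand-alone sketch. It is not Brave's extractor: ExtractionSketch, Result and classify are hypothetical names, and the carrier is assumed to be a Map keyed by exact-case B3 header names.

```java
import java.util.Map;

/** Hypothetical sketch of which extraction result applies; not Brave's extractor. */
final class ExtractionSketch {
  /** Mirrors the three members of the union type described in the text. */
  enum Result { TRACE_CONTEXT, TRACE_ID_CONTEXT, SAMPLING_FLAGS }

  static Result classify(Map<String, String> headers) {
    boolean hasTraceId = headers.containsKey("X-B3-TraceId");
    boolean hasSpanId = headers.containsKey("X-B3-SpanId");
    if (hasTraceId && hasSpanId) return Result.TRACE_CONTEXT; // trace and span IDs present
    if (hasTraceId) return Result.TRACE_ID_CONTEXT;           // trace ID only
    return Result.SAMPLING_FLAGS;                             // no identifiers
  }

  public static void main(String[] args) {
    System.out.println(classify(java.util.Collections.singletonMap("X-B3-TraceId", "463ac35c9f6413ad")));
  }
}
```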

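Current tracing component

Brave supports a "current tracing component" concept, which should only be used when you have no other means to get a reference. This was made for JDBC connections, as they often initialize prior to the tracing component.

The most recently instantiated tracing component is available via Tracing.current(), or via Tracing.currentTracer() if you only need the tracer. If you use either of these methods, do not cache the result. Instead, look them up each time you need them.

Current Span

Brave supports a "current span" concept, which represents the in-flight operation. Tracer.currentSpan() can be used to add custom tags to a span, and Tracer.nextSpan() can be used to create a child of whatever is in flight.

Setting a span in scope via custom executors

Many frameworks allow you to specify an executor which is used for user callbacks. The type CurrentTraceContext implements all the functionality needed to support the current span. It also exposes utilities which you can use to decorate executors.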

CurrentTraceContext currentTraceContext = new CurrentTraceContext.Default();
tracing = Tracing.newBuilder()
                 .currentTraceContext(currentTraceContext)
                 ...
                 .build();

Client c = Client.create();
c.setExecutorService(currentTraceContext.executorService(realExecutorService));
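
To show what decorating an executor involves, here is a stand-alone sketch. It is not how CurrentTraceContext is implemented: ContextPropagatingExecutor, CURRENT and seenOnWorker are hypothetical names, and a plain String stands in for the trace context.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of carrying a "current" value across threads; not Brave's code. */
final class ContextPropagatingExecutor implements Executor {
  /** Stand-in for the current trace context: one value per thread. */
  static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  private final Executor delegate;

  ContextPropagatingExecutor(Executor delegate) {
    this.delegate = delegate;
  }

  @Override public void execute(Runnable task) {
    String captured = CURRENT.get(); // read on the submitting thread
    delegate.execute(() -> {
      String previous = CURRENT.get();
      CURRENT.set(captured);         // visible to the task on the worker thread
      try {
        task.run();
      } finally {
        CURRENT.set(previous);       // do not leak into the next task
      }
    });
  }

  /** Sets a value on the calling thread and returns what a pooled worker thread observes. */
  static String seenOnWorker(String value) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      CURRENT.set(value);
      CompletableFuture<String> seen = new CompletableFuture<>();
      new ContextPropagatingExecutor(pool).execute(() -> seen.complete(CURRENT.get()));
      return seen.join();
    } finally {
      CURRENT.remove();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(seenOnWorker("trace-1"));
  }
}
```

The wrapper captures the value when the task is submitted, not when it runs, which is what lets a callback see the span that was current where the work was scheduled.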

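Setting a span in scope manually

When writing new instrumentation, it is important to place the span you created in scope as the current span. Not only does this allow users to access it with Tracer.currentSpan(), it also allows customizations like SLF4J MDC to see the current trace IDs.

Tracer.withSpanInScope(Span) facilitates this and is most conveniently employed via the try-with-resources idiom. Whenever external code might be invoked (such as proceeding an interceptor), place the span in scope like this: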

try (SpanInScope ws = tracer.withSpanInScope(span)) {
  return inboundRequest.invoke();
} finally { // note the scope is independent of the span
  span.finish();
}

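In rare cases, you may need to clear the current span temporarily, for example when launching a task that should not be associated with the current request. To do this, simply pass null to withSpanInScope.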

try (SpanInScope cleared = tracer.withSpanInScope(null)) {
  startBackgroundThread();
}
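
The scoping behavior above, including clearing with null, can be modeled in a few lines. This is a sketch under stated assumptions rather than Brave's implementation: ScopeSketch, CURRENT_SPAN and Scope are hypothetical names, and a String stands in for the span.

```java
/** Hypothetical sketch of a thread-local "current span" scope; not Brave's implementation. */
final class ScopeSketch {
  static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

  /** A scope that restores the previously current span when closed. */
  interface Scope extends AutoCloseable {
    @Override void close(); // no checked exception, so callers need no catch block
  }

  /** Makes the given span current (null clears it) until the returned scope is closed. */
  static Scope withSpanInScope(String span) {
    String previous = CURRENT_SPAN.get();
    CURRENT_SPAN.set(span);
    return () -> CURRENT_SPAN.set(previous);
  }

  public static void main(String[] args) {
    try (Scope outer = withSpanInScope("request")) {
      System.out.println(CURRENT_SPAN.get());   // the request span is current
      try (Scope cleared = withSpanInScope(null)) {
        System.out.println(CURRENT_SPAN.get()); // nothing is current here
      }
      System.out.println(CURRENT_SPAN.get());   // the request span is current again
    }
  }
}
```

Because each scope remembers what was current before it, scopes nest cleanly as long as they are closed in reverse order on the same thread, which try-with-resources guarantees.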

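Working with callbacks

Many libraries expose a callback model as opposed to an interceptor one. When creating new instrumentation, you may find places where you need to place a span in scope in one callback (like onStart()) and end the scope in another callback (like onFinish()).

Provided the library guarantees that these run on the same thread, you can simply propagate the result of Tracer.withSpanInScope(Span) from the starting callback to the closing one. This is typically done with a request-scoped attribute.

Here's an example: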

class MyFilter extends Filter {
  public void onStart(Request request, Attributes attributes) {
    // Assume you have code to start the span and add relevant tags...

    // We now set the span in scope so that any code between here and
    // the end of the request can see it with Tracer.currentSpan()
    SpanInScope spanInScope = tracer.withSpanInScope(span);

    // We don't want to leak the scope, so we place it somewhere we can
    // lookup later
    attributes.put(SpanInScope.class, spanInScope);
  }

  public void onFinish(Response response, Attributes attributes) {
    // as long as we are on the same thread, we can read the span started above
    Span span = tracer.currentSpan();

    // Assume you have code to complete the span

    // We now remove the scope (which implicitly detaches it from the span)
    attributes.remove(SpanInScope.class).close();
  }
}

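Sometimes you have to instrument a library where there is no context shared between request and response. For this scenario, you can use ThreadLocalSpan to temporarily store the span between callbacks.

Here's an example: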

class MyFilter extends Filter {
  final ThreadLocalSpan threadLocalSpan;

  public void onStart(Request request) {
    // Assume you have code to start the span and add relevant tags...

    // We now set the span in scope so that any code between here and
    // the end of the request can see it with Tracer.currentSpan()
    threadLocalSpan.set(span);
  }

  public void onFinish(Response response, Attributes attributes) {
    // as long as we are on the same thread, we can read the span started above
    Span span = threadLocalSpan.remove();
    if (span == null) return;

    // Assume you have code to complete the span
  }
}

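Working with callbacks that occur on different threads

The examples above work because the callbacks happen on the same thread. You should not set a span in scope if you cannot close that scope on the same thread. This may be the case in some asynchronous libraries. Often, you will need to propagate the span directly in a custom attribute. This allows you to trace the RPC, even though this approach does not let external code use Tracer.currentSpan().

Here's an example of explicit propagation: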

class MyFilter extends Filter {
  public void onStart(Request request, Attributes attributes) {
    // Assume you have code to start the span and add relevant tags...

    // We can't open a scope as onFinish happens on another thread.
    // Instead, we propagate the span manually so at least basic tracing
    // will work.
    attributes.put(Span.class, span);
  }

  public void onFinish(Response response, Attributes attributes) {
    // We can't rely on Tracer.currentSpan(), but we can rely on explicit
    // propagation
    Span span = attributes.remove(Span.class);

    // Assume you have code to complete the span
  }
}

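Disabling tracing

If you are in a situation where you need to turn off tracing at runtime, invoke Tracing.setNoop(true). This turns any new spans into "noop" spans and drops all data until Tracing.setNoop(false) is invoked.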
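
The effect of such a runtime switch can be pictured with a small stand-alone model. It is only an illustration, not Brave's implementation: NoopSwitchSketch and its methods are hypothetical names, and span names stand in for reported span data.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a runtime noop switch; not Brave's implementation. */
final class NoopSwitchSketch {
  private final AtomicBoolean noop = new AtomicBoolean(false);
  private final List<String> reported = new CopyOnWriteArrayList<>();

  void setNoop(boolean value) {
    noop.set(value);
  }

  /** Records the span name unless tracing is currently switched off. */
  void report(String spanName) {
    if (noop.get()) return; // data is dropped while the switch is on
    reported.add(spanName);
  }

  List<String> reported() {
    return reported;
  }

  public static void main(String[] args) {
    NoopSwitchSketch tracing = new NoopSwitchSketch();
    tracing.report("before");
    tracing.setNoop(true);
    tracing.report("dropped");
    tracing.setNoop(false);
    tracing.report("after");
    System.out.println(tracing.reported());
  }
}
```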

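Performance

Brave has been built with performance in mind. Using the core Span API, you can record a span within a few microseconds. When a span is not sampled, there is effectively no overhead (as it is a noop).

Unlike previous implementations, Brave 4 only needs one timestamp per span. All annotations are recorded as offsets from it, using the less expensive and more precise System.nanoTime() function.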
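
The "one timestamp plus offsets" idea can be sketched as below. This is a minimal illustration, not Brave's clock: OffsetClock and currentTimeMicros are hypothetical names.

```java
/** Hypothetical sketch of "one timestamp plus offsets"; not Brave's implementation. */
final class OffsetClock {
  private final long baseEpochMicros; // the single wall-clock reading
  private final long baseTick;        // System.nanoTime() taken at the same moment

  OffsetClock() {
    this.baseEpochMicros = System.currentTimeMillis() * 1000;
    this.baseTick = System.nanoTime();
  }

  /** Current time in epoch microseconds, derived from the nanoTime offset. */
  long currentTimeMicros() {
    return baseEpochMicros + (System.nanoTime() - baseTick) / 1000;
  }

  public static void main(String[] args) {
    OffsetClock clock = new OffsetClock(); // e.g. created when a span starts
    long start = clock.currentTimeMicros();
    long annotation = clock.currentTimeMicros();
    System.out.println("offset of annotation in microseconds: " + (annotation - start));
  }
}
```

Only the constructor touches the wall clock; every later reading is an offset, so readings taken from the same clock never go backwards even if the system time is adjusted.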

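Unit testing instrumentation

When writing unit tests, there are a few tricks that will make bugs easier to find:

Report spans into a concurrent queue, so you can read them in tests.
Use StrictCurrentTraceContext to reveal subtle propagation bugs.
Unconditionally clean up Tracing.current(), to prevent leaks.

Here's an example setup for your unit test fixture: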

ConcurrentLinkedDeque<Span> spans = new ConcurrentLinkedDeque<>();

Tracing tracing = Tracing.newBuilder()
                 .currentTraceContext(new StrictCurrentTraceContext())
                 .spanReporter(spans::add)
                 .build();

@After public void close() {
  Tracing current = Tracing.current();
  if (current != null) current.close();
}

Note: this is an original article. Reposting is welcome; please credit the source.
