Author: Lornatang | Published 2019-03-27 23:12

Taskonomy: Disentangling Task Transfer Learning (Translation, Part 2)

4. Experiments

With 26 tasks in the dictionary (4 source-only tasks), our approach leads to training 26 fully supervised task-specific networks, 22 × 25 transfer networks in 1st order, and 22 × (25 choose k) candidate transfers in kth order, from which we sample according to the procedure in Sec. 3. The total number of transfer functions trained for the taxonomy was ∼3,000, which took 47,886 GPU hours on the cloud.
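The combinatorics behind these counts can be sketched in a few lines. The numbers below are the raw pool sizes before the sampling of Sec. 3 prunes them down to the ∼3,000 transfers actually trained:

```python
from math import comb

targets = 22   # tasks in the target set T
sources = 25   # candidate source tasks per target (26 tasks minus the target itself)

# 1st-order transfers: every (source, target) pair gets its own transfer network.
first_order = targets * sources            # 22 * 25 = 550

# kth-order transfers: each target paired with every k-subset of sources.
def kth_order(k):
    return targets * comb(sources, k)

# Without the sampling procedure of Sec. 3 the higher orders explode,
# which is why only ~3,000 of these candidates were actually trained.
print(first_order)    # 550
print(kth_order(2))   # 22 * C(25, 2) = 6600
```

The rapid growth of `kth_order(k)` with k is precisely why the paper samples higher-order transfers rather than training them exhaustively.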

Out of 26 tasks, we usually use the following 4 as source-only tasks (described in Sec. 3) in the experiments: colorization, jigsaw puzzle, in-painting, random projection. However, the method is applicable to an arbitrary partitioning of the dictionary into T and S. The interactive solver website allows the user to specify any desired partition.

Table 1: Task-Specific Networks’ Sanity: Win rates vs. random (Gaussian) network representation readout and statistically informed guess avg.

Network Architectures: We preserved the architectural and training details across tasks as homogeneously as possible to avoid injecting any bias. The encoder architecture is identical across all task-specific networks and is a fully convolutional ResNet-50 without pooling. All transfer functions include identical shallow networks with 2 conv layers (concatenated channel-wise if higher-order). The loss (ℓ) and decoder's architecture, though, have to depend on the task as the output structures of different tasks vary; for all pixel-to-pixel tasks, e.g. normal estimation, the decoder is a 15-layer fully convolutional network; for low dimensional tasks, e.g. vanishing points, it consists of 2-3 FC layers. All networks are trained using the same hyperparameters regardless of task and on exactly the same input images. Tasks with more than one input, e.g. relative camera pose, share weights between the encoder towers. Transfer networks are all trained using the same hyperparameters as the task-specific networks, except that we anneal the learning rate earlier since they train much faster. Detailed definitions of architectures, training process, and experiments with different encoders can be found in the supplementary material.

Data Splits: Our dataset includes 4 million images. We made publicly available the models trained on the full dataset, but for the experiments reported in the main paper, we used a subset of the dataset, as the extracted structure stabilized and did not change when using more data (explained in Sec. 5.2). The used subset is partitioned into training (120k), validation (16k), and test (17k) images, each from non-overlapping sets of buildings. Our task-specific networks are trained on the training set and the transfer networks are trained on a subset of the validation set, ranging from 1k to 16k images, in order to model the transfer patterns under different data regimes. In the main paper, we report all results under the 16k transfer supervision regime (∼10% of the split) and defer the additional sizes to the supplementary material and website (see Sec. 5.2). Transfer functions are evaluated on the test set.
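The building-level split can be illustrated with a short sketch; the function below is a hypothetical stand-in (not the released tooling) that assigns whole buildings to a split until its quota is met, guaranteeing non-overlapping building sets:

```python
import random

def split_by_buildings(images_by_building, n_train, n_val, n_test, seed=0):
    """Partition images into train/val/test drawn from NON-overlapping building
    sets, mirroring the 120k/16k/17k split described above. All names here are
    hypothetical illustrations, not the paper's released tooling."""
    buildings = sorted(images_by_building)
    random.Random(seed).shuffle(buildings)
    splits = {"train": [], "val": [], "test": []}
    quotas = {"train": n_train, "val": n_val, "test": n_test}
    it = iter(buildings)
    for name in ("train", "val", "test"):
        # Add whole buildings until the split reaches its quota, so no
        # building ever straddles two splits.
        while len(splits[name]) < quotas[name]:
            splits[name].extend(images_by_building[next(it)])
    return splits
```

Since only whole buildings are assigned, a split may slightly overshoot its quota; the property that matters is that the building sets of the three splits are disjoint.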

How good are the trained task-specific networks? Win rate (%) is the proportion of test set images for which a baseline is beaten. Table 1 provides win rates of the task-specific networks vs. two baselines. Visual outputs for a random test sample are in Fig. 3. The high win rates in Table 1 and qualitative results show the networks are well trained and stable and can be relied upon for modeling the task space. See results of applying the networks on a YouTube video frame-by-frame here. A live demo for user uploaded queries is available here.
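The win rate used throughout, and the Gain and Quality metrics defined in Sec. 4.1, reduce to a simple proportion over per-image losses. A minimal sketch with made-up loss values:

```python
def win_rate(method_losses, baseline_losses):
    """Proportion of test images on which the method's loss beats the
    baseline's. 0.5 means equal performance; above 0.5 the method wins
    on most images."""
    wins = sum(m < b for m, b in zip(method_losses, baseline_losses))
    return wins / len(method_losses)

# Hypothetical per-image losses, purely for illustration:
transfer = [0.30, 0.42, 0.25, 0.50]   # a transfer policy's losses
scratch  = [0.45, 0.40, 0.60, 0.70]   # trained from scratch on the same 16k
gold     = [0.28, 0.35, 0.20, 0.55]   # fully supervised, trained on 120k

gain    = win_rate(transfer, scratch)  # value gained by transferring at all
quality = win_rate(transfer, gold)     # closeness to the gold standard
print(gain, quality)  # 0.75 0.25
```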

Figure 8: Computed taxonomies for solving 22 tasks given various supervision budgets (x-axes), and maximum allowed transfer orders (y-axes). One is magnified for better visibility. Nodes with incoming edges are target tasks, and the number of their incoming edges is the order of their chosen transfer function. Still transferring to some targets when the budget is 26 (full budget) means certain transfers started performing better than their fully supervised task-specific counterpart. See the interactive solver website for color coding of the nodes by Gain and Quality metrics. Dimmed nodes are the source-only tasks, and thus only participate in the taxonomy if found worthwhile by the BIP optimization to be one of the sources.

To get a sense of the quality of our networks vs. state-of-the-art task-specific methods, we compared our depth estimator vs. released models of [53], which led to outperforming [53] with a win rate of 88% and losses of 0.35 vs. 0.47 (further details in the supplementary material). In general, we found the task-specific networks to perform on par with or better than state-of-the-art for many of the tasks, though we do not formally benchmark or claim this.

4.1. Evaluation of Computed Taxonomies

Fig. 8 shows the computed taxonomies optimized to solve the full dictionary, i.e. all tasks are placed in T and S (except for 4 source-only tasks that are in S only). This was done for various supervision budgets (columns) and maximum allowed order (rows) constraints. Still seeing transfers to some targets when the budget is 26 (full dictionary) means certain transfers became better than their fully supervised task-specific counterpart.

While Fig. 8 shows the structure and connectivity, Fig. 9 quantifies the results of taxonomy-recommended transfer policies by two metrics, Gain and Quality, defined as: Gain: win rate (%) against a network trained from scratch using the same training data as the transfer networks'; that is, the best that could be done if transfer learning were not utilized. This quantifies the value gained by transferring. Quality: win rate (%) against a fully supervised network trained with 120k images (gold standard).

Figure 9: Evaluation of the taxonomy computed for solving the full task dictionary. Gain (left) and Quality (right) values for each task using the policy suggested by the computed taxonomy, as the supervision budget increases (→). Shown for transfer orders 1 and 4.

Red (0) and Blue (1) represent outperforming the reference method on none and all of the test set images, respectively (so the transition Red→White→Blue is desirable); White (0.5) represents equal performance to the reference.

Figure 10: Generalization to Novel Tasks. Each row shows a novel test task. Left: Gain and Quality values using the devised “all-for-one” transfer policies for novel tasks for orders 1-4. Right: Win rates (%) of the transfer policy over various self-supervised methods, ImageNet features, and scratch are shown in the colored rows. Note the large margin of win by taxonomy. The uncolored rows show corresponding loss values.

Each column in Fig. 9 shows a supervision budget. As apparent, good results can be achieved even when the supervision budget is notably smaller than the number of solved tasks, and as the budget increases, results improve (expected). Results are shown for 2 maximum allowed orders.

4.2. Generalization to Novel Tasks

The taxonomies in Sec. 4.1 were optimized for solving all tasks in the dictionary. In many situations, a practitioner is interested in a single task which may not even be in the dictionary. Here we evaluate how the taxonomy transfers to a novel out-of-dictionary task with little data.

This is done in an all-for-one scenario where we put one task in T and all others in S. The task in T is target-only and has no task-specific network. Its limited data (16k) is used to train small transfer networks to sources. This basically localizes where the target would be in the taxonomy. Fig. 10 (left) shows the Gain and Quality of the transfer policy found by the BIP for each task. Fig. 10 (right) compares the taxonomy suggested policy against some of the best existing self-supervised methods [96, 103, 68, 100, 1], ImageNet FC7 features [51], training from scratch, and a fully supervised network (gold standard).
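A toy sketch of the all-for-one selection: the real system solves a BIP over measured transfer performances, but a brute-force search over a hypothetical transferability table conveys the idea. The scores and the diminishing-returns combination rule below are illustrative assumptions, not the paper's learned transfer networks:

```python
from itertools import combinations

# Hypothetical transferability scores source -> novel target (higher is better);
# the real system estimates these from small transfer networks trained on 16k images.
score = {"normals": 0.9, "reshading": 0.8, "autoenc": 0.4, "jigsaw": 0.2}

def combined_score(srcs):
    # Toy stand-in for evaluating a higher-order transfer: extra sources give
    # diminishing returns. The paper instead trains an actual transfer network.
    ranked = sorted((score[s] for s in srcs), reverse=True)
    return sum(s / (2 ** i) for i, s in enumerate(ranked))

def best_policy(budget, max_order):
    """Exhaustively pick the best source subset (up to max_order sources)
    that fits within the supervision budget."""
    best = max(
        (subset
         for k in range(1, max_order + 1)
         for subset in combinations(score, k)
         if k <= budget),
        key=combined_score,
    )
    return set(best)

print(best_policy(budget=2, max_order=2))  # {'normals', 'reshading'}
```

The exhaustive search is exponential in the dictionary size, which is why the paper formulates the selection as a Binary Integer Program instead.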

The results in Fig. 10 (right) are noteworthy. The large win margin for taxonomy shows that carefully selecting transfer policies depending on the target is superior to fixed transfers, such as the ones employed by self-supervised methods. ImageNet features, which are the most popular off-the-shelf features in vision, are also outperformed by those policies. Additionally, though the taxonomy transfer policies lose to fully supervised networks (gold standard) in most cases, the results often get close, with win rates in the 40% range. These observations suggest the space has a rather predictable and strong structure. For graph visualization of the all-for-one taxonomy policies please see the supplementary material. The solver website allows generating the taxonomy for arbitrary sets of target-only tasks.

Figure 11: Structure Significance. Our taxonomy compared with random transfer policies (random feasible taxonomies that use the maximum allowable supervision budget). Y-axis shows Quality or Gain, and X-axis is the supervision budget. Green and gray represent our taxonomy and random connectivities, respectively. Error bars denote 5th–95th percentiles.

5. Significance Test of the Structure

The previous evaluations showed good transfer results in terms of Quality and Gain, but how crucial is it to use our taxonomy to choose smart transfers over just choosing any transfer? In other words, how significant/strong is the discovered structure of the task space? Fig. 11 quantifies this by showing the performance of our taxonomy versus a large set of taxonomies with random connectivities. Our taxonomy outperformed all other connectivities by a large margin, signifying both the existence of a strong structure in the space and a good modeling of it by our approach. Complete experimental details are available in the supplementary material.
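The comparison in Fig. 11 can be sketched as a percentile test: sample many random feasible policies, form the 5th-95th percentile band of their scores, and check whether the taxonomy's score lies above it. The scores below are synthetic, purely for illustration:

```python
import random
import statistics

def significance_band(random_policy_scores, taxonomy_score):
    """Compare the taxonomy's Quality/Gain against random feasible
    connectivities, as in Fig. 11: report the 5th-95th percentile band
    of the random policies and whether the taxonomy clears it."""
    qs = statistics.quantiles(random_policy_scores, n=20)  # cut points at 5% steps
    p5, p95 = qs[0], qs[-1]
    return p5, p95, taxonomy_score > p95   # True -> above the whole random band

# Hypothetical random-policy win rates, for illustration only:
rng = random.Random(0)
random_scores = [rng.uniform(0.2, 0.6) for _ in range(200)]
p5, p95, significant = significance_band(random_scores, taxonomy_score=0.85)
print(significant)  # True: 0.85 lies above the whole random band (max 0.6)
```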

5.1. Evaluation on MIT Places & ImageNet

To what extent are our findings dataset dependent, and would the taxonomy change if computed on another dataset? We examined this by finding the ranking of all tasks for transferring to two target tasks, object classification and scene classification, on our dataset. We then fine-tuned our task-specific networks on other datasets (MIT Places [104] for scene classification, ImageNet [78] for object classification) and evaluated them on their respective test sets and metrics. Fig. 12 shows how the results correlate with the taxonomy's ranking from our dataset. The Spearman's rho between the taxonomy ranking and the Top-1 ranking is 0.857 on Places and 0.823 on ImageNet, showing a notable correlation. See supplementary material for complete experimental details.
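Spearman's rho for two tie-free rankings has a closed form; a minimal sketch:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two rankings of the same tasks,
    assuming no ties (rank_a[i] and rank_b[i] are task i's ranks)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Identical rankings give rho = 1.0; a fully reversed ranking gives -1.0.
print(spearman_rho([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

Values such as the reported 0.857 and 0.823 thus indicate rankings that are close to, but not exactly, order-preserving across datasets.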

5.2. Universality of the Structure

We employed a computational approach with various design choices. It is important to investigate how specific the discovered structure is to those choices. We did stability tests by computing the variance in our output when making changes in one of the following system choices: I. architecture of task-specific networks, II. architecture of transfer function networks, III. amount of data available for training transfer networks, IV. datasets, V. data splits, VI. choice of dictionary. Overall, despite injecting large changes (e.g. varying the size of training data of transfer functions by 16x, and the size and architecture of task-specific networks and transfer networks by 4x), we found the outputs to be remarkably stable, leading to almost no change in the output taxonomy computed on top. Detailed results and the experimental setup of each test are reported in the supplementary material.

Figure 12: Evaluating the discovered structure on other datasets: ImageNet [78] (left) for object classification and MIT Places [104] (right) for scene classification. Y-axis shows accuracy on the external benchmark while bars on x-axis are ordered by taxonomy’s predicted performance based on our dataset. A monotonically decreasing plot corresponds to preserving identical orders and perfect generalization.

5.3. Task Similarity Tree

Thus far we showed the task space has a structure, measured this structure, and presented its utility for transfer learning via devising transfer policies. This structure can be presented in other manners as well, e.g. via a metric of similarity across tasks. Figure 13 shows a similarity tree for the tasks in our dictionary. This is acquired from agglomerative clustering of the tasks based on their transferring-out behavior, i.e. using columns of the normalized affinity matrix P as feature vectors for tasks. The tree shows how tasks would be hierarchically positioned w.r.t. each other when measured based on providing information for solving other tasks; the closer two tasks are, the more similar their roles in transferring to other tasks. Notice that the 3D, 2D, low dimensional geometric, and semantic tasks are found to cluster together using a fully computational approach, which matches the intuitive expectations from the structure of task space. The transfer taxonomies devised by BIP are consistent with this tree, as BIP picks the sources in a way that all of these modes are quantitatively best covered, subject to the given budget and desired target set.
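The clustering behind Fig. 13 can be sketched as follows: take columns of the normalized affinity matrix P as task features and agglomerate with average linkage. This naive O(n^3) version and the toy matrix are illustrative assumptions, not the paper's exact procedure:

```python
def column_features(P):
    """Columns of the normalized affinity matrix P serve as task features:
    column j describes how task j transfers out to every target."""
    return [[row[j] for row in P] for j in range(len(P[0]))]

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def agglomerate(feats, names):
    """Naive average-linkage agglomerative clustering: repeatedly merge the
    two closest clusters, recording the merge order (a sketch of the tree
    in Fig. 13)."""
    clusters = [([f], [n]) for f, n in zip(feats, names)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average inter-point distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: sum(dist(u, v)
                               for u in clusters[ab[0]][0]
                               for v in clusters[ab[1]][0])
                           / (len(clusters[ab[0]][0]) * len(clusters[ab[1]][0])),
        )
        fi, ni = clusters[i]
        fj, nj = clusters[j]
        merges.append((tuple(ni), tuple(nj)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((fi + fj, ni + nj))
    return merges

# Toy affinity matrix: two "3D" tasks with similar columns, one semantic outlier.
P = [[0.9, 0.8, 0.1],
     [0.8, 0.9, 0.2],
     [0.1, 0.2, 0.9]]
merges = agglomerate(column_features(P), ["depth", "normals", "classify"])
print(merges[0])  # depth and normals merge first: their columns are closest
```

Tasks whose columns are near each other play similar transferring-out roles, which is exactly the notion of similarity the tree encodes.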

6. Limitations and Discussion

We presented a method for modeling the space of visual tasks by way of transfer learning and showed its utility in reducing the need for supervision. The space of tasks is an interesting object of study in its own right and we have only scratched the surface in this regard. We also made a number of assumptions in the framework which should be noted.

Figure 13: Task Similarity Tree. Agglomerative clustering of tasks based on their transferring-out patterns (i.e. using columns of normalized affinity matrix as task features). 3D, 2D, low dimensional geometric, and semantic tasks clustered together using a fully computational approach.

Model Dependence: We used a computational approach and adopted neural networks as our function class. Though we validated the stability of the findings w.r.t. various architectures and datasets, it should be noted that the results are in principle model and data specific. The current model also does not include a principled mechanism for handling uncertainty or probabilistic reasoning.

Compositionality: We performed the modeling via a set of common human-defined visual tasks. It is natural to consider a further compositional approach in which such common tasks are viewed as observed samples which are composed of computationally found latent (sub)tasks.

Space Regularity: We performed modeling of a dense space via a sampled dictionary. Though we showed a good tolerance w.r.t. the choice of dictionary and transferring to out-of-dictionary tasks, this outcome holds upon a proper sampling of the space as a function of its regularity. More formal studies on the properties of the computed space are required for this to be provably guaranteed for a general case.

Transferring to Non-visual and Robotic Tasks: Given the structure of the space of visual tasks and the demonstrated transferability to novel tasks, it is worthwhile to question how this can be employed to develop a perception module for solving downstream tasks which are not entirely visual, e.g. robotic manipulation, but entail solving a set of (a priori unknown) visual tasks.

Lifelong Learning: We performed the modeling in one go. In many cases, e.g. lifelong learning, the system is evolving and the number of mastered tasks constantly increases. Such scenarios require augmenting the structure with expansion mechanisms based on new beliefs.

Acknowledgement: We acknowledge the support of NSF (DMS-1521608), MURI (1186514-1-TBCJE), ONR MURI (N00014-14-1-0671), Toyota (1191689-1-UDAWF), ONR MURI (N00014-13-1-0341), Nvidia, Tencent, a gift by Amazon Web Services, and a Google Focused Research Award.

Source article: http://tongtianta.site/paper/1750
Editor: Lornatang
Proofreader: Lornatang
