InterAxis: Steering Scatterplot

Author: 温柔的谢世杰 | Source: published 2017-05-08 22:11

    Abstract—Scatterplots are effective visualization techniques for multidimensional data that use two (or three) axes to visualize data
    items as a point at its corresponding x and y Cartesian coordinates. Typically, each axis is bound to a single data attribute. Interactive exploration occurs by changing the data attributes bound to each of these axes. In the case of using scatterplots to visualize
    the outputs of dimension reduction techniques, the x and y axes are combinations of the true, high-dimensional data. For these
    spatializations, the axes present usability challenges in terms of interpretability and interactivity. That is, understanding the axes
    and interacting with them to make adjustments can be challenging. In this paper, we present InterAxis, a visual analytics technique
    to properly interpret, define, and change an axis in a user-driven manner. Users are given the ability to define and modify axes by
    dragging data items to either side of the x or y axes, from which the system computes a linear combination of data attributes and binds
    it to the axis. Further, users can directly tune the positive and negative contribution to these complex axes by using the visualization
    of data attributes that correspond to each axis. We describe the details of our technique and demonstrate the intended usage through
    two scenarios.


    Index Terms—Scatterplots, user interaction, model steering


    Scatterplots are commonly utilized in visualizing relationships between two individual data attributes. The use of two orthogonal
    axes mapped to data attributes produces a Cartesian space where data
    objects can be charted. A basic strategy to form these axes in multidimensional data visualization is to assign each axis an individual
    feature or dimension originally given in a dataset. For example, plotting temperature over time on the y and x axes, respectively, generates a chart that can be used for understanding the relationship between
    these two data attributes. However, this has a severe scalability issue because two-dimensional (2D) scatterplots can represent only two
    features out of many at any given point of time.

    Instead, an alternative strategy that better handles this scalability issue is dimension reduction, which involves multiple original features
    to represent each axis. Dimension reduction [21] is a popular technique used to transform high-dimensional data into lower-dimensional
    views (typically, 2D scatterplots). While a variety of approaches exist,
    their fundamental functionality is similar: to solve for distances between data points in a lower-dimensional space that closely represents
    the true distances between the points in a high-dimensional space. This
    is carried out by variations in solving for distance metrics from the data.

    In the visual and perceptual understanding of a scatterplot, the interpretation of its axes plays a crucial role. That is, understanding what
    it means to have large/small values along the x or y axis significantly
    helps the users’ reasoning process about why the relationships among
    data items are close/remote in a scatterplot. In the case of traditional
    scatterplots where each axis is directly mapped to a particular data
    attribute (without any dimension reduction), this process is straightforward. However, this is not often the case when it comes to the axis
    of a 2D scatterplot generated by dimension reduction. One of the primary reasons is that only a limited set of dimension reduction methods
    provide the interpretability of the axes of a scatterplot. Such methods include traditional methods such as principal component analysis
    (PCA) [27] and linear discriminant analysis [23], which form an axis
    (or a reduced dimension) explicitly as a linear combination of the original data attributes. Through this linear combination representation of
    the original attributes, one can interpret the contribution of each original attribute to the axis. On the other hand, many other dimension
    reduction methods form each axis implicitly in terms of the original
    attributes, and thus they do not provide users with its clear meaning.
    Most advanced non-linear dimension reduction methods such as manifold learning [33] correspond to this case. Even worse, some other
    popular methods such as multidimensional scaling (MDS) [31] and
    force-directed graph layout [22] are rotation invariant, which
    means that the axes are not defined at all. Thus, communicating to
    users the meaning of the axes resulting from dimension reduction techniques is an open challenge.
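
    To make the notion of an explicitly defined axis concrete, the following minimal sketch computes PCA axes with NumPy and prints the first axis as a weighted combination of the original attributes. The attribute names and values are made up for illustration, and `pca_axes` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def pca_axes(data, n_axes=2):
    """Return (loadings, coords): each row of `loadings` is one axis expressed
    as a unit-length linear combination of the original attributes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_axes]
    coords = centered @ loadings.T
    return loadings, coords

# Toy data (made up): 5 items x 3 attributes.
attributes = ["price", "horsepower", "mpg"]
data = np.array([
    [20.0, 110.0, 34.0],
    [35.0, 200.0, 25.0],
    [28.0, 150.0, 30.0],
    [60.0, 320.0, 18.0],
    [45.0, 260.0, 21.0],
])
loadings, coords = pca_axes(data)

# The sign and size of each weight show how an attribute contributes to axis 1.
for name, weight in zip(attributes, loadings[0]):
    print(f"{name:>10}: {weight:+.3f}")
```

    Because each axis is an explicit weight vector, its meaning can be read off attribute by attribute. Methods that only output point positions, such as MDS, offer no such vector to inspect.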


    Another issue with the scatterplot generated by dimension reduction lies in the lack of interactivity. Forming the axes via dimension
    reduction does not typically allow human intervention. In other words,
    most of the dimension reduction methods are performed in a fully automated manner on the basis of their own pre-defined mathematical
    criteria, and thus, diverse user needs and task goals are not considered
    in this process. For instance, the PCA criterion, which maximally preserves the total variance of data, may not align well with the goal of
    a user’s task. While MDS attempts to preserve all pairwise distances
    with equal weights, one may want to focus on a subset of data points,
    e.g., a local region in a scatterplot, at a time.

    Motivated by these challenges, we propose a novel interactive
    knowledge specification method for multidimensional data visualization, which is an alternative to the purely automatic process of generating a scatterplot via dimension reduction. The proposed method interactively forms an axis, thereby generating a corresponding scatterplot
    in a user-driven manner. The key novelty of the proposed method lies
    in the direct and seamless incorporation of user-selected data items for
    characterizing the axis during the data exploration process. Our technique enables users to create and modify the axes by dragging data
    objects to the high and low locations on both the x and y axes. The
    proposed method defines the meaning of an axis accordingly in the
    form of a linear combination of original data features, similar to the
    output of linear dimension reduction methods. Such a user-driven linear combination of data attributes is visualized on each axis, showing
    the positive or negative contribution of each attribute to the axis. Finally, users can continually refine the axes by dragging additional data
    points to the axes, or by directly adjusting the contribution of the data
    attributes as part of the linear combination.


    The primary contributions of this work include the following:
    • a visual analytics technique for directly creating, modifying, and
    visualizing complicated axes formed by a linear combination of
    data attributes
    • a user interaction technique enabling seamless interactivity via
    both data objects and data attributes to steer the meaning of the
    • a visual analytics technique to help users discover and weigh data


    The rest of this paper is organized as follows: Section 2 discusses related work. Section 3 describes our proof-of-concept visual analytics
    system along with how the proposed interaction techniques are performed from the perspectives of both the front end and the back end,
    followed by a discussion about our design rationale. Section 4 presents
    several usage scenarios showcasing the advantages of the proposed interaction techniques. Section 5 presents in-depth discussions about the
    limitations of our interaction techniques as well as potential directions
    for improving them. Finally, Section 6 concludes the paper with some
    future work.

    2.1 Multiattribute Data Visualization

    Fig. 2. A scatterplot generated by Tableau [41]. Users can interactively explore data by selecting and changing the bindings between
    data attributes and axes.


    The design space for visualization techniques for representing multiattribute data is large [28]. For example, the existing techniques include iconic displays [6], transforming displays based on geometric
    characteristics [13], and stacked visual representations [32]. Among
    these many techniques, one commonly used technique is the scatterplot [12, 20, 45], owing to the visual simplicity and cultural familiarity
    of such charts [43]. Scatterplots (such as the one shown in Fig. 2) represent data on a Cartesian plane defined by the two graphical axes (the
    x and the y axes). Three-dimensional scatterplots are also an available
    option, but their use in information visualization is limited given the
    perceptual and visual challenges [38, 47]. Systems that enable users to
    generate scatterplots include Tableau [41], GGobi [40], Matlab [34],
    Spotfire [1], and Microsoft Excel [19]. One basic user interaction supported by scatterplots is to select and change the mapping of the axes
    to data attributes (Fig. 2).

    Fig. 3. A scatterplot matrix (adapted from [15]) showing all individual
    pairwise feature scatterplots of an 8-dimensional dataset



    Fig. 4. A Galaxy View generated by IN-SPIRE [48] showing a scatterplot of documents (dots)


    As datasets grow more complex, the number of data attributes to select from
    often increases as well. Directly selecting one out of hundreds or thousands of data attributes
    can then be less than optimal. As such, different types of techniques exist
    to show more combinations of data attributes simultaneously. For example, multiple scatterplots can be arranged into a single view called a scatterplot matrix [12]. A scatterplot matrix (such as the example
    shown in Fig. 3, adapted from [15]) binds data attributes to rows and
    columns so that each cell in the matrix represents a single scatterplot. As such, users do not have to individually bind data attributes to
    the axes and interactively choose among the potentially large number
    of choices.
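
    As a small illustration of how a scatterplot matrix enumerates attribute pairs, the sketch below lists the distinct pairs for an 8-attribute dataset, the size shown in Fig. 3. The attribute names and the helper `scatterplot_matrix_pairs` are placeholders chosen for this example.

```python
from itertools import combinations

def scatterplot_matrix_pairs(attributes):
    """List the distinct attribute pairs shown in a scatterplot matrix."""
    return list(combinations(attributes, 2))

attributes = [f"attr{i}" for i in range(1, 9)]  # 8 placeholder attribute names
pairs = scatterplot_matrix_pairs(attributes)
print(len(pairs), "distinct pairwise scatterplots")
```

    The number of distinct pairs is d(d-1)/2 for d attributes, so it grows quadratically and the matrix becomes hard to read for datasets with hundreds of attributes.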

    2.2 Applications of Dimension Reduction in Information Visualization


    When using dimension reduction for visualization purposes, the goal
    is to provide a low-dimensional view, typically a 2D scatterplot, in
    a manner that the original high-dimensional distances between data
    points are maximally preserved in the resulting 2D views. These
    views often show spatial clusters or groups of data representing coherent contents. Dimension reduction methods widely used
    for visualization include PCA [27], MDS [31], self-organizing map
    (SOM) [29], and generative topographic mapping (GTM) [3]. More recently, t-distributed stochastic neighbor embedding [46] has been proposed as a dimension reduction method that is particularly suitable for generating 2D scatterplots that can reveal meaningful insights
    about data such as clusters and outliers.
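
    One simple way to check how well a 2D view preserves the original distances is to correlate the pairwise distances before and after reduction. The sketch below is an illustrative measure chosen for this example, not a criterion taken from any of the cited methods. The random data and the naive reduction (keeping the first two attributes) are assumptions.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_preservation(high, low):
    """Pearson correlation between high-D and low-D pairwise distances."""
    upper = np.triu_indices(len(high), k=1)  # count each pair once
    d_high = pairwise_distances(high)[upper]
    d_low = pairwise_distances(low)[upper]
    return float(np.corrcoef(d_high, d_low)[0, 1])

rng = np.random.default_rng(0)
high = rng.normal(size=(50, 10))   # 50 items, 10 attributes (random toy data)
low = high[:, :2]                  # naive "reduction": keep two attributes

print("identity mapping:", distance_preservation(high, high))
print("first two attributes:", distance_preservation(high, low))
```

    A score close to 1 means the 2D distances track the original ones closely. The score depends only on distances, so rotating the 2D layout leaves it unchanged.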

    To date, these methods have been actively adopted in visual analytics systems. For example, IN-SPIRE [48], a well-known visual analytics system for document analysis, provides a Galaxy View (as shown
    in Fig. 4) that visualizes text corpora spatially by showing the pairwise similarity between documents as their distance in a 2D space.
    As a result, groups and clusters emerge, which can be perceived as
    the sets of similar documents, based on the geographic "near=similar"
    metaphor [39]. More recently, a visual analytics system applicable to
    more general high-dimensional data types including documents and
    images has been proposed, allowing a user to explore the diverse aspects of data by applying various dimension reduction methods to generate different scatterplot visualizations [9].
    Other kinds of high-dimensional data have also been visualized in
    the form of a scatterplot based on dimension reduction, including education performance data, census data [18], wine characteristics [5],
    facial images [8], and text documents [7].

    2.3 Interactivity for Dimension Reduction in Information Visualization


    In general, the axes created via dimension reduction techniques are defined by linear or non-linear combinations of original data dimensions.
    This complexity can lead to trust and interpretation challenges for domain experts exploring their data visually [10]. For example, users
    may question whether their interpretation of a pattern is trustworthy or
    if it is just an artifact of a dimension reduction technique. More fundamentally, using only two dimensions to represent considerably higher-dimensional data inevitably involves significant information loss and
    distortion. To overcome these issues, various user interactions have
    been employed in numerous visual analytics systems.

    One approach to user interaction is via direct manipulation of dimension reduction model parameters. For example, Jeong et al.
    presented iPCA, a visual analytics application that visualizes high-dimensional data in a 2D scatterplot using PCA [26]. They utilize
    graphical controls (e.g., sliders) to enable users to directly manipulate
    the weight on the principal components used in PCA. As a result, the
    adjustments by the user generate a new projection (i.e., a new scatterplot). Similar interaction guidelines have been used by other applications, such as a text visualization system called STREAMIT [2].
    A different set of techniques for incorporating user interactions into
    such visual analytics systems also exists. Semantic interaction techniques function by inferring model updates based on direct interactions performed in the visualization [16, 17]. For example, Endert et
    al. have shown how directly manipulating the position of points in a
    2D scatterplot can be used for inferring the parameters of PCA, MDS,
    and GTM [18]. These inferences can also be used for exporting the
    specification of distance functions computed in the dimension reduction step so that they can be reused, shared, or simply saved [5].
    Other than manipulating data items to interact with scatterplots, researchers have studied the interaction techniques that manipulate features or dimensions. Yi et al. have presented a technique called Dust
    & Magnet that allows users to additionally place features or dimensions on top of a scatterplot themselves to see which data items have
    large values of these features or dimensions [49]. For text analysis, the
    VIBE system allows users to perform similar interactions with keywords [35]. In addition, Turkay et al. proposed a technique using
    dual scatterplots one of which shows data items while the other shows
    features [44]. By providing brushing and linking as well as filtering
    operations on both data items and features in these dual scatterplots,
    users can check major patterns as well as outliers among data items
    and among features.

    The technique proposed in this paper follows a similar idea of interacting with both data items and features, but its main novelty
    relative to the existing work lies in the capability
    of directly defining and interpreting the axes of the 2D scatterplot by
    assigning the data items of interest to the axes. In this respect, our
    work is related to PivotSlice, a technique recently proposed by Zhao
    et al. that allows faceted browsing of high-dimensional data [50], as
    it allows users to specify data attributes on axes of the scatterplot by
    directly dragging the attribute to the axis. However, our technique enables users to drag data objects (instead of data attributes) to the axis.
    Further, the proposed technique does not divide the scatterplot into a
    multifaceted view.
    Furthermore, a technique called flexible linked axes [11] has a relationship with our work from a different aspect. That is, this technique
    is a different type of interaction that allows users to draw axes on a canvas, where scatterplots can be generated between any two neighboring
    axes. However, the main goal of this technique is fundamentally different from ours in that it attempts to flexibly coordinate and place
    multiple scatterplots on a large canvas, while our focus is on improving a single scatterplot for better supporting the interactive exploration
    of data based on a more sophisticated, user-driven axis specification.
    Further, Kondo and Collins have shown how directly interacting with
    visualizations can be used for revealing temporal trends and relationships between data items [30]. Their work allowed users to manipulate
    the position of data points in a scatterplot to reveal the temporal trends
    in data, again enabling interactions directly on the data items in a scatterplot to parameterize a data model.

    To realize the proposed interaction technique, we built a proof-of-concept visual analytics system. In this section, we describe (1) the
    overall design of the proposed visual analytics system, (2) the proposed interaction to steer the axis in a user-driven manner, (3) the underlying mathematical details to support the proposed user interaction,
    (4) the design rationale, and (5) the implementation details of the proposed system.

    3.1 System Design
    As shown in Fig. 1 by using the well-known Car dataset, which consists of 387 data items with 18 attributes, the proposed system mainly
    contains three panels: (1) the scatterplot view (Fig. 1(A)), (2) the
    axis interaction panel to support the proposed interaction capabilities
    (Fig. 1(B-D)), and (3) the data detail view (Fig. 1(E)).
    The user interaction technique presented in this paper fosters a visual data exploration process grounded in the principles of semantic
    interaction techniques [16, 17]. That is, the system interprets the analytical reasoning of exploratory user interactions to steer the underlying data model. The generic workflow supported by our user interaction technique is as follows:

    1. The user observes two data points that define the difference between the two semantic groupings (e.g., “nice cars” and “bad
      cars”).
    2. The user drags one data item to each side of the axis.
    3. InterAxis computes the weighting of data attributes that supports
      these higher-level groupings (Eq. 1). The weights are displayed
      in the bar chart below the axis.
    4. The scatterplot updates to reflect the newly defined axis, where
      data items are placed according to the similarity on either side of
      the axis (Eq. 2).
    5. The user can refine the semantic grouping by adding/removing
      data points or directly modifying the weighting in the visualization below the axes.
    6. The user can save the axis for future use and continue to explore
      the visualization iteratively by using the same interaction concept
      based on different semantic groupings.

    The scatterplot view provides a 2D overview of the data. By default,
    the first and the second features of data, e.g., Retail Price and HP
    (Horsepower), are assigned to the x and the y axes, respectively, but
    this initial view can be set up by using a dimension reduction method
    such as PCA [27] to provide another starting point. Data points are represented as semi-transparent circles so that regions with overlapped
    data points can be highlighted. The scatterplot view supports zoom
    and pan via mouse wheel operations on a white space (to zoom on
    both axes simultaneously) or over a particular axis (to zoom only on
    this axis). Hovering over or clicking on a data point, one can check the
    full details (or the original high-dimensional information) of the data
    item in the data detail view (Fig. 1(E)).
    The axis interaction panel consists of two drop zones (the high-end
    and the low-end of each axis), which the user drags data points into in
    order to steer the axis (Fig. 1(B)), an interactive bar chart (Fig. 1(C)),
    and a sub-panel (Fig. 1(D)) containing buttons to save the current axis
    for further use or to clear the data points currently assigned to the axis
    and a combo box to change the axis back to one among the original
    features or the previously defined axes. The bars in the interactive
    bar chart represent the contributions/weights of attributes to the corresponding axis. The longer the length of a bar is, the stronger its corresponding attribute contributes to the axis. The bars are color-coded
    by the signs of their weights: positive contributions in blue and negative contributions in red. Data points that are high on the positively
    weighted (blue-colored) attributes will be placed on the high-end side
    of the axis. Data points that are high on the negatively weighted attributes will be placed on the low-end side of the axis. For example,
    in Fig. 1(C), sedans tend to be on the left side of the scatterplot, while
    sports cars and cars with rear-wheel drive (RWD) tend to be on the
    right side. Positive and negative weights represent the magnitude and
    at which end of the axis the data points with those attributes will be

    Fig. 1. An overview of the proposed visual analytics system, InterAxis, showing a car dataset, which includes 387 data items with
    18 attributes. The proposed system contains three panels: (A) the scatterplot view to provide a two-dimensional overview of data,
    (B-D) the axis interaction panel to support the proposed interaction capabilities, and (E) the data detail view to show the original
    high-dimensional information of the data items of interest. The axis interaction panel (B-D) consists of (B) two drop zones (the
    high-end and the low-end of each axis), which a user drags data points into in order to steer the axis, (C) an interactive bar chart,
    and a sub-panel containing buttons to save the current axis for future use (D, middle) or to clear the data points currently assigned
    to the axis (D, right) and a combo box to change the axis back to one among the original features or the previously created axes
    via our interaction (D, left).


    3.2 Interactive Axis Steering
    The proposed method provides two types of interactions: (1) data-level
    axis steering and (2) attribute-level axis manipulation. Data-level axis
    steering is prompted by dragging a data point from the scatterplot into
    the two drop zones at the high and the low ends of the axis. Attribute-level axis manipulation is prompted by directly adjusting the bars in
    the interactive bar chart.
    The main idea of the proposed interaction for steering the axis in
    a user-driven manner lies in an intuitive process of incorporating data
    items seamlessly while exploring data in a scatterplot. For example,
    when a user finds data points that he likes (or dislikes) in the scatterplot, he can drag them to the high-end (or the low-end) drop zone of
    an axis (Fig. 1(B)). Accordingly, a new axis is formed by reflecting
    these choices of data items, which will then update the scatterplot on
    the basis of the newly formed axis. The technical details about how
    we form a new axis will be described in the next section.
    How the axis is formed from this process is summarized and visualized as a bar chart (Fig. 1(C)) so that a user can get an idea about
    how much a particular original feature or dimension is emphasized or
    de-emphasized. Given such a bar chart, a user can further refine the
    meaning of an axis by directly manipulating the length of each bar
    via drag-and-drop operations on the tip of the bar (attribute-level axis
    manipulation).

    The entire interaction process can be dynamic and iterative. That is,
    a user can additionally assign new data items to an axis or remove data
    items that were already assigned to an axis. Furthermore, the above-described direct manipulation on the bar chart can be performed at
    any moment during such an interactive exploration of the bar chart.
    Finally, a user can save the current definition of an axis, and then it is
    registered as a new entry in the combo box (Fig. 1(D, left)) so that a
    user can later recover the axis to a previously saved one.
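
    The interactions described in this section can be sketched as a small state object. This is an illustrative sketch, not the system's implementation: the class `SteerableAxis`, its method names, and the weighting rule (mean of the high-end items minus mean of the low-end items, scaled to unit length) are all assumptions made for the example.

```python
import numpy as np

class SteerableAxis:
    """Hypothetical state of one steerable axis (all names are illustrative)."""

    def __init__(self, n_attributes):
        self.high = []    # items dropped on the high-end zone
        self.low = []     # items dropped on the low-end zone
        self.weights = np.zeros(n_attributes)
        self.saved = {}   # name -> saved weights (the combo-box entries)

    def _recompute(self):
        # Assumed rule: mean high-end item minus mean low-end item, unit length.
        if not self.high or not self.low:
            return
        w = np.mean(self.high, axis=0) - np.mean(self.low, axis=0)
        norm = np.linalg.norm(w)
        self.weights = w / norm if norm > 0 else w

    def drop(self, item, zone):
        """Data-level steering: drag an item into the 'high' or 'low' zone."""
        if zone not in ("high", "low"):
            raise ValueError("zone must be 'high' or 'low'")
        target = self.high if zone == "high" else self.low
        target.append(np.asarray(item, dtype=float))
        self._recompute()

    def set_weight(self, index, value):
        """Attribute-level manipulation: drag the tip of one bar."""
        self.weights = self.weights.copy()
        self.weights[index] = value

    def save(self, name):
        self.saved[name] = self.weights.copy()

    def restore(self, name):
        self.weights = self.saved[name].copy()

    def project(self, data):
        """Coordinate of each item along this axis."""
        return np.asarray(data, dtype=float) @ self.weights

axis = SteerableAxis(n_attributes=3)
axis.drop([1.0, 0.0, 0.2], "high")
axis.drop([0.0, 1.0, 0.2], "low")
axis.save("first try")
axis.set_weight(2, 0.5)                  # user lengthens the third bar
print(axis.project([[1.0, 0.0, 1.0]]))
axis.restore("first try")                # back to the saved axis
print(axis.project([[1.0, 0.0, 1.0]]))
```

    Keeping saved axes as plain weight vectors means a restored axis behaves like any other: it can be refined again by further drops or bar adjustments.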


    3.3 Underlying Techniques
    In this section, we describe the underlying technique for the proposed
    user interaction of forming the axis via data items. For the sake of
    brevity, we consider only the x axis (the horizontal axis) in a scatterplot, but the following description can be generalized to the y axis in
    the same manner.

    Data preprocessing. As will be discussed later, the underlying
    model to define the axis is based on a linear combination of the original dimensions. To this end, we adopt data preprocessing steps used in linear regression models [14]. For a categorical variable with c different categories, we use dummy encoding, which converts it to a c-dimensional indicator vector where the value of each dimension is 1
    if a data item is in the category of the corresponding dimension and
    0 otherwise. Next, we scale and translate each dimension (including
    both indicator and numerical variables) so that its values lie exactly in
    the range from 0 to 1.
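
    The two preprocessing steps can be sketched as follows. The helper names `dummy_encode` and `min_max_scale` and the sample values are made up for illustration. The handling of constant columns is an added assumption, since the text does not cover that case.

```python
import numpy as np

def dummy_encode(values):
    """Convert a categorical column into a c-dimensional 0/1 indicator matrix."""
    categories = sorted(set(values))
    encoded = np.array(
        [[1.0 if v == c else 0.0 for c in categories] for v in values]
    )
    return encoded, categories

def min_max_scale(matrix):
    """Scale and translate each column into the range [0, 1]."""
    matrix = np.asarray(matrix, dtype=float)
    lo = matrix.min(axis=0)
    span = matrix.max(axis=0) - lo
    span[span == 0] = 1.0  # assumption: a constant column is mapped to 0
    return (matrix - lo) / span

# Made-up attributes for four items.
body_type = ["sedan", "sports", "sedan", "suv"]
horsepower = np.array([[120.0], [300.0], [150.0], [210.0]])

indicators, categories = dummy_encode(body_type)
features = min_max_scale(np.hstack([horsepower, indicators]))
print(categories)
print(features)
```

    After this step every column, numerical or indicator, spans the same range, so no attribute dominates a linear combination merely because of its units.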

    Linear transformation. Assuming that such data preprocessing is
    done, we denote the set of high-dimensional vectors of data items that
    the user assigned (via a drag-and-drop) to the high-end of the x axis
    as $A_{x,h} = \{a_{x,h}^{1}, \ldots, a_{x,h}^{n_{x,h}}\}$ and the set of those dragged into
    the low-end side of the x axis as $A_{x,l} = \{a_{x,l}^{1}, \ldots, a_{x,l}^{n_{x,l}}\}$, where
    $n_{x,h}$ and $n_{x,l}$ represent the total number of points assigned to the
    high-end and the low-end of the x axis, respectively. Now, we define
    the linear transformation vector $T_x$ for the x axis as follows:

    This is then further scaled to have a unit Euclidean norm.
    One can define the linear transformation vector $T_y$ for the y axis
    in the same manner. Every data item is mapped to the x axis (and
    the y axis) via the transformation $T_x$ (and $T_y$). That is, the i-th data
    item, whose high-dimensional vector is represented as $a_i$, is mapped to
    a point in our 2D scatterplot so that its 2D coordinates are represented
    as follows:

    Owing to the easy interpretability of this linear model, one can understand the meaning of this transformation in a straightforward manner. That is, the resulting x axis basically emphasizes the features
    or dimensions that have large values on the high-dimensional vectors
    contained in $A_{x,h}$ but have low values on those in $A_{x,l}$. On the other
    hand, we de-emphasize the features that have low values on the vectors
    contained in $A_{x,h}$ but have high values on those in $A_{x,l}$. In this manner,
    as a data item has larger (or lower) values on these emphasized dimensions and lower (or higher) values on the de-emphasized dimensions,
    its x coordinate will have a higher (or lower) value, appearing more on
    the right (or left) side of the x axis. The notations used in this section
    are summarized in Table 1.
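
    The equations referenced as Eq. 1 and Eq. 2 are not reproduced in this copy. One reading consistent with the description is that $T_x$ is the centroid of the high-end set minus the centroid of the low-end set, scaled to unit norm, and that the i-th item is placed at $(T_x^{\top} a_i, T_y^{\top} a_i)$. The sketch below implements that assumed reading. The function names and the data are hypothetical.

```python
import numpy as np

def axis_vector(high_items, low_items):
    """Assumed form of Eq. 1: centroid of the high-end set minus centroid of
    the low-end set, scaled to unit Euclidean norm."""
    t = np.mean(high_items, axis=0) - np.mean(low_items, axis=0)
    norm = np.linalg.norm(t)
    if norm == 0:
        raise ValueError("high-end and low-end sets have the same centroid")
    return t / norm

def project(data, t_x, t_y):
    """Assumed form of Eq. 2: map each item a_i to (T_x . a_i, T_y . a_i)."""
    data = np.asarray(data, dtype=float)
    return np.column_stack([data @ t_x, data @ t_y])

# Made-up, already preprocessed data: columns are [horsepower, mpg, price].
data = np.array([
    [0.9, 0.1, 0.8],
    [0.8, 0.2, 0.3],
    [0.2, 0.9, 0.2],
    [0.1, 0.8, 0.7],
])
t_x = axis_vector(data[[0, 1]], data[[2, 3]])  # x axis: items 0, 1 high; 2, 3 low
t_y = axis_vector(data[[0, 3]], data[[1, 2]])  # y axis: items 0, 3 high; 1, 2 low
coords = project(data, t_x, t_y)

print("x-axis weights:", np.round(t_x, 3))
print("2D coordinates:")
print(np.round(coords, 3))
```

    Under this reading, horsepower is high in the high-end set and low in the low-end set, so it receives a positive x-axis weight. The reverse holds for mpg, which receives a negative weight. Items with high horsepower therefore land toward the right of the x axis.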

