Casual Notes on Front-End Intelligence (1) - pix2code

Author: Jtag特工 | Published 2019-07-25 16:25

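Ever since graphical user interfaces appeared, there have been engineers whose job is to build them, and the largest group of them evolved into today's front-end engineers. Whether you work on the web, on mobile, or on desktop clients, laying out interfaces and slicing design mockups is a large part of the job. Generating code directly from a design mockup is not only a front-end engineer's dream but also something many designers hope for.
In 2017, the paper "pix2code: Generating Code from a Graphical User Interface Screenshot" appeared and immediately attracted wide attention.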

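What pix2code does

As the figure below shows, pix2code trains a deep neural network on pairs of screenshots and their corresponding DSL descriptions. Given a new screenshot, the network infers a new DSL description, and a code generator then turns that DSL into code for the target platform.

[Figure: the pix2code process]

Let's look at examples on Android, iOS, and the Web in turn.

Android example

[Figure: Android example]

The corresponding DSL is: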

stack {
row {
btn, switch
}
row {
radio
}
row {
label, slider, label
}
row {
switch
}
row {
switch, switch, btn
}
row {
btn, btn
}
row {
check
}
}
footer {
btn-dashboard, btn-dashboard, btn-home
}

Web example

Now a Web example:

[Figure: Web example screenshot]

The corresponding DSL is:

header {
btn-inactive, btn-inactive, btn-inactive, btn-active, btn-inactive
}
row {
single {
small-title, text, btn-red
}
}

iOS example

[Figure: iOS example screenshot]

The corresponding DSL is:

stack {
row {
img, label
}
row {
label, slider, label
}
row {
label, switch
}
row {
label, btn-add
}
row {
img, label
}
row {
label, slider, label
}
row {
img, img, img
}
row {
label, btn-add
}
}
footer {
btn-search, btn-search, btn-download, btn-more
}

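How pix2code works

pix2code extracts features from the screenshot with a convolutional network and encodes the DSL with an LSTM recurrent network. The two outputs are then combined and fed into a further recurrent network, and the whole model is trained together.
At inference time only the screenshot goes through the convolutional network; the DSL sequence starts out empty, and the output is a DSL sequence.

[Figure: pix2code training and inference]

Enough talk; let's go straight to the code.

Some parameters

First, some configuration parameters, such as the input size and the number of training epochs: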

CONTEXT_LENGTH = 48
IMAGE_SIZE = 256
BATCH_SIZE = 64
EPOCHS = 10
STEPS_PER_EPOCH = 72000

Saving and loading the model

Trained network weights should not go to waste, so AModel provides save and load:

from keras.models import model_from_json

class AModel:
    def __init__(self, input_shape, output_size, output_path):
        self.model = None
        self.input_shape = input_shape
        self.output_size = output_size
        self.output_path = output_path
        self.name = ""

    def save(self):
        model_json = self.model.to_json()
        with open("{}/{}.json".format(self.output_path, self.name), "w") as json_file:
            json_file.write(model_json)
        self.model.save_weights("{}/{}.h5".format(self.output_path, self.name))

    def load(self, name=""):
        output_name = self.name if name == "" else name
        with open("{}/{}.json".format(self.output_path, output_name), "r") as json_file:
            loaded_model_json = json_file.read()
        self.model = model_from_json(loaded_model_json)
        self.model.load_weights("{}/{}.h5".format(self.output_path, output_name))

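Convolutional network

The convolutional part has six convolutional layers in three blocks, followed by two fully connected layers of 1024 units each: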

        image_model = Sequential()
        image_model.add(Conv2D(32, (3, 3), padding='valid', activation='relu', input_shape=input_shape))
        image_model.add(Conv2D(32, (3, 3), padding='valid', activation='relu'))
        image_model.add(MaxPooling2D(pool_size=(2, 2)))
        image_model.add(Dropout(0.25))

        image_model.add(Conv2D(64, (3, 3), padding='valid', activation='relu'))
        image_model.add(Conv2D(64, (3, 3), padding='valid', activation='relu'))
        image_model.add(MaxPooling2D(pool_size=(2, 2)))
        image_model.add(Dropout(0.25))

        image_model.add(Conv2D(128, (3, 3), padding='valid', activation='relu'))
        image_model.add(Conv2D(128, (3, 3), padding='valid', activation='relu'))
        image_model.add(MaxPooling2D(pool_size=(2, 2)))
        image_model.add(Dropout(0.25))

        image_model.add(Flatten())
        image_model.add(Dense(1024, activation='relu'))
        image_model.add(Dropout(0.3))
        image_model.add(Dense(1024, activation='relu'))
        image_model.add(Dropout(0.3))

        image_model.add(RepeatVector(CONTEXT_LENGTH))

        visual_input = Input(shape=input_shape)
        encoded_image = image_model(visual_input)
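
To get a feel for how large the feature map is before the dense layers, the shape arithmetic can be traced by hand. The sketch below assumes a 256x256 input (IMAGE_SIZE above), 3x3 convolutions with 'valid' padding and stride 1, and 2x2 max pooling, which is what the layer definitions suggest. The helper name is made up for illustration.

```python
IMAGE_SIZE = 256

def trace_spatial_size(size, blocks=3, convs_per_block=2, kernel=3, pool=2):
    """Return the feature-map side length after each conv/pool block (hypothetical helper)."""
    sizes = []
    for _ in range(blocks):
        for _ in range(convs_per_block):
            size = size - (kernel - 1)   # 'valid' padding, stride 1
        size = size // pool              # max pooling floors odd sizes
        sizes.append(size)
    return sizes

sizes = trace_spatial_size(IMAGE_SIZE)
flattened = sizes[-1] * sizes[-1] * 128  # 128 filters in the last block
print(sizes, flattened)
```

Under these assumptions Flatten hands roughly a hundred thousand values to the first Dense(1024) layer, so most of the image model's weights sit in that one layer.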

Text-processing network

The text-processing part uses two LSTM layers:

        # Two stacked LSTMs encode the DSL context
        # (128 units each, as in the upstream pix2code repository).
        language_model = Sequential()
        language_model.add(LSTM(128, return_sequences=True, input_shape=(CONTEXT_LENGTH, output_size)))
        language_model.add(LSTM(128, return_sequences=True))

        textual_input = Input(shape=(CONTEXT_LENGTH, output_size))
        encoded_text = language_model(textual_input)

Joining image and text

Once the image and the text are both encoded, we concatenate them:

        decoder = concatenate([encoded_image, encoded_text])
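
The shapes are what make this concatenation work: RepeatVector copies the image vector once per context position, so both tensors have CONTEXT_LENGTH time steps and can be joined along the feature axis. Here is a plain-Python sketch for a single sample. The text feature width of 128 is an assumption for illustration, and the two helpers only mimic the Keras layers.

```python
CONTEXT_LENGTH = 48
IMAGE_FEATURES = 1024   # width of the last Dense layer in the image model
TEXT_FEATURES = 128     # assumed width of the text encoder output

def repeat_vector(vec, n):
    """Mimic RepeatVector: copy one feature vector n times."""
    return [list(vec) for _ in range(n)]

def concat_features(a, b):
    """Mimic concatenate on the last axis: join features step by step."""
    assert len(a) == len(b), "both sequences need the same number of steps"
    return [x + y for x, y in zip(a, b)]

encoded_image = repeat_vector([0.0] * IMAGE_FEATURES, CONTEXT_LENGTH)
encoded_text = [[0.0] * TEXT_FEATURES for _ in range(CONTEXT_LENGTH)]
decoder_input = concat_features(encoded_image, encoded_text)
print(len(decoder_input), len(decoder_input[0]))
```

Every time step therefore sees the full image encoding next to the text features for that position.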

After concatenation, two more LSTM layers decode the combined sequence, and a softmax layer predicts the next token:

        decoder = LSTM(512, return_sequences=True)(decoder)
        decoder = LSTM(512, return_sequences=False)(decoder)
        decoder = Dense(output_size, activation='softmax')(decoder)

With the whole network built, we create and compile the model, after which it is ready for training:

        self.model = Model(inputs=[visual_input, textual_input], outputs=decoder)

        optimizer = RMSprop(lr=0.0001, clipvalue=1.0)
        self.model.compile(loss='categorical_crossentropy', optimizer=optimizer)

fit uses the EPOCHS and BATCH_SIZE values from the parameters above:

    def fit(self, images, partial_captions, next_words):
        self.model.fit([images, partial_captions], next_words, shuffle=False, epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=1)
        self.save()
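
The arguments of fit hint at how the training data is shaped: each sample pairs a screenshot with a window of preceding DSL tokens (partial_captions) and the token that should come next (next_words). Below is a minimal sketch of how such pairs could be cut from one DSL file. It assumes <START>/<END> markers and a placeholder token for padding; the token spellings and helper names are illustrative and not taken from this article.

```python
CONTEXT_LENGTH = 48
START, END, PLACEHOLDER = "<START>", "<END>", " "  # assumed special tokens

def tokenize(dsl_text):
    """Split DSL text into tokens; commas become tokens of their own."""
    tokens = dsl_text.replace(",", " , ").split()
    return [START] + tokens + [END]

def make_pairs(tokens, context_length=CONTEXT_LENGTH):
    """Slide a fixed-size window over the tokens: context -> next token."""
    padded = [PLACEHOLDER] * context_length + tokens
    pairs = []
    for i in range(len(tokens)):
        context = padded[i:i + context_length]
        next_token = padded[i + context_length]
        pairs.append((context, next_token))
    return pairs

dsl = "header {\nbtn-active, btn-inactive\n}"
pairs = make_pairs(tokenize(dsl))
for context, next_token in pairs[:3]:
    print(repr(context[-1]), "->", repr(next_token))
```

In a real pipeline each token would then be one-hot encoded over the vocabulary, and every pair would be matched with the screenshot it came from.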

The inference code is:


    def predict(self, image, partial_caption):
        return self.model.predict([image, partial_caption], verbose=0)[0]

    def predict_batch(self, images, partial_captions):
        return self.model.predict([images, partial_captions], verbose=1)
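
predict returns a single probability distribution over the DSL vocabulary, so producing a whole DSL file means calling it in a loop and feeding each chosen token back into the context. The following is a greedy-decoding sketch under stated assumptions: fake_predict stands in for the trained model, and the vocabulary, special tokens, and initial context (placeholders plus a start token) are made up for the demo.

```python
CONTEXT_LENGTH = 48
START, END, PLACEHOLDER = "<START>", "<END>", " "  # assumed special tokens

def greedy_decode(predict_next, vocab, max_len=150):
    """Repeatedly pick the most likely next token until END or max_len."""
    context = [PLACEHOLDER] * (CONTEXT_LENGTH - 1) + [START]
    output = []
    for _ in range(max_len):
        probs = predict_next(context)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        token = vocab[best]
        if token == END:
            break
        output.append(token)
        context = context[1:] + [token]   # slide the window forward
    return output

# Stand-in for the model: a fixed "next token" table with one-hot outputs.
VOCAB = ["header", "{", "btn-active", "}", END]
NEXT = {START: "header", "header": "{", "{": "btn-active",
        "btn-active": "}", "}": END}

def fake_predict(context):
    target = NEXT[context[-1]]
    return [1.0 if tok == target else 0.0 for tok in VOCAB]

result = greedy_decode(fake_predict, VOCAB)
print(" ".join(result))
```

With the real model, predict_next would one-hot encode the context and call predict with the screenshot. Greedy selection is only the simplest strategy; beam search is a common alternative.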

Supported DSL

Finally, let's look at which DSL elements pix2code supports.

Android DSL

The Android platform supports 16 DSL tokens.

8 map to layouts and widgets:
stack
row
label
btn
slider
check
radio
switch

4 are auxiliary:
opening-tag
closing-tag
body
footer

4 are buttons on the footer bar:
btn-home
btn-dashboard
btn-notifications
btn-search

The mapping to Android code is:

{
  "opening-tag": "{",
  "closing-tag": "}",
  "body": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<LinearLayout\n    xmlns:android=\"http://schemas.android.com/apk/res/android\"\n    xmlns:app=\"http://schemas.android.com/apk/res-auto\"\n    xmlns:tools=\"http://schemas.android.com/tools\"\n    android:id=\"@+id/container\"\n    android:layout_width=\"match_parent\"\n    android:layout_height=\"match_parent\"\n    android:orientation=\"vertical\"\n    tools:context=\"com.tonybeltramelli.android_gui.MainActivity\">\n    {}\n</LinearLayout>\n",
  "stack": "<FrameLayout android:id=\"@+id/content\" android:layout_width=\"match_parent\" android:layout_height=\"match_parent\" android:layout_weight=\"1\" android:padding=\"10dp\">\n    <LinearLayout android:layout_width=\"match_parent\" android:layout_height=\"match_parent\" android:orientation=\"vertical\">\n        {}\n    </LinearLayout>\n</FrameLayout>",
  "row": "<LinearLayout android:layout_width=\"match_parent\" android:layout_height=\"wrap_content\" android:orientation=\"horizontal\" android:paddingTop=\"10dp\" android:paddingBottom=\"10dp\" android:weightSum=\"1\">\n{}\n</LinearLayout>",
  "label": "<TextView android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:text=\"[TEXT]\" android:textAppearance=\"@style/TextAppearance.AppCompat.Body2\"/>\n",
  "btn": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:text=\"[TEXT]\"/>",
  "slider": "<SeekBar android:id=\"@+id/[ID]\" style=\"@style/Widget.AppCompat.SeekBar.Discrete\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:layout_weight=\"0.9\" android:max=\"10\" android:progress=\"5\"/>",
  "check": "<CheckBox android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:paddingRight=\"10dp\" android:text=\"[TEXT]\"/>",
  "radio": "<RadioButton android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:paddingRight=\"10dp\" android:text=\"[TEXT]\"/>",
  "switch": "<Switch android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:paddingRight=\"10dp\" android:text=\"[TEXT]\"/>",
  "footer": "<LinearLayout android:layout_width=\"match_parent\" android:layout_height=\"wrap_content\" android:orientation=\"horizontal\" android:weightSum=\"1\">\n    {}\n</LinearLayout>",
  "btn-home": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"@drawable/ic_home_black_24dp\" android:text=\"\"/>",
  "btn-dashboard": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"@drawable/ic_dashboard_black_24dp\" android:text=\"\"/>",
  "btn-notifications": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"@drawable/ic_notifications_black_24dp\" android:text=\"\"/>",
  "btn-search": "<Button android:id=\"@+id/[ID]\" android:layout_width=\"wrap_content\" android:layout_height=\"wrap_content\" android:background=\"#0ffffff\" android:layout_weight=\"1\" android:drawableBottom=\"?android:attr/actionModeWebSearchDrawable\" android:text=\"\"/>"
}

iOS DSL

iOS has 15 DSL tokens.

6 common widgets:
stack
row
img
label
switch
slider

3 auxiliary structures:
opening-tag
closing-tag
body

6 special structures:
btn-add
footer
btn-search
btn-contact
btn-download
btn-more

The mapping to iOS storyboard XML is:

{
  "opening-tag": "{",
  "closing-tag": "}",
  "body": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<document type=\"com.apple.InterfaceBuilder3.CocoaTouch.Storyboard.XIB\" version=\"3.0\" toolsVersion=\"11201\" systemVersion=\"15G1217\" targetRuntime=\"iOS.CocoaTouch\" propertyAccessControl=\"none\" useAutolayout=\"YES\" useTraitCollections=\"YES\" colorMatched=\"YES\">\n    <dependencies>\n        <deployment identifier=\"iOS\"/>\n        <plugIn identifier=\"com.apple.InterfaceBuilder.IBCocoaTouchPlugin\" version=\"11161\"/>\n        <capability name=\"documents saved in the Xcode 8 format\" minToolsVersion=\"8.0\"/>\n    </dependencies>\n    <scenes>\n        <!--View Controller-->\n        <scene sceneID=\"qAw-JF-viq\">\n            <objects>\n                <viewController id=\"[ID]\" sceneMemberID=\"viewController\">\n                    <layoutGuides>\n                        <viewControllerLayoutGuide type=\"top\" id=\"[ID]\"/>\n                        <viewControllerLayoutGuide type=\"bottom\" id=\"[ID]\"/>\n                    </layoutGuides>\n                    <view key=\"view\" contentMode=\"center\" id=\"[ID]\">\n                        <rect key=\"frame\" x=\"0.0\" y=\"0.0\" width=\"375\" height=\"667\"/>\n                        <autoresizingMask key=\"autoresizingMask\" widthSizable=\"YES\" heightSizable=\"YES\"/>\n                        <subviews>\n                          {}\n                        </subviews>\n                        <color key=\"backgroundColor\" white=\"1\" alpha=\"1\" colorSpace=\"calibratedWhite\"/>\n                    </view>\n                </viewController>\n                <placeholder placeholderIdentifier=\"IBFirstResponder\" id=\"[ID]\" userLabel=\"First Responder\" sceneMemberID=\"firstResponder\"/>\n            </objects>\n            <point key=\"canvasLocation\" x=\"20\" y=\"95.802098950524751\"/>\n        </scene>\n    </scenes>\n</document>\n",
  "stack": "<stackView opaque=\"NO\" contentMode=\"center\" fixedFrame=\"YES\" axis=\"vertical\" alignment=\"center\" spacing=\"10\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" minX=\"16\" minY=\"20\" width=\"343\" height=\"440\"/>\n    <autoresizingMask key=\"autoresizingMask\" flexibleMaxX=\"YES\" flexibleMaxY=\"YES\"/>\n    <subviews>\n        {}\n    </subviews>\n    <color key=\"backgroundColor\" red=\"0.80000001190000003\" green=\"0.80000001190000003\" blue=\"0.80000001190000003\" alpha=\"1\" colorSpace=\"calibratedRGB\"/>\n</stackView>",
  "row": "<view contentMode=\"center\" ambiguous=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" width=\"343\" height=\"65\"/>\n    <subviews>\n        <stackView opaque=\"NO\" contentMode=\"center\" fixedFrame=\"YES\" spacing=\"30\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n            <frame key=\"frameInset\" minX=\"8\" minY=\"6\" width=\"337\" height=\"52\"/>\n            <autoresizingMask key=\"autoresizingMask\" flexibleMaxX=\"YES\" flexibleMaxY=\"YES\"/>\n            <subviews>\n                {}\n            </subviews>\n        </stackView>\n    </subviews>\n    <color key=\"backgroundColor\" red=\"0.9\" green=\"0.9\" blue=\"0.9\" alpha=\"1\" colorSpace=\"calibratedRGB\"/>\n</view>",
  "img": "<imageView userInteractionEnabled=\"NO\" contentMode=\"scaleToFill\" horizontalHuggingPriority=\"251\" verticalHuggingPriority=\"251\" ambiguous=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" width=\"36\" height=\"36\"/>\n    <color key=\"backgroundColor\" red=\"0.40000000600000002\" green=\"0.40000000600000002\" blue=\"1\" alpha=\"1\" colorSpace=\"calibratedRGB\"/>\n</imageView>",
  "label": "<label opaque=\"NO\" userInteractionEnabled=\"NO\" contentMode=\"left\" horizontalHuggingPriority=\"251\" verticalHuggingPriority=\"251\" ambiguous=\"YES\" text=\"[TEXT]\" textAlignment=\"natural\" lineBreakMode=\"tailTruncation\" baselineAdjustment=\"alignBaselines\" adjustsFontSizeToFit=\"NO\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" width=\"255\" height=\"52\"/>\n    <fontDescription key=\"fontDescription\" type=\"system\" pointSize=\"17\"/>\n    <nil key=\"textColor\"/>\n    <nil key=\"highlightedColor\"/>\n</label>",
  "switch": "<switch opaque=\"NO\" contentMode=\"scaleToFill\" horizontalHuggingPriority=\"750\" verticalHuggingPriority=\"750\" ambiguous=\"YES\" contentHorizontalAlignment=\"center\" contentVerticalAlignment=\"center\" on=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" width=\"51\" height=\"31\"/>\n</switch>",
  "slider": "<slider opaque=\"NO\" contentMode=\"scaleToFill\" ambiguous=\"YES\" contentHorizontalAlignment=\"center\" contentVerticalAlignment=\"center\" value=\"0.5\" minValue=\"0.0\" maxValue=\"1\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" width=\"142\" height=\"31\"/>\n</slider>",
  "btn-add": "<button opaque=\"NO\" contentMode=\"scaleToFill\" ambiguous=\"YES\" contentHorizontalAlignment=\"center\" contentVerticalAlignment=\"center\" buttonType=\"contactAdd\" lineBreakMode=\"middleTruncation\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" width=\"22\" height=\"22\"/>\n</button>",
  "footer": "<tabBar contentMode=\"scaleToFill\" fixedFrame=\"YES\" translatesAutoresizingMaskIntoConstraints=\"NO\" id=\"[ID]\">\n    <frame key=\"frameInset\" height=\"49\"/>\n    <autoresizingMask key=\"autoresizingMask\" widthSizable=\"YES\" flexibleMinY=\"YES\"/>\n    <color key=\"backgroundColor\" white=\"0.0\" alpha=\"0.0\" colorSpace=\"calibratedWhite\"/>\n    <items>\n        {}\n    </items>\n</tabBar>",
  "btn-search": "<tabBarItem systemItem=\"search\" id=\"[ID]\"/>",
  "btn-contact": "<tabBarItem systemItem=\"contacts\" id=\"[ID]\"/>",
  "btn-download": "<tabBarItem systemItem=\"downloads\" id=\"[ID]\"/>",
  "btn-more": "<tabBarItem systemItem=\"more\" id=\"[ID]\"/>"
}

Web DSL

The Web DSL has 16 tokens:

opening-tag
closing-tag
body
header
btn-active
btn-inactive
row
single
double
quadruple
btn-green
btn-orange
btn-red
big-title
small-title
text

{
  "opening-tag": "{",
  "closing-tag": "}",
  "body": "<html>\n  <header>\n    <meta charset=\"utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n    <link rel=\"stylesheet\" href=\"https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css\" integrity=\"sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u\" crossorigin=\"anonymous\">\n<link rel=\"stylesheet\" href=\"https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap-theme.min.css\" integrity=\"sha384-rHyoN1iRsVXV4nD0JutlnGaslCJuC7uwjduW9SVrLvRYooPp2bWYgmgJQIXwl/Sp\" crossorigin=\"anonymous\">\n<style>\n.header{margin:20px 0}nav ul.nav-pills li{background-color:#333;border-radius:4px;margin-right:10px}.col-lg-3{width:24%;margin-right:1.333333%}.col-lg-6{width:49%;margin-right:2%}.col-lg-12,.col-lg-3,.col-lg-6{margin-bottom:20px;border-radius:6px;background-color:#f5f5f5;padding:20px}.row .col-lg-3:last-child,.row .col-lg-6:last-child{margin-right:0}footer{padding:20px 0;text-align:center;border-top:1px solid #bbb}\n</style>\n    <title>Scaffold</title>\n  </header>\n  <body>\n    <main class=\"container\">\n      {}\n      <footer class=\"footer\">\n        <p>&copy; Tony Beltramelli 2017</p>\n      </footer>\n    </main>\n    <script src=\"js/jquery.min.js\"></script>\n    <script src=\"js/bootstrap.min.js\"></script>\n  </body>\n</html>\n",
  "header": "<div class=\"header clearfix\">\n  <nav>\n    <ul class=\"nav nav-pills pull-left\">\n      {}\n    </ul>\n  </nav>\n</div>\n",
  "btn-active": "<li class=\"active\"><a href=\"#\">[]</a></li>\n",
  "btn-inactive": "<li><a href=\"#\">[]</a></li>\n",
  "row": "<div class=\"row\">{}</div>\n",
  "single": "<div class=\"col-lg-12\">\n{}\n</div>\n",
  "double": "<div class=\"col-lg-6\">\n{}\n</div>\n",
  "quadruple": "<div class=\"col-lg-3\">\n{}\n</div>\n",
  "btn-green": "<a class=\"btn btn-success\" href=\"#\" role=\"button\">[]</a>\n",
  "btn-orange": "<a class=\"btn btn-warning\" href=\"#\" role=\"button\">[]</a>\n",
  "btn-red": "<a class=\"btn btn-danger\" href=\"#\" role=\"button\">[]</a>",
  "big-title": "<h2>[]</h2>",
  "small-title": "<h4>[]</h4>",
  "text": "<p>[]</p>\n"
}
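
Mapping tables like these are essentially all the code generator needs: it parses the DSL into a tree and substitutes each node's rendered children into the {} slot of its template. Here is a minimal sketch of that idea, using a trimmed-down mapping and a simplified parser. The function names are hypothetical, and filling in the [] text slots and IDs is left out.

```python
# Trimmed-down templates in the spirit of the Web mapping above.
MAPPING = {
    "header": '<ul class="nav nav-pills">{}</ul>',
    "row": '<div class="row">{}</div>',
    "single": '<div class="col-lg-12">{}</div>',
    "btn-active": '<li class="active">[]</li>',
    "btn-inactive": "<li>[]</li>",
    "small-title": "<h4>[]</h4>",
    "text": "<p>[]</p>",
    "btn-red": '<a class="btn btn-danger">[]</a>',
}

def tokenize(dsl):
    """Drop commas and split braces into tokens of their own."""
    return dsl.replace(",", " ").replace("{", " { ").replace("}", " } ").split()

def parse(tokens, pos=0):
    """Parse sibling nodes starting at pos; return (nodes, next position)."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != "}":
        name = tokens[pos]
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "{":
            children, pos = parse(tokens, pos + 1)
            pos += 1  # skip the closing brace
        nodes.append((name, children))
    return nodes, pos

def render(nodes):
    """Fill each template's {} slot with its rendered children."""
    return "".join(MAPPING[name].replace("{}", render(children))
                   for name, children in nodes)

dsl = """
header {
btn-inactive, btn-active
}
row {
single {
small-title, text, btn-red
}
}
"""
tree, _ = parse(tokenize(dsl))
html = render(tree)
print(html)
```

The same parse-and-render loop would work for the Android and iOS tables, since all three use {} as the child slot.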
