林轩田 Machine Learning Foundations course - a Python implementation of the PLA algorithm


Author: Spareribs | Published 2019-02-08 19:41

    Problem 1: counting updates

    Q1. Implement a version of PLA by visiting examples in the naive cycle using the order of examples in the data set.
    Run the algorithm on the data set.
    What is the number of updates before the algorithm halts?

        def pla_1(self, X, Y):
            """
            Count the number of weight updates before PLA halts.
            :param X: feature matrix (one example per row, constant feature x0 = 1 included)
            :param Y: label vector (+1 / -1)
            :return: number of updates
            """
            # Initialize the weights W as a zero vector with the same dimension as a row of X
            W = np.zeros(X.shape[1])

            # PLA iteration: visit the examples in their given order (naive cycle),
            # and halt only after a full pass with no mistakes
            halt = 0  # number of updates before halting
            mistake_free = False
            while not mistake_free:
                mistake_free = True
                for i in range(X.shape[0]):  # one pass over all examples
                    score = np.dot(X[i, :], W)  # current score for example i
                    if score * Y[i] <= 0:  # misclassified (sign(0) counts as a mistake)
                        W = W + Y[i] * X[i, :]  # update rule: W <- W + y_n * x_n
                        halt = halt + 1  # count this update
                        mistake_free = False

            return halt
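The method above assumes `X` already carries the constant feature x0 = 1 and is meant to live in a class with the course data loaded. A minimal standalone sketch of the same naive-cycle loop on synthetic data (function and variable names here are illustrative, not from the post):

```python
import numpy as np

def pla_naive_cycle(X, Y):
    """Naive-cycle PLA: visit examples in order, repeat until a mistake-free pass."""
    W = np.zeros(X.shape[1])
    updates = 0
    mistake_free = False
    while not mistake_free:
        mistake_free = True
        for i in range(X.shape[0]):
            if np.dot(X[i], W) * Y[i] <= 0:  # sign(0) treated as a mistake
                W = W + Y[i] * X[i]          # W <- W + y_n * x_n
                updates += 1
                mistake_free = False
    return W, updates

# Tiny linearly separable toy set: label = sign(x1 - x2), with x0 = 1 prepended
rng = np.random.default_rng(0)
raw = rng.uniform(-1, 1, size=(20, 2))
Y = np.where(raw[:, 0] - raw[:, 1] > 0, 1, -1)
X = np.hstack([np.ones((20, 1)), raw])  # prepend the constant feature x0 = 1

W, updates = pla_naive_cycle(X, Y)
print(updates, np.all(np.sign(X @ W) == Y))
```

After halting, every example satisfies `score * y > 0`, so the final weights separate the training set perfectly.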
    

    Problem 2:

    Q2. Implement a version of PLA by visiting examples in fixed, pre-determined random cycles throughout the algorithm.
    Run the algorithm on the data set. Please repeat your experiment for 2000 times, each with a different random seed.
    What is the average number of updates before the algorithm halts?
    Plot a histogram ( https://en.wikipedia.org/wiki/Histogram ) to show the number of updates versus frequency.

        def pla_2(self, X, Y):
            """
            Same as pla_1, but average the number of updates over repeated
            experiments, each using a different fixed random cycle.
            :param X: feature matrix
            :param Y: label vector
            :return: average number of updates and average accuracy
            """
            n_experiments = 2000  # number of repeated experiments
            halts = []            # number of updates in each experiment
            accuracies = []       # accuracy in each experiment

            for exp in range(n_experiments):
                np.random.seed(exp)  # a different random seed per experiment

                # Fix a random visiting order for this experiment
                permutation = np.random.permutation(X.shape[0])
                X = X[permutation]  # shuffle X
                Y = Y[permutation]  # shuffle Y with the same permutation

                # Same as pla_1: cycle through X in the fixed random order,
                # updating W and counting updates until a mistake-free pass
                W = np.zeros(X.shape[1])  # weight initialization
                halt = 0                  # number of updates before halting
                mistake_free = False
                while not mistake_free:
                    mistake_free = True
                    for i in range(X.shape[0]):
                        score = np.dot(X[i, :], W)
                        if score * Y[i] <= 0:  # classification error
                            W = W + Y[i] * X[i, :]
                            halt = halt + 1
                            mistake_free = False

                # Predicted labels: +1 if the score is positive, otherwise -1
                Y_pred = np.where(np.dot(X, W) > 0, 1, -1)
                accuracy = np.mean(Y_pred == Y)

                # store this experiment's results
                halts.append(halt)
                accuracies.append(accuracy)

            # averages over all experiments
            halt_mean = np.mean(halts)
            accuracy_mean = np.mean(accuracies)

            return halt_mean, accuracy_mean
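The problem also asks for a histogram of updates versus frequency, which the code above does not produce. A minimal matplotlib sketch, where `halts` stands in for the per-experiment update counts collected above (the Poisson draw is only placeholder data so the snippet runs on its own):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder for the per-experiment update counts collected in pla_2
halts = np.random.default_rng(0).poisson(40, size=2000)

counts, bin_edges, _ = plt.hist(halts, bins=30, edgecolor="black")
plt.xlabel("number of updates before halting")
plt.ylabel("frequency")
plt.title("PLA with random cycles: updates vs. frequency")
plt.savefig("pla_histogram.png")
```

In practice you would return or store the `halts` list from `pla_2` and pass it to `plt.hist` directly.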
    

    Problem 3:

    Q3. Implement a version of PLA by visiting examples in fixed, pre-determined random cycles throughout the algorithm, while changing the update rule to be:
    W_{t+1} <- W_t + η * y_{n(t)} * x_{n(t)} with η = 0.5. Note that your PLA in the previous problem corresponds to η = 1.
    Please repeat your experiment for 2000 times, each with a different random seed. What is the average number of updates before the algorithm halts?
    Plot a histogram to show the number of updates versus frequency. Compare your result to the previous problem and briefly discuss your findings.

    
        def pla_3(self, X, Y):
            """
            Same as pla_2, but with learning rate eta = 0.5 in the update rule.
            :param X: feature matrix
            :param Y: label vector
            :return: average number of updates and average accuracy
            """
            n_experiments = 2000  # number of repeated experiments
            halts = []            # number of updates in each experiment
            accuracies = []       # accuracy in each experiment

            for exp in range(n_experiments):
                np.random.seed(exp)  # a different random seed per experiment
                permutation = np.random.permutation(X.shape[0])
                X = X[permutation]  # shuffle X
                Y = Y[permutation]  # shuffle Y with the same permutation

                # Cycle through the data in the fixed random order until a
                # mistake-free pass, updating with eta = 0.5
                W = np.zeros(X.shape[1])  # weight initialization
                halt = 0                  # number of updates before halting
                mistake_free = False
                while not mistake_free:
                    mistake_free = True
                    for i in range(X.shape[0]):
                        score = np.dot(X[i, :], W)
                        if score * Y[i] <= 0:  # classification error
                            W = W + 0.5 * Y[i] * X[i, :]
                            halt = halt + 1
                            mistake_free = False

                # Predicted labels: +1 if the score is positive, otherwise -1
                Y_pred = np.where(np.dot(X, W) > 0, 1, -1)
                accuracy = np.mean(Y_pred == Y)

                # store this experiment's results
                halts.append(halt)
                accuracies.append(accuracy)

            # averages over all experiments
            halt_mean = np.mean(halts)
            accuracy_mean = np.mean(accuracies)
            return halt_mean, accuracy_mean
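On the comparison the problem asks for: because W starts at the zero vector, the η = 0.5 trajectory is exactly 0.5 times the η = 1 trajectory at every step, so every score has the same sign, the same examples are misclassified, and the update counts are identical. A quick check on synthetic data (a sketch; the names here are illustrative, not from the post):

```python
import numpy as np

def pla_updates(X, Y, eta):
    """Run PLA with a fixed visiting order and learning rate eta; return the update count."""
    W = np.zeros(X.shape[1])
    updates = 0
    mistake_free = False
    while not mistake_free:
        mistake_free = True
        for i in range(X.shape[0]):
            if np.dot(X[i], W) * Y[i] <= 0:
                W = W + eta * Y[i] * X[i]
                updates += 1
                mistake_free = False
    return updates

# Separable toy data with the constant feature x0 = 1 prepended
rng = np.random.default_rng(1)
raw = rng.uniform(-1, 1, size=(30, 2))
Y = np.where(raw @ np.array([2.0, -1.0]) > 0.1, 1, -1)
X = np.hstack([np.ones((30, 1)), raw])

u1 = pla_updates(X, Y, 1.0)
u2 = pla_updates(X, Y, 0.5)
print(u1, u2)  # the two counts coincide, since W0 = 0 makes eta a pure rescaling
```

So with zero initialization, halving η changes nothing about convergence; η would only matter if W were initialized to a nonzero vector.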
    
