QUICK REVIEW

[Paper Review] The Convolutional Tsetlin Machine

Ole‐Christoffer Granmo, Sondre Glimsdal|arXiv (Cornell University)|May 23, 2019

Machine Learning and Algorithms | References: 43 | Cited by: 48

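One-Sentence Summary

The Convolutional Tsetlin Machine (CTM) extends the interpretable Tsetlin Machine to image data by using clause-based convolution filters over location-aware patches, achieving competitive accuracy on MNIST, Kuzushiji-MNIST, Fashion-MNIST, and the 2D Noisy XOR problem.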

ABSTRACT

Convolutional neural networks (CNNs) have obtained astounding successes for important pattern recognition tasks, but they suffer from high computational complexity and the lack of interpretability. The recent Tsetlin Machine (TM) attempts to address this lack by using easy-to-interpret conjunctive clauses in propositional logic to solve complex pattern recognition problems. The TM provides competitive accuracy in several benchmarks, while keeping the important property of interpretability. It further facilitates hardware-near implementation since inputs, patterns, and outputs are expressed as bits, while recognition and learning rely on straightforward bit manipulation. In this paper, we exploit the TM paradigm by introducing the Convolutional Tsetlin Machine (CTM), as an interpretable alternative to CNNs. Whereas the TM categorizes an image by employing each clause once to the whole image, the CTM uses each clause as a convolution filter. That is, a clause is evaluated multiple times, once per image patch taking part in the convolution. To make the clauses location-aware, each patch is further augmented with its coordinates within the image. The output of a convolution clause is obtained simply by ORing the outcome of evaluating the clause on each patch. In the learning phase of the TM, clauses that evaluate to 1 are contrasted against the input. For the CTM, we instead contrast against one of the patches, randomly selected among the patches that made the clause evaluate to 1. Accordingly, the standard Type I and Type II feedback of the classic TM can be employed directly, without further modification. The CTM obtains a peak test accuracy of 99.4% on MNIST, 96.31% on Kuzushiji-MNIST, 91.5% on Fashion-MNIST, and 100.0% on the 2D Noisy XOR Problem, which is competitive with results reported for simple 4-layer CNNs, BinaryConnect, Logistic Circuits and an FPGA-accelerated Binary CNN.

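Research Motivation and Goals

Introduce the Convolutional Tsetlin Machine (CTM) as an interpretable alternative to convolutional neural networks (CNNs).
Extend the TM learning rules to image patches through convolution-like filtering.
Demonstrate the CTM's recognition and learning performance on standard benchmark datasets and a 2D Noisy XOR task.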

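Proposed Method

Represent images as binary inputs and define clause-based convolution filters of size W×W×Z×2.
Augment each image patch with its encoded position so that clauses become location-aware.
Evaluate every clause on every patch, then OR the per-patch results to produce one clause output per image.
Apply the classic TM's Type I and Type II feedback to update the Tsetlin automata within each clause; the CTM adapts this by contrasting against one patch chosen at random from those that made the clause evaluate to 1.
Optionally add integer clause weights so that the clauses cast a weighted majority vote.
Because inputs are bits and only simple bit manipulation is involved, the operations are parallelizable and well suited to hardware implementation.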
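The inference steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single-channel binary image (Z = 1), a thermometer-style encoding of the patch coordinates, and a clause represented by two hypothetical index lists, `include_pos` (features that must be 1) and `include_neg` (features that must be 0). All function names are invented for this example.

```python
import random


def thermometer(value, size):
    """Encode a position in 0..size-1 as size-1 threshold bits (assumed encoding)."""
    return [1 if value > t else 0 for t in range(size - 1)]


def extract_patches(image, w):
    """Return one feature vector per w x w patch of a binary 2D image.

    Each vector holds the patch pixels followed by the encoded (y, x)
    position of the patch, which is what makes a clause location-aware.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for y in range(rows - w + 1):
        for x in range(cols - w + 1):
            pixels = [image[y + dy][x + dx] for dy in range(w) for dx in range(w)]
            position = thermometer(y, rows - w + 1) + thermometer(x, cols - w + 1)
            patches.append(pixels + position)
    return patches


def clause_matches(include_pos, include_neg, features):
    """Conjunctive clause: every included literal must hold on this patch."""
    return (all(features[i] == 1 for i in include_pos)
            and all(features[i] == 0 for i in include_neg))


def convolve_clause(include_pos, include_neg, patches):
    """Evaluate the clause on every patch and OR the results.

    Returns (output, hits), where hits lists the indices of matching patches.
    """
    hits = [k for k, p in enumerate(patches)
            if clause_matches(include_pos, include_neg, p)]
    return (1 if hits else 0), hits


def pick_feedback_patch(hits, rng=random):
    """Pick one matching patch at random as the target for Type I/II feedback."""
    return rng.choice(hits) if hits else None


# A 3x3 image with a diagonal, scanned with 2x2 patches (4 patches in total).
image = [[1, 0, 0],
         [0, 1, 0],
         [0, 0, 1]]
patches = extract_patches(image, 2)

# Features 0..3 are pixels; feature 4 is "y > 0", feature 5 is "x > 0".
diagonal = convolve_clause([0, 3], [1, 2], patches)           # matches patches 0 and 3
lower_diagonal = convolve_clause([0, 3, 4], [1, 2], patches)  # only patch 3
```

Adding the position literal (feature 4) to the clause narrows the match from two patches to one, which illustrates how the same pixel pattern can be tied to a region of the image. The feedback update itself is omitted here; the sketch only shows which patch would be handed to it.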

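Experimental Results

Research Questions

RQ1: Can the CTM achieve competitive image classification accuracy while remaining interpretable?
RQ2: How can the TM's learning feedback (Type I and Type II) be adapted to a convolutional, patch-based setting?
RQ3: How do location awareness and patch-based clause outputs affect recognition performance?
RQ4: How does clause weighting affect the CTM's accuracy and computational efficiency?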

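Key Findings

The CTM reaches a peak test accuracy of 99.4% on MNIST, 96.31% on Kuzushiji-MNIST, 91.5% on Fashion-MNIST, and 100.0% on 2D Noisy XOR.
The CTM's computational cost grows linearly with the number of clauses and the number of image patches, and its updates can be parallelized.
With position information included, the filters become location-aware patterns suited to image tasks.
Clause weighting further improves performance and efficiency, since a single weighted vote can replace multiple clauses.
On some datasets, the CTM remains competitive with simple CNNs, BinaryConnect, Logistic Circuits, and an FPGA-accelerated Binary CNN.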
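The clause-weighting finding can be made concrete with a small sketch. It assumes the usual Tsetlin Machine voting scheme, in which each clause has a polarity of +1 or -1 and the class with the highest vote sum wins; the function names and the example numbers are hypothetical and only illustrate why one clause with integer weight 3 casts the same vote as three identical unweighted clauses.

```python
def class_score(clause_outputs, polarities, weights):
    """Weighted vote sum for one class: sum of weight * polarity * clause output."""
    return sum(w * p * o for o, p, w in zip(clause_outputs, polarities, weights))


def classify(outputs_per_class, polarities_per_class, weights_per_class):
    """Return (winning class index, list of per-class scores)."""
    scores = [class_score(o, p, w)
              for o, p, w in zip(outputs_per_class, polarities_per_class, weights_per_class)]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return winner, scores


# Three weighted clauses: the first fires with weight 3, the last votes against.
weighted = class_score([1, 0, 1], [+1, +1, -1], [3, 2, 1])

# The same vote expressed with six unweighted clauses (3 + 2 + 1 copies).
unweighted = class_score([1, 1, 1, 0, 0, 1], [+1, +1, +1, +1, +1, -1], [1] * 6)

winner, scores = classify(
    [[1, 0, 1], [1, 1]],
    [[+1, +1, -1], [+1, -1]],
    [[3, 2, 1], [1, 4]],
)
```

Both formulations give the same score, but the weighted one evaluates three clauses instead of six, which is the efficiency gain the finding refers to.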

This review was generated by AI and reviewed by human editors.