Open NSFW Model
This repo contains code for running Not Suitable for Work (NSFW) classification deep neural network Caffe models. Please refer to our blog post, which describes this work and the experiments in more detail.
Not suitable for work classifier
Detecting offensive / adult images is an important problem which researchers have tackled for decades. With the evolution of computer vision and deep learning, the algorithms have matured and we are now able to classify an image as not suitable for work with greater precision.
Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another. For this reason, the model we describe below focuses only on one type of NSFW content: pornographic images. The identification of NSFW sketches, cartoons, text, images of graphic violence, or other types of unsuitable content is not addressed with this model.
Since images and user generated content dominate the internet today, filtering nudity and other not suitable for work images becomes an important problem. In this repository we open source a Caffe deep neural network for the preliminary filtering of NSFW images.
Demo Image
Usage
- The network takes in an image and outputs a probability (a score between 0 and 1) which can be used to filter not suitable for work images. Scores < 0.2 indicate that the image is likely to be safe with high probability. Scores > 0.8 indicate that the image is highly probable to be NSFW. Scores in the middle range may be binned for different NSFW levels.
- Depending on the dataset, use case and types of images, we advise developers to choose suitable thresholds. Due to the difficult nature of the problem, there will be errors, which depend on the use case, definition and tolerance of NSFW. Ideally, developers should create an evaluation set according to the definition of what is safe for their application, then fit an ROC curve to choose a suitable threshold if they are using the model as is (see the sketch after this list).
- Results can be improved by fine-tuning the model for your dataset, use case or definition of NSFW. We do not provide any guarantees of accuracy of results. Please read the disclaimer below.
- Using human moderation for edge cases in combination with the machine-learned solution will help improve performance.
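As referenced in the list above, here is a minimal Python sketch, not part of this repo, of fitting an ROC curve on a labelled evaluation set and then binning scores; scikit-learn, the toy labels and scores, and the helper name moderation_bucket are our own assumptions for illustration:

import numpy as np
from sklearn.metrics import roc_curve

# Toy evaluation set: ground truth (0 = SFW, 1 = NSFW) and model scores.
labels = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.10, 0.35, 0.90, 0.60, 0.05, 0.80])

# Fit an ROC curve, then pick e.g. the highest-recall threshold whose
# false positive rate stays at or below 5%.
fpr, tpr, thresholds = roc_curve(labels, scores)
mask = fpr <= 0.05
fitted_threshold = thresholds[mask][np.argmax(tpr[mask])]

# The 0.2 / 0.8 cutoffs follow the guidance above; tune them, or swap
# in the fitted threshold, to match your application's tolerance.
def moderation_bucket(score, safe_max=0.2, nsfw_min=0.8):
    if score < safe_max:
        return 'safe'      # very likely SFW
    if score > nsfw_min:
        return 'nsfw'      # very likely NSFW
    return 'review'        # middle range: route to human moderation

print(fitted_threshold, moderation_bucket(0.14))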
Description of model
We trained the model on a dataset with NSFW images as positives and SFW (suitable for work) images as negatives. These images were editorially labelled. We cannot release the dataset or other details due to the nature of the data.
For our experiments, we use CaffeOnSpark, a wonderful framework for distributed learning that brings deep learning to Hadoop and Spark clusters for training models. Big thanks to the CaffeOnSpark team!
The deep model was first pretrained on the ImageNet 1000-class dataset. Then we fine-tuned the weights on the NSFW dataset. We used the thin resnet 50 1by2 architecture as the pretrained network. The model was generated using the pynetbuilder tool and replicates the residual network paper's 50-layer network (with half the number of filters in each layer). You can find more details on how the model was generated and trained here.
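For readers who want to reproduce the fine-tuning step, a rough pycaffe sketch follows; the solver.prototxt and pretrained weights filenames are placeholders (neither ships with this repo), and a data layer pointing at your own labelled NSFW/SFW images is assumed:

import caffe

caffe.set_mode_gpu()   # or caffe.set_mode_cpu() on CPU-only machines

# Load a solver definition whose net points at your NSFW/SFW data.
solver = caffe.get_solver('solver.prototxt')                # placeholder

# Start from ImageNet-pretrained thin resnet 50 1by2 weights, then
# continue training so the weights adapt to the NSFW task.
solver.net.copy_from('resnet_50_1by2_imagenet.caffemodel')  # placeholder
solver.solve()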
Please note that deeper networks, or networks with more filters, can improve accuracy. We train the model using a thin residual network architecture, since it provides a good tradeoff in terms of accuracy, and the model is light-weight in terms of runtime (or flops) and memory (or number of parameters).
Docker Quickstart
This Docker quickstart guide can be used for evaluating the model quickly with minimal dependency installation.
Install Docker Engine:
- Windows Installation
- Mac OSX Installation
- Ubuntu Installation
Build a caffe docker image (CPU)
docker build -t caffe:cpu https://raw.githubusercontent.com/BVLC/caffe/master/docker/standalone/cpu/Dockerfile
Check the caffe installation
docker run caffe:cpu caffe --version
caffe version 1.0.0-rc3
Run the docker image with a volume mapped to your open_nsfw repository. Your test_image.jpg should be located in this same directory.
cd open_nsfw
docker run --volume=$(pwd):/workspace caffe:cpu \
python ./classify_nsfw.py \
--model_def nsfw_model/deploy.prototxt \
--pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
test_image.jpg
We will get the NSFW score returned:
NSFW score: 0.14057905972
Running the model
To run this model, please install Caffe and its python extension and make sure pycaffe is available in your PYTHONPATH.
We can use the classify.py script to run the NSFW model. For convenience, we have provided the script in this repo as well, and it prints the NSFW score.
python ./classify_nsfw.py \
--model_def nsfw_model/deploy.prototxt \
--pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
INPUT_IMAGE_PATH
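Equivalently, the model can be scored directly from Python. Below is a minimal pycaffe sketch, a simplified stand-in for classify_nsfw.py rather than a copy of it, assuming standard Caffe image preprocessing:

import numpy as np
import caffe

net = caffe.Net('nsfw_model/deploy.prototxt',
                'nsfw_model/resnet_50_1by2_nsfw.caffemodel',
                caffe.TEST)

# Standard Caffe preprocessing: CHW layout, BGR channel order, mean
# subtraction, pixel values scaled to [0, 255].
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104, 117, 123]))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

image = caffe.io.load_image('test_image.jpg')   # any input image path
net.blobs['data'].data[...] = transformer.preprocess('data', image)
outputs = net.forward()

# The 'prob' blob holds [SFW, NSFW] probabilities; index 1 is the score.
print('NSFW score:', outputs['prob'][0][1])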
Disclaimer
The definition of NSFW is subjective and contextual. This model is a general-purpose reference model, which can be used for the preliminary filtering of pornographic images. We do not provide guarantees of accuracy of output; rather, we make this available for developers to explore and enhance as an open source project. Results can be improved by fine-tuning the model for your dataset.
License
Code is licensed under the BSD 2 clause license. See the linked LICENSE file for terms.
Contact
The model was trained by [Jay Mahadeokar](https://github.com/jay-mahadeokar/), in collaboration with Sachin Farfade, Amar Ramesh Kamat, Armin Kappeler and others. Special thanks to Gerry Pesavento for taking the initiative for open-sourcing this model. If you have any queries, please raise an issue and we will get back ASAP.