LogoSENSE dataset is aimed for researchers to supply a box-annotated ground truth dataset to evaluate their logo based anti-phishing studies. For this purpose, we supply a corpus involving logos of 15 highly phished brands. The dataset is composed of 2 different sub datasets namely training and wild sets respectively. It is important to mention that, LogoSENSE dataset aims to provide a benchmark dataset for only computer vision (especially object detection) based anti-phishing studies.

Our dataset involves 1530 (102 snapshots of phishing pagesx 15) training and 1979 (979 snapshot containing brand logos + 1000 legitimate) testing samples. Along with 102 brand specific samples we have also included 102 more distractor samples yielding 204 x 15 = 3060 training samples in total. The directory structure of the dataset has been created in order to locate each brand in its respective folder. Training set has been fully bounding box annotated. We have employed two different annotation format namely dLib and PASCAL VOC 2007. In this regard, we aimed to enable researchers to do experiments both in Dlib and several Deep Learning based schemes. Thus it is very easy to use our dataset in many frameworks such as Caffe©, Pytorch©, Tensorflow© and Keras©.

In this web page, we will publish the state of the art results obtained from LogoSENSE dataset on a regular basis. Moreover, we are planing to list the scientific papers citing this dataset. Please keep an eye on leader board section for more detailed information.

This dataset has been generated in HUMIR LAB at Hacettepe University Department of Computer Engineering and it is intended to be used for only academic purposes. Please read the paper provided at the menu before proceeding.


Important Note: The dataset will be available upon the acceptance of the paper! Therefore, we kindly ask you to be patient during its review stage.