Features


The Dumpware10 dataset offers machine learning and computer vision researchers a corpus for identifying malware families and benignware through static, image-based analysis.

Open Set Recognition

Dumpware10 covers 3686 malware and 608 benignware samples, yielding 4294 RGB images in total. Researchers can therefore not only recognize malware families but also identify benign files, which supports open-set classification.
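Because malware samples far outnumber benign ones, it may help to know the accuracy a trivial classifier would reach on the malware-versus-benign task. The sketch below uses only the counts stated above; the constant names are illustrative, and it assumes the figure is computed over the whole corpus rather than over a particular training or validation split.

```python
# Sample counts as stated for Dumpware10.
MALWARE_SAMPLES = 3686
BENIGN_SAMPLES = 608

total = MALWARE_SAMPLES + BENIGN_SAMPLES
malware_share = MALWARE_SAMPLES / total

# A classifier that always answers "malware" already reaches this accuracy
# on the binary task, so it is a floor that a trained model should beat.
print(f"total images: {total}")
print(f"majority-class baseline: {malware_share:.3f}")
```

With such an imbalance, per-class metrics are likely to be more informative than overall accuracy alone.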

Memory Forensics

Our corpus was generated by running the executables and capturing full memory dump files with ProcDump. Researchers can therefore work on memory dumps instead of raw executable bytes.

Different Image Sizes

Dumpware10 provides the image corpus in four image-width rendering schemes: (a) 224 pixels, (b) 300 pixels, (c) the square root of the file size, and (d) 4096 pixels, so that the effect of rendering on accuracy can be observed.
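The four width schemes can be illustrated with a minimal sketch of how a byte stream might be laid out as an RGB image. This is not the dataset's actual rendering code: the names `image_width` and `render_rows` are hypothetical, and the sketch assumes that three consecutive bytes form one pixel, that the square-root scheme takes the square root of the pixel count (rounded up), and that the last row is zero-padded.

```python
import math

BYTES_PER_PIXEL = 3  # assumption: three consecutive bytes form one RGB pixel


def image_width(num_bytes, scheme):
    """Return the image width in pixels for one of the four schemes."""
    if scheme in (224, 300, 4096):
        return scheme
    if scheme == "sqrt":
        # Assumption: use the rounded-up square root of the pixel count,
        # which gives a roughly square image.
        num_pixels = math.ceil(num_bytes / BYTES_PER_PIXEL)
        return math.isqrt(num_pixels - 1) + 1 if num_pixels > 0 else 1
    raise ValueError(f"unknown scheme: {scheme!r}")


def render_rows(data, scheme):
    """Lay out raw bytes as rows of (R, G, B) tuples at the chosen width."""
    width = image_width(len(data), scheme)
    row_bytes = width * BYTES_PER_PIXEL
    height = max(1, math.ceil(len(data) / row_bytes))
    # Zero-pad so that the final row is complete.
    padded = data.ljust(height * row_bytes, b"\x00")
    rows = []
    for r in range(height):
        row = padded[r * row_bytes:(r + 1) * row_bytes]
        rows.append([tuple(row[i:i + BYTES_PER_PIXEL])
                     for i in range(0, row_bytes, BYTES_PER_PIXEL)])
    return rows


sample = bytes(range(30))           # stand-in for a memory dump
square = render_rows(sample, "sqrt")
fixed = render_rows(sample, 224)
```

A fixed width keeps byte offsets aligned in columns across samples, whereas the square-root scheme changes the row length with every file size; this is one reason the choice of scheme may affect classification accuracy.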

Deep Learning

The Dumpware10 dataset is ready to use with well-known deep learning frameworks such as PyTorch, TensorFlow, MXNet and Keras. The image files are organized for training and validation in the same folder structure.
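As a framework-independent illustration, the following sketch indexes one split of such a folder tree into (path, label) pairs. The exact layout of the download is not described here, so the code assumes a hypothetical `<root>/<split>/<class name>/<image file>` arrangement; the function name `index_split` and the suffix list are likewise illustrative.

```python
from pathlib import Path

# Assumption: images are stored with one of these extensions.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_split(root, split):
    """Return (samples, class_to_idx) for one split of the folder tree.

    Assumes the layout <root>/<split>/<class name>/<image file>.
    """
    split_dir = Path(root) / split
    # Sort class names so that label indices are reproducible.
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for path in sorted((split_dir / name).iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, class_to_idx[name]))
    return samples, class_to_idx
```

If the downloaded tree matches this assumption, one-folder-per-class loaders such as torchvision's `ImageFolder` should be able to read each split directly, and the same index can feed a custom data pipeline in the other frameworks.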

Classes

Our corpus includes files from the following malware families: Adposhel, Allaple.A, Amonetize, AutoRun-PU, BrowseFox, Dinwod, InstallCore.C, MultiPlug, VBA and Vilsel, making 10 malware families in total.

Ease of Downloading

The Dumpware10 dataset is free for academic purposes. To download it, first fill out a form that takes about three minutes; the download link will then be provided to you.

Contact Us


This dataset was built at HUMIR, Department of Computer Engineering, Hacettepe University, 2020.

Visit Us

Multimedia Information Retrieval Lab., Beytepe Ankara, Turkey

Phone: +90 312 297 75 00

Email: selman@cs.hacettepe.edu.tr