The Dumpware10 dataset is a corpus that gives machine learning and computer vision researchers several opportunities to identify different malware families and benign software (benignware) through static, image-based analysis.
Dumpware10 covers 3686 malware and 608 benignware samples, yielding 4294 RGB images in total. Researchers can therefore not only recognize malware but also identify benign files, resulting in an open-set classification setting.
Our corpus was generated by running the executables and capturing full memory dump files with the ProcDump tool. Researchers can therefore work on memory dump files instead of raw executable bytes.
Dumpware10 provides the image corpus in four image-width rendering schemes: (a) 224 pixels, (b) 300 pixels, (c) the square root of the file size and (d) 4096 pixels, so that the effect of rendering width on accuracy can be observed.
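To make the role of the width parameter concrete, the following is a minimal sketch of how a byte sequence could be laid out as an RGB image at a chosen width. It is not the rendering code used to build Dumpware10. The function names are hypothetical, and two points are assumptions: that three consecutive bytes form one RGB pixel, and that the square-root scheme uses the rounded-up square root of the pixel count.

```python
import math


def square_root_width(n_bytes: int) -> int:
    """Width for a roughly square image (assumption: ceil(sqrt(pixel count)))."""
    n_pixels = (n_bytes + 2) // 3          # 3 bytes per RGB pixel, rounded up
    width = math.isqrt(n_pixels)
    if width * width < n_pixels:
        width += 1
    return max(1, width)


def render_rgb_rows(data: bytes, width: int):
    """Lay out raw bytes as rows of (R, G, B) tuples, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_pixels = (len(data) + 2) // 3
    height = max(1, -(-n_pixels // width))  # ceiling division
    padded = data.ljust(height * width * 3, b"\x00")
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    dump = bytes(range(10))                 # stand-in for a memory dump
    for width in (224, 300, square_root_width(len(dump)), 4096):
        rows = render_rgb_rows(dump, width)
        print(width, "->", len(rows), "row(s) of", len(rows[0]), "pixels")
```

The same bytes give a short, wide image at 4096 pixels and a tall, narrow one at 224 pixels, which is why the choice of width can change what a classifier sees.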
The Dumpware10 dataset is ready to use with well-known deep learning frameworks such as PyTorch, TensorFlow, MXNet and Keras. The image files for training and validation are organized in the same folder structure.
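A folder-per-class layout of this kind can be indexed with a few lines of standard-library Python, as sketched below. The layout root/split/class_name/image, the split names "train" and "validation", and the .png extension are assumptions for illustration and should be checked against the downloaded archive. The helper name index_split is hypothetical.

```python
from pathlib import Path


def index_split(root, split, extensions=(".png",)):
    """Return ([(image_path, label_index), ...], {class_name: label_index})."""
    split_dir = Path(root) / split
    # Each sub-folder name is treated as a class label (assumed layout).
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        for path in sorted((split_dir / name).iterdir()):
            if path.is_file() and path.suffix.lower() in extensions:
                samples.append((path, class_to_idx[name]))
    return samples, class_to_idx


if __name__ == "__main__":
    # "Dumpware10/224" is a placeholder path, not a documented folder name.
    train, class_to_idx = index_split("Dumpware10/224", "train")
    val, _ = index_split("Dumpware10/224", "validation")
    print(len(train), "training images,", len(val), "validation images")
    print(class_to_idx)
```

Folder-based loaders in the frameworks above, such as torchvision's ImageFolder, follow the same sub-folder-as-label convention, so a layout that works with this sketch should load there without relabeling.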
Our corpus includes files from the following malware families: Adposhel, Allaple.A, Amonetize, AutoRun-PU, BrowseFox, Dinwod, InstallCore.C, MultiPlug, VBA and Vilsel, making 10 families in total.
The Dumpware10 dataset is free for academic purposes. To download it, first fill out a form that takes about 3 minutes; the download link will then be provided to you.
This dataset has been built at HUMIR, Department of Computer Engineering, Hacettepe University, 2020
Multimedia Information Retrieval Lab., Beytepe Ankara, Turkey
Phone: +90 312 297 75 00