Phish360: The ultimate multimodal dataset for phishing detection

Phish360 is a novel multimodal anti-phishing dataset featuring 10,748 real-world phishing and legitimate samples collected between 2020 and 2023. This dataset is meticulously designed to drive innovation and research in multimodal phishing detection by integrating visual and semantic features.

Why Phish360?

Scarcity of Multimodal Datasets

Despite the abundance of anti-phishing research, publicly available multimodal datasets are limited. This scarcity restricts the development and evaluation of models that can leverage different modalities (e.g., URLs, HTML, screenshots) for phishing detection.

Legitimate Data Bias

Many datasets primarily include the homepages of legitimate websites, leading to a bias where legitimate URLs are significantly shorter. Phish360 addresses this by incorporating legitimate login pages for a realistic and balanced distribution.
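The length bias described above can be illustrated with a short check. This is a minimal sketch, not part of the Phish360 tooling: the URLs are made-up placeholders rather than dataset samples, and `mean_url_length` is a hypothetical helper. It compares how far phishing URLs sit from legitimate ones when the legitimate side holds only homepages versus login pages.

```python
from statistics import mean

# Illustrative URLs only; these are not samples from Phish360.
legit_homepages = ["https://example.com/", "https://example.org/"]
legit_login_pages = [
    "https://example.com/account/login?next=%2Fdashboard",
    "https://example.org/auth/signin?lang=en",
]
phishing_pages = [
    "http://secure-login.example.net/verify/account/update.php?id=12345",
    "http://example.net.account-check.test/signin/confirm?session=abc",
]

def mean_url_length(urls):
    """Return the average number of characters per URL (hypothetical helper)."""
    return mean(len(u) for u in urls)

# Gap between phishing and legitimate URL lengths under each sampling choice.
homepage_gap = mean_url_length(phishing_pages) - mean_url_length(legit_homepages)
login_gap = mean_url_length(phishing_pages) - mean_url_length(legit_login_pages)
print(f"gap vs. homepages:   {homepage_gap:.1f} chars")
print(f"gap vs. login pages: {login_gap:.1f} chars")
```

With homepage-only legitimate data the gap is large, so a model could separate the classes on URL length alone. Login pages narrow the gap, which is the effect the dataset design aims for.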

Low Diversity in Samples

Existing datasets are often collected from a narrow range of sources over a short timeframe. This leads to a lack of diversity, with many datasets containing similar or redundant URLs, reducing their value for real-world applications.

Data Integrity Issues

Existing datasets often suffer from missing content, duplicates, or offline pages. Phish360 guarantees data integrity by pre-validating every sample to ensure the URL, HTML, and screenshot are accessible and correctly rendered.

Contributions

Multimodal Triplet

First dataset to enforce unique (URL, HTML, Image) triplets to eliminate data leakage. This ensures that models are trained on distinct samples, preventing overfitting to duplicates.
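One way to enforce triplet uniqueness is to fingerprint each (URL, HTML, image) triplet and keep only the first occurrence. The sketch below is an assumption about how such a filter could look, not the authors' pipeline: `triplet_key` and `drop_duplicate_triplets` are hypothetical names, and the samples are toy values.

```python
import hashlib

def triplet_key(url, html, image_bytes):
    """Build one fingerprint for a (URL, HTML, image) triplet."""
    digest = hashlib.sha256()
    for part in (url.encode("utf-8"), html.encode("utf-8"), image_bytes):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()

def drop_duplicate_triplets(samples):
    """Keep the first occurrence of each triplet, preserving input order."""
    seen = set()
    unique = []
    for url, html, image_bytes in samples:
        key = triplet_key(url, html, image_bytes)
        if key not in seen:
            seen.add(key)
            unique.append((url, html, image_bytes))
    return unique

# Toy samples: the second entry repeats the first exactly.
samples = [
    ("http://a.test/login", "<html>A</html>", b"\x89PNG-a"),
    ("http://a.test/login", "<html>A</html>", b"\x89PNG-a"),
    ("http://a.test/login", "<html>B</html>", b"\x89PNG-b"),
]
unique_samples = drop_duplicate_triplets(samples)
print(len(unique_samples))
```

Running the filter before splitting into train and test sets keeps an identical sample from landing on both sides of the split.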

Linguistic Diversity

Includes samples in 30+ languages, moving beyond English-only biases. This global coverage ensures that detection models remain robust across different linguistic contexts.

Optimized Processing

Provided in Parquet format for efficient, column-oriented data retrieval. Researchers can read only the columns they need, which typically loads much faster than parsing the same data from CSV or JSON.

Reproducible Benchmarks

Sets a standard baseline for comparing text-based, image-based, and hybrid models. The pre-defined train/test splits allow for fair and consistent performance evaluation across studies.

Our Features

Screenshots

1280x960 resolution available for 100% of samples.

Raw HTML

Full source code captured for 100% of samples.

Full URLs

Complete paths including query parameters.

Rich Metadata

Includes Brand, TLD, SSL status, and more.

Parquet Format

High-performance columnar storage for big data.

Multi-Language

30 Phishing & 27 Legitimate languages.

Time-Spaced

Collected over 2.5 years to capture trends.

Easily Extendable

Modular architecture allows easy addition of new features.
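Because full URLs are stored with their paths and query parameters, simple URL-level features can be derived directly from them. The following is a minimal sketch using only the Python standard library. `describe_url` and its output keys are hypothetical and do not reflect the dataset's actual metadata columns. The HTTPS check looks only at the URL scheme and is not a certificate validation.

```python
from urllib.parse import urlsplit, parse_qs

def describe_url(url):
    """Split a full URL into a few simple, illustrative features."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "uses_https": parts.scheme == "https",
        "host": host,
        # Naive TLD: the last dot-separated label. Multi-part suffixes such
        # as "co.uk" would need a public-suffix list instead.
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        "path": parts.path,
        "query_params": parse_qs(parts.query),
        "url_length": len(url),
    }

info = describe_url("https://login.example.com/auth/signin?user=alice&next=%2Fhome")
print(info["tld"], info["uses_https"], info["query_params"])
```

A new feature can be added by extending the returned dictionary, which keeps each feature independent of the others.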

Dataset Statistics

[Figure: Class Distribution]

[Figure: Linguistic Diversity]

COMPARISON

Phishing URL Domain Statistics

Dataset Name URLs (%) Domains (%) TLD (%) FLD (%) Subdomains (%)
PWD2016 38.25 17.71 1.40 17.85 2.74
PhishIntention 87.21 42.62 1.56 43.04 24.46
PILWD-134K 86.68 45.23 1.08 46.41 21.35
VanNL126K 100.0 25.91 0.67 26.85 13.65
Phish360 (Ours) 98.26 73.63 6.69 73.86 28.69

Legitimate URL Domain Statistics

Dataset Name URLs (%) Domains (%) TLD (%) FLD (%) Subdomains (%)
PWD2016 100.0 92.97 2.46 - 0.65
PhishIntention 87.94 82.98 2.17 86.68 3.21
PILWD-134K 99.23 91.37 0.76 92.57 3.27
VanNL126K 100.0 84.90 2.16 85.68 9.94
Phish360 (Ours) 99.41 88.73 3.07 88.92 7.29
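The tables above do not define how the percentages are computed. Assuming each column reports the share of distinct values among all samples (an assumption, not something stated here), a diversity figure of this kind could be computed as sketched below. `unique_share` is a hypothetical helper, hostnames stand in for domains, and the URLs are toy values.

```python
from urllib.parse import urlsplit

def unique_share(values):
    """Percentage of distinct values among all values (0.0 for empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return 100.0 * len(set(values)) / len(values)

# Toy URLs, not taken from any of the compared datasets.
urls = [
    "http://a.example.com/login",
    "http://a.example.com/login",   # duplicate URL
    "http://b.example.com/verify",
    "http://shop.example.org/pay",
]
hosts = [urlsplit(u).hostname for u in urls]
tlds = [h.rsplit(".", 1)[-1] for h in hosts]  # naive TLD: last label

print(f"URLs:    {unique_share(urls):.2f}%")
print(f"Domains: {unique_share(hosts):.2f}%")
print(f"TLDs:    {unique_share(tlds):.2f}%")
```

Under this reading, a higher percentage means fewer repeated values and therefore a more diverse collection.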

How to Use

Load Data

Easily load specific columns using Pandas.

import pandas as pd

# Load specific columns
cols = ['URL', 'full_html', 'BeautifulSoup_text', 'image_path', 'class']
df = pd.read_parquet('phish360.parquet', columns=cols)

Experiment

Use the pre-defined split for reproducible results.

from sklearn.model_selection import train_test_split
import pandas as pd

# 1. Read datasets
phish = pd.read_parquet('Phish360_phish.parquet')
legit = pd.read_parquet('Phish360_legit.parquet')

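# 2. Add labels & combine (the 'label' column name and 1/0 encoding are assumptions)
phish['label'] = 1
legit['label'] = 0
df = pd.concat([phish, legit], ignore_index=True)

# 3. Split (80/20, seed=42), stratified on the label
train, test = train_test_split(df, test_size=0.2, random_state=42, stratify=df['label'])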

Experimental Results

CrossPhire Performance on All Benchmark Datasets

Detailed performance metrics of CrossPhire (using ResNet50 and DenseNet121 vision encoders) across five standard phishing datasets.

Dataset Vision Model Accuracy Precision Recall F1-Score
PILWD-134K ResNet50 98.04% 97.88% 98.10% 97.83%
PILWD-134K DenseNet121 98.07% 97.84% 98.21% 98.83%
VanNL126K ResNet50 99.42% 99.57% 99.72% 99.63%
VanNL126K DenseNet121 99.26% 99.49% 99.62% 99.52%
PhishIntention ResNet50 99.57% 99.50% 99.76% 99.61%
PhishIntention DenseNet121 99.63% 99.54% 99.83% 99.66%
PWD2016 ResNet50 100.00% 100.00% 100.00% 100.00%
PWD2016 DenseNet121 100.00% 100.00% 100.00% 100.00%
Phish360 ResNet50 97.71% 97.99% 96.29% 96.02%
Phish360 DenseNet121 97.96% 97.90% 96.99% 96.53%
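For readers reproducing these metrics, the four scores can be derived from a binary confusion matrix. The sketch below treats phishing as the positive class and uses made-up counts. `classification_metrics` is a hypothetical helper, and the averaging convention behind the reported figures is not stated here, so this is not necessarily how the table was produced.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (phishing) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up counts for illustration; not results from the paper.
m = classification_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

For a single class, F1 always falls between precision and recall. Scores averaged across classes (macro or weighted) need not.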

How to Cite

CrossPhire: Benefiting Multimodality for Robust Phishing Web Page Identification

Ahmad Hani Abdalla Almakhamreh, Ahmet Selman Bozkir

Applied Sciences, 2026, 16(2), 751

https://doi.org/10.3390/app16020751

Authors

Ahmet Selman Bozkir (Ph.D.)

Data curation, collection - Initial filtering

Ahmad H. A. Almakhamreh

Data cleaning, visualization - exploratory data analysis - Post-filtering