Dataset
Extremely Imbalanced Data
We used the Fudan text classification corpus to create 8 highly skewed data sets, each with an imbalance ratio of approximately 1:123. All eight data sets share the same skew ratio (about 1:123) but differ in the number of classes. These data sets were used in our DMKD paper:
Pang, G., Jin, H., & Jiang, S. (2015). CenKNN: a scalable and effective text classifier. Data Mining and Knowledge Discovery, 29(3), 593-625.
These data sets are available at SCHOLAT.
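The skew figure above can be checked on any labelled data set with a few lines of code. The sketch below is a minimal illustration, not the procedure used to build these data sets. It assumes the imbalance ratio is defined as the size of the smallest class divided by the size of the largest class, so 1:123 corresponds to about 0.0081. The function name `imbalance_ratio` and the example labels are hypothetical and are not taken from the Fudan corpus.

```python
from collections import Counter
from typing import Hashable, Iterable


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Return minority-class count divided by majority-class count.

    Assumption: "imbalance ratio" means smallest class size / largest class
    size, so a 1:123 skew gives a value of about 1/123 (roughly 0.0081).
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return min(counts.values()) / max(counts.values())


# Hypothetical labels for illustration only; not the real class sizes.
labels = ["minority"] * 10 + ["majority"] * 1230
ratio = imbalance_ratio(labels)
print(f"1:{1 / ratio:.0f}")  # prints "1:123"
```

For a multi-class data set this definition only compares the two extreme classes; other definitions (for example, averaging over class pairs) would give different numbers.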