MUSEval Leaderboard
Welcome to the MUSEval Leaderboard! This leaderboard provides comprehensive evaluation results for multivariate time series forecasting models. Rows are models and columns are performance metrics.
Use the filters below to explore results by different criteria and compare model performance across various domains and categories. For additional details on a model, click on it to open the Model Inspector below the table.
Metrics are explained in "About MUSEval Leaderboard" below the table. Submissions can be added via the GitHub repository.
This leaderboard determines the best-performing model for multivariate time series forecasting tasks, as measured by the lowest Mean Absolute Percentage Error (MAPE).
High performance on these datasets provides evidence that a model can utilize historical time series relationships to make accurate predictions.
Key Features:
- Multivariate Focus: All datasets are multivariate, and evaluations compare univariate and multivariate performance
- Scale: 19 billion data points across 2.6 million time series
- Diversity: 19 multivariate time series domains
Dataset Structure:
- Categories: Traditional, Sequential, Synthetic, Collections
- Domains: Finance, Health, Energy, Environment, Engineering, and more
- Datasets: 83 individual time series datasets
Filter by Category
Filter by Domain
Filter by Dataset
Sort
Models are ranked by the number of datasets on which they achieve the lowest MAPE (Top-Performer). Click on a model cell to view its details.
Example leaderboard row: Exponential Smoothing | Salesforce | 20/83 | 116.2% | 116.2% | -4.0% | 0.872 | 2025-10-10 | 10
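A minimal sketch of how the Top-Performer ranking described above could be derived from per-dataset results. The column names (`model`, `dataset`, `mape`) and the sample values are illustrative assumptions, not the leaderboard's actual schema or data.

```python
import pandas as pd

# Hypothetical per-dataset results: one row per (model, dataset) pair.
results = pd.DataFrame({
    "model":   ["Exponential Smoothing", "Model B", "Exponential Smoothing", "Model B"],
    "dataset": ["energy_1", "energy_1", "finance_1", "finance_1"],
    "mape":    [116.2, 120.5, 95.0, 90.1],
})

# For each dataset, find the model with the lowest MAPE ...
winners = results.loc[results.groupby("dataset")["mape"].idxmin(), "model"]

# ... then rank models by how many datasets they win (Top-Performer count).
top_performer = winners.value_counts()
print(top_performer)
```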
Model Inspector
Select a model to view its metadata.
Evaluation Metrics
Standard Metrics:
- Rank: The model's rank, as determined by the lowest Multi-MAPE or Uni-MAPE (the normalized metric).
- Multi-MAPE (Mean Absolute Percentage Error): Average percentage error, normalized by the dataset. Lower is better. If the model is univariate, this is the same as Uni-MAPE.
- Uni-MAPE: MAPE for the model when run in univariate mode.
- Uni-Multi: Difference in performance between the univariate and multivariate runs. A higher positive value means the multivariate model outperforms the univariate one.
- MAE (Mean Absolute Error): Average absolute difference between predicted and actual values, scaled by the magnitude of the actual values in the dataset.
- RMSE (Root Mean Square Error): Square root of the average squared difference between predicted and actual values.
- Top-Performer: The number of datasets on which the model achieves the lowest Multi-MAPE or Uni-MAPE.
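A minimal sketch of these metrics in Python. The exact per-dataset normalization for MAPE and the scaling of MAE by the actual values are assumptions about the evaluation code, not taken from it.

```python
import numpy as np

def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean Absolute Percentage Error (lower is better)."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

def mae_scaled(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean Absolute Error, here scaled by the mean absolute actual value
    (the leaderboard's exact scaling is assumed, not documented above)."""
    return float(np.mean(np.abs(actual - predicted)) / np.mean(np.abs(actual)))

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root Mean Square Error between predicted and actual values."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def uni_multi(uni_mape: float, multi_mape: float) -> float:
    """Positive when the multivariate run beats the univariate run."""
    return uni_mape - multi_mape
```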
Contact & Support
For questions about the dataset or leaderboard:
- Issues: Report issues on the GitHub repository
- Dataset: Try the dataset yourself on Hugging Face (a loading sketch follows below)
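A minimal sketch of loading the benchmark data with the `datasets` library. The repository id `museval/museval-datasets` is a hypothetical placeholder, since the exact Hugging Face dataset path is not given in this section.

```python
from datasets import load_dataset

# NOTE: "museval/museval-datasets" is a hypothetical repo id used for
# illustration; substitute the actual Hugging Face dataset path.
ds = load_dataset("museval/museval-datasets", split="train")

# Inspect the first record to see the time series fields.
print(ds[0])
```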
Leaderboard Information
This leaderboard provides:
- Real-time Rankings: Live updates as new submissions are received
- Filtered Views: Explore results by domain, category, and dataset
- Model Inspector: Detailed metadata for each submitted model
- Comprehensive Metrics: Multiple evaluation perspectives
The leaderboard aggregates results across all datasets to provide overall model rankings while maintaining the ability to drill down into specific domains and categories.