📊 MUSEval Leaderboard

Welcome to the MUSEval Leaderboard! This leaderboard provides comprehensive evaluation results for multivariate time series forecasting: rows are models and columns are performance metrics. Use the filters below to explore results by different criteria and to compare model performance across domains and categories. For additional details on a model, click on it to open the Model Inspector below the table. Metrics are explained in "About MUSEval Leaderboard" below the table. Submissions can be added at this GitHub repository. The leaderboard identifies the best-performing model for multivariate time series forecasting tasks, as measured by the lowest Mean Absolute Percentage Error (MAPE). Strong performance on these datasets provides evidence that a model can exploit historical time series relationships to make accurate predictions.

Key Features:

  • Multivariate Focused: All datasets are multivariate and evaluations compare univariate and multivariate performance
  • Scale: 19 billion data points across 2.6 million time series
  • Diversity: 19 multivariate time series domains

Dataset Structure:

  • Categories: Traditional, Sequential, Synthetic, Collections
  • Domains: Finance, Health, Energy, Environment, Engineering, and more
  • Datasets: 83 individual time series datasets
📂 Filter By Category
🌍 Filter By Domain
📊 Filter by Dataset
🔄 Sort

Models are ranked by the number of datasets on which they achieve the lowest MAPE (Top-Performer). Click on a model cell to view details.

Select Model

Choose a model to view its metadata


Evaluation Metrics

Standard Metrics:

  • Rank: The model's rank, determined by the lower of its Multi-MAPE and Uni-MAPE (the normalized metric).
  • Multi-MAPE (Mean Absolute Percentage Error): Average percentage error, normalized per dataset. Lower is better. For univariate models this is the same as Uni-MAPE.
  • Uni-MAPE: MAPE of the model when run in univariate mode.
  • Uni-Multi: Difference in performance between the univariate and multivariate runs. Higher positive values mean the multivariate model outperforms the univariate one.
  • MAE (Mean Absolute Error): Average absolute difference between predicted and actual values, scaled by the actual values of the dataset.
  • RMSE (Root Mean Square Error): Square root of the average squared difference between predicted and actual values.
  • Top-Performer: The number of datasets on which the model achieves the lowest Multi-MAPE or Uni-MAPE.
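To make the metric definitions above concrete, here is a minimal sketch of how they could be computed with NumPy. This is an illustrative implementation of the standard formulas, not the leaderboard's actual evaluation code; the function names and any normalization details are assumptions.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute difference."""
    return np.mean(np.abs(np.asarray(y_true, dtype=float)
                          - np.asarray(y_pred, dtype=float)))

def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt of the mean squared difference."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(diff ** 2))

def uni_multi(uni_mape, multi_mape):
    """Uni-Multi gap: positive when the multivariate run has lower MAPE."""
    return uni_mape - multi_mape
```

For example, `mape([100, 200], [110, 180])` returns `10.0` (each prediction is 10% off), and `uni_multi(12.0, 10.0)` returns `2.0`, indicating the multivariate run outperformed the univariate one.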

Contact & Support

For questions about the dataset or leaderboard:

Leaderboard Information

This leaderboard provides:

  • Real-time Rankings: Live updates as new submissions are received
  • Filtered Views: Explore results by domain, category, and dataset
  • Model Inspector: Detailed metadata for each submitted model
  • Comprehensive Metrics: Multiple evaluation perspectives

The leaderboard aggregates results across all datasets to provide overall model rankings while maintaining the ability to drill down into specific domains and categories.
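The aggregation step described above (counting, per model, the datasets where it achieves the lowest MAPE) can be sketched as follows. The dataset and model names are hypothetical, and this is an assumed reconstruction of the Top-Performer tally, not the leaderboard's actual code.

```python
from collections import Counter

def top_performer_counts(results):
    """Count, per model, the datasets on which it has the lowest MAPE.

    results: {dataset_name: {model_name: best MAPE for that model}}
    Returns a Counter mapping model_name -> number of dataset wins.
    """
    wins = Counter()
    for scores in results.values():
        # The model with the minimum MAPE wins this dataset.
        wins[min(scores, key=scores.get)] += 1
    return wins

# Hypothetical per-dataset MAPE scores for two models:
results = {
    "dataset_1": {"model_a": 4.2, "model_b": 3.9},
    "dataset_2": {"model_a": 7.1, "model_b": 7.5},
    "dataset_3": {"model_a": 2.0, "model_b": 2.3},
}
ranking = top_performer_counts(results)
# model_a wins datasets 2 and 3; model_b wins dataset 1.
```

Sorting models by this count in descending order yields the overall ranking, while the per-dataset scores remain available for drill-down views.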

Submit by creating a pull request with your model's performance here:

🚀 Submit Here