🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Project Page: https://tigerailab.github.io/MAmmoTH/
Paper: https://arxiv.org/pdf/2309.05653.pdf
Code: https://github.com/TIGERAILab/MAmmoTH
Introduction
We introduce 🦣 MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on the 🤗 MathInstruct Dataset, a meticulously curated instruction tuning dataset that is lightweight yet generalizable. MathInstruct is compiled from 13 math rationale datasets, six of which are newly curated by this work. It uniquely focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and ensures extensive coverage of diverse mathematical fields.
| Size | Base Model: Llama-2 | Base Model: Code Llama |
|---|---|---|
| 7B | 🦣 MAmmoTH-7B | 🦣 MAmmoTH-Coder-7B |
| 13B | 🦣 MAmmoTH-13B | 🦣 MAmmoTH-Coder-13B |
| 34B | - | 🦣 MAmmoTH-Coder-34B |
| 70B | 🦣 MAmmoTH-70B | - |

Training Data
The models are trained on the 🤗 MathInstruct Dataset, which is compiled from 13 different math rationale datasets. Check out the dataset card for more details.
Training Procedure
The models are fine-tuned on the MathInstruct dataset, using the original Llama-2 and Code Llama models as base models. The training procedure varies with model size. Check out our paper for more details.
Evaluation
The models are evaluated using open-ended and multiple-choice math problems from several datasets. Here are the results:
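| Model | Size | Base | GSM8K | MATH | AQuA | NumGLUE | IID Avg | SVAMP | Mathematics | SimulEq | SAT-Math | MMLU-Math | OOD Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MAmmoTH | 7B | Llama-2 | 51.7 | 31.2 | 42.9 | 53.1 | 44.7 | 66.7 | 44.8 | 42 | 36.4 | 38.6 | 45.7 |
| MAmmoTH-Coder | 7B | Code Llama | 58.8 | 35.2 | 43 | 57.1 | 48.5 | 71.1 | 53.9 | 44.6 | 40 | 40.5 | 50.2 |
| MAmmoTH | 13B | Llama-2 | 61.7 | 36 | 44.8 | 59.6 | 50.5 | 72.4 | 48.7 | 40.5 | 42.7 | 45.3 | 49.9 |
| MAmmoTH-Coder | 13B | Code Llama | 64.3 | 38.6 | 46.1 | 54.2 | 50.8 | 73.2 | 60 | 44.1 | 40.9 | 45.2 | 52.6 |
| MAmmoTH-Coder | 34B | Code Llama | 72.3 | 46.8 | 50.8 | 59.6 | 57.3 | 84 | 64.7 | 50.6 | 51.8 | 50.2 | 60.3 |
| MAmmoTH | 70B | Llama-2 | 76.7 | 44.2 | 61.4 | 64.3 | 61.7 | 81.7 | 55.3 | 45.3 | 58.6 | 52.3 | 58.6 |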
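The table does not say how the two average columns are computed. The sketch below assumes that IID Avg is the unweighted mean of the four columns to its left (GSM8K, MATH, AQuA, NumGLUE) and OOD Avg the unweighted mean of the five columns before it (SVAMP, Mathematics, SimulEq, SAT-Math, MMLU-Math). It checks that reading against the MAmmoTH 7B row only. Because the per-dataset scores are already rounded, recomputing other rows this way may differ from the reported average in the last digit.

```python
from statistics import mean

# Scores copied from the MAmmoTH 7B (Llama-2) row of the results table.
row = {
    "GSM8K": 51.7, "MATH": 31.2, "AQuA": 42.9, "NumGLUE": 53.1,
    "SVAMP": 66.7, "Mathematics": 44.8, "SimulEq": 42.0,
    "SAT-Math": 36.4, "MMLU-Math": 38.6,
}

# Assumed grouping, inferred from the column order of the table.
IID_DATASETS = ["GSM8K", "MATH", "AQuA", "NumGLUE"]
OOD_DATASETS = ["SVAMP", "Mathematics", "SimulEq", "SAT-Math", "MMLU-Math"]

# Unweighted means over each group.
iid_avg = mean(row[name] for name in IID_DATASETS)
ood_avg = mean(row[name] for name in OOD_DATASETS)

print(f"IID Avg: {iid_avg:.1f} (reported 44.7)")
print(f"OOD Avg: {ood_avg:.1f} (reported 45.7)")
```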
Usage
You can use the models through Hugging Face's Transformers library. Use the pipeline function to create a text-generation pipeline with the model of your choice, then feed in a math problem to get the solution. Check our GitHub repo for more advanced use: https://github.com/TIGERAILab/MAmmoTH
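A minimal sketch of the pipeline usage described above. The model identifier `TIGER-Lab/MAmmoTH-7B`, the helper name `generate_solution`, and the generation settings are assumptions for illustration and are not stated in this card. The prompt passed in should already follow the template given in the Prompt Format section.

```python
def generate_solution(prompt: str,
                      model_id: str = "TIGER-Lab/MAmmoTH-7B",  # assumed Hub id
                      max_new_tokens: int = 256) -> str:
    """Run one prompt through a text-generation pipeline (hypothetical helper)."""
    # Imported lazily so that defining the helper does not load the library.
    from transformers import pipeline

    # Downloads the weights on first use; a 7B model needs substantial memory.
    generator = pipeline("text-generation", model=model_id)

    # Greedy decoding; by default the returned text includes the prompt itself.
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return outputs[0]["generated_text"]
```

Calling `generate_solution(prompt)` with a formatted prompt returns the prompt followed by the model's rationale. In practice you would probably build the pipeline once and reuse it across problems rather than reloading the model on each call.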
Prompt Format
If you want to do CoT:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
If you want to do PoT:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction} Let's write a program.
### Response:
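The two templates differ only in the trigger sentence appended to the instruction, so they can be produced by one small helper. This is a sketch: the function name `build_prompt` and the `mode` argument are made up for illustration, and the exact whitespace between template lines is an assumption based on how the templates are shown above.

```python
HEADER = ("Below is an instruction that describes a task. "
          "Write a response that appropriately completes the request.")
POT_TRIGGER = "Let's write a program."


def build_prompt(instruction: str, mode: str = "cot") -> str:
    """Format a math problem as a CoT or PoT prompt (hypothetical helper)."""
    if mode not in ("cot", "pot"):
        raise ValueError("mode must be 'cot' or 'pot'")
    body = instruction.strip()
    if mode == "pot":
        # PoT prompts append the trigger sentence to the instruction.
        body = f"{body} {POT_TRIGGER}"
    # Line breaks mirror the templates above; this spacing is an assumption.
    return "\n".join([HEADER, "### Instruction:", body, "### Response:"])


print(build_prompt("What is 15% of 240?", mode="pot"))
```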
Intended Uses
These models are trained for research purposes. They are designed to solve general math problems. They can be used in educational software, tutoring systems, or any application where a solution to a math problem is needed. The models can generate both a chain of thought (CoT) rationale and a program of thought (PoT) rationale, providing a comprehensive solution to a given math problem.
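With a PoT rationale, the final answer comes from running the generated program, not from reading the text. The sketch below shows one way to do that, assuming the model emits a standalone Python script that prints its answer. The helper name `run_pot_program` and the timeout are illustrative and are not taken from the project's code. A separate process is not a sandbox, so treat generated code as untrusted.

```python
import subprocess
import sys


def run_pot_program(program: str, timeout_s: float = 10.0) -> str:
    """Execute a generated program in a child interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", program],  # run with the current Python interpreter
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
    )
    if result.returncode != 0:
        # Surface the program's error output instead of returning a partial answer.
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


# Stand-in for a model-generated PoT rationale.
example_program = "price = 240\nprint(price * 15 / 100)"
print(run_pot_program(example_program))
```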
Limitations
We've tried our best to build math generalist models. However, we acknowledge that the models' performance may vary with the complexity and specifics of the math problem, and that not all mathematical fields are covered comprehensively.
Citation
If you use the models, data, or code from this project, please cite the original paper:
@article{yue2023mammoth,
title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
journal={arXiv preprint arXiv:2309.05653},
year={2023}
}