output_main
This model is a fine-tuned version of roneneldan/TinyStories-1Layer-21M on the roneneldan/TinyStories dataset. It achieves the following results on the evaluation set:
- Loss: 1.6604
- Accuracy: 0.5791
- Multicode K: 1
- Dead Code Fraction/layer0: 0.1982
- MSE/layer0: 6073.8637
- Input Norm/layer0: 0.7182
- Output Norm/layer0: 76.7891
Model description
More information needed
Intended uses & limitations
More information needed
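In the meantime, a minimal generation sketch is below. It assumes the published checkpoint taufeeque/TinyStories-1Layer-21M-Codebook loads through the standard transformers auto classes; since the model wraps codebook layers, loading may additionally require the codebook-features library, which is not shown here.

```python
# Minimal usage sketch (assumption: the checkpoint loads via the standard
# transformers auto classes; the codebook wrapper may need the
# codebook-features library in addition to what is shown here).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "taufeeque/TinyStories-1Layer-21M-Codebook"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```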
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 0.0005
- train_batch_size: 96
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 100000
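As a rough guide, the sketch below maps these values onto transformers TrainingArguments. Only the values in the list above come from the original run; output_dir is an assumption taken from the card title, and any logging or evaluation cadence is left at defaults.

```python
# Sketch: the hyperparameters above expressed as transformers TrainingArguments.
# output_dir is an assumption (from the card title); all other values mirror
# the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output_main",   # assumed from the card title
    learning_rate=5e-4,
    per_device_train_batch_size=96,
    per_device_eval_batch_size=64,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    max_steps=100_000,
)
```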
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | MSE/layer0 | Input Norm/layer0 | Output Norm/layer0 |
---|---|---|---|---|---|---|---|---|---|
2.2319 | 0.1 | 1000 | 1.9134 | 0.5317 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.8521 | 0.21 | 2000 | 1.7990 | 0.5495 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7879 | 0.31 | 3000 | 1.7739 | 0.5557 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7728 | 0.42 | 4000 | 1.7666 | 0.5564 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7686 | 0.52 | 5000 | 1.7609 | 0.5595 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7635 | 0.63 | 6000 | 1.7555 | 0.5598 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7523 | 0.73 | 7000 | 1.7383 | 0.5632 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7471 | 0.83 | 8000 | 1.7368 | 0.5643 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7404 | 0.94 | 9000 | 1.7277 | 0.5659 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.728 | 1.04 | 10000 | 1.7290 | 0.5647 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7195 | 1.15 | 11000 | 1.7244 | 0.5667 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7198 | 1.25 | 12000 | 1.7230 | 0.5671 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7171 | 1.36 | 13000 | 1.7177 | 0.5689 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7185 | 1.46 | 14000 | 1.7150 | 0.5688 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7149 | 1.56 | 15000 | 1.7125 | 0.5695 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7105 | 1.67 | 16000 | 1.7097 | 0.5695 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7107 | 1.77 | 17000 | 1.7073 | 0.5689 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7113 | 1.88 | 18000 | 1.7025 | 0.5712 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.7078 | 1.98 | 19000 | 1.7048 | 0.5702 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.693 | 2.09 | 20000 | 1.7045 | 0.5696 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6935 | 2.19 | 21000 | 1.7068 | 0.5695 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6962 | 2.29 | 22000 | 1.7046 | 0.5687 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6954 | 2.4 | 23000 | 1.7019 | 0.5706 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6933 | 2.5 | 24000 | 1.7002 | 0.5725 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6942 | 2.61 | 25000 | 1.6983 | 0.5717 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6935 | 2.71 | 26000 | 1.6938 | 0.5730 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6928 | 2.82 | 27000 | 1.6978 | 0.5719 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6927 | 2.92 | 28000 | 1.6935 | 0.5715 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6855 | 3.02 | 29000 | 1.6978 | 0.5726 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6773 | 3.13 | 30000 | 1.6951 | 0.5732 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6788 | 3.23 | 31000 | 1.6926 | 0.5728 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6813 | 3.34 | 32000 | 1.6920 | 0.5726 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6782 | 3.44 | 33000 | 1.6926 | 0.5733 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6801 | 3.55 | 34000 | 1.6894 | 0.5719 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6796 | 3.65 | 35000 | 1.6890 | 0.5728 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6768 | 3.75 | 36000 | 1.6882 | 0.5722 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6802 | 3.86 | 37000 | 1.6872 | 0.5732 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6809 | 3.96 | 38000 | 1.6855 | 0.5750 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6701 | 4.07 | 39000 | 1.6886 | 0.5742 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6646 | 4.17 | 40000 | 1.6890 | 0.5734 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.669 | 4.28 | 41000 | 1.6859 | 0.5747 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6713 | 4.38 | 42000 | 1.6867 | 0.5740 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6693 | 4.48 | 43000 | 1.6821 | 0.5750 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6693 | 4.59 | 44000 | 1.6822 | 0.5747 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6692 | 4.69 | 45000 | 1.6801 | 0.5745 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6703 | 4.8 | 46000 | 1.6834 | 0.5761 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6677 | 4.9 | 47000 | 1.6819 | 0.5756 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6682 | 5.01 | 48000 | 1.6778 | 0.5752 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6547 | 5.11 | 49000 | 1.6825 | 0.5751 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6566 | 5.21 | 50000 | 1.6825 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6605 | 5.32 | 51000 | 1.6814 | 0.5746 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6603 | 5.42 | 52000 | 1.6768 | 0.5755 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6595 | 5.53 | 53000 | 1.6757 | 0.5753 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6603 | 5.63 | 54000 | 1.6769 | 0.5738 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.662 | 5.74 | 55000 | 1.6758 | 0.5759 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6602 | 5.84 | 56000 | 1.6771 | 0.5757 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6624 | 5.94 | 57000 | 1.6749 | 0.5770 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6527 | 6.05 | 58000 | 1.6791 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6474 | 6.15 | 59000 | 1.6763 | 0.5773 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6494 | 6.26 | 60000 | 1.6765 | 0.5761 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6539 | 6.36 | 61000 | 1.6741 | 0.5764 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6539 | 6.47 | 62000 | 1.6752 | 0.5768 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6529 | 6.57 | 63000 | 1.6737 | 0.5775 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6533 | 6.67 | 64000 | 1.6725 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.653 | 6.78 | 65000 | 1.6722 | 0.5774 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6522 | 6.88 | 66000 | 1.6726 | 0.5762 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6528 | 6.99 | 67000 | 1.6726 | 0.5768 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6439 | 7.09 | 68000 | 1.6728 | 0.5771 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6403 | 7.19 | 69000 | 1.6703 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6447 | 7.3 | 70000 | 1.6697 | 0.5772 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6458 | 7.4 | 71000 | 1.6694 | 0.5777 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6447 | 7.51 | 72000 | 1.6716 | 0.5771 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6449 | 7.61 | 73000 | 1.6680 | 0.5779 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6458 | 7.72 | 74000 | 1.6683 | 0.5779 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6447 | 7.82 | 75000 | 1.6681 | 0.5778 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6451 | 7.92 | 76000 | 1.6677 | 0.5781 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6418 | 8.03 | 77000 | 1.6665 | 0.5789 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6361 | 8.13 | 78000 | 1.6684 | 0.5779 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.636 | 8.24 | 79000 | 1.6687 | 0.5786 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6357 | 8.34 | 80000 | 1.6670 | 0.5790 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6379 | 8.45 | 81000 | 1.6658 | 0.5788 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6405 | 8.55 | 82000 | 1.6661 | 0.5788 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6378 | 8.65 | 83000 | 1.6650 | 0.5789 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6386 | 8.76 | 84000 | 1.6650 | 0.5784 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.638 | 8.86 | 85000 | 1.6644 | 0.5785 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6374 | 8.97 | 86000 | 1.6635 | 0.5777 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6298 | 9.07 | 87000 | 1.6647 | 0.5785 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6302 | 9.18 | 88000 | 1.6649 | 0.5787 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6315 | 9.28 | 89000 | 1.6651 | 0.5782 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.631 | 9.38 | 90000 | 1.6636 | 0.5788 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6316 | 9.49 | 91000 | 1.6627 | 0.5782 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6286 | 9.59 | 92000 | 1.6646 | 0.5783 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6304 | 9.7 | 93000 | 1.6632 | 0.5801 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6298 | 9.8 | 94000 | 1.6623 | 0.5800 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6309 | 9.91 | 95000 | 1.6620 | 0.5800 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6302 | 10.01 | 96000 | 1.6602 | 0.5801 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6242 | 10.11 | 97000 | 1.6610 | 0.5786 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6258 | 10.22 | 98000 | 1.6605 | 0.5795 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6234 | 10.32 | 99000 | 1.6605 | 0.5791 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
1.6245 | 10.43 | 100000 | 1.6604 | 0.5791 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
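The Accuracy column is presumably token-level top-1 next-token accuracy, as computed in standard transformers causal-LM evaluation. A minimal sketch of that computation is below; the validation slice and sequence length are illustrative assumptions, and loading the codebook layers may additionally require the codebook-features library.

```python
# Sketch: top-1 next-token accuracy, the metric presumably reported in the
# Accuracy column. The dataset slice and max_length are illustrative
# assumptions, not taken from the original evaluation setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "taufeeque/TinyStories-1Layer-21M-Codebook"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

texts = load_dataset("roneneldan/TinyStories", split="validation[:32]")["text"]
enc = tokenizer(texts, return_tensors="pt", padding=True,
                truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**enc).logits

# Shift: the logit at position t predicts the token at position t+1;
# score only real (non-padding) target tokens.
preds = logits[:, :-1].argmax(dim=-1)
targets = enc["input_ids"][:, 1:]
mask = enc["attention_mask"][:, 1:].bool()
accuracy = (preds[mask] == targets[mask]).float().mean().item()
print(f"next-token top-1 accuracy: {accuracy:.4f}")
```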
Framework versions
- Transformers 4.29.2
- Pytorch 2.0.1+cu117
- Datasets 2.12.0
- Tokenizers 0.13.3
Evaluation results
- Accuracy on roneneldan/TinyStories (self-reported): 0.579