Datasets for LLM SFT
A collection of datasets in instruction-context-completion format for instruction-following fine-tuning of LLMs.
tatsu-lab/alpaca • Updated May 22 • 36.3k • 426
databricks/databricks-dolly-15k • Updated Jun 30 • 38.2k • 368
Open-Orca/OpenOrca • Updated Aug 20 • 29.7k • 673
OpenAssistant/oasst1 • Updated May 2 • 14.6k • 1.03k
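
To make the instruction-context-completion format concrete, here is a minimal sketch of turning one such record into a prompt/completion pair for fine-tuning. The field names (`instruction`, `context`, `completion`), the `to_sft_pair` helper, and the prompt template are all assumptions for illustration; the listed datasets may name their fields differently and none of them prescribes this template, so a renaming step would likely be needed before using real data.

```python
from typing import Dict

# Hypothetical prompt templates; an assumption, not taken from any listed dataset.
TEMPLATE_WITH_CONTEXT = (
    "### Instruction:\n{instruction}\n\n"
    "### Context:\n{context}\n\n"
    "### Response:\n"
)
TEMPLATE_NO_CONTEXT = "### Instruction:\n{instruction}\n\n### Response:\n"


def to_sft_pair(record: Dict[str, str]) -> Dict[str, str]:
    """Convert an instruction-context-completion record into a prompt/completion pair.

    The record keys are assumed names; 'context' is optional and may be empty.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    completion = record["completion"].strip()

    # Records without context use the shorter template so the prompt
    # does not contain an empty "Context" section.
    if context:
        prompt = TEMPLATE_WITH_CONTEXT.format(
            instruction=instruction, context=context
        )
    else:
        prompt = TEMPLATE_NO_CONTEXT.format(instruction=instruction)

    return {"prompt": prompt, "completion": completion}


if __name__ == "__main__":
    example = {
        "instruction": "Summarize the passage in one sentence.",
        "context": "Transformers process tokens in parallel using attention.",
        "completion": "Transformers use attention to process tokens in parallel.",
    }
    pair = to_sft_pair(example)
    print(pair["prompt"] + pair["completion"])
```

During fine-tuning, the loss is commonly computed only on the completion tokens, which is why the sketch keeps the prompt and the completion as separate strings.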

GPT-2 from scratch
A reading list for fully understanding GPT-2 and implementing it from scratch.
Neural Machine Translation of Rare Words with Subword Units • Paper • 1508.07909 • Published Aug 31, 2015
Attention Is All You Need • Paper • 1706.03762 • Published Jun 12, 2017 • 6
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding • Paper • 1810.04805 • Published Oct 11, 2018
Generating Wikipedia by Summarizing Long Sequences • Paper • 1801.10198 • Published Jan 30, 2018