ALBERT release

updated 9 days ago

ALBERT was released in two rounds, each covering four checkpoints of different sizes. The first round is labeled "v1" and the second "v2".
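The naming scheme described above (four sizes, two versions) can be sketched in a few lines of Python. This is a minimal illustration, not part of the release itself: the helper names `albert_checkpoint_names` and `load_fill_mask` are hypothetical, and the sketch assumes the bare names listed on this page resolve as Hub model IDs (an organization prefix may be needed in some setups). Loading a checkpoint requires the `transformers` package and a network connection.

```python
from itertools import product

# Sizes and versions as listed in this collection.
SIZES = ("base", "large", "xlarge", "xxlarge")
VERSIONS = ("v1", "v2")


def albert_checkpoint_names():
    """Return the eight checkpoint names, v1 sizes first, then v2 sizes."""
    return [f"albert-{size}-{version}" for version, size in product(VERSIONS, SIZES)]


def load_fill_mask(name):
    """Build a fill-mask pipeline for one checkpoint (hypothetical helper).

    Assumption: `name` is accepted as a Hub model ID as written on this page.
    transformers is imported lazily so the name helper works without it.
    """
    from transformers import pipeline

    return pipeline("fill-mask", model=name)


if __name__ == "__main__":
    for name in albert_checkpoint_names():
        print(name)
```

With `transformers` installed, something like `load_fill_mask("albert-base-v2")("Paris is the [MASK] of France.")` should return a ranked list of candidate tokens for the masked position.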

albert-base-v1

Fill-Mask • Updated Apr 6 • 61.3k • 3

Note: This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 768 hidden dimension
- 12 attention heads
- 11M parameters

Metrics: Average (80.1), SQuAD v1.1 (89.3/82.3), SQuAD v2 (80.0/77.1), MNLI (81.6), SST-2 (90.3), RACE (64.0)

albert-large-v1

Fill-Mask • Updated Jan 13, 2021 • 582

Note: This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 1024 hidden dimension
- 16 attention heads
- 17M parameters

Metrics: Average (82.4), SQuAD v1.1 (90.6/83.9), SQuAD v2 (82.3/79.4), MNLI (83.5), SST-2 (91.7), RACE (68.5)

albert-xlarge-v1

Fill-Mask • Updated Aug 11 • 499

Note: This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 2048 hidden dimension
- 16 attention heads
- 58M parameters

Metrics: Average (85.5), SQuAD v1.1 (92.5/86.1), SQuAD v2 (86.1/83.1), MNLI (86.4), SST-2 (92.4), RACE (74.8)

albert-xxlarge-v1

Fill-Mask • Updated Jan 13, 2021 • 1.79k • 2

Note: This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 4096 hidden dimension
- 64 attention heads
- 223M parameters

Metrics: Average (91.0), SQuAD v1.1 (94.8/89.3), SQuAD v2 (90.2/87.4), MNLI (90.8), SST-2 (96.9), RACE (86.5)

albert-base-v2

Fill-Mask • Updated May 30 • 4.98M • 60

Note: This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 768 hidden dimension
- 12 attention heads
- 11M parameters

Metrics: Average (82.3), SQuAD v1.1 (90.2/83.2), SQuAD v2 (82.1/79.3), MNLI (84.6), SST-2 (92.9), RACE (66.8)

albert-large-v2

Fill-Mask • Updated Apr 6 • 15.2k • 11

Note: This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 1024 hidden dimension
- 16 attention heads
- 17M parameters

Metrics: Average (85.7), SQuAD v1.1 (91.8/85.2), SQuAD v2 (84.9/81.8), MNLI (86.5), SST-2 (94.9), RACE (75.2)

albert-xlarge-v2

Fill-Mask • Updated Jan 13, 2021 • 3.51k • 3

Note: This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 2048 hidden dimension
- 16 attention heads
- 58M parameters

Metrics: Average (87.9), SQuAD v1.1 (92.9/86.4), SQuAD v2 (87.9/84.1), MNLI (87.9), SST-2 (95.4), RACE (80.7)

albert-xxlarge-v2

Fill-Mask • Updated Apr 6 • 22.3k • 11

Note: This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 4096 hidden dimension
- 64 attention heads
- 223M parameters

Metrics: Average (90.9), SQuAD v1.1 (94.6/89.1), SQuAD v2 (89.8/86.9), MNLI (90.6), SST-2 (96.8), RACE (86.8)
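
The parameter counts in the notes can be roughly reproduced from the listed dimensions. The sketch below is a back-of-the-envelope estimate, not the official accounting. It relies on two properties of the ALBERT architecture: the "repeating layers" share one set of weights, so the layer count does not affect the total, and the 128-dimensional embeddings are projected up to the hidden dimension. The vocabulary size (30,000), maximum sequence length (512), two segment types, a feed-forward width of 4x the hidden dimension, and the inclusion of a pooler layer are all assumptions that do not appear on this page. The function name `estimate_albert_params` is hypothetical.

```python
def estimate_albert_params(hidden, embedding=128, vocab=30_000,
                           max_positions=512, type_vocab=2,
                           ffn_mult=4, include_pooler=True):
    """Estimate the parameter count of an ALBERT encoder.

    All defaults except `embedding` are assumptions, not values from the
    collection notes. The number of repeating layers is deliberately not
    a parameter: the layers share a single set of weights.
    """
    intermediate = ffn_mult * hidden

    def linear(n_in, n_out):
        # Weight matrix plus bias vector.
        return n_in * n_out + n_out

    def layer_norm(dim):
        # Scale and shift vectors.
        return 2 * dim

    # Word, position, and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * embedding + layer_norm(embedding)
    # Projection from the small embedding space up to the hidden size.
    projection = linear(embedding, hidden)
    # One shared transformer layer: Q, K, V, output projections + LayerNorm.
    attention = 4 * linear(hidden, hidden) + layer_norm(hidden)
    # Feed-forward block (up and down projections) + LayerNorm.
    ffn = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm(hidden)
    pooler = linear(hidden, hidden) if include_pooler else 0

    return embeddings + projection + attention + ffn + pooler


if __name__ == "__main__":
    for size, hidden in [("base", 768), ("large", 1024),
                         ("xlarge", 2048), ("xxlarge", 4096)]:
        millions = estimate_albert_params(hidden) / 1e6
        print(f"albert-{size}: ~{millions:.1f}M parameters")
```

Under these assumptions the estimates land within about one million parameters of the figures listed in the notes. The remaining differences probably come from rounding and from which components are counted.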