title: string
published: string
url: string
video_id: string
channel_id: string
id: string
text: string
start: float64
end: float64
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t0.0"
"Hi, welcome to the video."
0
9.36
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t3.0"
"So this is the fourth video in a Transformers"
3
11.56
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t9.36"
"from Scratch mini series."
9.36
15.84
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t11.56"
"So if you haven't been following along,"
11.56
18.48
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t15.84"
"we've essentially covered what you can see on the screen."
15.84
20.6
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t18.48"
"So we got some data."
18.48
23.72
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t20.6"
"We built a tokenizer with it."
20.6
25.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t23.72"
"And then we've set up our input pipeline"
23.72
28.48
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t25.76"
"ready to begin actually training our model, which"
25.76
32.36
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t28.48"
"is what we're going to cover in this video."
28.48
35.96
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t32.36"
"So let's move over to the code."
32.36
39.56
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t35.96"
"And we see here that we have essentially everything"
35.96
40.48
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t39.56"
"we've done so far."
39.56
48.8
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t40.480000000000004"
"So we've built our input data, our input pipeline."
40.48
51.52
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t48.8"
"And we're now at a point where we have a data loader,"
48.8
54.04
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t51.519999999999996"
"PyTorch data loader, ready."
51.52
56.4
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t54.040000000000006"
"And we can begin training a model with it."
54.04
61.84
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t56.4"
"So there are a few things to be aware of."
56.4
64.88
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t61.839999999999996"
"So I mean, first, let's just have a quick look"
61.84
67.28
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t64.88"
"at the structure of our data."
64.88
72.36
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t67.28"
"So when we're training a model for mass language modeling,"
67.28
74.12
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t72.36"
"we need a few tensors."
72.36
76.04
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t74.12"
"We need three tensors."
74.12
80.32
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t76.03999999999999"
"And this is for training Roberta, by the way, as well."
76.04
83.24
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t80.32"
"Same thing with Bert as well."
80.32
88.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t83.24"
"We have our input IDs, attention mask, and our labels."
83.24
94.2
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t88.75999999999999"
"Our input IDs have roughly 15% of their values masked."
88.76
96.64
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t94.19999999999999"
"So we can see that here we have these two tensors."
94.2
98.04
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t96.64"
"These are the labels."
96.64
102.56
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t98.03999999999999"
"And we have the real tokens in here, the token IDs."
98.04
105.2
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t102.56"
"And then in our input IDs tensor,"
102.56
108.68
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t105.19999999999999"
"we have these being replaced with mask tokens,"
105.2
110.52
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t108.67999999999999"
"the number fours."
108.68
114.44
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t110.52"
"So that's the structure of our input data."
110.52
119
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t114.44"
"We've created a Torch data set from it"
114.44
122.6
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t119.0"
"and use that to create a Torch data loader."
119
125.48
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t122.6"
"And with that, we can actually begin"
122.6
127.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t125.47999999999999"
"setting up our model for training."
125.48
131.72
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t127.75999999999999"
"So there are a few things to that."
127.76
134.36
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t131.72"
"We can't just begin training straight away."
131.72
136.32
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t134.35999999999999"
"So the first thing that we need to do"
134.36
140.56
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t136.32"
"is create a Roberta config object."
136.32
144
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t140.56"
"And the config object is something"
140.56
146.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t144.0"
"that we use when we're initializing a transformer"
144
149.16
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t146.76"
"from scratch in order to initialize it"
146.76
152.32
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t149.16"
"with a certain set of parameters."
149.16
153.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t152.32"
"So we'll do that first."
152.32
159.96
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t153.76"
"So we want from transformers import Roberta config."
153.76
163.08
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t159.96"
"And to create that config object, we do this."
159.96
167.92
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t163.08"
"So we do Roberta config."
163.08
172.56
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t167.92000000000002"
"And then in here, we need to specify different parameters."
167.92
177.12
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t172.56"
"Now, one of the main ones is the voc up size."
172.56
180.48
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t177.12"
"Now, this needs to match to whichever voc up size"
177.12
186.08
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t180.48000000000002"
"we have already created in our tokenizer"
180.48
187.88
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t186.08"
"when we're initializing it."
186.08
191.48
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t187.88"
"In our tokenizer, when building our tokenizer."
187.88
201.28
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t191.48"
"So I mean, for me, if I go all the way up here to here,"
191.48
203.92
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t201.28"
"this is where I created the tokenizer."
201.28
206.36
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t203.92"
"I can see, OK, it's this number here."
203.92
209.24
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t206.35999999999999"
"So 30,522."
206.36
213.2
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t209.24"
"So I'm going to set that."
209.24
218.64
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t213.2"
"But if you don't have that, you can just write tokenizer voc"
213.2
219.92
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t218.64"
"up size."
218.64
222.2
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t219.92"
"So here."
219.92
224.24
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t222.2"
"And that will return your voc up size."
222.2
226.64
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t224.23999999999998"
"So I mean, let's replace that."
224.24
228.88
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t226.64"
"We'll do this."
226.64
235.64
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t228.88"
"Now, as well as that, we want to also set this."
228.88
239.72
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t235.64"
"So max position embedding."
235.64
246.68
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t239.72"
"And this needs to be set to your max length plus two"
239.72
247.28
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t246.68"
"in this case."
246.68
251.12
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t247.28"
"So max length is set up here."
247.28
253.24
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t251.12"
"So where is it?"
251.12
256
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t253.24"
"Max length here, 512."
253.24
259.96
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t256.0"
"Plus two because we have these added special tokens."
256
263.2
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t259.96"
"If we don't do that, we'll end up with a index error"
259.96
268.56
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t263.2"
"because we're going beyond the embedding limits."
263.2
270.24
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t268.56"
"Now we want our hidden size."
268.56
274.72
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t270.24"
"So this is the size of the vectors"
270.24
277.84
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t274.72"
"that our embedding layers within Roberta will create."
274.72
284.16
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t277.84"
"So each token, so we have 514 or 12 tokens."
277.84
289.24
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t284.16"
"And each one of those will be signed a vector of size 768."
284.16
290.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t289.24"
"This is the typical number."
289.24
296.08
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t290.76"
"So that's the originally came from the BERT based model."
290.76
301.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t296.08"
"Then we set up the architecture of the internals of the model."
296.08
304.88
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t301.76"
"So we want the number of attention heads,"
301.76
307.08
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t304.88"
"which I'm going to set to 12."
304.88
314.4
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t307.08"
"And also the number of hidden layers, which I..."
307.08
318.72
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t314.4"
"So the default for this is for Roberta, 12."
314.4
324.12
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t318.71999999999997"
"But I'm going to go with six for the sake of keeping train"
318.72
326.96
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t324.12"
"times a little shorter."
324.12
334.28
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t326.96"
"Now we also need to add type, vocab, size, which is just one."
326.96
337.4
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t334.28000000000003"
"So that's the different token types that we have."
334.28
338.36
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t337.4"
"We just have one."
337.4
341.76
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t338.36"
"Don't need to worry about that."
338.36
347.92
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t341.76"
"OK, so that's our configuration object ready."
341.76
352.04
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t347.92"
"And we can import and initialize a Roberta model with that."
347.92
353.72
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t352.04"
"So we went from transformers."
352.04
356.88
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t353.72"
"This is kind of similar to what we usually do."
353.72
359.28
"Training and Testing an Italian BERT - Transformers From Scratch #4"
"2021-07-06 13:00:03 UTC"
"https://youtu.be/35Pdoyi6ZoQ"
"35Pdoyi6ZoQ"
"UCv83tO5cePwHMt1952IVVHw"
"35Pdoyi6ZoQ-t356.88000000000005"
"Import Roberta."
356.88
361.72

The YouTube transcriptions dataset contains technical tutorials (currently from James Briggs, Daniel Bourke, and AI Coffee Break) transcribed using OpenAI's Whisper (large). Each row represents roughly a sentence-length chunk of text alongside the video URL and timestamp.
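
Because each row carries both the video URL and a start time, a row can be turned into a link that opens the video at that moment. The following is a minimal sketch, assuming the `url` field is a short `youtu.be` link as in the preview rows and that `start` is measured in seconds; `timestamped_url` is a hypothetical helper name, not part of the dataset or of any library.

```python
def timestamped_url(row):
    """Return the row's video URL with a start offset appended (hypothetical helper)."""
    seconds = int(row['start'])  # whole seconds; the fractional part is dropped
    # assumption: the URL accepts a `t` query parameter giving the offset in seconds
    separator = '&' if '?' in row['url'] else '?'
    return f"{row['url']}{separator}t={seconds}"


# sample values copied from the preview rows
row = {'url': 'https://youtu.be/35Pdoyi6ZoQ', 'start': 9.36}
link = timestamped_url(row)
```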

Note that each item in the dataset contains just a short chunk of text. For most use cases you will likely need to merge multiple rows to create more substantial chunks of text. If you need to do that, this code snippet will help:

from datasets import load_dataset

# first download the dataset
data = load_dataset(
    'jamescalam/youtube-transcriptions',
    split='train'
)

new_data = []  # this will store adjusted data

window = 6  # number of sentences to combine
stride = 3  # number of sentences to 'stride' over, used to create overlap

for i in range(0, len(data), stride):
    i_end = min(len(data), i + window)  # exclusive end of the window
    last = i_end - 1  # index of the last row included in this chunk
    if data[i]['title'] != data[last]['title']:
        # skip windows that span the boundary between two videos
        continue
    # create larger text chunk from rows i..last
    text = ' '.join(data[i:i_end]['text'])
    # add to adjusted data list
    new_data.append({
        'start': data[i]['start'],
        'end': data[last]['end'],
        'title': data[i]['title'],
        'text': text,
        'id': data[i]['id'],
        'url': data[i]['url'],
        'published': data[i]['published']
    })
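
The same windowing idea can be tried without downloading anything. The sketch below is a minimal, self-contained version that works on a plain list of dicts; the function name `merge_rows` and the sample rows are hypothetical and only illustrate the technique, assuming each row has `title`, `text`, `start`, and `end` keys as in the dataset. It takes the `end` timestamp from the last row actually included in each chunk.

```python
def merge_rows(rows, window=6, stride=3):
    """Merge consecutive rows into overlapping chunks (hypothetical helper)."""
    merged = []
    for i in range(0, len(rows), stride):
        chunk = rows[i:i + window]  # up to `window` rows, fewer at the tail
        # skip windows that span the boundary between two videos
        if chunk[0]['title'] != chunk[-1]['title']:
            continue
        merged.append({
            'title': chunk[0]['title'],
            'text': ' '.join(r['text'] for r in chunk),
            'start': chunk[0]['start'],
            'end': chunk[-1]['end'],
        })
    return merged


# made-up rows for illustration only
rows = [
    {'title': 'A', 'text': f's{n}', 'start': float(n), 'end': float(n + 1)}
    for n in range(4)
]
chunks = merge_rows(rows, window=2, stride=1)
```

A stride smaller than the window makes neighbouring chunks share rows, which keeps a sentence that falls on a chunk boundary together with its context in at least one chunk.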