Dataset Card for MissingKeys
Dataset Summary
MissingKeys is a raw dataset archive of the misskey.io network.
Supported Tasks and Leaderboards
This dataset is primarily intended for unsupervised training of text generation models; however, it may be useful for other purposes.
- text-classification
- text-generation
Languages
Primarily Japanese, with some English as well.
Dataset Structure
All data is stored in JSONL files that have been compressed into .7z archives by date.
Data Instances
Here is a sample with all the potential fields:
{
  "id": "9hh9iux6al",
  "createdAt": "2023-07-22T07:38:17.994Z",
  "userId": "9grv7htulz",
  "user": {
    "uid": "9grv7htulz#[email protected]",
    "name": "千種ナオ(ばすキー)",
    "avatarUrl": "https://proxy.misskeyusercontent.com/avatar.webp?url=https%3A%2F%2Fs3.isk01.sakurastorage.jp%2Fbackspacekey%2Fmisskey%2Fca098593-5c2f-4488-8b82-18961149cf92.png&avatar=1",
    "avatarBlurhash": "eGD8ztEK0KVb-=4TtSXm-jf4B7Vs~CEND*Fy%2Mct7%Lx.M{xcS0bv",
    "states": "bot,nyaa~",
    "hostInfo": "[email protected]#e4d440",
    "emojis": {},
    "onlineStatus": "unknown"
  },
  "text": "パソコン工房などのユニットコム系列だと、マザボ売るときにドライバディスクがないと30%買取金額が下がるという知見を得た",
  "cw": null,
  "visibility": "public",
  "localOnly": false,
  "renoteCount": 0,
  "repliesCount": 0,
  "reactions": {},
  "reactionEmojis": {},
  "emojis": {},
  "fileIds": [],
  "files": [],
  "replyId": null,
  "renoteId": null,
  "uri": "https://misskey.backspace.fm/notes/9hh9iux6p7"
}
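Any value that is falsy in Python (such as null, false, 0, an empty string, or an empty list) has been removed to save space, so a field shown in the sample may be absent from a given note.
The states field is a comma-separated string that may contain bot, nyaa~ (indicating that the user has enabled cat mode), or both.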
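Because falsy fields are dropped, code that reads a note should not assume a key is present. The following is a minimal sketch of one way to handle that in Python; the helper names user_states and with_defaults, and the particular set of defaults, are illustrative assumptions and not part of the dataset.

```python
from typing import Set


def user_states(user: dict) -> Set[str]:
    """Split the comma-separated "states" string into a set of flags.

    A missing or empty "states" field (dropped as falsy) yields an empty set.
    """
    raw = user.get("states") or ""
    return {flag.strip() for flag in raw.split(",") if flag.strip()}


def with_defaults(note: dict) -> dict:
    """Return a copy of the note with some dropped fields filled back in.

    The defaults below are an assumption based on the sample record;
    extend the mapping with any other fields you rely on.
    """
    defaults = {
        "text": None,
        "cw": None,
        "replyId": None,
        "renoteId": None,
        "fileIds": [],
        "files": [],
    }
    return {**defaults, **note}


user = {"name": "example", "states": "bot,nyaa~"}
print(user_states(user))
print(with_defaults({"id": "9hh9iux6al"})["files"])
```

Treating a missing field and an empty one as equivalent matches how the archive was written, since both cases were removed before saving.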
Data Fields
Refer to the sample above. Some additional notes:
The uid field in user follows this format:
user_id#username@user_host
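The uid format can be split back into its three parts. This is a small sketch that assumes the user ID contains no "#" and the host contains no "@"; the Uid type, the parse_uid function, and the example value are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class Uid(NamedTuple):
    user_id: str
    username: str
    host: str


def parse_uid(uid: str) -> Uid:
    """Parse a string of the form user_id#username@user_host."""
    # Split on the first "#": everything before it is the user ID.
    user_id, sep, rest = uid.partition("#")
    if not sep:
        raise ValueError(f"missing '#' in uid: {uid!r}")
    # Split on the last "@": everything after it is the host.
    username, sep, host = rest.rpartition("@")
    if not sep:
        raise ValueError(f"missing '@' in uid: {uid!r}")
    return Uid(user_id, username, host)


# Hypothetical value, not taken from the dataset.
print(parse_uid("abc123#alice@example.com"))
```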
Data Splits
Each JSONL file contains at most 100,000 notes; a new file is started once that limit is reached.
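Since each file is JSONL, notes can be read one line at a time without loading a whole file into memory. The sketch below assumes the .7z archive has already been extracted with a 7-Zip tool; the iter_notes helper and the file name in the comment are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_notes(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one note (a dict) per non-empty JSONL line.

    Accepts any iterable of strings, including an open text file.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield json.loads(line)


# Usage with an extracted file (the file name is a placeholder):
#
#     with open("notes.jsonl", encoding="utf-8") as fh:
#         for note in iter_notes(fh):
#             print(note["id"], note.get("text"))

sample = ['{"id": "a", "visibility": "public"}\n', "\n", '{"id": "b"}\n']
print([note["id"] for note in iter_notes(sample)])
```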
Dataset Creation
Curation Rationale
We needed an SNS dataset, and since Twitter appears to be quite reluctant to provide one, we went for an alternative.
Source Data
Initial Data Collection and Normalization
None. No normalization is performed, as this is a raw dump of the data. However, empty and null fields have been removed to save space.
Who are the source language producers?
Users of the misskey.io network.
Annotations
Annotation process
No Annotations are present.
Who are the annotators?
No human annotators.
Personal and Sensitive Information
We are certain there is no PII included in the dataset.
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
Misskey.io tends to be NSFW for images and is focused on Japanese culture.
Other Known Limitations
N/A
Additional Information
Dataset Curators
KaraKaraWitch
Licensing Information
Apache 2.0, for all parts of which KaraKaraWitch may be considered the author. All other material is distributed under fair use principles.
Ronsor Labs additionally is allowed to relicense the dataset as long as it has gone through processing.
Citation Information
@misc{missingkeys,
title = {MissingKeys: A SNS dataset on misskey.io network},
author = {KaraKaraWitch},
year = {2023},
howpublished = {\url{https://huggingface.co/datasets/RyokoExtra/MissingKeys}},
}
Name Etymology
N/A
Contributions
- @KaraKaraWitch (Twitter) for gathering this dataset.