Dataset Card for MissingKeys

Dataset Summary

MissingKeys is a raw dataset archive of the misskey.io network.

Supported Tasks and Leaderboards

This dataset is primarily intended for unsupervised training of text generation models; however, it may be useful for other purposes.

  • text-classification
  • text-generation

Languages

Primarily Japanese, with some English.

Dataset Structure

All data is stored in JSONL files, which have been compressed into .7z archives by date.
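As a minimal sketch of how the extracted files could be read, the helper below streams one note per line from a JSONL file. It assumes the .7z archive has already been unpacked with a 7-Zip-compatible tool; the function name `iter_notes` and the file name in the usage note are hypothetical and not part of the dataset.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_notes(jsonl_path: str) -> Iterator[dict]:
    """Yield one note (a dict) per non-empty line of an extracted JSONL file."""
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

For example, `for note in iter_notes("notes.jsonl"): print(note["id"])` would print the id of every note in a (hypothetically named) extracted file without loading the whole file into memory.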

Data Instances

Here is a sample with all the potential fields:

{
    "id": "9hh9iux6al",
    "createdAt": "2023-07-22T07:38:17.994Z",
    "userId": "9grv7htulz",
    "user": {
        "uid": "9grv7htulz#[email protected]",
        "name": "千種ナオ(ばすキー)",
        "avatarUrl": "https://proxy.misskeyusercontent.com/avatar.webp?url=https%3A%2F%2Fs3.isk01.sakurastorage.jp%2Fbackspacekey%2Fmisskey%2Fca098593-5c2f-4488-8b82-18961149cf92.png&avatar=1",
        "avatarBlurhash": "eGD8ztEK0KVb-=4TtSXm-jf4B7Vs~CEND*Fy%2Mct7%Lx.M{xcS0bv",
        "states": "bot,nyaa~",
        "hostInfo": "[email protected]#e4d440",
        "emojis": {},
        "onlineStatus": "unknown"
    },
    "text": "パソコン工房などのユニットコム系列だと、マザボ売るときにドライバディスクがないと30%買取金額が下がるという知見を得た",
    "cw": null,
    "visibility": "public",
    "localOnly": false,
    "renoteCount": 0,
    "repliesCount": 0,
    "reactions": {},
    "reactionEmojis": {},
    "emojis": {},
    "fileIds": [],
    "files": [],
    "replyId": null,
    "renoteId": null,
    "uri": "https://misskey.backspace.fm/notes/9hh9iux6p7"
}

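Any field whose value is falsy in Python (for example null, false, 0, an empty string, an empty list, or an empty object) has been removed to save space, so most notes carry fewer keys than the sample above.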
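Because falsy fields are dropped, code that reads a note should not index keys directly. One possible approach, sketched below, is to fill in defaults before use. The `DEFAULTS` table and the `with_defaults` helper are hypothetical; the default values are taken from the falsy values shown in the sample and are an assumption about what the removed fields originally held.

```python
import copy

# Assumed defaults, based on the falsy values that appear in the sample note.
DEFAULTS = {
    "cw": None,
    "localOnly": False,
    "renoteCount": 0,
    "repliesCount": 0,
    "reactions": {},
    "fileIds": [],
    "files": [],
    "replyId": None,
    "renoteId": None,
}


def with_defaults(note: dict) -> dict:
    """Return a copy of the note with missing top-level keys restored."""
    filled = copy.deepcopy(DEFAULTS)  # deep copy so lists/dicts are not shared
    filled.update(note)
    return filled
```

For a single field, `note.get("renoteCount", 0)` achieves the same effect without copying the note.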

states is a comma-separated string that contains bot, nyaa~ (indicating that the user has enabled cat mode), or both.
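A small sketch of turning the states string into a set of flags follows. The helper name `parse_states` is hypothetical, and it assumes that a missing or empty states value means no flags are set.

```python
def parse_states(user: dict) -> set[str]:
    """Split the comma-separated states string of a user dict into a set."""
    raw = user.get("states") or ""  # missing or empty means no flags
    return {part.strip() for part in raw.split(",") if part.strip()}
```

With this, `"bot" in parse_states(note["user"])` checks whether a note was posted by an account flagged as a bot.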

Data Fields

Refer to the sample above. I'll drop in some additional notes:

uid in user follows this specific format:

user_id#username@user_host
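The format above can be split into its three parts with plain string operations. This is only a sketch: `Uid` and `parse_uid` are hypothetical names, and the value in the usage note is made up for illustration rather than taken from the dataset.

```python
from typing import NamedTuple


class Uid(NamedTuple):
    user_id: str
    username: str
    host: str


def parse_uid(uid: str) -> Uid:
    """Split 'user_id#username@user_host' into its three components."""
    user_id, hash_sep, rest = uid.partition("#")   # first '#' ends the user id
    username, at_sep, host = rest.rpartition("@")  # last '@' starts the host
    if not hash_sep or not at_sep:
        raise ValueError(f"unexpected uid format: {uid!r}")
    return Uid(user_id, username, host)
```

For example, `parse_uid("abc123#alice@example.com")` returns `Uid(user_id="abc123", username="alice", host="example.com")`.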

Data Splits

Each JSONL file holds up to 100,000 notes; a new file starts once that limit is reached.
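To illustrate what splitting at 100,000 notes means, the sketch below groups a stream of notes into batches of that size. It is not the curator's actual tooling; `NOTES_PER_FILE` and `batches` are hypothetical names used only for this example.

```python
from itertools import islice
from typing import Iterable, Iterator

NOTES_PER_FILE = 100_000  # split size stated in this card


def batches(notes: Iterable[dict], size: int = NOTES_PER_FILE) -> Iterator[list[dict]]:
    """Group notes into lists of at most `size` items, preserving order."""
    iterator = iter(notes)
    while batch := list(islice(iterator, size)):
        yield batch
```

Each yielded batch would correspond to the contents of one JSONL file; the last batch may be shorter than the others.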

Dataset Creation

Curation Rationale

We needed an SNS dataset, and since Twitter appears quite reluctant to share its data, we chose misskey.io as the alternative.

Source Data

Initial Data Collection and Normalization

No normalization is performed, as this is a raw dump of the data. However, empty and null fields have been removed to save space.

Who are the source language producers?

Users of the misskey.io network.

Annotations

Annotation process

No annotations are present.

Who are the annotators?

No human annotators.

Personal and Sensitive Information

We are certain there is no PII included in the dataset.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

Misskey.io tends to be NSFW for images and is focused on Japanese culture.

Other Known Limitations

N/A

Additional Information

Dataset Curators

KaraKaraWitch

Licensing Information

Apache 2.0, for all parts of which KaraKaraWitch may be considered the author. All other material is distributed under fair use principles.

Ronsor Labs is additionally allowed to relicense the dataset, provided it has gone through processing.

Citation Information

@misc{missingkeys,
  title         = {MissingKeys: A SNS dataset on misskey.io network},
  author        = {KaraKaraWitch},
  year          = {2023},
  howpublished  = {\url{https://huggingface.co/datasets/RyokoExtra/MissingKeys}},
}

Name Etymology

N/A

Contributions
