dawnbrandbots / yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card Game, Official Card Game, Master Duel, Rush Duel, Speed Duel.

Home Page: https://dawnbrandbots.github.io/yaml-yugi/cards.json

License: GNU Affero General Public License v3.0

Python 75.93% TypeScript 16.01% JavaScript 0.51% HTML 7.56%

etl yaml yaml-yugi yu-gi-oh yugioh master-duel masterduel rush-duel json ocg

yaml-yugi's Introduction

YAML Yugi

This project aims to create a comprehensive, machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card Game (TCG), Official Card Game (OCG), Master Duel video game, Rush Duel, and Speed Duel.

YAML Yugi is the primary data source for the new version of Discord bot Bastion.

Most card text is © Studio Dice/SHUEISHA, TV TOKYO, KONAMI. The card data files can be found under /data, and aggregations are published to GitHub Pages.

The remaining files — the actual source code of this stage of the pipeline — are available under the GNU Affero General Public License 3.0 or later. See COPYING for more details.

The aggregate branch is very large and the history is not very relevant. It could be moved elsewhere in the future if this is a problem. For now, it is recommended to clone this repository with the --single-branch flag to work on it. If you need to fetch other branches from the remote automatically afterward, you can edit the corresponding section of your .git/config file to look like this:

[remote "origin"]
        url = git@github.com:DawnbrandBots/yaml-yugi.git
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = ^refs/heads/aggregate

Sample links

Forbidden & Limited Lists (Limit Regulations)

Individual cards, JSON and YAML variants both available

OCG/TCG card by password

Canonical download: https://github.com/DawnbrandBots/yaml-yugi/raw/master/data/cards/00010000.json
CDN with correct MIME type and CORS: https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data/cards/00010000.json
Alternative CDN: https://cdn.statically.io/gh/DawnbrandBots/yaml-yugi/master/data/cards/00010000.json

OCG/TCG card without password by Konami ID

Canonical download: https://github.com/DawnbrandBots/yaml-yugi/raw/master/data/cards/kdb5000.json
CDN with correct MIME type and CORS: https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data/cards/kdb5000.json
Alternative CDN: https://cdn.statically.io/gh/DawnbrandBots/yaml-yugi/master/data/cards/kdb5000.json

Prerelease OCG/TCG card

Same as above, but with yugipedia<PAGE_ID> file names.

Rush Duel card by Konami ID

Canonical download: https://github.com/DawnbrandBots/yaml-yugi/raw/master/data/rush/15150.json
CDN with correct MIME type and CORS: https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data/rush/15150.json
Alternative CDN: https://cdn.statically.io/gh/DawnbrandBots/yaml-yugi/master/data/rush/15150.json

TCG Speed Duel Skill Card

Canonical download: https://github.com/DawnbrandBots/yaml-yugi/raw/master/data/tcg-speed-skill/yugipedia585581.json
CDN with correct MIME type and CORS: https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data/tcg-speed-skill/yugipedia585581.json
Alternative CDN: https://cdn.statically.io/gh/DawnbrandBots/yaml-yugi/master/data/tcg-speed-skill/yugipedia585581.json
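
The file naming scheme in the sample links above can be sketched in Python. This is only an illustration: the helper names card_file_stem and card_url are hypothetical, and the eight-digit zero padding is inferred from the sample 00010000.json rather than from a documented rule.

```python
# Sketch only: helper names and the 8-digit padding rule are assumptions
# inferred from the sample URLs, not a documented API.
JSDELIVR_BASE = "https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data"


def card_file_stem(password=None, konami_id=None, yugipedia_page_id=None):
    """Pick the file stem using the naming scheme shown in the samples."""
    if password is not None:
        # Passwords appear zero-padded to eight digits, e.g. 00010000.
        return f"{password:08d}"
    if konami_id is not None:
        return f"kdb{konami_id}"
    if yugipedia_page_id is not None:
        return f"yugipedia{yugipedia_page_id}"
    raise ValueError("one identifier is required")


def card_url(stem, directory="cards", extension="json"):
    """Build a jsDelivr URL for a single card document."""
    return f"{JSDELIVR_BASE}/{directory}/{stem}.{extension}"


print(card_url(card_file_stem(password=10000)))
print(card_url(card_file_stem(konami_id=5000)))
print(card_url("15150", directory="rush"))
```

Passing extension="yaml" should give the YAML variant of the same document, assuming both variants share a file stem.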

Aggregations

Series and archetypes, JSON and YAML both available

As list, Canonical download: https://github.com/DawnbrandBots/yaml-yugi/raw/master/data/series/list.json
As list, CDN with correct MIME type and CORS: https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data/series/list.json
As list, alternative CDN: https://cdn.statically.io/gh/DawnbrandBots/yaml-yugi/master/data/series/list.json
As mapping from English name, Canonical download: https://github.com/DawnbrandBots/yaml-yugi/raw/master/data/series/map.json
As mapping from English name, CDN with correct MIME type and CORS: https://cdn.jsdelivr.net/gh/DawnbrandBots/yaml-yugi/data/series/map.json
As mapping from English name, alternative CDN: https://cdn.statically.io/gh/DawnbrandBots/yaml-yugi/master/data/series/map.json

All OCG/TCG cards, including prereleases

All Rush Duel cards

All Master Duel cards

https://dawnbrandbots.github.io/yaml-yugi/master-duel-raw.json

All TCG Speed Duel Skill Cards

https://dawnbrandbots.github.io/yaml-yugi/skill.json


yaml-yugi's Issues

Interconvert Traditional and Simplified Chinese translations

https://github.com/BYVoid/OpenCC

Korean input priority for card names and text

https://github.com/DawnbrandBots/yaml-yugi-ko/blob/master/overrides.tsv (ultimately, this should not be needed)
Yugipedia, if it contains ruby text
Official database contents otherwise preferred
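
The priority order above could be expressed roughly as follows. This is a sketch under assumptions: pick_korean_text is a hypothetical helper, ruby text is assumed to be detectable by a literal "<ruby" tag, and the final fallback to Yugipedia text without ruby is a guess at what "preferred" implies.

```python
def pick_korean_text(override, yugipedia, official):
    """Return the preferred Korean string, or None if no source has one.

    Assumption: ruby text in Yugipedia-derived strings is detectable by a
    literal "<ruby" tag; the real pipeline may represent it differently.
    """
    if override:
        return override
    if yugipedia and "<ruby" in yugipedia:
        return yugipedia
    if official:
        return official
    # Fall back to Yugipedia without ruby text if nothing else exists.
    return yugipedia or None
```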

Highlight discrepancies between Yugipedia and the official database and correct Yugipedia

Parse bonus derived field "ritualmonster" for Ritual Spells

Note: this may contain a bulleted list, as in "End of the World".

I believe the corresponding ritualcard field for Ritual Monsters does not yet support bulleted lists, which is why "Black Luster Soldier - Legendary Swordmaster" has it left blank.

Transform Duel Links Skills

Target: https://github.com/DawnbrandBots/yaml-yugipedia/tree/master/wikitext/duel-links-skills, pulled from https://yugipedia.com/wiki/Category:Yu-Gi-Oh!_Duel_Links_Skills

These use Template:Duel_Links_Skill.

Fields

name (all languages, like CardTable2)
lore (all languages, like CardTable2)
releases (important for identifying the character and possibly the world?)

Validate that output documents match a given JSON schema

Rush Duel cards missing Japanese names should check a corresponding OCG card for name

e.g. https://yugipedia.com/wiki/Worm_Drake_(Rush_Duel) does not have ja_name, instead obtaining it from its OCG counterpart.

Add data for card set information

I quite love that this data set has information about what sets what cards appeared in regardless of locale. A lot of Yugioh tools I've found care only about the TCG, and usually only the English TCG at that. However, I've run into a big issue when trying to pin cards to certain sets, considering all possible locales:

Sometimes, two sets from different locales that have the same name refer to the same set (e.g. Legacy of Darkness)
Sometimes, two sets from different locales that have the same name refer to different sets (e.g. Pharaoh's Servant)
Sometimes, two sets from different locales that have the same name refer to the same set, BUT the sets have different contents across locales (e.g. Legend of Blue-Eyes White Dragon)

Furthermore, there is some data I'd like regarding sets, such as when a set first came out in each locale. Have you considered making an additional dataset for card sets?

Or a way to convert JSON to YDK. I don't want to convert deck files one by one; I want to convert them all at once.

Add dry run version of merge-transform pipeline to test changes

Test that no wikitext templates are in the output

It should suffice to check for {{
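
A minimal sketch of such a check, assuming output documents are parsed into nested dicts and lists. The function name find_template_residue and the sample document are hypothetical.

```python
def find_template_residue(document, path="$"):
    """Yield paths of string values that still contain "{{"."""
    if isinstance(document, str):
        if "{{" in document:
            yield path
    elif isinstance(document, dict):
        for key, value in document.items():
            yield from find_template_residue(value, f"{path}.{key}")
    elif isinstance(document, list):
        for index, value in enumerate(document):
            yield from find_template_residue(value, f"{path}[{index}]")


# Made-up document for illustration only.
sample = {
    "name": {"en": "Example Card", "ja": "{{Ruby|例|れい}}"},
    "sets": [{"set_name": "Example Set"}],
}
print(list(find_template_residue(sample)))
```

Reporting the path instead of a bare pass or fail makes it easier to see which field let a template through.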

Improve parallelization on worker

The current split into two batches is naïve. One worker always finishes well before the other.
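
One possible direction, sketched here purely as an illustration, is a greedy longest-processing-time split: sort items by estimated cost and always hand the next item to the least-loaded worker. The function balance_batches and the idea of a per-item cost estimate (file size, previous run time) are assumptions, not a description of the existing pipeline.

```python
import heapq


def balance_batches(costs, workers=2):
    """Greedy longest-processing-time split of items across workers.

    `costs` maps an item name to an estimated cost (for example a file size
    or a previous run time). Both the cost estimate and this function are
    hypothetical; the existing pipeline does not necessarily work this way.
    """
    # Min-heap of (total cost so far, worker index, assigned items).
    heap = [(0, index, []) for index in range(workers)]
    heapq.heapify(heap)
    for item, cost in sorted(costs.items(), key=lambda pair: -pair[1]):
        total, index, items = heapq.heappop(heap)
        items.append(item)
        heapq.heappush(heap, (total + cost, index, items))
    return [items for _, _, items in sorted(heap, key=lambda entry: entry[1])]


print(balance_batches({"a": 9, "b": 7, "c": 6, "d": 5, "e": 3}))
```

This is a heuristic, not an optimal partition. Without a usable cost estimate, a shared queue of many small batches would be another way to keep both workers busy.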

Forbidden & Limited Lists

Yugipedia categories for yaml-yugipedia, including subcategories for the OCG:

Official sources:

Card database (all regions except China): https://www.db.yugioh-card.com/yugiohdb/forbidden_limited.action
~~TCG (including historical, updates faster than US): https://www.yugioh-card.com/uk/limited/~~
TCG (including historical): https://www.yugioh-card.com/eu/play/forbidden-and-limited-list/
- US: https://www.yugioh-card.com/en/limited/
- LATAM (current only): https://www.yugioh-card.com/lat-am/limited/index.html
- TCG cards not legal in EU: https://www.yugioh-card.com/eu/play/card-legality/
OCG Japan (including historical): https://www.yugioh-card.com/japan/event/rankingduel/limitregulation/
- English version (can also be found at my, sg, tw, ph): https://www.yugioh-card.com/hk/event/rules_guides/forbidden_cardlist.php
OCG Korea (current only): https://yugioh.co.kr/site/limit_regulation.php
OCG China (including historical): https://www.yugioh-card-cn.com/simplifiedLimitRegulation

Onboard Ruff and GuardDog linters

https://github.com/DawnbrandBots/yaml-yugipedia/blob/master/.github/workflows/python.yml

Some insights/opinions

Koder asked me to check this repo out and provide a second opinion.

Disclaimer: I have no idea what the motivation, vision or long term plans for this project are so some of my points may be invalid.

General

Try to version your database schema so that you can make breaking changes without messing things up for consumers. Some random (probably flawed) idea:

Create a branch called "v1". Make it very clear that consumers are supposed to use that one and not main or whatever.
Keep the data of v1 updated.
When you want to make a breaking change to the schema: Branch off v1 to v2. Make your breaking changes.
Keep v2 updated and don't touch v1. v1 will become stale but at least it won't break anything.

Alternatively, add a note that there is no guarantee of stability and that automatic pulls are at the user's own risk.

You should also check out YGOPRODECK's API. Some aspects of it can be improved but I'm sure it can also be an inspiration.

Format

It's a bit odd that card data, ocg lists and tcg lists all have different structures. The lists should have the same format/structure at least so that the same code can be used to consume them.

For the lflists I would just provide a card's passcode with its legality. Additional information (like card type and name) can be fetched from the database.

I think JSON would be a better choice here than YAML. While YAML is easier for humans to write, the readability of both is pretty much the same imo. Plus, JSON parsers are more ubiquitous, are even part of some languages' standard libraries, and generally parse faster, which is relevant for such huge datasets. (Disclaimer: I have a strong dislike of YAML.)

Card data

monster_type_line is cool for when you want to output it without having to manually format it (which can be a pain with the correct ordering, I assume). But it would be cool for searching/filtering features if the type were an additional, separate field and the extra properties were an array of tags (like "Fusion", "Ritual", "Normal", "Effect", "Toon", "Flip", "Spirit" etc). If you decide to go the array-of-tags route, make the tags explicit: while on a real card a Spirit tag implies that it's an Effect monster, the database should still say that it is a Spirit and Effect monster. That makes searching easier, since if you search for Effect monsters you do not need to manually include Spirits, Toons, Flips etc.

In addition to the official password I would add an array of unofficial ids, sorted in ascending order. This array should not include the official password imo. That's YGOPRO specific but Bastion already lists those rn so I assume you are not against it.

Do not use Unicode symbols in the database for the link markers. Just write them out (like "Left", "Top", "Bottom-Right") and make it the responsibility of the consumer to map them to whichever representation they want.

Avoid mixing types; that will make life hard for devs using languages like Java or C#. I noticed that this happens with cards that have "?" stats.

For cards with "?" I would use 0 as the stat and add a boolean which indicates that they have that stat as "?". It's a good compromise between not breaking arithmetic operations and giving the consumer a way to display "?" stats properly.
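
A small sketch of that suggestion, assuming "?" stats are stored as the string "?" and all other stats as integers or numeric strings. normalize_stat is a hypothetical helper, not part of the existing pipeline.

```python
def normalize_stat(raw):
    """Return (value, is_unknown) for an ATK/DEF-style stat.

    Assumes the source stores "?" stats as the string "?" and everything
    else as an integer or a numeric string.
    """
    if raw == "?":
        return 0, True
    return int(raw), False
```

A consumer could then sort or sum on the integer and only consult the flag when rendering the card.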

Release dates would be a super useful feature. Adding them to "sets" would be nice and an extra field for the initial tcg release would be super useful for people playing alt and retro formats. There are a couple issues though:

They are not synchronized across regions. Just taking the earliest one found would in theory be an option, but the Portuguese or Brazilian releases made this impossible since some products came out weeks(!) in advance there, and that date would be annoying for the majority of users. Another option would be to make that information NA TCG centric. Retro and alt formats usually revolve around NA's TCG releases so that would not be an issue imo (I'm not an American btw)
Some products, like SJ promos, do not have a concrete release date. Yugipedia and Konami list no release dates for them. If you got them a couple of days before the start of the month in which the SJ magazine debuted, you were allowed to use them. For those cases it is alright to just use the first day of the month as the release date imo, and I'm sure most people would agree.

Older versions of names (like Kinetic Soldier) would be cool but that may be hard to get depending on the data sources.

Card (and maybe archetype) name aliases

Like the assignments system, there should be another override system for commonly used shorthands for cards and possibly archetypes, to aid in search resolution. Particularly well-known examples would be MST and VFD. A very outdated collection (some are not relevant) exists in old Bastion.
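
As a rough illustration, such an override could be a plain lookup applied before normal search resolution. The table and the helper resolve_alias are hypothetical; only the two shorthands named above are included.

```python
# Hypothetical alias table; a real override file would be maintained by hand.
ALIASES = {
    "mst": "Mystical Space Typhoon",
    "vfd": "True King of All Calamities",
}


def resolve_alias(query):
    """Map a known shorthand to its card name, else return the query as-is."""
    return ALIASES.get(query.strip().lower(), query)
```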

No Korean Pendulum Effect is inconsistent with other languages

Other languages: null
Korean: empty string
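
One way to make this consistent, sketched under the assumption that the Pendulum Effect is stored as a mapping from language code to text (normalize_pendulum_effect is a hypothetical helper):

```python
def normalize_pendulum_effect(pendulum_effect):
    """Replace empty or whitespace-only strings with None in each language."""
    return {
        language: (text if text and text.strip() else None)
        for language, text in pendulum_effect.items()
    }
```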

Add test for Rush Duel card appearing in OCG/TCG data

Simple heuristic: if jp_sets contains a set number starting with RD/. Skip and ignore in pipeline.
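
The heuristic might look like this in Python. It assumes jp_sets is the raw wikitext field, with one print per line and the set number as the first semicolon-separated value. looks_like_rush_duel is a hypothetical name.

```python
def looks_like_rush_duel(jp_sets):
    """Return True if any set number in `jp_sets` starts with "RD/".

    Assumes `jp_sets` is the raw wikitext field: one print per line, with
    the set number first and fields separated by semicolons.
    """
    for line in (jp_sets or "").splitlines():
        set_number = line.split(";", 1)[0].strip()
        if set_number.startswith("RD/"):
            return True
    return False
```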

Exclude "TBA" card text from pipeline

e.g. https://yugipedia.com/wiki/Spell%20Card%253A%20%22Soul%20Exchange%22 https://yugipedia.com/wiki/Spell_Card:_%22Monster_Reborn%22

Transform Rush Duel cards

Target: https://github.com/DawnbrandBots/yaml-yugipedia/tree/master/wikitext/rush, pulled from https://yugipedia.com/wiki/Category:Rush_Duel_cards

Rush Duel cards use Template:CardTable2, so some transformation code can be reused (database_id, name, lore, sets, monster fields).

New properties to parse and transform:

misc (if it contains "Legend Card")
image + current_image (if the actual URL is also pulled, note that some have VG arts now)
requirement (all languages, e.g. ja_requirement, may be "None")
maximum_atk
summoning_condition (all languages, e.g. ja_summoning_condition)
continuous_effect

Additional notes:

Multi-choice effect has an empty requirement and an attempted reproduction of the layout shoved into the lore field
Should represent the [Effect] label in a more structured way than exists
Future task could be to maintain the references to Maximum L and R monsters

Implement override system

Extend the pipeline to merge in any additional repository data sources for:

Fake passwords assigned to cards without one
Fake passwords assigned to alternative artworks
Decide whether to merge exceptional cases of alternate-password Dark Magician and Polymerization
Unofficial translations that may not make sense to be part of Yugipedia articles

Process Master Duel accessories

https://github.com/DawnbrandBots/yaml-yugipedia/tree/master/wikitext/Yu-Gi-Oh!_Master_Duel_accessories
https://yugipedia.com/wiki/List_of_Yu-Gi-Oh!_Master_Duel_accessories

Convert to JSON/YAML for future use, e.g. API to enumerate all icons, annotating card data with related icons and sleeves

Add a flag to generate JSON schema for output documents

Suggested library: GenSON

Parse archetypes and series in pipeline

Transform TCG Speed Duel Skill Cards

Target: https://github.com/DawnbrandBots/yaml-yugipedia/tree/master/wikitext/speed, pulled from https://yugipedia.com/wiki/Category:Skill_Cards

https://yugipedia.com/wiki/List_of_Skill_Cards ~~appears to include Duel Links skills, to bring up~~

Skill Cards use Template:CardTable2, so some transformation code can be reused, mainly for name, front text box, sets, and types.

New properties to parse and transform:

character (technically may be inferred from types)
skill_activation (all languages, e.g. fr_skill_activation)
image (front scan)
image2 (back scan)
Bonus field: source_card ???

Additional notes:

Skills that behave as a different kind of card (Continuous Trap, Continuous Spell, Field Spell) are still structured like skill pages

Add more fields for Master Duel card data

Currently, the JSON data returned from this link includes the following fields: ["rarity", "attribute", "types", "atk", "lore", ...]. I understand that this data is retrieved from the API, such as this.

However, the Yugipedia web page offers additional information that is crucial for a comprehensive Master Duel card database. These include fields like:

Craftable: Whether the card can be crafted in the game.
Release Date (specific to Master Duel): When the card was released in the game.
Included Set (specific to Master Duel): Which set the card is part of in the game.
These fields can be extracted from the HTML content returned by a different API endpoint, such as this one. Unfortunately, the API response here is in HTML format, which requires additional parsing to extract the necessary fields.

At this time, I haven't found a more straightforward way to retrieve these specific fields from the API. Including these additional fields would significantly enhance the utility of the Master Duel cards database.

Error notification

Need some kind of alert across YAML Yugi and YAML Yugipedia when pipelines fail to run for a while.

Just turning on Actions webhooks for YAML Yugipedia is too noisy because often there is no delta. Need to distinguish between the no-delta case and the total failure. YAML Yugi could have Actions webhooks on because there are no spurious failures here, but this will not capture the failure mode when the merge workflow is totally not being called.

Publishing the aggregate file

The aggregate branch may be untenable in the long run. git-sizer could be used to monitor repository performance.

Static site hosting alternatives for cards.json and cards.yaml:

GitHub Pages via actions/upload-pages-artifact and actions/deploy-pages
- determine whether this still uses the gh-pages branch, because if it does, it faces the same problem
Netlify (CDN!)
Cloudflare Pages (CDN!)
Vercel (CDN!)
Surge (never heard of it before)

Exclude Rush Duel match winners

https://yugipedia.com/wiki/?curid=1078921
https://yugipedia.com/wiki/Elttaes,_the_Master_of_Duels_(Rush_Duel)

Addition of non-playable cards

Hello,

I noticed that these 3 cards are missing from the aggregate JSON file

https://yugipedia.com/wiki/Obelisk_the_Tormentor_(original) : LC01-EN001
https://yugipedia.com/wiki/Slifer_the_Sky_Dragon_(original) : LC01-EN002
https://yugipedia.com/wiki/The_Winged_Dragon_of_Ra_(original) : LC01-EN003

Is it possible to add them as well as the other non-playable cards?

Regards,

Nadjim

Parse “Unofficial name” and “Unofficial lore” tags in pipeline

Move commit-push action to alternate repository

Possibly https://github.com/DawnbrandBots/.github

GitHub Actions has to download the entire tarball for this repository to access the action, which is inefficient as most of this repository is data. https://github.com/DawnbrandBots/yaml-yugipedia/actions/runs/6079438273/job/16491927404

Merge in Master Duel translations when other translations are not available

Some cards have never been printed in a locale but have translations available in Master Duel.

e.g. https://yugipedia.com/wiki/Mushroom_Man_(Master_Duel) in Korean

Handle "Fullwidth wordwrap" and potentially other templates in names

Because #4 has yet to be implemented, this slipped through.

Prime example: https://yugipedia.com/wiki/The_Tyrant_Neptune -> https://github.com/DawnbrandBots/yaml-yugi/blob/master/data/cards/88071625.yaml#L10

Split up and normalize monster_type_line

monster_type_line should be broken up into an array of tokens kept in the same order.

monster_type_line as it currently exists, obtained from Yugipedia, has the following limitations:

Old monsters with abilities (e.g. Spirit) are missing the modern "Effect" at the end;
Old Tuners lack Normal or Effect classification;
Old Normal monsters lack the modern "Normal" at the end;

This is because the string as it exists on Yugipedia reflects the latest available physical print of the card (confirmed with admin). Normalization involves obtaining data from the official database, which has the modern representation regardless of recent prints, or a heuristic (PowerShell snippet from @NeilBeforeMemes):

            # Split the type line into ordered tokens. The -split operator is used because
            # String.Split(" / ") splits on each character in Windows PowerShell 5.1.
            $types = $content.monster_type_line -split ' / '

            # A lone type (e.g. "Dragon") is an old Normal monster.
            if($types.length -eq 1) {
                $types += "Normal"
            }

            # Old monsters ending in an ability are missing the modern "Effect".
            if($types[-1] -in "Spirit", "Gemini", "Toon", "Union", "Flip") {
                $types += "Effect"
            }

            # Old Tuners lack a classification; only these four are Normal.
            if($types[-1] -eq "Tuner") {
                if($data.name -in "Water Spirit", "Dragon Core Hexer", "Angel Trumpeter", "Tune Warrior") {
                    $types += "Normal"
                }
                else {
                    $types += "Effect"
                }
            }

This would fix DawnbrandBots/bastion-bot#174. Inferring the OCG "Special Summon" type (DawnbrandBots/bastion-bot#134) could also be done as part of the data here. This would help native language support for advanced queries (DawnbrandBots/bastion-bot#343) as the translations of these card properties can be added in the load step.
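
Since most of this repository is Python, a rough port of the heuristic may be useful as a starting point. It is a sketch, not the pipeline's implementation: normalize_type_line is a hypothetical name, the ability list and the four Normal Tuners are copied from the snippet, and it returns the array of tokens in their original order.

```python
ABILITIES = {"Spirit", "Gemini", "Toon", "Union", "Flip"}
# Normal Tuners named in the PowerShell snippet; every other Tuner is treated as Effect.
NORMAL_TUNERS = {"Water Spirit", "Dragon Core Hexer", "Angel Trumpeter", "Tune Warrior"}


def normalize_type_line(monster_type_line, name):
    """Split the type line into ordered tokens and append the missing suffix."""
    types = monster_type_line.split(" / ")
    if len(types) == 1:
        # A lone type is an old Normal monster.
        types.append("Normal")
    if types[-1] in ABILITIES:
        # Old ability monsters are missing the modern "Effect".
        types.append("Effect")
    if types[-1] == "Tuner":
        types.append("Normal" if name in NORMAL_TUNERS else "Effect")
    return types
```

Like the original snippet, this only patches the cases listed above. Data from the official database would still be the more reliable source.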