dawnbrandbots / yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card Game, Official Card Game, Master Duel, Rush Duel, Speed Duel.

Home Page: https://dawnbrandbots.github.io/yaml-yugi/cards.json

License: GNU Affero General Public License v3.0

Languages: Python 75.93%, TypeScript 16.01%, HTML 7.56%, JavaScript 0.51%
Topics: etl, yaml, yaml-yugi, yu-gi-oh, yugioh, master-duel, masterduel, rush-duel, json, ocg

yaml-yugi's Introduction

YAML Yugi

This project aims to create a comprehensive, machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card Game (TCG), Official Card Game (OCG), Master Duel video game, Rush Duel, and Speed Duel.

YAML Yugi is the primary data source for the new version of Discord bot Bastion.

Most card text is © Studio Dice/SHUEISHA, TV TOKYO, KONAMI. The card data can be found under /data, and aggregations are published to GitHub Pages.

The remaining files — the actual source code of this stage of the pipeline — are available under the GNU Affero General Public License 3.0 or later. See COPYING for more details.

The aggregate branch is very large and the history is not very relevant. It could be moved elsewhere in the future if this is a problem. For now, it is recommended to clone this repository with the --single-branch flag to work on it. If you need to fetch other branches from the remote automatically afterward, you can edit the corresponding section of your .git/config file to look like this:

[remote "origin"]
        url = git@github.com:DawnbrandBots/yaml-yugi.git
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = ^refs/heads/aggregate

Sample links

Forbidden & Limited Lists (Limit Regulations)

Individual cards, JSON and YAML variants both available

OCG/TCG card by password

OCG/TCG card without password by Konami ID

Prerelease OCG/TCG card

Same as above, but with yugipedia<PAGE_ID> file names.

Rush Duel card by Konami ID

TCG Speed Duel Skill Card

Aggregations

Series and archetypes, JSON and YAML both available

All OCG/TCG cards, including prereleases

All Rush Duel cards

All Master Duel cards

All TCG Speed Duel Skill Cards

yaml-yugi's People

Contributors

dependabot[bot], kevinlul, larry126, web-flow

yaml-yugi's Issues

Add data for card set information

I quite love that this data set has information about which sets each card appeared in, regardless of locale. A lot of Yu-Gi-Oh! tools I've found care only about the TCG, and usually only the English TCG at that. However, I've run into a big issue when trying to pin cards to certain sets across all possible locales:

  1. Sometimes, two sets from different locales that have the same name refer to the same set (e.g. Legacy of Darkness)
  2. Sometimes, two sets from different locales that have the same name refer to different sets (e.g. Pharaoh's Servant)
  3. Sometimes, two sets from different locales that have the same name refer to the same set, BUT the sets have different contents across locales (e.g. Legend of Blue-Eyes White Dragon)

Furthermore, there is some data I'd like regarding sets, such as when a set first came out in each locale. Have you considered making an additional dataset for card sets?
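One way to picture such a dataset is sketched below: each set gets its own identifier, and per-locale releases hang off it, so that a shared name alone never decides whether two releases are the same set. This is only an illustration under assumptions; the class names, the `set_id` field, the locale codes, and the sample values are all hypothetical and are not part of the existing schema.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class SetRelease:
    """One locale's release of a set (hypothetical structure)."""
    name: str                    # name as printed in that locale
    release_date: Optional[str]  # ISO date string, or None if unknown
    cards: FrozenSet[str]        # card identifiers contained in this release


@dataclass
class CardSet:
    """A set identified by its own ID instead of by its name."""
    set_id: str
    releases: Dict[str, SetRelease] = field(default_factory=dict)  # locale -> release

    def contents_differ(self) -> bool:
        """True when at least two locales list different card contents."""
        return len({release.cards for release in self.releases.values()}) > 1


# Made-up sample: same set, same name, different contents per locale (case 3).
example = CardSet("example-set-1")
example.releases["en"] = SetRelease("Example Set", None, frozenset({"card-a", "card-b"}))
example.releases["ja"] = SetRelease("Example Set", None, frozenset({"card-a"}))
print(example.contents_differ())
```

Cases 1 and 2 then reduce to whether two same-named releases share a `set_id` or not.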

Forbidden & Limited Lists

Some insights/opinions

Koder asked me to check this repo out and provide a second opinion.

Disclaimer: I have no idea what the motivation, vision or long term plans for this project are so some of my points may be invalid.

General

Try to version your database schema so that you can make breaking changes without messing things up for consumers. Some random (probably flawed) idea:

  • Create a branch called "v1". Make it very clear that consumers are supposed to use that one and not main or whatever.
  • Keep the data of v1 updated.
  • When you want to make a breaking change to the schema: branch off v1 to v2. Make your breaking changes.
  • Keep v2 updated and don't touch v1. v1 will become stale but at least it won't break anything.

Alternatively, add a note that there is no guarantee of stability and that automatic pulls are at the user's own risk.

You should also check out YGOPRODECK's API. Some aspects of it can be improved but I'm sure it can also be an inspiration.

Format

It's a bit odd that card data, OCG lists and TCG lists all have different structures. The lists at least should share the same format/structure so that the same code can be used to consume them.

For the lflists I would just provide a card's passcode with its legality. Additional information (like card type and name) can be fetched from the database.

I think JSON would be a better choice here than YAML. While YAML is easier for humans to write, the readability of both is pretty much the same imo. Plus, JSON parsers are more ubiquitous and are even part of some languages' standard libraries, and they generally parse faster, which is relevant for such huge datasets. (Disclaimer: I have a huge dislike of YAML.)

Card data

monster_type_line is cool for when you want to output it without having to manually format it (which can be a pain with the correct ordering, I assume). But it would be cool for searching/filtering features if the type were a separate field and the extra properties an array of tags (like "Fusion", "Ritual", "Normal", "Effect", "Toon", "Flip", "Spirit", etc.). If you decide to go the array-of-tags route, make the tags explicit: while on a real card a Spirit tag implies that it's an Effect monster, the database should still say that it is both a Spirit and an Effect monster. That makes searching easier, since a search for Effect monsters does not need to manually include Spirits, Toons, Flips, etc.
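A minimal sketch of the explicit-tags idea, in Python since that is the repository's main language. The `Monster` class, its field names, and the sample cards are made up for illustration and do not reflect the current schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Monster:
    """Hypothetical record: one type field plus an explicit list of tags."""
    name: str
    type: str
    tags: List[str] = field(default_factory=list)


# Made-up sample cards. Note that the Spirit and Toon entries also carry "Effect".
cards = [
    Monster("Example Spirit", "Fairy", ["Spirit", "Effect"]),
    Monster("Example Vanilla", "Dragon", ["Normal"]),
    Monster("Example Toon", "Spellcaster", ["Toon", "Effect"]),
]


def with_tag(records: List[Monster], tag: str) -> List[str]:
    """Return the names of all records carrying the given tag."""
    return [record.name for record in records if tag in record.tags]


effect_monsters = with_tag(cards, "Effect")
print(effect_monsters)
```

Because every tag is stored explicitly, the filter is a single membership test and needs no knowledge of which abilities imply "Effect".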

In addition to the official password, I would add an array of unofficial IDs sorted in ascending order. This array should not include the official password imo. That's YGOPRO-specific, but Bastion already lists those right now, so I assume you are not against it.

Do not use Unicode symbols in the database for the link markers. Just write them out (like "Left", "Top", "Bottom-Right") and make it the responsibility of the consumer to map them to whichever representation they want.
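On the consumer side, that mapping could look roughly like the sketch below. The marker names follow the examples in the paragraph above; the remaining five names and the arrow characters are assumptions, picked only to show the shape of the lookup.

```python
from typing import Iterable

# Consumer-chosen presentation for written-out marker names (assumed spellings).
LINK_MARKER_SYMBOLS = {
    "Top-Left": "↖",
    "Top": "↑",
    "Top-Right": "↗",
    "Left": "←",
    "Right": "→",
    "Bottom-Left": "↙",
    "Bottom": "↓",
    "Bottom-Right": "↘",
}


def render_link_markers(markers: Iterable[str]) -> str:
    """Turn written-out marker names into this consumer's preferred symbols."""
    return "".join(LINK_MARKER_SYMBOLS[marker] for marker in markers)


print(render_link_markers(["Left", "Top", "Bottom-Right"]))
```

A different consumer could swap the dictionary for emoji, image paths, or localized text without any change to the data.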

Avoid mixing types; that will make life hard for devs using languages like Java or C#. I noticed that this happens with cards that have "?" stats.

For cards with "?" I would use 0 as the stat and add a boolean which indicates that they have that stat as "?". It's a good compromise between not breaking arithmetic operations and giving the consumer a way to display "?" stats properly.
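As a rough sketch of that compromise, the stat can be stored as an integer plus a flag. The `Stat` class and `parse_stat` helper are hypothetical names, and the assumption that the raw value arrives either as an integer or as the string "?" comes from the description above, not from an inspection of the data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Stat:
    """An ATK/DEF value that stays an int even when the card prints '?'."""
    value: int = 0
    unknown: bool = False

    def __str__(self) -> str:
        # Display "?" when flagged, otherwise the plain number.
        return "?" if self.unknown else str(self.value)


def parse_stat(raw: Union[int, str]) -> Stat:
    """Convert a raw mixed-type stat into the uniform representation."""
    if raw == "?":
        return Stat(0, True)
    return Stat(int(raw))


atk = parse_stat("?")
other = parse_stat(2500)
print(str(atk), atk.value + other.value)
```

Arithmetic keeps working on `value`, while display code checks `unknown` (or just calls `str`) to show "?" properly.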

Release dates would be a super useful feature. Adding them to "sets" would be nice and an extra field for the initial tcg release would be super useful for people playing alt and retro formats. There are a couple issues though:

  • They are not synchronized across regions. Just taking the earliest one found would in theory be an option, but the Portuguese or Brazilian releases made this impractical, since some products came out there weeks(!) in advance and that date would be annoying for the majority of users. Another option would be to make that information NA TCG-centric. Retro and alt formats usually revolve around NA TCG releases, so that would not be an issue imo (I'm not an American btw)
  • Some products do not have a concrete release date. Like SJ promos. Yugipedia and Konami list no release dates for them. If you got them a couple days before the start of the month in which the SJ magazine debuted you were allowed to use them. For those cases it is alright to just use the first day of a month as the release date imo and I'm sure most people would agree.
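The two rules suggested above (prefer the NA date, and fall back to the first day of the month when only a month is known) could be sketched as follows. The region keys, the "YYYY-MM" and "YYYY-MM-DD" input formats, and the sample dates are all assumptions made for the example.

```python
from datetime import date
from typing import Dict, Optional


def resolve_release_date(raw: str) -> date:
    """Parse 'YYYY-MM-DD', or 'YYYY-MM' mapped to the first day of that month."""
    parts = [int(part) for part in raw.split("-")]
    if len(parts) == 2:
        year, month = parts
        return date(year, month, 1)
    year, month, day = parts
    return date(year, month, day)


def initial_tcg_release(dates_by_region: Dict[str, str], preferred: str = "NA") -> Optional[date]:
    """Use the preferred region's date instead of the earliest date found anywhere."""
    raw = dates_by_region.get(preferred)
    return resolve_release_date(raw) if raw else None


# Made-up dates: the "PT" entry is earlier but is deliberately ignored.
sample = {"NA": "2010-05", "PT": "2010-04-12"}
print(initial_tcg_release(sample))
```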

Older versions of names (like Kinetic Soldier) would be cool but that may be hard to get depending on the data sources.

Transform Rush Duel cards

Target: https://github.com/DawnbrandBots/yaml-yugipedia/tree/master/wikitext/rush, pulled from https://yugipedia.com/wiki/Category:Rush_Duel_cards

Rush Duel cards use Template:CardTable2, so some transformation code can be reused (database_id, name, lore, sets, monster fields).

New properties to parse and transform:

  • misc (if it contains "Legend Card")
  • image + current_image (if the actual URL is also pulled, note that some have VG arts now)
  • requirement (all languages, e.g. ja_requirement, may be "None")
  • maximum_atk
  • summoning_condition (all languages, e.g. ja_summoning_condition)
  • continuous_effect

Additional notes:

  • Multi-choice effect has an empty requirement and an attempted reproduction of the layout shoved into the lore field
  • Should represent the [Effect] label in a more structured way than exists
  • Future task could be to maintain the references to Maximum L and R monsters

Implement override system

Extend the pipeline to merge in any additional repository data sources for:

  • Fake passwords assigned to cards without one
  • Fake passwords assigned to alternative artworks
  • Decide whether to merge exceptional cases of alternate-password Dark Magician and Polymerization
  • Unofficial translations that may not make sense to be part of Yugipedia articles

Transform TCG Speed Duel Skill Cards

Target: https://github.com/DawnbrandBots/yaml-yugipedia/tree/master/wikitext/speed, pulled from https://yugipedia.com/wiki/Category:Skill_Cards

Skill Cards use Template:CardTable2, so some transformation code can be reused, mainly for name, front text box, sets, and types.

New properties to parse and transform:

  • character (technically may be inferred from types)
  • skill_activation (all languages, e.g. fr_skill_activation)
  • image (front scan)
  • image2 (back scan)
  • Bonus field: source_card ???

Additional notes:

  • Skills that behave as a different kind of card (Continuous Trap, Continuous Spell, Field Spell) are still structured like skill pages

Add more fields for Master Duel card data

Currently, the JSON data returned from this link includes the following fields: ["rarity", "attribute", "types", "atk", "lore", ...]. I understand that this data is retrieved from the API, such as this.

However, the Yugipedia web page offers additional information that is crucial for a comprehensive Master Duel card database. These include fields like:

  • Craftable: Whether the card can be crafted in the game.
  • Release Date (specific to Master Duel): When the card was released in the game.
  • Included Set (specific to Master Duel): Which set the card is part of in the game.

These fields can be extracted from the HTML content returned by a different API endpoint, such as this one. Unfortunately, the API response here is in HTML format, which requires additional parsing to extract the necessary fields.

At this time, I haven't found a more straightforward way to retrieve these specific fields from the API. Including these additional fields would significantly enhance the utility of the Master Duel cards database.

Error notification

Need some kind of alert across YAML Yugi and YAML Yugipedia when pipelines fail to run for a while.

Just turning on Actions webhooks for YAML Yugipedia is too noisy because often there is no delta. Need to distinguish between the no-delta case and the total failure. YAML Yugi could have Actions webhooks on because there are no spurious failures here, but this will not capture the failure mode when the merge workflow is totally not being called.
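One possible shape for such an alert is a staleness check on the time of the last successful run, sketched below. A run that succeeds with no delta still refreshes the timestamp, so only real failures or a workflow that never gets called let it age past the threshold. The two-day threshold and the function name are assumptions, and how the timestamp gets recorded is left open.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long the pipeline may go without a success.
MAX_SILENCE = timedelta(days=2)


def pipeline_is_stale(last_success: datetime, now: datetime,
                      max_silence: timedelta = MAX_SILENCE) -> bool:
    """True when no run has succeeded within the allowed window.

    No-delta runs count as successes, so they never trigger an alert.
    """
    return now - last_success > max_silence


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
recent_no_delta_run = now - timedelta(hours=6)
last_run_before_outage = now - timedelta(days=5)

print(pipeline_is_stale(recent_no_delta_run, now))
print(pipeline_is_stale(last_run_before_outage, now))
```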

Split up and normalize monster_type_line

monster_type_line should be broken up into an array of tokens kept in the same order.

monster_type_line as it currently exists, obtained from Yugipedia, has the following limitations:

  • Old monsters with abilities (e.g. Spirit) are missing the modern "Effect" at the end;
  • Old Tuners lack Normal or Effect classification;
  • Old Normal monsters lack the modern "Normal" at the end;

This is because the string as it exists on Yugipedia reflects the latest available physical print of the card (confirmed with admin). Normalization involves obtaining data from the official database, which has the modern representation regardless of recent prints, or a heuristic (PowerShell snippet from @NeilBeforeMemes):

            # Split the type line into tokens. The -split operator treats " / "
            # as a single delimiter on every PowerShell version.
            $types = $content.monster_type_line -split " / "

            # A lone type means an old Normal monster printed without "Normal".
            if ($types.Length -eq 1) {
                $types += "Normal"
            }

            # Old ability monsters were printed without the trailing "Effect".
            if ($types[-1] -in "Spirit", "Gemini", "Toon", "Union", "Flip") {
                $types += "Effect"
            }

            # Old Tuners lack Normal or Effect; the names below get "Normal".
            if ($types[-1] -eq "Tuner") {
                if ($data.name -in "Water Spirit", "Dragon Core Hexer", "Angel Trumpeter", "Tune Warrior") {
                    $types += "Normal"
                }
                else {
                    $types += "Effect"
                }
            }

This would fix DawnbrandBots/bastion-bot#174. Inferring the OCG "Special Summon" type (DawnbrandBots/bastion-bot#134) could also be done as part of the data here. This would help native language support for advanced queries (DawnbrandBots/bastion-bot#343) as the translations of these card properties can be added in the load step.
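Since most of this repository is Python, the same heuristic might be ported along these lines. This is a direct translation of the PowerShell snippet above, not existing pipeline code; the function name is hypothetical, and unlike PowerShell's `-eq` the comparisons here are case-sensitive.

```python
from typing import List

# Abilities that imply "Effect" on modern prints, per the heuristic above.
ABILITIES = {"Spirit", "Gemini", "Toon", "Union", "Flip"}

# Names the heuristic treats as Normal Tuners.
NORMAL_TUNERS = {"Water Spirit", "Dragon Core Hexer", "Angel Trumpeter", "Tune Warrior"}


def normalize_type_line(name: str, monster_type_line: str) -> List[str]:
    """Split the type line into ordered tokens and append missing modern labels."""
    types = monster_type_line.split(" / ")
    if len(types) == 1:
        # A lone type is an old Normal monster printed without "Normal".
        types.append("Normal")
    if types[-1] in ABILITIES:
        # Old ability monsters are missing the trailing "Effect".
        types.append("Effect")
    if types[-1] == "Tuner":
        # Old Tuners lack a Normal or Effect classification.
        types.append("Normal" if name in NORMAL_TUNERS else "Effect")
    return types


# Illustrative inputs; the card name and type line are placeholders.
print(normalize_type_line("Some Old Spirit", "Fairy / Spirit"))
```

Type lines that already end in "Normal" or "Effect" pass through unchanged, so the function is safe to run over modern prints as well.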
