xivstats-gatherer-java's Introduction

XIV Stats Gatherer (Java)

XIVStats Gatherer Java is a multi-threaded Java program that fetches a set of character profiles from the Final Fantasy XIV Lodestone, parses the content of each character profile page, and stores the result in a database (MySQL or PostgreSQL).

To see an example of data gathered using this program, see FFXIVCensus.com. The database generated by this program can be used in conjunction with the XIVStats PHP Script(s) to produce a web page displaying statistics for the data gathered.

Quick Start

Follow these steps to set up XIVStats-Gatherer-Java:

1. Database setup

  1. Set up your own MySQL or PostgreSQL server instance (if you have not already done so).
  2. Create a database to store the program data in:
  CREATE DATABASE dbplayers;
  3. Create a user for the program to use to connect to the database.

Replace {password} with a password of your choice, and make a note of it for later.

GRANT ALL PRIVILEGES ON dbplayers.* TO `xivstats`@`localhost` IDENTIFIED BY '{password}';

2. Program Setup

  1. Install the latest version of either OpenJDK 8 JRE or Oracle JRE (if you have not already done so).
  2. Download the latest release from the releases page.
  3. Extract the zip to the directory you wish to install to.
  4. If this is your first time setting up the program - rename the file config_example.xml to config.xml.
  5. Open config.xml in an editor of your choice.
  6. Configure the options as follows:
    • Set the url parameter to the URL of your SQL server instance. MySQL default: mysql://localhost:3306. PostgreSQL default: postgresql://localhost:5432.
    • Set the database parameter to the database you configured earlier (dbplayers).
    • Set the username parameter to the username you configured earlier (xivstats).
    • Set the password parameter to the password you configured earlier.
    • Set the threads parameter to the number of threads you want the program to utilize (more threads = faster gatherer crawls). At present there is a safety limit of 64.
  7. Save and close config.xml.
  8. Using a shell (or CMD on Windows), run the following command (replace {words in brackets} with integer parameters):
java -jar XIVStats-Gatherer-Java.jar -s {lowest character id to fetch}
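The url and database parameters together identify the database the gatherer connects to. The following is a minimal sketch of how the two values could be joined into a JDBC connection string, assuming the program simply prefixes jdbc: and appends the database name. The class and method names (JdbcUrlSketch, buildJdbcUrl) are hypothetical and are not taken from the gatherer's source.

```java
final class JdbcUrlSketch {
    /** Joins the configured server URL and database name into a JDBC URL (assumed scheme). */
    static String buildJdbcUrl(String url, String database) {
        // Drop a trailing slash so the result never contains "//" before the database name.
        String base = url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
        return "jdbc:" + base + "/" + database;
    }

    public static void main(String[] args) {
        // Prints: jdbc:mysql://localhost:3306/dbplayers
        System.out.println(buildJdbcUrl("mysql://localhost:3306", "dbplayers"));
    }
}
```

If a connection fails, comparing the string produced this way with the one shown in the error output is a quick way to spot a typo in config.xml.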

The application can be run with the following command line options/args:

Short option  Long option     Argument type  Description
-d            --database      String         database name
-f            --finish        integer        the character id to conclude character run at (inclusive)
-a            --autostopfrom  integer        the lowest character id to allow auto-stop to happen
-g            --autostopgap   integer        the number of continuous invalid characters to trigger auto-stopping
-h            --help          none           display help message
-p            --password      String         database user password
-s            --start         integer        the character id to start from (inclusive)
-t            --threads       integer        number of gatherer threads to run
-u            --user          String         database user
-U            --url           String         the database URL of the database server to connect to

Note: On Linux/Unix it is advised to run the program in Tmux/Screen or similar.
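To illustrate how the short and long forms in the table above relate, here is a small sketch that normalises an argument list into a map keyed by long option name. It is not the gatherer's actual parser. OptionSketch and its parse method are hypothetical, only a subset of the options is covered, and every option handled here is assumed to take a value (so -h is left out).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class OptionSketch {
    // Short-to-long aliases for a subset of the options listed in the table.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("-s", "--start");
        ALIASES.put("-f", "--finish");
        ALIASES.put("-t", "--threads");
        ALIASES.put("-d", "--database");
        ALIASES.put("-u", "--user");
        ALIASES.put("-U", "--url");
    }

    /** Returns a map from long option name (without dashes) to its raw string value. */
    static Map<String, String> parse(String... args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            // Translate a known short option; anything else is kept as typed.
            String name = ALIASES.getOrDefault(args[i], args[i]);
            if (!name.startsWith("--")) {
                throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            parsed.put(name.substring(2), args[++i]);
        }
        return parsed;
    }
}
```

With this sketch, parse("-s", "100", "--finish", "200") yields a map with start=100 and finish=200, showing that either form of an option ends up under the same key.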

Logging

Running the JAR generates two log files in a .ffxivcensus/ folder in the user's home directory:

  • gatherer.log
    • A record of all debug logging generated during the gathering run
  • progress.log
    • A simple per-character result log to enable tracking of progress
    • Note: As characters are threaded, there is no guarantee the characters will be presented in this log in sequential order

Logs are currently overwritten with each run, so if you wish to keep a log file, rename or copy it before starting the next run.
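Because the logs are overwritten, it can help to copy them aside before a new run. The sketch below shows one way to do that with the standard java.nio.file API. It assumes the logs live in a .ffxivcensus folder under the user's home directory, as described above. LogBackupSketch and the timestamp suffix format are illustrative choices, not part of the gatherer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class LogBackupSketch {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Copies logFile to a sibling file whose name has a timestamp suffix appended. */
    static Path backup(Path logFile) throws IOException {
        String name = logFile.getFileName() + "." + LocalDateTime.now().format(STAMP);
        Path target = logFile.resolveSibling(name);
        return Files.copy(logFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get(System.getProperty("user.home"), ".ffxivcensus");
        for (String name : new String[] {"gatherer.log", "progress.log"}) {
            Path log = logDir.resolve(name);
            // Skip logs that do not exist yet, e.g. before the first run.
            if (Files.exists(log)) {
                System.out.println("Saved " + backup(log));
            }
        }
    }
}
```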

Bugs and feature requests

If you have discovered a bug, are having issues running the program (and Google hasn't been any immediate help), or would like to request a feature, open an issue over here.

Documentation

Javadoc documentation for the program can be found here.

Contributing

If you want to contribute to the XIVStats-Gatherer-Java project, please fork this repository, make and commit your changes, then open a pull request that clearly describes what you are adding. All feature additions, bug fixes and other positive contributions are welcome.

All pull requests are subject to contributor review before passing. All build and test CI stages must also pass before a contribution can be merged.

Database

The database table tblplayers has the following structure:

Column Name Datatype Checks for Mount/Minion
id int N/A
name text N/A
realm text N/A
gender text N/A
grand_company text N/A
level_gladiator int N/A
level_pugilist int N/A
level_marauder int N/A
level_lancer int N/A
level_archer int N/A
level_rogue int N/A
level_conjurer int N/A
level_thaumaturge int N/A
level_arcanist int N/A
level_darkknight int N/A
level_machinist int N/A
level_astrologian int N/A
level_scholar int N/A
level_redmage int N/A
level_samurai int N/A
level_bluemage int N/A
level_gunbreaker int N/A
level_dancer int N/A
level_reaper int N/A
level_sage int N/A
level_carpenter int N/A
level_blacksmith int N/A
level_armorer int N/A
level_goldsmith int N/A
level_leatherworker int N/A
level_weaver int N/A
level_alchemist int N/A
level_culinarian int N/A
level_miner int N/A
level_botanist int N/A
level_fisher int N/A
level_bozja int N/A
level_eureka int N/A
p30days bit Minion - Wind-up Cursor
p60days bit Minion - Black Chocobo Chick
p90days bit Minion - Beady Eye
p180days bit Minion - Minion Of Light
p270days bit Minion - Wind-up Leader
p360days bit Minion - Wind-up Odin
p450days bit Minion - Wind-up Goblin
p630days bit Minion - Wind-up Nanamo
p960days bit Minion - Wind-up Firion
prearr bit Minion - Cait Sith Doll
prehw bit Minion - Chocobo Chick Courier
presb bit Minion - Wind-up Red Mage
preshb bit Minion - Baby Gremlin
arrartbook bit Minion - Model Enterprise
sbartbook bit Minion - Wind-up Yotsuyu
sbartbooktwo bit Minion - Dress-up Tataru
hwartbookone bit Minion - Wind-Up Relm
hwartbooktwo bit Minion - Wind-Up Hraesvelgr
hasencyclopedia bit Minion - Namingway
beforemeteor bit Minion - Wind-up Dalamud
beforethefall bit Minion - Set Of Primogs
soundtrack bit Minion - Wind-up Bahamut
saweternalbond bit Minion - Demon Box
sightseeing bit Minion - Fledgling Apkallu
arr_25_complete bit Minion - Midgardsormr
comm50 bit Minion - Princely Hatchling
moogleplush bit Minion - Wind-up Delivery Moogle
topazcarbuncleplush bit Minion - Heliodor Carbuncle
emeraldcarbuncleplush bit Minion - Peridot Carbuncle
hildibrand bit Minion - Wind-up Gentleman
ps4collectors bit Minion - Wind-up Moogle
dideternalbond bit Mount - Ceremony Chocobo
arrcollector bit Mount - Coeurl
kobold bit Mount - Bomb Palanquin
sahagin bit Mount - Cavalry Elbst
amaljaa bit Mount - Cavalry Drake
sylph bit Mount - Laurel Goobbue
moogle bit Mount - Cloud Mallow
vanuvanu bit Mount - Sanuwa
vath bit Mount - Kongamato
hw_complete bit Mount - Midgardsormr
hw_31_complete bit Minion - Wind-up Haurchefant
hw_33_complete bit Minion - Wind-up Aymeric
legacy_player bit Mount - Legacy Chocobo
mounts text N/A
minions text N/A
date_active date N/A
is_active bit N/A
character_status varchar N/A

Performance Statistics

This section provides some insight into the performance of the application when performing censuses for ffxivcensus.com.

How many IDs are there?

  • As of February 2019, the highest ID is between 24 and 25 million. However, almost 10 million character IDs have been deleted.

How long does a complete census take?

  • A complete census currently takes around 14 days

What hardware is the live census using?

  • The live census is running on a CentOS 7 VM with 2 CPU cores from an i5-750, 2 GB of RAM and a 100 Mbit network connection.

What is the CPU, memory and network usage like?

  • The census uses 30-60% of the two CPU cores
  • The census will happily run with a 1GB heap size allocated
  • Network usage is fairly minimal, with about 4 Mbit download and 300 kbit upload with 64 threads configured
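The figures above allow a rough estimate of throughput. The sketch below is a back-of-the-envelope calculation, assuming a run covers about 25 million IDs in 14 days with 64 threads. Those three inputs come from separate answers in this section and may not describe the same run, so treat the result as an order-of-magnitude guide only. ThroughputSketch is a hypothetical helper.

```java
final class ThroughputSketch {
    /** Average IDs processed per second for a run covering totalIds in the given number of days. */
    static double idsPerSecond(long totalIds, double days) {
        return totalIds / (days * 24 * 60 * 60);
    }

    public static void main(String[] args) {
        double overall = idsPerSecond(25_000_000L, 14); // roughly 20.7 IDs per second
        double perThread = overall / 64;                // roughly 0.32 IDs per second per thread
        System.out.printf("overall=%.1f IDs/s, per thread=%.2f IDs/s%n", overall, perThread);
    }
}
```

At roughly 0.32 IDs per second per thread, each thread spends on the order of three seconds per character on average.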

XIVStats-Gatherer-Ruby

XIVStats-Gatherer-Java began by providing the same functionality as the original Ruby-based XIVStats-Gatherer-Ruby, but is written in Java to make use of multi-threading capabilities that could not be harnessed in Ruby. This allows XIVStats-Gatherer-Java to perform large crawl operations in a much shorter period of time, using only one application instance. XIVStats-Gatherer-Java also has the benefit of using a full SQL server setup rather than a SQLite file, which makes it possible to perform asynchronous database transactions.

The Ruby implementation is now out of date and no longer maintained.

Creators

Peter Reid (Project Maintainer)

Jonathan Price (XIVStats and XIVStats-Gatherer-Ruby)

Copyright and license

Code and documentation copyright 2015-2018 Jonathan Price & Peter Reid. Code and documentation released under the BSD 2-Clause "Simplified" License.

xivstats-gatherer-java's People

Contributors

crakila, jonathan-martin-ujf, matthewhillier, matthewhillier-cp, pricetx, reidweb

xivstats-gatherer-java's Issues

Verbosity switch

A way to make it only notify if a character cannot be added to the database would be useful for me (but, it's just a request, no worries :D)

Statistics about the software

Hey guys,

Do you have more recent statistics on how long a run over the entire ID range takes?
How many characters are you gathering per second or minute?
What hardware are you using to run it?
Bandwidth consumption?
CPU, MEM usage?

I'm curious on the performance of the current implementation, also, would be very good to provide this information in the README, so other people can have a notion of what they can expect for the current version, especially a more recent "high id", so it won't run forever, and also give some information so others can help you improve it :)

Anyway, good work doing it, it's very interesting.

Did SE really not block you with 64 threads?

I decided to run this collector myself because it looks like the ffxivcensus.com data is pretty old. I have it running on an EC2 instance in Japan and it's chugging along fine, but I'm really just curious: did SE really not block your IP??

This doesn't seem to care about any sort of rate limiting

JDBC error?

java -jar XIVStats-Gatherer-Java.jar 1 13500000

Connection failed! Please see output console
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/dbplayers
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at com.ffxivcensus.gatherer.GathererController.openConnection(GathererController.java:234)
at com.ffxivcensus.gatherer.GathererController.main(GathererController.java:81)
java.lang.NullPointerException
at com.ffxivcensus.gatherer.GathererController.main(GathererController.java:83)

I have JDBC connector installed. Running Ubuntu on a VPS.
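For context, "No suitable driver found" is the message java.sql.DriverManager produces when none of the drivers registered in the running JVM accepts the URL, which usually means the driver JAR is not on the classpath of the process even if it is installed on the machine. A minimal sketch for checking this from Java follows. DriverCheckSketch is a hypothetical helper, not part of the gatherer.

```java
import java.sql.DriverManager;
import java.sql.SQLException;

final class DriverCheckSketch {
    /** Returns true if some JDBC driver registered in this JVM accepts the given URL. */
    static boolean hasDriverFor(String jdbcUrl) {
        try {
            DriverManager.getDriver(jdbcUrl);
            return true;
        } catch (SQLException e) {
            // DriverManager throws SQLException when no registered driver accepts the URL.
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "jdbc:mysql://localhost:3306/dbplayers";
        System.out.println(url + " -> driver available: " + hasDriverFor(url));
    }
}
```

Running this with the same classpath as the gatherer shows whether the MySQL driver is actually visible to the JVM.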

Lodestone layout has changed

The layout of the lodestone has changed, unit tests are now failing indicating gatherer can no longer correctly parse page!

Resolution of this issue is a high priority.

Thanks go to @fahy for notifying me of the possible impact of this!

Feature: Oceania DC

Out of curiosity, when the Oceania servers and DC get released, would this automatically be accounted for, or would it be a feature item?

Some fields not populating

I was beginning to debug the active player count on ffxivcensus.com by running the latest version of the java gatherer in a test environment.

When running the gatherer on my character (8308898) there appears to be a number of empty fields. I have attached a CSV showing an excerpt (please ignore the formatting with the mounts, limitation of csv).

The following fields all appear to be blank:

  • p30days
  • p60days
  • p90days
  • p180days
  • p270days
  • p360days
  • p450days
  • p630days
  • p960days
  • prearr
  • prehw
  • presb
  • arrartbook
  • hwartbookone
  • hwartbooktwo
  • hasencyclopedia
  • beforemeteor
  • beforethefall
  • ps4collectors
  • soundtrack
  • saweternalbond
  • sightseeing
  • comm50
  • moogleplush
  • topazcarbuncleplush
  • emeraldcarbuncleplush
  • hildibrand
  • dideternalbond
  • arrcollector
  • kobold
  • sahagin
  • amaljaa
  • sylph
  • moogle
  • vanuvanu
  • vath
  • arr_25_complete
  • hw_complete
  • hw_31_complete
  • hw_33_complete
  • sb_complete
  • legacy_player

Some of these fields may intentionally be blank as my character doesn't meet all of the requirements, however, a number of them should be populated.

The arguments used when calling the gatherer:
/usr/bin/java -Xmx1024M -jar XIVStats-Gatherer-Java-v2.0.0.jar -s 8308898 -f 8308898

The version of the gatherer is the most recent build on ci.reidweb.com. This is build 78 and is built from commit 4f9f2c8, which at the time of writing is the latest commit to master.

I have tried re-creating the MySQL database in case that was somehow an issue, but this did not affect the result.

It does not look like this is the issue currently facing the live environment as it has data for the soundtracks, collectors editions etc, so I will continue investigating that.

8308898.xlsx

Improve Mount & Minion Handling for New Characters

When parsing a new character, the Mounts & Minions pages have no content, so there's a silent error caught in the logs. It would be better to detect and mitigate this rather than throw an exception.

[2020-02-26 11:24:59,503] [pool-1-thread-29] DEBUG com.ffxivcensus.gatherer.task.GathererTask - Starting evaluation of player ID: 19066937
[2020-02-26 11:24:59,753] [pool-1-thread-3] ERROR com.ffxivcensus.gatherer.task.GathererTask - Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0

Found during testing of #57
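The "Index: 0, Size: 0" message is what an ArrayList reports when get(0) is called on an empty list. A minimal sketch of the kind of guard this issue asks for is shown below, assuming the parser ends up with a list of strings that may be empty for new characters. EmptyListGuardSketch, firstEntry and the parsedEntries parameter are hypothetical stand-ins for whatever the real parser uses.

```java
import java.util.List;
import java.util.Optional;

final class EmptyListGuardSketch {
    /** Returns the first parsed entry, or an empty Optional when the page yielded none. */
    static Optional<String> firstEntry(List<String> parsedEntries) {
        if (parsedEntries == null || parsedEntries.isEmpty()) {
            // New characters have no mounts or minions, so this is an expected case, not an error.
            return Optional.empty();
        }
        return Optional.ofNullable(parsedEntries.get(0));
    }
}
```

Callers can then treat an empty result as "no mounts or minions yet" instead of catching an exception.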

Lodestone 5.2 Updates

It looks like the lodestone has again separated out some more content into different static pages, so we'll need to update the class & job level parsing to cope with this.

Add Blue-Mage to gatherer

With the release of 4.5, we'll have BLU available as a new job. We'll need to add stats for BLU and update the lodestone gathering when it's updated.

Increase code coverage

Unit tests need to be implemented for the 'Console' class before final release of v1.0.

Lodestone change to Mounts & Minions

The latest Lodestone change moved Mounts & Minions into separate pages, which means that we have a lot of missing stats and our active counts are screwed.

Gatherer needs to be updated to handle the new layout.

Option to output log of characters during scan

Currently, the application outputs the results of what it's doing for each character to STDOUT. It would be nice if in addition to this I could have it output to a file so that I could then make some sort of progress bar in a webpage.

I can probably split STDOUT into multiple locations in the shell, but having a log file option just seems like it would be generally useful.

Performance optimizations, speed and MORE speed

Wanted to set up a new site with the support of new jobs (recently introduced). All went well, but finding the highest ID was a bit tedious. I had to hand edit the number and play around for an hour. And still, I am not sure if I have the highest valid ID at all.

Would it be possible to add a way to the Java collector to find the highest ID?
Thank you.

Example, why it would be useful:

My script is at: 1,803,264
The fresh stat page (mine) says: 242,560 players.
According to my hand guess, the highest ID is: 19,096,969
Meanwhile, the "official" site lists 10 million players, So at this rate, my script is really off.

Provided Table Structure is Incorrect

The structure for tblplayers listed on the main project page is out of date with the most recent SQLdump (2018-01).

It is now as follows:

columns = [
           'id',
           'name',
           'realm',
           'race',
           'gender',
           'grand_company',
           'free_company',
           'level_gladiator',
           'level_pugilist',
           'level_marauder',
           'level_lancer',
           'level_archer',
           'level_rogue',
           'level_conjurer',
           'level_thaumaturge',
           'level_arcanist',
           'level_darkknight',
           'level_machinist',
           'level_astrologian',
           'level_scholar',
           'level_redmage',
           'level_samurai',
           'level_carpenter',
           'level_blacksmith',
           'level_armorer',
           'level_goldsmith',
           'level_leatherworker',
           'level_weaver',
           'level_alchemist',
           'level_culinarian',
           'level_miner',
           'level_botanist',
           'level_fisher',
           'p30days',
           'p60days',
           'p90days',
           'p180days',
           'p270days',
           'p360days',
           'p450days',
           'p630days',
           'p960days',
           'prearr',
           'prehw',
           'presb',
           'arrartbook',
           'hwartbookone',
           'hwartbooktwo',
           'hasencyclopedia',
           'beforemeteor',
           'beforethefall',
           'ps4collectors',
           'soundtrack',
           'saweternalbond',
           'sightseeing',
           'comm50',
           'moogleplush',
           'topazcarbuncleplush',
           'emeraldcarbuncleplush',
           'hildibrand',
           'dideternalbond',
           'arrcollector',
           'kobold',
           'sahagin',
           'amaljaa',
           'sylph',
           'moogle',
           'vanuvanu',
           'vath',
           'arr_25_complete',
           'hw_complete',
           'hw_31_complete',
           'hw_33_complete',
           'legacy_player',
           'mounts',
           'minions'
          ]

All of these dumps seem to be missing the two most interesting features: date_active and is_active.

I'm doing a pet project using topic modeling of Lodestone forum post titles against various data points, namely active subscribers. I could key these counts manually from the ffxivcensus site, but where's the fun in that? 😄👍🏼

Add Eureka Elemental Level Stats

So looking at LuckyBancho's stats that they recently released here: https://www.reddit.com/r/ffxiv/comments/apfx16/luckybancho_unofficial_census_10_february_2019/

It seems that they are tracking for Elemental Levels (it's under the Class/Job tab on a character's page)
As of Patch 4.55 (Released today 12th Feb 2019), there are 4 "Eureka" instances, with there being Level caps for each zone. Now it is possible to level past 20 in Anemos but I figured that the figures could be displayed in the next (March report) once players have got to max level in Hydatos.

  • Eureka Anemos - 1 - 20
  • Eureka Pagos - 21 - 35
  • Eureka Pyros - 36 - 50
  • Eureka Hydatos - 51 - 60

My idea would be that the result would be:

  • "Total Players who have played Eureka": (eg. Everyone that has a Elemental Level)
  • "Players who are levelling in Eureka: Hydatos" (eg. Anyone that is between 50 and 59)
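The level bands listed above can be expressed as a simple lookup. The sketch below maps an Elemental Level to the zone whose band contains it. It only encodes the ranges from the list in this issue and says nothing about where a player actually is, since, as noted, it is possible to keep levelling in an earlier zone. EurekaZoneSketch is a hypothetical helper.

```java
final class EurekaZoneSketch {
    /** Returns the Eureka zone whose level band (as listed in this issue) contains the level. */
    static String zoneFor(int elementalLevel) {
        if (elementalLevel < 1 || elementalLevel > 60) {
            throw new IllegalArgumentException("Elemental Level out of range: " + elementalLevel);
        }
        if (elementalLevel <= 20) return "Anemos";
        if (elementalLevel <= 35) return "Pagos";
        if (elementalLevel <= 50) return "Pyros";
        return "Hydatos";
    }

    public static void main(String[] args) {
        for (int level : new int[] {1, 20, 21, 50, 51, 60}) {
            System.out.println(level + " -> " + zoneFor(level));
        }
    }
}
```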

Add Samurai & Red Mage to gatherer

Add Red Mage and Samurai to parse list.

Verify that all other jobs are being correctly evaluated - I cannot remember off the top of my head if they are index based or HTML text based on how it determines which is which.

Very high number of characters marked as DELETED

We have just completed a new census run for December 2021. The stats show a remarkably low number of characters. On further inspection, the number of DELETED characters seems extremely high:

SELECT COUNT(*) FROM tblplayers;
34,329,901

SELECT COUNT(*) FROM tblplayers where character_status = "DELETED";
28,142,672

The result is that the census reckons there are only 6,187,229 characters in existence.

We should probably investigate and try to find some examples of characters the database reckons are deleted but actually aren't to enable us to troubleshoot further.
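As a quick check on the numbers quoted above, the sketch below recomputes the remaining character count and the share of rows marked DELETED from the two query results. DeletedShareSketch is a hypothetical helper. The inputs are the counts reported in this issue.

```java
final class DeletedShareSketch {
    /** Rows not marked as deleted. */
    static long remaining(long total, long deleted) {
        return total - deleted;
    }

    /** Fraction of all rows that are marked as deleted. */
    static double deletedShare(long total, long deleted) {
        return (double) deleted / total;
    }

    public static void main(String[] args) {
        long total = 34_329_901L;
        long deleted = 28_142_672L;
        System.out.println("remaining = " + remaining(total, deleted)); // prints remaining = 6187229
        System.out.printf("deleted share = %.1f%%%n", 100 * deletedShare(total, deleted));
    }
}
```

Roughly 82% of all rows are marked DELETED, which is what makes the result look suspicious.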

Run terminates early around 1.5 million ID mark

For the past month or so the run has been terminating early, at around ID 1.5 million. It was previously thought that it was an environmental issue. However, further investigation has uncovered an issue with a system designed to detect the end of the player list.

There is a system in place which detects when 200 consecutive IDs do not have characters associated with them. When this condition is detected, the run is terminated. However, this is experienced both at around 1.5 million and at around 1.6 million in the ID range, example shown below:

[2019-02-11 18:50:42,571] [pool-1-thread-16] INFO com.ffxivcensus.gatherer.player.PlayerBuilder - Character 1606804 does not exist. (404)
[2019-02-11 18:50:42,577] [pool-2-thread-1] DEBUG com.ffxivcensus.gatherer.task.LevemeteTask - Loading new jobs into the gatherer
[2019-02-11 18:50:42,610] [pool-2-thread-1] DEBUG com.ffxivcensus.gatherer.task.GatheringLimiterTask - GATHERING CAPPING: Checking whether the gathering should stop...
[2019-02-11 18:50:42,613] [pool-2-thread-1] INFO com.ffxivcensus.gatherer.task.GatheringLimiterTask - GATHERING CAPPING: FINISHING - No valid characters found for at least 200 ID's after Character #1606512

For safety, I am going to raise this number to 5,000. This still only represents a few minutes of scanning beyond the last character, but it is much less likely to trigger accidentally as it has done here.
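The stopping rule described in this issue can be sketched as a counter of consecutive missing IDs that resets whenever a valid character is found. The version below is a simplified, single-threaded illustration that assumes results are recorded in ID order. The real gatherer is multi-threaded, so its implementation will differ. GapLimiterSketch and its method name are hypothetical.

```java
final class GapLimiterSketch {
    private final int gapThreshold;
    private int consecutiveMissing = 0;

    GapLimiterSketch(int gapThreshold) {
        if (gapThreshold < 1) {
            throw new IllegalArgumentException("gapThreshold must be at least 1");
        }
        this.gapThreshold = gapThreshold;
    }

    /** Records one lookup result; returns true once the run of missing IDs reaches the threshold. */
    boolean recordAndCheckStop(boolean characterExists) {
        // A valid character resets the run; a missing one extends it.
        consecutiveMissing = characterExists ? 0 : consecutiveMissing + 1;
        return consecutiveMissing >= gapThreshold;
    }
}
```

A larger threshold such as 5,000 makes it far less likely that a naturally occurring run of deleted or unused IDs ends the census early, at the cost of scanning further past the true end of the ID range.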

Characters on a PvP Team set free_company and grand_company field to 'none'

As the main story has you choose a Grand Company, I found it odd that there were active characters (with the Endwalker MSQ minion) with no Grand Company. Digging a bit deeper, I found that characters in the database with that minion and their grand_company field set to 'none' had one thing in common; having a PvP team listed on their Lodestone page. Admittedly, I only checked a couple IDs, but after a dozen or so I assumed the problem extends to all 17,145.

I think the issue stems from characters on PvP teams having an additional character-block field on their Lodestone page, causing both the Free Company and Grand Company functions in the PlayerBuilder class to fall through when the number of elements is 6, rather than 3-5. I also assume the case of having 5 elements could refer to a character with a PvP team but no Free Company and cause an issue somewhere. I don't think a character can have a PvP team with no Grand Company, but I'm not 100% certain on that, as I don't know exactly when PvP is unlocked.

If you'd like the full list of IDs, the query I ran was:
SELECT id FROM tblplayers WHERE grand_company = 'none' AND minions like '%Wind-up Herois%';

For simplicity, here's the first 5 results so you can check their pages:
5313
6855
6913
8283
9307

Better Handling for Command-Line Parameter Errors

With the switch to Spring Boot, the Command-Line error handling is a lot less obvious and can get hidden in Spring's startup sequence.

Need to investigate how to pull the CLI parsing forward in the startup sequence and make sure the actual errors are more obvious.

Update documentation

README.md needs to be updated in line with changes to console interface implemented recently.
