J. S. Mill Maniac Member
Joined: 28 Apr 2003 Location: New York, New York |
0. Posted: Sun Jun 04, 2006 4:53 pm Post subject: ITGmetrics |
|
|
This is a long post with a very complicated purpose. I am going to motivate it by telling everyone a story. About thirty years ago, a fellow named Bill James figured out that nobody knew what the hell actually happened in the game of baseball. A bunch of players gave it their all, played 110% and other such cliche nonsense, but nobody could predict the winners with any level of accuracy. Heck, nobody even knew what mattered for winning. He introduced a theory of scientific and mathematical analysis called sabermetrics, which (unless you're afraid of math) revolutionized the game. By developing scientific hypotheses and testing them through a strategy he called risky back-prediction, he discovered that previously unappreciated skills were vital for winning, and that you could win with a low budget if you understood these key aspects of baseball.
I work in baseball sabermetrics, and my job is to develop and apply many of these statistics. Recently, I was sitting at home, reading an ITGFreak topic about which ITG players were better than others. Although some players had good GS scores, others were better because they played 110%, gave it their all, were clutch under pressure, etc etc. Suddenly, a thought struck me. Nobody really has any idea what they're talking about, do they?
Despite about five years of playing, I realized I couldn't answer simple questions, like "What are five middle-range ten footers for score, emphasizing diversity of charts over parity?" "What chance does JJK have of beating DukAmok in a fair match?" "What songs should Mike Quintance practice in order to get better fastest?" I could guess, and my hunches were pretty educated (as I'd later find out), but I had no scientific method of checking any of this information. I had hundreds of pages of song statistics, meticulously kept by Groovestats, and advanced tournament data of my own devising, but I was still clueless. Maybe In the Groove was too casual to know the answers to these questions?
I decided to find out. Over the last few weeks, I've applied some of my sabermetric knowledge to In the Groove. First, I developed a formula for integrated difficulty, the difficulty of a song to score on paper. I learned very quickly that this number was relatively meaningless. Variance of scores, the ability to remain consistent, was most important for tournament performance, and so I developed a form of ordinal and composite tournament difficulty.
I used these statistics to derive two different statistics for advantages. One of these I called Paper Advantage (PA), and I defined it as (Normalized Best - mean Normalized Best)(Integrated Difficulty). This was fairly useless. However, Tournament Advantage (TA), which I defined as (Tournament Difficulty)(Variance - mean Variance), seemed more promising. I made both ordinal and composite models of tournament advantage (called oTA and cTA).
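For anyone who wants to plug numbers in, here is a minimal sketch of the two advantage formulas exactly as written above. The function and parameter names are made up for illustration, and the post does not say how Normalized Best, Variance, or either difficulty is computed, so they are simply taken as inputs. It also does not say which sign of TA counts as favorable, so the sketch makes no claim about that.

```python
def paper_advantage(normalized_best, mean_normalized_best, integrated_difficulty):
    """PA = (Normalized Best - mean Normalized Best) * (Integrated Difficulty)."""
    return (normalized_best - mean_normalized_best) * integrated_difficulty


def tournament_advantage(tournament_difficulty, variance, mean_variance):
    """TA = (Tournament Difficulty) * (Variance - mean Variance)."""
    return tournament_difficulty * (variance - mean_variance)


if __name__ == "__main__":
    # Made-up inputs, only to show the arithmetic; not real Groovestats data.
    print(paper_advantage(0.97, 0.95, 10.0))
    print(tournament_advantage(12.0, 1.5, 2.0))
```

The same tournament_advantage function would serve for both oTA and cTA, assuming they differ only in which difficulty and variance figures are fed in.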
Taking a chapter from the discipline called APBRmetrics (sabermetrics for basketball), I developed two statistics:
Quote: | Efficiency: cTA - mcTA |
Quote: | Match Efficiency: {[X(oTA)-Y(opponent oTA)+R(cTA)-R(opponent cTA)]/100} + 50%, where X is the number of your picks in the match, Y is the number of their picks in the match, and R is the number of random picks in the match |
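As a rough sketch, the two quoted statistics could be computed as below. Two readings are assumptions rather than anything stated in the post: "mcTA" is taken to mean the mean cTA across players, and the "+ 50%" is treated as adding 0.5 so the result is a fraction between roughly 0 and 1. All function and parameter names are hypothetical.

```python
def efficiency(c_ta, mean_c_ta):
    """Efficiency = cTA - mcTA (mcTA assumed to be the mean cTA)."""
    return c_ta - mean_c_ta


def match_efficiency(o_ta, opp_o_ta, c_ta, opp_c_ta,
                     own_picks, opp_picks, random_picks):
    """{[X(oTA) - Y(opp oTA) + R(cTA) - R(opp cTA)] / 100} + 50%.

    X = own_picks, Y = opp_picks, R = random_picks.
    Returned as a fraction, so 0.5 means an even match (assumed reading).
    """
    bracket = (own_picks * o_ta
               - opp_picks * opp_o_ta
               + random_picks * c_ta
               - random_picks * opp_c_ta)
    return bracket / 100 + 0.5


if __name__ == "__main__":
    # Made-up advantage values for a match with 2 picks each and 1 random song.
    print(match_efficiency(4.0, 3.0, 5.0, 2.0,
                           own_picks=2, opp_picks=2, random_picks=1))
```

Two players with identical advantages and the same number of picks come out at exactly 0.5 under this reading, which matches the "+ 50%" baseline in the quoted formula.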
Then, I backchecked them, using the same techniques I have been trained to use for sabermetrics. Backchecking using match efficiency, I was able to successfully predict the winner of a tournament match about 96% of the time. Backchecking using Efficiency, normalizing for era and probability, I was able to successfully predict the exact placing of 95% of tournament entrants.
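The backcheck on match winners can be pictured as a simple hit rate over past matches. The sketch below is only an illustration of that idea, not the actual procedure used here: it assumes each past match is stored as a pair of (match efficiency computed for player A, whether A actually won), and that an efficiency above 0.5 counts as a prediction that A wins. The function name and data layout are hypothetical.

```python
def backcheck_accuracy(matches):
    """Fraction of past matches whose winner was predicted correctly.

    matches: iterable of (efficiency_for_a, a_won) pairs, where
    efficiency_for_a is a fraction and a_won is a bool.
    An efficiency of exactly 0.5 is counted as a prediction for player B.
    """
    matches = list(matches)
    if not matches:
        raise ValueError("need at least one past match to backcheck")
    correct = sum((eff > 0.5) == a_won for eff, a_won in matches)
    return correct / len(matches)


if __name__ == "__main__":
    # Four invented past matches; three of the four predictions are right.
    history = [(0.55, True), (0.62, True), (0.48, False), (0.51, False)]
    print(backcheck_accuracy(history))  # 0.75
```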
I see three applications for this work:
-Tournament Design: select qualifiers with mathematically assured parity, be able to discern exactly what effects rule changes will have on all players in an event
-Player Ranking and Prediction: rank players in an ordinal tournament system which enables you to not only mathematically determine who is better than whom, but also by how much and what the difference really means; predict the placing in tournaments with far greater transparency and reliability
-Practice and Training: isolate the precise strengths and weaknesses of your own skills, and then design the optimal program of songs to increase your tournament advantage and match efficiency; design the practice repertoire which makes you the most better, the fastest
This work generalizes to DDR, and I have tested some of it through backchecking on DDR tournaments. It does require some manipulation to use, but, as a for instance, I used it to select "four mid-range ten footers as tournament qualifiers, emphasizing chart-diversity over parity, and including at least one song from each game (target audience: intermediate-skilled tournament competitors)" and I got this list (from easiest to hardest):
Quote: | Kagami, Cryosleep, Holy Guacamole, Twilight |
I don't think I could have come up with four songs nearly that good; as soon as I saw the list I liked it but I would never have been able to come up with that.
If anyone finds any of this topic interesting, please let me know. I am going to use some of these formulas to predict a number of events in the next few months, and to derive practical recommendations for someone. I'm pretty sure I'm the only person to have done something like this, but if anyone has any constructive inquiry, I'm in need of it. Serious comments only please (no "omfg u are teh fg u need teh life for making stats about itg lololol." We know. Go back to high school). _________________
|
|
|
|
Karl Popper Trick Member
Joined: 03 Mar 2005
|
1. Posted: Sun Jun 04, 2006 5:03 pm Post subject: |
|
|
Wow, 96 percent accuracy is amazing. And that example qualifier list is perfect. Great job. But how well does the accuracy hold up in predicting matches that are actually pretty close, such as in the finals and semifinals of tournaments? I'm assuming that number includes really obvious matches, like LilMegamanXAmok vs. random noob in the first round of a tourney. |
|
|
|
MinN_Limited Trick Member
Joined: 28 Aug 2005 Location: Cambridge, Ontario, Canada |
2. Posted: Sun Jun 04, 2006 5:52 pm Post subject: |
|
|
Very impressive. I honestly don't understand the formula but it's quite intriguing, good work. The results you got were also pretty amazing. How long did it take you to work all this out? |
|
|
|
Emptyeye Trick Member
Joined: 28 Jan 2006 Location: Waterbury, CT |
3. Posted: Sun Jun 04, 2006 6:21 pm Post subject: |
|
|
Oh **** YES this is interesting.
I'm definitely intrigued by this, and like MinN_Limited, I'd love to know where these statistics are derived from. In particular, I'm curious as to how Tournament Difficulty varies from Integrated Difficulty.
I presume that these formulas, with some manipulation, could be used in other settings, for instance "What are four DDR Standard song qualifiers, emphasizing chart-diversity over parity, and coming from either DDRMAX2 PS2 or DDREX2 PS2 (target audience: casual players whose skill limit is generally passing 9-footers)?"
As I see it, one of my main questions here is "does this have, or can it be modified to have, any practical application outside of tournament situations?" I realize that I'm in a distinct minority on this thinking, but my feeling is that the true "best" arrowsmasher should be adept at a variety of skillsets that tournaments don't (generally) presently test: The popular "tournament" format, yes, but also Marathons, Survivals, and within reason (Defining "within reason" is another problem I admittedly have yet to solve) random mods or combinations of mods on a song. You mention "practice and training" as a potential use for this, but maybe I want to, say, improve my mod reading, or become more proficient at marathons, or just improve my scores in general but without tournaments specifically in mind, as opposed to training for tournaments. Can this be used in that regard?
Either way, this is definitely some interesting stuff and I'm curious to read more. _________________
|
|
|
|
IHYD.dimo Trick Member
Joined: 08 Jul 2005
|
4. Posted: Sun Jun 04, 2006 8:55 pm Post subject: |
|
|
great stuff, but it's only applicable outside of socal seeing how i win 100% of the time. _________________
|
|
|
|
Synaesthesia Trick Member
Joined: 03 Apr 2005 Location: Crushing all deceivers, smashing non-believers |
5. Posted: Sun Jun 04, 2006 8:58 pm Post subject: |
|
|
How well is this system able to account for variability (particularly with the pad) of scenarios? I mean, how well can it factor in possibilities like, say for instance I'm playing someone who is somehow worse than I am, but you know I have allergies and so the probability of me sneezing and breaking focus could play a role in whether or not I lose? Also, does Groovestats track progress of scores entered? That could be useful in seeing how much someone's best score improves over a given interval, but I still think that wouldn't tell the entire story. I'm thinking of my DDRecall, which has the highest score I achieved on particular songs, but doesn't give you the real idea of how consistent I am. For instance, sure I can get 9 on Stoic, but an on-the-spot performance would more likely be 15-24. It seems to me you'd want to have data from every time I played a song so you would get less of a high-end skew of my scores and a better idea of what I consistently manage. _________________
im a lasagna whale
G_G |
|
|
|
IHYD.DukAmok Trick Member
Joined: 10 Dec 2003 Location: Corona, CA |
6. Posted: Sun Jun 04, 2006 9:16 pm Post subject: |
|
|
my interest is piqued
tell me more cory _________________
Sappy_!?! wrote: | just to answer, if someone who stands next to you watching you play PSMO but you get a D on it, versus somebody who understands perfect attacking and stuff, will think you suck. A player is considered good in my opinion when a player of a higher level comments about you or see's you triple A a song. Or if somebody looks up to you. Hope it clarifies. |
|
|
|
|
J. S. Mill Maniac Member
Joined: 28 Apr 2003 Location: New York, New York |
7. Posted: Mon Jun 05, 2006 5:17 pm Post subject: |
|
|
Jacques Derrida wrote: | But how well does the accuracy hold up in predicting matches that are actually pretty close, such as in the finals and semifinals of tournaments? I'm assuming that number includes really obvious matches, like LilMegamanXAmok vs. random noob in the first round of a tourney. |
In close matches, I'm still able to make predictions over the second standard deviation in accuracy; it's just that the predictions are less meaningful (JT beats Cyrus 52.4% is less impressive than Mike beats Jimmy 99.9987%)
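One way to look at the close-match question is to split the backcheck hit rate by how far each prediction sits from 50%. This is only a sketch under assumptions: past matches are stored as (predicted efficiency for player A, whether A won) pairs, and the 10-point band that defines "close" is an arbitrary choice made for illustration, not a figure from this thread.

```python
def accuracy_by_closeness(matches, close_band=0.10):
    """Return (close_accuracy, lopsided_accuracy) for past matches.

    matches: iterable of (efficiency_for_a, a_won) pairs.
    A match is "close" when the prediction is within close_band of 0.5.
    A bucket with no matches yields None instead of a rate.
    """
    buckets = {"close": [], "lopsided": []}
    for eff, a_won in matches:
        key = "close" if abs(eff - 0.5) <= close_band else "lopsided"
        # A prediction is a hit when "A favored" agrees with "A won".
        buckets[key].append((eff > 0.5) == a_won)

    def rate(hits):
        return sum(hits) / len(hits) if hits else None

    return rate(buckets["close"]), rate(buckets["lopsided"])


if __name__ == "__main__":
    # Invented history: two near coin-flip matches, two lopsided ones.
    history = [(0.524, True), (0.53, False), (0.999987, True), (0.02, False)]
    print(accuracy_by_closeness(history))  # (0.5, 1.0)
```

Reporting the two rates separately would show whether a headline accuracy figure is being carried by the obvious first-round matches.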
MinN_Limited wrote: | Very impressive. I honestly don't understand the formula but it's quite intriguing, good work. The results you got were also pretty amazing. How long did it take you to work all this out? |
It's been a steady project over the last couple weeks, but I've only touched the tip of the iceberg.
Emptyeye wrote: | In particular, I'm curious as to how Tournament Difficulty varies from Integrated Difficulty. |
Integrated Difficulty is a measure of high score. By using an integration formula and the raw statistics of groovestats for data, I generate the integrated difficulty of a song.
Tournament Difficulty is a measure of high score and range. I take a representative sample of a song being played at tournaments, combine it with the range (variance) from the players' then-high scores, differentiate it, and multiply it, with a weighting factor, by Integrated Difficulty.
Integrated Difficulty has two main functions: calculating Paper Advantage, and calculating the raw difficulty of songs.
Tournament Difficulty has two main functions: calculating Tournament Advantage, and calculating the tournament difficulty of songs.
Emptyeye wrote: | I presume that these formulas, with some manipulation, could be used in other settings, for instance "What are four DDR Standard song qualifiers, emphasizing chart-diversity over parity, and coming from either DDRMAX2 PS2 or DDREX2 PS2 (target audience: casual players whose skill limit is generally passing 9-footers)?" |
They would, although I would need the old NNR records for some songs.
Emptyeye wrote: | I realize that I'm in a distinct minority on this thinking, but my feeling is that the true "best" arrowsmasher should be adept at a variety of skillsets that tournaments don't (generally) presently test: The popular "tournament" format, yes, but also Marathons, Survivals, and within reason (Defining "within reason" is another problem I admittedly have yet to solve) random mods or combinations of mods on a song. |
I disagree. If tournament play under normal rules is 99% of the community (and it is), then 99% of player skill is determined by it. The best chessplayer in the world doesn't have to also demonstrate that he is the best bullet player, speed player, bughouse player, etc etc...
Synaesthesia wrote: | How well is this system able to account for variability (particularly with the pad) of scenarios? I mean, how well can it factor in possibilities like, say for instance I'm playing someone who is somehow worse than I am, but you know I have allergies and so the probability of me sneezing and breaking focus could play a role in whether or not I lose? |
It would account for this; it would factor into your tournament range.
Synaesthesia wrote: | I'm thinking of my DDRecall, which has the highest score I achieved on particular songs, but doesn't give you the real idea of how consistent I am. For instance, sure I can get 9 on Stoic, but an on-the-spot performance would more likely be 15-24. It seems to me you'd want to have data from every time I played a song so you would get less of a high-end skew of my scores and a better idea of what I consistently manage. |
To run the statistics, I need a representative sample size of their performance in tournaments, which I crossrate with their then-bests to derive a range factor. _________________
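The actual range factor formula isn't given in this thread, so the following is a stand-in rather than the real calculation: a sketch of one simple way to compare tournament plays against then-bests. It pairs each tournament score with the player's best on that song at the time and reports the average shortfall and how much that shortfall varies. The function name and the choice of mean and population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def shortfall_profile(then_bests, tournament_scores):
    """Return (mean shortfall, spread of shortfall) versus then-bests.

    then_bests[i] is the player's best score on a song at the time of the
    tournament; tournament_scores[i] is what they scored on it in the event.
    This is a hypothetical stand-in, not the thread's range factor.
    """
    if not then_bests or len(then_bests) != len(tournament_scores):
        raise ValueError("need two non-empty lists of equal length")
    shortfalls = [best - score
                  for best, score in zip(then_bests, tournament_scores)]
    return mean(shortfalls), pstdev(shortfalls)


if __name__ == "__main__":
    # Invented percent scores for three songs.
    print(shortfall_profile([99.0, 98.0, 97.0], [97.0, 97.0, 94.0]))
```

A player whose tournament scores sit close to their bests with little spread would show small values on both numbers, which is the kind of consistency the replies above are asking about.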
Last edited by J. S. Mill on Sun Jun 11, 2006 9:51 pm, edited 6 times in total |
|
|
|
Tyrgannus Trick Member
Joined: 19 Oct 2005 Location: Not about to tell |
8. Posted: Mon Jun 05, 2006 9:25 pm Post subject: |
|
|
This is more complicated than the stamina formula I made for songs.
(And no, I don't just count tap steps)
My stamina formula can't be calculated to the calorie because people differ in size and shape, and they also differ in how they move through a given song, but it is accurate relative to a person.
In other words, if my formula says that Delirium takes 1029 Dance energy (again, you can't use units like the calorie), then if the guy uses the same style (flatfoot, arms on bar, whatever), then Anubis would cost around 734 Dance energy points.
The formula takes into account jumps, hands, footspeed, candles, and length of streams. While it's not perfect, it is fairly accurate.
Sorry, off on a tangent. The whole point of that was to say ITGmetrics are much more complicated and very interesting (though I find my formula interesting as well) _________________
AA Bob wrote: | Summer is as much of a 12 as PSMO is a 9. |
|
|
|
|
mydixiewrecked Trick Member
Joined: 29 Mar 2005
|
9. Posted: Mon Jun 05, 2006 9:33 pm Post subject: |
|
|
Tyrgannus wrote: | In other words, if my formula says that Delirium takes 1029 Dance energy (again, you can't use units like the calorie), then if the guy uses the same style (flatfoot, arms on bar, whatever), then Anubis would cost around 734 Dance energy points. |
What exactly does a "Dance energy point" represent? |
|
|
|
Tyrgannus Trick Member
Joined: 19 Oct 2005 Location: Not about to tell |
10. Posted: Mon Jun 05, 2006 10:04 pm Post subject: |
|
|
A dance energy point represents a measurement of exertion needed. With my system, the dance energy points for a song should remain the same for everyone even though 1000 may be big for some people while 2000 may not be too bad for another. It's all proportional and that proportionality is based on your stamina and ability to conserve energy.
In other words, if Delirium requires 1029 Dance Energy Points (DEP), then I might find that inordinately tiring while someone else wouldn't notice the drain at all.
Did that answer your question? _________________
AA Bob wrote: | Summer is as much of a 12 as PSMO is a 9. |
|
|
|
|
mydixiewrecked Trick Member
Joined: 29 Mar 2005
|
11. Posted: Mon Jun 05, 2006 11:02 pm Post subject: |
|
|
Tyrgannus wrote: | Did that answer your question? |
No, it didn't. Units of measure must have a specific definition or they are absolutely meaningless. A calorie, for example, is the amount of energy it takes to cause the temperature of 1 gram of water to increase by one degree Celsius. That's specific. Your definition, "a measurement of exertion needed", isn't nearly specific enough.
EDIT: I'm totally de-railing this thread. Sorry. |
|
|
|
J. S. Mill Maniac Member
Joined: 28 Apr 2003 Location: New York, New York |
12. Posted: Tue Jun 06, 2006 12:33 am Post subject: |
|
|
Baggage wrote: | EDIT: I'm totally de-railing this thread. Sorry. |
I don't think so; specific discussions of technical concepts for energy are very much in line with the spirit of this discussion. _________________
|
|
|
|
HipHopNotik. Trick Member
Joined: 08 Apr 2004
|
13. Posted: Wed Jun 07, 2006 4:07 pm Post subject: |
|
|
If you are using GS, how important is it for everyone who is included to keep their GS accounts updated? (it sounds like a stupid question but I ask this for a reason). _________________
|
|
|
|
IHYD.Blake Vivid Member
Joined: 14 Aug 2004 Location: Solar City, California |
14. Posted: Wed Jun 07, 2006 7:29 pm Post subject: |
|
|
And how important is it to not have fake scores on it due to corrected pad excellents turning into 100%s? _________________
|
|
|
|
J. S. Mill Maniac Member
Joined: 28 Apr 2003 Location: New York, New York |
15. Posted: Mon Jun 12, 2006 11:47 am Post subject: |
|
|
HipHopNotik. wrote: | If you are using GS, how important is it for everyone who is included to keep their GS accounts updated? (it sounds like a stupid question but I ask this for a reason). |
It's very important for making training recommendations; in general, since I use a representative sample size weighted for momentum, it isn't so critical.
Jesse Katsopolis wrote: | And how important is it to not have fake scores on it due to corrected pad excellents turning into 100%s? |
As above. _________________
Last edited by J. S. Mill on Tue Jun 13, 2006 3:32 pm, edited 1 time in total |
|
|
|
Emptyeye Trick Member
Joined: 28 Jan 2006 Location: Waterbury, CT |
16. Posted: Tue Jun 13, 2006 2:04 pm Post subject: |
|
|
You had edited a post between the time I looked at it and the time I looked at it again, so apologies for the lateness of this response.
Søren Kierkegaard wrote: |
Integrated Difficulty has two main functions: calculating Paper Advantage, and calculating the raw difficulty of songs.
Tournament Difficulty has two main functions: calculating Tournament Advantage, and calculating the tournament difficulty of songs. |
So if I'm understanding correctly, and I may be oversimplifying here, but Integrated Difficulty is a measure of how difficult a song is to score high on, whereas Tournament Difficulty is a measure of how difficult a song is to score consistently on. As a hypothetical example, Vertex^2 would almost undoubtedly rank highest in terms of Integrated Difficulty, but may or may not (Actually, not having access to the statistics, I would venture a guess that it ranks pretty highly here too, though again, the example is more hypothetical than anything) be at the top of the Tournament Difficulty list, is that right?
Søren Kierkegaard wrote: | I disagree. If tournament play under normal rules is 99% of the community (and it is), then 99% of player skill is determined by it. The best chessplayer in the world doesn't have to also demonstrate that he is the best bullet player, speed player, bughouse player, etc etc... |
Fair enough, and that analogy actually makes a lot of sense. Doesn't mean I necessarily agree with it--I'll just stubbornly stick to my minority position with no real rhyme or reason --but I see where it comes from. I still find the work you're doing to be fascinating in itself, even if its application to me personally is limited.
Hopefully you'll indulge me by elaborating on something you said in another topic, since I think it fits in with this discussion. You mentioned that:
Søren Kierkegaard wrote: | Consistency outside of a tournament setting is absolutely worthless. |
I see this as having one of two meanings and was hoping you could clarify which (If not both) you meant. The first is that it's worthless because, effectively, anything outside of a tournament setting doesn't matter for determining who the "best" player is. This seems consistent with what you said to me earlier, and explains why tournaments are the main focus of your research. The second is that it's worthless because it doesn't actually carry over into a tournament setting. This I would have a harder time believing. Presuming there aren't any extraordinary circumstances--for instance, a player's "consistent scores" on whatever being on the home version and thus nigh worthless for tournaments on an arcade machine--I would think the consistency would carry over at least somewhat. Of course, I'm not the one with the statistics, so maybe I'm way off base here. _________________
|
|
|
|
J. S. Mill Maniac Member
Joined: 28 Apr 2003 Location: New York, New York |
17. Posted: Wed Jun 14, 2006 7:13 am Post subject: |
|
|
Emptyeye wrote: | So if I'm understanding correctly, and I may be oversimplifying here, but Integrated Difficulty is a measure of how difficult a song is to score high on, whereas Tournament Difficulty is a measure of how difficult a song is to score consistently on. As a hypothetical example, Vertex^2 would almost undoubtedly rank highest in terms of Integrated Difficulty, but may or may not (Actually, not having access to the statistics, I would venture a guess that it ranks pretty highly here too, though again, the example is more hypothetical than anything) be at the top of the Tournament Difficulty list, is that right? |
Well, you have the essence of the distinction correct. But the specifics might be a bit off. A good example would be to compare Disconnected Hyper with Visible Noise. Hyper has a higher Integrated Difficulty (it's just a harder song), but Visible Noise has a higher Tournament Difficulty (since it factors in the difficulty of staying consistent on the song). V2 is unique in that it has both the highest Integrated Difficulty (as the hardest song in the game) and the highest Tournament Difficulty. Did that make any sense?
When calculating the efficiency of songs, we invert the range factor (which measures the difficulty in staying consistent on a song). So, in fact, you are looking for songs with high Integrated Difficulties but with (relatively) lower marginal Tournament Difficulties.
As for your final question, I meant the former (worthless because only tournaments determine the best). _________________
|
|
|
|
Irish.MTA (Retired?) Trick Member
Joined: 28 Apr 2003 Location: Clemson, SC |
18. Posted: Sat Jun 17, 2006 8:46 am Post subject: |
|
|
As a total math junky this is definitely intriguing work. Assuming I have any decent amount of free time I'll try to get in touch with you about some of this stuff - or feel free to IM me if you want to just talk about any data you think you might want to collect or anything I could do to help or whatever.
I'm going to a tournament next weekend. I can try to collect a whole bunch of data and you/I/we can try plugging it in if you want another set of test cases or something.
Keep it up.
Note: I'm a little worried that a lot of the players at this tournament won't have GS accounts. -_- _________________
The winds of change are oft unpredictable... |
|
|
|
nasheq Trick Member
Joined: 04 Jun 2006
|
19. Posted: Sun Jun 18, 2006 9:24 pm Post subject: |
|
|
i think the entire 96% prediction accuracy is bullshit. unless you're comparing like lilQ to mr. scrub.. i feel this is absurd. also i'm sure you aren't claiming that you will predict future matches - only past matches. hence i find it pretty useless. |
|
|
|
|
|