u/GoatRocketeer

How do I quantify if a model is good?

I have data from a video game: the winrate of each character at a given level of experience on that character and at a given skill level of the player using that character. The data on some characters is deficient at the extreme skill levels. I would like to use the data that is not deficient to make predictions in the areas where it is deficient.

As I understand it, in order to make predictions I have to parameterize the data. That is, I have to make an educated guess beforehand about the form of the underlying function winrate = f(experience, skill).
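To make that concrete, here is a minimal sketch of what I mean by guessing f. The logistic form, the log transform on experience, the name `predicted_winrate`, and the parameter values are all made up for illustration. None of it comes from my actual data.

```python
import math

def predicted_winrate(experience, skill, params):
    """Hypothetical guess for f: a logistic curve in log-experience and skill."""
    b0, b_exp, b_skill = params
    z = b0 + b_exp * math.log1p(experience) + b_skill * skill
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder parameters (assumed, not fitted). In practice these would be
# estimated from the part of the data that is not deficient.
params = (-0.4, 0.08, 0.05)

# Evaluate the guess where the data is deficient, e.g. an extreme skill level.
print(predicted_winrate(experience=50, skill=9, params=params))
```

The guess is the functional form. Once that is chosen, the parameters are fitted on the well-sampled region and the same function is evaluated in the poorly sampled one.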

Say I have made such a guess of what f() is. How do I quantify whether my guess was shit or good?

reddit.com
u/GoatRocketeer — 4 days ago

ELI5: Why is it ok to penalize MLE on the 2nd derivative?

I'm studying splines.

The soft justification for penalizing the second derivative makes sense: "2nd derivative+ is wiggly". But the text implies this choice has some sort of formal backing and isn't just "it looks good".
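For concreteness, the objective I'm asking about looks roughly like this, as far as I understand it. The notation is mine rather than the text's: \ell is the log-likelihood, f is the spline, and \lambda \ge 0 is the tuning parameter that sets how heavily the second derivative is punished.

```latex
\hat{f} \;=\; \arg\max_{f} \left[ \, \ell(f) \;-\; \lambda \int \bigl( f''(x) \bigr)^{2} \, dx \, \right]
```

With \lambda = 0 this is plain MLE. As \lambda grows, the fit gets pushed toward a straight line, since a straight line has f''(x) = 0 everywhere and so pays no penalty.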

The text had a passage on "Bayesian priors". I don't have any Bayesian statistics background, I'm afraid.

Maybe it's too complicated to ELI5 a Bayesian prior and the only answers are "it means wiggliness" and "go learn Bayesian statistics", but on the off chance there's some deeper explanation that's still intuitive, I made this post.

u/GoatRocketeer — 6 days ago

I don't have professional or academic experience with higher level statistics so sorry if I ask obvious questions or make stupid choices.

I have video game data with one dependent variable and three independent variables.

The dependent variable is winrate. The three independent variables are character, experience on that character, and rank of the player.

I know the general shape of winrate as a function of experience with fixed character and fixed player rank (a steep initial rise that eventually stabilizes at some final value). For this I've had the most success with LOWESS, but that's beside the point.
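For anyone unfamiliar with LOWESS, here is a stripped-down sketch of the idea: one weighted straight-line fit per point, using tricube weights over the nearest neighbours. It leaves out the robustness iterations that full LOWESS performs, so it is a simplification and not what a library implementation does. The function names and the data are made up for illustration.

```python
import math

def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_linear_smooth(xs, ys, frac=0.5):
    """Simplified LOWESS: a weighted linear fit at each x, no robustness passes."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # neighbours used in each local fit
    fitted = []
    for x0 in xs:
        # Bandwidth = distance from x0 to its k-th nearest neighbour.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [tricube((x - x0) / h) for x in xs]
        # Weighted least squares for y = a + b * (x - x0); the fit at x0 is a.
        sw = sum(w)
        sx = sum(wi * (x - x0) for wi, x in zip(w, xs))
        sxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
        det = sw * sxx - sx * sx
        if abs(det) < 1e-12:
            fitted.append(sy / sw)  # degenerate case: fall back to weighted mean
        else:
            fitted.append((sxx * sy - sx * sxy) / det)
    return fitted

# Made-up numbers, not my real data: a steep rise that flattens out, plus wobble.
experience = [float(g) for g in range(0, 100, 5)]
winrate = [0.52 - 0.08 * math.exp(-g / 15) + 0.01 * math.sin(g) for g in experience]
smoothed = local_linear_smooth(experience, winrate, frac=0.4)
```

The `frac` argument controls how much of the data each local fit sees. Smaller values follow the initial rise more closely but smooth less.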

Winrate is clearly also a function of player rank. The problem is, experience-on-character is also a function of player rank (highly ranked players tend to play more). That means winrate as a function of experience with fixed character and unfixed player rank must take this into account.

I'm unsure how to approach a situation where I have multiple, related independent variables. I could just fix player rank but certain ranks have low sample sizes and I suspect partitioning the data that way leaves information on the table. I believe I should probably start by selecting a model to relate winrate, experience, and rank together. I suppose I could create a 3D scatterplot and attempt to eyeball a relationship off of the resulting surface, but beyond that I'm at a loss.

How do I select a model in this situation?

u/GoatRocketeer — 20 days ago