r/econometrics

▲ 13 r/econometrics+7 crossposts

I’m the founder of https://marketontology.com. If you think you can grow this platform to 1,000+ retained paying users, please message me; there is a significant amount of money to be made. The main customer acquisition channels are currently Google search ads (which recently became more effective) and organic Reddit posting (which has pretty much stopped working).

u/thinq-81 — 16 hours ago

Why are SUTVA violations so neglected in econometrics?

As a macroeconomist, I treat general equilibrium and spillover effects as bread and butter. For example, a corporate tax cut in one state attracts businesses from other states, and stimulus checks push up prices, which then dampens the aggregate demand effect.

I found it quite surprising that none of the major textbooks in econometrics, like Hayashi, Wooldridge, Angrist and Pischke, Hansen etc. cover violations of SUTVA.

Also, while I'm not an expert in this field, I noticed a dearth of econometrics research papers that allow for SUTVA violations. Many of the key identification theorems do not have counterparts allowing for SUTVA violations. Notable exceptions are Munro, Kuang and Wager (2025), Vazquez Bare (2023) and Butts (2023).

reddit.com
u/Global_Channel1511 — 22 hours ago

Potential outcomes and structural equations, book/paper recommendations?

Hello everyone,

I recently started working on a project where most people come from an economics/econometrics background, while mine is mostly in computer science.

I'm running into some friction when discussing modeling approaches with my colleagues. I learned causal inference mainly from the potential outcomes perspective, and I've been surprised to face some resistance when using terminology like ATT, ATE, LATE, or discussing unconfoundedness.

From what I gather, most of my colleagues learned from books like Wooldridge, which frames causal inference largely in terms of structural equations (please correct me if I'm wrong).

Can anyone recommend authors, books, or papers that bridge these two frameworks?

u/Raz4r — 3 days ago

Am I the only one bothered when some textbooks conflate causal/structural and statistical linear regression models?

Or at least do not emphasize it enough. I feel like making this distinction explicit early on would prevent a lot of back-and-forth later.

u/Wudulala — 6 days ago
▲ 0 r/econometrics+1 crossposts

Eco (Hons) from Heritage college

Hi. I want to pursue Economics (Hons) after Class 12. I was thinking about applying to Heritage College, Kolkata. I would like to know what the career prospects are and whether it is a good option. To be honest, my board marks are not very good, but I really want to pursue economics and my options are limited.

u/Fickle-Ideal3711 — 5 days ago

DiD with continuous treatment

Hi everyone! I'm currently working on my Master's thesis and I would appreciate your feedback on a few doubts/questions I have.

My research question examines whether a broadband expansion policy in rural areas affected new firm formation. Although all provinces were exposed to the policy to some extent (i.e. there are no untreated units), due to the presence of rural areas in each province, exposure intensity varied across provinces. Therefore, treatment is modeled as a continuous rather than a binary variable.

In this case, what seems most appropriate to me is to follow the framework proposed by Brantly Callaway, Andrew Goodman-Bacon, and Pedro H. C. Sant'Anna (2024), although I am still struggling to understand how pre-trend tests should be conducted in this setting.

What are your thoughts on this? I would really appreciate hearing your views on the issue.

Thank you all in advance!

u/Ill_Veterinarian1275 — 7 days ago
▲ 9 r/econometrics+1 crossposts

Logistic Regression with structurally missing predictor subset

Hi all,

I am an ML academic researcher and, for a project, I need to implement a logistic regression baseline.

The problem, however, is that a subset of my predictor variables is only available if a 'Presence Indicator' variable = 1.

So:

Variable group A (binary, categorical, numeric) are always available

Availability indicator B (binary) is always available

Variable group C (binary, categorical, numeric) is only available if B = 1, else NA

Tree-based models handle these NA values automatically, but logistic regression does not.

Given that the numeric variables in C can have an actual value of 0, how would you model this specification so that it remains (somewhat) interpretable?
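One common way to handle this, sketched here under assumptions rather than as a definitive answer: replace the NA values in group C with 0 and keep B in the model, so each C coefficient acts as a B × C interaction and only applies when B = 1. B's own coefficient then absorbs the difference between "C unavailable" and "C available with value 0", which keeps a true 0 distinguishable from a missing value. The function name `encode_row` and the example rows are hypothetical, and only numeric variables are shown; categorical variables in C would be one-hot encoded first and masked the same way.

```python
def encode_row(a_values, b, c_values):
    """Build one design-matrix row as [A..., B, B*C...].

    a_values: numeric group-A features (always observed)
    b:        availability indicator, 0 or 1
    c_values: numeric group-C features; entries may be None when b == 0
    """
    if b not in (0, 1):
        raise ValueError("b must be 0 or 1")
    # When B == 0 the C block is set to 0, so its coefficients drop out of
    # the linear predictor; when B == 1 the observed values pass through.
    c_block = [float(c) if b == 1 else 0.0 for c in c_values]
    return [float(a) for a in a_values] + [float(b)] + c_block


# Hypothetical rows: (A features, B, C features)
rows = [
    ([1.5, 0.0], 1, [0.0, 3.2]),    # C observed; the first C value is a real 0
    ([0.7, 1.0], 0, [None, None]),  # C structurally missing
]
design = [encode_row(a, b, c) for a, b, c in rows]
```

The resulting rows can be passed to any logistic regression routine. Under this encoding, the coefficient on B reads as the shift in log-odds from having group C available with all C values at 0, and each C coefficient reads as the effect of that variable among rows where B = 1.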

Shoutout in my PhD dissertation for the amazing person who can help me out!

u/svr120 — 7 days ago
▲ 32 r/econometrics+2 crossposts

Canada Still Calls Itself G7… But It’s Now Ranked G11 by IMF, World Bank & UN

The cope is strong.
Canada continues to brand itself as a G7 country, but the actual GDP rankings tell a different story:
Latest Rankings:

  1. United States
  2. China
  3. Germany
  4. Japan
  5. United Kingdom
  6. India
  7. France
  8. Italy
  9. Russia
  10. Brazil
  11. Canada
  12. Australia

Source: IMF (2026), World Bank & United Nations (2024)

We used to be a top 10 economy. Now we’re sliding.
What do you think is behind the decline? Productivity? Policy? Energy? Immigration without growth? All of the above?
u/metricshour — 14 days ago

Where to begin for somebody that should know better

I’m a macroeconomist but currently only work on analysis, not forecasting. I did a bachelor's in economics that involved econometrics, but theory was covered only in the first year; the intermediate/advanced courses were all just coding in R. I feel like I got good grades without knowing much.

I then took 2 years off after uni, so it has now been 6 years since I did any theory and I’m out of my depth. What resources would people recommend for relearning, given that I don’t have access to any of my first-year materials?

Also any other advanced materials to get into forecasting would be appreciated.

u/Longjumping_Monk2694 — 10 days ago

Fixed Effects Model

Am I correct in my understanding that FEMs have low statistical power and therefore we cannot assume causality, only association? And to assume causality, we have to make sure it is not reverse causality? I am not really sure about the strengths of the FEM, as everything I read seems to point to low statistical power and the potential for biased estimates.

u/AgitatedHuckleberry8 — 11 days ago

Self studying econometrics as a math major.

I am a mathematics major and I have already taken economics electives up to intermediate micro and macro economic theory.

I am also proficient in R and Python, and my specialization in mathematics is in statistics and data analysis. So I have taken time series data analysis, probability theory, regression methods, multivariate analysis, stochastic processes, statistical inference and convex optimization along with the usual pure math courses (real and complex analysis, linear algebra, graph theory etc.)

I would like to start self learning econometrics since I have taken a strong interest in it after learning what it’s about on the surface, but I don’t know where to start. Any help would be appreciated.

Also, is measure theory required for econometrics? I can study either measure theory or stochastic calculus, so which is more useful in econometrics?

u/Outrageous-Sun3203 — 13 days ago

I built a quantitative model to find the fair value of raw Pokémon cards (Hedonix H6 raw engine update)

Hey guys, I'm back with another Hedonix update for you.

After implementing the first H6 engine predicting PSA 10 prices and improving it with pop counts and gem rates, I wanted to build a new model that predicts raw card prices. This one was quite difficult since it does not factor in any price as an input (like the graded model does with raw prices).

The whole research started based off a YouTuber's video idea, in which he claimed he built a model doing the exact same thing while achieving an R² of 0.88. My model started with an R² of 0.31.

Why his R² looked so good: His sample was around 30 hand-picked chase cards. With 4-5 regressors on 30 data points, you get an R² > 0.85 in-sample almost mechanically. Unfortunately, no cross-validation was shown in the video. When I rebuilt his architecture on 358 cards with an honest leave-one-set-out CV, it dropped to 0.31. That's not a knock on his work, just what happens when you scale a small in-sample model to a real out-of-sample test.
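The gap between in-sample and held-out R² described above can be reproduced in a small simulation. This is a sketch under assumptions, not the model from the video or from Hedonix: the data are pure noise, the group labels stand in for card sets, and the helper names (`ols_fit`, `in_sample_and_logo_r2`) are made up for illustration. It fits ordinary least squares with 5 regressors on 30 points and compares the in-sample R² with a leave-one-group-out R².

```python
import numpy as np


def r2(y, y_hat):
    """R² measured against the full-sample mean of y."""
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


def ols_fit(X, y):
    """Least-squares coefficients for y ~ [1, X]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def in_sample_and_logo_r2(X, y, groups):
    """Return (in-sample R², leave-one-group-out R²) for OLS."""
    r2_in = r2(y, ols_predict(ols_fit(X, y), X))
    y_cv = np.empty_like(y)
    for g in np.unique(groups):
        held_out = groups == g
        # Fit on every other group, predict the held-out group.
        beta_g = ols_fit(X[~held_out], y[~held_out])
        y_cv[held_out] = ols_predict(beta_g, X[held_out])
    return r2_in, r2(y, y_cv)


# Assumed toy setup: 30 observations, 5 regressors, 6 "sets" of 5 cards each.
rng = np.random.default_rng(0)
n, k = 30, 5
X = rng.normal(size=(n, k))
y = rng.normal(size=n)               # target unrelated to X by construction
groups = np.repeat(np.arange(6), 5)
r2_in, r2_cv = in_sample_and_logo_r2(X, y, groups)
```

Because each held-out point is predicted by a model that never saw it, the leave-one-group-out R² for OLS cannot exceed the in-sample R² when both use the same total sum of squares, and on noise it is typically negative.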

How I got from 0.31 to a usable model:

  • Bigger panel + era flags (358 SV cards → 2,622 across SM/SWSH/SV): +0.12 R².
  • Adding graded data as features (pop count, gem rate): +0.05 R².
  • eBay daily volume time-series (730 days of daily sales counts per card): +0.28 R².
  • XGBoost over Linear Regression: +0.07 R².

Features that surprised me by having zero impact:

  • LLM artwork scoring (composition, pose, color).
  • Google Trends per character.
  • Manual character tier tags (Eeveelutions, starters, legendaries).

Final result: I'm proud to say that the new raw model achieves an out-of-sample R² of 0.83 and a median error of 34% on 2,622 cards. For comparison, my graded H6 v2 lands at a 0.87 R² / 20% median error. But keep in mind that raw data will always be noisier than graded because of bulk listings, casual sellers, and the lack of a PSA arbiter to standardize condition.
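The post does not say exactly how the two headline metrics are computed, so the following is only one plausible reading: R² on held-out predictions and the median of absolute percentage errors. The function names and the four example prices are hypothetical.

```python
from statistics import mean, median


def out_of_sample_r2(actual, predicted):
    """1 - SS_res / SS_tot, using the mean of the held-out actuals."""
    mu = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def median_abs_pct_error(actual, predicted):
    """Median of |predicted - actual| / |actual| (actuals must be non-zero)."""
    return median(abs(p - a) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical held-out prices and model predictions.
actual = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 50.0, 60.0]

r2_oos = out_of_sample_r2(actual, predicted)
med_err = median_abs_pct_error(actual, predicted)
```

The two metrics answer different questions: R² is dominated by the most expensive cards because errors are squared in price units, while the median percentage error treats a cheap card and a chase card alike.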

Thanks for reading. As always, I'm still looking for beta testers, so let me know if you wanna test Hedonix

https://preview.redd.it/d5kwu3346xzg1.png?width=1080&format=png&auto=webp&s=88480c8d0ffd369d37d2a55f9216a57d95fadd1f

https://preview.redd.it/bfej2xo46xzg1.png?width=1080&format=png&auto=webp&s=ae40902ff443e8242086c9e985e17d0d08cc9885

u/Commercial_Many_909 — 14 days ago
▲ 9 r/econometrics+5 crossposts

More Resources to Share: Forecast Calculations of S&P 500 Companies

Hello! I made a post on this sub a couple of days ago about how I made a resource for stock return statistics (average, standard deviation, median, mode, max, min, kurtosis, skewness) that gets updated every trading day.

Well, over the last couple of days, I added a daily forecast page that calculates every S&P 500 company's estimated return for the close of THAT DAY. It covers quite a few common forecasting methods, such as Monte Carlo simulation, Volatility Adjusted Geometric Brownian Motion, triple exponential smoothing, simple linear, and future value, and shows what each calculation has to say about the price of the stock.
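Of the methods listed, a Monte Carlo simulation under geometric Brownian motion is easy to sketch. The version below is an illustration under assumptions, not the site's actual calculation: it uses plain GBM with a constant annualised drift `mu` and volatility `sigma`, 252 trading days per year, and made-up input values.

```python
import math
import random


def simulate_gbm_close(s0, mu, sigma, horizon_days=1, n_paths=10_000, seed=0):
    """Simulate end-of-horizon prices under geometric Brownian motion.

    s0:    current price
    mu:    annualised drift (assumed constant)
    sigma: annualised volatility (assumed constant)
    """
    rng = random.Random(seed)
    t = horizon_days / 252  # assumed number of trading days per year
    drift = (mu - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)
    # Exact GBM solution: S_t = S_0 * exp((mu - sigma^2 / 2) t + sigma sqrt(t) Z)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


# Hypothetical inputs: $100 stock, 8% drift, 20% volatility, one trading day ahead.
prices = simulate_gbm_close(100.0, 0.08, 0.20)
estimate = sum(prices) / len(prices)
```

The average of the simulated prices converges to `s0 * exp(mu * t)`, so over a single day the point estimate barely moves from the current price; the more informative output is the spread of the simulated distribution.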

Feel free to take a look! It gets updated every weekday morning.

I should say, the calculations aren't what is going to ACTUALLY happen by close haha, but they are interesting to see what trends are occurring when the calculations are done.

I'll continue to post more resources, but I just wanted to share this since the statistics resource seemed to be something that some people like.

Happy Investing.

Original Post: https://www.reddit.com/r/hedgefund/s/u1PFxB8tyD

Resource: https://www.systemscapital.net/

u/SystemsCapital — 10 days ago

Spatial Econometrics + Graph Theory advice

Hi!

I’m starting to research topics for my master’s degree thesis.

My main research areas are Spatial Statistics/Econometrics and Spatial Machine Learning, and I’m trying to connect these topics with Graph Theory/Network Science.

Could you suggest:

  1. Good books/resources for studying the theory of Graphs/Network Science.
  2. Any relevant paper or study that connects Spatial Statistics with Network Science/Graphs, or that reviews the literature linking them.
  3. I’m trying to stay as “statistical” as possible and avoid Neural Networks/Deep Learning/Computer Science research areas, but if you have relevant material from those areas I’ll consider it.

I know it’s a huge request, but my PI gave me the topics and said “now go away and research”, and I don’t know where to start.

u/endixx__ — 14 days ago

Doubt

How's econometrics with data science at bachelor's level? Is it worth it?

What kind of roles does that mainly take me to?

Is there scope to enter into core finance roles?

u/Tiny_Wing_Thing — 12 days ago
▲ 4 r/econometrics+1 crossposts

Backcasting forecast errors: model collapsing to mean [P]

Hey everyone,

I am kind of desperate for help right now on my current project. I'll try and be as clear as possible.

I'm working on a time series backcasting problem. The values I want to backcast are forecasts (not ML forecasts, but think of weather forecasts) at different horizons (from 1 to 14). So to be clear, at a date D, I have 14 forecasts (for D+1, ..., D+14). I have such forecasts from 2020 to 2026. Each row represents one (date, horizon) pair, and each pair is unique, so every date appears in a block of 14 rows, each mapping to its own target_date. I hope this is clear enough.

So the goal is to backcast those forecasts before 2020 (say 2019-2020 for simplicity). Besides the forecast values and horizon columns, I have "actuals", which are the true measured values for a particular variable (say temperature), and "normals", which is a smooth curve representing the climatological norm for a particular date. This "normals" column captures the seasonality, trend, and every other repetitive and predictable pattern.

So to be clear I have :

* dates (of forecast emission) | actuals | normals | horizon | forecasts *

And to really emphasise this point : dates, actuals and normals are the same for 14 consecutive rows (One row equals one horizon).

The target I want to predict is the following : forecast - actual_at_forecast_date

So I want to predict the true error observed (say I had predicted 20 (forecast) for today and I measure 18 (actual), then my target is +2).

So far, I've done the following :

- Transform target to remove annual seasonality, long-term trend and level-scaling

- Engineered classic features such as anomaly (actual-normal), lagged anomalies, rolling stats (std, mean, median, quantiles)

- Engineered target encoding features such as target_encoding_horizon_x_month

- RandomForest with max_depth 10-15, min_leaf 10, max features "sqrt", n_estimators 300

My train/val folds are reversed because I wanted to best evaluate on a backcasting framework. I made sure there is no leakage.

FINALLY:

My main problem is that, even with a LOT of feature combinations and a LOT of tuning, my predictions are very shallow and shrink toward the mean (the std, q10 and q90 are off by a lot). Since I am trying to predict forecast_error, which is centered on 0, I am starting to think that I only capture noise, because my predictions really won't fit anything. MAE gets worse at longer forecast horizons, which is only natural, but even for horizon 1 my prediction is only as good as predicting all 0s in terms of MAE. If anyone has ideas that I can explore on my own, I would be so grateful. I know you don't have all the details here, but if you have experience with backcasting and have some recommendations, please share them.

Hey everyone,

I'm working on a time series backcasting problem and I'm running into a fairly stubborn issue. I'd really appreciate any insights from people who have worked on similar setups.

Problem setup

I have daily-issued forecasts with multiple horizons:

  • At each date D, I have forecasts for D+1, ..., D+14
  • Data spans 2020–2026
  • Each row is a unique (forecast_date, horizon) pair

Toy example:

forecast_date  horizon  target_date  forecast  actual  normal
2023-01-01     1        2023-01-02   20        18      19
2023-01-01     2        2023-01-03   21        20      19
...            ...      ...          ...       ...     ...
2023-01-01     14       2023-01-15   25        23      20

Important:

  • forecast_date, actual, and normal are identical across the 14 horizons
  • Only horizon, target_date, and forecast vary

Objective

I want to backcast forecast errors before 2020.

Target:

target = forecast − actual(target_date)

So if forecast = 20 and actual = 18 → target = +2.
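The one subtle step in this definition is that the actual must be looked up at the target date, not at the forecast date. A minimal sketch, assuming the forecasts are held as a list of dicts and the measurements as a date-keyed dict (both hypothetical structures, as is the `build_targets` name):

```python
from datetime import date, timedelta


def build_targets(forecast_rows, actual_by_date):
    """Attach target = forecast - actual(target_date) to each forecast row.

    forecast_rows:  dicts with keys forecast_date, horizon (days), forecast
    actual_by_date: dict mapping a date to the value measured on that date
    Rows whose target date has no measurement are skipped.
    """
    out = []
    for row in forecast_rows:
        target_date = row["forecast_date"] + timedelta(days=row["horizon"])
        actual = actual_by_date.get(target_date)
        if actual is None:
            continue  # nothing to compare the forecast against
        out.append({**row,
                    "target_date": target_date,
                    "target": row["forecast"] - actual})
    return out


# Hypothetical values for illustration only.
actual_by_date = {date(2023, 1, 2): 18, date(2023, 1, 3): 20}
forecast_rows = [
    {"forecast_date": date(2023, 1, 1), "horizon": 1, "forecast": 20},
    {"forecast_date": date(2023, 1, 1), "horizon": 2, "forecast": 21},
    {"forecast_date": date(2023, 1, 1), "horizon": 14, "forecast": 25},
]
targets = build_targets(forecast_rows, actual_by_date)
```

Keeping the lookup keyed on `target_date` also makes it easy to check for leakage: any feature built from `actual_by_date` at or after a row's `target_date` contains the answer.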

Features

  • forecast, horizon
  • actual, normal
  • anomaly = actual − normal
  • lagged anomalies
  • rolling stats (mean, std, quantiles)
  • target encoding (e.g. horizon × month)

Model

Random Forest:

  • max_depth: 10–15
  • min_samples_leaf: 10
  • max_features: sqrt
  • n_estimators: 300

Validation

  • Time-based splits adapted for backcasting
  • No leakage (checked carefully)

Main issue

Predictions are very shallow and collapse toward 0:

  • Very low variance
  • Poor estimation of tails (q10 / q90)
  • Even for horizon = 1, performance is close to predicting constant 0 (in MAE)

MAE increases with horizon (expected), but overall performance remains weak.

Diagnostics

  • std(predictions) / std(target) ≈ 0.4 at best
  • This ratio decreases with horizon

So the model is clearly under-dispersed.
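The two diagnostics above (dispersion ratio and comparison against a constant-0 prediction) can be computed as follows. This is a sketch with hypothetical function names and toy numbers, not the poster's pipeline. One point worth keeping in mind when reading the ratio: a model trained on squared error estimates a conditional mean, and for an exact conditional mean the ratio std(predictions) / std(target) equals the square root of R², so a ratio of 0.4 would correspond to an R² of about 0.16 even for a perfectly specified model.

```python
from statistics import mean, pstdev


def dispersion_ratio(pred, target):
    """std(predictions) / std(target); values well below 1 mean under-dispersion."""
    return pstdev(pred) / pstdev(target)


def mae(pred, target):
    return mean(abs(p - t) for p, t in zip(pred, target))


def mae_skill_vs_zero(pred, target):
    """1 - MAE(model) / MAE(always predict 0); 0 means no better than the baseline."""
    baseline = mae([0.0] * len(target), target)
    return 1.0 - mae(pred, target) / baseline


# Toy forecast errors, and predictions that are correctly signed but shrunk by 60%.
target = [2.0, -1.0, 0.5, -3.0, 1.5]
pred = [0.4 * t for t in target]

ratio = dispersion_ratio(pred, target)
skill = mae_skill_vs_zero(pred, target)
```

Computing both numbers per horizon separates two cases: a low ratio with positive skill suggests a weak but real signal, while a low ratio with skill near 0 suggests the features carry little information about the error.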

Interpretation

At this point I suspect:

  • either the signal is very weak
  • or the model is too conservative and fails to capture amplitude

Any help, feedback, or ideas to explore would be greatly appreciated.

Thanks a lot.

u/Ambitious-Log-5255 — 13 days ago

Careers and job roles.

Hi!

I am 24M, moving to the Netherlands this fall to pursue a master's in Econometrics (Quantitative Finance track) at Erasmus University Rotterdam.

To the people here who have studied Econometrics (or an equivalent degree) or are working in this field:

What jobs are you in now, and which city are you based in?

What do you do on a daily basis?

How much programming and mathematics do you use on a daily basis?

How is the career progression?

I have experience as a SWE and I’m looking to transition into roles which are closer to the markets.

Thanks!

u/lazy_boy_1 — 14 days ago