u/Open-Ease685

A clean CSV makes it feel like the scraping part is done.

Usually, that is where the trouble starts.

Most web-scraped analysis breaks before the model, before the dashboard, before the “insight.”

A few examples:

Blank fields can mean different things.

Maybe the site did not list the value. Maybe your selector broke. Maybe the crawl got blocked. Maybe the page layout changed. If those blanks mostly happen on one region, vendor, or page type, that is not just missing data. That is bias.
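
A quick way to see whether blanks cluster somewhere is to compute the blank rate per group. This is a minimal sketch using only the standard library; the field names (`region`, `price`) and the sample rows are made up for illustration.

```python
from collections import defaultdict

def missing_rate_by_group(rows, field, group_key):
    """Return {group: share of rows in that group where `field` is blank}."""
    totals = defaultdict(int)
    blanks = defaultdict(int)
    for row in rows:
        group = row.get(group_key, "unknown")
        totals[group] += 1
        value = row.get(field)
        # Treat None, empty strings, and whitespace-only strings as blank.
        if value is None or str(value).strip() == "":
            blanks[group] += 1
    return {group: blanks[group] / totals[group] for group in totals}

# Hypothetical scraped rows.
rows = [
    {"region": "north", "price": "10.50"},
    {"region": "north", "price": "12.00"},
    {"region": "south", "price": ""},
    {"region": "south", "price": None},
    {"region": "south", "price": "9.99"},
]
rates = missing_rate_by_group(rows, "price", "region")
```

If one group sits far above the rest, that points at a selector or blocking problem on that page type rather than values the site simply does not list.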

Dates can look valid and still be wrong.

03/04/2024 is March 4 in one place and April 3 in another. A parser will not always throw an error. Sometimes it just gives you the wrong date very confidently.
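
One way to catch this is to parse each date both ways and flag the ones where both readings are valid but different. A rough sketch with the standard `datetime` module; the function names are my own.

```python
from datetime import datetime

def parse_both_ways(text):
    """Parse a slash date as month-first and as day-first; None where invalid."""
    results = {}
    for label, fmt in (("month_first", "%m/%d/%Y"), ("day_first", "%d/%m/%Y")):
        try:
            results[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            results[label] = None
    return results

def is_ambiguous(text):
    """True when both readings are valid and give different dates."""
    parsed = parse_both_ways(text)
    a, b = parsed["month_first"], parsed["day_first"]
    return a is not None and b is not None and a != b
```

`is_ambiguous("03/04/2024")` is true, while `25/04/2024` only parses day-first. If a column contains even a few dates that only parse one way, that is decent evidence for the format of the whole column.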

URLs are messy.

HTTP, HTTPS, www, no www, trailing slashes, tracking parameters, session tokens. Same page, five URLs. Count too early and you are measuring URL noise, not entities.
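
Here is a small normalization sketch using `urllib.parse`. The list of tracking parameters and the decision to force `https` and drop `www.` are assumptions; some sites really do serve different content on those variants, so adjust the rules per source.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of parameters that never change the page content.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid", "sessionid",
}

def normalize_url(url):
    """Collapse common URL variants of the same page into one form."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop trailing slashes, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Remove tracking parameters and sort the rest for a stable order.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    # Scheme is forced to https and the fragment is dropped (assumptions).
    return urlunsplit(("https", host, path, urlencode(kept), ""))
```

Dedupe and count on `normalize_url(url)`, and keep the original URL in a separate column so nothing is lost.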

That is the annoying part about scraping. The pipeline can look fine while the dataset is already off.

The real validation is not “did the script run?”

It is:

Does the scraped field match the rendered page?

Are the row counts close to what you expected?

Are missing values clustered somewhere suspicious?

Did the site layout change across page types?

Did you dedupe before counting?

Bad extraction does not get fixed later with a better model.

Before analyzing scraped data, first check what you actually scraped.

u/Open-Ease685 — 14 days ago
▲ 2 r/EnergentAI+2 crossposts

A lot of structural analysis reviews treat FEA/test correlation like a pass/fail check:

Simulation close to test result = model validated.

But sometimes the model matches because it’s wrong in two ways that cancel each other out.

Examples:

  • Boundary conditions too stiff, but material modulus too low
  • Bonded contact too stiff, but fixture compliance missing
  • Missing bolt preload offset by friction set too high
  • Coarse mesh hiding stress peaks while over-constrained supports inflate stress elsewhere

Each can make one metric look “right.” Peak displacement matches. Max stress looks reasonable. The contour plot looks convincing.

But the load path can still be wrong.

That’s the dangerous part. The model may match the first test, then fail on the next design change because it never captured the real physics.

A better check is to perturb assumptions one at a time: fixture stiffness, friction range, contact behavior, preload, mesh density. If several different assumption sets can all be tuned to match the same test number, that number didn’t really validate the model.

Good correlation should be pattern-based, not just scalar-based. It’s much harder to fake displacement, strain distribution, reaction forces, failure location, and deformation shape all at once.
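
A toy illustration of the cancellation problem, not an FEA model: two springs in series standing in for a fixture and a part. All stiffness and load values are made up. The "tuned" model has a fixture that is too stiff and a part that is too soft, picked so the tip displacement still matches.

```python
import math

def series_response(load, k_fixture, k_part):
    """Two springs in series: return (tip displacement, interface displacement)."""
    interface = load / k_fixture
    tip = interface + load / k_part
    return tip, interface

LOAD = 100.0  # N, made-up test load

# "reference" stands in for the real hardware. "tuned" is wrong in two ways
# that cancel at the tip: fixture too stiff, part too soft. Units: N/mm.
reference = {"k_fixture": 2000.0, "k_part": 1000.0}
tuned = {"k_fixture": 4000.0, "k_part": 800.0}

ref_tip, ref_interface = series_response(LOAD, **reference)
tuned_tip, tuned_interface = series_response(LOAD, **tuned)

# Design change: the part becomes twice as stiff in both models.
ref_tip_v2, _ = series_response(LOAD, reference["k_fixture"], 2 * reference["k_part"])
tuned_tip_v2, _ = series_response(LOAD, tuned["k_fixture"], 2 * tuned["k_part"])

print(math.isclose(ref_tip, tuned_tip))              # tip displacement matches
print(math.isclose(ref_interface, tuned_interface))  # interface does not
```

Both models give 0.15 mm at the tip, but the interface displacement is 0.05 mm versus 0.025 mm, and after the design change the tip predictions split to 0.10 mm versus 0.0875 mm. One scalar matched; the second measurement would have exposed the problem.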

The better question is not “does the number match?”

It’s “which assumption is driving the mismatch?”

Matching one test result should be the start of validation, not the end.

u/Open-Ease685 — 1 day ago

The shift is real, but the usual “e-commerce killed stores” framing is too simple.

U.S. e-commerce went from 0.6% of retail sales in Q4 1999 to 11.2% in Q4 2019, then 16.4% by Q3 2025.

That’s a big move, but it wasn’t smooth.

Before COVID, online retail was already gaining share steadily, especially through the 2010s. It had stopped being a niche and had become a normal growth channel.

Then COVID messed up the chart.

In Q2 2020, e-commerce share jumped from 11.9% to 16.3% in one quarter. That wasn’t some clean adoption curve. A lot of people were buying online because they had no better option.

Then some of it reversed. By 2022, e-commerce share was back around 14.2%.

So I don’t think the pandemic “changed everything forever.” But it also didn’t change nothing. It pulled some adoption forward, then gave part of it back.
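
The back-of-envelope version, using only the percentages quoted above. Note the "retained" number also includes whatever underlying growth happened over those two years, so it overstates the pure COVID effect.

```python
# Shares of U.S. retail sales, in percent, as quoted in this post.
pre_spike = 11.9       # before the Q2 2020 jump
spike = 16.3           # Q2 2020
after_pullback = 14.2  # roughly where 2022 landed

jump = spike - pre_spike               # percentage points gained in the spike
given_back = spike - after_pullback    # percentage points lost afterwards
retained = after_pullback - pre_spike  # still above the pre-spike level

share_given_back = given_back / jump

print(f"jump: {jump:.1f} pp, given back: {given_back:.1f} pp, "
      f"retained: {retained:.1f} pp ({share_given_back:.0%} of the jump reversed)")
```

So a bit under half of the 4.4-point jump was given back, which fits "pulled forward, then partly returned" better than either extreme.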

Since 2023, the share has been climbing again, just more slowly. It went from 15.0% in Q1 2023 to 16.4% in Q3 2025.

So the story is probably:

  • online is still gaining share
  • the pandemic spike was not fully permanent
  • the post-pandemic pullback was a partial giveback, not a reversal of the long-run trend
  • the current trend is slower, but still positive

For retailers, that matters. Treating 2020 as the new normal would have been a mistake. Treating the 2021–2022 pullback as proof that e-commerce stalled would also be a mistake.

The more realistic version is boring but useful: e-commerce has been taking share for 25 years, got a temporary COVID boost, gave some of it back, and is now back to grinding upward.

u/Open-Ease685 — 22 days ago

If you’re like me and linking a 12-month order sheet to a delivery sheet by supplier and material thickness, the main trade-off is speed vs. durability. A workbook can look fine early on, then fall apart once new months get added or source data gets messy.

How the formulas compare

SUMIFS is usually best for numeric outputs like delivered quantity, open quantity, or totals by supplier and thickness. It handles multiple criteria cleanly and is usually the most reliable.

XLOOKUP is better for returning one field, like status or promised date. It works for multi-key joins, but only if each key combination is unique. With duplicates it returns the first match it finds and gives no warning.

INDEX/MATCH still works, but it is harder to audit and usually not the best choice unless you need compatibility with older Excel versions.

Why lookups fail or return wrong values

The common issues are:

  • extra spaces or inconsistent supplier names
  • thickness stored as text in one sheet and number in another
  • duplicate supplier and thickness combinations
  • fixed ranges that do not expand with new data

The bigger risk is often not #N/A. It is a formula returning the wrong match without obvious signs.
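
If you can export the sheets, the same checks are easy to script outside Excel. This is a Python sketch rather than a workbook formula, and the column names (`supplier`, `thickness`) and sample rows are assumptions. It normalizes the join key and lists any supplier and thickness combinations that appear more than once.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation

def make_key(supplier, thickness):
    """Build a join key that ignores spacing, case, and text-vs-number thickness."""
    # Collapse repeated spaces, trim the ends, and ignore case.
    name = " ".join(str(supplier).split()).casefold()
    try:
        # Decimal("2.0") == Decimal("2"), so "2.0" and 2 produce the same key.
        thick = Decimal(str(thickness).strip())
    except InvalidOperation:
        thick = None  # Not a number at all; worth inspecting by hand.
    return (name, thick)

def duplicate_keys(rows):
    """Return {key: count} for keys that occur more than once."""
    counts = Counter(make_key(r["supplier"], r["thickness"]) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical delivery rows with the usual mess.
deliveries = [
    {"supplier": "Acme Steel", "thickness": 2},
    {"supplier": "acme  steel ", "thickness": "2.0"},
    {"supplier": "Borealis", "thickness": 3},
]
dupes = duplicate_keys(deliveries)
```

Any key in `dupes` is a place where a single-value lookup would pick one row and hide the other, so those are the rows to sum instead of look up.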

How to structure the workbook

The cleanest setup is:

  • one flat Orders table
  • one flat Deliveries table
  • a separate Report sheet
  • Excel Tables instead of hardcoded ranges
  • a helper key if needed, like Supplier + Thickness + Month

This makes it much easier to add new monthly data without breaking the report layout.

u/Open-Ease685 — 28 days ago
▲ 2 r/EnergentAI+3 crossposts

I’ve been trying to make cleaner, more readable graphs lately and realized most default tools don’t look that great out of the box.

Excel works, but it often ends up looking… basic.

Some tools look better, but take way more effort to learn.

So I’m curious what people actually use in practice:

  • what you consistently go back to
  • what gives you good results without too much friction
  • what you’d recommend to someone who cares about how charts actually look

Bonus if you’ve switched tools and noticed a big difference.
u/Open-Ease685 — 1 day ago
▲ 8 r/EnergentAI+1 crossposts

I’ve been working on extracting structured data from directory-style websites like media listings, product catalogs, and radio directories, and it’s way less straightforward than it looks. Here’s what I’ve learned through this self-inflicted journey:

1. Static parsing vs headless browsers

If the data is in the raw HTML, use a simple parser. It’s fast, cheap, and easy to scale.

Headless browsers like Playwright or Puppeteer are only worth it if the site is heavily JS driven. Otherwise you’re burning CPU and RAM for no real gain.
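
A cheap test before reaching for a browser: fetch the raw HTML once, then check whether a few values you can see on the rendered page show up in its text. A sketch with the standard `html.parser`; the sample markup is invented.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def visible_in_raw_html(raw_html, expected_values):
    """Return {value: True/False} for whether each value appears in the static text."""
    parser = TextCollector()
    parser.feed(raw_html)
    text = " ".join(parser.chunks)
    return {value: value in text for value in expected_values}
```

If the values are all there, a plain parser is enough. If they only appear inside a script payload or not at all, the page is probably rendered client-side, and the next thing to try is the underlying API rather than a full browser.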

2. Picking the “real” URL

Directories often list multiple links for the same item, like mirrors, redirects, or regional versions.

You need a consistent rule for what counts as the primary URL. Usually this means using canonical tags or prioritizing certain domains. Everything else should be stored as alternatives, not separate entries, or your dataset gets messy fast.
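
One possible rule, sketched below: use the canonical URL if the page declares one (for example in a `<link rel="canonical">` tag, extracted elsewhere), otherwise take the candidate from the highest-priority domain, otherwise the first one listed. The domain list is hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical priority order; earlier entries win.
PREFERRED_DOMAINS = ["example.org", "example.com"]

def _host(url):
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def pick_primary(candidates, canonical=None):
    """Return (primary_url, alternative_urls) for one directory entry."""
    unique = list(dict.fromkeys(candidates))  # drop exact repeats, keep order
    if not unique and not canonical:
        return None, []
    if canonical:
        primary = canonical
    else:
        def rank(url):
            host = _host(url)
            if host in PREFERRED_DOMAINS:
                return PREFERRED_DOMAINS.index(host)
            return len(PREFERRED_DOMAINS)
        # min() keeps the earliest candidate when ranks tie.
        primary = min(unique, key=rank)
    alternatives = [url for url in unique if url != primary]
    return primary, alternatives
```

The point is less the specific rule than having one, applying it everywhere, and storing the alternatives alongside the primary instead of as their own rows.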

3. Pagination vs infinite scroll

Pagination is easy. You iterate pages and you’re done.

Infinite scroll is trickier, but the better approach is to skip the UI and look for the underlying API calls. Once you find those, it behaves like normal pagination again.
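
Once you have found the endpoint, the loop is the same as for pages. This sketch assumes an offset/limit style API, which is only one common shape (many use cursors or page tokens instead). `fetch_page` is a placeholder you would write around your HTTP client.

```python
def iter_items(fetch_page, page_size=50, max_pages=1000):
    """Yield items from an offset/limit endpoint until it runs dry.

    `fetch_page(offset=..., limit=...)` is a hypothetical callable that
    returns a list of items for one request.
    """
    offset = 0
    for _ in range(max_pages):  # hard stop in case the API never ends
        batch = fetch_page(offset=offset, limit=page_size)
        if not batch:
            return
        yield from batch
        if len(batch) < page_size:  # a short page means this was the last one
            return
        offset += len(batch)

# Stand-in for the real endpoint, so the loop can be tried offline.
FAKE_DATA = list(range(120))

def fake_fetch(offset, limit):
    return FAKE_DATA[offset:offset + limit]

items = list(iter_items(fake_fetch, page_size=50))
```

Keeping the fetch function separate also makes it easy to add delays and retries in one place.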

4. Validating what you extract

Just because you scraped a URL doesn’t mean it’s usable.

You’ll want to check if it responds properly, if it redirects somewhere unexpected, and if the content type matches what you expect.

Deduping also matters a lot, otherwise you end up storing the same thing multiple times.
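
The checks themselves can be a small pure function that takes whatever your HTTP client reports (status, final URL after redirects, `Content-Type` header) and returns a list of problems. The function name and the default expected type are my own choices.

```python
from urllib.parse import urlsplit

def _host(url):
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def check_link(requested_url, final_url, status, content_type,
               expected_types=("text/html",)):
    """Return a list of problems; an empty list means the link looks usable."""
    problems = []
    if not 200 <= status < 300:
        problems.append(f"status {status}")
    # A redirect to a different host is often a parked or sold domain.
    if _host(requested_url) != _host(final_url):
        problems.append(f"redirected to {_host(final_url)}")
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in expected_types:
        problems.append(f"unexpected content type {media_type or 'missing'}")
    return problems
```

For stream or feed URLs, pass a different `expected_types` tuple instead of the HTML default.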

5. Not getting blocked

If you go too fast, you will get rate limited or blocked.

Basic things still matter like respecting robots.txt, adding delays, and backing off when you hit limits.
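
Two of those basics, sketched with the standard library: a robots.txt check via `urllib.robotparser`, and an exponential backoff delay with jitter. The base and cap values are arbitrary starting points, not recommendations from any particular site.

```python
import random
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`, and the actual delay
    is a random fraction of it so parallel workers do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

In the request loop, call `time.sleep(backoff_delay(attempt))` after a 429 or 503 and reset `attempt` after a success. If the server sends a `Retry-After` header, that should take priority over the computed delay.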

You

u/Open-Ease685 — 23 days ago

They said they would cut the 2x usage bonus and further reduce the 5-hour limits, but consumption has risen to 10x or 15x what it was before. Codex has become useless for Plus users: two simple prompts now use 75% of the 5-hour limit. There’s no point in paying anymore, so I’m probably switching to Claude soon.

u/Open-Ease685 — 1 month ago
▲ 2 r/EnergentAI+1 crossposts

I’ve been using a bunch of different models pretty heavily lately, and the one thing that keeps bugging me is how obsessed people seem to be with which model is best.

Benchmarks, rankings, tiny differences in reasoning scores…

But honestly, when you actually use this stuff day to day, that’s not what makes the biggest difference. This is (IMO):

Speed:
If a model is even a bit slower, it gets annoying fast.
Like it actually breaks your flow when you’re working.

Cost:
Some of these models are way more expensive, and if you’re using them a lot it adds up quicker than you expect.

Consistency:
Getting a great answer once is easy.
Getting something solid every time is a lot harder, and way more important.

Prompting:
Same model can feel insanely good or completely useless depending on how you ask things.

Switching to a “better” model usually doesn’t change as much as people think.

It’s not like everything suddenly becomes 10x better.

Curious if this is just me or if others have noticed the same thing?

u/Open-Ease685 — 23 days ago