u/Chunky_cold_mandala

How do you use LinkedIn for your build in public strategy?

What's been your experience? I've gotten okay engagement with YouTube Shorts, and I'm wondering whether I should give LinkedIn a try and how it's different.

u/Chunky_cold_mandala — 2 days ago
▲ 2 r/Rag

I built a codebase RAG tool that chunks at the function level (AST-free) and queries via SQLite

Standard RAG pipelines are wonky for codebases because they slice text arbitrarily by token count (e.g., every 500 tokens). This rips functions in half, separates decorators from their classes, and destroys the architectural context before the LLM even sees it.

To solve this, I built GitGalaxy (and its blAST engine), a utility that drops arbitrary token slicing and builds the RAG context starting strictly at the function level.

Because it starts at the function level, the telemetry naturally rolls upward to give your RAG agent exact context at any scale:

  1. Functions/Methods roll up into...
  2. Classes/Structs (Entities), which roll up into...
  3. Files (calculating exact Blast Radius and network centrality), which roll up into...
  4. Modules/Folders, up to the global Repository.
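The roll-up above can be sketched as a simple aggregation: every function-level score is added to its entity, file, each enclosing folder, and the repository. This is a hypothetical illustration. The `score` field (for example a line count or complexity estimate), the key format, and `roll_up` itself are assumptions, not the tool's schema.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def roll_up(functions):
    """Sum a per-function score into every enclosing level of the hierarchy.

    functions: iterable of (path, class_name or None, function_name, score).
    """
    totals = defaultdict(float)
    for path, cls, func, score in functions:
        qualified = f"{cls}.{func}" if cls else func
        levels = [f"fn:{path}::{qualified}"]
        if cls:
            levels.append(f"entity:{path}::{cls}")
        levels.append(f"file:{path}")
        # Every enclosing folder, innermost first; "." is the repo root itself.
        levels += [f"dir:{d}" for d in PurePosixPath(path).parents if str(d) != "."]
        levels.append("repo")
        for key in levels:
            totals[key] += score
    return dict(totals)
```

Because the leaves are functions, any level can be queried without re-chunking: a file's total is exactly the sum of its functions, and a folder's total is the sum of its files.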

I built this specifically to give agents a deterministic map rather than a fuzzy embedding search.
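The title says the index is queried via SQLite. As a rough sketch of what a deterministic lookup can look like, the snippet below stores function nodes and call edges in two tables and answers "who calls `save`?" with a join instead of a similarity search. The schema, table names, and helper functions are hypothetical; only the `sqlite3` module calls are standard library.

```python
import sqlite3

# Hypothetical schema: one row per function, one row per call edge.
SCHEMA = """
CREATE TABLE functions (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    entity TEXT,            -- class/struct name, NULL for free functions
    name TEXT NOT NULL,
    start_line INTEGER,
    end_line INTEGER
);
CREATE TABLE calls (
    caller_id INTEGER REFERENCES functions(id),
    callee_id INTEGER REFERENCES functions(id)
);
CREATE INDEX idx_functions_name ON functions(name);
"""


def open_index(rows, call_pairs):
    """Build an in-memory index from function rows and (caller_id, callee_id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO functions VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.executemany("INSERT INTO calls VALUES (?, ?)", call_pairs)
    return conn


def callers_of(conn, name):
    """Return (path, name) for every function that directly calls `name`."""
    return conn.execute(
        """
        SELECT caller.path, caller.name
        FROM calls
        JOIN functions AS caller ON caller.id = calls.caller_id
        JOIN functions AS callee ON callee.id = calls.callee_id
        WHERE callee.name = ?
        ORDER BY caller.path, caller.name
        """,
        (name,),
    ).fetchall()
```

The same question asked twice returns the same rows, which is the property an agent loses with nearest-neighbour retrieval.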

u/Chunky_cold_mandala — 10 days ago

I’m an outsider looking to transition into the deep-tech/systems architecture space. I have a PhD in Pharmacology, 10+ years of wet-lab research (assay development, quantitative data analysis pipelines, microscopy), and a decade as a tenured biology professor.

The Strategy & Portfolio: I know that if I drop my resume into a standard Workday portal, the ATS will instantly banish me to the shadow realm because it says "Biology Professor" instead of "SWE with 4 YOE." To counter this, I’ve spent the last several months building an aggressive "Proof of Work" GitHub portfolio. My goal wasn't just to write code, but to prove I understand professional hygiene: strict CI/CD pipelines, proper Git branching, robust testing, and enterprise-grade documentation. I tackled the hardest, highest-friction problems I could find that were genuinely fun. My repos (which include short video demos of the tech working) currently feature:

* A bare-metal, distributed SCADA middleware for a physical small-parts sorting machine (handling deterministic hardware interrupts).

* A custom AST-free, LLM-free static analysis engine that maps massive enterprise codebases into 3D WebGPU knowledge graphs.

* A genetic evolution engine coupled with a physics simulation to optimize machinery tolerances.

I am 100% transparent that I babysit an AI agent and we ping-pong code and ideas off each other. I architect the physics and the systems logic; the AI acts as my high-speed syntax translator.

The Go-To-Market Plan: Instead of fighting the ATS, my plan is to bypass it entirely. I want to use my GitHub and video demos as a battering ram, sending targeted LinkedIn drops directly to CTOs, Lead Engineers, and VPs with a simple message: "This is my background, I built X to solve Y, I find your team's work fascinating. Want to chat for 10 mins?"

My Questions for the Veterans Here:

* Does this strategy actually stand a chance? In today’s brutal market, will CTOs/Leads actually respect the deep-tech hustle, or will I just get ignored?

* The Resume Dilemma: Should I still bother trying to format a traditional resume to grind through the ATS, or should I go all-in on the direct-networking/portfolio approach?

* The AI Elephant: Is being honest about pair-programming with AI agents a red flag for hiring managers, or is it seen as a standard force-multiplier now, given the complexity of the systems I'm building?

I'm ready for blunt truths. Thanks in advance.

u/Chunky_cold_mandala — 16 days ago

Time is a highly abstract concept. As the brother of someone with cognitive impairments who isn't good with numbers or with geometric representations laid out on dials, I often marvel at how difficult we've made communicating time. Our brains evolved to detect motion. My brother can't read numbers, but he sure understands space, like catching a baseball.

So I built a clock that shows time as physically tangible motion. It's more like a sand timer or a sundial than a clock.

I use an IR remote to set the time and distance, and the clock calculates the speed a bus needs to leave and arrive at the house on time.

GitHub Repository (Source Code & Schematics): https://github.com/squid-protocol/No_number_clock

I use an Arduino Mega board running a custom program. The repo covers:

* The Fractional RPM Algorithm: How to "trick" standard motor libraries into pulsing a 28BYJ-48 stepper motor at an incredibly slow, steady crawl (e.g., 0.3 RPM) without divide-by-zero crashes.
* The Hardware: Integrating an Arduino Mega 2560, a DS3231 Real-Time Clock, and an IR receiver for wireless menu navigation.
* The Math: Calculating precise sub-integer speeds over long distances.
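For context on the divide-by-zero problem: the stock Arduino `Stepper` library's `setSpeed()` takes a whole-number `long` RPM, so 0.3 becomes 0 before it reaches the library's step-delay division. The math below sketches one way around that, written in Python for readability rather than as the repo's Arduino code. It assumes the 28BYJ-48 runs in full-step mode at about 2048 steps per output-shaft revolution (the exact gear ratio varies a little between units) and that the bus's travel distance is already expressed in shaft revolutions. `rpm_for_trip` and `step_interval_us` are hypothetical names.

```python
# Assumption: the 28BYJ-48 is driven in full-step mode at ~2048 steps per
# output-shaft revolution (the gearbox ratio varies slightly between units).
STEPS_PER_REV = 2048


def rpm_for_trip(distance_revs: float, travel_minutes: float) -> float:
    """Fractional RPM needed to move `distance_revs` shaft revolutions in `travel_minutes`."""
    if distance_revs <= 0 or travel_minutes <= 0:
        raise ValueError("distance and travel time must be positive")
    return distance_revs / travel_minutes


def step_interval_us(rpm: float, steps_per_rev: int = STEPS_PER_REV) -> float:
    """Microseconds between single steps at a (possibly fractional) RPM.

    An integer-RPM API truncates 0.3 to 0 and then divides by it; computing the
    interval in floating point and issuing one step at a time avoids that.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60_000_000 / (rpm * steps_per_rev)


if __name__ == "__main__":
    rpm = rpm_for_trip(distance_revs=3, travel_minutes=10)
    print(f"{rpm} RPM -> one step every {step_interval_us(rpm):.2f} us")
```

For example, 3 revolutions over 10 minutes works out to 0.3 RPM, or one step roughly every 97.7 ms. On the microcontroller, the same interval could be compared against `micros()` before pulsing the coils for a single step.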

u/Chunky_cold_mandala — 21 days ago

Hey all,

I built a fast, custom, deterministic regex scanner for another project and realized the underlying engine could help me solve some other annoying problems in my life.

Thought it could be helpful in a jam, if you ever need to scan a massive log on-prem and don't wanna wait hours for your SIEM to index the data.

I recently ran it against a simulated 2.1GB raw production log stream, hunting for specific error signatures:

  • The speed: Completed a single-pass scan in 30.07 seconds.
  • The memory: Minimal. It streams binary and never loads the full file into RAM.
  • The catch: It isolated a simulated coordinated brute-force attack at exactly 14:00, which I had generated with fake_giant_log_with_random_issues.py.

It spits out dynamically scaled ASCII histograms right in the terminal to help you isolate spikes from the millions of lines of background noise:

 === TIME-SERIES: ERROR ===
 (Filtering to Top 15 Highest Volume Spikes)
 [2026-04-16 14:00] ███████████████████████████████████████ (5,759 hits)  <-- ANOMALY SPIKE
 [2026-04-27 14:00] ███████████████████████████████████████ (5,753 hits)  <-- ANOMALY SPIKE
 [2026-05-02 14:00] ███████████████████████████████████████ (5,718 hits)  <-- ANOMALY SPIKE
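The post doesn't say how bars are scaled or how a bucket gets flagged as a spike. Here is a minimal sketch of output in that style, assuming bars are scaled relative to the busiest bucket and a bucket is flagged when it reaches a hypothetical `spike_ratio` multiple of the mean across all buckets. `ascii_histogram` and its parameters are illustrative, not the tool's options.

```python
def ascii_histogram(counts, top_n=15, width=40, spike_ratio=3.0):
    """Render hourly hit counts as scaled ASCII bars.

    counts: {"YYYY-MM-DD HH:00": hits}. Keeps the `top_n` busiest buckets,
    scales bars so the busiest bucket is `width` blocks wide, and flags
    buckets at least `spike_ratio` times the mean of all buckets.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    peak = top[0][1]
    lines = []
    for bucket, hits in sorted(top):  # display chronologically
        bar = "█" * max(1, round(hits / peak * width)) if hits else ""
        flag = "  <-- ANOMALY SPIKE" if mean and hits >= spike_ratio * mean else ""
        lines.append(f"[{bucket}] {bar} ({hits:,} hits){flag}")
    return lines
```

Scaling against the peak keeps the widest bar a fixed width no matter how large the log is, which is what lets a 5,000-hit hour stand out next to background buckets of a few hundred.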

How it works under the hood:

  • Zero-loading: Continuous binary streaming. No DB ingestion required.
  • Flexible targeting: Manual grep-style (-k ERROR TIMEOUT) or automated CI/CD ingestion via JSON.
  • Deterministic: Powered by a custom heuristics engine. No heavy ASTs, no LLM hallucinations.
  • Pipeline ready: Outputs telemetry JSON sidecars if you want to hook it into external dashboards later.
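The bullets above describe single-pass binary streaming, grep-style keyword targeting, and JSON sidecars. Below is a minimal Python sketch of that idea, not the scanner's actual code. It assumes each line begins with a `YYYY-MM-DD HH:MM:SS` timestamp. `scan_stream`, `sidecar_json`, and the JSON field names are hypothetical, and the `keywords` list plays the role of `-k ERROR TIMEOUT`.

```python
import io
import json
import re
from collections import Counter

# Assumption: each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
TS_RE = re.compile(rb"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}")


def _tally(line: bytes, pattern: re.Pattern, hits: Counter) -> None:
    """Count each distinct keyword on one line into its hourly bucket."""
    found = {m.group() for m in pattern.finditer(line)}
    if not found:
        return
    ts = TS_RE.match(line)
    bucket = ts.group(1).decode() + ":00" if ts else "unknown"
    for kw in found:
        hits[(kw.decode(), bucket)] += 1


def scan_stream(fh, keywords, chunk_size: int = 1 << 20) -> Counter:
    """Single pass over a binary stream; memory is roughly chunk_size plus the longest line."""
    pattern = re.compile(b"|".join(re.escape(k.encode()) for k in keywords))
    hits: Counter = Counter()
    carry = b""
    while block := fh.read(chunk_size):
        lines = (carry + block).split(b"\n")
        carry = lines.pop()  # partial last line; finish it with the next chunk
        for line in lines:
            _tally(line, pattern, hits)
    if carry:
        _tally(carry, pattern, hits)
    return hits


def sidecar_json(hits: Counter) -> str:
    """Serialize counts as a JSON sidecar for external dashboards."""
    rows = [{"keyword": k, "bucket": b, "hits": n} for (k, b), n in sorted(hits.items())]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    demo = io.BytesIO(
        b"2026-04-16 13:59:58 INFO ok\n"
        b"2026-04-16 14:00:01 ERROR auth failed\n"
        b"2026-04-16 14:00:02 ERROR TIMEOUT upstream\n"
    )
    print(sidecar_json(scan_stream(demo, ["ERROR", "TIMEOUT"], chunk_size=16)))
```

For a file on disk, the same function would take `open(path, "rb")`. Carrying the partial last line between reads is what keeps the scan single-pass without ever splitting a record across two chunks.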

https://github.com/squid-protocol/gitgalaxy/tree/main/gitgalaxy/tools/terabyte_log_scanning

u/Chunky_cold_mandala — 24 days ago
▲ 10 r/KnowledgeGraph+1 crossposts

Hi everyone,

I’ve spent the last few months building a custom knowledge graph extraction engine (which I call blAST) designed to map the architectural physics of massive software repositories.

Usually, extracting code into a graph requires an Abstract Syntax Tree (AST). The problem is that ASTs are incredibly heavy, strictly monolingual, and fail if a repository doesn't compile. I wanted to map planetary-scale, multi-lingual enterprise systems, so I built a deterministic parser instead. It treats code like text and scans for keyword markers across 50+ languages to build the graph.
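Scanning for keyword markers instead of building an AST can be as simple as one regex per language. The table below is a small, hypothetical subset for finding function definitions (the post doesn't list blAST's markers). It works on raw text, so a file that wouldn't compile still yields its function names. It ignores comments and strings, which a real engine would need to handle.

```python
import re
from pathlib import PurePosixPath

# Hypothetical marker table: one function-definition regex per file extension.
FUNCTION_MARKERS = {
    ".py": re.compile(r"^\s*(?:async\s+)?def\s+(\w+)", re.M),
    ".go": re.compile(r"^func\s+(?:\([^)]*\)\s*)?(\w+)", re.M),
    ".rs": re.compile(r"^\s*(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?fn\s+(\w+)", re.M),
    ".js": re.compile(r"^\s*(?:export\s+)?(?:async\s+)?function\b\s*\*?\s*(\w+)", re.M),
    ".rb": re.compile(r"^\s*def\s+(?:self\.)?(\w+[?!]?)", re.M),
}


def function_names(path: str, text: str) -> list[str]:
    """Return function names found by the marker for this file's extension."""
    marker = FUNCTION_MARKERS.get(PurePosixPath(path).suffix)
    if marker is None:
        return []  # unknown language: no markers, no nodes
    return marker.findall(text)
```

Supporting a new language means adding a row to the table, not writing or installing a parser.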

Here is how the graph ontology and analytics work:

1. The Ontology

  • Nodes: Files, Classes, and Functions.
  • Node Properties: 50+ dimensional vectors of regex keyword hits (e.g., raw memory manipulation, state flux, etc.).
  • Edges: file-level (imports/dependencies) and function-level execution paths (outbound calls/reachability).
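Here is a compact sketch of that ontology: nodes carry a vector of regex hit counts, and edges are typed as imports or calls. The four signal categories and their patterns are hypothetical stand-ins for the engine's 50+ dimensions, and the import regex only understands Python-style `import`/`from` lines.

```python
import re
from dataclasses import dataclass, field

# Hypothetical signal categories standing in for the engine's 50+ dimensions.
SIGNALS = {
    "raw_memory": re.compile(r"\b(?:malloc|calloc|realloc|free|memcpy|unsafe)\b"),
    "state_mutation": re.compile(r"\b(?:global|nonlocal|static|volatile|mut)\b"),
    "concurrency": re.compile(r"\b(?:thread|mutex|lock|async|await|chan)\b"),
    "error_handling": re.compile(r"\b(?:try|catch|except|panic|raise|throw)\b"),
}
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.M)  # Python-style imports only


@dataclass
class Node:
    node_id: str
    kind: str  # "file", "class", or "function"
    props: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str  # "imports" (file level) or "calls" (function level)


def file_node(path: str, text: str) -> Node:
    """Build a file node whose properties are raw regex hit counts per signal."""
    return Node(path, "file", {name: len(rx.findall(text)) for name, rx in SIGNALS.items()})


def import_edges(path: str, text: str) -> list[Edge]:
    """Emit one file-level 'imports' edge per top-of-line import statement."""
    return [Edge(path, module, "imports") for module in IMPORT_RE.findall(text)]
```

Because the property vector has the same dimensions for every language, files written in different languages can be compared directly.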

2. Graph Analytics & Network Topology

Once the graph is built, the engine runs network math over the repository to find architectural bottlenecks. I calculate:

  • Modularity & Average Path Length to measure encapsulation.
  • Articulation Points to find the "God Nodes" (if these fail, the graph shatters).
  • Cyclic Loop Density to measure static friction in the architecture.
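Two of these metrics are easy to sketch with the standard library alone. Below, articulation points are found with Tarjan's DFS low-link method, and average path length is computed with a BFS from every node. Both treat the dependency graph as undirected, which is an assumption; the post doesn't say which view the engine uses. The DFS is recursive for readability, so very large graphs would need an iterative version.

```python
from collections import deque


def _undirected(edges):
    """Adjacency sets for an undirected view of (a, b) edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def articulation_points(edges):
    """Nodes whose removal disconnects the graph (Tarjan's low-link DFS)."""
    adj = _undirected(edges)
    disc, low, result = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot reach above u without passing through u.
                if parent is not None and low[v] >= disc[u]:
                    result.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # A DFS root is a cut vertex only if it has more than one DFS child.
        if parent is None and children > 1:
            result.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return result


def average_path_length(edges):
    """Mean shortest-path hop count over all ordered pairs of reachable nodes."""
    adj = _undirected(edges)
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

In a hub-and-spoke import graph the hub comes out as the single articulation point, which matches the "God Node" intuition: if it fails, every spoke is cut off.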

3. K-Means Clustering on 1.5M Nodes

Because all languages have keywords that mean roughly the same thing, I analyzed 1,000 repos across different languages: I took the regex count vectors of 1.59 million file nodes across 50 languages and ran them through an unsupervised K-Means clustering algorithm. The graph converged into 10 distinct architectural "micro-species" (e.g., UI View Layers, Highly Concurrent State Managers, Unshielded Native Core). The clustering algorithm successfully grouped a complex Java service and a defensive Rust file into the same node category based purely on their physical edge/property behavior.
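For readers unfamiliar with the technique, here is plain Lloyd's-algorithm K-Means over count vectors in pure Python. It is a sketch, not the pipeline behind the 10 clusters. It assumes each file's counts are first normalized to proportions so that large files don't dominate just by being large; the post doesn't say how the vectors were scaled.

```python
import math
import random


def normalize(counts):
    """Turn raw keyword counts into proportions (an assumed preprocessing step)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def kmeans(vectors, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Normalizing to proportions is what lets a short file and a long file with the same keyword "mix" land in the same cluster, regardless of which language produced the counts.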

4. Graph Traversal Use Cases

I used this graph engine to tear down Google DeepMind's original AlphaFold repo. By traversing the graph, the engine instantly isolated the absolute heaviest bottleneck in the network: a single node (contacts_network.py) running an $O(N^6)$ complexity loop holding up the entire pipeline.
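The post doesn't describe the traversal itself. One plausible heuristic, shown here as a hypothetical sketch, weights each node's own cost (for example, the loop-nesting depth behind an $O(N^6)$ loop) by how many nodes transitively depend on it, i.e. its blast radius. The toy graph and costs in the usage example are invented and are not AlphaFold's actual structure.

```python
from collections import defaultdict


def transitive_dependents(edges):
    """Map each node to the set of nodes that transitively depend on it.

    edges are (caller, callee) pairs: the caller depends on the callee.
    """
    reverse = defaultdict(set)
    nodes = set()
    for caller, callee in edges:
        reverse[callee].add(caller)
        nodes.update((caller, callee))
    result = {}
    for node in nodes:
        seen, stack = set(), [node]
        while stack:
            for upstream in reverse[stack.pop()]:
                if upstream != node and upstream not in seen:
                    seen.add(upstream)
                    stack.append(upstream)
        result[node] = seen
    return result


def rank_bottlenecks(edges, cost):
    """Hypothetical score: node cost x (1 + number of transitive dependents)."""
    deps = transitive_dependents(edges)
    scores = {n: cost.get(n, 0) * (1 + len(d)) for n, d in deps.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `pipeline.py -> model.py -> pairwise.py` and a cost of 6 on `pairwise.py`, that node ranks first: it is the most expensive, and everything upstream waits on it.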

Code: https://github.com/squid-protocol/gitgalaxy

Example data from Google DeepMind's AlphaFold: https://squid-protocol.github.io/gitgalaxy/museum-of-code/alphafold_teardown.html

Population data from hundreds of repos: https://squid-protocol.github.io/gitgalaxy/03-04-claim-4-comparing-languages/

u/Chunky_cold_mandala — 8 days ago
▲ 39 r/mainframe+2 crossposts

EDIT: Huge thanks to Sirkitbreak99 in the comments for an architectural reality check. I completely mixed up the CICS transactional environment with the JCL batch processing paradigm here. I’m taking the L on the terminology, pivoting the pitch, and focusing this Forge strictly on batch modernization moving forward!

hey all,

Outsider here: PhD in pharmacology on a very non-traditional path (the journey of how I got here is winding), and I've never been directly employed in mainframes. Anywho,

I’ve been working on a mainframe refactoring suite called GitGalaxy. One of the biggest challenges I’ve run into with legacy CICS applications is that the OS (and security layers such as RACF) usually just sees the massive CICS Server Region as a single black box. The execution intent of each individual COBOL program inside is largely hidden from the batch environment.

I wrote a deterministic static analysis tool (the JCL Forge) to fix this, and I wanted to get this community's thoughts on the approach.

What the tool is doing in the GIF: I pointed it at IBM’s public cics-genapp sample repository. Since it’s a purely transactional app, IBM never wrote batch JCLs for the business logic.

  1. The Forge parses the monolithic COBOL without using ASTs.
  2. It detects the SELECT statements, EXEC CICS, and EXEC SQL boundaries.
  3. It extracts the program and auto-generates a rigid, least-privilege batch JCL wrapper for it (e.g., locking it to explicit VSAM files or DB2 databases).
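Here is a regex-only sketch of those three steps, not the JCL Forge's code. It reads `PROGRAM-ID`, `SELECT ... ASSIGN TO` clauses, and `FILE('...')` options inside `EXEC CICS` blocks, counts `EXEC SQL` blocks, and emits one `DD` per declared file. The job card, the `HLQ` dataset prefix, and `DISP=SHR` are placeholders. Per the edit above, `EXEC CICS` commands can't simply run in batch, and DB2 programs need a DB2 attach, so the sketch only flags those blocks in a JCL comment for review.

```python
import re

PROGRAM_ID_RE = re.compile(r"PROGRAM-ID\.\s+'?([A-Z0-9-]+)", re.I)
SELECT_RE = re.compile(r"\bSELECT\s+([A-Z0-9-]+)\s+ASSIGN\s+TO\s+'?([A-Z0-9-]+)", re.I)
EXEC_CICS_RE = re.compile(r"\bEXEC\s+CICS\b(.*?)END-EXEC", re.I | re.S)
EXEC_SQL_RE = re.compile(r"\bEXEC\s+SQL\b(.*?)END-EXEC", re.I | re.S)
CICS_FILE_RE = re.compile(r"\bFILE\s*\(\s*'([^']+)'\s*\)", re.I)


def generate_jcl(source: str, hlq: str = "HLQ") -> str:
    """Emit a skeleton batch JCL step that names only the files the program declares."""
    match = PROGRAM_ID_RE.search(source)
    program = (match.group(1) if match else "UNKNOWN").upper()[:8]
    cics_blocks = EXEC_CICS_RE.findall(source)
    sql_blocks = EXEC_SQL_RE.findall(source)
    # Files declared in FILE-CONTROL plus files named by CICS file-control commands.
    ddnames = [dd.upper() for _, dd in SELECT_RE.findall(source)]
    ddnames += [f.upper() for block in cics_blocks for f in CICS_FILE_RE.findall(block)]
    lines = [
        f"//{program:<8} JOB (ACCT),'GENERATED',CLASS=A,MSGCLASS=X",
        f"//* {len(cics_blocks)} EXEC CICS and {len(sql_blocks)} EXEC SQL block(s) detected",
        f"//STEP01   EXEC PGM={program}",
    ]
    for dd in dict.fromkeys(ddnames):  # de-duplicate, keep first-seen order
        lines.append(f"//{dd[:8]:<8} DD DSN={hlq}.{dd},DISP=SHR")
    return "\n".join(lines)
```

The point of the output is the allow-list: every dataset the step can touch is spelled out in a `DD` statement, so the program's footprint can be reviewed before anything migrates.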

Why this matters (The Goal): By generating these JCLs, we are dragging transactional logic out of the black box and forcing it to declare its exact execution intent. It creates an auditable "Zero-Trust" boundary where none previously existed, which is super helpful for modern security teams trying to understand legacy footprints before migrating them.

I've got some other COBOL analysis tools too, like a dead code finder and DAG and schema generators that work from raw COBOL files. What do you think?

https://github.com/squid-protocol/gitgalaxy/tree/main/gitgalaxy/tools/cobol_to_cobol

u/Chunky_cold_mandala — 25 days ago