u/Glittering-Pop-7060

What is the best way to quickly extract and identify what a large, dense piece of code contains?

Through its imports? Its global constants or functions? By extracting its structure from the AST? By taking the most frequent identifiers? Or something along those lines?

What can I identify in the code that tells me what it does, instead of having to read the entire content?

I need something lightweight, free, and easy to run.
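For Python files specifically, the ideas above (imports, global constants, functions, AST structure, frequent names) can be sketched with only the standard library `ast` module. This is a minimal sketch, not a definitive tool: the function name `summarize_python`, the `top_n` parameter, and the "all-uppercase name means constant" convention are assumptions made for illustration.

```python
import ast
from collections import Counter


def summarize_python(source: str, top_n: int = 5) -> dict:
    """Return a lightweight structural summary of Python source text.

    Hypothetical helper: the name and the returned keys are illustrative.
    """
    tree = ast.parse(source)
    imports, functions, classes, constants = [], [], [], []

    # Look only at top-level statements; they usually hint at the file's purpose.
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            imports.append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
        elif isinstance(node, ast.ClassDef):
            classes.append(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                # Assumption: UPPER_CASE names are treated as global constants.
                if isinstance(target, ast.Name) and target.id.isupper():
                    constants.append(target.id)

    # Count every variable name used anywhere in the file.
    names = Counter(n.id for n in ast.walk(tree) if isinstance(n, ast.Name))

    return {
        "docstring": ast.get_docstring(tree),
        "imports": imports,
        "functions": functions,
        "classes": classes,
        "constants": constants,
        "frequent_names": [name for name, _ in names.most_common(top_n)],
    }
```

Calling `summarize_python(Path("some_file.py").read_text())` returns a small dictionary that can be printed or dumped to JSON. Files that do not parse raise `SyntaxError`, so a real script would probably catch that and fall back to a plain-text summary.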

reddit.com
u/Glittering-Pop-7060 — 14 days ago

I have around 2000 files (3GB) related to a personal project. They are a mix of code snippets, Markdown notes, and research I gathered with LLMs. The problem is that the files are all over the place, and I do not want to reread everything just to remember what each file was for.

I already have some logs, commit history, message records, and other documentation about parts of the work, but the actual files are still messy and scattered. I need a better way to sort them into more intuitive folders, figure out what is worth keeping or deleting, and identify what is still useful right now.

Filenames and creation dates give some signal, but the file contents usually tell much more. The issue is scale: I do not want to manually inspect ~2000 files.

I am thinking about writing a script that can extract the most important information about each file’s purpose, so I can generate some kind of overview before reorganizing everything.
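A rough sketch of what such an overview script could look like, using only the Python standard library. The function names (`first_line`, `build_overview`, `write_overview`), the CSV columns, and the 4096-byte read limit are all assumptions for illustration, and modification time stands in for creation date because it is available on every platform.

```python
import csv
from datetime import datetime
from pathlib import Path

COLUMNS = ["path", "type", "bytes", "modified", "first_line"]


def first_line(path: Path, limit: int = 120) -> str:
    """Return the first non-blank line found in the first 4096 bytes."""
    try:
        with path.open("rb") as handle:
            head = handle.read(4096)  # avoid loading large files fully
    except OSError:
        return ""
    for line in head.decode("utf-8", errors="replace").splitlines():
        if line.strip():
            return line.strip()[:limit]
    return ""


def build_overview(root: Path) -> list[dict]:
    """Collect one summary row per file under `root`."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime)
        rows.append({
            "path": str(path.relative_to(root)),
            "type": path.suffix.lower() or "(none)",
            "bytes": stat.st_size,
            "modified": modified.isoformat(timespec="seconds"),
            "first_line": first_line(path),
        })
    return rows


def write_overview(rows: list[dict], out_path: Path) -> None:
    """Write the rows to a CSV file that opens in any spreadsheet."""
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Sorting the resulting CSV by type or modification date in a spreadsheet gives a first pass at grouping before any per-format content extraction is added.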

Has anyone dealt with something like this before? What kind of workflow or tools would you recommend for this?

Edit: I imagine Markdown is easy to handle, since I just need to extract the headings, which already provides a lot of relevant information. But I don't know what to do with the text files and the JS, HTML, or Python code.
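The Markdown part could be sketched like this, assuming ATX-style headings (lines starting with one to six `#` characters) and skipping anything inside fenced code blocks so that shell or Python comments are not mistaken for headings. The function name `markdown_outline` is hypothetical, and underlined (Setext) headings are deliberately not handled.

```python
import re

# One to six '#' characters, at least one space, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# Fence markers that open or close a code block.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def markdown_outline(text: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for ATX headings outside code fences."""
    outline = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2)))
    return outline
```

The list of `(level, title)` pairs can be joined into a one-line summary per file, for example the top two heading levels only.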

u/Glittering-Pop-7060 — 17 days ago
