u/Beginning-Fruit-1397

▲ 140 r/Python

How to deal with slop PRs as a maintainer?

Say you are the maintainer of a small (50-100 stars) library.

You see someone fork your repo and mention one of your issues in their commits, so you are happy: someone is taking a real interest in your work!

You take a look at their branch, and there you see pure AI slop: files at the repo root (not in src), tests with print statements even though you use pytest and that's clearly explained in the contributing doc, and purely hallucinated imports like "from my lib import Foo, Bar" even though these two are never mentioned anywhere in the code or the documentation (and thus completely incomprehensible code, with subclasses of these hallucinated types, etc.).

How do I best deal with this without appearing hostile to other potential future contributors?

I want contributors, and I'm very happy when anyone takes a look at my work. But that person has other forks of repos where they just seem to be hunting for "good first issue" labels, so I'm not sure about the value of giving an honest review when it's unclear whether there's a genuine intention to resolve the issue or just to collect cool GitHub points.

EDIT 11h later:

Thanks to everyone who gave their perspective!!

I don't think I have the time to answer everyone right away, but there's a lot of good advice here.

By the way, LMAO, I should have linked my lib to maybe get actual contributors; this post is getting views.

Hint: it's the top-ranked one in this comparison ->
https://www.reddit.com/r/Python/comments/1rj3ct7/a_comparison_of_rustlike_fluent_iterator_libraries/

u/Beginning-Fruit-1397 — 13 days ago
▲ 25 r/rust

I made Option and Result in Rust, for Python (and Iterator, but that one is not fully in Rust yet)

Hello all!

I've been working for a while on a library for Python users, with PyO3 bindings.

github:
https://github.com/OutSquareCapital/pyochain

website doc:
https://outsquarecapital.github.io/pyochain/

Notably, it provides:

- Performant `Option` and `Result` types written in Rust for minimal overhead, and fully type safe (generics + exhaustive pattern matching with type checkers)
- An `Iter` class mimicking Rust's `Iterator`, with lots of fast methods that most of the time call Rust/Cython/C-level compiled code, and an API close to its model, e.g. `Iter::find` returns an `Option`, `Iter::filter_map` filters out the `Null`s, etc.
- Various collection types (`Vec`, `Seq`, `Dict`, `Set`, `SetMut`, `Range`) that interact with the two aforementioned constructs, and with each other, in similar ways, e.g. `Dict::get_item` returns an `Option`
- A hierarchy of classes mimicking `collections.abc` from the Python standard library, which all pyochain objects inherit from. It gives users many options for typing, as well as a lot of shared capabilities between the objects
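
To make the `Option`-returning style concrete, here is a minimal pure-Python sketch. It is not pyochain's implementation or API: `Some`, `Null`, `find`, `filter_map`, and `parse_int` below are hypothetical stand-ins written only to illustrate the idea of returning an explicit "maybe" value instead of `None` or raising.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Some(Generic[T]):
    """Hypothetical stand-in: wraps a value that is present."""

    value: T


@dataclass(frozen=True)
class Null:
    """Hypothetical stand-in: marks the absence of a value."""


Option = Union[Some[T], Null]


def find(items: Iterable[T], predicate: Callable[[T], bool]) -> Option[T]:
    """Return the first matching item wrapped in Some, or Null if none match."""
    for item in items:
        if predicate(item):
            return Some(item)
    return Null()


def filter_map(items: Iterable[T], func: Callable[[T], Option[U]]) -> Iterator[U]:
    """Apply func to each item, keep the Some results, and drop the Null ones."""
    for item in items:
        result = func(item)
        if isinstance(result, Some):
            yield result.value


def parse_int(text: str) -> Option[int]:
    """Example mapper: Some(int) when parsing succeeds, Null otherwise."""
    try:
        return Some(int(text))
    except ValueError:
        return Null()
```

With these stand-ins, `find([1, 2, 3, 4], lambda n: n > 2)` gives `Some(3)`, and `list(filter_map(["1", "x", "3"], parse_int))` gives `[1, 3]`.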

Someone on the Python sub did a comparison 3 months ago between my library and many similar ones, and it was determined to be the best choice (yes, I'm still very happy about this).

See here:

https://www.reddit.com/r/Python/comments/1rj3ct7/comment/o8aordo/?context=3

I hadn't shared it on the Rust sub until now because making the following ->

```python
from pyochain import Null, Option, Some

def foo(x: Option[int]) -> int:
    match x:
        case Some(value):
            return value
        case Null():
            return 0
```

work with type checkers, so that a badly handled case gets flagged immediately, was non-negotiable for me. That is now working!

At the same time I wanted to improve the performance. It's already by far the best compared to all other equivalents, but I'm sure there's still a lot of room for improvement, as I am still a beginner in Rust.

Eventually I want to move all the code to Rust to have builtin-like speed on the whole ecosystem.

Thought it might spark some interest if you happen to use both languages regularly :)

Note that it's still in active development, so expect breaking changes if you start using it.

If that doesn't scare you, here's the PyPI link:
https://pypi.org/project/pyochain/

u/Beginning-Fruit-1397 — 23 days ago

Hello everyone,

For the past two months I've been working on belugas:
https://github.com/OutSquareCapital/belugas

It's a Python dataframe library that aims to provide a Polars-like API to build and execute queries on a DuckDB backend.

Compared to Ibis or Narwhals, the goal is not to be a general multi-backend tool but a specialist one, covering as much as possible of Polars and DuckDB functionality (currently 700+ expression functions, selectors, unpivot, join_asof, geometry datatypes, and more).

Disclaimer:

I'm just a uni student; I have no affiliation at all with DuckDB or Polars.

It's not an "official" project showcase, because even if it works and can already do a lot, it's still WIP.

There's no documentation website yet, nor a PyPI package, nor proper docstrings for all expressions, nor full test coverage (1000+ tests currently, but it should be way more), etc.

I will do a proper showcase at some point in the future, however!

Hence, the motivation behind my post is to ask the following question:

If you were to use this library, would you rather have DuckDB-aligned results after computation, or Polars-aligned ones?

They can diverge quite a lot.

For example, the `millisecond` function doesn't give the same results at all, the `len` function doesn't have the same meaning, column names can be different after a join, etc...
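
As an illustration of how a same-named function can diverge, here is a small sketch using two hypothetical conventions for "millisecond". These are stand-ins written in plain Python, not the actual DuckDB or Polars implementations; check each library's documentation for its real behaviour.

```python
from datetime import datetime


def millisecond_within_second(ts: datetime) -> int:
    """Hypothetical convention A: milliseconds since the start of the second (0-999)."""
    return ts.microsecond // 1000


def millisecond_within_minute(ts: datetime) -> int:
    """Hypothetical convention B: milliseconds since the start of the minute (0-59999)."""
    return ts.second * 1000 + ts.microsecond // 1000


def to_within_second(value_b: int) -> int:
    """Extra step needed to align a convention-B result with convention A."""
    return value_b % 1000


ts = datetime(2024, 1, 1, 12, 30, 45, 123_000)
a = millisecond_within_second(ts)  # 123
b = millisecond_within_minute(ts)  # 45123
aligned = to_within_second(b)      # 123, matches convention A again
```

The `to_within_second` step is the kind of additional work that aligning one backend's results with the other's would add to every such expression.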

I see pros and cons for both.

## DuckDB aligned results

- Best performance-wise when a divergence would need a coalesce, null handling, or a custom column name resolution implementation

- Aligned with DuckDB documentation and backend expectations

- Simpler to implement internally

## Polars aligned results

- Way easier to test (simply check equality with polars computations)

- I expect Polars users to be more interested in belugas than DuckDB users, so these are the results most users would expect

- The possibility to seamlessly switch from `bl.col()` to `pl.col()` in a data pipeline when one tool is preferable to the other (which is a huge point IMO)
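
The "check equality with a reference" testing approach can be sketched as below. This is a generic differential-test helper under stated assumptions: `mismatches`, `reference_mean`, and `candidate_mean` are hypothetical names, and the two mean functions are plain-Python stand-ins for a reference library and a backend under test, not real belugas, Polars, or DuckDB calls.

```python
from collections.abc import Callable, Iterable
from typing import Optional, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def mismatches(
    candidate: Callable[[In], Out],
    reference: Callable[[In], Out],
    cases: Iterable[In],
) -> list[tuple[In, Out, Out]]:
    """Run both functions on every case and collect (case, got, expected) diffs."""
    found = []
    for case in cases:
        got, expected = candidate(case), reference(case)
        if got != expected:
            found.append((case, got, expected))
    return found


def reference_mean(values: list[float]) -> Optional[float]:
    """Stand-in reference: the mean of an empty input is None."""
    return sum(values) / len(values) if values else None


def candidate_mean(values: list[float]) -> Optional[float]:
    """Stand-in backend: the mean of an empty input is 0.0 (a null-handling divergence)."""
    return sum(values) / len(values) if values else 0.0


diffs = mismatches(candidate_mean, reference_mean, [[1, 2, 3], [10], []])
```

Here `diffs` contains a single entry for the empty input, which is exactly the kind of null-handling divergence an equality check against the reference would surface.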

u/Beginning-Fruit-1397 — 1 month ago