34
1
Why scikit-learn's fit_transform is probably not for you (stephantul.github.io)
4
Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep (github.com/minishlab)
3
Show HN: Semble – Fast code search for agents with near-transformer accuracy (github.com/minishlab)
1
Show HN: Skeletoken, a Python package for editing model tokenizers (github.com/stephantul)
1
Show HN: PyNIFE. 400-900× speedup for embedding-based retrieval pipelines (github.com/stephantul)
1
Show HN: Skeletoken, a Package for Editing Tokenizers (github.com/stephantul)
2
Turning any tokenizer into a greedy one (stephantul.github.io)
3
Decasing Transformers for Fun (stephantul.github.io)
4
Model2Vec as a Fasttext Alternative (minish.ai)
2
Using overloads to handle union return types in Python (stephantul.github.io)
2
Ask HN: Favourite resources for learning programming type theory?
1
Evaluating ML classifiers using relative error instead of absolute accuracy (stephantul.github.io)
1
Defeat stringly typing without making your users unhappy (stephantul.github.io)
5
Distilling ModernBERT into a static model doesn't work (minishlab.github.io)
4
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/minishlab)
18
Train faster static embedding models with sentence transformers (huggingface.co)
4
Semhash: Fast deduplication and dataset multitool in Python (minishlab.github.io)
5
Model2Vec: Make sentence transformers 500x faster on CPU, 15x smaller (huggingface.co)
6
Show HN: Model2Vec: make sentence transformers 500x faster on CPU, 15x smaller (github.com/minishlab)
3