Articles by stephantul
34

From Chesterton's fence to Chesterton's gap (stephantul.github.io)

1

Why scikit learn's fit transform is probably not for you (stephantul.github.io)

4

Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep (github.com/minishlab)

3

Show HN: Semble – Fast code search for agents with near-transformer accuracy (github.com/minishlab)

1

Show HN: Skeletoken, a Python package for editing model tokenizers (github.com/stephantul)

1

Show HN: PyNIFE. 400-900× speedup for embedding-based retrieval pipelines (github.com/stephantul)

1

Show HN: Skeletoken, a Package for Editing Tokenizers (github.com/stephantul)

2

Turning any tokenizer into a greedy one (stephantul.github.io)

3

Decasing Transformers for Fun (stephantul.github.io)

4

Model2Vec as a Fasttext Alternative (minish.ai)

2

Using overloads to handle union return types in Python (stephantul.github.io)

2

Ask HN: Favourite resources for learning programming type theory?

1

Evaluating ML classifiers using relative error instead of absolute accuracy (stephantul.github.io)

1

Defeat stringly typing without making your users unhappy (stephantul.github.io)

5

Distilling ModernBERT into a static model doesn't work (minishlab.github.io)

4

Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets (github.com/minishlab)

18

Train faster static embedding models with sentence transformers (huggingface.co)

4

Semhash: Fast deduplication and dataset multitool in Python (minishlab.github.io)

5

Model2Vec: Make sentence transformers 500x faster on CPU, 15x smaller (huggingface.co)

6

Show HN: Model2Vec: make sentence transformers 500x faster on CPU, 15x smaller (github.com/minishlab)

3
