mezark - Hacker News


Articles by mezark

63

What happens when you run a CUDA kernel? (fergusfinn.com)

4 hours ago mezark fergusfinn.com

3

A running list of reasons to move to open source (whyopensource.ai)

a week ago mezark whyopensource.ai

1

MoE inference optimizations: 15% lower expert load by request reordering (doubleword.ai)

a month ago mezark doubleword.ai

1

Tensor Network Attention (mainlymatmul.com)

a month ago mezark mainlymatmul.com

5

Redundant Information in LLM Weights (fergusfinn.com)

a month ago mezark fergusfinn.com

1

tANS: Precomputing rANS (fergusfinn.com)

a month ago mezark fergusfinn.com

2

Also-RANS: Asymmetric Numeral Systems for Entropy Coding (fergusfinn.com)

a month ago mezark fergusfinn.com

4

70x faster cold(ish) starts for SGLang (fergusfinn.com)

2 months ago mezark fergusfinn.com

1

QueueSpec – drafting speculation tokens while a request queues (doubleword.ai)

5 months ago mezark doubleword.ai

1

ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism (mainlymatmul.com)

5 months ago mezark mainlymatmul.com

1

Parallel Primitives for Multi-Agent Workflows (fergusfinn.com)

5 months ago mezark fergusfinn.com

2

New fastest AI Model Gateway – 450x less overhead than LiteLLM (github.com/doublewordai)

8 months ago mezark github.com

4

Should GPUs Make Free Trade Agreements? (doubleword.ai)

9 months ago mezark doubleword.ai

2

Controlled generation of OS LLMs – without impacting latency (youtube.com)

2 years ago mezark youtube.com

3

Takeoff Inference Server Is Now Open Source (github.com/titanml)

2 years ago mezark github.com

4

Falcon 7B running real time on CPU (youtube.com)

2 years ago mezark youtube.com