Run any Falcon model at up to 16k context without losing sanity. Current Falcon inference speed on a consumer GPU: up to 54+ tokens/sec for 7B and 18-25 tokens/sec for 40B at 3-6 bit quantization, roughly 38/sec and ...
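The tokens/sec figures above can be reproduced with a simple timing harness. The sketch below is a minimal, hypothetical example: it times an arbitrary `generate` callable rather than a real Falcon checkpoint (loading one requires a GPU and a library such as `transformers`), so `fake_generate` here is only a stand-in.

```python
import time

def tokens_per_second(generate, n_tokens):
    """Time a token-generation callable and return its throughput."""
    start = time.perf_counter()
    generate(n_tokens)  # produce n_tokens tokens
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator: sleeps ~1 ms per token to simulate decoding.
# Replace with a call into your actual model's generate loop.
def fake_generate(n):
    for _ in range(n):
        time.sleep(0.001)

rate = tokens_per_second(fake_generate, 100)
print(f"{rate:.0f} tokens/sec")
```

Swapping `fake_generate` for a real decode loop gives the same tokens/sec metric quoted above.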