A new study finds that large language models (LLMs), used with straightforward prompting, perform poorly on routine number-crunching tasks that hospital administrators depend on every day to track ...