Generative AI may not help reduce burnout in health care, new research suggests.

Previous research indicated that the growing time doctors spend using electronic health record (EHR) systems and handling administrative responsibilities has become a burden.

Some had heralded artificial intelligence as a potential solution, yet recent investigations by U.S. health systems found that large language models (LLMs) did not simplify clinicians' day-to-day responsibilities.

For instance, a 2023 observational study at Brigham and Women’s Hospital in Boston, Massachusetts, examined the impact of using AI for electronic patient messaging.

Researchers prompted a large language model to respond to simulated questions from cancer patients — then compared its output to responses from six board-certified radiation oncologists.

Medical professionals then edited the AI-generated responses into "clinically acceptable" answers to send to patients.

The study, published in The Lancet Digital Health, found that the LLM drafts posed "a risk of severe harm in 11 of 156 survey responses, and death in one survey response."

"The majority of harmful responses were due to incorrectly determining or conveying the acuity of the scenario and recommended action," the researchers wrote.

Even so, the researchers concluded that LLM-assisted results (those edited by physicians) displayed a "best-of-both-worlds scenario," reducing physician workload while ensuring consistency of responses, improving patient education and ensuring that patients get accurate information.

"These early findings … indicate the need to thoroughly evaluate LLMs in their intended clinical contexts, reflecting the precise task and level of human oversight," the study concluded.

Medical billing codes 

Another study from New York’s Mount Sinai Health System evaluated four different types of large language models for performance and error patterns when querying medical billing codes.

The research, published in the journal NEJM AI, found that all tested LLMs performed poorly on medical code querying, "often generating codes conveying imprecise or fabricated information." 

The study concluded, "LLMs are not appropriate for use on medical coding tasks without additional research." The study was funded by the AGA Research Foundation and the National Institutes of Health (NIH).

Researchers noted that although these models can "approximate the meaning of many codes," they also "display an unacceptable lack of precision and a high propensity for falsifying codes." 

"This has significant implications for billing, clinical decision-making, quality improvement, research and health policy," the researchers wrote.

Patient messages and physicians' time

A third study, from the University of California San Diego School of Medicine and published by JAMA Network, evaluated AI-drafted replies to patient messages and the time physicians spent editing them.

The expectation was that generative AI drafts would reduce the time physicians spend on these tasks, yet the results showed otherwise.

"Generative AI-drafted replies were associated with significantly increased read time, no change in reply time, significantly increased reply length and [only] some perceived benefits," the study found.

Researchers suggested that "rigorous empirical tests" are needed to further assess AI’s performance and patients' experiences.

Doctor's thoughts on AI

David Atashroo, M.D., chief medical officer of Qventus, a Mountain View, California, company that offers an AI-powered surgical management solution, reacted to the research findings in an interview with Fox News Digital. (He was not involved in the research.)

"We see an immense potential for AI to take on lower-risk, yet highly automatable tasks that traditionally fall on the essential yet often overlooked ‘glue roles’ in health care — such as schedulers, medical assistants, case managers and care navigators," he said.

"These professionals are crucial in holding together processes that are directly tied to clinical outcomes, yet spend a substantial portion of their time on administrative tasks like parsing faxes, summarizing notes and securing necessary documentation."

By automating these tasks, Atashroo suggested, generative AI could help improve the efficiency and effectiveness of clinical care.

"When considering the deployment of generative AI, it's crucial to set realistic expectations about its performance," he said. 

"The standard cannot always be perfection, as even the humans currently performing these tasks are not infallible."

In some scenarios, he suggested, AI could serve as a "safety net" to catch oversights by team members.

Tasks may sometimes go unaddressed "simply because there isn't enough time to tackle them," Atashroo noted.

"Generative AI can help manage cases more consistently than our current capacity allows."

Safety and efficacy are "paramount" in AI applications, the doctor also noted.

"This means not only developing models with rigorous quality checks, but also incorporating regular assessments by human experts to validate their performance," he said. 

"This dual-layer verification ensures that our AI solutions are both responsible and reliable before they are scaled."

Atashroo also noted that "transparency in the development and implementation of AI technologies is essential in building trust among hospital partners and patients."