Analysis of ChatGPT's Generative AI Technology

ChatGPT and the World of Generative AI

In December 2022, computational biologists Casey Greene and Milton Pividori embarked on an unusual experiment: they enlisted a non-scientist assistant to help them improve three of their research papers. The diligent assistant suggested revisions to various parts of the documents in a matter of seconds, and each manuscript took only about five minutes to review. In one biology paper, the assistant even identified an error in a mathematical equation. The process wasn't always smooth, but the final manuscripts were easier to read, and the service was cheap: less than $0.50 per document.

However, this assistant was not a human but an artificial intelligence (AI) algorithm called GPT-3, first released in 2020. It is one of the widely publicized generative-AI chatbot tools capable of producing convincing, coherent text for tasks such as composing prose, poetry, or computer code, or editing research papers.

The best known of these tools, which are built on so-called large language models (LLMs), is ChatGPT, which gained enormous recognition after its release in November 2022 because it was free and easy to use. Unlike some other generative AIs, ChatGPT focuses on generating text and cannot produce images or audio.

"I'm very impressed," said Pividori, who works at the University of Pennsylvania in Philadelphia. "It will help us improve the efficiency of researchers." Other scientists have reported frequent use of LLMs in their work, not just for editing manuscripts but also for assisting in code writing, code verification, and collaborative brainstorming.

"I use LLMs every day now," said Hafsteinn Einarsson, a computer scientist at the University of Iceland. He started with GPT-3 but switched to ChatGPT, which helps him create presentation slides, grade exams, complete coursework, and transform student papers into research articles. "Many people use it as a digital secretary or assistant," he added.

LLMs are being built into search engines, code-writing assistants, and even chatbots that can negotiate with other companies' chatbots for better prices on products. OpenAI, the creator of ChatGPT, has introduced a subscription service for ChatGPT at $20 per month, promising faster response times and priority access to new features (though a free version remains available). Tech giant Microsoft, which had already invested in OpenAI, announced a further investment in January 2023, reportedly around $10 billion. LLMs seem destined to become part of mainstream text- and data-processing software, and generative AI's place in society seems assured, especially as today's tools represent only the nascent stage of the technology.

Nevertheless, LLMs have raised widespread concerns, ranging from their tendency to generate falsehoods to worries about people passing off AI-generated text as their own. When asked about the potential applications of chatbots like ChatGPT, especially in the scientific field, researchers' excitement is mixed with apprehension. "If you believe this technology has transformative potential, then I think you have to be nervous about it," said Casey Greene, from the University of Colorado School of Medicine. Researchers believe that much will depend on future regulations and guidelines to restrict the use of AI chatbots.

Fluent but Not Always Accurate

Some researchers believe that LLMs are highly useful for expediting tasks such as writing papers or grant proposals, as long as there is human oversight. "Scientists won't sit down to write lengthy introductions for grant applications anymore," said Almira Osmanovic Thunström, a neurobiologist at Sahlgrenska University Hospital in Gothenburg, Sweden, who co-authored a manuscript using GPT-3 as an experiment. She expects that researchers will simply instruct the system to write those sections for them.

Tom Tumiel, a research engineer at the London-based software consulting firm InstaDeep, uses LLMs daily as assistants to help write code. "It's almost like a better Stack Overflow," he said, referring to the popular community website where programmers answer each other's questions.
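
As a concrete illustration of that workflow, the sketch below sends a coding question to a hosted LLM through OpenAI's Python SDK. The model name, prompt, and system message are illustrative assumptions rather than details of Tumiel's setup, and an API key is assumed to be available in the environment.

    # Minimal sketch: asking a hosted LLM a programming question, the way
    # one might otherwise search Stack Overflow. Assumes the `openai`
    # Python package (v1 interface) and an OPENAI_API_KEY environment
    # variable; the model name and prompt are illustrative.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    question = (
        "In Python, how do I read a large CSV file in chunks "
        "with pandas? Show a short example."
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You are a concise coding assistant."},
            {"role": "user", "content": question},
        ],
    )

    # The answer deserves the same scrutiny as any forum post.
    print(response.choices[0].message.content)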

However, researchers emphasize that LLMs are inherently unreliable when it comes to answering questions and can sometimes produce incorrect responses. "We need to remain vigilant when using these systems to generate knowledge," Osmanovic Thunström said.

This unreliability is rooted in how LLMs are built. ChatGPT and its competitors work by learning the statistical patterns of language in enormous databases of online text, including any falsehoods, biases, or outdated knowledge those texts contain. When an LLM receives a prompt (such as Greene and Pividori's requests to rewrite parts of a manuscript), it simply generates, word by word, text that seems plausible in context.
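
A toy sketch makes the mechanism concrete: at each step, a language model assigns a score to every candidate next token and samples a plausible one, and nothing in that loop checks the result against reality. The vocabulary and scores below are invented purely for illustration.

    # Toy illustration of next-token sampling. The "model" scores each
    # candidate token and one is sampled in proportion to its probability;
    # nothing here verifies whether the completed sentence is true.
    # Vocabulary and logits are invented for illustration.
    import math
    import random

    vocab = ["Paris", "London", "Berlin", "bananas"]
    logits = [3.2, 1.1, 0.9, -2.0]  # pretend scores for "The capital of France is ..."

    def softmax(xs):
        m = max(xs)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    probs = softmax(logits)
    print({w: round(p, 3) for w, p in zip(vocab, probs)})
    print("sampled:", random.choices(vocab, weights=probs, k=1)[0])

Run repeatedly, the loop usually emits "Paris", but it will occasionally produce something else with exactly the same fluency, which is the root of the reliability problem.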

As a result, LLMs can easily produce errors and misleading information, especially on technical topics for which they may have had little training data to learn from. LLMs also cannot reliably reveal the sources of their information; if asked to write an academic paper, they might fabricate fictitious citations. "One cannot trust the tool to handle facts correctly or generate reliable references," noted a January 2023 editorial on ChatGPT in Nature Machine Intelligence.
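
One practical safeguard, assuming a suggested reference comes with a DOI, is to check that the DOI actually resolves in a bibliographic registry before trusting it. The sketch below queries Crossref's public REST API using only the Python standard library; the second DOI is deliberately fictitious.

    # Hedged sketch: verify that a DOI suggested by an LLM exists in the
    # Crossref registry before citing it. Standard library only.
    import json
    import urllib.error
    import urllib.parse
    import urllib.request

    def doi_exists(doi: str) -> bool:
        """Return True if Crossref knows this DOI, False otherwise."""
        url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp).get("status") == "ok"
        except urllib.error.HTTPError:
            return False  # Crossref returns 404 for unknown DOIs

    print(doi_exists("10.1038/nature14539"))           # a real paper
    print(doi_exists("10.1234/made-up-by-a-chatbot"))  # expected: False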

Despite these caveats, ChatGPT and other LLMs can be effective assistants for researchers who have enough domain expertise to spot problems directly or to verify answers easily, such as checking whether an explanation or suggestion about computer code is correct.

However, these tools can mislead uninformed users. In December 2022, for instance, Stack Overflow temporarily banned ChatGPT-generated answers because site moderators found themselves flooded with incorrect but seemingly convincing answers submitted by enthusiastic users. For search engines, the same failure mode could be a nightmare.

Addressing the Shortcomings

Some LLM-powered search tools, such as the research-focused Elicit, try to address the attribution problem by first using the query to retrieve relevant literature and then writing a brief summary of each website or document that the engine finds. The result is output that explicitly cites its sources, although an LLM may still mis-summarize any individual document.
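
The general pattern, retrieve first and generate second, can be sketched in a few lines. In the toy version below, the three-document corpus, the keyword-overlap scoring, and the summarize() stub standing in for an LLM call are all illustrative assumptions; Elicit's real pipeline is considerably more sophisticated.

    # Sketch of a retrieve-then-summarize pipeline in the spirit of tools
    # like Elicit: find candidate documents first, then summarize each one
    # with an explicit pointer back to its source.

    corpus = {
        "doc1": "Large language models can fabricate citations when asked for references.",
        "doc2": "Retrieval-augmented generation grounds model output in source documents.",
        "doc3": "Optical microscopy resolves structures down to roughly 200 nanometres.",
    }

    def score(query: str, text: str) -> int:
        """Crude relevance score: count words shared between query and text."""
        return len(set(query.lower().split()) & set(text.lower().split()))

    def summarize(text: str) -> str:
        """Hypothetical stand-in for an LLM summarization call."""
        return text.split(".")[0] + "."

    def answer(query: str, k: int = 2):
        ranked = sorted(corpus, key=lambda d: score(query, corpus[d]), reverse=True)
        # Every summary carries an explicit citation back to its document.
        return [(doc, summarize(corpus[doc])) for doc in ranked[:k]]

    for doc, summary in answer("how is generation grounded in source documents"):
        print(f"[{doc}] {summary}")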

Companies developing LLMs are well aware of these problems. In September 2022, Google subsidiary DeepMind published a paper on a dialogue agent called Sparrow, which is designed to cite sources for its answers; Demis Hassabis, the company's CEO and co-founder, later told Time magazine that DeepMind planned to release it in a private beta in 2023. Other competitors, such as Anthropic, say they have addressed some of ChatGPT's issues.

Some scientists argue that ChatGPT, which lacks training on sufficiently specialized content, cannot yet be relied on for technical subjects. Kareem Carr, a biostatistics doctoral student at Harvard University in Cambridge, Massachusetts, found its answers lacking when he experimented with it at work. "I think ChatGPT struggles to reach the level of specificity I need," he said. Even so, Carr noted that when he asked ChatGPT for 20 ways to solve a research problem, it returned mostly nonsense but also one helpful idea: a statistical term he had never heard of, which pointed him to a new area of the academic literature.

Some tech companies are training chatbots on specialized scientific literature, though they have run into their own issues. In November 2022, Meta, the parent company of Facebook, released Galactica, an LLM trained on scientific abstracts and papers with the aim of helping researchers summarize the literature, answer questions, and draft scientific text. Meta pulled the public demo just days later, after users prompted it to produce confident-sounding inaccuracies.

While LLMs have the potential to enhance scientific productivity, they also bring forth ethical and legal dilemmas. Scientists fear the potential for fraudulent and unethical practices, such as submitting AI-generated manuscripts to journals or using AI to automate the creation of fake peer reviews. Although some believe that current regulations and peer review processes would likely catch such misconduct, others are concerned that the sheer volume of manuscripts and reviews could overwhelm traditional quality control systems.

"It's a very complicated problem," said James Heathers, an applied physiologist at the University of Sydney in Australia. "As these things get better and better, the baseline is going to be, 'Did you use a language model to help you? Yes or no?' But that's not particularly informative. The question is, 'What did you use it for?'"

Concerns also extend to intellectual property. Text produced by ChatGPT is machine output, and it is unclear whether, or to whom, copyright can apply. The question of ownership becomes especially murky when AI is involved in the creative process, and some legal experts predict that new legal frameworks will be needed to determine rights over AI-generated content.

Data privacy is another issue. The prompts that users type into ChatGPT and other LLMs, and the outputs they receive, can be recorded by the companies developing these systems. Researchers have questioned whether data from scientific queries might be monetized or used for other purposes.

Future Prospects and Challenges

Generative AI, as exemplified by ChatGPT and its contemporaries, represents a double-edged sword in scientific research. While it can significantly accelerate tasks like writing papers, proposals, and code, it also introduces risks of errors, inaccuracies, and ethical dilemmas.

As these tools become more integrated into the scientific workflow, researchers will need to adopt best practices for their use, which may involve careful validation, fact-checking, and oversight. Regulatory bodies, academic institutions, and publishers may need to develop guidelines and mechanisms to ensure the responsible use of generative AI in research.

Moreover, researchers and developers must continue working on improving the reliability, accuracy, and ethical aspects of LLMs. This includes refining their training data, addressing biases, enhancing source attribution, and providing users with more control and transparency over the generative process.

In conclusion, ChatGPT and other generative AI models hold great promise for revolutionizing scientific research and other fields. However, their potential must be harnessed responsibly to avoid pitfalls and ensure that they contribute positively to the advancement of knowledge and innovation.
