DeepSeek Chat

December 1, 2023

DeepSeek Chat, China's latest ChatGPT rival, launches with a 67B-parameter model.

December 1, 2023 9:13 AM

As ChatGPT celebrates its first birthday this week, Chinese startup DeepSeek AI is challenging its dominance with its own conversational AI offering: DeepSeek Chat.

Launched as an alpha version, DeepSeek Chat aims to offer more advanced, natural-sounding conversation. Built on a 67B-parameter model, it is designed to handle a wide range of conversational tasks.

DeepSeek AI believes its model can outperform ChatGPT in language generation and understanding. The company trained the model on a large-scale dataset and fine-tuned it for high-quality conversation.

To compete with ChatGPT, DeepSeek Chat offers a range of features, including multi-turn dialogue, context awareness, and knowledge integration. It also aims to provide a user-friendly interface and smooth user experience.

DeepSeek AI has already gained attention and support from investors, and the company is confident that DeepSeek Chat can compete with ChatGPT in the Chinese market. It expects the technology to find applications in customer service, virtual assistants, and more.

As the AI industry continues to grow, competition in the conversational AI space is intensifying. DeepSeek AI's entry into the market with DeepSeek Chat signals the increasing demand for advanced conversational AI solutions in China. With its 67B model, DeepSeek Chat aims to set new standards in the field of natural language processing and revolutionize the way we interact with AI systems.

Launched as an alpha test, the assistant taps 7B and 67B-parameter DeepSeek LLMs, trained on a dataset of 2 trillion tokens in English and Chinese. According to benchmarks, both models perform strongly across a range of evaluations, including coding and mathematics, and match (sometimes even outperform) Meta's famous Llama 2-70B.

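The news marks the entry of another Chinese player into the AI race, following recent releases from Qwen, 01.AI and Baidu. DeepSeek said it has open sourced the models, in both base and instruction-tuned versions, to foster further research in academic and commercial communities.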

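The company, founded a few months ago with the aim of unraveling the mystery of artificial general intelligence (AGI) with curiosity, also permits commercial use of the models under certain conditions.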


DeepSeek Chat is accessible via a web interface (like ChatGPT), where users can sign in and interact with the model for a range of tasks. Only the 67B version is available through this interface.

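According to the company, both models are built on the same auto-regressive transformer decoder architecture as Llama, but they differ in their inference approach. The smaller model uses multi-head attention (MHA), which runs the attention mechanism several times in parallel, while the larger model uses grouped-query attention (GQA) to produce results.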
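
To make the MHA/GQA distinction concrete, here is a minimal pure-Python sketch of causal attention in which several query heads can share one key/value head. It is an illustration only, not DeepSeek's code: the function names and the toy tensor layout (lists of heads, each a list of per-position vectors) are assumptions made for this example. When the number of key/value heads equals the number of query heads the function behaves as MHA; with fewer key/value heads it behaves as GQA.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q_head, k_head, v_head):
    """Causal scaled dot-product attention for a single head.

    Each argument is a list of seq_len vectors, all of the same size d.
    """
    d = len(q_head[0])
    out = []
    for t, q in enumerate(q_head):
        # Position t may only look at positions 0..t (autoregressive decoding).
        scores = [
            sum(a * b for a, b in zip(q, k_head[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v_head[s][j] for s, w in enumerate(weights))
            for j in range(d)
        ])
    return out


def grouped_query_attention(q, k, v):
    """q holds n_q_heads heads; k and v hold n_kv_heads heads.

    n_kv_heads == n_q_heads -> ordinary multi-head attention (MHA)
    n_kv_heads <  n_q_heads -> grouped-query attention (GQA)
    """
    n_q_heads, n_kv_heads = len(q), len(k)
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group = n_q_heads // n_kv_heads
    # Query head h reads the key/value head shared by its group.
    return [attend(q[h], k[h // group], v[h // group]) for h in range(n_q_heads)]
```

The practical appeal of sharing key/value heads is that fewer keys and values need to be cached per generated token, which matters more as models get larger.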

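“The 7B model's training involved a batch size of 2304 and a learning rate of 4.2e-4, and the 67B model was trained with a batch size of 4608 and a learning rate of 3.2e-4. We adopted a multi-step training approach that includes pre-training and fine-tuning.”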

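DeepSeek Chat aims to be a flexible tool for a variety of tasks, including question answering, dialogue and programming. It can be used to build applications, enhance existing products or give users a better experience.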

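The DeepSeek Chat development team has tested and validated the model extensively to ensure it gives accurate, useful answers across a range of situations. However, the team also notes that the model may have limitations and flaws, and encourages users to help improve it by providing feedback.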


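“We employ a learning rate schedule in our training process. The learning rate begins with 2000 warmup steps, and then it is stepped to 31.6% of the maximum at 1.6 trillion tokens and 10% of the maximum at 1.8 trillion tokens,” the company wrote on the models' GitHub page.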
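
The schedule described in the quote can be sketched as a small function. This is an illustrative reading, not DeepSeek's training code: the linear shape of the warmup and the 4096-token sequence length used to convert optimizer steps into tokens are assumptions, since the quote states neither. The warmup length, the step points and the percentages come from the quote.

```python
WARMUP_STEPS = 2000                    # from the quote
FIRST_DROP_TOKENS = 1_600_000_000_000  # 1.6 trillion tokens, from the quote
SECOND_DROP_TOKENS = 1_800_000_000_000  # 1.8 trillion tokens, from the quote


def tokens_per_step(batch_size, seq_len=4096):
    """Tokens consumed per optimizer step.

    seq_len=4096 is an assumption for illustration; the quote gives only
    the batch size in sequences.
    """
    return batch_size * seq_len


def multi_step_lr(step, max_lr, tokens_each_step, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Assumed linear ramp from near zero up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    tokens_seen = step * tokens_each_step
    if tokens_seen < FIRST_DROP_TOKENS:
        return max_lr
    if tokens_seen < SECOND_DROP_TOKENS:
        return max_lr * 0.316
    return max_lr * 0.10


if __name__ == "__main__":
    tps = tokens_per_step(2304)  # 7B settings: batch size 2304, max LR 4.2e-4
    for step in (0, 1_999, 100_000, 180_000, 200_000):
        print(step, multi_step_lr(step, 4.2e-4, tps))
```

Under these assumptions the 7B run would hold its peak rate for most of training and drop twice near the end, once after roughly 1.6 trillion tokens and again after roughly 1.8 trillion.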

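In testing, DeepSeek LLM 67B Base demonstrated superior general capabilities, outperforming Llama2 70B Base in areas such as reasoning, coding, math and Chinese comprehension. In fact, the only benchmark where Llama did slightly better was 5-shot trivia QA (79.5 vs. 78.9).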

The chat version of the model, fine-tuned on extra instruction data, also performed well on never-seen-before tests.

For instance, on HumanEval pass@1 for coding it scored 73.78, and on GSM8K 0-shot for mathematics it scored 84.1, sitting right behind GPT-4 and Anthropic's Claude 2.
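
For readers unfamiliar with the metric, pass@k is the probability that at least one of k generated solutions to a problem passes its unit tests. The sketch below uses the unbiased estimator that is commonly paired with HumanEval; the article does not say how DeepSeek computed its score, so treat this as background rather than the company's procedure. The function names are chosen for this example.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of attempts the metric allows (k <= n)

    Returns 1 - C(n - c, k) / C(n, k), i.e. one minus the chance that
    k samples drawn without replacement are all failures.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to the fraction of samples that pass, so a benchmark-level pass@1 is the average per-problem success rate, usually reported as a percentage.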

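That said, despite the strong benchmark results, the DeepSeek model appears to be censored to some degree. In a post on X, a user pointed out that the assistant's answers were automatically removed when the original question concerned China. Instead, the model displayed a message saying the content had been “withdrawn” for security reasons. It is not yet clear whether the base model contains such filters as well.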

Chinese LLMs are going to have a rough time on the open internet.

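I asked a very innocuous question: “I want to learn about modern China.” The system started printing out a response, which was automatically censored after a few seconds even though the content was quite bland.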

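The release of LLMs in a range of sizes marks another notable move. China is advancing rapidly in AI, and its offerings now cover all of the popular model sizes, serving a broad spectrum of end users.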

General-purpose AI offerings announced in recent months include Baidu's Ernie 4.0, 01.AI's Yi 34B, and Qwen's 1.8B, 7B, 14B and 72B models.

More interestingly, some of these models, including Yi 34B, have even outperformed their larger counterparts.

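If a small model can match or outperform a larger one, as Yi 34B did against Llama-2-70B and Falcon-180B, businesses can achieve significant efficiency gains. They can save compute resources while targeting downstream use cases with the same level of effectiveness.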

Just a week ago, Microsoft also shared its work in the same space with the release of its Orca 2 models.