this post was submitted on 10 Feb 2024

https://arxiv.org/abs/2402.03300

Abstract

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
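The "self-consistency over 64 samples" figure refers to sampling many solutions and keeping the most frequent final answer. Below is a minimal sketch of that voting step, assuming the final answers have already been extracted as strings. The function name `majority_vote` and the sample data are hypothetical illustrations, not taken from the paper's code.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among sampled solutions.

    `answers` is a list of already-extracted final answers (strings).
    Ties are resolved in favor of the answer seen first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize whitespace so "42" and "42 " count as the same answer.
    counts = Counter(a.strip() for a in answers)
    answer, _ = counts.most_common(1)[0]
    return answer


# Hypothetical answers from five sampled solutions to one problem.
samples = ["42", "41", "42 ", "42", "7"]
print(majority_vote(samples))
```

In practice the normalization step matters as much as the vote itself, since mathematically equal answers can be written in different forms; this sketch only strips whitespace.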

https://twitter.com/deepseek_ai/status/1754701472363958581

🚀 DeepSeekMath: Approaching Mathematical Reasoning Capability of GPT-4 with a 7B Model.

Highlights:

Continue pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math tokens from Common Crawl.

Introduce GRPO, a variant of PPO, that enhances mathematical reasoning and reduces training resources.
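As I understand the linked paper, the memory saving in GRPO comes from dropping PPO's separate value (critic) model: several outputs are sampled per question, and each output's reward is compared against the rest of its group. Below is a minimal sketch of that group-relative advantage, assuming one scalar reward per sampled output. The function name, the `eps` term, and the use of population standard deviation are my assumptions, not details confirmed by the post.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per output sampled for the same
    question. The group mean plays the role of the baseline that a
    learned critic would provide in standard PPO.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    # eps avoids division by zero when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four sampled answers: two correct, two wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct answers get a positive advantage and wrong ones a negative advantage, and a group where every output earns the same reward contributes no learning signal.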

More Details: https://arxiv.org/abs/2402.03300

Model Download: https://huggingface.co/deepseek-ai

GitHub Repo: https://github.com/deepseek-ai/DeepSeek-Math
