Machine Learning


This subreddit is temporarily closed in protest of Reddit killing third party apps, see /r/ModCoord and /r/Save3rdPartyApps for more information.

26
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Seankala on 2024-04-01 05:12:58.


I recently made custom BERT and ELECTRA models for the fashion domain that could also handle English and my own native language (I'm not in the US). I noticed that performance wasn't as good as I anticipated and felt that it wasn't worth it.

Are there any papers or resources regarding when it's worth it to create your own pre-trained LM from scratch? I recall reading a paper for the biomedical domain a long time ago titled Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art (Lewis et al., 2020) that seems to show that pre-training from scratch can help with biomedical and clinical tasks but am not sure if there are any other papers out there.

Also, are there any tips or good-to-know things when assessing a newly pre-trained LM? For example, checking OOV rate etc.
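One quick assessment signal mentioned above, the OOV rate, can be measured by checking how much of a domain corpus the tokenizer's vocabulary covers. A minimal sketch (the tiny vocabulary and sample sentence are made up; with a real subword tokenizer you would count unknown-token ids instead):

```python
# Sketch: estimating a tokenizer's out-of-vocabulary (OOV) rate on domain
# text. The vocabulary and sample below are hypothetical placeholders.

def oov_rate(tokens, vocab):
    """Fraction of tokens not covered by the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocab) / len(tokens)

vocab = {"oversized", "denim", "jacket", "with", "slim-fit", "trousers"}
sample = "oversized denim jacket with raw hem".split()

print(f"OOV rate: {oov_rate(sample, vocab):.2%}")  # 2 of 6 tokens are unseen
```

A high OOV rate on in-domain text is one signal that a custom vocabulary (and hence pre-training from scratch) might pay off.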

Thanks in advance.

27
 
 

The original was posted on /r/machinelearning by /u/ninvibe on 2024-03-31 18:51:23.


Hey guys, from your experience, what do hiring managers at companies prefer: great implementations of papers, or great practical projects? I know both have benefits, pros and cons, etc. But what do the managers here on Reddit like to see when going through repos? Would one of these carry more weight than the other when evaluating a candidate's skills?

28
 
 

The original was posted on /r/machinelearning by /u/ghosthamlet on 2024-03-31 06:32:28.


Post:

Here we’ll discuss:

The advantages (and disadvantages) of Mamba (🐍) vs Transformers (🤖),

Analogies and intuitions for thinking about Mamba, and

What Mamba means for Interpretability, AI Safety and Applications.

This post was originally posted on Kola's personal blog.

29
 
 

The original was posted on /r/machinelearning by /u/we_are_mammals on 2024-03-31 06:06:43.


... In a presentation earlier this month, the venture-capital firm Sequoia estimated that the AI industry spent $50 billion on the Nvidia chips used to train advanced AI models last year, but brought in only $3 billion in revenue.

Source: WSJ (paywalled)

30
 
 

The original was posted on /r/machinelearning by /u/Amgadoz on 2024-03-30 21:23:22.


Hey everyone!

I recently compared all the open source whisper-based packages that support long-form transcription.

Long-form transcription is basically transcribing audio files that are longer than 30 seconds.

This can be useful if you want to chat with a YouTube video or podcast, etc.

I compared the following packages:

  1. OpenAI's official whisper package
  2. Huggingface Transformers
  3. Huggingface BetterTransformer
  4. FasterWhisper
  5. WhisperX
  6. Whisper.cpp

I compared them in the following areas:

  1. Accuracy - using word error rate (WER) and character error rate (CER)
  2. Efficiency - using VRAM usage and latency
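For reference, WER is the word-level edit distance between the reference transcript and the hypothesis, normalized by the reference length. A minimal pure-Python sketch of the definition (libraries such as jiwer implement the same metric):

```python
# Word error rate: Levenshtein distance over words, divided by the
# number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion / 6 words
```

CER is the same computation over characters instead of words.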

I've written a detailed blog post about this. If you just want the results, here they are:

I hope you find it useful!

31
 
 

The original was posted on /r/machinelearning by /u/DIAMBRA_AIArena on 2024-03-30 13:49:08.

32
 
 

The original was posted on /r/machinelearning by /u/crp1994 on 2024-03-30 13:01:59.

33
 
 

The original was posted on /r/machinelearning by /u/Responsible-Win3865 on 2024-03-30 07:36:50.

34
 
 

The original was posted on /r/machinelearning by /u/milaworld on 2024-03-30 06:13:48.


forbes article:

archive no paywall:

How Stability AI’s Founder Tanked His Billion-Dollar Startup

Mar 29, 2024

Stability AI founder Emad Mostaque took the stage last week at the Terranea Resort in Palos Verdes, California to roaring applause and an introduction from an AI-generated Aristotle who announced him as “a modern Prometheus” with “the astuteness of Athena and the vision of Daedalus.”

“Under his stewardship, AI becomes the Herculean force poised to vanquish the twin serpents of illness and ailment and extend the olive branch of longevity,” the faux Aristotle proclaimed.

“I think that’s the best intro I’ve ever had,” Mostaque said.

But behind Mostaque's hagiographic introduction lay a grim and fast metastasizing truth. Stability, once one of AI’s buzziest startups, was floundering. It had been running out of money for months and Mostaque had been unable to secure enough additional funding. It had defaulted on payments to Amazon whose cloud service undergirded Stability’s core offerings. The star research team behind its flagship text-to-image generator Stable Diffusion had tendered their resignations just three days before — as Forbes would first report — and other senior leaders had issued him an ultimatum: resign, or we walk too.

Still, onstage before a massive audience of peers and acolytes, Mostaque talked a big game. “AI is jet planes for the mind,” he opined. “AI is our collective intelligence. It's the human Colossus.” He claimed a new, faster version of the Stable Diffusion image generator released earlier this month could generate “200 cats with hats per second.” But later, when he was asked about Stability’s financial model, Mostaque fumbled. “I can’t say that publicly,” he replied. “But it’s going well. We’re ahead of forecast.”

Four days later, Mostaque stepped down as CEO of Stability, as Forbes first reported. In a post to X, the service formerly known as Twitter, he claimed he’d voluntarily abdicated his role to decentralize “the concentration of power in AI.” But sources told Forbes that was hardly the case. Behind the scenes, Mostaque had fought to maintain his position and control despite mounting pressure externally and internally to step down. Company documents and interviews with 32 current and former employees, investors, collaborators and industry observers suggest his abrupt exit was the result of poor business judgment and wild overspending that undermined confidence in his vision and leadership, and ultimately kneecapped the company.

Mostaque, through his attorneys, declined to comment on record on a detailed list of questions about the reporting in this story. But in an email to Forbes earlier this week he broadly disputed the allegations. “Nobody tells you how hard it is to be a CEO and there are better CEOs than me to scale a business,” he said in a statement. “I am not sure anyone else would have been able to build and grow the research team to build the best and most widely used models out there and I’m very proud of the team there. I look forward to moving onto the next problem to handle and hopefully move the needle.”

In an emailed statement, Christian Laforte and Shan Shan Wong, the interim co-CEOs who replaced Mostaque, said, "the company remains focused on commercializing its world leading technology” and providing it “to partners across the creative industries."

After starting Stability in 2019, Mostaque built the company into an early AI juggernaut by seizing upon a promising research project that would become Stable Diffusion and funding it into a business reality. The ease with which the software generated detailed images from the simplest text prompts immediately captivated the public: 10 million people used it on any given day, the company told Forbes in early 2023. For some true believers, Mostaque was a crucial advocate for open-source AI development in a space dominated by the closed systems of OpenAI, Google and Anthropic.

But his startup’s rise to one of the buzziest in generative AI was in part built on a series of exaggerations and misleading claims, as Forbes first reported last year (Mostaque disputed some points at the time). And they continued after he raised $100 million at a $1 billion valuation just days after launching Stable Diffusion in 2022. His failure to deliver on an array of grand promises, like building bespoke AI models for nation states, and his decision to pour tens of millions into research without a sustainable business plan, eroded Stability’s foundations and jeopardized its future.

"He was just giving shit away,” one former employee told Forbes. “That man legitimately wanted to transform the world. He actually wanted to train AI models for kids in Malawi. Was it practical? Absolutely not."

By October 2023, Stability would have less than $4 million left in the bank, according to an internal memo prepared for a board meeting and reviewed by Forbes. And mounting debt, including months of overdue Amazon Web Services payments, had already left it in the red. To avoid legal penalties for skipping American staff’s payroll, the document explained, the London-based startup was considering delaying tax payments to the U.K. government.

It was Stability’s armada of GPUs, the wildly powerful and equally expensive chips undergirding AI, that were so taxing the company’s finances. Hosted by AWS, they had long been one of Mostaque’s bragging points; he often touted them as one of the world’s 10 largest supercomputers. They were responsible for helping Stability’s researchers build and maintain one of the top AI image generators, as well as break important new ground on generative audio, video and 3D models. “Undeniably, Stability has continued to ship a lot of models,” said one former employee. “They may not have profited off of it, but the broader ecosystem benefitted in a huge, huge way.”

But the costs associated with so much compute were now threatening to sink the company. According to an internal October financial forecast seen by Forbes, Stability was on track to spend $99 million on compute in 2023. It noted as well that Stability was “underpaying AWS bills for July (by $1M)” and “not planning to pay AWS at the end of October for August usage ($7M).” Then there were the September and October bills, plus $1 million owed to Google Cloud and $600,000 to GPU cloud data center CoreWeave. (Amazon, Google and CoreWeave declined to comment.)

With an additional $54 million allocated to wages and operating expenses, Stability’s total projected costs for 2023 were $153 million. But according to its October financial report, its projected revenue for the calendar year was just $11 million. Stability was on track to lose more money per month than it made in an entire year.

The company’s dire financial position had thoroughly soured Stability’s current investors, including Coatue, which had invested tens of millions in the company during its $101 million funding round in 2022. In the middle of 2023, Mostaque agreed to an independent audit after Coatue raised a series of concerns, according to a source with direct knowledge of the matter. The outcome of the investigation is unclear. Coatue declined to comment.

Within a week of an early October board meeting where Mostaque shared that financial forecast, Lightspeed Venture Partners, another major investor, sent a letter to the board urging them to sell the company. The distressing numbers had “severely undermined” the firm’s confidence in Mostaque’s ability to lead the company.

“In particular, we are surprised and deeply concerned by a cash position just now disclosed to us that is inconsistent with prior discussions on this topic,” Lightspeed’s general counsel Brett Nissenberg wrote in the letter, a copy of which was viewed by Forbes. “Lightspeed believes that the company is not likely financeable on terms that would assure the company’s long term sound financial position.” (Lightspeed declined a request for comment.)

The calls for a sale led Stability to quietly begin looking for a buyer. Bloomberg reported in November that Stability approached AI startups Cohere and Jasper to gauge their interest. Stability denied this, and Jasper CEO Timothy Young did the same when reached for comment by Forbes. A Cohere representative declined to comment.

But one prominent AI company confirmed that Mostaque’s representatives had reached out to them to test the waters. Those talks did not advance because “the numbers didn’t add up,” this person, who declined to be named due to the confidential nature of the talks, told Forbes. Stability also tried to court Samsung as a buyer, going so far as to redecorate its office in advance of a planned meeting with the Korean electronics giant. (Samsung said that it invested in Stability in 2023 and that it does not comment on M&A discussions.)

Coatue had been calling for Mostaque’s resignation for months, according to a source with direct knowledge. But it and other investors were unable to oust him because he was the company’s majority shareholder. When they tried a different tack by rallying other investors to offer him a juicy equity package to resign, Mostaque refused, said two sources. By October, Coatue and Lightspeed had had enough. Coatue left the board and Lightspeed resigned its observer s...


Content cut off. Read original on https://old.reddit.com/r/MachineLearning/comments/1br9vxr/n_how_stability_ais_founder_tanked_his/

35
 
 

The original was posted on /r/machinelearning by /u/Aggressive-Plate6873 on 2024-03-29 22:05:02.


Is the direction of this inequality wrong? Looks like standard ELBO but with the wrong direction.

(page 2, )
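For reference, the standard ELBO (obtained via Jensen's inequality) bounds the log-evidence from below, so for a valid variational distribution $q$ it reads:

```latex
\log p(x) \;=\; \log \int p(x, z)\, dz
\;\ge\; \mathbb{E}_{q(z \mid x)}\!\left[\log \frac{p(x, z)}{q(z \mid x)}\right],
```

with the gap equal to $\mathrm{KL}\!\left(q(z \mid x) \,\|\, p(z \mid x)\right) \ge 0$.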

thanks!

36
 
 

The original was posted on /r/machinelearning by /u/bnqj on 2024-03-29 13:04:29.


Yes, with VAPE - Vector Addition Positional Encoding.

I’ve been exploring a new approach to positional encoding that I’m calling VAPE - Vector Addition Positional Encoding.

The Method:

  • borrow some number of channels from queries and keys,
  • run a cumulative (prefix) sum across sequence length on these borrowed channels (add vectors together),
  • normalize - divide by the square root of the vector's magnitude,
  • we now have position-aware channels,
  • so concatenate them back to queries and keys.

What’s intriguing is that this method can work effectively with just a single channel per head. Using a single channel means we're running the prefix sum on scalars rather than vectors, and the method still works.
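The steps above can be sketched in PyTorch roughly as follows, assuming query/key tensors of shape (batch, seq_len, channels); this is my reading of the post's description, and the function and argument names are my own:

```python
# Rough sketch of the described prefix-sum positional encoding.
import torch

def vape(qk: torch.Tensor, n_pos_channels: int = 1) -> torch.Tensor:
    """Make the first `n_pos_channels` channels position-aware via a
    normalized prefix sum, then concatenate them back."""
    borrowed = qk[..., :n_pos_channels]                # borrow channels
    summed = torch.cumsum(borrowed, dim=1)             # prefix sum over sequence
    magnitude = summed.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    pos_aware = summed / magnitude.sqrt()              # divide by sqrt of magnitude
    return torch.cat([pos_aware, qk[..., n_pos_channels:]], dim=-1)

q = torch.randn(2, 16, 64)
assert vape(q).shape == q.shape  # shape (and parameter count) unchanged
```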

VAPE features:

  • No Extra Parameters: VAPE introduces positional information without adding any new parameters to the model, preserving its simplicity and efficiency.
  • Performance: Early tests indicate that VAPE outperforms methods like RoPE in final perplexity.
  • Extrapolation: Early tests suggest that VAPE can extrapolate beyond the training context length quite nicely, since no explicit positional information is added as in RoPE.
  • Compatibility with Flash Attention: It's fully compatible with Flash Attention.
  • Efficiency: By leveraging just a small number of channels for positional encoding, VAPE maintains model efficiency.
  • Inference Speed: VAPE caches the last positional states for queries and keys - it's a bit like SSM/RNN, you only need the last state to compute the next.

Seeking Your Insight:

  • What benchmarks or specific comparisons would best demonstrate VAPE's value to you?
  • Do you know of any methods similar to VAPE?

Benchmarks:

I've run some early tests that look very promising for causal language modeling tasks, but I have quite limited resources for benchmarking, so before I put any effort into it I think it's better to ask the community how to go about it.

37
 
 

The original was posted on /r/machinelearning by /u/Conscious_Giraffe453 on 2024-03-29 04:56:18.


Not sure if this is a “career question” as per the rules but I was recently asked this interview question:

In an F1 car race with 10 cars, how would you calculate/predict the probability of the second-place car overtaking the first-place car? What algorithms, data, and models are needed for this calculation? Explain each step.

How would you answer this? (No other information is given)

38
 
 

The original was posted on /r/machinelearning by /u/ghosthamlet on 2024-03-29 04:39:56.


Post:

We are thrilled to announce Jamba, the world’s first production-grade Mamba based model. By enhancing Mamba Structured State Space model (SSM) technology with elements of the traditional Transformer architecture, Jamba compensates for the inherent limitations of a pure SSM model. Offering a 256K context window, it is already demonstrating remarkable gains in throughput and efficiency—just the beginning of what can be possible with this innovative hybrid architecture. Notably, Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.

39
 
 

The original was posted on /r/machinelearning by /u/dxtros on 2024-03-28 19:55:31.


Abstract: We demonstrate a technique that allows dynamically adapting the number of documents in a top-k retriever RAG prompt using feedback from the LLM. This allows a 4x cost reduction in RAG LLM question answering while maintaining the same level of accuracy. We also show that the method helps explain the lineage of LLM outputs. The reference implementation works with most models (GPT4, many local models, older GPT-3.5 turbo) and can be adapted to work with most vector databases exposing a top-k retrieval primitive.
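The idea can be sketched roughly as follows. This is a hypothetical reading of the abstract: `retrieve` and `ask_llm` are placeholders for a vector-database top-k query and an LLM call, and the actual feedback mechanism in the paper may differ.

```python
# Adaptive top-k retrieval sketch: start with a small k and escalate only
# when the LLM signals it lacked sufficient context to answer.

def answer_with_adaptive_k(question, retrieve, ask_llm, k_schedule=(1, 2, 4, 8)):
    for k in k_schedule:
        docs = retrieve(question, k=k)
        answer = ask_llm(question, docs)
        if answer is not None:   # LLM feedback: enough context was found
            return answer, k
    return None, k_schedule[-1]
```

Because most questions end up answered at a small k, the average number of retrieved and prompted documents drops, which is where a cost saving of this kind would come from.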

Blog paper:

Reference implementation:

40
 
 

The original was posted on /r/machinelearning by /u/aadityaura on 2024-03-28 18:32:07.


Stanford releases #BioMedLM, a 2.7B parameter language model trained on biomedical data. However, the results do not seem to make sense.

Here is the evaluation report using the LM Evaluation Harness framework on MultiMedQA (MedMCQA, MedQA, MMLU, PubMed).

41
 
 

The original was posted on /r/machinelearning by /u/Thomjazz on 2024-03-28 17:26:57.


I finally recorded this lecture I gave two weeks ago because people kept asking me for a video.

So here it is. I hope you'll enjoy it: "A Little Guide to Building Large Language Models in 2024".

I tried to keep it short and comprehensive – focusing on concepts that are crucial for training good LLMs but often hidden in tech reports.

In the lecture, I introduce the students to all the important concepts/tools/techniques for training a good-performance LLM:

  • finding, preparing and evaluating web-scale data
  • understanding model parallelism and efficient training
  • fine-tuning/aligning models
  • fast inference

There are of course many things and details missing that I should have added; don't hesitate to tell me your most frustrating omission and I'll add it in a future part. In particular, I think I'll add more focus on how to filter topics well and extensively, and maybe more practical anecdotes and details.

Now that I've recorded it, I've been thinking this could be part 1 of a two-part series, with a 2nd fully hands-on video on how to run all these steps with some libraries and recipes we've released recently at HF around LLM training (and which could easily be adapted to other frameworks):

  • datatrove for all things web-scale data preparation:
  • nanotron for lightweight 4D parallelism LLM training:
  • lighteval for in-training fast parallel LLM evaluations:

Here is the link to watch the lecture on Youtube: And here is the link to the Google slides:

Enjoy and happy to hear feedback on it and what to add, correct, extend in a second part.

42
 
 

The original was posted on /r/machinelearning by /u/we_are_mammals on 2024-03-28 16:04:00.


DeepMind just published a paper about fact-checking text:

The approach costs $0.19 per model response, using GPT-3.5-Turbo, which is cheaper than human annotators, while being more accurate than them:

They use this approach to create a factuality benchmark and compare some popular LLMs.

Paper and code:

43
 
 

The original was posted on /r/machinelearning by /u/TheLastMate on 2024-03-28 03:29:11.

44
 
 

The original was posted on /r/machinelearning by /u/deadknxght on 2024-03-28 00:22:56.


Title

45
 
 

The original was posted on /r/machinelearning by /u/DocBrownMS on 2024-03-27 11:29:36.


Hey all, I've recently published a tutorial at Towards Data Science that explores a somewhat overlooked aspect of Retrieval-Augmented Generation (RAG) systems: the visualization of documents and questions in the embedding space:

While much of the focus in RAG discussions tends to be on the algorithms and data processing, I believe that visualization can help to explore the data and to gain insights into problematic subgroups within the data.

This might be interesting for some of you, although I'm aware that not everyone is keen on this kind of visualization. I believe it can add a unique dimension to understanding RAG systems.

46
 
 

The original was posted on /r/machinelearning by /u/CheekProfessional146 on 2024-03-27 13:11:31.


Project:

A transformer-based hybrid multimodal model: various transformer models address different problems in the field of music information retrieval, and these models generate information dependencies that mutually influence each other.

An AI-powered multimodal project focused on music that generates chords, beats, lyrics, melody, and tabs for any song.

47
 
 

The original was posted on /r/machinelearning by /u/artificial_intelect on 2024-03-27 14:35:33.


Shill disclaimer: I was the pretraining lead for the project

DBRX deets:

  • 16 Experts (12B params per single expert; top_k=4 routing)
  • 36B active params (132B total params)
  • trained for 12T tokens
  • 32k sequence length training
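As background for the top_k=4 routing above, a toy top-k mixture-of-experts router might look like this; it is only an illustration of the routing pattern, not the actual DBRX code:

```python
# Toy top-k expert routing: each token is sent to its top_k experts,
# with mixture weights renormalized over the chosen experts.
import torch

def route(hidden, router_weights, top_k=4):
    """Return the chosen expert indices and their mixture weights per token."""
    logits = hidden @ router_weights             # (tokens, n_experts)
    scores, idx = logits.topk(top_k, dim=-1)     # keep only the top_k experts
    weights = torch.softmax(scores, dim=-1)      # renormalize over chosen experts
    return idx, weights

h = torch.randn(10, 64)          # 10 token representations
w = torch.randn(64, 16)          # router for 16 experts
idx, wts = route(h, w)
assert idx.shape == (10, 4)      # 4 experts active per token
```

Only the selected experts run for each token, which is why the active parameter count (36B) is much smaller than the total (132B).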
48
 
 

The original was posted on /r/machinelearning by /u/Data_Nerd1979 on 2024-03-27 04:49:41.


"The most obvious advantage of synthetic data is that it contains no personally identifiable information (PII). Consequently, it doesn’t pose the same cybersecurity risks as conventional data science projects. However, the big question for machine learning is whether this information is reliable enough to produce functioning ML models."

A very informative blog post regarding using synthetic data in machine learning; source here

49
 
 

The original was posted on /r/machinelearning by /u/MuscleML on 2024-03-27 01:13:41.


What are some optimizations that one could use for the data loader in PyTorch? The data type could be anything. But I primarily work with images and text. We know you can define your own. But does anyone have any clever tricks to share? Thank you in advance!
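A few commonly used DataLoader knobs, as a starting point; the right values depend heavily on the workload and hardware, so treat these as defaults to profile against rather than a recipe:

```python
# Common DataLoader settings for overlapping data loading with training.
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))

loader = DataLoader(
    ds,
    batch_size=32,
    shuffle=True,
    num_workers=2,            # parallel worker processes for loading/decoding
    pin_memory=True,          # page-locked host memory -> faster GPU copies
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=2,        # batches prefetched per worker
    drop_last=True,
)

x, y = next(iter(loader))
```

Note that `persistent_workers` and `prefetch_factor` require `num_workers > 0`.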

50
 
 

The original was posted on /r/machinelearning by /u/EDEN1998 on 2024-03-26 18:55:26.


Discussion thread of ACL 2024 (ARR Feb) reviews.

I got 3, 3, 4 for soundness. How about you guys?
