Machine Learning

1

[R] The Missing U for Efficient Diffusion Models (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Successful-Western27 on 2024-04-09 03:26:11.

Anew paper proposes replacing the standard discrete U-Net architecture in diffusion models with a continuous U-Net leveraging neural ODEs. This reformulation enables modeling the denoising process continuously, leading to significant efficiency gains:

Up to 80% faster inference
75% reduction in model parameters
70% fewer FLOPs
Maintains or improves image quality

Key technical contributions:

Dynamic neural ODE block modeling latent representation evolution using second-order differential equations
Adaptive time embeddings to condition dynamics on diffusion timesteps
Efficient ODE solver and constant-memory adjoint method for faster, memory-efficient training

The authors demonstrate these improvements on image super-resolution and denoising tasks, with detailed mathematical analysis of why the continuous formulation leads to faster convergence and more efficient sampling.

Potential implications:

Makes diffusion models practical for wider range of applications (real-time tools, resource-constrained devices)
Opens up new research directions at intersection of deep learning, differential equations, dynamical systems

Some limitations exist around (1) Added complexity from ODE solver and adjoint method and (2) I think diffusion models still likely to require significant compute even with improvements.

Full summary here. Arxiv here.

TL;DR: New paper proposes replacing discrete U-Nets in diffusion models with continuous U-Nets using neural ODEs, enabling up to 80% faster inference, 75% fewer parameters, and 70% fewer FLOPs while maintaining or improving image quality. Key implications: more efficient and accessible generative models, new research directions in continuous-time deep learning.

2

1

[D] What is your tech stack for research? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/gokulPRO on 2024-04-08 22:38:54.

I am planning on working on large multiomodal training (1B parameters) for text+audio. As of now I was thinking of going with pytorch, deepspeed, wandb. What do you recommend and what do you use in general for distributed large model training?

Do you use hugginface? I felt it a bit too wrapped that it becomes messy to access the bare backbones, but haven't given it a proper try. For out of shelf models and custom dataset training that does sound useful, but research requires more than that. So How was your experience in terms of research, where you need flexiblity to change the model? And in general whats your tech stack when it comes to research?

3

1

[D] In terms of RAG research, why does it seem like a lot of people aren't working on the retriever? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Seankala on 2024-04-09 03:38:37.

I'm someone who conducted research in NLP a few years ago, stopped and joined industry, and am recently trying to get back on top of things. I've taken an interested into RAG-related work and have started reading some papers.

My understanding is that for RAG you have the retriever and the generator. For the generator it seems like using various LLMs is standard but the retriever also seems to be set to using something like BM25 or the DPR that was originally used. I would think that the performance of RAG would rely heavily on the retriever but am also a little surprised to see that there doesn't seem to be a lot of research being done in that direction.

Am I just mistaken and haven't looked in the right direction(s)? Or is there a reason why the retriever doesn't seem to be getting as much attention?

Come to think of it, I haven't really seen a lot of work being done for encoder models in general.

4

1

[D] For those of you who have published alone, what was your experience like? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Seankala on 2024-04-08 16:35:57.

I have some free time on my hands these days and have been trying to catch up on research in my field. I wanted to actually revisit a topic that I was working on during my master's but was never able to come up with a publication. The thing is, I'm not sure how viable that's going to be as a sole author without any real access to resources.

Friends and acquaintances tell me it's possible but extremely difficult. Curious what other people who have successfully done so thought about it.

5

1

[D] Securing Canada’s AI advantage (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/milaworld on 2024-04-08 11:28:13.

(Great news for institutions like Mila!)

Announcement from Prime Minister of Canada:

Securing Canada’s AI advantage

April 7, 2024

Montréal, Quebec

Artificial intelligence (AI) has incredible potential to transform the economy, improve the way we work, and enhance our way of life. The global race to scale up and adopt AI is on, and Canada is at the forefront of this technology. To make sure we can seize every opportunity in the economy of the future, and set every generation up for success, we need to scale up our innovation ambitions. And do it in a way that brings everyone along. For Millennials and Gen Z, who feel their hard work isn’t paying off like it did for previous generations, we must invest in good-paying opportunities that help them get ahead. That’s why we’re focused on creating more good jobs, including in innovation and technology, which are among the highest paying of all industries.

AI is already unlocking massive growth in industries across the economy. Many Canadians are already feeling the benefits of using AI to work smarter and faster. The rapid advance of generative AI today will unlock immense economic potential for Canada, significantly improving productivity and reducing the time workers have to spend on repetitive tasks. Researchers and companies in Canada are also using AI to create incredible new innovations and job opportunities across all facets of the Canadian economy, from drug discovery to energy efficiency to housing innovation. In the past year, job growth in AI increased by nearly one third in Canada – among the highest growth of any sector. And most AI jobs pay well above the average income.

Canada has a world-leading AI ecosystem – from development, to commercialization, to safety. We have an advantage that can make sure Canadian values and Canadian ideas help shape this globally in-demand technology. Canada was the first country in the world to introduce a national AI strategy and has invested over $2 billion since 2017 to support AI and digital research and innovation. Since then, countries around the world have begun investing significant funding and efforts into AI to advance their economies, particularly in computing infrastructure. In order to maintain Canada’s competitive edge, and secure good paying jobs and job security for generations of young Canadians, we must raise the bar.

The Prime Minister, Justin Trudeau, today announced a $2.4 billion package of measures from the upcoming Budget 2024 to secure Canada’s AI advantage. These investments will accelerate job growth in Canada’s AI sector and beyond, boost productivity by helping researchers and businesses develop and adopt AI, and ensure this is done responsibly.

These measures include:

Investing $2 billion to build and provide access to computing capabilities and technological infrastructure for Canada’s world-leading AI researchers, start-ups, and scale-ups. As part of this investment, we will soon be consulting with AI stakeholders to inform the launch of a new AI Compute Access Fund to provide near-term support to researchers and industry. We will also develop a new Canadian AI Sovereign Compute Strategy to catalyze the development of Canadian-owned and located AI infrastructure. Ensuring access to cutting-edge computing infrastructure will attract more global AI investment to Canada, develop and recruit the best talent, and help Canadian businesses compete and succeed on the world stage.
Boosting AI start-ups to bring new technologies to market, and accelerating AI adoption in critical sectors, such as agriculture, clean technology, health care, and manufacturing, with $200 million in support through Canada’s Regional Development Agencies.
Investing $100 million in the NRC IRAP AI Assist Program to help small and medium-sized businesses scale up and increase productivity by building and deploying new AI solutions. This will help companies incorporate AI into their businesses and take on research, product development, testing, and validation work for new AI-based solutions.
Supporting workers who may be impacted by AI, such as creative industries, with $50 million for the Sectoral Workforce Solutions Program, which will provide new skills training for workers in potentially disrupted sectors and communities.
Creating a new Canadian AI Safety Institute, with $50 million to further the safe development and deployment of AI. The Institute, which will leverage input from stakeholders and work in coordination with international partners, will help Canada better understand and protect against the risks of advanced or nefarious AI systems, including to specific communities. (Edit: I'm skeptical of this one.)
Strengthening enforcement of the Artificial Intelligence and Data Act, with $5.1 million for the Office of the AI and Data Commissioner. The proposed Act aims to guide AI innovation in a positive direction to help ensure Canadians are protected from potential risks by ensuring the responsible adoption of AI by Canadian businesses.

Today’s announcement is about investing in innovation and economic growth to secure Canada’s world-leading AI advantage today and for generations to come. This will create good-paying opportunities for every generation, boost innovation across the economy, raise productivity, and accelerate economic growth – and it’s just one of the things that we are going to be doing in Budget 2024. Alongside these measures, we’re building more homes faster, ensuring every kid has the food they need, investing in health care, making life more affordable, and creating good jobs to make sure every generation can get ahead.

6

1

[R] A* venue workshop paper vs lower-rated venue conference paper (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/howtorewriteaname on 2024-04-07 23:32:36.

NeurIPS24 is nearby and I've got a paper that got rejected last year at ICLR (5/5/6/3). While I'm addressing the feedback from last conference (method was received positively, but they asked for more experimentation), I'm still unsure whether the paper is strong enough to make it to an A* conference such as NeurIPS. Also, to be honest, I've been working on it for almost a year and I feel I want to wrap this up already and look at other ideas.

I was wondering which is better from these two:

Submitting to a workshop at NeurIPS or similar (ICLR, ICML..). I assume this should be doable with my paper given the feedback at ICLR but I don't know if that's correct?
Aiming for a conference paper in a "lower-tier" venue such as AISTATS, IJCAI or similar. I assume this is more difficult to pull off than the workshop paper at NeurIPS but again I'm just guessing?

I am still not a PhD student, but I'm applying for PhDs regularly. Therefore I am kinda looking for the option that (in case my paper goes through) would give me more leverage as a PhD candidate in my future applications.

7

1

[D] Why do people upload their work on Arxiv, not submitting conference? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Shot-Button-9010 on 2024-04-07 19:31:12.

Purely my curiosity.

I know the case, when a paper got accepted at a conference or journal, and one can upload their work on Arxiv. But my question is, some works are uploaded only on Arxiv. Does it mean the author doesn't want to submit their work at the conference but wants to release their work? Or, do they have another plan for submission after their release?

I'm asking because my recent work got rejected at a conference, but I don't want to delve into this anymore. Do people like the same situation used to upload their abandoned work on Arxiv as well?

8

1

[D] Do we know how Gemini 1.5 achieved 10M context window? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/papaswamp91 on 2024-04-07 18:21:14.

Do we know how Gemini 1.5 achieved its 1.5M context window? Wouldn’t compute go up quadratically as the attention window expands?

9

1

[P] [D] I have created a beginner-friendly quantum machine learning handbook. (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/-THUNDERBOLT- on 2024-04-07 11:04:09.

Hello everyone, for the past few weeks I have been working on creating a right handholding roadmap for a person who doesn't know any quantum concepts and wants to dive into quantum machine learning. I want your opinions on the content and would be grateful if you could contribute to this project. Hoping to have this handbook for everyone.

here is the GitHub repo link:

here is the hosted link:

10

1

[D] what do you do with paper with no code published (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Muhammad_Gulfam on 2024-04-07 08:26:36.

Many papers introduce their own models, mostly variants of existing one with little changes in the original ones (mostly for specific problems). Most of them don't have code published which makes it very difficult to reproduce the results. In some cases (could be even many cases, I only found/checked some) the experiment configuration is not complete, in the paper.

What do you do with such papers?

How do you argue when people quote these papers?

11

1

Any statisticians who decided on a PhD in CS rather than a PhD in Stats? [D] (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/AdFew4357 on 2024-04-06 20:11:36.

I’m currently an MS stats student. Right now I have been kinda bored of the standard classical statistics I’ve been learning. I initially chose this path because I wanted to set myself up for industry well. I’ve got a data scientist internship for the summer, and I’ve considered working fulltime. However, I do want to pursue research after a few years of work experience, and the question I come back to is whether I want to go for a PhD in Stats or a PhD in CS.

To be frank, my programming skills are very sub par compared to most CS students. My undergraduate was pure math and statistics, and while I did take Python, R, some Java I couldn’t say I’m at the level of a software engineer. I know my math and stats theory well, and can use packages in Python and R to do things effectively, and write functions etc, but if you asked me right now “write a class in Python” I’d probably be stuck cause I never write classes.

I’m no longer really interested in stats PhD programs, because if I were to do a PhD in stats I’d have to spend the first two years doing coursework, which frankly I’m just tired of. I don’t want to spend time proving asymptotic results of the MLE under logit models, or spending a semester learning things like theory of the linear model.

I have an MS in Stats now, and I think I’ve beat stats to death enough.

I found a great deal of interest in an area of deep learning that naturally drew me in coming from a statisticians point of view, which are the advances in time series forecasting.

I have taken time series in stats graduate programs where we learn all the classical methods: arima, sarima, garch, and some nonstationary time series models like state space models. I also have a background in classical nonparametric regression, (statistical learning) as this is the topic of my thesis.

These are very fascinating but I have gotten interested in how CS departments are using deep learning methods to extract information from time series. The old school statistician in me is tired of learning “use the ADF test to verify stationary, fit an arima and sarima model to model this time series, and forecast” and I’m now seeing huge advancements in time series coming from cs departments which I want to be in. Furthermore, since I have also had plenty of experience in applied Bayesian analysis, I think my background on this could also be unique addition. Causal inference is something I’ve dabbled into as well and any aspect of this in DL I’d be interested in giving my input as well.

So for anyone here, was there anyone like me whose background came up through old school statistics, like an MS in Stats, and now made that switch to a PhD in CS to work on more modern topics? I feel my background in fundamental topics like Bayesian inference, time series, statistical learning and causal inference could be something I could add to research in CS.

12

1

[D] How does a MoE router learn when it has made a wrong choice? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/RepresentativeWay0 on 2024-04-06 17:42:30.

Looking at the code for current mixture of experts models, they seem to use argmax, with k=1 (picking only the top expert) to select the router choice. Since argmax is non differentiable, the gradient cannot flow to the other experts. Thus it seems to me that only the weights of the selected expert will be updated if it performs poorly. However, it could be the case that a different expert was in fact a better choice for the given input, but the router cannot know this because the gradient does not flow to the other experts.

How can the router learn that it has made a wrong choice and use a different expert next time?

13

1

[D] ML researchers who are not in NLP, what are you researching? Please share. (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/20231027 on 2024-04-06 13:12:34.

We would love to know the spectrum of ML research happening.

It would help if you wrote it in as much detail as possible as in what the research actually entails. Thanks!

14

1

[D] I just can't fine tune BERT over 40% accuracy for text-classification task (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Total-Opposite-8396 on 2024-04-06 09:33:16.

Hi everyone, this is the first time I'm fine tuning an LLM and I just can't get over 40% accuracy for the text-classification task.

I'm using BERT from transformers library to load and train the model and peft for LoRA implementation. My data set contains English written summaries of news articles and with each article there is a label such as Economics, Politics, Science, Entertainment, etc... (14 unique labels). The maximum length of summaries can extend up to 250-300 tokens. My training set has 800 examples and validation set has 200 examples.

At first the training loss was reaching very low but the validation loss was not going too low with validation accuracy going maximum up to 45%. Since it was overfitting, I changed dropout rate form 0.1 to 0.5. After that the model is not overfitting now, but it is underfitting, with validation and training loss being almost the same and validation accuracy still reaching 45% max.

I tried removing LoRA implementation but nothing changed, except for the training time. At this point I'm confused as to what should I do. I've tried tuning hyperparameters but nothing changes.

Can anyone help me out in understanding what possibly could I be missing here. I can share stats and code implementation or I can even get on call if that's possible. Any help will be very much appreciated.

15

1

[D] Is RAG just glorified prompt Engineering? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/RiseWarm on 2024-04-03 15:32:38.

You get prompt, IR related docs, send them to prompt and then, LLM just generates response. We just engineered the prompt to be more informative.

16

1

[P] Making KNN more efficient with pre-elimination (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/JakeStBu on 2024-04-03 12:47:01.

Hi all, I've been working on a little project which I would like some completely honest feedback on. KNN can be an incredibly powerful algorithm, but unfortunately, it can also be incredibly inefficient and can be very resource intensive to run. I've been working on a project which according to some basic testing, has shown to be on average about 20% faster than normal KNN, with little to no quality decrease. You can check out the github repo here:

I've also written a mini-paper for it, which is linked on the repo. Sorry if the code's a little messy, but beside that, I'd really love your completely honest feedback (which I know Reddit is great at). Thanks in advance!

17

1

[D] For those who consider buying AMD GPUs (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/YYY_333 on 2024-04-02 17:37:20.

Thought this info might help you to make a decision.

P.S. Even George Hotz wouldn't recommend buying 7900 XTX anymore, switching to NVIDIA due to the instability of the drivers. From the latest stream:

18

1

[D] GPT-3.5-Turbo is most likely the same size as Mixtral-8x7B! (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/NixTheFolf on 2024-04-03 01:35:24.

The paper authors of "Logits of API-Protected LLMs Leak Proprietary Information" describe how they figured out and exploited a "softmax bottleneck" when calling on an API-Protected LLM over a ton of API calls, which they then used to get a close estimate that GPT-3.5-Turbo's embedding size of around 4096 ± 512. They then mention how this makes GPT-3.5-Turbo either a 7B dense model (by looking at other models with a known embedding size of ~4096), or a MoE that is Nx7B.

I have done some thinking and I make the prediction that GPT-3.5-Turbo (the one that has been used since early 2023, not the original GPT-3.5) is almost CERTAINLY a 8x7B model.

Evidence points to this too indirectly when we take a look at Mixtral-8x7B. Mixtral-8x7B has used by many with the general consensus of this model being on-par or slightly exceeding GPT-3.5-Turbo on most things.

GPT-3.5-Turbo-0613 & Mixtral-8x7B-Instruct-v0.1 on the LMSYS Chatbot Arena Leaderboard having an averaged ELO difference of ~1 point, though there could be a deviation with GPT-3.5-Turbo-0613 by either +3 or -4 points.

While the evidence points to this, some still might not think that GPT-3.5-Turbo is the same size as Mixtral-8x7B because of difference in other language performance, but this could be due to differences in training data. We have no idea what training data was used for both Mixtral-8x7B or GPT-3.5-Turbo, so differences in their performance in relation to training data can be there.

Differences can also be found in tuning of these two models as GPT-3.5-Turbo is fine-tuned on RLHF data that has a LOT of human feedback by including the feature for people to vote on an answer that the LLM gives out (ChatGPT), while Mixtral-8x7B-Instruct is a more general Instruction fine-tune.

The use of a MoE by OpenAI makes a lot of sense too. They originally released GPT-3.5 back in November of 2022 inside of ChatGPT, which they thought not many people would use, so compute was not much of a concern. When ChatGPT blew up in the next two months, compute was now the MAIN concern as 10s of millions of people were now using ChatGPT and that model, with more and more people jumping on it in the coming months. They needed a new model that can be as smart as the original GPT-3.5, but able to be served to millions of people at the same time to keep up with heavy demand.

OpenAI had just finish training GPT-4 not too long ago, which used a 8x MoE (based on indirect knowledge), it showed great promise for the power it can give but also for efficiency in running it compared to a fully dense 1T+ model. They possibly figured that a smaller MoE could possibly get similar performance to GPT-3.5 while costing much less compute to serve to many people, only needing the VRAM to load the model into GPU memory. If we assume they loaded 2 experts out of a possible 8 experts for serving to people (similar to default Mixtral-8x7B), this would basically quadruple their existing computing power to serve to the growing user base of ChatGPT.

They first released GPT-3.5-Turbo in the API and in ChatGPT Plus to get a better idea of the performance of the model from the public compared to the original GPT-3.5, as well as set aside enough compute to run this model at full ChatGPT scale, which they then did a little while later. They possibly used RLHF data from the public as well to tune the new GPT-3.5-Turbo model to act a lot like ChatGPT as well in most cases, which helped them seamlessly change the model in ChatGPT without most users noticing directly.

Based on ALL of this, I can very confidently say that I think that GPT-3.5-Turbo is a 8x7B model, basically the same size as Mixtral.

One thing I also want to note is that the paper I mentioned was a newer paper compared to one about a month or two ago that described a similar technique to this? (I forgot the name of that earlier paper). They did something similar and found the embedding size of other smaller OpenAI models, but they did not give GPT-3.5-Turbo's embedding size on request of OpenAI. Them not giving that information might be due to a model with the same specifications as GPT-3.5-Turbo already existing, and that model is Mixtral!

19

1

[P] SWE-agent: an open source coding agent that achieves 12.29% on SWE-bench (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/ofirpress on 2024-04-02 13:42:02.

We just made SWE-agent public, it's an open source agent that can turn any GitHub issue into a pull request, achieving 12.29% on SWE-bench (the same benchmark that Devin used).

We've been working on this for the past 6 months. Building agents that work well is much harder than it seems- our repo has an overview of what we learned and discovered. We'll have a preprint soon.

We'll hang out in this thread if you have any questions

20

1

[D] LLMs causing more harm than good for the field? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Stevens97 on 2024-04-02 11:37:50.

This post might be a bit ranty, but i feel more and more share this sentiment with me as of late. If you bother to read this whole post feel free to share how you feel about this.

When OpenAI put the knowledge of AI in the everyday household, I was at first optimistic about it. In smaller countries outside the US, companies were very hesitant before about AI, they thought it felt far away and something only big FANG companies were able to do. Now? Its much better. Everyone is interested in it and wants to know how they can use AI in their business. Which is great!

Pre-ChatGPT-times, when people asked me what i worked with and i responded "Machine Learning/AI" they had no clue and pretty much no further interest (Unless they were a tech-person)

Post-ChatGPT-times, when I get asked the same questions I get "Oh, you do that thing with the chatbots?"

Its a step in the right direction, I guess. I don't really have that much interest in LLMs and have the privilege to work exclusively on vision related tasks unlike some other people who have had to pivot to working full time with LLMs.

However, right now I think its almost doing more harm to the field than good. Let me share some of my observations, but before that I want to highlight I'm in no way trying to gatekeep the field of AI in any way.

I've gotten job offers to be "ChatGPT expert", What does that even mean? I strongly believe that jobs like these don't really fill a real function and is more of a "hypetrain"-job than a job that fills any function at all.

Over the past years I've been going to some conferences around Europe, one being last week, which has usually been great with good technological depth and a place for Data-scientists/ML Engineers to network, share ideas and collaborate. However, now the talks, the depth, the networking has all changed drastically. No longer is it new and exiting ways companies are using AI to do cool things and push the envelope, its all GANs and LLMs with surface level knowledge. The few "old-school" type talks being sent off to a 2nd track in a small room

The panel discussions are filled with philosophists with no fundamental knowledge of AI talking about if LLMs will become sentient or not. The spaces for data-scientists/ML engineers are quickly dissapearing outside the academic conferences, being pushed out by the current hypetrain.

The hypetrain evangelists also promise miracles and gold with LLMs and GANs, miracles that they will never live up to. When the investors realize that the LLMs cant live up to these miracles they will instantly get more hesitant with funding for future projects within AI, sending us back into an AI-winter once again.

EDIT: P.S. I've also seen more people on this reddit appearing claiming to be "Generative AI experts". But when delving deeper it turns out they are just "good prompters" and have no real knowledge, expertice or interest in the actual field of AI or Generative AI.

21

1

[D] SOTA BERT-like model? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Amgadoz on 2024-04-02 08:26:33.

So we are all probably aware of state-of-the-art decoder only LLMs like GPT-4, Claude etc. These are great for generating text.

But what I am not aware of is the SOTA BERT-like model. You know, things that can be used for taks like NER, POS tagging, token classification.

Are there models that are significantly better than say Roberta?

22

1

[D] What was the best math book/course you had for deep learning? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/cyb0rg14_ on 2024-04-02 06:54:43.

I had done my high school with stem but it had been a year gap when I started learning Deep Learning after that.

I started consuming Deep Learning videos on YouTube without actually knowing the matrix calculus, many probability theories and so so. It felt like I was skipping some parts just to write the code down in PyTorch.

Then I decided to first understand all the maths behind deep learning. The first book I read was: Math for Deep Learning by Ronald T. Kneusel. It was an excellent book and covered all the topics need to start understanding maths behind deep learning.

I wanted to know, if you also had some similar encounter.

23

1

[D] UAI 2024 Reviews Waiting Place (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/elemintz on 2024-04-01 12:57:02.

A place to share your thoughts, prayers, and, most importantly (once the reviews are out), rants or maybe even some relieved comments.

24

1

[D] Can't escape OpenAI in my workplace, anyone else? (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/AlphaSquared_io on 2024-04-01 19:31:44.

Can we talk about how OpenAI specifically is being shoved down our throats by every single workplace, client, and their grandma right now? The number of requests I’m getting of to specifically work with the OpenAI API has just been skyrocketing lately. What are your guys’ experience with this? How do you navigate it? I've tried pitching other alternatives but nope, they're hellbent on using OpenAI.

OpenAI was founded for the explicit purpose of democratizing access to AI and acting as a counterbalance to the closed off world of big tech by developing open source tools. They have abandoned this idea entirely. In this space, the one approach that is horrifying (and the one that OpenAI was LITERALLY created to prevent) is a singular or oligarchy of for profit corporations making this decision for us.

Don't even get me started on the fact that their models were trained using the work of unassuming individuals who will never see a penny for it.

I feel forced to work with this abomination of a model, but I also have no real choice. This is how many of us pay our bills. Am I alone in this? Should I just swallow my pride?

25

1

[P] Stale Diffusion: Hyper-Realistic 5D Movie Generation Using Old-School Methods (lemmit.online)

submitted 5 months ago by [email protected] to c/[email protected]

0 comments fedilink

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/brainggear on 2024-04-01 19:27:22.

Hi all,

I'd like to ask for feedback on our new paper on Stale Diffusion, a limiting case of Stable Diffusion when a distribution is mixed a little bit too much.

https://www.robots.ox.ac.uk/~joao/publications/sigbovik24.pdf

Somehow it's not being taken seriously by even unserious venues and even arXiv is thumbing its fancy nose at it. What gives?!

Best,

An author