Technology

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

A jargon-free explanation of how AI large language models work (arstechnica.com)

submitted 1 year ago by [email protected] to c/[email protected]

[–] [email protected] 6 points 1 year ago (4 children)

Does anyone else start freaking out when we have such complex programs that researchers don't fully understand how they work?

[–] [email protected] 11 points 1 year ago

For what it's worth, a lot of medicine works this way. I'm fairly certain this isn't the only field, either. I'd imagine studying ecology or space feels similar.

[–] [email protected] 5 points 1 year ago (1 children)

It does make me vaguely curious what would happen if you asked one of the more powerful ones to explain, step by step, how its own program works. I don't really expect it to be accurate: if people don't know how the thing works, there probably isn't much about that in its training data. But if what it learns ultimately lets it make connections about how the real world works to some degree, could it figure out enough to give even marginally useful hints?

[–] [email protected] 3 points 1 year ago (2 children)

Not really. It’s super fucking expensive to train one of these, so on-line training would simply not be economically feasible.

Even if it were, the models don’t really have any agency. You prompt, they respond. There’s not much prompting going on from the model, and even if there were, you could choose not to respond, which the model can’t really do.
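To put a rough number on "super expensive": a commonly cited rule of thumb from scaling-law research (not something stated in this thread) estimates training compute at about 6 × N × D floating-point operations, where N is the parameter count and D is the number of training tokens. The sketch below plugs in GPT-3's published figures (175 billion parameters, roughly 300 billion tokens) purely as an illustration; the sustained cluster throughput is a hypothetical value, and real costs also depend on hardware utilization, which isn't modeled.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the ~6*N*D rule of thumb:
    about 2 FLOPs per parameter per token for the forward pass and
    about 4 for the backward pass."""
    return 6.0 * n_params * n_tokens

# Illustrative assumption: GPT-3-scale figures.
gpt3_flops = training_flops(175e9, 300e9)
print(f"{gpt3_flops:.2e} FLOPs")  # 3.15e+23 FLOPs

# Hypothetical cluster sustaining 1e17 FLOP/s (100 PFLOP/s) around the clock.
sustained_flops_per_second = 1e17
days = gpt3_flops / sustained_flops_per_second / 86400
print(f"about {days:.1f} days of compute")
```

Retraining at that scale every time new data arrives is what makes continuous on-line training impractical for the largest models.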

[–] [email protected] 1 points 1 year ago

Wrong. The cat is out of the bag; it only takes one leak to have a serious impact on the whole industry.

https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

You can try various free, open-source versions trained by the community here: https://chat.lmsys.org/

[–] [email protected] 1 points 1 year ago

You can train an effective one for a few hundred bucks now.

https://crfm.stanford.edu/2023/03/13/alpaca.html

[–] [email protected] 4 points 1 year ago

We know how they work; otherwise we couldn’t design and implement them. What we don’t really know, and don’t really need to know, are the exact parameter values the model ends up with after training.

The issue you’re thinking of is that any one parameter does not necessarily map to one aspect; rather, the parameters form a coherent collection that makes the whole work. Some interesting insights can be gleaned from trying to figure out these relationships, but with such a massive number of parameters (billions!), it gets to be a bit much to wrap your head around.
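To see where "billions" comes from, here's a back-of-the-envelope count for a standard decoder-only transformer. It assumes each layer has four d×d attention projection matrices plus a feed-forward block with a hidden size of 4d (about 12·d² weights per layer), ignores biases and layer norms, and adds one token-embedding matrix. The GPT-3 figures used (96 layers, d = 12288, a vocabulary of 50,257 tokens) are the published ones; the formula itself is an approximation, not an exact count.

```python
def approx_transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    # Attention: query, key, value and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # Feed-forward block: d_model -> 4*d_model -> d_model (two weight matrices).
    feed_forward = 2 * d_model * (4 * d_model)
    per_layer = attention + feed_forward  # = 12 * d_model**2
    # Token embedding matrix (often shared with the output layer).
    embedding = vocab_size * d_model
    return n_layers * per_layer + embedding

total = approx_transformer_params(n_layers=96, d_model=12288, vocab_size=50257)
print(f"{total / 1e9:.1f} billion parameters")  # 174.6 billion parameters
```

Every one of those ~175 billion numbers takes part in producing each output, which is why no single parameter maps neatly onto one concept.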

[–] [email protected] 1 points 1 year ago (1 children)

The whole "we don't know how they work" thing is a bit overblown. We have all the formulas; we know exactly how the math and code work. You can go and look at the weights for every node; you're just not going to derive any meaning from them or necessarily be able to explain why one number works better than another.
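The "we have all the formulas" point is easy to see at toy scale. Below is a complete forward pass for a tiny two-layer network with random weights (a made-up example with hypothetical sizes, nowhere near an LLM): every operation is spelled out and every weight can be printed, yet the numbers themselves say little about why the network behaves the way it does.

```python
import math
import random

random.seed(0)  # fixed seed so the "weights" are reproducible

def dense(x, weights, biases):
    # Fully connected layer: each output is a weighted sum of the inputs plus a bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v):
    # Elementwise max(0, a).
    return [max(0.0, a) for a in v]

def softmax(v):
    # Turn raw scores into probabilities (subtracting the max for numerical stability).
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy sizes: 3 inputs -> 4 hidden units -> 2 outputs.
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [0.0] * 2

def forward(x):
    return softmax(dense(relu(dense(x, W1, b1)), W2, b2))

print(forward([1.0, 0.5, -0.2]))  # two probabilities that sum to 1

# Every weight is fully inspectable...
for i, row in enumerate(W1):
    print(f"hidden unit {i}:", [round(w, 3) for w in row])
# ...but the raw numbers don't tell you what each unit "means".
```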

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

This is the definition of complexity, isn't it? The point is that our understanding at the small scale doesn't scale up to make sense of the bigger picture. Having worked with (much simpler) artificial neural networks myself, I think it's entirely correct and to the point to say that "we don't know how it works". I would even go further and claim that we will never fully know how it works: the weights in the network essentially form structures that do what they do, which we can recognize by analogy (e.g. logic gates, contour extractors, ...), but this is an anthropomorphic approximation that, moreover, only holds within a certain range of values or set of conditions. If we had a formal definition of what the weights represent, we would be dealing with a (much simpler and more efficient) algorithm in the traditional sense, with cleanly delineated, rigorously defined, specialized functions.
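The logic-gate analogy can be made concrete with the classic XOR network. Here the weights are hand-picked (an assumption for illustration; trained weights are usually far messier) so that one hidden unit behaves like OR, the other like NAND, and the output unit combines them like AND. With a simple step activation the network computes XOR exactly. A trained network offers no guarantee of such clean, nameable units, which is why reading it this way is an approximation.

```python
def step(z: float) -> int:
    # Threshold activation: the unit "fires" (1) when its weighted sum is positive.
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_net(a: int, b: int) -> int:
    h_or = neuron([a, b], [1.0, 1.0], -0.5)          # fires if a or b: acts like OR
    h_nand = neuron([a, b], [-1.0, -1.0], 1.5)       # fires unless both: acts like NAND
    return neuron([h_or, h_nand], [1.0, 1.0], -1.5)  # fires if both hidden units fire: AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

The labels "OR" and "NAND" are our interpretation of two weight vectors; the network itself only knows the numbers.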

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago) (1 children)

"So fully explaining how these systems work will be a huge project that humanity is unlikely to complete any time soon."

Great read. This quote really stuck out to me and gave me chills. Reading about AI is so fascinating. Feels like we're on the cusp of something big.

[–] [email protected] 1 points 1 year ago (1 children)

Because in the end it's all statistics and math. Humans are full of mistakes (intentional or not), and living languages evolve over time (even their grammar), so whatever we're building "now" is a contemporary, "good enough" representation.
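As a tiny illustration of the "it's all statistics" view (and of "you prompt, they respond"), here's a bigram model: it counts which word follows which in a made-up toy corpus, then extends a prompt by repeatedly picking the most frequent next word. Real LLMs learn far richer conditional distributions with neural networks rather than raw word counts, so treat this only as an analogy, not as how they are built.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict:
    # Count how often each word is followed by each other word.
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def generate(counts: dict, prompt: str, max_new_words: int = 5) -> str:
    words = prompt.lower().split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # no statistics for this word, so the model has nothing to add
        # Greedy decoding: always take the most frequent continuation.
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Hypothetical toy corpus.
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigrams(corpus)
print(generate(model, "the dog"))
```

The output is only as good as the counts it was built from, which echoes the point that such a model is a "good enough" snapshot of the text it saw.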

[–] [email protected] 1 points 1 year ago

Also, humans tend to be notoriously bad at both statistics and math :^)

[–] [email protected] 4 points 1 year ago

This is fascinating, thanks so much for sharing!

[–] [email protected] 3 points 1 year ago

Good article :) It makes me happy to see this explained in such a basic way, because I sure as hell can't manage it.