[email protected] 14 points 1 day ago

Summarizing emails is a valid purpose.

Or it would have been if LLMs were sufficiently dependable anyway.

[email protected] 10 points 1 day ago* (last edited 1 day ago)

But “It’s Greek to me” goes right back to the Romans.

The wiki seems to say the aphorism originates with medieval scribes and Shakespeare's Julius Caesar.

The actual ancient Romans are unlikely to have had such qualms, since at the time Greek was much more widely understood than Latin, so much so that many important Roman works like Caesar's Memoirs and Marcus Aurelius' Meditations were originally written in Greek, with the Latin versions being translations.

[email protected] 7 points 1 day ago

You’d think AI companies would have wised up by this point and gone through all their pre-recorded demos with a fine comb so that ~~marks~~ users at least make it past the homepage, but I guess not.

The target group for their pitch probably isn't people who have a solid grasp of coding, I'd bet quite the opposite.

[email protected] 21 points 3 days ago* (last edited 3 days ago)

On each step, one part of the model applies reinforcement learning, with the other one (the model outputting stuff) “rewarded” or “punished” based on the perceived correctness of their progress (the steps in its “reasoning”), and altering its strategies when punished. This is different to how other Large Language Models work in the sense that the model is generating outputs then looking back at them, then ignoring or approving “good” steps to get to an answer, rather than just generating one and saying “here ya go.”

Every time I've read how chain-of-thought works in o1 it's been completely different, and I'm still not sure I understand what's supposed to be going on. Apparently you get a strike notice if you try too hard to find out how the chain-of-thinking process goes, so one might be tempted to assume it's something that's readily replicable by the competition (and they need to prevent that as long as they can) instead of any sort of notably important breakthrough.

From the detailed o1 system card pdf linked in the article:

According to these evaluations, o1-preview hallucinates less frequently than GPT-4o, and o1-mini hallucinates less frequently than GPT-4o-mini. However, we have received anecdotal feedback that o1-preview and o1-mini tend to hallucinate more than GPT-4o and GPT-4o-mini. More work is needed to understand hallucinations holistically, particularly in domains not covered by our evaluations (e.g., chemistry). Additionally, red teamers have noted that o1-preview is more convincing in certain domains than GPT-4o given that it generates more detailed answers. This potentially increases the risk of people trusting and relying more on hallucinated generation.

Ballsy to just admit your hallucination benchmarks might be worthless.

The newsletter also mentions that the price for output tokens has quadrupled compared to the previous newest model, but the awesome part is, remember all that behind-the-scenes self-prompting that's going on while it arrives at an answer? Even though you're not allowed to see those prompts, according to Ed Zitron you sure as hell are paying for them (i.e. they spend output tokens), which is hilarious if true.
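
A back-of-the-envelope sketch of that billing claim, assuming hidden reasoning tokens are charged at the same rate as visible output tokens. The function name, the token counts and the price are all made up for illustration; none of them come from the newsletter or from any actual price list.

```python
def billed_output_cost(visible_tokens: int, hidden_reasoning_tokens: int,
                       price_per_million: float) -> float:
    """Cost of one response if hidden reasoning tokens are billed as output tokens."""
    billed_tokens = visible_tokens + hidden_reasoning_tokens
    return billed_tokens * price_per_million / 1_000_000


# Hypothetical numbers, chosen only to show the effect.
visible = 500      # tokens the user actually gets to read
hidden = 4_500     # tokens spent on behind-the-scenes self-prompting
price = 40.0       # made-up price per million output tokens

cost_seen = billed_output_cost(visible, 0, price)
cost_billed = billed_output_cost(visible, hidden, price)
print(f"cost of the visible answer alone: ${cost_seen:.4f}")
print(f"cost actually billed:             ${cost_billed:.4f}")
```

With these invented numbers the bill is ten times what the visible answer alone would cost, purely because of tokens the user never sees.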

[email protected] 19 points 3 days ago

“When asked about buggy AI [code], a common refrain is ‘it is not my code,’ meaning they feel less accountable because they didn’t write it.”

Strong "they cut all my deadlines in half and gave me an OpenAI API key, so fuck it" energy.

He stressed that this is not from want of care on the developer’s part but rather a lack of interest in “copy-editing code” on top of quality control processes being unprepared for the speed of AI adoption.

You don't say.

[email protected] 16 points 6 days ago* (last edited 6 days ago)

OpenAI manages to do an entire introduction of a new model without using the word "hallucination" even once.

Apparently it implements chain-of-thought, which either means they changed the RLHF dataset to force it to explain its 'reasoning' when answering or to do self-questioning loops, or that it reprompts itself multiple times behind the scenes according to some heuristic until it synthesizes a best result. It's not really clear.
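
For what it's worth, the second guess (reprompt several times, keep the best result according to some heuristic) would look roughly like the sketch below. This is a generic best-of-n loop, not anything OpenAI has described; `best_of_n`, `generate`, `score` and the toy stand-ins for them are all hypothetical.

```python
import random
from typing import Callable


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str], float],
              n: int = 5) -> str:
    """Sample n candidate answers and return the one the heuristic scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)


# Toy stand-ins so the sketch runs; a real system would call a model here.
rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    return f"{prompt} -> draft {rng.randint(1, 100)}"


def toy_score(answer: str) -> float:
    # Hypothetical heuristic: prefer the draft with the largest number.
    return float(answer.rsplit(" ", 1)[-1])


print(best_of_n("2 + 2", toy_generate, toy_score, n=5))
```

Note that every discarded draft still costs a full generation, which is the whole joke.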

Can't wait to waste five pools of drinkable water to be told to use C# features that don't exist, but at least it got like 25.2452323760909304593095% better at solving math olympiads as long as you allow it a few tens of tries for each question.

[email protected] 31 points 2 months ago

I'm not spending the additional 34min apparently required to find out what in the world they think neural network training actually is that it could ever possibly involve strategy on the part of the network, but I'm willing to bet it's extremely dumb.

I'm almost certain I've seen EY catch shit on twitter (from actual ml researchers no less) for insinuating something very similar.

[email protected] 27 points 2 months ago* (last edited 2 months ago)

It's a sad fate that sometimes befalls engineers who are good at talking to audiences, and who work for a big enough company that can afford to have that be their primary role.

edit: I love that he's chief evangelist though, like he has a bunch of little google cloud clerics running around doing chores for him.

76
submitted 2 months ago by [email protected] to c/[email protected]

AI Work Assistants Need a Lot of Handholding

Getting full value out of AI workplace assistants is turning out to require a heavy lift from enterprises. ‘It has been more work than anticipated,’ says one CIO.

aka we are currently in the process of realizing we are paying for the privilege of being the first to test an incomplete product.

Mandell said if she asks a question related to 2024 data, the AI tool might deliver an answer based on 2023 data. At Cargill, an AI tool failed to correctly answer a straightforward question about who is on the company’s executive team, the agricultural giant said. At Eli Lilly, a tool gave incorrect answers to questions about expense policies, said Diogo Rau, the pharmaceutical firm’s chief information and digital officer.

I mean, imagine all the non-obvious stuff it must be getting wrong at the same time.

He said the company is regularly updating and refining its data to ensure accurate results from AI tools accessing it. That process includes the organization’s data engineers validating and cleaning up incoming data, and curating it into a “golden record,” with no contradictory or duplicate information.

Please stop feeding the thing too much information, you're making it confused.

Some of the challenges with Copilot are related to the complicated art of prompting, Spataro said. Users might not understand how much context they actually need to give Copilot to get the right answer, he said, but he added that Copilot itself could also get better at asking for more context when it needs it.

Yeah, exactly like all the tech demos showed -- wait a minute!

[Google Cloud Chief Evangelist Richard Seroter said] “If you don’t have your data house in order, AI is going to be less valuable than it would be if it was,” he said. “You can’t just buy six units of AI and then magically change your business.”

Never mind that that's exactly how we've been marketing it.

Oh well, I guess you'll just have to wait for chatgpt-6.66 that will surely fix everything, while voiced by charlize theron's non-union equivalent.

[email protected] 26 points 3 months ago* (last edited 3 months ago)

Honestly, the evident plethora of poor programming practices is the least notable thing about all this; using roided autocomplete to cut corners was never going to be a well calculated decision, it's always the cherry on top of a shit-cake.

[email protected] 32 points 3 months ago* (last edited 3 months ago)

There's an actual explanation in the original article about some of the wardrobe choices. It's even dumber, and it involves effective altruism.

It is a very cold home. It’s early March, and within 20 minutes of being here the tips of some of my fingers have turned white. This, they explain, is part of living their values: as effective altruists, they give everything they can spare to charity (their charities). “Any pointless indulgence, like heating the house in the winter, we try to avoid if we can find other solutions,” says Malcolm. This explains Simone’s clothing: her normal winterwear is cheap, high-quality snowsuits she buys online from Russia, but she can’t fit into them now, so she’s currently dressing in the clothes pregnant women wore in a time before central heating: a drawstring-necked chemise on top of warm underlayers, a thick black apron, and a modified corset she found on Etsy. She assures me she is not a tradwife. “I’m not dressing trad now because we’re into trad, because before I was dressing like a Russian Bond villain. We do what’s practical.”

46
submitted 5 months ago by [email protected] to c/[email protected]

An AI company has been generating porn with gamers' idle GPU time in exchange for Fortnite skins and Roblox gift cards

"some workloads may generate images, text or video of a mature nature", and that any adult content generated is wiped from a user's system as soon as the workload is completed.

However, one of Salad's clients is CivitAi, a platform for sharing AI-generated images which has previously been investigated by 404 Media. It found that the service hosts image-generating AI models of specific people, whose image can then be combined with pornographic AI models to generate non-consensual sexual images.

Investigation link: https://www.404media.co/inside-the-ai-porn-marketplace-where-everything-and-everyone-is-for-sale/

80
submitted 5 months ago* (last edited 5 months ago) by [email protected] to c/[email protected]

For Thursday's sentencing the US government indicated they would be happy with a 40-50 year prison sentence, and in the list of reasons they cite there's this gem:

  1. Bankman-Fried's effective altruism and own statements about risk suggest he would be likely to commit another fraud if he determined it had high enough "expected value". They point to Caroline Ellison's testimony in which she said that Bankman-Fried had expressed to her that he would "be happy to flip a coin, if it came up tails and the world was destroyed, as long as if it came up heads the world would be like more than twice as good". They also point to Bankman-Fried's "own 'calculations'" described in his sentencing memo, in which he says his life now has negative expected value. "Such a calculus will inevitably lead him to trying again," they write.

Turns out making it a point of pride that you have the morality of an anime villain does not endear you to prosecutors, who knew.

Bonus: SBF's lawyers' list of assertions for asking for a shorter sentence includes this hilarious bit of reasoning:

They argue that Bankman-Fried would not reoffend, for reasons including that "he would sooner suffer than bring disrepute to any philanthropic movement."

[email protected] 33 points 6 months ago

This was such a chore to read, it's basically quirk-washing TREACLES. This is like a major publication deciding to take an uncritical look at Scientology, focusing on the positive vibes and the camaraderie, while smack in the middle of Operation Snow White, which in fact I bet happened a lot at the time.

The doomer scene may or may not be a delusional bubble—we’ll find out in a few years

Fuck off.

The doomers are aware that some of their beliefs sound weird, but mere weirdness, to a rationalist, is neither here nor there. MacAskill, the Oxford philosopher, encourages his followers to be “moral weirdos,” people who may be spurned by their contemporaries but vindicated by future historians. Many of the A.I. doomers I met described themselves, neutrally or positively, as “weirdos,” “nerds,” or “weird nerds.” Some of them, true to form, have tried to reduce their own weirdness to an equation. “You have a set amount of ‘weirdness points,’ ” a canonical post advises. “Spend them wisely.”

The weirdness is eugenics and the repugnant conclusion, and abusing Bayes' rule to sidestep context and take epistemological shortcuts to cuckoo conclusions while fortifying a bubble of accepted truths that are strangely amenable to allowing rich people to do whatever the hell they want.

Writing a 7,000-8,000 word insider exposé on TREACLES without mentioning eugenics even once throughout should be all but impossible, yet here we are.

[email protected] 26 points 6 months ago

birdsite stuff:

A rationalist organization offered a James Randi-style $100k prize to anyone who could defeat them in a structured longform debate and prove COVID had a natural origin, so a rando Slate Star Codex commenter took them up on it and absolutely destroyed them. You won't believe what happened next (they wrote a pissy blogpost claiming the handpicked judges had "errors in ... probabilistic inference" for not agreeing with their conclusion and grew even more confident in their incorrect opinion)

57
submitted 6 months ago* (last edited 6 months ago) by [email protected] to c/[email protected]

rootclaim appears to be yet another group of people who, having stumbled upon the idea of Bayes' rule as a good enough alternative to critical thinking, decided to try their luck in becoming a Serious and Important Arbiter of Truth in a Post-Mainstream-Journalism World.

This includes a randiesque challenge that they'll take a $100K bet that you can't prove them wrong on a select group of topics they've done deep dives on, like if the 2020 election was stolen (91% nay) or if covid was man-made and leaked from a lab (89% yea).

Also their methodology yields results like 95% certainty on Usain Bolt never having used PEDs, so it's not entirely surprising that the first person to take their challenge appears to have wiped the floor with them.

Don't worry though, they have taken the results of the debate to heart and according to their postmortem blogpost they learned many important lessons, like how they need to (checks notes) gameplan against the rules of the debate better? What a way to spend 100K... Maybe once you've reached a conclusion using the Sacred Method changing your mind becomes difficult.

I've included the novel-length judges' opinions in the links below, where a cursory look indicates they are notably less charitable towards rootclaim's views than their postmortem indicates, pointing at stuff like logical inconsistencies and the inclusion of data that on closer look appear basically irrelevant to the thing they are trying to model probabilities for.

There's also like 18 hours of video of the debate if anyone wants to really get into it, but I'll tap out here.

ssc reddit thread

quantian's short writeup on the birdsite, will post screens in comments

pdf of judge's opinion that isn't quite book length, 27 pages, judge is a microbiologist and immunologist PhD

pdf of other judge's opinion that's 87 pages, judge is an applied mathematician PhD with a background in mathematical virology -- despite the length this is better organized and generally way more readable, if you can spare the time.

rootclaim's postmortem blogpost, includes more links to debate material and judges' opinions.

edit: added additional details to the pdf descriptions.

38
submitted 7 months ago* (last edited 7 months ago) by [email protected] to c/[email protected]

edited to add tl;dr: Siskind seems ticked off because recent papers on the genetics of schizophrenia are increasingly pointing out that at current minuscule levels of prevalence, even with the commonly accepted 80% heritability, actually developing the disorder is all but impossible unless at least some of the environmental factors are also in play. This is understandably very worrisome, since it indicates that even high-heritability issues might be solvable without immediately employing eugenics.

Also notable because I don't think it's very often that eugenics grievances breach the surface in such an obvious way in a public Siskind post, including the claim that the whole thing is just HBD denialists spreading FUD:

People really hate the finding that most diseases are substantially (often primarily) genetic. There’s a whole toolbox that people in denial about this use to sow doubt. Usually it involves misunderstanding polygenicity/omnigenicity, or confusing GWAS’ current inability to detect a gene with the gene not existing. I hope most people are already wise to these tactics.

15
submitted 9 months ago by [email protected] to c/[email protected]

... while at the same time not really worth worrying about so we should be concentrating on unnamed alleged mid term risks.

EY tweets are probably the lowest effort sneerclub content possible but the birdsite threw this in my face this morning so it's only fair you suffer too. Transcript follows:

Andrew Ng wrote:

In AI, the ratio of attention on hypothetical, future, forms of harm to actual, current, realized forms of harm seems out of whack.

Many of the hypothetical forms of harm, like AI "taking over", are based on highly questionable hypotheses about what technology that does not currently exist might do.

Every field should examine both future and current problems. But is there any other engineering discipline where this much attention is on hypothetical problems rather than actual problems?

EY replied:

I think when the near-term harm is massive numbers of young men and women dropping out of the human dating market, and the mid-term harm is the utter extermination of humanity, it makes sense to focus on policies motivated by preventing mid-term harm, if there's even a trade-off.

20
submitted 9 months ago by [email protected] to c/[email protected]

148
submitted 10 months ago by [email protected] to c/[email protected]

Sam Altman, the recently fired (and rehired) chief executive of OpenAI, was asked earlier this year by his fellow tech billionaire Patrick Collison what he thought of the risks of synthetic biology. ‘I would like to not have another synthetic pathogen cause a global pandemic. I think we can all agree that wasn’t a great experience,’ he replied. ‘Wasn’t that bad compared to what it could have been, but I’m surprised there has not been more global coordination and I think we should have more of that.’

26
submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/[email protected]

original is here, but you aren't missing any context, that's the twit.

I could go on and on about the failings of Shakespeare... but really I shouldn't need to: the Bayesian priors are pretty damning. About half the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote almost all Europeans were busy farming, and very few people attended university; few people were even literate -- probably as low as ten million people. By contrast there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564? The Bayesian priors aren't very favorable.

edited to add this seems to be an excerpt from the fawning book the big short/moneyball guy wrote about him that was recently released.

1
submitted 1 year ago by [email protected] to c/[email protected]

Transcription:

Thinking about that guy who wants a global suprasovereign execution squad with authority to disable the math of encryption and bunker buster my gaming computer if they detect it has too many transistors because BonziBuddy might get smart enough to order custom RNA viruses online.


Architeuthis

joined 1 year ago