@(Papastalin98) It's important to remember that these results are not few-shot learning, while this question asks about few-shot learning. It's a very important practical distinction as well, because models have a tendency to exploit spurious biases in training datasets. E.g. a model may notice that questions starting with "how many" can be answered with "2" in 80% of cases, without even looking at anything else in the question. In a few-shot setup there is much less opportunity for a model to find spurious biases, due to the small amount of data. Therefore, a few-shot...
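To make the spurious-bias point concrete, here is a toy sketch with an entirely made-up dataset; a "model" that only looks at the question prefix can still score well if the training distribution is skewed:

```python
# Toy illustration of a spurious bias; the questions and answers are made up.
train_set = [
    ("How many wheels does a bicycle have?", "2"),
    ("How many moons does Mars have?", "2"),
    ("How many eyes does a human have?", "2"),
    ("How many wings does a fly have?", "2"),
    ("How many legs does a spider have?", "8"),
]

def biased_model(question: str) -> str:
    # "Learns" only the surface pattern: answer "2" to any "how many" question.
    return "2" if question.lower().startswith("how many") else "unknown"

accuracy = sum(biased_model(q) == a for q, a in train_set) / len(train_set)
print(f"accuracy: {accuracy:.0%}")  # 80% without reading the rest of the question
```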
@(isinlor) The rate of improvement is rather disappointing. It seems like they may be halving the error rate somewhere between once in 6 months and once in 3 months right now. I'm still waiting for FSD 9 to estimate it better, but I would be surprised if it was more than a 5x improvement over FSD 8. And that's the easy part. My very rough guess is that they still need to make it some 10,000x better. $$\log_2(10000) \times 0.5\ \text{years} \approx 6.64\ \text{years}$$ The improvement needed also roughly matches Tesla's internal expectations. According to [CJ Moore, Tesla’s director of autopilot softwar...
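A minimal sketch of that back-of-envelope calculation, assuming a constant halving time of 6 months and the ~10,000x improvement guessed above (both inputs are rough guesses, not measured values):

```python
import math

# Rough guesses from the comment above, not measured values.
halving_time_years = 0.5       # assumed time to halve the error rate
required_improvement = 10_000  # assumed factor by which the error rate must shrink

# Each halving takes `halving_time_years`, so count the halvings needed.
years_needed = math.log2(required_improvement) * halving_time_years
print(f"{years_needed:.2f} years")  # ~6.64 years
```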
The 99% and 95% predictions here are a major overconfidence. The same overconfidence was present during the classical era of physics. The same was true of the LHC beyond the Higgs boson. I could go that high only if there were no more fundamental phenomena to explain and everyone agreed on the interpretation of what we would know at that point. Physics works because we constantly verify it against reality. In the absence of constant checks against reality, physics is not that much better at making predictions than philosophy. Well, it is philosophy at that point. It's just made b...

@Anthony It's been about 10 days of the Italian lock-down. The cases were still growing yesterday. I have a suspicion that there will be a longer delay, as people infected before the lock-down will be infecting their families during the lock-down.

@Glossy I assign it significantly less than a 1% chance.

— edited by isinlor

A person I know by sight is a confirmed COVID-19 case.

I haven't seen this person in 2 weeks, so it's extremely unlikely I got infected through that person directly.

— edited by isinlor

The Loebner Prize *was* an annual competition in artificial intelligence. [The prize is reported as defunct since 2020:](https://www.bbc.com/news/technology-54718671)

> While Blenderbot is the brainchild of one of the world's largest corporations, Kuki began life as a hobby. Formerly known as Mitsuku, she was originally designed by UK-based Steve Worswick in his spare time.

> Mitsuku was showcased for years at the Loebner Prize, winning five times.

> That competition, now defunct, is a version of the Turing Test: an "imitation game" devised by Alan...
I'm certainly interested. I would actually be interested in more than just clinical trials. I was for example looking at the [OSF Registry](https://osf.io/registries). The researchers registered there seem to have well-defined hypotheses. We could predict whether researchers will retain or reject some interesting ones. Recently, I heard about an interesting proposal for a solution to the reproducibility crisis involving prediction markets. The idea is that people get to bet on whether research will be reproducible, and gatekeepers use that as a signal in re...
> [RECOVERY Trial led by Oxford University. Statement from the Chief Investigators: low-cost dexamethasone reduces death by up to one third in hospitalised patients with severe respiratory complications of COVID-19](https://www.recoverytrial.net/results)

Also, [BBC article](https://www.bbc.com/news/health-53061281).

> In the trial, led by a team from Oxford University, around 2,000 hospital patients were given dexamethasone and were compared with more than 4,000 who did not receive the drug.

> For patients on ventilators, it cut the risk of death fro...

We already have a second successful attempt at fooling Hacker News with GPT-3.

It's interesting because it's not just a failure to differentiate GPT-3 content from a random article; people actually liked it and engaged with it.

— edited by isinlor

Here you can track the progress of Leela Chess Zero: https://docs.google.com/spreadsheets/d/18UWR4…

As of today, the estimated date to beat Stockfish 8 (3.8 GHz, 1 core) with LC0 running on a GTX 1060 is December 2018.

Not sure when TCEC will be held next year, but I give 70% that Leela Zero will win it. Although, I think it may also depend on access to GPUs.

@(Jgalt) From watching Tesla FSD demo videos, I would estimate that the initial release was failing on maybe 1 in 10 turns. Now, a month later, it's maybe 1 in 20 turns. If they keep that pace, it will still take them at least a year before they get to one failure in 100,000 turns. And these are just the failures due to turns. I don't believe they will keep that pace, since it seems like the latest improvements are literally due to debugging their driving logic. Stuff like vision correctly detects a roundabout, but driving logic still tells the car to go s...
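A back-of-envelope version of that extrapolation, assuming the per-turn failure rate keeps halving every month (the 1-in-10 and 1-in-20 figures are just my eyeballed guesses from the demo videos):

```python
import math

# Eyeballed guesses from the demo videos, not measured values.
current_failure_rate = 1 / 20      # assumed: one failed turn in twenty, a month after release
target_failure_rate = 1 / 100_000  # assumed reliability target for turns
halving_time_months = 1.0          # assumed: failure rate halves every month at the current pace

months_needed = math.log2(current_failure_rate / target_failure_rate) * halving_time_months
print(f"{months_needed:.1f} months")  # ~12.3 months, i.e. at least a year
```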

@notany It's funny to read articles like that. People seem to be all over the place with their opinions. Only 5 months after the article was written, BERT managed to paradigm-shift the whole of NLP. And since then we have had at least two more major breakthroughs: AlphaFold and the GPT series. Winter is coming! :)

But I can agree on one point: Embedded AI and RL are wicked difficult.

Duplicate question: Zettascale computing when?

Other prediction: 25th percentile 2034, median 2038, 75th percentile 2042

I'm really surprised that the prediction did not plummet after my previous comment. It seems the competition is still running, but based on [this BBC article](https://www.bbc.com/news/technology-49578503) the format of the competition has changed:

> There will no longer be a panel of judges. Instead, the chatbots will be judged by the public and there will be no human competitors.

Not sure if winning a Silver medal should count for positive resolution anymore. Actually, I don't know whether the concept of a Silver medal even exists anymore. BTW - [Mitsuku won this year as w...
New SOTA on few-shot SuperGLUE of 75.4 with only 223M parameters. [It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners](https://arxiv.org/abs/2009.07118)

> When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is...

@mumpskin The problem with judging a troop buildup is that a bluff must look convincing in order to work.

The best way to produce a convincing bluff is to order actual preparations for an invasion, pending the order to attack.

@(kwathomas0) Indeed, the USA does not seem to be doing great. I hope once they start testing, they will realize that mobilization is necessary. I'm actually reasonably happy with Europe's response; at least we don't seem to be doing worse than China.

- Italy = 4.2% CFR
- France = 1.4% CFR
- Spain = 1.3% CFR
- UK = 1.2% CFR
- Netherlands = 0.8% CFR
- Switzerland = 0.5% CFR
- No deaths in other major EU countries.

Outside Italy, in Europe, we have a 0.7% CFR. With Italy that jumps to 1.6% - 2.9% with 2000 - 7400 cases respectively. China had 2.4% - 2.9% CFR while...