Tweets
-
Pinned Tweet
If you are looking for something for this weekend, my new book *Machine Learning with PyTorch and Scikit-Learn* just came out today: https://www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-scikit-learn-ebook-dp-B09NW48MR1/dp/B09NW48MR1/
-
Just discovered this RNN gem: "LSTM: A Search Space Odyssey" (https://arxiv.org/abs/1503.04069). Basically it's a large-scale study looking into LSTM variants. While things can be simplified, the forget gate and the output activation function are the most critical parts of the LSTM cell.
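(Not from the paper -- a minimal PyTorch-style sketch of a single LSTM step, assuming a packed 4*hidden weight layout, just to show where the forget gate and the output activation sit:)

import torch

def lstm_cell_step(x, h, c, W_x, W_h, b):
    # x: (batch, input_dim), h/c: (batch, hidden)
    # W_x: (input_dim, 4*hidden), W_h: (hidden, 4*hidden), b: (4*hidden,)
    gates = x @ W_x + h @ W_h + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i = torch.sigmoid(i)           # input gate
    f = torch.sigmoid(f)           # forget gate -- one of the critical parts per the study
    g = torch.tanh(g)              # candidate cell state
    o = torch.sigmoid(o)           # output gate
    c_new = f * c + i * g
    h_new = o * torch.tanh(c_new)  # output activation (tanh) -- the other critical part
    return h_new, c_new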
-
Personally, I think that class projects are really worthwhile. They are a lot of work but very rewarding for both the students (practical experience & something for the CV) and the teacher (it's always something unique & interesting vs. the standard HW assignments): https://proceedings.mlr.press/v170/raschka22a.html
-
The proceedings of the 2nd Teaching Machine Learning and Artificial Intelligence Workshop are now online (https://proceedings.mlr.press/v170/). It was a fun workshop, catching up with colleagues and exchanging tips & tricks about teaching.
-
PS: credits go to @cHHillee for this excellent article!
-
9/9 Python can perform 32 million additions per sec; PyTorch, however, can only do 280 thousand operations per sec. Interesting, but again not an issue in practice, because we usually use a few large array operations when working in PyTorch.
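(A rough sketch of the kind of micro-benchmark behind these numbers; the exact figures depend heavily on hardware and library versions:)

import time
import torch

n = 1_000_000
start = time.perf_counter()
x = 0
for _ in range(n):
    x = x + 1
print(f"pure Python: {n / (time.perf_counter() - start):,.0f} adds/sec")

t = torch.tensor(0.0)
m = 100_000
start = time.perf_counter()
for _ in range(m):
    t = t + 1  # every call pays PyTorch's per-op dispatch overhead
print(f"PyTorch scalar ops: {m / (time.perf_counter() - start):,.0f} ops/sec")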
-
8/ Interesting tidbit about overhead. Python can perform 32 million additions per sec. But in the time Python requires for 1 FLOP, an A100 GPU can perform 9.75 million FLOPs (though in practice Python overhead is not an issue, because the percentage of time spent there is so small).
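(The back-of-the-envelope arithmetic, assuming the A100's advertised ~312 TFLOPS tensor-core peak:)

a100_flops_per_sec = 312e12       # assumed BF16 tensor-core peak of an A100
python_adds_per_sec = 32e6        # rough pure-Python addition rate from above
print(a100_flops_per_sec / python_adds_per_sec)  # ~9.75e6 FLOPs per Python addition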
-
7/ In the middle panel we can see that the FLOPS ramp up, and in the right panel we see that the memory bandwidth goes down as we reach ~128 repeats, meaning the code switches from being memory-bound to being compute-bound. pic.twitter.com/cipIwoj283
-
6/ In the following plot we can see that the runtime (left panel) doesn't increase until we reach 32 repeats, so the code is probably mostly memory-bandwidth- and overhead-bound. pic.twitter.com/RglKfkNUCv
-
5/ Anyway, in general, one of the main takeaways is to find out whether your (GPU) code is (1) compute-, (2) memory-, or (3) overhead-bound. There's an interesting analysis using the following code:

def f(x: Tensor[N]):
    for _ in range(repeat):
        x = x * 2
    return x
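(A runnable version of that experiment -- a sketch assuming a CUDA device; the tensor size and repeat counts are my choices, not the article's:)

import torch
from torch.utils.benchmark import Timer

def f(x, repeat):
    for _ in range(repeat):
        x = x * 2
    return x

x = torch.randn(2**20, device="cuda")
for repeat in [1, 2, 4, 8, 16, 32, 64, 128, 256]:
    t = Timer("f(x, repeat)", globals={"f": f, "x": x, "repeat": repeat}).timeit(100)
    # Runtime stays nearly flat while overhead dominates, then grows with `repeat`
    # once the multiply kernels dominate.
    print(f"repeat={repeat:4d}  {t.median * 1e6:9.1f} us")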
-
4/ There is the concept of operator fusion -- e.g., via custom CUDA kernels. I know many mathematically inclined folks prefer the elegance of composition: things like torch.log(F.softmax(a)). But F.log_softmax(a) is not only better for numerical stability but also for efficiency.
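(A tiny illustration of the numerical-stability point; values are contrived to force underflow in the composed version:)

import torch
import torch.nn.functional as F

a = torch.tensor([[100.0, 0.0, -100.0]])

composed = torch.log(F.softmax(a, dim=-1))  # softmax underflows to 0 -> log gives -inf
fused = F.log_softmax(a, dim=-1)            # single fused op, numerically stable

print(composed)  # the underflowed entry becomes -inf
print(fused)     # finite: tensor([[   0., -100., -200.]])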
-
3/ Because of the memory reads & writes, running successive operations can be expensive, even if your data has already been transferred to the GPU. E.g., 4 reads & writes:

x1 = x.cos()
x2 = x1.cos()

vs. 2:

x2 = x.cos().cos()

The latter is still inefficient (unless compiled), though.
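(A small sketch of the same point; torch.compile -- assuming PyTorch 2.x and a CUDA device -- can fuse the pointwise chain into a single kernel with one read and one write:)

import torch

x = torch.randn(2**24, device="cuda")

# Eager: two kernels, four global-memory accesses (read x, write x1, read x1, write x2)
x1 = x.cos()
x2 = x1.cos()

# Compiled: the chain can be fused into one kernel
fused_cos = torch.compile(lambda t: t.cos().cos())
x2_fused = fused_cos(x)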
-
2/ Different things to consider:
1. Compute: time the GPU spends on floating-point operations (FLOPs)
2. Memory: time spent transferring tensors within a GPU
3. Overhead: everything else (like Python and the PyTorch API)
pic.twitter.com/MNQkm61QSI
-
Ever had a colleague who wants to apply linear regression to thousands of small datasets and asked you about a PyTorch GPU implementation to speed things up? This would be a great article to share, illustrating the different performance bottlenecks: https://horace.io/brrr_intro.html 1/
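(A minimal sketch of the kind of batched GPU solve this motivates; the dataset shapes and the use of torch.linalg.lstsq are my choices, not the article's:)

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
n_datasets, n_samples, n_features = 10_000, 50, 4

X = torch.randn(n_datasets, n_samples, n_features, device=device)
y = torch.randn(n_datasets, n_samples, 1, device=device)

# One batched least-squares call instead of 10,000 tiny Python-level fits,
# so per-op overhead stays negligible relative to the actual compute.
beta = torch.linalg.lstsq(X, y).solution  # (n_datasets, n_features, 1)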
-
Is there a better way to spend the weekend? Will join the stream to check in on @marksaroufim around 4 pm CST (2 pm PST). Help me prepare some tricky questions (just kidding ... or maybe not?) https://twitter.com/PyTorchLightnin/status/1505189852584853507
-
Sebastian Raschka Retweeted
ML Twitter: What is the current best practice for the following setting? Problem: Image classification. Setting: I'm given an initial supervised training set of labeled images drawn from P(x,y), and I train a net. Then I'm given a second set of labeled images also from P(x,y). 1/
-
Kudos to the authors for providing all the code to reproduce the results on GitHub: https://github.com/Yura52/tabular-dl-num-embeddings#how-to-reproduce-results pic.twitter.com/qVTsgX2W3S
-
3/3 The second one is based on periodic activation functions (inspired by positional encodings). Besides being competitive with gradient boosting, it is also interesting to note that with these embeddings, the MLPs are competitive with transformer architectures.
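(A rough sketch of the periodic-embedding idea for numerical features -- dimensions and initialization are simplified relative to the paper:)

import math
import torch
import torch.nn as nn

class PeriodicEmbedding(nn.Module):
    # Maps each scalar feature to [sin(2*pi*c*x), cos(2*pi*c*x)] with learnable frequencies c.
    def __init__(self, n_features, k=8, sigma=1.0):
        super().__init__()
        self.c = nn.Parameter(torch.randn(n_features, k) * sigma)

    def forward(self, x):
        # x: (batch, n_features) -> (batch, n_features, 2*k)
        v = 2 * math.pi * self.c * x.unsqueeze(-1)
        return torch.cat([torch.sin(v), torch.cos(v)], dim=-1)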
-
2/ The authors employ two embedding schemes that are competitive with gradient boosting and apparently result in SOTA performance. The first embedding scheme is a piecewise-linear encoding involving feature binning. pic.twitter.com/z80lB2bDqB
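(A rough sketch of piecewise-linear encoding for a single numerical feature; in the paper the bin edges come from training-data quantiles or a target-aware tree -- the helper below is illustrative:)

import torch

def piecewise_linear_encode(x, bin_edges):
    # x: (batch,), bin_edges: (n_bins + 1,) increasing -> (batch, n_bins)
    # Bins fully below x encode as 1, bins above x as 0, the bin containing x as a fraction.
    left, right = bin_edges[:-1], bin_edges[1:]
    frac = (x.unsqueeze(-1) - left) / (right - left)
    return frac.clamp(0.0, 1.0)

x = torch.tensor([0.05, 0.4, 0.9])
edges = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
print(piecewise_linear_encode(x, edges))
# approximately:
# tensor([[0.2000, 0.0000, 0.0000, 0.0000],
#         [1.0000, 0.6000, 0.0000, 0.0000],
#         [1.0000, 1.0000, 1.0000, 0.6000]])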
-
*No confidence intervals in the table above, but I think the results are averaged over at least 15 random seeds.
-
Let the "Deep Learning for Tabular Data" saga continue -- it's been a while! "On Embeddings for Numerical Features in Tabular Deep Learning" (https://arxiv.org/abs/2203.05556) This time, the focus is on the embeddings for numerical features rather than proposing a new architecture.* pic.twitter.com/2NcpYvnYlS