ALTERNATE UNIVERSE DEV

Software at Scale

Software at Scale 6 - Distributed Systems with Indranil Gupta

Indranil Gupta (Indy) is a Professor of Computer Science at the University of Illinois Urbana-Champaign. He leads the DPRG (Distributed Protocols Research Group) and runs popular Cloud Computing MOOCs on Coursera. His work has inspired software that runs in many production services, such as HashiCorp's Serf (used in Nomad) and Uber's Ringpop.

We discussed how academia drives progress in distributed systems, blockchains, the intersection of distributed systems and machine learning, doing quality research, working as a visiting researcher in industry, cluster schedulers, and how to choose between going into industry and academia.

Highlights

0:00 - Going into academia. “Make a list of pros and cons, throw away that list, and go with your gut”.

6:30 - The acceptance of distributed systems in the early 2000s vs. today.

7:30 - The emergence of blockchain and how the world treats it. “Re-looking the wheel, vs re-inventing the wheel”

12:30 - Differences between computer science research in industry and academia, especially in distributed systems.

21:00 - The inspiration for solving the reconfiguration problem came from open source bug reports. "Directions that seem daunting for folks in industry since the solutions are unclear can be tackled by academia". Making progress toward solving the online reconfiguration problem. The similarity of research to intern projects.

25:30 - How to pick an area to do research in.

31:00 - Writing a good paper is “90% good communication and 90% good ideas”.

31:20 - SWIM - a paper that’s had an outsized industry impact due to good ideas that were well explained

38:00 - "What are you excited about in distributed systems today?" - Distributed Systems + X: machine learning, agriculture, etc. For example, dealing with malicious workers in distributed machine learning.

47:00 - Training and inference with new machine learning systems like GPT-3 become distributed systems problems. Partitioning of data in TensorFlow.

56:00 - How can industry and academia collaborate so that industry produces more research?

66:00 - What Indy learned by working in industry for a year at Google. Borg (a predecessor to Kubernetes), Omega, and Mesos.

73:00 - Advice for a student evaluating a choice between academia and industry
