Data Scientist. I write about DS & ML. Sometimes.

Data Scientist’s Toolbox

Setting up Miniconda instead and connecting it to Jupyter Notebook.

Photo by Maxx Rush on Unsplash

I remember when I started learning Python in my university days and found this “thing” called Anaconda.

To me, it was basically an all-in-one Python distribution that came bundled with a lot of packages. Goodbye pip install; I could finally focus on the code instead of managing the requirements.

The problem would come a while later when I was working on a task for my Machine Learning class.

It was a pretty straightforward task: load a CSV dataset, do some feature engineering, create some visualisations, and then train a model.

“Nice, I should be able to sleep for 8 hours tonight,” I thought to myself. …


Photo by Manuel Geissinger from Pexels

The Data Science Experience

Have you worked on big data or “big data”?

128 Petabytes. Not Terabytes, Petabytes. That’s how much storage space the Hadoop Distributed File System (HDFS) had when I first started working with big data.

We have all heard the term big data, but I believe we never fully understand it until we get hands-on experience.

I first learnt about it during my sophomore year at university and even worked on “big data” in my first job.

I was responsible for setting up a pipeline to process 10 MB of generated data every day, which would amount to around 3.65 GB of data per year.
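A few lines of Python can check the arithmetic behind that figure and show how small it is next to the 128 PB cluster mentioned above. This is a minimal sketch. It assumes decimal (SI) units, so 1 GB = 1,000 MB and 1 PB = 10^9 MB, and a 365-day year. The variable names are purely illustrative.

```python
# Back-of-the-envelope storage estimate, assuming decimal (SI) units
# and a 365-day year.
DAILY_MB = 10                # data generated per day, from the text
DAYS_PER_YEAR = 365
MB_PER_GB = 1_000            # SI: 1 GB = 1,000 MB
MB_PER_PB = 1_000_000_000    # SI: 1 PB = 10**9 MB
HDFS_CAPACITY_PB = 128       # storage figure from the text

yearly_mb = DAILY_MB * DAYS_PER_YEAR   # 3,650 MB
yearly_gb = yearly_mb / MB_PER_GB      # 3.65 GB

# How many years of this pipeline's output would it take to fill 128 PB?
years_to_fill = HDFS_CAPACITY_PB * MB_PER_PB / yearly_mb

print(f"{yearly_mb} MB per year = {yearly_gb:.2f} GB per year")
print(f"Years to fill {HDFS_CAPACITY_PB} PB: {years_to_fill:,.0f}")
```

At 10 MB a day, the pipeline would need roughly 35 million years to fill that cluster. That is the gap between “big data” and big data.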

We even decided not to use Hadoop, since loading the data from HDFS took much longer than simply reading it directly from the file system. …


To Be a Data Scientist

Get to know yourself before going down the data science career path

Photo by National Cancer Institute on Unsplash

Bzzz! My always-on-vibrate phone buzzed in my pocket as I walked along Orchard Road in Singapore on a beautiful Saturday morning.

It buzzed a few more times in quick succession, but I thought nothing of it, since I try to ignore messages on weekends.

I have learned that disengaging from chats and social media helps you better appreciate your time with the people around you.

Once I arrived back home, I was surprised to see that the messages were from someone who had not messaged me in a very long time. I don’t really know this person; I have talked with him before, but not that much. …


Technology

Amazon and Apple already have a huge head start.

Photo by Clark Street Mercantile on Unsplash

Covid-19 has caused massive disruption to industries around the world. The airline and tourism industries were among the first to be hit hard, due to the entry restrictions most countries imposed to curb the pandemic’s spread. Giant startups like Airbnb and Uber, among others, have also laid off some of their workers to cut expenses. Famous consulting firms Accenture and Deloitte did the same a couple of months ago. The list goes on and on.

Among these industries, there is one in particular that may usher in the next trend in machine learning and AI: the retail industry. …


Technology

It’s a step in the right direction, but they should have been able to do more.

Screenshot by Author from Apple Event (source)

Apple has always thrown a lot of big, powerful words around in its marketing. The following quote is from the latest Apple Event on October 13.

For example, Deep Fusion uses machine learning on the Neural Engine for pixel-by-pixel processing of photos with unprecedented detail, texture, and minimal noise. (source)

As far as I know, image processing always works on images pixel by pixel, because that is the only way to do it. What other data would you use to process an image besides its pixels?

I am not trying to undermine Apple’s technology. With Ian Goodfellow himself as Director of Machine Learning, it’s a safe bet that Apple plans to keep developing technologies based on Generative Adversarial Networks. …


Data Scientist’s Toolbox

The default macOS terminal is not for you, or anyone really

Photo by Anthony Garand on Unsplash

Have you ever bought a mattress?

Let me tell you about my experience with the notorious mattress salesmen.

Salesmen will try to talk you into buying something you don’t really need, or a more expensive version of something you do need.

Personally, I like to shop in peace. That means I would prefer not having a salesman follow me while I browse. However, it is nearly impossible to find a mattress store without salesmen.

Once you have heard one of the sales pitches, you realise that they are all exactly the same. …


Screenshot of top posts by Adolos (source)

Machine Learning

Has GPT-3 matched the writing skill of human writers? Or is the bar for writing quality set too low?

OpenAI’s GPT-3 has been in the spotlight for quite a while now. It is currently considered the state of the art for NLP-related tasks, achieving better results than its predecessors.

GPT-3 is not yet publicly available. However, OpenAI has given beta testers access to the model’s API, letting people experiment with the model.

As it turns out, the model can even write code in various programming languages just from natural-language descriptions. Here is one of the examples posted on Twitter.

Tweet by Sharif Shameem (source)

These kinds of tweets about GPT-3’s capabilities have sparked a lot of reactions. An article titled “Will The Latest AI Kill Coding?” entertained the idea of AI taking over programming jobs, followed promptly by another article “GPT-3 Will Not Take Your Programming…


Startup Story

The road to success and the road to failure are almost exactly the same.

Photo by Michael Longmire on Unsplash

Airbnb is arguably the poster child of a successful platform business. In The Business of Platforms, the company is praised over and over throughout the book, and for good reason: it has a global presence and a strong business model.

Having a global presence means future competitors will have a hard time establishing their business in the same market. It is still possible, but overpowering Airbnb and capturing its existing market would require a huge initial cost.

The business model is straightforward: hosts and users both get charged when they make a transaction, while simply listing properties or browsing available ones is free of charge. Compared to Uber, which subsidised every trip and even gave drivers incentives to stay with the company, Airbnb’s business model has a bigger potential to be profitable. …


Machine Learning

Simulating real-world physics with graph network-based simulators

Model demonstration from Peter Battaglia (source)

A while ago, I was browsing through arXiv’s recent paper submissions in Machine Learning when I came across an interesting title.

Learning to Simulate Complex Physics with Graph Networks

I decided to dive deeper into it and found that the authors successfully combined several machine learning models to create a framework called “Graph Network-based Simulators” (GNS).

As you can see in the image above, the predicted movement of the water particles closely matches the ground truth. The model also produced comparable results for different starting conditions and for other materials, such as goop and sand.

Unlike existing simulations, which require re-rendering for any change in starting conditions, this model only needs to be trained once and can then successfully predict how the particles would behave under different conditions. …


Data Scientist’s Toolbox

You have been using it every day without paying attention.

Photo by Tianyi Ma on Unsplash

Throughout my journey working with data, I have discovered a tool that can save you time and make you more productive, no matter what programming language you are using.

The shell.

When you run a program from the terminal, you are actually using the shell to run it. Every command you type in the terminal is run by the shell.
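One way to see the shell at work is to launch the same command from Python with and without it. This minimal sketch assumes a POSIX system (Linux or macOS) where `echo` and `tr` are on the PATH. With `shell=True`, Python's `subprocess` module passes the string to `/bin/sh`, and the shell is what interprets the pipe.

```python
import subprocess

# With shell=True, the whole string goes to /bin/sh, which parses the "|"
# and connects echo's output to tr's input.
piped = subprocess.run(
    "echo data science | tr a-z A-Z",
    shell=True, capture_output=True, text=True, check=True,
)
print(piped.stdout, end="")

# Without a shell, each list item is passed straight to echo as an argument,
# so "|" is just another piece of text and no pipe is created.
literal = subprocess.run(
    ["echo", "data science", "|", "tr", "a-z", "A-Z"],
    capture_output=True, text=True, check=True,
)
print(literal.stdout, end="")
```

The first call prints the uppercased text. The second one prints the command line verbatim, because nothing was there to turn `|` into a pipe.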

Unfortunately, most of us only learn a small part of the shell, mainly cd and ls to navigate through directories.

Beyond that, we might learn tool-specific commands such as git and docker, and language-specific commands to compile and run programs in different languages. However, we often treat remembering these commands as part of learning a tool or programming language instead of learning the shell itself. …
