Posts

Computing Loss in PyTorch

This post shows how the loss computation works and what the implementation looks like in PyTorch. Rule of thumb: the more accurate the network, the smaller the loss. Reference: DataCamp's Introduction to Deep Learning with PyTorch
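The rule of thumb can be checked with a minimal sketch. The excerpt does not say which loss function the post uses, so `nn.CrossEntropyLoss` is an assumption here, and the scores below are made-up values for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for one sample over three classes.
# These numbers are illustrative and do not come from the original post.
logits_good = torch.tensor([[3.0, 0.1, 0.1]])  # highest score on the correct class
logits_bad = torch.tensor([[0.1, 0.1, 3.0]])   # highest score on a wrong class
target = torch.tensor([0])                     # index of the correct class

# Assumption: cross-entropy is the loss of interest.
criterion = nn.CrossEntropyLoss()
loss_good = criterion(logits_good, target)
loss_bad = criterion(logits_bad, target)

# The more accurate prediction should yield the smaller loss.
print(loss_good.item(), loss_bad.item())
```

With these inputs the first loss comes out smaller than the second, which is what the rule of thumb predicts.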

Using Softmax in PyTorch

First, import torch and torch.nn. Next, define a random input tensor of shape (2, 3). Then compare the results when softmax is applied along dimension 0 (dim=0) and along dimension 1 (dim=1). Hope this helps. Reference: https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html
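The steps above can be sketched roughly as follows. The seed and the variable names are illustrative assumptions, not taken from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)   # assumption: fixed seed so the random tensor is repeatable
x = torch.randn(2, 3)  # random input tensor of shape (2, 3)

# dim=0 normalizes down each column; dim=1 normalizes across each row.
softmax_dim0 = nn.Softmax(dim=0)(x)
softmax_dim1 = nn.Softmax(dim=1)(x)

print(softmax_dim0.sum(dim=0))  # sums of the three columns
print(softmax_dim1.sum(dim=1))  # sums of the two rows
```

With dim=0 each column of the output sums to 1, and with dim=1 each row sums to 1.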

A Reminder for Everyone: Be Sceptical!

Given recent developments, in which we are frequently served distorted data and visualizations, there is a kind of moral and intellectual obligation to remind all of us not to be instantly dazzled by statements that claim to be the product of data analysis and processing. The misplaced (and even arguably abusive) use of terms such as big data and machine learning must be criticized thoroughly, no matter who is using them. I personally often see these terms misused by people who do not even understand what 'big data' or 'machine learning' actually is. On top of that, products claimed to come from so-called data analysis are often used to justify the shaping of public opinion or, even worse, policy. Thick with bias and conflict of inte...

Convert Epoch Time to Datetime in Python

import datetime

epoch_time = 1541290680
# fromtimestamp() converts to the machine's local time zone
result_datetime = datetime.datetime.fromtimestamp(epoch_time)
print(result_datetime)  # prints 2018-11-04 12:18:00 (the exact value depends on the local time zone)

Alternatively, if you're in a rush, head to the online epoch converter here.

The above code is modified from: https://www.javatpoint.com/python-epoch-to-datetime
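Because `fromtimestamp()` uses the local time zone by default, the printed value differs between machines. A minimal sketch of a time-zone-independent variant, assuming UTC is the zone you want, passes `tz` explicitly:

```python
import datetime

epoch_time = 1541290680

# Passing tz makes the result timezone-aware and independent of the machine settings.
utc_datetime = datetime.datetime.fromtimestamp(epoch_time, tz=datetime.timezone.utc)
print(utc_datetime)  # 2018-11-04 00:18:00+00:00
```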

Word Wrapping in Google Colab using textwrap

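import textwrap

# Placeholder text; replace it with the long string you want to wrap.
string = "Replace this with the long text you want to wrap."

wrapper = textwrap.TextWrapper(
    width=40,
    initial_indent=" " * 4,
    subsequent_indent=" " * 4,
    break_long_words=False,
    break_on_hyphens=False,
)
print(wrapper.fill(string))

source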

Using dict.get()

dict.get() is used to get the value of an item in a given dictionary using its key (see W3Schools for further reference). Why do we use this method? We can access the item's value just by using its key directly, can't we (e.g. dict[key])? I think the main advantage of this method is that we can look up a key without the hassle of an error being raised if the key does not exist. How come? dict.get() accepts two parameters: the key itself and an optional default value. The default value is returned if the key we look for does not exist in the dictionary. Example: suppose we have a dictionary d as follows: d = {'a': 1, 'b': 2, 'c': 3}. Suppose we would like to get the value of the item with key 'z'. d['z'] raises a KeyError because there is no item in d whose key is 'z'. Now, if we apply dict.get() as follows:...
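A minimal sketch of the behaviour described above, using the same dictionary d. The default value 0 is an illustrative choice and is not taken from the original post.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Direct indexing raises KeyError for a missing key.
try:
    d['z']
except KeyError as err:
    print("KeyError:", err)

missing_no_default = d.get('z')       # None: no default was given
missing_with_default = d.get('z', 0)  # 0: the supplied default is returned
existing = d.get('a', 0)              # 1: the key exists, so the default is ignored

print(missing_no_default, missing_with_default, existing)
```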

How to Get the Replication Factor of HDFS Files

There are two ways to get the replication factor of HDFS files. Suppose we want to obtain the replication factor of the 2016 Olympics tweet dataset stored in /data/olympictweets2016rio. The first way is to run a command from your terminal that returns the replication factor directly. In this example the replication factor is 10, although in most cases the default replication factor is 3. The other way is simply to use the hadoop fs -ls command. Just make sure you pass the path of the dataset whose replication factor you want to obtain. The command returns information about the target directory or files; the replication factor is in the second column, right after the permissions. Both ways return the same replication factor, which in this case is 10. Reference: StackOverflow
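The exact command for the first way is not reproduced in this excerpt. One option that the Hadoop FileSystem shell provides is hadoop fs -stat %r followed by the path, though this is an assumption about what the post used. For the second way, the sketch below reads the second column of one hadoop fs -ls entry. The listing line, the user, the group, and the file name part-00000 are all made up for illustration and are not real output.

```python
def replication_from_ls_line(line: str) -> str:
    """Return the second whitespace-separated field of one `hadoop fs -ls` entry.

    For files this field is the replication factor; directories show "-" instead.
    """
    fields = line.split()
    return fields[1]


# Hypothetical listing line (invented for this sketch, not copied from a real cluster).
sample_line = (
    "-rw-r--r--  10 hduser supergroup  1048576 2016-08-22 09:15 "
    "/data/olympictweets2016rio/part-00000"
)

replication = replication_from_ls_line(sample_line)
print(replication)  # 10
```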