data-and-data

Posts

Computing Loss in PyTorch

March 30, 2022

How it works, and what the implementation looks like in PyTorch. Rule of thumb: the more accurate the network, the smaller the loss. Reference: DataCamp's Introduction to Deep Learning with PyTorch
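As a rough illustration of the rule of thumb above, here is a minimal sketch that assumes cross-entropy loss (the excerpt does not name a specific loss function). The logits and target labels are made-up values chosen only for the demonstration.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for one sample over three classes.
logits = torch.tensor([[3.0, 0.5, -1.0]])

# Assumption: cross-entropy is the loss of interest.
criterion = nn.CrossEntropyLoss()

# Case 1: the true class is the one the network scores highest (class 0).
loss_good = criterion(logits, torch.tensor([0]))

# Case 2: the true class is the one the network scores lowest (class 2).
loss_bad = criterion(logits, torch.tensor([2]))

print(loss_good.item(), loss_bad.item())
```

The first loss is smaller than the second, because the prediction agrees with the label in the first case and disagrees with it in the second.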

First, import torch and torch.nn. Next, define a random input tensor with shape (2, 3), then compare the results when softmax is applied along dimension 0 (dim=0) and along dimension 1 (dim=1). Hope this helps. Reference: https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html
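The steps above can be sketched as follows. The fixed seed and the variable names are my own additions for reproducibility and are not from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)      # assumption: fixed seed so the run is repeatable
x = torch.randn(2, 3)     # random input tensor with shape (2, 3)

# dim=0 normalises down each column (across the 2 rows).
softmax_dim0 = nn.Softmax(dim=0)(x)

# dim=1 normalises along each row (across the 3 columns).
softmax_dim1 = nn.Softmax(dim=1)(x)

print(softmax_dim0.sum(dim=0))  # three column sums, each approximately 1
print(softmax_dim1.sum(dim=1))  # two row sums, each approximately 1
```

In short, the dim argument picks the axis whose entries are made to sum to 1.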

A Reminder for Everyone: Be Sceptical!

March 19, 2022

Given recent developments, in which we are frequently served distorted data and visualizations, there is a kind of moral and intellectual obligation to remind all of us not to be immediately dazzled by statements that claim to be the product of data analysis and processing. The misplaced (and even abusive) use of terminology such as big data and machine learning must be criticized thoroughly, no matter who is using those terms. I personally often find these terms misused by people who do not even understand what 'big data' or 'machine learning' actually is. On top of that, products claimed to come from so-called data analysis are often used to justify shaping public opinion or, even worse, policy. Thick with bias and conflict of inte...

Convert Epoch Time to Datetime in Python

March 17, 2022

import datetime

epoch_time = 1541290680
result_datetime = datetime.datetime.fromtimestamp(epoch_time)
print(result_datetime)  # prints 2018-11-04 12:18:00 (fromtimestamp uses the local time zone, so the hour depends on where the code runs)

Alternatively, if you're in a rush, use an online epoch converter. The above code is modified from: https://www.javatpoint.com/python-epoch-to-datetime
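Because the local-time result varies from machine to machine, a time-zone-aware variant may be more reproducible. This is a minimal sketch using only the standard library; the choice of UTC is my assumption, not something the original post specifies.

```python
import datetime

epoch_time = 1541290680

# Passing tz makes the result time-zone aware and independent of the machine's locale.
utc_datetime = datetime.datetime.fromtimestamp(epoch_time, tz=datetime.timezone.utc)
print(utc_datetime)  # prints 2018-11-04 00:18:00+00:00

# Round trip: an aware datetime converts back to the same epoch value.
print(int(utc_datetime.timestamp()))  # prints 1541290680
```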

Word Wrapping in Google Colab using textwrap

March 14, 2022

import textwrap

wrapper = textwrap.TextWrapper(width=40,
                               initial_indent=" " * 4,
                               subsequent_indent=" " * 4,
                               break_long_words=False,
                               break_on_hyphens=False)
print(wrapper.fill(string))  # string is the text you want to wrap
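For a version that runs end to end, here is the same wrapper applied to a made-up sample string (the sample text and the variable names are mine, for illustration only).

```python
import textwrap

# Hypothetical sample text; any long string would do.
sample_text = ("This is a long sample sentence that would otherwise run "
               "far past the edge of a notebook output cell.")

wrapper = textwrap.TextWrapper(width=40,
                               initial_indent=" " * 4,
                               subsequent_indent=" " * 4,
                               break_long_words=False,
                               break_on_hyphens=False)

wrapped = wrapper.fill(sample_text)
print(wrapped)
```

Note that width counts the indentation too, so each line here holds at most 36 characters of text after the 4-space indent. With break_long_words=False, a single word longer than that would be left intact and could exceed the width.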

Using dict.get()

March 06, 2022

dict.get() is used to get the value of an item in a given dictionary using its key (see W3Schools for further reference). Why do we use this method? We can access the item value just by indexing with its key directly, can't we (e.g. dict[key])? I think the main advantage of this method is that we can check whether a key exists in the given dictionary without the hassle of an error being raised if it does not. How come? dict.get() accepts two parameters: the key itself and an optional default value. The default is what gets returned if the key we look for does not exist in the dictionary; if it is omitted, None is returned. Example: suppose we have a dictionary, d, as follows: d = {'a': 1, 'b': 2, 'c': 3} Suppose we would like to get the value of the item with key 'z'. d['z'] will raise a KeyError because there is no item in d whose key is 'z'. Now, if we apply dict.get() as follows:...
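A minimal sketch of the behaviour described above, using the same dictionary d. It is not necessarily the example the full post goes on to show; the default value 0 and the variable names are my own choices.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Direct indexing raises KeyError when the key is missing.
try:
    d['z']
    raised = False
except KeyError:
    raised = True

# dict.get() returns None for a missing key when no default is given...
missing_default = d.get('z')

# ...or the supplied default (0 here, chosen arbitrarily).
missing_custom = d.get('z', 0)

# For an existing key, the default is ignored.
present = d.get('a', 0)

print(raised, missing_default, missing_custom, present)  # prints True None 0 1
```

One caveat: if None (or your chosen default) can also be a legitimate stored value, d.get() alone cannot tell "missing" from "present"; in that case 'z' in d is the unambiguous check.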

How to Get The Replication Factor of HDFS Files

February 20, 2022

There are two ways to get the replication factor of HDFS files. Suppose we want to obtain the replication factor of the 2016 Olympics tweet dataset stored in /data/olympictweets2016rio. The first way is to run a command from your terminal that returns the replication factor directly. For this dataset, the replication factor is 10; in most cases, however, the default replication factor is 3. The other way is simply to use the hadoop fs -ls command. Just make sure you give the path of the dataset whose replication factor you want to obtain. The command returns information about the target directory or files. Look at the second column, right after the permissions part: that is the replication factor. Both ways return the same replication factor, which in this case is 10. Reference: StackOverflow
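To make the "second column" reading concrete, here is a small sketch that extracts the replication factor from one line of hadoop fs -ls output. The helper name and the sample line are hypothetical: the owner, group, size, timestamp, and file name are invented for illustration, and only the dataset path and the replication factor of 10 come from the post.

```python
def replication_from_ls_line(line):
    """Return the replication factor from one line of `hadoop fs -ls` output.

    Returns None when the column holds '-', which is what directories show.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("not a recognisable ls line: %r" % line)
    replication = fields[1]  # second column, right after the permissions
    return None if replication == "-" else int(replication)


# Hypothetical listing lines (all values except the path prefix and '10' are made up).
file_line = ("-rw-r--r--  10 hduser supergroup  1048576 2016-08-21 10:15 "
             "/data/olympictweets2016rio/part-00000")
dir_line = ("drwxr-xr-x   - hduser supergroup        0 2016-08-21 10:15 "
            "/data/olympictweets2016rio")

print(replication_from_ls_line(file_line))  # prints 10
print(replication_from_ls_line(dir_line))   # prints None
```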