Sunday, February 26, 2017

Thursday, February 2, 2017

UnicodeDecodeError when reading CSV file in Pandas with Python 3+

I am getting the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 10: invalid start byte" while running the following code:
import pandas as pd
dataFile = '~/BX-Book-Ratings.csv'
data = pd.read_csv(dataFile, sep=";", header=0, names=["user", "isbn", "rating"])
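Solution: the file is not valid UTF-8, which is the default encoding read_csv assumes, so pass the encoding explicitly:
df = pd.read_csv(dataFile, sep=";", header=0, encoding="ISO-8859-1", names=["user", "isbn", "rating"])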
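
Why the encoding change helps can be sketched with the standard library alone. The sample bytes below are hypothetical (the actual contents of BX-Book-Ratings.csv are not shown here); they only reproduce the same kind of failure: a lone 0xba byte is not a valid start byte in UTF-8, while ISO-8859-1 maps every byte to a character and so decoding never fails.

```python
import csv
import io

# Hypothetical sample data, NOT taken from the real file: a ";"-separated
# CSV containing the byte 0xba, which in ISO-8859-1 is the character "º".
raw = b'"user";"isbn";"rating"\n"1";"N\xba 42";"5"\n'


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# UTF-8 rejects 0xba here: it is a continuation byte with no lead byte before it.
utf8_text = try_decode(raw, "utf-8")

# ISO-8859-1 assigns a character to every byte value, so this always succeeds.
latin1_text = try_decode(raw, "ISO-8859-1")

# Parse the decoded text with the same ";" separator used in the post.
rows = list(csv.reader(io.StringIO(latin1_text), delimiter=";"))

print("utf-8 decode succeeded:", utf8_text is not None)
print("rows parsed with ISO-8859-1:", rows)
```

Because ISO-8859-1 never raises, it silences the error even when the file was really written in another single-byte encoding such as cp1252, so it is worth spot-checking a few non-ASCII values after loading.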

Wednesday, December 28, 2016

Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable

I got the chance to work with Spark 2.0 DataFrames in Atom. We ran into the error above and, after a lot of investigation, fixed it using the answer at the following URL:
http://stackoverflow.com/questions/34196302/the-root-scratch-dir-tmp-hive-on-hdfs-should-be-writable-current-permissions