Load CSV file with Spark

I'm new to Spark and I'm trying to read CSV data from a file. Here's what I am doing:

    sc.textFile('file.csv')
        .map(lambda line: (line.split(',')[0], line.split(',')[1]))
        .collect()

I would expect this call to give me a list of the first two columns of my file, but I'm getting this error:

    File "<ipython-input-60-73ea98550983>", line 1, in <lambda>
    IndexError: list index out of range

although my CSV file has more than one column.

Are you sure that all the lines have at least two columns? Can you try something like the following, just to check?

    sc.textFile("file.csv") \
        .map(lambda line: line.split(",")) \
        .filter(lambda line: len(line)>1) \
        .map(lambda line: (line[0],line[1])) \
        .collect()

Alternatively, you could print the culprit (if any):

    sc.textFile("file.csv") \
        .map(lambda line: line.split(",")) \
        .filter(lambda line: len(line)<=1) \
        .collect()
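As a side note (an assumption about your data, not something this error proves): a plain `line.split(",")` will also break on quoted fields that contain commas. If your file has such fields, Python's standard `csv` module parses them correctly, and the same function can be passed to the `map`. A minimal sketch in plain Python:

```python
import csv
from io import StringIO

def parse_line(line):
    # csv.reader handles quoted fields, e.g. '"a,b",c' -> ['a,b', 'c'],
    # which a plain str.split(',') would wrongly cut into three pieces.
    return next(csv.reader(StringIO(line)))

# In Spark you would use it as: sc.textFile("file.csv").map(parse_line)
print(parse_line('"a,b",c'))   # ['a,b', 'c']
print('"a,b",c'.split(','))    # ['"a', 'b"', 'c']
```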

From: stackoverflow.com/q/28782940
