importing pyspark in python shell
This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask it here, as I have the same issue. (Seehttp://geekple.com/blogs/feeds/Xgzu7/posts/351703064084736)
I have Spark installed properly on my machine and am able to run python programs with the pyspark modules without error when using ./bin/pyspark as my python interpreter.
However, when I attempt to run the regular Python shell, when I try to import pyspark modules I get this error:
from pyspark import SparkContext
and it says
"No module named pyspark".
How can I fix this? Is there an environment variable I need to set to point Python to the pyspark headers/libraries/etc.? If my spark installation is /spark/, which pyspark paths do I need to include? Or can pyspark programs only be run from the pyspark interpreter?
Turns out that the pyspark bin is LOADING python and automatically loading the correct library paths. Check out $SPARK_HOME/bin/pyspark :
# Add the PySpark classes to the Python path: export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
I added this line to my .bashrc file and the modules are now correctly found!