PySpark - Read Data from PostgreSQL

📅 Published: 2026/6/18 19:18:35

 

jdbc_url = "jdbc:postgresql://localhost:5432/postgres"
properties = {
    "user": "postgres",
    "password": "postgrespwd",
    "driver": "org.postgresql.Driver",
}
table_name = "player_seasons"
df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)
df.show()

When I ran the code above, I got the following error:

Py4JJavaError: An error occurred while calling o28.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver

 

The ClassNotFoundException means the PostgreSQL JDBC driver is not on Spark's classpath. You need the PostgreSQL JDBC JAR.


Step 1: Download the JDBC driver

Download it from the PostgreSQL JDBC site, or fetch it directly:

wget https://jdbc.postgresql.org/download/postgresql-42.7.2.jar

You can also use any recent 42.x.x version.
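
If wget is not available, the same download can be scripted in Python. This is a minimal sketch, assuming other 42.x.x releases follow the same URL pattern as the 42.7.2 link above; driver_url and download_driver are hypothetical helper names, not part of any library.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: other driver versions use the same URL pattern as 42.7.2.
BASE_URL = "https://jdbc.postgresql.org/download"


def driver_url(version: str) -> str:
    """Build the download URL for a given driver version."""
    return f"{BASE_URL}/postgresql-{version}.jar"


def download_driver(version: str, dest_dir: str = ".") -> Path:
    """Download the JAR into dest_dir and return the local path."""
    target = Path(dest_dir) / f"postgresql-{version}.jar"
    urlretrieve(driver_url(version), str(target))
    return target
```

Calling download_driver("42.7.2") should leave postgresql-42.7.2.jar in the current directory, provided the machine has network access.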


Step 2: Make the JAR available to Spark

There are three common ways:

Option 1: Copy to Spark jars/ directory

cp postgresql-42.7.2.jar $SPARK_HOME/jars/

Spark loads every JAR in that folder onto its classpath at startup, so the copy only takes effect for Spark processes started afterwards.
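
Before restarting anything, it can be worth confirming that the copy landed where Spark looks. The sketch below only lists matching files in the jars/ folder; find_postgres_jars is a hypothetical helper, and the postgresql-*.jar pattern assumes the file kept its download name.

```python
from pathlib import Path


def find_postgres_jars(spark_home: str) -> list[str]:
    """Return the names of PostgreSQL driver JARs found in <spark_home>/jars."""
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        return []
    # Assumption: the JAR kept its original name, e.g. postgresql-42.7.2.jar.
    return sorted(p.name for p in jars_dir.glob("postgresql-*.jar"))
```

For example, find_postgres_jars(os.environ["SPARK_HOME"]) (after import os) returning an empty list would suggest the JAR went to a different Spark installation than the one PySpark is using.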


Option 2: Pass the JAR when creating SparkSession

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("PostgreSQL Test")
    .config("spark.jars", "/home/frank/postgresql-42.7.2.jar")
    .getOrCreate()
)

Now your spark.read.jdbc(...) will find the driver.
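
A wrong path in spark.jars is easy to miss, so it may help to check that the file exists before building the session. This is a sketch under that assumption; build_session is a hypothetical wrapper around the Option 2 code, not a PySpark API.

```python
from pathlib import Path


def build_session(jar_path: str, app_name: str = "PostgreSQL Test"):
    """Create a SparkSession with the driver JAR, failing early if it is missing."""
    if not Path(jar_path).is_file():
        raise FileNotFoundError(f"JDBC driver JAR not found: {jar_path}")

    # Imported here so the path check runs before PySpark is loaded.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars", jar_path)
        .getOrCreate()
    )
```

Note that getOrCreate() returns an existing session if one is already running in the same process. In that case the new spark.jars value is unlikely to take effect until the Python process or kernel is restarted.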


Option 3: Use Maven coordinates (auto-download)

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("PostgreSQL Test")
    .config("spark.jars.packages", "org.postgresql:postgresql:42.7.2")
    .getOrCreate()
)

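Spark resolves the coordinates and downloads the JAR when the session starts (by default from Maven Central), so the machine needs network access the first time.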


Step 3: Retry reading the table

jdbc_url = "jdbc:postgresql://localhost:5432/postgres"
properties = {
    "user": "postgres",
    "password": "postgrespwd",
    "driver": "org.postgresql.Driver",
}
table_name = "player_seasons"
df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)
df.show()

Now it should work. ✅


⚡ Note

    • Make sure PostgreSQL is reachable from WSL.
      If PostgreSQL is running on Windows, use the IP address of the Windows host as seen from WSL, e.g., jdbc:postgresql://<Windows-IP>:5432/postgres
      In WSL 2's default networking mode, localhost inside WSL points to WSL itself, not to Windows.

 

jdbc_url = "jdbc:postgresql://172.26.64.1:5432/postgres"
properties = {
    "user": "postgres",
    "password": "postgrespwd",
    "driver": "org.postgresql.Driver",
}
table_name = "player_seasons"
df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)
df.show()
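
The host address in the URL above has to be looked up on each machine. One common approach, assuming WSL 2 with its default NAT networking, is to read the nameserver entry from /etc/resolv.conf, which usually points at the Windows host. The helpers below are hypothetical names for illustration, and the nameserver assumption may not hold with custom DNS or mirrored networking.

```python
import socket
from typing import Optional


def windows_host_ip(resolv_conf_text: str) -> Optional[str]:
    """Return the first nameserver address in resolv.conf text, or None."""
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            return parts[1]
    return None


def can_connect(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_jdbc_url(host: str, port: int = 5432, database: str = "postgres") -> str:
    """Build a PostgreSQL JDBC URL in the same form as the examples above."""
    return f"jdbc:postgresql://{host}:{port}/{database}"
```

A typical use would be host = windows_host_ip(open("/etc/resolv.conf").read()), then can_connect(host) before handing build_jdbc_url(host) to Spark. If can_connect returns False, the problem is probably network reachability rather than the JDBC driver.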

 

I chose Option 1. Do I need to restart Spark after copying the JAR file? I changed 'localhost' to the Windows IP, but the code still produces the same error.
 
ChatGPT said:

Yes, you must restart Spark after copying the JAR into $SPARK_HOME/jars/.

Here’s why:

  • When you start Spark, it loads all JARs in $SPARK_HOME/jars/ into the JVM classpath.

  • If you copy a new JAR after Spark is already running, the running JVM cannot see it.

  • This is exactly why your code still shows:

java.lang.ClassNotFoundException: org.postgresql.Driver

✅ Correct steps after copying the JAR

  1. Stop any running Spark master or workers.

  2. Start the Spark master and worker again.

  3. Restart your Jupyter kernel (or Python process).

  4. Then run your JDBC code again.
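
The exact commands for the stop and start steps are not shown above. As one possible sketch, assuming a Spark standalone installation whose sbin/ folder contains the stock stop-all.sh and start-all.sh scripts, the first two steps could be driven from Python like this (restart_scripts and restart_standalone are hypothetical helper names):

```python
import subprocess
from pathlib import Path


def restart_scripts(spark_home: str) -> list[str]:
    """Return the stop/start script paths in the order they should run."""
    # Assumption: a standalone cluster managed by the stock sbin scripts.
    sbin = Path(spark_home) / "sbin"
    return [str(sbin / "stop-all.sh"), str(sbin / "start-all.sh")]


def restart_standalone(spark_home: str) -> None:
    """Run each script in turn; raises CalledProcessError if one fails."""
    for script in restart_scripts(spark_home):
        subprocess.run([script], check=True)
```

Calling restart_standalone(os.environ["SPARK_HOME"]) covers only the cluster side. The Jupyter kernel or Python process still has to be restarted separately so that a fresh driver JVM picks up the new JAR.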

 
[Stage 0:>                                                          (0 + 1) / 1]
+-----------+---+------+------+-------+-------+----------+-----------+------------+---+---+---+---+------+--------+--------+-------+------+-------+------+
|player_name|age|height|weight|college|country|draft_year|draft_round|draft_number| gp|pts|reb|ast|netrtg|oreb_pct|dreb_pct|usg_pct|ts_pct|ast_pct|season|
+-----------+---+------+------+-------+-------+----------+-----------+------------+---+---+---+---+------+--------+--------+-------+------+-------+------+
+-----------+---+------+------+-------+-------+----------+-----------+------------+---+---+---+---+------+--------+--------+-------+------+-------+------+

 
