Web Scraping Latest Indonesia Earthquake Data with Python Using BeautifulSoup
Web scraping is a technique for extracting data from web pages programmatically. This article uses Python and the BeautifulSoup library to retrieve the latest earthquake information from the official website of the Indonesian Meteorology, Climatology, and Geophysics Agency (BMKG).
Introduction
Earthquakes are geophysical events with potentially significant impacts on human life. Therefore, staying informed about the latest earthquake information is crucial. The BMKG website serves as a reliable source for such data.
Step One: Importing Libraries
The first step in developing a web scraping program is to import the necessary libraries. Here, the requests library downloads the web page, while BeautifulSoup parses the HTML.
import requests
from bs4 import BeautifulSoup
Understanding BeautifulSoup
BeautifulSoup is a Python library for pulling data out of HTML and XML files. It parses a document into a tree that can be navigated and searched, which simplifies the extraction of specific information from HTML documents.
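As a minimal sketch of how the parsed tree can be searched, the example below parses a small, made-up HTML fragment and reads the text of its list items. The fragment, the class name quake-box, and the values are assumptions for illustration only; they are not taken from the BMKG page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment used only for illustration (not taken from BMKG)
html = """
<div class="quake-box">
  <ul>
    <li>01 Februari 2024, 18:36:28 WIB</li>
    <li>2.8</li>
    <li>10 km</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, or None when nothing matches
box = soup.find("div", class_="quake-box")

# find_all() returns every matching descendant as a list
items = [li.get_text(strip=True) for li in box.find_all("li")]
print(items)
```

Because find() returns None when no element matches, it is worth checking its result before calling find_all() on it.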
Data Extraction Function
The ekstraksi_data function ("data extraction" in Indonesian) retrieves the latest earthquake data from the BMKG site. It downloads the page with the requests module and parses the returned HTML with BeautifulSoup.
def ekstraksi_data():
    """
    Scrape the latest earthquake entry from the BMKG home page.

    Example of the values shown on the page (in Indonesian):
    Tanggal: 01 Februari 2024, 18:36:28 WIB
    Magnitudo: 2.8
    Kedalaman: 10 km
    Lokasi: 10.25 LS - 124.04 BT
    Keterangan: Pusat gempa berada di laut 33 km Tenggara Kabupaten Kupang
    Dampak: Dirasakan (Skala MMI): II Kabupaten Kupang
    """
    # Browser-like User-Agent so the request is not rejected as an unknown client
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        # The timeout keeps the call from hanging if the server does not respond
        content = requests.get("https://www.bmkg.go.id/", headers=header, timeout=10)
    except requests.exceptions.RequestException:
        return None
    if content.status_code != 200:
        return None

    soup = BeautifulSoup(content.text, 'html.parser')
    result = soup.find('div', {'class': 'gempabumi-home-bg margin-top-13'})
    if result is None:
        # The expected container is missing, e.g. because the page layout changed
        return None
    result = result.find_all('li')
    if len(result) < 9:
        # Positions 0-3, 7 and 8 are read below, so at least nine items are needed
        return None

    # "01 Februari 2024, 18:36:28 WIB" -> ["01 Februari 2024", "18:36:28 WIB"]
    time = result[0].text.split(', ')
    if len(time) < 2:
        return None

    # Fields are identified by their position in the <li> list
    hasil = dict()
    hasil['tanggal'] = time[0]
    hasil['waktu'] = time[1]
    hasil['magnitudo'] = result[1].text
    hasil['kedalaman'] = result[2].text
    hasil['lokasi'] = result[3].text
    hasil['keterangan'] = result[7].text
    hasil['dampak'] = result[8].text
    return hasil
Code Explanation:
- Header Setup:
A custom User-Agent header is defined to mimic a web browser. This is a common practice to avoid being blocked by the server's anti-scraping measures.
- Request and Exception Handling:
A try-except block handles exceptions that may occur during the HTTP request. If an exception occurs, the function returns None.
- Web Scraping with BeautifulSoup:
The requests.get method is used to fetch the HTML content from the BMKG website.
If the request is successful (status code 200), the HTML content is parsed using BeautifulSoup.
The code navigates the HTML structure to find the relevant information within the <div> element with the class 'gempabumi-home-bg margin-top-13' and its <li> children.
- Data Extraction and Dictionary Creation:
The code reads the <li> elements by position (items 0 to 3, 7, and 8) and extracts the time, magnitude, depth, location, description, and impact.
The extracted information is stored in a dictionary named hasil (meaning "result" in Indonesian).
- Return Statement:
The function returns the hasil dictionary containing the extracted earthquake data.
- Fallback for Unsuccessful Request:
If the HTTP request is not successful, the function returns None.
This code snippet effectively fetches and processes earthquake data from the BMKG website, demonstrating the practical implementation of web scraping techniques with Python.
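The position-based extraction described above can be tried without network access. The sketch below assumes the list items have already been reduced to plain strings. The names FIELD_POSITIONS and pick_fields are hypothetical helpers introduced here, and the "(unused)" entries are placeholders for the positions the function does not read.

```python
# Positions of the fields inside the <li> list, as read by ekstraksi_data
FIELD_POSITIONS = {
    0: "time",
    1: "magnitudo",
    2: "kedalaman",
    3: "lokasi",
    7: "keterangan",
    8: "dampak",
}


def pick_fields(items):
    """Map a list of item texts to a result dict, or None if the list is too short."""
    if len(items) <= max(FIELD_POSITIONS):
        return None
    picked = {name: items[pos] for pos, name in FIELD_POSITIONS.items()}
    # "01 Februari 2024, 18:36:28 WIB" -> date part and time part
    tanggal, _, waktu = picked.pop("time").partition(", ")
    return {"tanggal": tanggal, "waktu": waktu, **picked}


# Sample values; positions 4-6 are placeholders because the function skips them
sample = [
    "01 Februari 2024, 18:36:28 WIB", "2.8", "10 km", "10.25 LS - 124.04 BT",
    "(unused)", "(unused)", "(unused)",
    "Pusat gempa berada di laut 33 km Tenggara Kabupaten Kupang",
    "Dirasakan (Skala MMI): II Kabupaten Kupang",
]
print(pick_fields(sample))
```

Keeping the positions in one dictionary means that a change in the page layout only requires updating that mapping, although the positions themselves still depend on the site's current HTML.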
Display Data Function
Once the data has been extracted, the tampilkan_data function ("display data" in Indonesian) prints the result. If no data was found, it prints an appropriate message instead.
def tampilkan_data(result):
    if result is None:
        print("Data not found")
        return  # nothing else to display
    print(result)
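For output that is easier to read than the raw dictionary, each field could be printed on its own line. The sketch below is one possible variant, not part of the original program. The helper name format_result and the sample values are assumptions, while the keys match the dictionary built by ekstraksi_data.

```python
def format_result(result):
    """Return a printable, multi-line summary of the result dictionary."""
    if result is None:
        return "Data not found"
    # One "key: value" line per field, in insertion order
    return "\n".join(f"{key}: {value}" for key, value in result.items())


# Sample dictionary with a subset of the keys, for illustration only
sample = {"tanggal": "01 Februari 2024", "waktu": "18:36:28 WIB", "magnitudo": "2.8"}
print(format_result(sample))
```

Returning a string instead of printing directly keeps the formatting easy to reuse, for example when writing the result to a log file.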
Conclusion
In conclusion, Python together with the BeautifulSoup library makes it straightforward to extract the latest earthquake information from the BMKG website. Always follow the rules and policies of the website you access so that your use of web scraping remains ethical and legal.
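One way to check a site's crawling rules is its robots.txt file, which Python's standard urllib.robotparser module can read. The sketch below parses a made-up rule set from memory. The rules and the example.com URLs are assumptions for illustration and do not reflect the actual BMKG policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; not the real BMKG file
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse rules from memory instead of downloading them

# can_fetch() reports whether the given user agent may request the URL
allowed_home = parser.can_fetch("*", "https://www.example.com/")
allowed_private = parser.can_fetch("*", "https://www.example.com/private/data")
print(allowed_home, allowed_private)
```

In real use, set_url() followed by read() would load the site's actual robots.txt before can_fetch() is called. A robots.txt check does not replace reading the site's terms of use.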