Web Scraping Latest Indonesia Earthquake Data with Python Using BeautifulSoup

Feb 2, 2024

Web scraping is a technique for extracting data from web pages automatically instead of copying it by hand. In this article, we use Python and the BeautifulSoup library to retrieve the latest earthquake information from the official website of the Indonesian Meteorology, Climatology, and Geophysics Agency (Badan Meteorologi, Klimatologi, dan Geofisika, BMKG).

Introduction

Earthquakes can seriously affect human life and property, so staying informed about the latest events is important. The BMKG website is an authoritative source for this data, since BMKG is the Indonesian agency responsible for monitoring earthquakes.

Step One: Importing Libraries

The first step is to import the necessary libraries: requests downloads the web page, and BeautifulSoup parses its HTML. Both are third-party packages and can be installed with pip install requests beautifulsoup4.

import requests
from bs4 import BeautifulSoup

Understanding BeautifulSoup

BeautifulSoup is a Python library for pulling data out of HTML and XML files. It builds a parse tree from the document and provides methods such as find() and find_all() for searching and navigating that tree, which makes it straightforward to extract specific pieces of information from an HTML page.
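As a quick illustration, the sketch below parses a small inline HTML string instead of a live page. The markup is hypothetical, loosely modeled on the structure the scraper targets later (a div with the class gempabumi-home-bg margin-top-13 containing li items); it is not copied from BMKG's actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not taken from the BMKG site
html = """
<html><body>
  <div class="gempabumi-home-bg margin-top-13">
    <ul>
      <li>01 Februari 2024, 18:36:28 WIB</li>
      <li>2.8</li>
      <li>10 km</li>
    </ul>
  </div>
</body></html>
"""

# Build the parse tree with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
# A class string containing a space only matches that exact attribute value.
box = soup.find("div", {"class": "gempabumi-home-bg margin-top-13"})

# find_all() returns every matching descendant as a list
items = box.find_all("li")

# get_text(strip=True) removes surrounding whitespace from each item
values = [li.get_text(strip=True) for li in items]
print(values)  # ['01 Februari 2024, 18:36:28 WIB', '2.8', '10 km']

# The same lookup written as a CSS selector
same_items = soup.select("div.gempabumi-home-bg li")
print(len(same_items))  # 3
```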

Data Extraction Function

The ekstraksi_data function (Indonesian for "data extraction") retrieves the latest earthquake data from the BMKG site. It downloads the home page with requests, parses the HTML with BeautifulSoup, and returns the fields as a dictionary, or None if the request fails.

def ekstraksi_data():
    """
    Scrape the latest earthquake report from the BMKG home page.

    Example of the data shown on the page:
    Date: 01 February 2024, 18:36:28 WIB
    Magnitude: 2.8
    Depth: 10 km
    Location: 10.25 LS - 124.04 BT
    Description: Epicenter at sea, 33 km southeast of Kupang Regency
    Impact: Felt (MMI scale): II Kupang Regency

    Returns a dict of strings, or None if the page cannot be fetched or parsed.
    """
    # Browser-like User-Agent so the request resembles a normal page visit
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }

    try:
        # The timeout stops the call from hanging if the server never responds
        content = requests.get("https://www.bmkg.go.id/", headers=header, timeout=10)
    except requests.RequestException:
        return None

    if content.status_code != 200:
        return None

    soup = BeautifulSoup(content.text, 'html.parser')

    # The "latest earthquake" panel on the home page
    panel = soup.find('div', {'class': 'gempabumi-home-bg margin-top-13'})
    if panel is None:
        return None  # layout changed: the panel was not found

    items = panel.find_all('li')
    if len(items) < 9:
        return None  # fewer list items than the indices used below

    # The first item holds "date, time"; partition splits at the first ", "
    tanggal, _, waktu = items[0].text.strip().partition(', ')

    # Fields are identified by their position in the list (items 4-6 are unused)
    hasil = {
        'tanggal': tanggal,
        'waktu': waktu,
        'magnitudo': items[1].text.strip(),
        'kedalaman': items[2].text.strip(),
        'lokasi': items[3].text.strip(),
        'keterangan': items[7].text.strip(),
        'dampak': items[8].text.strip(),
    }
    return hasil

Code Explanation:

1. Header setup: A custom User-Agent header makes the request look like it comes from a web browser. Some servers reject or restrict requests that carry a library's default User-Agent, so this is a common workaround.
2. Request and exception handling: The requests.get call is wrapped in a try-except block, so a network error makes the function return None instead of crashing.
3. Parsing with BeautifulSoup: If the request succeeds (status code 200), the HTML is parsed with BeautifulSoup. The code then locates the <div> element with the class 'gempabumi-home-bg margin-top-13' and collects its <li> children.
4. Data extraction: The fields are read from the <li> elements by position: index 0 holds the date and time, 1 the magnitude, 2 the depth, 3 the location, 7 the description, and 8 the impact. The date-time string is split at ", " into a date and a time.
5. Dictionary and return value: The values are stored in a dictionary named hasil (Indonesian for "result") with the keys tanggal (date), waktu (time), magnitudo (magnitude), kedalaman (depth), lokasi (location), keterangan (description), and dampak (impact). The function returns this dictionary.
6. Fallback for an unsuccessful request: If the server responds with any status code other than 200, the function returns None.
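One way to make this logic easier to test is to separate the download from the parsing, so the parser can run on saved or sample HTML without a network connection. The sketch below is hypothetical: parse_gempa is not part of the original code, and the sample HTML (including the filler items at positions 4 to 6) is invented to match the indices the scraper reads, not taken from BMKG. The field positions 0, 1, 2, 3, 7 and 8 follow ekstraksi_data.

```python
from bs4 import BeautifulSoup

# Positions of each field among the <li> items, as used by ekstraksi_data
FIELD_INDEX = {
    "magnitudo": 1,
    "kedalaman": 2,
    "lokasi": 3,
    "keterangan": 7,
    "dampak": 8,
}


def parse_gempa(html):
    """Hypothetical helper: extract earthquake fields from HTML text, or return None."""
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find("div", {"class": "gempabumi-home-bg margin-top-13"})
    if box is None:
        return None  # the expected container is missing
    items = [li.text.strip() for li in box.find_all("li")]
    if len(items) <= max(FIELD_INDEX.values()):
        return None  # fewer items than the layout we expect
    # The first item holds "date, time"; partition splits at the first ", "
    tanggal, _, waktu = items[0].partition(", ")
    hasil = {"tanggal": tanggal, "waktu": waktu}
    for key, index in FIELD_INDEX.items():
        hasil[key] = items[index]
    return hasil


# Invented sample markup; items 4-6 are placeholders the parser skips
sample = """
<div class="gempabumi-home-bg margin-top-13"><ul>
  <li>01 Februari 2024, 18:36:28 WIB</li>
  <li>2.8</li>
  <li>10 km</li>
  <li>10.25 LS - 124.04 BT</li>
  <li>filler 4</li>
  <li>filler 5</li>
  <li>filler 6</li>
  <li>Pusat gempa berada di laut 33 km Tenggara Kabupaten Kupang</li>
  <li>Dirasakan (Skala MMI): II Kabupaten Kupang</li>
</ul></div>
"""

print(parse_gempa(sample))
print(parse_gempa("<p>layout changed</p>"))  # None
```

With this split, ekstraksi_data would only need to download the page and pass content.text to the parser.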

Together, these steps fetch and structure the latest earthquake report from the BMKG website. Note that the scraper depends on the page's class names and on the order of the <li> items, so it will stop working, or return the wrong fields, if BMKG changes its page layout.
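The values in hasil are all strings copied from the page text. If you want to sort or filter events, a small conversion step helps. The sketch below rests on several assumptions: the date and time follow the docstring's format ("01 Februari 2024" and "18:36:28 WIB"), each value holds just the number (for example "2.8" and "10 km"), which may not match the live page, and depth is always given in km. The helper name to_typed and the lookup tables are hypothetical; the UTC offsets are those of Indonesia's three time zones (WIB, WITA, WIT).

```python
from datetime import datetime, timedelta, timezone

# Indonesian month names mapped to month numbers
BULAN = {
    "Januari": 1, "Februari": 2, "Maret": 3, "April": 4,
    "Mei": 5, "Juni": 6, "Juli": 7, "Agustus": 8,
    "September": 9, "Oktober": 10, "November": 11, "Desember": 12,
}

# Indonesian time zones and their UTC offsets
ZONA = {
    "WIB": timezone(timedelta(hours=7)),
    "WITA": timezone(timedelta(hours=8)),
    "WIT": timezone(timedelta(hours=9)),
}


def to_typed(hasil):
    """Hypothetical helper: convert scraped strings into datetime and float values."""
    day, month_name, year = hasil["tanggal"].split()
    clock, zone = hasil["waktu"].split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    waktu = datetime(int(year), BULAN[month_name], int(day),
                     hour, minute, second, tzinfo=ZONA[zone])
    return {
        "waktu": waktu,
        "magnitudo": float(hasil["magnitudo"]),
        # "10 km" -> 10.0; assumes the unit is always km
        "kedalaman_km": float(hasil["kedalaman"].split()[0]),
    }


sample = {"tanggal": "01 Februari 2024", "waktu": "18:36:28 WIB",
          "magnitudo": "2.8", "kedalaman": "10 km"}
data = to_typed(sample)
print(data["waktu"].isoformat())  # 2024-02-01T18:36:28+07:00
print(data["waktu"].astimezone(timezone.utc).isoformat())  # 2024-02-01T11:36:28+00:00
```

Using timezone-aware datetimes means events reported in different Indonesian time zones compare correctly.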

Display Data Function

Once the data has been extracted, the tampilkan_data function (Indonesian for "display data") prints the result. If extraction failed and result is None, it prints a "Data not found" message.

def tampilkan_data(result):
    # Print the scraped data, or a message when extraction failed
    if result is None:
        print("Data not found")
    else:
        print(result)
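tampilkan_data prints the dictionary as-is. If you prefer one labeled field per line, a formatter that returns a string is also easy to test. The English labels and the helper name format_data below are hypothetical choices, not part of the original code.

```python
# English labels for the dictionary keys produced by ekstraksi_data
LABELS = {
    "tanggal": "Date",
    "waktu": "Time",
    "magnitudo": "Magnitude",
    "kedalaman": "Depth",
    "lokasi": "Location",
    "keterangan": "Description",
    "dampak": "Impact",
}


def format_data(result):
    """Hypothetical helper: return a readable multi-line report string."""
    if result is None:
        return "Data not found"
    # Pad every label to the longest one so the colons line up
    width = max(len(label) for label in LABELS.values())
    lines = []
    for key, value in result.items():
        label = LABELS.get(key, key)  # fall back to the raw key if unknown
        lines.append(f"{label:<{width}} : {value}")
    return "\n".join(lines)


print(format_data({"magnitudo": "2.8", "kedalaman": "10 km"}))
print(format_data(None))  # Data not found
```

tampilkan_data could then simply call print(format_data(result)).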

Conclusion

In conclusion, Python's requests library together with BeautifulSoup makes it straightforward to extract the latest earthquake information from the BMKG website. When scraping, follow the rules and policies of the website you access, such as its terms of use and its robots.txt file, to keep your use of web scraping ethical and legal.
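One concrete way to respect a site's rules is to check its robots.txt before fetching a page. The sketch below uses Python's standard urllib.robotparser module; the rules are an invented example parsed from a string, not BMKG's actual robots.txt. In practice you would point the parser at the site's real file with set_url() and read(), and still review the site's terms of use, which robots.txt does not cover.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration; the real file would be loaded with
# parser.set_url("https://www.bmkg.go.id/robots.txt") followed by parser.read()
rules = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ["https://www.bmkg.go.id/", "https://www.bmkg.go.id/private/data"]:
    allowed = parser.can_fetch("*", url)
    print(url, "allowed" if allowed else "disallowed")

# Seconds to wait between requests, if the file specifies one (otherwise None)
print(parser.crawl_delay("*"))  # 5
```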