When I enrolled in the Data Science Bootcamp in November 2023, I had a clear idea of what my final project would look like. I had always wanted to create a missing persons resource page for Germany, very similar to the Doe Network in the USA. Now comes the tricky part: in the EU we have a strict privacy law, the GDPR, which sets tight rules on how a person’s data may be used.

Nevertheless, I decided to proceed with my idea, and in March 2024, MPM was born. The project centered on creating a Missing People Map, which involved scraping media articles, storing the data in SQL databases, and overcoming numerous challenges, including privacy law concerns and unstructured data.

Overview

The core objective of this data science project was to develop a map that visualizes the geographic patterns and demographics of missing persons. By scraping online media articles, I collected data on missing person cases in Germany. The scraping step extracted the relevant information from various unstructured media sources, and I then stored it in SQL databases for structured analysis.

Technical Approach

Scraping media articles was the pivotal first step, and it required web scraping tools that could navigate disparate news sources and extract information from them reliably. The extracted data, often unstructured and scattered, was then transformed and stored in databases to make access and analysis easier.
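To make the scrape-then-store step more concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the table layout, the `ArticleParser` class, the `store_article` helper, and the sample HTML and URL are all assumptions made for illustration, and a real scraper would also need to fetch pages and handle each news site's own markup.

```python
import sqlite3
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag we are currently collecting text for
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def parse_article(html):
    """Return (title, body) extracted from one HTML document."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    body = " ".join(p.strip() for p in parser.paragraphs if p.strip())
    return parser.title.strip(), body


def store_article(conn, url, html):
    """Parse one page and upsert it into a hypothetical 'articles' table."""
    title, body = parse_article(html)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    # Parameterized query: scraped text is never interpolated into the SQL.
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()


# Made-up sample page standing in for a downloaded article.
SAMPLE_HTML = """<html><head><title>Vermisst: Person aus Köln</title></head>
<body><p>Die Polizei bittet um Hinweise.</p><p>Zuletzt gesehen am Hauptbahnhof.</p></body></html>"""

conn = sqlite3.connect(":memory:")
store_article(conn, "https://example.org/artikel-1", SAMPLE_HTML)
rows = conn.execute("SELECT url, title, body FROM articles").fetchall()
```

Using the article URL as the primary key means that scraping the same page twice overwrites the earlier row instead of creating a duplicate.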

Challenges and Solutions

One of the significant challenges encountered during the project was adhering to privacy laws. Given the sensitive nature of the data, I implemented rigorous data handling protocols to comply with legal standards and ethical guidelines, ensuring that personal information was protected and anonymized where necessary.
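One possible shape for such a protocol is sketched below: replace the name with a keyed hash and coarsen the exact age into a band before anything is stored. This is an illustration under my own assumptions, not a description of the protocol actually used; the field names, `SECRET_KEY`, and the helper functions are hypothetical. Note that a keyed hash is pseudonymization rather than full anonymization, so data treated this way generally still counts as personal data under the GDPR.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would live outside the code base.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(name, key=SECRET_KEY):
    """Return a stable keyed hash so records can be linked without the name."""
    normalized = " ".join(name.lower().split())  # fold case and extra spaces
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '10-19'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_record(record):
    """Keep only the fields needed for the map, in reduced form."""
    return {
        "person_id": pseudonymize(record["name"]),
        "age_band": age_band(record["age"]),
        "city": record["city"],
    }


# Made-up record for illustration only.
raw = {"name": "Anna  Muster", "age": 15, "city": "Köln"}
safe = anonymize_record(raw)
```

Because the name is normalized before hashing, different spellings of spacing and capitalization map to the same `person_id`, which lets several articles about one case be linked without keeping the name itself.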

Moreover, dealing with unstructured data from diverse media sources presented another substantial hurdle. To address this, I deployed natural language processing (NLP) techniques and data cleaning methods to parse, clean, and structure the data accurately, enabling efficient analysis and visualization.
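As a simplified stand-in for that parsing step, the sketch below uses plain regular expressions to pull an age, a city, and a date out of a German sentence. It is rule-based rather than a full NLP pipeline, and the patterns, the `extract_case` helper, and the sample sentence are assumptions for illustration; real articles vary far more than these patterns cover.

```python
import re
from datetime import date

# Hypothetical patterns for common German phrasings.
AGE_RE = re.compile(r"(\d{1,3})-jährige[rn]?\b", re.IGNORECASE)
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")
CITY_RE = re.compile(r"\baus\s+([A-ZÄÖÜ][\w-]*)")


def extract_case(text):
    """Return a dict of structured fields; a field is None if not found."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces

    age_match = AGE_RE.search(text)
    city_match = CITY_RE.search(text)
    date_match = DATE_RE.search(text)

    missing_since = None
    if date_match:
        day, month, year = (int(part) for part in date_match.groups())
        try:
            missing_since = date(year, month, day)
        except ValueError:
            missing_since = None  # e.g. "31.02.2024" is not a real date

    return {
        "age": int(age_match.group(1)) if age_match else None,
        "city": city_match.group(1) if city_match else None,
        "missing_since": missing_since,
    }


# Made-up sentence for illustration only.
sentence = "Die 15-jährige Anna M. aus Köln wird seit dem\n12.03.2024 vermisst."
case = extract_case(sentence)
```

Returning `None` for anything that cannot be found keeps incomplete articles in the dataset while making the gaps explicit for later cleaning.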

Outcomes and Impact

The culmination of this project was the creation of an interactive ‘Missing People Map’, which serves as a potent tool for authorities, researchers, and the public in understanding and addressing the issue of missing persons. By providing insights into patterns and trends, the map aids in identifying areas with high incidences of missing cases and potentially uncovers underlying causes or correlations.
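Finding areas with many cases comes down to counting records per location and attaching coordinates that a map layer can plot. The sketch below shows one way to do that; the `CITY_COORDS` lookup (with approximate coordinates), the `build_map_points` helper, and the sample cases are hypothetical, and the actual rendering with a mapping library is left out.

```python
from collections import Counter

# Approximate coordinates, hard-coded here only for illustration.
CITY_COORDS = {"Köln": (50.94, 6.96), "Berlin": (52.52, 13.40)}


def build_map_points(cases, coords=CITY_COORDS):
    """Count cases per city and return plottable points, busiest first."""
    counts = Counter(c["city"] for c in cases if c.get("city"))
    points = []
    for city, n in counts.most_common():
        if city not in coords:
            continue  # no known coordinates, so the city cannot be plotted
        lat, lon = coords[city]
        points.append({"city": city, "lat": lat, "lon": lon, "cases": n})
    return points


# Made-up cases for illustration only.
cases = [
    {"city": "Köln"},
    {"city": "Berlin"},
    {"city": "Köln"},
    {"city": None},
    {"city": "Hamburg"},
]
points = build_map_points(cases)
```

Each resulting point can be drawn as a marker whose size or color reflects the `cases` count.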

Conclusion

Despite the challenges posed by privacy laws and unstructured data, the successful execution of the ‘Missing People Map’ project highlights the potential of data science to make a meaningful impact. As technology advances, such initiatives become increasingly crucial in our collective efforts to address and mitigate global challenges.