Collecting and Processing Data from Pokec using Python

I recently wrote two Python scripts, config.py and pokec_get_stats.py, that collect statistics about the users in a particular room of the popular Slovak social network Pokec. My goal was to practice Python programming and to test the functionality of Selenium WebDriver, which is used to emulate user interaction with a web browser.

Selenium WebDriver is a tool for automating the testing of web applications. It is one component of the Selenium family, which also includes Selenium IDE, Selenium Client API, Selenium Remote Control and Selenium Grid. Selenium supports many programming languages, such as Java, Python, C#, Ruby and Perl, and many web browsers, such as Google Chrome, Mozilla Firefox, Internet Explorer and Safari.

The script pokec_get_stats.py saves the links to all open Pokec rooms in the room_list array and visits the rooms in turn. After entering a room, the script counts the users, counts the women and men and calculates their percentages, and calculates the average age of the users. It also reports the city that most users come from, along with the number of users from that city (Picture 1).

Picture 1 - Truncated Console Output with Statistics for Rooms - Prešov, Trenčín, Martin
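
The per-room calculation described above can be sketched roughly as follows. This is only an illustration, not the code from pokec_get_stats.py: the User record, the room_stats function and the sample data are hypothetical, and the sketch assumes that gender, age and city have already been read from the page for every user in the room.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical record; the real script's data model is not shown here.
    gender: str  # "F" or "M"
    age: int
    city: str


def room_stats(users):
    """Return user counts, gender percentages, average age and top city."""
    total = len(users)
    if total == 0:
        # An empty room has no percentages or average to report.
        return {"total": 0}
    women = sum(1 for u in users if u.gender == "F")
    men = sum(1 for u in users if u.gender == "M")
    # most_common(1) returns a one-item list of (city, count) pairs.
    top_city, top_count = Counter(u.city for u in users).most_common(1)[0]
    return {
        "total": total,
        "women": women,
        "men": men,
        "women_pct": round(100 * women / total, 1),
        "men_pct": round(100 * men / total, 1),
        "avg_age": round(sum(u.age for u in users) / total, 1),
        "top_city": top_city,
        "top_city_users": top_count,
    }


# Made-up sample data, for illustration only.
sample = [
    User("F", 25, "Prešov"),
    User("M", 31, "Prešov"),
    User("M", 40, "Martin"),
    User("F", 28, "Prešov"),
]
stats = room_stats(sample)
print(stats)
```

Keeping the calculation in a plain function, separate from the browser automation, makes it possible to check the numbers without logging into Pokec.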

The statistics are displayed in the console output and are also sent to the room window so that all users in the room can read them (Picture 2).

Picture 2 - Statistics in Room Window

The config.py file contains the variables username and password, which are used to log in to Pokec. It also contains the path to the ChromeDriver and the room number variable.
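
A config.py along these lines might look like the sketch below. The names username and password come from the description above; chromedriver_path and room_number are hypothetical names chosen for illustration, and all values are placeholders, so check the downloaded file for the actual variable names.

```python
# config.py - illustrative layout only; every value is a placeholder.

# Pokec credentials used for logging in.
username = "your_pokec_login"
password = "your_pokec_password"

# Hypothetical variable names: the downloaded config.py may call these
# settings something else.
chromedriver_path = "/path/to/chromedriver"
room_number = 1
```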

I prefer Google's Chrome browser because it is widely used, so you can quite likely use my script without having to install Chrome or Chromium first. You only need to download the ChromeDriver version that matches your Chrome browser version.

ChromeDriver is a separate executable that Selenium WebDriver uses to control Chrome. It is maintained by the Chromium team with help from WebDriver contributors. ChromeDriver expects you to have Chrome installed in the default location for your platform.
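
As a rough illustration of how Selenium hands control to ChromeDriver, the sketch below starts Chrome from Python. It assumes Selenium 4 is installed; the start_chrome helper, the placeholder driver path and the example URL are not taken from pokec_get_stats.py.

```python
def start_chrome(driver_path):
    """Start Chrome through the ChromeDriver binary at driver_path."""
    # Imported inside the function so that this module can still be loaded
    # on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Service wraps the separate ChromeDriver executable (Selenium 4 API).
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service)


# Example usage (placeholder path, needs Chrome and ChromeDriver installed):
#
#   driver = start_chrome("/path/to/chromedriver")
#   try:
#       driver.get("https://www.example.com/")
#       print(driver.title)
#   finally:
#       driver.quit()
```

Calling driver.quit() in a finally block closes the browser and stops the ChromeDriver process even if the script fails halfway through.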

Usage:

  • Download the scripts config.py and pokec_get_stats.py from the Download section of this blog (Networking & Servers) and store them in the same directory.
  • Download the ChromeDriver binary for your platform here. The ChromeDriver version must match your Chrome version.
  • Edit the config.py file and change the path to reflect the location of the ChromeDriver. Change the username and password in the same file to match your Pokec credentials.

$ python3 ./pokec_get_stats.py

Disclaimer:
Please use the script wisely and don't spam other conversations. The script is shared for educational purposes only and I am not responsible for any damages caused by malicious use.
