Babatunde Onikoyi

Researcher

Web Designer

Microsoft Certified Educator

Freelancer

Data Analyst

Lecturer

Animation Designer

Gender Classification Using Twitter Data

December 7, 2022 Research Dataset

Description

This dataset is an expansion of the Twitter User Gender Classification dataset, which is freely available on Kaggle. It is intended for research on predicting a user's gender from the text they post on Twitter.

The original dataset contained one tweet each from 12,894 distinct male and female Twitter users. It was expanded to 269,108 tweets from the same 12,894 users, so that each user now has multiple tweets. The additional tweets were collected with Tweepy through the Twitter API.
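
The expansion step can be sketched with Tweepy as follows. This is a minimal illustration, not the script actually used for this dataset. It makes three assumptions: it uses Tweepy v4's `Client` (Twitter API v2), it treats `user_id` as a numeric Twitter user ID, and the function name `collect_user_tweets` and its `per_user` parameter are hypothetical. For each user it fetches a single page of recent tweets and keeps each tweet's ID and creation time, which correspond to the `tweet_id` and `created_at` fields.

```python
from typing import Dict, List, Tuple


def collect_user_tweets(client, user_ids: List[int], per_user: int = 100) -> Dict[int, List[Tuple[int, object]]]:
    """Fetch one page of recent tweets for each user (hypothetical helper).

    Assumes `client` is an authenticated tweepy.Client (Twitter API v2) and
    that `user_ids` are numeric Twitter user IDs.
    Returns {user_id: [(tweet_id, created_at), ...]}.
    """
    # The v2 user-timeline endpoint accepts max_results between 5 and 100.
    page_size = max(5, min(per_user, 100))
    collected: Dict[int, List[Tuple[int, object]]] = {}
    for user_id in user_ids:
        response = client.get_users_tweets(
            user_id,
            max_results=page_size,
            tweet_fields=["created_at"],  # needed to fill the created_at column
        )
        # response.data is None when no tweets come back for this user.
        tweets = response.data or []
        collected[user_id] = [(tweet.id, tweet.created_at) for tweet in tweets]
    return collected
```

In use, `client` would be something like `tweepy.Client(bearer_token="YOUR_TOKEN", wait_on_rate_limit=True)`. With that setting, Tweepy sleeps when it hits the rate limit instead of raising an error. Collecting more than one page per user would additionally require `pagination_token` or `tweepy.Paginator`.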

The uploaded files contain the train and test splits used in the experiment. They include the following fields:

  • user_id – a unique ID for each user
  • gender – male or female
  • gender:confidence – a float representing confidence in the provided gender (1 for 100%)
  • created_at – date and time when the tweet was created
  • tweet_id – the unique ID of a randomly selected tweet by the user
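
Here is a rough sketch of how a split file might be loaded. It assumes each file is a CSV whose header row names the columns above. The function name `load_split` and the default threshold of 1.0 on `gender:confidence` are illustrative choices, not part of the dataset. Because each user contributes several tweets, the function groups tweet IDs by user.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def load_split(handle: TextIO, min_confidence: float = 1.0) -> Dict[str, Tuple[str, List[str]]]:
    """Group tweet IDs by user, keeping only confidently labelled rows.

    Assumes a CSV with a header row containing user_id, gender,
    gender:confidence, created_at and tweet_id.
    Returns {user_id: (gender, [tweet_id, ...])}.
    """
    users: Dict[str, Tuple[str, List[str]]] = {}
    for row in csv.DictReader(handle):
        if float(row["gender:confidence"]) < min_confidence:
            continue  # skip rows whose gender label is uncertain
        _gender, tweet_ids = users.setdefault(row["user_id"], (row["gender"], []))
        # Keep IDs as strings: tweet IDs exceed 2**53 and lose precision as floats.
        tweet_ids.append(row["tweet_id"])
    return users
```

For example, `with open("train.csv", newline="", encoding="utf-8") as f: train = load_split(f)`. The file name here is hypothetical.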

Also attached is a simple Jupyter Notebook script that uses Tweepy to retrieve a tweet's complete information from its ID, a process known as hydrating a tweet ID. The script already includes some sample tweet IDs for testing.
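
Hydration can be batched, because the Twitter API v2 tweet lookup accepts up to 100 IDs per request. The sketch below assumes Tweepy v4's `Client.get_tweets`. The attached notebook may use a different call, and the helper names `batches` and `hydrate` are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Sequence


def batches(ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def hydrate(client, tweet_ids: Sequence[str]) -> Dict[str, Any]:
    """Map each retrievable tweet ID to its Tweet object (hypothetical helper).

    Assumes `client` is an authenticated tweepy.Client; get_tweets takes
    at most 100 IDs per request, hence the batching.
    """
    hydrated: Dict[str, Any] = {}
    for batch in batches(tweet_ids):
        response = client.get_tweets(ids=batch, tweet_fields=["created_at", "author_id"])
        # Tweets that cannot be returned (e.g. deleted ones) are not in response.data.
        for tweet in response.data or []:
            hydrated[str(tweet.id)] = tweet
    return hydrated
```

Each returned object exposes `.text`, plus `.created_at` and `.author_id` because those fields were requested. Tweets that have since been deleted or made private cannot be hydrated, so the result may contain fewer entries than the input list.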

The original dataset is derived from this dataset.
 