Gender Identification Challenges in Online Edited Writing and Copywriting Considered

Usually, when we read a news item from an online source, there is a name attached to the article, but not always. When we read articles with no name, we probably often wonder who wrote them, whether the author is male or female, or what their background is. It seems rather unfortunate when the author is not identified, and it leaves question marks in our minds, making us stop and wonder whether the article is even credible.

It may very well be, but if we can’t even know the sex of the person who wrote it, how are we to be sure that the advice being given is legitimate or worthwhile? Let me give you an example where this might matter. Perhaps you are reading an article about relationships, dating, marriage, or getting along with your spouse. Wouldn’t you care who wrote the article, and what sex they are? After all, you might consider the advice less relevant, or take it with a grain of salt, depending on the gender of the author.

In an interesting research paper on this topic, “Automatically Profiling the Author of an Anonymous Text,” by Shlomo Argamon, Moshe Koppel, James W. Pennebaker, and Jonathan Schler, the researchers looked at bloggers and forum posters online and were able to determine the gender of the individual posters with an alarmingly high rate of accuracy.

Discussing one of its tables, the research paper states: “the style features that prove to be most useful for gender discrimination are determiners and prepositions (markers of male writing) and pronouns (markers of female writing). (Note that for the case of gender, as with all the other problems we consider here, the most discriminating style features include parts-of-speech and not only function words.)”
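To make those three feature groups a little more concrete, here is a minimal Python sketch that measures how often determiners, prepositions, and pronouns appear in a piece of text. To be clear, this is not the classifier from the paper; the short word lists and the function name `style_feature_rates` are my own assumptions for illustration, and a real system would use a part-of-speech tagger and a trained model rather than hand-picked lists.

```python
import re

# Tiny, hand-picked word lists (an assumption for illustration only;
# they are NOT the feature set used in the paper).
DETERMINERS = {"a", "an", "the", "this", "these", "those"}
PREPOSITIONS = {"of", "in", "on", "at", "by", "for", "with", "from", "about"}
PRONOUNS = {"i", "me", "my", "you", "your", "he", "him", "his",
            "she", "her", "we", "us", "our", "they", "them", "their"}


def style_feature_rates(text):
    """Return the share of words in `text` that fall in each word list."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        # No words at all: report zero for every group.
        return {"determiners": 0.0, "prepositions": 0.0, "pronouns": 0.0}

    def rate(vocabulary):
        # Fraction of all words that belong to the given list.
        return sum(1 for word in words if word in vocabulary) / total

    return {
        "determiners": rate(DETERMINERS),
        "prepositions": rate(PREPOSITIONS),
        "pronouns": rate(PRONOUNS),
    }


if __name__ == "__main__":
    sample = "She told me that her sister met them at the station."
    print(style_feature_rates(sample))
```

These rates are only raw counts turned into fractions. They say nothing about gender by themselves; they are simply the kind of signal a profiling algorithm would weigh along with many others.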

Well, that’s pretty good, isn’t it? These computer algorithms can figure out the gender of the author with a very high probability. However, what if a man writes an article and has his secretary edit it? Or what if a woman writes the article, she has an editing team, and a man does the editing? Then all bets are off. The article may appear to have been written by either a man or a woman, or it simply won’t register on either side of the scale. In that case the computer algorithm will either show it as gender-neutral or take a guess at a much lower probability, and that guess could easily be wrong.

This is why I have to question author gender identification algorithms and their use in the real world, or rather, their application to an actual article written and posted online in the virtual world. It’s not that these systems don’t work well or that their accuracy isn’t high; rather, it is that sometimes they can’t work at all, or they could give us a false positive or the wrong impression.

Likewise, if we are reading the text of a speech given by a politician, we need to ask who the speechwriter was. A male politician could be reading a speech written by a female speechwriter, and the computer system might show that it was written by a woman, even though we know that a man was the one who stood at the podium and spoke those words.

Although these things are known to those who write the algorithms for author gender identification, they are not widely considered by those who might assume that the high accuracy rates and probabilities hold in all cases. Please consider all this and think on it. By the way, this article was written and edited by a man, but you already knew that due to its poor quality!