I was wondering which months get the most comments on Hacker News “Who’s Hiring” posts. January seems to be the slowest month, as I expected. I also expected December to be slow, but it turns out that’s not the case.
As a by-product of scraping and analyzing the posts from Hacker News, I get a document-term matrix of all the words mentioned each month. This gives a bit of insight into trends in which programming languages, frameworks and technologies people are looking to hire for. I was expecting more clear-cut trends, with technologies dropping or rising over the two and a half years for which I have data. The popularity of a term is determined by its total number of mentions per month. I could have broken things down a bit differently, for example by counting the number of posts that have at least one mention of a term.
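To make the difference between those two ways of counting concrete, here is a minimal sketch. It is not the code from my notebook: it uses plain Python's `collections.Counter` instead of Pandas to stay dependency-free, and the sample comments, the `tokenize` helper and the `term_counts` function are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample data: month -> list of comment texts (not real HN data).
comments_by_month = {
    "2013-01": [
        "Hiring Python and Django engineers. Python experience required.",
        "Looking for a Ruby developer.",
    ],
    "2013-02": [
        "Python, Go and JavaScript roles open.",
        "Go backend engineer wanted. We use Go everywhere.",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def term_counts(comments, per_post=False):
    """Count terms across the comments of one month.

    per_post=False: every mention counts (total mentions per month).
    per_post=True: each comment counts at most once per term.
    """
    counts = Counter()
    for text in comments:
        tokens = tokenize(text)
        counts.update(set(tokens) if per_post else tokens)
    return counts


# One row of the document-term matrix per month, under each counting scheme.
total_mentions = {
    month: term_counts(comments)
    for month, comments in comments_by_month.items()
}
posts_mentioning = {
    month: term_counts(comments, per_post=True)
    for month, comments in comments_by_month.items()
}

for month in sorted(comments_by_month):
    print(month, "python:", total_mentions[month]["python"],
          "mentions in", posts_mentioning[month]["python"], "posts")
```

With total mentions, a single enthusiastic post that repeats a term several times can inflate that term's popularity. Counting posts with at least one mention avoids that, at the cost of ignoring emphasis.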
Using an IPython Notebook to scrape Hacker News with Requests, parse HTML with BeautifulSoup, analyze with Pandas and draw graphs with matplotlib feels a lot more natural to me than working in an environment like R. I’m looking forward to more little data analysis projects like this in the future.