Instead of writing a traditional blog post, I’m going to answer some frequently asked questions about our Instagram data coverage.
1. Do you have access to Instagram’s “firehose”?
No. Instagram is a product of Facebook, which is first and foremost an advertising business. It presents itself as a media organisation, but in reality it is a massively effective advertising engine. Selling data for analytical purposes is not part of its business strategy. Facebook's main concern is keeping its users happy, and it sees data sharing mainly as a way for marketers to serve more relevant ads to its audience. Because selling data is not their focus, neither Facebook nor Instagram provides a data firehose or any other pay-for-data service.
2. Then, how do you collect Instagram information?
We are in the same position as everyone else: we collect data through Instagram's public APIs.
Some vendors rely on aggregators to do this work for them. We skipped that step, which puts us in control of the data we gather (in short, more of what customers want, and less spam and fewer irrelevant mentions). Because this is a free, publicly available API, we are subject to rate limits like every other provider, and we have to be strategic about what we prioritise during our crawl.
3. How does crawling for Instagram posts work?
It helps to think of this as two distinct steps. The first step is to retrieve content from Instagram, and the second is to match that content against your Query.
To retrieve Instagram posts: In the absence of a more robust search API, we rely on Instagram's tags endpoint.
This endpoint returns a list of posts carrying a given hashtag. When a customer creates a Query, we collect every hashtag it uses (that is, wherever the hashtag: operator has been specified). We then iterate over this list, requesting the posts tagged with each hashtag.
Once a post is in Brandwatch's data store, it can be matched against any customer Query, whether or not that Query included the hashtag that surfaced the post.
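The two steps can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration, not Brandwatch's actual implementation: `fetch_posts_for_tag` is a hypothetical stand-in for a call to the tags endpoint, posts are plain dictionaries with `id` and `caption` keys, and the matching rule is a toy substring check rather than the real Query language.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# Assumption: the hashtag: operator is written as "hashtag:coffee" or "hashtag:#coffee".
HASHTAG_OPERATOR = re.compile(r"hashtag:#?(\w+)", re.IGNORECASE)


def hashtags_in_query(query: str) -> Set[str]:
    """Collect the hashtags a Query asks for via the hashtag: operator."""
    return {tag.lower() for tag in HASHTAG_OPERATOR.findall(query)}


def crawl_hashtags(
    queries: Iterable[str],
    fetch_posts_for_tag: Callable[[str], Iterable[dict]],
) -> Dict[str, dict]:
    """Step 1: request posts for every hashtag used in any Query."""
    tags: Set[str] = set()
    for query in queries:
        tags |= hashtags_in_query(query)

    store: Dict[str, dict] = {}
    for tag in sorted(tags):
        for post in fetch_posts_for_tag(tag):  # hypothetical fetcher
            store[post["id"]] = post  # de-duplicate posts by id
    return store


def query_terms(query: str) -> List[str]:
    """Toy parser: turn 'hashtag:x' into '#x' and keep bare keywords as-is."""
    terms = []
    for token in query.lower().split():
        if token.startswith("hashtag:"):
            terms.append("#" + token[len("hashtag:"):].lstrip("#"))
        else:
            terms.append(token)
    return terms


def matching_queries(post: dict, queries: Iterable[str]) -> List[str]:
    """Step 2: match a stored post against every Query, not just the one
    whose hashtag caused the post to be retrieved."""
    caption = post["caption"].lower()
    return [q for q in queries if any(term in caption for term in query_terms(q))]
```

In this sketch, a post retrieved because of `hashtag:coffee` would also match a second Query containing only the keyword `starbucks`, provided the caption mentions it.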
4. How does crawling for Instagram posts and comments work?
This is handled by Brandwatch's Instagram Channels.
To use one, you create a Channel and add the Instagram user whose posts, comments and likes you want us to crawl. You do this instead of writing a traditional Query.
5. For what purpose does Brandwatch require my Instagram login details?
Brandwatch may have prompted you to sign in with Instagram, either through a notification or through the authentication menu in the upper right of the app.
The reason is that each token can only be used a limited number of times before it is exhausted and has to be renewed. The larger our supply of tokens, the greater our capacity to crawl. When you authenticate an Instagram account, it provides a token that we can store and use to boost that capacity.
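To make the link between tokens and crawl capacity concrete, here is a minimal Python sketch of a token pool. It rests on assumptions: the class name `TokenPool`, the `calls_per_window` and `window_seconds` parameters, and the fixed-window renewal rule are all hypothetical, and Instagram's real limits are not stated here.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


class TokenPool:
    """Hand out tokens until each one is exhausted for the current window."""

    def __init__(
        self,
        tokens: Iterable[str],
        calls_per_window: int,      # hypothetical per-token limit
        window_seconds: float,      # hypothetical renewal period
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = calls_per_window
        self._window = window_seconds
        self._clock = clock
        # token -> [remaining calls, time the current window started]
        self._state: Dict[str, List[float]] = {
            token: [calls_per_window, clock()] for token in tokens
        }

    def acquire(self) -> Optional[str]:
        """Return a token with calls left, or None if every token is exhausted."""
        now = self._clock()
        for token, state in self._state.items():
            if now - state[1] >= self._window:
                # The window has elapsed, so this token's allowance is renewed.
                state[0], state[1] = self._limit, now
            if state[0] > 0:
                state[0] -= 1
                return token
        return None

    def capacity(self) -> int:
        """Total calls available per window: grows with every added token."""
        return self._limit * len(self._state)
```

Under these assumptions, capacity scales linearly with the number of tokens, which is why each additional authenticated account helps.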
6. How can I get the fullest possible Instagram coverage?
Here are some recommendations for helping us retrieve more of the data you care about.
Include the Instagram hashtags you want us to request in your Brandwatch Analytics Queries.
Make sure your brand's Instagram account, and any other accounts you want us to monitor, are added to a Brandwatch Analytics Instagram Channel.
Authenticate your Instagram accounts in Brandwatch Analytics. This gives us a little more crawling capacity to serve your high-volume Brandwatch Analytics Channels.
If you have a group of key influencers, add them as Instagram Channels. Their data then becomes available to all of your other Queries.
7. What are the future plans for Instagram crawling?
We are not satisfied with the current situation, and we want to do better. I see three main ways in which our Instagram coverage could improve:
Depth of hashtag coverage: making sure that every requested hashtag, including the most popular ones, is retrieved in full.
Breadth of hashtag coverage: crawling additional hashtags that nobody has asked for yet, which gives us better historical data in the long run. One potentially interesting direction would be to feed the hashtags we discover on incoming posts back into our crawlers.
Coverage of comments: when a hashtag is covered, retrieving not only the original post but also the comments and the conversation that followed it.
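The idea of feeding hashtags discovered on incoming posts back into the crawlers could look roughly like the Python sketch below. It is only an illustration under assumptions: `fetch_posts_for_tag` is a hypothetical fetcher, posts are dictionaries with a `caption` key, and `max_tags` is an invented cap standing in for whatever rate-limit budget applies.

```python
import re
from collections import deque
from typing import Callable, Iterable, List, Set

HASHTAG = re.compile(r"#(\w+)")


def discover_hashtags(caption: str) -> Set[str]:
    """Extract every hashtag mentioned in a post caption."""
    return {tag.lower() for tag in HASHTAG.findall(caption)}


def expand_crawl(
    seed_tags: Iterable[str],
    fetch_posts_for_tag: Callable[[str], Iterable[dict]],
    max_tags: int,
) -> List[str]:
    """Crawl the requested hashtags first, then hashtags found on their posts.

    Returns the hashtags in the order they were crawled.
    """
    frontier: deque = deque()
    seen: Set[str] = set()
    for tag in seed_tags:
        tag = tag.lower()
        if tag not in seen:
            seen.add(tag)
            frontier.append(tag)

    crawled: List[str] = []
    while frontier and len(crawled) < max_tags:  # stop when the budget is spent
        tag = frontier.popleft()
        crawled.append(tag)
        for post in fetch_posts_for_tag(tag):  # hypothetical fetcher
            for new_tag in sorted(discover_hashtags(post["caption"])):
                if new_tag not in seen:
                    seen.add(new_tag)
                    frontier.append(new_tag)  # queue behind requested tags
    return crawled
```

Because newly discovered hashtags are queued behind the requested ones, customer-requested hashtags are always crawled first in this sketch.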
Right now, our priority is the first of these. In fact, we have recently set up a separate engineering team to focus solely on it, and they are making excellent progress on several fundamental architecture upgrades that will dramatically increase the scale at which we can retrieve posts. I will have more to share in the coming weeks, so stay tuned for further blog posts.