Corpus description

The MAS Corpus contains a set of manually tagged Spanish-language tweets of interest for marketing purposes. For every Twitter post, tags are provided to describe three different aspects of the text:

  • the sentiments
  • whether it mentions an element of the marketing mix
  • the position of the tweet author with respect to the purchase funnel

Every tag is related to a single brand, which is also specified for every tweet.

A typical document contains: an identifier, a piece of text (in the full version), a set of annotations (e.g. love, satisfaction), the referred brand, the sector, and other named entities.

For a number of documents, extended information is provided that links the data to external datasets (Thomson Reuters' PermID, etc.).

@prefix mas: <> .
@prefix rdf: <> .
@prefix sabv: <> .
@prefix sioc: <> .
@prefix marl: <> .
@prefix onyx: <> .
@prefix permid: <> .
@prefix gr: <> .
@prefix owl: <> .
@prefix rdfs: <> .
@prefix sabd: <> .

mas:827146264517165056 a sioc:Post ;
  sioc:id "827146264517165056" ;
  sioc:content "Las camisetas nike 2002~2004 y las adidas 2006~2008 son el amor de mi vida"@es ;
  marl:describesObject mas:Nike ;
  sabd:isInPurchaseFunnel sabv:postPurchase ;
  sabd:hasMarketingMix sabv:product ;
  onyx:hasEmotion sabv:love, sabv:satisfaction, sabv:happiness ;
  marl:hasPolarity marl:positive ;
  marl:forDomain "SPORT" .

Information on companies, brands and emotions is also given.

mas:Nike a gr:Brand ;
  rdfs:seeAlso <> .

sabd:1-5000062703 a gr:Business ;
  rdfs:label "Nike Inc", "Nike" ;
  owl:sameAs permid:1-4295904620 .
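
As a rough illustration of the document structure described above, the annotated post can be mirrored in a small Python record. This is only a sketch: the class AnnotatedTweet, its field names and the helper by_funnel_stage are hypothetical and not part of the corpus or its vocabulary; the values are copied from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnnotatedTweet:
    """Hypothetical in-memory mirror of one MAS Corpus document."""
    tweet_id: str                      # sioc:id
    brand: str                         # marl:describesObject
    sector: str                        # marl:forDomain
    polarity: str                      # marl:hasPolarity
    purchase_funnel: str               # sabd:isInPurchaseFunnel
    marketing_mix: List[str] = field(default_factory=list)  # sabd:hasMarketingMix
    emotions: List[str] = field(default_factory=list)       # onyx:hasEmotion
    text: Optional[str] = None         # sioc:content (full version only)


def by_funnel_stage(tweets, stage):
    """Return the tweets annotated with the given purchase funnel stage."""
    return [t for t in tweets if t.purchase_funnel == stage]


# Values taken from the example post above; the text is left out,
# as it is in the downloadable version of the corpus.
example = AnnotatedTweet(
    tweet_id="827146264517165056",
    brand="Nike",
    sector="SPORT",
    polarity="positive",
    purchase_funnel="postPurchase",
    marketing_mix=["product"],
    emotions=["love", "satisfaction", "happiness"],
)

post_purchase = by_funnel_stage([example], "postPurchase")
print([t.tweet_id for t in post_purchase])
```

Keeping the text field optional reflects that the downloadable annotations come without the tweet texts.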

Corpus download

These datasets omit the tweet texts for copyright reasons. The texts can be retrieved from the tweet IDs.
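
Since the tweet texts have to be recovered from their identifiers, a sketch like the following may help as a starting point. It only extracts the numeric ID from a mas: resource name and builds a link to the tweet. The function names are hypothetical, and the URL pattern https://twitter.com/i/web/status/<id> is an assumption about Twitter's public web interface, not something defined by the corpus. Fetching the text itself requires a Twitter client or API access, which is not shown here.

```python
import re

# Assumed URL pattern for viewing a tweet by ID; not defined by the corpus.
TWEET_URL = "https://twitter.com/i/web/status/{tweet_id}"


def tweet_id_from_resource(resource: str) -> str:
    """Extract the numeric tweet ID from a name such as 'mas:827146264517165056'."""
    match = re.fullmatch(r"(?:mas:)?(\d+)", resource.strip())
    if match is None:
        raise ValueError(f"not a tweet resource: {resource!r}")
    return match.group(1)


def tweet_url(resource: str) -> str:
    """Build a link to the tweet identified by the given resource name."""
    return TWEET_URL.format(tweet_id=tweet_id_from_resource(resource))


print(tweet_url("mas:827146264517165056"))
```

Resources that are not posts, such as mas:Nike, are rejected with a ValueError so that brand entries are not mistaken for tweets.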

The corpus contains purchase funnel and marketing mix tags, which have been assigned using the following criteria and the same ad-hoc vocabulary.



This work has been submitted to the Fourth International Workshop at ESWC on Sentic Computing, Sentiment Analysis, Opinion Mining and Emotion Detection.

For copyright reasons, the text is not available for download (but requests will be considered). However, the annotations are the work of María Navas, Víctor Rodríguez, Alba Fernández and Idafen Santana, and they are freely downloadable under a CC-BY 4.0 license.