Data Repository

Multilingual document datasets

  • EMNLP 2014 [README]
    • Download:
      Balanced and Unbalanced Datasets, about 16 MB (71 MB uncompressed).
    • Used in (please cite):
      Salvatore Romeo, Andrea Tagarelli, Dino Ienco. Semantic-Based Multilingual Document Clustering via Tensor Modeling. In Proc. Conference on Empirical Methods in Natural Language Processing (EMNLP), October 25–29, 2014, Doha, Qatar.

Instagram datasets

  • ASONAM 2014 [README]
    • Source: Public media and user information from Instagram.com (through the Instagram API).
    • Crawling period: Mar 1 – Apr 2, 2014.
    • Description: The anonymized user network contains asymmetric relations (A follows B). Each edge is annotated with the number of likes (by A to media created by B), the number of comments, and the list of the comments’ timestamps.
    • Size: about 54K vertices and 964K edges.
    • Download:
    • Used in (please cite):
      Andrea Tagarelli, Roberto Interdonato. Understanding Lurking Behaviors in Social Networks across Time. In Proc. 2014 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), August 17–20, 2014, Beijing, China.
  • HYPERTEXT 2014 [README]
    • Source: Public media and user information from Instagram.com (through the Instagram API).
    • Crawling period: Jan 20 – Feb 17, 2014.
    • Description: Each record in the media dataset contains the anonymized media ID, the anonymized ID of the user who created the media, the timestamp of media creation, the set of tags assigned to the media, and the numbers of likes and comments it received. The anonymized user network contains asymmetric relations (A follows B). Each edge is annotated with the number of likes (by A to media created by B), the number of comments, and the list of the comments’ timestamps.
    • Size:
      • Media dataset: 1.7M media associated with 2K users, with 9M tags, 1200M likes, and 41M comments.
      • User network: about 45K vertices and 678K edges.
    • Download:
    • Used in (please cite):
      Emilio Ferrara, Roberto Interdonato, Andrea Tagarelli. Online Popularity and Topical Interests through the lens of Instagram. In Proc. 25th ACM Conference on Hypertext and Social Media, September 1–4, 2014, Santiago, Chile.
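
For readers who want to work with the Instagram datasets, the record layouts described above can be sketched in memory as follows. This is only an illustrative sketch: the class names, field names, helper functions, and toy values are all hypothetical, timestamps are assumed to be integers, and the actual on-disk file format is specified in each dataset's README rather than here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MediaRecord:
    """One media record (field names are hypothetical)."""
    media_id: str        # anonymized media ID
    user_id: str         # anonymized ID of the user who created the media
    created_at: int      # creation timestamp (integer encoding is an assumption)
    tags: Set[str]       # set of tags assigned to the media
    likes: int           # number of likes received
    comments: int        # number of comments received


@dataclass
class FollowEdge:
    """One directed edge 'follower follows followee' (field names are hypothetical)."""
    follower: str        # user A
    followee: str        # user B
    likes: int           # likes by A to media created by B
    comments: int        # number of comments attached to the edge
    comment_timestamps: List[int] = field(default_factory=list)


def followers_by_user(edges: List[FollowEdge]) -> Dict[str, List[str]]:
    """Index the asymmetric network by followee: B -> list of users A that follow B."""
    index: Dict[str, List[str]] = {}
    for e in edges:
        index.setdefault(e.followee, []).append(e.follower)
    return index


def likes_per_tag(media: List[MediaRecord]) -> Dict[str, int]:
    """Sum the likes of all media carrying each tag."""
    totals: Dict[str, int] = {}
    for m in media:
        for tag in m.tags:
            totals[tag] = totals.get(tag, 0) + m.likes
    return totals


# Toy data, invented purely for illustration (not taken from the datasets).
edges = [
    FollowEdge("u1", "u2", likes=3, comments=1, comment_timestamps=[1393632000]),
    FollowEdge("u3", "u2", likes=0, comments=0),
    FollowEdge("u2", "u1", likes=5, comments=2,
               comment_timestamps=[1393635600, 1393639200]),
]
media = [
    MediaRecord("m1", "u2", 1390176000, {"sunset", "beach"}, likes=10, comments=2),
    MediaRecord("m2", "u1", 1390262400, {"beach"}, likes=4, comments=0),
]

followers = followers_by_user(edges)
tag_likes = likes_per_tag(media)
```

Keeping the edge direction explicit (follower, followee) preserves the asymmetry of the relation: in the toy data, u3 follows u2, but u2 does not follow u3.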