Computing Systems Data

S1 - Yahoo! Sherpa database platform system measurements, version 1.0 (33 K)

This dataset contains a series of traces of hardware resource usage during operation of the PNUTS/Sherpa database. The measurements include CPU utilization, memory utilization, disk utilization, and network traffic, among others. Metrics specific to particular components of the system, such as the Apache and MySQL servers, are also included. The traces record system resource usage at 1-minute granularity during various database workloads, including read-heavy, write-heavy and scan-oriented workloads. The data can be used to analyze and simulate the bottlenecks experienced in a real cloud database system under load. The size of this dataset is 33 K.

S2 - Yahoo! Statistical Information Regarding Files and Access Pattern to Files in one of Yahoo's Clusters (2.3 K)

This dataset contains the total number of files, the total file size, the number of file accesses, the number of days between the first access and the most recent access, the file distribution, and the deletion and creation rates of files and directories in a dilithium-gold cluster. The size of this dataset is 2.3 K.

S3 - Yahoo Hadoop grid logs, version 1.0 (8.8G) (Hosted on AWS)

This dataset contains HDFS audit logs, which record information about HDFS file accesses.

S5 - A Labeled Anomaly Detection Dataset, version 1.0 (16M)

Automatic anomaly detection is critical in today's world, where the sheer volume of data makes it impossible to tag outliers manually. The goal of this dataset is to benchmark your anomaly detection algorithm. The dataset consists of real and synthetic time-series with tagged anomaly points. It tests detection accuracy on various anomaly types, including outliers and change-points. The synthetic dataset consists of time-series with varying trend, noise and seasonality. The real dataset consists of time-series representing the metrics of various Yahoo services.
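As an illustration of what benchmarking against tagged anomaly points can look like, the following is a minimal Python sketch. It does not read the S5 files, whose layout is not described here. Instead it generates a small hypothetical series (trend plus noise with three injected spikes), applies a simple rolling-median detector chosen only for illustration, and scores the result with precision and recall. The names make_series, detect and precision_recall, and all parameter values, are assumptions, not part of the dataset.

```python
import random
import statistics


def make_series(n=200, seed=0):
    """Hypothetical stand-in for a labeled series: trend + noise + 3 spikes."""
    rng = random.Random(seed)
    values = [0.05 * t + rng.gauss(0, 1) for t in range(n)]
    labels = [0] * n
    for t in (40, 90, 150):  # injected outliers, tagged as anomalies
        values[t] += 15.0
        labels[t] = 1
    return values, labels


def detect(values, window=20, threshold=4.0):
    """Flag points far from the trailing median, scaled by the trailing MAD.

    This is an illustrative detector, not a method prescribed by the dataset.
    """
    flags = [0] * len(values)
    for t in range(window, len(values)):
        history = values[t - window:t]
        med = statistics.median(history)
        mad = statistics.median([abs(v - med) for v in history])
        scale = (1.4826 * mad) or 1e-9  # avoid a zero scale on flat history
        if abs(values[t] - med) > threshold * scale:
            flags[t] = 1
    return flags


def precision_recall(labels, flags):
    """Point-wise precision and recall of predicted flags against tagged labels."""
    tp = sum(1 for l, f in zip(labels, flags) if l and f)
    fp = sum(1 for l, f in zip(labels, flags) if not l and f)
    fn = sum(1 for l, f in zip(labels, flags) if l and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    values, labels = make_series()
    flags = detect(values)
    precision, recall = precision_recall(labels, flags)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

To score a detector on real data, the synthetic generator would be replaced by a loader that returns the same two lists, values and 0/1 labels. Point-wise precision and recall are only one possible scoring choice; change-points in particular may call for a tolerance window around each tagged point.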