---
A collection of traces of web requests and responses. The collection
spans traces of connections to 100 sites, collected hourly over several
months from November 2003 through March 2004. Each connection was
encrypted; the traces include only the TCP headers, and not the payload.
More details are in the paper and README.
---

The data in the following files:

md5sum                            filename
0180cb9866b131cd90e987dfc82b0898  webident-traces-2003-10.tar.bz2
a973706f08ab33c393dbb233ba5661ee  webident-traces-2003-11.tar.bz2
092dca9b8fe1f0f816049050455bf056  webident-traces-2003-12.tar.bz2
fbba3c3f8e901cbc8213a9cce17a04ae  webident-traces-2004-01.tar.bz2
199521b7f0fb0f79f3b850babf933ed0  webident-traces-2004-02.tar.bz2
7bae6073337423c902279e6179c1365f  webident-traces-2004-03.tar.bz2

were collected as described in:

@inproceedings{pet05-bissias,
  title = {Privacy Vulnerabilities in Encrypted {HTTP} Streams},
  author = {George Dean Bissias and Marc Liberatore and Brian Neil Levine},
  booktitle = {Proceedings of the Privacy Enhancing Technologies Workshop (PET 2005)},
  year = {2005},
  month = {May},
  www_pdf_url = {http://prisms.cs.umass.edu/brian/pubs/bissias.liberatore.pet.2005.pdf},
}

If you use this data set in your own published research, please refer to
it by citing the above paper.

Each tarball contains files created by tcpdump, with names of the form:

  webident-traces/YYYY-MM/YYYY-MM-DD-HHmm-X

  YYYY - year
  MM   - month
  DD   - day
  HH   - hour
  mm   - minute
  X    - site identifier

Traces were collected sequentially, once per hour. There are holes in
the sequence, due to various failures during the collection: do not
assume it is continuous from start to end.

The site identifier is consistent across all traces. Sites 3, 4, and 54
were in the same subdomain (cs.umass.edu) as the collecting machine.
Sites 36, 37, 66, and 92 were in the same domain (umass.edu) as the
collecting machine.

The traces are of traffic sent over an SSH tunnel to a proxy, all atop
the loopback interface of a Linux host.
A consequence of the use of the loopback interface is a
larger-than-usual MTU, reflected in the traces. The proxy ran on port
8888; thus, when parsing the traces, data originating from this port can
be considered to be from the remote site.

Marc Liberatore
liberato@cs.umass.edu
06 July 2006
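The file naming scheme and the port convention described above can be
sketched in Python as follows. This is only a minimal illustration, not
part of the original collection tooling. The names parse_trace_name,
packet_origin, and site_locality are hypothetical, and the sketch assumes
that the site identifier X is a decimal integer and that trace files
carry no file extension.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Port the proxy ran on, as stated in this README.
PROXY_PORT = 8888

# Site groupings listed in this README.
SAME_SUBDOMAIN = {3, 4, 54}       # cs.umass.edu
SAME_DOMAIN = {36, 37, 66, 92}    # umass.edu

# Pattern for YYYY-MM-DD-HHmm-X (assumption: X is a decimal integer).
TRACE_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})(\d{2})-(\d+)")


class TraceInfo(NamedTuple):
    when: datetime  # collection time encoded in the file name
    site: int       # site identifier X


def parse_trace_name(path: str) -> TraceInfo:
    """Extract the timestamp and site identifier from a trace path."""
    name = path.rsplit("/", 1)[-1]  # drop any directory components
    match = TRACE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a trace file name: {name!r}")
    year, month, day, hour, minute, site = map(int, match.groups())
    return TraceInfo(datetime(year, month, day, hour, minute), site)


def packet_origin(src_port: int) -> str:
    """Classify a packet by its TCP source port.

    Data originating on the proxy port is treated as coming from the
    remote site; everything else is treated as coming from the client.
    """
    return "remote" if src_port == PROXY_PORT else "local"


def site_locality(site: int) -> str:
    """Report how close a site was to the collecting machine."""
    if site in SAME_SUBDOMAIN:
        return "cs.umass.edu"
    if site in SAME_DOMAIN:
        return "umass.edu"
    return "other"
```

For example, parse_trace_name applied to
"webident-traces/2004-01/2004-01-15-1300-54" yields the timestamp
2004-01-15 13:00 and site 54. Because the hourly sequence has holes,
consecutive timestamps recovered this way should not be assumed to be
exactly one hour apart.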