The data were collected from a campus peer-to-peer (P2P) file-sharing network based on the OpenNap server. They consist of records of all the mp3 files shared by and transferred between users during an 81-day period between February 28, 2003 and May 21, 2003. Users are uniquely identified by an anonymous MD5 hash. No personal information was collected during this study, and users gave explicit consent to anonymous collection of the data.

The data are stored in XML format and represented in a flexible graph format used by the open-source Proximity software package. Additional information about Proximity can be found at: http://kdl.cs.umass.edu/proximity/index.html

As a brief overview, the Proximity graph representation contains objects, links, and attributes. Objects and links each have a special type attribute, 'objecttype' and 'linktype' respectively, which determines the type of entity. In this P2P data, the objects represent Users, Files, Transfers, and Queries, and the links capture relationships between these objects. The relationships and attributes present in this data are shown in the P2P-schema.html file, which is included in this download.

Rudimentary consolidation was performed by making all filenames lowercase, converting spaces and punctuation to dashes, and doing simple artist-name recognition. Most of the filenames contained some combination of the track name of the song, the song's artist, the track number, and the album name. The most common form of the filename was -.mp3. Using this information and some hand labeling, we were able to generate a list of the most prevalent artists in the database and use that list to help determine whether two files should be consolidated. Through consolidation we reduced the number of files to 291,925. We did minimal consolidation of misspellings or alternate spellings of artist names or track names.

By considering only mp3s and performing simple name consolidation, we were able to decrease the number of unique files by approximately 90% while reducing the number of transfers and queries by only 50% and the number of users by approximately 20%.
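
For readers who want to reproduce something like the name consolidation described above, the following is a minimal Python sketch of the lowercasing and dash-conversion steps only. It is not the code used to build this data set. The function names `normalize_filename` and `consolidate` are hypothetical, the example filenames are made up for illustration, and collapsing a run of several spaces or punctuation marks into a single dash is an assumption. The artist-name recognition and hand-labeling steps are not covered.

```python
import re
from collections import defaultdict


def normalize_filename(name: str) -> str:
    """Lowercase a filename and convert spaces/punctuation in its stem to dashes."""
    # Split off the extension at the last dot; rpartition returns
    # ('', '', name) when there is no dot at all.
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    # Assumption: each run of non-alphanumeric characters (including
    # underscores) becomes one dash, and leading/trailing dashes are dropped.
    stem = re.sub(r"[\W_]+", "-", stem.lower()).strip("-")
    return f"{stem}.{ext.lower()}" if ext else stem


def consolidate(names):
    """Group raw filenames that share the same normalized form."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_filename(raw)].append(raw)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical filenames, not taken from the data set.
    sample = [
        "The Beatles - Hey Jude.MP3",
        "the_beatles--hey jude.mp3",
        "01 Track.One.mp3",
    ]
    for key, raws in consolidate(sample).items():
        print(key, len(raws))
```

With the three made-up names above, the first two collapse to the same key, so the sketch reports two consolidated files rather than three. Misspelled or alternate artist names would still end up under different keys, which is consistent with the minimal handling of such cases noted above.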