A Method for Longitudinal Behavioral Data Collection in Second Life

Nick Yee & Jeremy N. Bailenson
Department of Communication
Stanford University

Online Addendum

This serves as the online addendum to: Yee, N. & Bailenson, J.N. (2008). A Method for Longitudinal Behavioral Data Collection in Second Life. Presence, 17.


Persistent online virtual environments, whether game worlds like World of Warcraft or social worlds like Second Life, provide social scientists with the opportunity to collect longitudinal behavioral profiles from users. These environments allow behavioral measures of interesting variables at the individual and group level to be collected and analyzed. For example, studies have examined mutual gaze and personal space in Second Life (Friedman, Steed, & Slater, 2007; Yee, Bailenson, Urbanek, Chang, & Merget, 2007). Behavioral data also allow researchers to avoid self-report questionnaires, which have been shown to produce unreliable measures (Slater, 2004).

On the other hand, even though these virtual environments are treasure troves of data for social scientists, typical social science curricula do not provide researchers with the necessary background skills (e.g., programming, databases) to collect data from them. Our goal here is not to provide a cutting-edge technical solution, but rather a foundational framework that others can easily modify for a wide variety of purposes. The solution we describe allows researchers to capture avatar-related data from Second Life (SL) at a resolution of one minute or less over a period of weeks.

Assumed Background Knowledge

While we will describe the solution at a good level of detail, it is not our intent here to teach LSL, web-based programming languages, or databases. As such, we will assume that the reader has some background knowledge of LSL, PHP, and MySQL, enough to modify the provided scripts accordingly.

It must also be pointed out that using PHP and MySQL requires access to a web space that has these tools installed. Many commercially available web hosts provide PHP and MySQL in their basic packages, typically at a monthly cost of around $10. For example, see www.dreamhost.com. Again, it is not our intent here to discuss how to set up a web space or how to use PHP or MySQL. These are popular and standardized web tools (both PHP and MySQL are open-source packages) that many CS grad students or IT staff will be able to assist with.

Overview of Technical Architecture

In the hypothetical study for which our solution is designed, imagine that we wanted to track behavioral data from 40 participants using their own SL avatar over 2 weeks at a resolution of 30 seconds, capturing every avatar-related variable that SL gives us access to (e.g., Cartesian coordinates of locomotion, whether the avatar is typing, every character the user types, changes to body size, etc.). Assume that each participant is active in SL for at least 5 hours each week.
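To give a sense of scale, the figures above can be turned into a rough lower bound on the number of periodic samples the study would generate. The following is a minimal Python sketch, not part of the original scripts; the function name is hypothetical, and it assumes every participant is active for exactly the stated minimum and ignores chat events.

```python
def expected_samples(participants, weeks, hours_per_week, interval_seconds):
    """Lower bound on the number of periodic samples the study would produce."""
    samples_per_hour = 3600 // interval_seconds
    return participants * weeks * hours_per_week * samples_per_hour


# Figures from the hypothetical study: 40 participants, 2 weeks,
# at least 5 active hours per week, one sample every 30 seconds.
total = expected_samples(participants=40, weeks=2, hours_per_week=5, interval_seconds=30)
print(total)  # 48000
```

Each of these samples becomes one row in the numeric table, so even this small study produces tens of thousands of rows.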

Due to the memory constraints of scripts in SL, it is not possible to store large amounts of data within SL directly. Thus, any data collection taking place in SL must pipe the data out to an external store. However, LSL (SL's scripting language) does not provide ways of connecting to databases. In our earlier study where we only needed snapshot data (non-longitudinal), it was acceptable to send data out via emails. In this study, however, we needed a more structured and centralized data store.

In our solution, which again we do not claim is technically novel, we use LSL to send the data to an external PHP-driven web page. The SL variables are encoded in the URL used to request the PHP page. The PHP script then pipes the variables into a MySQL database.

Variables Collected

We begin by showing the list of variables being captured by the script and the relevant LSL functions used to retrieve those variables. The "naming convention" column lists how the variable is named in the scripts.

Function                   Variable                        Type            Naming
llGetOwner()               Avatar Name                     String          name
llListen()                 Avatar Chat                     String          message
llGetAgentInfo()           Is In Always Run Mode           Bool            runmode
                           Has Attachments                 Bool            attachment
                           In Away Mode                    Bool            isaway
                           In Busy Mode                    Bool            isbusy
                           Is Crouching                    Bool            crouching
                           Is Flying                       Bool            flying
                           Is In Air                       Bool            inair
                           Is Using Mouselook              Bool            mouselook
                           Is Sitting on Object            Bool            onobject
                           Has Scripted Attachments        Bool            scripted
                           Is Sitting                      Bool            sitting
                           Is Typing                       Bool            typing
                           Is Walking                      Bool            walking
llGetAgentSize()           Size                            Vector          sizex
llGetObjectDetails()       Position                        Vector          posx
                           Global Position                 Vector          gposx
                           Rotation                        Vector (Euler)  rotx
                           Velocity                        Vector          velx
changed()                  Teleported                      Bool            teleported
llGetRegionName()          Region                          String          region
llSensor()                 Number of People in 20m radius  Integer         inradius
llGetTimestamp()           Time                            Time            sltime
llGetRegionTimeDilation()  Time Dilation in Sim            Float           dilation
llListen()                 Chat Data                       String          message
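The thirteen Boolean variables attributed to llGetAgentInfo() come from a single integer that the function returns, in which each bit marks one state. The sketch below, in Python rather than LSL, illustrates how such a bitfield can be split into the named columns. The bit values are our recollection of the LSL constants (AGENT_FLYING, AGENT_TYPING, and so on) and should be treated as assumptions to verify against the current LSL reference; the function name is hypothetical.

```python
# Bit values as we recall them from the LSL documentation for llGetAgentInfo();
# treat them as assumptions and verify against the current LSL reference.
AGENT_FLAGS = {
    "flying": 0x0001,
    "attachment": 0x0002,
    "scripted": 0x0004,
    "mouselook": 0x0008,
    "sitting": 0x0010,
    "onobject": 0x0020,
    "isaway": 0x0040,
    "walking": 0x0080,
    "inair": 0x0100,
    "typing": 0x0200,
    "crouching": 0x0400,
    "isbusy": 0x0800,
    "runmode": 0x1000,
}


def decode_agent_info(bitfield):
    """Split an llGetAgentInfo()-style integer into one 0/1 value per column."""
    return {name: int(bool(bitfield & mask)) for name, mask in AGENT_FLAGS.items()}


flags = decode_agent_info(0x0001 | 0x0100 | 0x0200)  # flying, in air, typing
print(flags["flying"], flags["inair"], flags["typing"], flags["walking"])  # 1 1 1 0
```

The same masking can be done inside the LSL script before transmission or on the server after the raw integer arrives; sending the raw integer keeps the request shorter.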

The LSL Script (Periodic Data Collection and Transmission)

The LSL script uses a timer function to collect and send avatar-related variables every 30 seconds. In the scripts provided, the variables for setting the timer interval are clearly marked and can be changed according to the desired collection resolution. The provided script should be attached to an object and thus will serve as a tracking device. The script will start when users attach the object to themselves. To avoid attaching the object to the avatar visually, we recommend attaching the object to the HUD element (Heads Up Display) instead. In this way, only the user can see the object. And while the object can be anything, including being invisible, we recommend coloring the object with a bright or noticeable color to serve as an indicator of whether the tracking device is being worn. A further note is that changing outfits in SL can detach all previously attached objects. Thus, participants should be reminded to check whether the tracking device is still attached after changing outfits.

Note that you have to swap in your domain in the code below in two lines, both of which begin with "string url". Also note that the variable "verbose" is for debugging purposes only and should be turned off during the study itself.

The PHP Scripts (Intermediate Data Handling)

In our implementation, we had one script for handling numeric variables and another script for handling chat data. This was because chat was only logged whenever the user spoke (rather than at a strict x-second interval). PHP allows one to pass variables via URLs. Thus, while the script may be located at:


it is also possible to pass variables to the script with:


This is the technique used in the following scripts to handle the incoming data.
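The general idea of passing variables through a URL can be illustrated with a short Python sketch (the actual scripts use LSL on the sending side and PHP on the receiving side). The domain example.com, the sample values, and both function names are hypothetical placeholders; only the script name update.php comes from our implementation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute your own domain and script path.
BASE_URL = "http://example.com/update.php"


def build_request_url(base_url, variables):
    """Append the variables to the script URL as a percent-encoded query string."""
    return base_url + "?" + urlencode(variables)


def read_variables(url):
    """Recover the variables on the receiving side, as the server script would."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


sample = {"name": "Test Avatar", "region": "Help Island", "flying": 1, "posx": 128.5}
url = build_request_url(BASE_URL, sample)
print(url)
print(read_variables(url))
```

Note that every value arrives as a string on the receiving side, so numeric variables need to be converted (and validated) before they are written to the database.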

One additional consideration is that we do not want to open connections to the database too frequently. To buffer the incoming stream, the SL data is first appended to a text file. When the text file reaches a certain size, it is then dumped to the database in one single session. This is to lower the processing burden of the database. Three scripts will be given below. All three scripts should be located in the same folder. In addition, a folder named "files" should be created in this directory. As a precautionary measure, all incoming data is stored in text files separately for each avatar in addition to being piped to the database.
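The buffering strategy described above can be sketched as follows. This is a Python illustration under stated assumptions, not the PHP code itself: SQLite stands in for MySQL, the two-column table and the 200-byte threshold are hypothetical, and the function names are ours.

```python
import os
import sqlite3
import tempfile

FLUSH_THRESHOLD_BYTES = 200  # hypothetical; tune to your host and traffic


def flush_buffer(buffer_path, conn):
    """Insert every buffered row in one transaction, then empty the buffer file."""
    with open(buffer_path, encoding="utf-8") as buffer_file:
        rows = [line.rstrip("\n").split("\t") for line in buffer_file if line.strip()]
    with conn:  # commits once for the whole batch
        conn.executemany("INSERT INTO samples (name, region) VALUES (?, ?)", rows)
    open(buffer_path, "w").close()  # truncate the buffer


def record_sample(buffer_path, conn, name, region, threshold=FLUSH_THRESHOLD_BYTES):
    """Append one sample to the buffer file; flush once the file is large enough."""
    with open(buffer_path, "a", encoding="utf-8") as buffer_file:
        buffer_file.write(f"{name}\t{region}\n")
    if os.path.getsize(buffer_path) >= threshold:
        flush_buffer(buffer_path, conn)


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE samples (name TEXT, region TEXT)")
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "buffer.txt")
    for i in range(25):
        record_sample(path, conn, "Test Avatar", f"Region {i}")
    stored = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    with open(path, encoding="utf-8") as leftover:
        pending = sum(1 for _ in leftover)
print(stored, pending)  # rows already in the database, rows still buffered
```

One consequence of this design is that the most recent samples may sit in the text file rather than the database, so the buffer should be flushed one last time when data collection ends.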

We first define a common functions file, named "functions.php". Note that you have to swap in the log-in information for your database.

The following is the code for the PHP script handling the numeric variables, named "update.php". Note that it assumes a folder named "files" in the directory of the PHP script.

And finally, here is the script for processing the chat logs. This should be named "chatupdate.php".

The MySQL Database Schemas (Data Storage)

The MySQL database is composed of 3 tables. One table lists the users. Another table lists the numeric variables. And the final table lists the chat logs. The users table stores basic information about each unique user, such as the last time they logged on and the total number of lines they've chatted. This functionality is included in the PHP scripts given above. The users table thus provides an easy administrative view of the incoming data.

Below are the SQL schemas for the three tables.

Sample MySQL Analysis

An additional advantage of storing the data directly in a database is that it makes it possible to parse the data easily. For example, to find the total number of chat logs and the sum of chat lengths from each user, we could run the following SQL.

Or for example, to find the most popular zones visited by the participant sample, we could run the following SQL.
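As a hedged illustration of the two analyses just described, the Python sketch below runs equivalent queries against an in-memory SQLite database. The table names (chat, samples), their reduced column sets, and the sample rows are hypothetical and are not the schemas used in our study; in MySQL, CHAR_LENGTH may be preferable to LENGTH when messages contain multi-byte characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
# Hypothetical, heavily reduced tables; the real schemas hold many more columns.
conn.execute("CREATE TABLE chat (name TEXT, message TEXT)")
conn.execute("CREATE TABLE samples (name TEXT, region TEXT)")
conn.executemany("INSERT INTO chat VALUES (?, ?)", [
    ("Avatar A", "hello"), ("Avatar A", "how are you"), ("Avatar B", "hi"),
])
conn.executemany("INSERT INTO samples VALUES (?, ?)", [
    ("Avatar A", "Region X"), ("Avatar A", "Region X"),
    ("Avatar B", "Region X"), ("Avatar B", "Region Y"),
])

# Number of chat lines and total characters typed, per user.
chat_totals = conn.execute(
    "SELECT name, COUNT(*) AS n_lines, SUM(LENGTH(message)) AS n_chars "
    "FROM chat GROUP BY name ORDER BY name"
).fetchall()

# Regions ranked by how many periodic samples were recorded in each.
popular_regions = conn.execute(
    "SELECT region, COUNT(*) AS n_samples "
    "FROM samples GROUP BY region ORDER BY n_samples DESC, region"
).fetchall()

print(chat_totals)      # [('Avatar A', 2, 16), ('Avatar B', 1, 2)]
print(popular_regions)  # [('Region X', 3), ('Region Y', 1)]
```

Because samples are taken at a fixed interval, the sample count per region is a proxy for time spent there rather than for the number of distinct visits.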

Additional Technical Considerations

Sampling Resolution. Expected web access counts should be extrapolated from the desired sampling resolution and sample size. For example, at a 10-second resolution with 50 participants, the web site might be accessed 18,000 times per hour during peak usage (50 participants × 360 samples per hour). The web host should be queried as to the feasibility of this access rate. The best way to reduce web access counts is to lower the sampling resolution.
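The extrapolation can be written down as a small calculation. The Python sketch below is illustrative only: the function names and the 3,000-requests-per-hour budget are hypothetical, it assumes all participants are online simultaneously, and it counts only the periodic samples, not chat events.

```python
import math


def peak_requests_per_hour(participants, interval_seconds):
    """Upper bound on hourly hits from periodic samples if everyone is online at once."""
    return participants * 3600 / interval_seconds


def min_interval_for_budget(participants, max_requests_per_hour):
    """Shortest sampling interval (whole seconds) that stays within a host's hourly budget."""
    return math.ceil(participants * 3600 / max_requests_per_hour)


# The hypothetical study from the overview: 40 participants sampled every 30 seconds.
print(peak_requests_per_hour(40, 30))     # 4800.0
# Doubling the interval halves the load.
print(peak_requests_per_hour(40, 60))     # 2400.0
# Interval needed if the host allowed 3000 requests per hour (hypothetical budget).
print(min_interval_for_budget(40, 3000))  # 48
```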

Database Access Deterioration. Database access and storage speed deteriorate linearly with the current number of rows stored in the table. One simple solution to prevent the database from stalling is to periodically pipe the incoming data to a new table and to merge the tables as necessary when data collection ends. If it is expected that data will be analyzed on a week-by-week basis, then the final table merge may not be necessary.
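One way to organize the periodic switch to a new table is to write each study week to its own table and combine them afterwards. The Python sketch below shows the idea with SQLite standing in for MySQL; the per-week naming scheme, the reduced columns, and the function names are hypothetical and not taken from our scripts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch


def table_for_week(week):
    """Hypothetical naming scheme: one table per study week."""
    return f"samples_week{int(week)}"


def store_sample(conn, week, name, region):
    """Write the sample to the current week's table, creating it on first use."""
    table = table_for_week(week)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (name TEXT, region TEXT)")
    conn.execute(f"INSERT INTO {table} (name, region) VALUES (?, ?)", (name, region))


def merge_weeks(conn, weeks, target="samples_all"):
    """Copy every weekly table into one combined table once collection ends."""
    conn.execute(f"CREATE TABLE {target} (week INTEGER, name TEXT, region TEXT)")
    for week in weeks:
        conn.execute(
            f"INSERT INTO {target} SELECT ?, name, region FROM {table_for_week(week)}",
            (week,),
        )


store_sample(conn, 1, "Avatar A", "Region X")
store_sample(conn, 1, "Avatar B", "Region X")
store_sample(conn, 2, "Avatar A", "Region Y")
merge_weeks(conn, [1, 2])
print(conn.execute("SELECT COUNT(*) FROM samples_all").fetchone()[0])  # 3
```

Keeping the week number as a column in the merged table preserves the ability to analyze the data week by week after the merge.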

LSL Idiosyncrasies. It must also be remembered that SL was created as a public sandbox and many features exist to prevent players from antagonizing each other using scripts. Thus, certain avatar-related variables that one might assume that LSL provides are not available or return a dummy value. For example, while the size vector associated with an avatar has x and y components, these do not represent the avatar's actual dimensions; only the z component is meaningful, and it represents the avatar's height. Researchers should consult LSL's list of methods and attributes as different behavioral measures are considered in a research design to ensure that LSL allows one to collect data on that variable.

Open-Source Alternatives. Researchers may also be interested in exploring the possibility of conducting research in open-source alternatives to Second Life, such as LibSecondLife or OpenSim.

Data Sharing. To enable the sharing and standardization of this type of data among virtual environment researchers, we recommend the guidelines suggested by Friedman et al. (2006).


Friedman, D., Brogni, A., Antley, A., Guger, C., Steed, A., & Slater, M. (2006). Standardizing data analysis in presence experiments. Presence, 15(5), 599-610.

Friedman, D., Steed, A., & Slater, M. (2007). Spatial social behavior in Second Life. In C. Pelachaud (Ed.), Intelligent virtual agents 2007 (pp. 252-263). Springer-Verlag.

Slater, M. (2004). How colorful was your day? Why questionnaires cannot assess presence in virtual environments. Presence-Teleoperators and Virtual Environments, 13, 484-493.

Yee, N., Bailenson, J.N., Urbanek, M., Chang, F., & Merget, D. (2007). The unbearable likeness of being digital: The persistence of nonverbal social norms in online virtual environments. Cyberpsychology and Behavior, 10, 115-121.