SLIDING window analysis Sanjay Manohar
How does it work?
Description
- These tools allow you to relate paired measurements to each other. It works a bit like a correlation.
- Typically, on each trial, you might measure two things, for example, choice and reaction time. But these tools can be used for any paired measurements, X and Y.
- A standard analysis would be a scatter plot of Y against X. Then perhaps linear or nonlinear regression could be performed to find a relationship.
- These tools allow you to relate X to Y without assuming a linear model.
- The analysis divides X into quantile bins, across all trials.
- The bins overlap, so there is a sliding window, with a fixed number of trials in it.
- For each bin, the mean value of Y (or some other statistic) is calculated.
Subjects
- This procedure can be done for each subject or dataset independently.
- The mean values of Y, with the corresponding bin centres of X, are given for each subject, and can be plotted.
- Our way of plotting these is to take the across-subjects mean Y for each bin, and plot this against the mean bin centre X, along with standard errors.
Statistics
- To compare different conditions, where each subject has data from both conditions, we can do a paired permutation test. The resulting per-subject Y values, as a function of X, can thus be compared.
- To do this, we use a t statistic, and test the null hypothesis that “no bins are different, between the two conditions”, by randomly permuting the conditions and calculating the maximum t-statistic over bins.
- This allows us to test whether the dependence of Y on X, normalised by quantile of X, differs between conditions. This makes minimal assumptions about the shape of the relationship.
Example
Let’s say you have 10 subjects each with a data file called “datafile_1” etc., and within each subject’s data there is a variable called “condition” which is the condition of each trial, “RT” with the reaction time for each trial, and “correct” indicating if the response was accurate for each trial. Let’s say each variable is a vector, with one element for each trial. Then we can write:
% first build array of X values and Y values
% corresponding to each trial
for subj = 1:10 % for each subject
% load subject data
dat = load( [ ‘datafile_’ num2str(subj) ] );
for cond = 1:2% for each condition
sel = dat.condition == cond; % select trials
% read RTs for this condition
X(:,i,j) = dat.RT( sel );
% accuracy for each trial in this condition
Y(:,i,j) = dat.correct( sel );
end
end
% calculate conditional accuracy plot
[acc, rt] = conditionalPlot( X,Y );
Documentation from .m file:
% [Xb,Yb, p, t, h] = conditionalPlot(X,Y, nbins, ...)
% X{ SUBJECT, CONDITION } ( TRIAL )
% or X( TRIAL, SUBJECT ) -- for one condition
% or X( TRIAL, SUBJECT, CONDITION ) -- for multiple conditions
%
% Takes sliding quantile bins of X, and calculates the mean Y for
% each bin. Nbins indicates the width of the window (5 means 20 percentiles)
% This is done for each subject and condition separately.
% The bin values of Xb and Yb are plotted as mean and sem across subjects.
% Different conditions are plotted as separate lines.
%
% returns: Xb ( SUBJECT, CONDITION, QUANTILE ) = median of X for each bin
% Yb ( SUBJECT, CONDITION, QUANTILE ) = mean of Y for each bin
% p ( QUANTILE ) = probability that conditions are significantly
% different at a quantile, using permutation test.
% Within-subject comparison between conditions is performed by
% permutation, and multiple conditions are treated as a
% single continuous linear predictor (1,2,3 etc).
% T ( QUANTILE ) = t statistic for linear effect of condition
% H = handle of lines as returned by errorbarplot
%
% Parameters:
% 'meanfunc' = function handle to use instead of "nanmean".
% 'xtransform', 'ytransform' - set transformations for the plot axes. Can
% be a @function, or one of 'identity' (default), 'probit', 'recip',
% 'log', 'asin', 'gauss' (inverse gaussian from 0 to 1), or
% 'gauss2' (inverse gaussian from 0.5 to 1). Note that the output values
% Xb and Yb will not be transformed.
% 'StandardBinning' - if true, then use non-overlapping bins.
%
% dependencies: smoothn, nanmean, nancat, removeIdenticals, permutationOLS
% sanjay g manohar 2013
conditionalPlot.m:
function [Xb,Yb, p,t, h]=conditionalPlot(X,Y, Nbins, varargin)
% [Xb,Yb, p, t, h] = conditionalPlot(X,Y, nbins, ...)
% X{ SUBJECT, CONDITION } ( TRIAL )
% or X( TRIAL, SUBJECT ) -- for one condition
% or X( TRIAL, SUBJECT, CONDITION ) -- for multiple conditions
%
% Takes sliding quantile bins of X, and calculates the mean Y for
% each bin. Nbins indicates the width of the window (5 means 20 percentiles)
% This is done for each subject and condition separately.
% The bin values of Xb and Yb are plotted as mean and sem across subjects.
% Different conditions are plotted as separate lines.
%
% returns: Xb ( SUBJECT, CONDITION, QUANTILE ) = median of X for each bin
% Yb ( SUBJECT, CONDITION, QUANTILE ) = mean of Y for each bin
% p ( QUANTILE ) = probability that conditions are significantly
% different at a quantile, using permutation test.
% Within-subject comparison between conditions is performed by
% permutation, and multiple conditions are treated as a
% single continuous linear predictor (1,2,3 etc).
% T ( QUANTILE ) = t statistic for linear effect of condition
% H = handle of lines as returned by errorbarplot
%
% Parameters:
% 'meanfunc' = function handle to use instead of "nanmean".
% 'xtransform', 'ytransform' - set transformations for the plot axes. Can
% be a @function, or one of 'identity' (default), 'probit', 'recip',
% 'log', 'asin', 'gauss' (inverse gaussian from 0 to 1), or
% 'gauss2' (inverse gaussian from 0.5 to 1). Note that the output values
% Xb and Yb will not be transformed.
% 'StandardBinning' - if true, then use non-overlapping bins.
%
% dependencies: smoothn, nanmean, nancat, removeIdenticals, permutationOLS
% sanjay g manohar 2013
%%%%%%%%%%%%%%%%%%%%%
% Setup Parameters
if(~exist('Nbins','var') || isempty(Nbins)) Nbins=5; end;
if ~iscell(X)
if ndims(X)==2 % treat as RT( TRIAL, SUBJECT )
if size(X,1)==1, X=X'; end
if size(Y,1)==1, Y=Y'; end
X = mat2cell(X, size(X,1), ones(1,size(X,2)) )'; % break into cells
Y = mat2cell(Y, size(Y,1), ones(1,size(Y,2)) )';
elseif ndims(X)==3 % treat as RT ( TRIAL, SUBJECT, CONDITION ): convert to cell { SUBJECT, CONDITION } (TRIAL)
X = sq(mat2cell(X, size(X,1), ones(1,size(X,2)), ones(size(X,3),1) ));
Y = sq(mat2cell(Y, size(Y,1), ones(1,size(Y,2)), ones(size(Y,3),1) ));
end
end
% use nonoverlapping bins?
% advantage is the values are independent, and thus stats are simpler.
% if not, use overlapping bins, which must be compared with permutation test
STANDARD_BINNING = 0;
Nq = 100; % number of quantiles (used when STANDARD_BINNING=0)
binwidth = 1/Nbins;
SMOOTH = 15; % used for area plots
if exist('nanmean','file') MEANFUNC = @nanmean; % do we have 'nanmean' ?
else MEANFUNC = @mean; % if not, use mean.
end
doErrorBars = true; % on the standard binning plot, show error bars for both X and Y?
FILL_AREA = true; % if false, show dotted lines for standard error, rather than shaded area
REMOVE_IDENTICAL_X = true; % if there aren't enough unique X-values, then interpolate?
% transform axes?
T.identity = @(x)x;
T.probit = @(x) log(eps+x./(1-x+eps));
T.recip = @(x) -1./x; % reciprocal (eg for 'reciprobit'
T.log = @(x) log(x+eps);
T.asin = @(x) asin(sqrt(x));
T.gauss = @(x) erfinv( max(eps-1, min(1-eps, (x*2-1))) ); % gaussian error, range [0-1]
T.gauss2 = @(x) erfinv( max(eps-1, min(1-eps, (x-0.5)*4-1)) ); % gaussian in range [0.5-1]
% these two lines apply transforms to the data before plotting.
% the function output values are unchanged by the transform.
YTRANSFORM = T.identity;
XTRANSFORM = T.identity;
i=find(strcmpi(varargin, 'xtransform')); if i, if isstr(varargin{i+1}) XTRANSFORM=T.(varargin{i+1});
else XTRANSFORM=varargin{i+1}; end ; varargin([i i+1])=[];end
i=find(strcmpi(varargin, 'xtransform')); if i, if isstr(varargin{i+1}) YTRANSFORM=T.(varargin{i+1});
else YTRANSFORM=varargin{i+1}; end ; varargin([i i+1])=[];end
i=find(strcmpi(varargin, 'meanfunc')); if i,
MEANFUNC=varargin{i+1}; varargin([i,i+1])=[];
end
i=find(strcmpi(varargin, 'StandardBinning')); if i,
STANDARD_BINNING = varargin{i+1}; varargin([i i+1])=[];
end
i=find(strcmpi(varargin, 'Conditions')); if i,
C = varargin{i+1}; varargin([i i+1])=[];
if ~isempty(C)
if size(X,2)>1, error('cannot specify conditions since RT already contains conditions'); end
if isnumeric(C) & ndims(C)==2, C=mat2cell(C, size(C,1), ones(1,size(C,2)) )'; end
uc=unique(nancat(2,C{:})); uc(isnan(uc))=[];
for i=1:size(X,1) % subject
for j=1:length(uc) % condition
rt{i,j} =X{i}( C{i}==uc(j) );
corr{i,j}=Y{i}(C{i}==uc(j) );
end
end
X=rt; Y=corr;
end
end
if STANDARD_BINNING, N=Nbins, % N is the number of quantiles to sample, i.e.
else N=floor(Nq*(1-binwidth)); % number of points on a CAF curve
end
Xb=nan(size(X,1), size(X,2), N); % create blank output variable (needed for parallel processing)
%%%%%%%%%%%%%%%%%%%%%
% Calculate binning
%%%% change the outer 'for' to 'parfor' for faster execution.
for(i=1:size(X,1)) % for each subject
x_i=nan(size(X,2),N); % keep track of bin centres
for(j=1:size(X,2)) % for each condition
if STANDARD_BINNING % DIVIDE INTO BINS
top=-inf; % lowest bin goes down to -inf
for q=1:Nbins % bin the RTs into Nbins
bottom=top; % start new bin at top of old bin
if(q<Nbins) top=quantile(X{i,j},q/Nbins); % get top edge of bin
else top=inf; % highest bin goes up to +inf
end
f=X{i,j}>bottom & X{i,j}<top; % filter to select this bin
Yb(i,j,q)=MEANFUNC( Y{i,j}(f) ); % mean of Y in this X-bin
x_i(j,q)=quantile(X{i,j},(q-0.5)/Nbins); % centre (median) RT of the bin
% alternative: use mean of top and bottom of bin?
end
else % SLIDING WINDOW BINS
quantiles = quantile(X{i,j}, linspace(0,1,Nq)); % create quantiles
% go through bins, but there aren't 100 bins because of the width!
for q=1:floor(Nq*(1-binwidth)); % e.g. 1:80 for 5 bins
% get start, mid and end of the bin
range=quantiles( q + [0,floor(Nq*binwidth/2), floor(Nq*binwidth)] );
f = X{i,j}>range(1) & X{i,j}<range(3); % filter data within range
% Yb ( SUBJECT, CONDITION, QUANTILE )
Yb(i,j,q)=MEANFUNC( Y{i,j}(f) ); % mean of Corr
% Xb values will be the centre (median) RT of the bin.
% Change this to range([1 3])/2 to get the midway-point between edges
x_i(j,q)=range(2);
end
% 8/7/14 added this to prevent nonunique bin centres
% note that this will fail if X does not have enough unique values to
% make bins.
if REMOVE_IDENTICAL_X
x_i(j,:)=removeIdenticals(flat(sq(x_i(j,:))));
end
end
end
Xb(i,:,:) = x_i; % store bins
end
%%%%%%%%%%%%%%%%%%%%%%
% Now do the plotting
oldhold=ishold(); % keep track of whether "hold on"
t_x = XTRANSFORM( Xb ); % transform the data for plotting
t_y = YTRANSFORM( Yb );
h=[]; % keep graphics / line handles
cols = get(gca,'ColorOrder'); % use axis colours for different lines
if STANDARD_BINNING % for standard binning, we have just a couple of bins
for j=1:size(X,2) % so for each bin,
if doErrorBars % draw error-bars for both X and Y
errorbarxy( YTRANSFORM( sq(mean(t_x(:,j,:),1)) ), ...
sq(mean(t_y(:,j,:),1)), ...
sq(std(t_x(:,j,:),[],1))/sqrt(size(Xb,1)), ...
sq(std(t_y(:,j,:),[],1))/sqrt(size(Yb,1)) ...
,[],[], cols(j,:) , .2*[1 1 1]);
else % or just plot the means
h2=plot(sq(mean(t_x(:,j,:),1)), ...
sq(mean(t_y(:,j,:),1)) ...
, cols(j,:) );
h=[h h2];
end
hold on
end
else % SLIDING BIN = we have a whole curve to plot
% Yb ( SUBJECT, CONDITION, TIME )
for j=1:size(Xb,2) % for each condition
x = sq(nanmean(t_x(:,j,:),1)); line = sq(nanmean(t_y(:,j,:),1));
erro = sq(nanstd(t_y(:,j,:)))/sqrt(size(Yb,1)); % std err at each time
if numel(erro)==1, erro=erro*ones(size(line)); end
if SMOOTH, line=smoothn(line, SMOOTH); erro=smoothn(erro, SMOOTH); end
args=varargin;
% if there is more than one line, add in the color-order
colj = cols(mod(j-1,size(cols,1))+1,:);
if size(Xb,2)>1, args=[args 'Color',colj]; end
h2=plot(x, line, args{:}, 'Marker','.'); h=[h h2];
hold on
if ~FILL_AREA % DOTTED LINES?
h2=plot(x, line+error, args{:}, 'LineStyle',':'); h=[h h2];
h2=plot(x, line-error, args{:}, 'LineStyle',':'); h=[h h2];
else % FILLED AREA?
alpha = 0.5;
h2 = fill([x; flipud(x)]', [line+erro; flipud(line-erro)]', colj, 'FaceAlpha', alpha,'linestyle','none');
h=[h h2];
end
end
%%%%%%% STATISTICS on sliding bin data?
if nargout>2 % DO STATS? treats conditions as a continuous variable
NS = size(Yb,1); NC = size(Yb,2); % run mixed effects permutation test
if nargout<5, ao={[],[],[],[]}; % did you request a figure handle? if so
else ao={[],[],[],[],[]}; end % then request a significance bar from permutationOLS.
DES=[flat(ones(NS,NC)) zscore(flat(repmat(1:NC,[NS,1]))) ];
if var(DES(:,2))>0
[ao{:}]=permutationOLS(reshape(Yb,NS*NC,[]),...
DES, [0 1], ...
repmat([1:NS]', [1,NC]) );
p = ao{2}; t = ao{3};
else p=[]; t=[];
end
end
end
if ~oldhold % restore "graphics hold" state
hold off;
end