Color-biased regions in the ventral visual pathway are food selective

Color-biased regions have been found between face- and place-selective areas in the ventral visual pathway. To investigate the function of the color-biased regions in a pathway responsible for object recognition, we analyzed the natural scenes dataset (NSD), a large 7T fMRI dataset from 8 participants who each viewed up to 30,000 trials of images of colored natural scenes over more than 30 scanning sessions. In a whole-brain analysis, we correlated the average color saturation of the images with voxel responses, revealing color-biased regions that diverge into two streams, beginning in V4 and extending medially and laterally relative to the fusiform face area in both hemispheres. We drew regions of interest (ROIs) for the two streams and found that the images for each ROI that evoked the largest responses had certain characteristics: they contained food, circular objects, warmer hues, and had higher color saturation. Further analyses showed that food images were the strongest predictor of activity in these regions, implying the existence of medial and lateral ventral food streams (VFSs). We found that color also contributed independently to voxel responses, suggesting that the medial and lateral VFSs use both color and form to represent food. Our findings suggest that color contributes to the visual representation of food in the ventral visual pathway.


In brief
What is the role of color-biased regions in the ventral visual pathway? Pennock et al. redefine our understanding of color-biased regions by showing that they respond to both food and color. Their findings suggest that color contributes to the visual representation of food.
The processing of color information begins in the retina with a comparison of the activities of the three classes of cone that are sensitive to short (S), medium (M), and long (L) wavelengths of light. Subsequently, different classes of retinal ganglion cells send luminance and color information to the lateral geniculate nucleus, which projects to V1. 20 In the early visual cortices such as V1, V2, V3, and V4v, responsiveness to hue and saturation as color attributes has been studied using functional magnetic resonance imaging (fMRI). 16,21-28 V1 to V3 respond to color among other features, 29,30 whereas V4 and the ventral occipital region (VO; anterior to V4) are thought to be specialized for processing color. 31 Voxel activity patterns in V4, VO1, and VO2 can strongly distinguish chromatic from achromatic stimuli, 32 and clustering and representational similarity analyses have provided evidence for a representation of color in these areas. 32-34 More cognitive color tasks are also associated with V4, such as mental imagery for color 23 and color memory. 24 As color information progresses through visual cortical regions, its representation likely becomes transformed to aid cognitive tasks such as object perception, 12,14,35,36 and color representations in these regions are known to be modulated by other object features such as shape and animacy. 36 In particular, Rosenthal et al. 36 found that the color tuning properties of neurons in macaque IT correlated with the warm colors typical of salient objects. 37

Most studies of color perception present simple stimuli such as color patches, rather than color as it occurs in natural scenes. However, in daily life, our visual systems encounter colors as part of conjunctions of object features integrated in context within natural scenes.
With simple stimuli, color is dissociated from its regular context and meaning: such stimuli have basic spatial form, may be selected from a restricted color gamut, and are typically presented on a uniform surround. Visual responses to carefully controlled colored stimuli might be quite different from responses to colors in their complex, naturalistic settings. For example, for colored patches, decoding accuracy drops progressively from V1 to V4, 22,23 whereas for colored object categories, decoding accuracy increases through the same areas. 35 To understand how the brain represents color in its usual contexts, and to understand the functions of the color-biased regions in the ventral visual pathway, it is therefore crucial to use complex stimuli containing a variety of object categories, such as natural scenes. 11-13

We aimed to characterize the neural representation of color and its association with the representation of objects and other image properties as they are encountered in natural scenes. The natural scenes dataset (NSD) 38 provides a unique opportunity for this endeavor. It is an unprecedented large-scale fMRI dataset in which each participant viewed thousands of colored (and some grayscale) natural scenes over 30-40 sessions in a 7T scanner. This dataset therefore has an impressively high signal-to-noise ratio and statistical power. 39 However, images of natural scenes are high dimensional, and visual features can correlate strongly with one another, making it challenging to accurately disentangle the contributions of different features. Nonetheless, with its huge number of well-characterized and segmented stimulus images, the NSD is one of the best datasets currently available for uncovering the neural representations underlying perception of natural scenes. 38,40 Our analyses revealed two streams in the ventral visual pathway that exhibit responses to color in the NSD images.
We found that both streams were primarily responsive to food objects, implying that color is a key part of the neural representation of food in these ventral visual areas. Our findings are bolstered by two recent papers also finding strong evidence for food selectivity in these regions of the ventral visual pathway using distinct data-driven approaches with the NSD 41,42 and an additional fMRI study presenting isolated food images. 42

Identifying color-biased regions in the ventral visual pathway
To isolate responses to chromatic compared with achromatic information in the NSD images, we conducted a whole-brain correlation between the average color saturation of each NSD image and the BOLD signal change observed at each voxel (Figure 1A). Since saturation and luminance (Figures 2A and S1A) are correlated in natural scenes, 43 we used the mean luminance of each image as a covariate. The correlations were Bonferroni corrected for each participant based on the number of voxels in participant-native space. We also conducted an analysis to measure split-half reliability, in which voxel-by-voxel correlation coefficients between average saturation and voxel responses were computed separately for odd and even images and then correlated over the whole brain.
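The per-voxel step of this analysis, partialling mean luminance out of the saturation-response correlation, can be sketched as follows. This is a minimal numpy illustration with simulated data; the array names and toy dimensions are ours, not from the NSD code.

```python
import numpy as np

def partial_corr_map(bold, saturation, luminance):
    """Correlate each voxel's responses with image saturation,
    partialling out mean image luminance (a sketch of the
    whole-brain analysis; inputs are hypothetical).

    bold       : (n_images, n_voxels) trial-averaged responses
    saturation : (n_images,) mean saturation per image
    luminance  : (n_images,) mean luminance per image (covariate)
    """
    # Design matrix with intercept + luminance covariate
    X = np.column_stack([np.ones_like(luminance), luminance])
    # Residualize saturation and every voxel against luminance
    beta_s, *_ = np.linalg.lstsq(X, saturation, rcond=None)
    sat_res = saturation - X @ beta_s
    beta_b, *_ = np.linalg.lstsq(X, bold, rcond=None)
    bold_res = bold - X @ beta_b
    # Pearson correlation of the residuals, voxel by voxel
    sat_res = (sat_res - sat_res.mean()) / sat_res.std()
    bold_res = (bold_res - bold_res.mean(0)) / bold_res.std(0)
    return (sat_res @ bold_res) / len(sat_res)

rng = np.random.default_rng(0)
lum = rng.normal(size=1000)
sat = 0.5 * lum + rng.normal(size=1000)   # saturation correlates with luminance
# voxel 0 tracks saturation; voxel 1 is pure noise
voxels = np.outer(sat, [1.0, 0.0]) + rng.normal(size=(1000, 2))
r = partial_corr_map(voxels, sat, lum)
```

Residualizing both variables against the covariate before correlating them is one standard way to obtain the partial correlation; the paper does not specify its exact implementation.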
For all participants, we found areas showing positive correlations between saturation and voxel responses in the ventral visual pathway (Figure 1), with strong correlations beginning in V4 and diverging into two distinct streams, which we divided into medial and lateral regions of interest (ROIs). The medial ROI is located between face and place areas (fLoc-defined areas are shown in Figure 1A and the ROI boundaries in Figure 1B; see the fLoc experiment by Allen et al. 38 ), and is roughly in agreement with the location of the color-biased regions identified by Lafer-Sousa et al. 11 (Figure 1B). Our whole-brain split-half reliability analysis on the correlation between voxel responses and saturation showed high reliability, with r = 0.82 (range = 0.71-0.89 for different participants).
For all eight participants, there were also areas that showed negative correlations between saturation and voxel responses, specifically the PPA and the region located between the lateral and medial ROIs that showed positive correlations (Figure 1A). For seven participants, there was an area of negative correlation lateral to the lateral ROI, roughly corresponding to area MT. For six participants (and one further participant in the left hemisphere only), there was a positive correlation with saturation in prefrontal regions (Figure 1A), reminiscent of other findings on color processing in the prefrontal cortex. 44-46 Several participants also showed significant correlations between saturation and voxel responses in earlier visual areas V1-V3.

Montages of images producing the highest and lowest voxel responses
Our correlation analysis between BOLD and saturation revealed areas responsive to color in the ventral visual pathway for all participants. To better understand stimulus representation in these areas, we created montages of the images that evoked the highest and lowest voxel responses for these areas, split into four ROIs (medial and lateral, left and right hemispheres; Figure 2B for participant 1 and Figure S1B for the other participants).
By inspecting the montages, we identified multiple image properties present in images evoking the highest responses but not in images evoking the lowest responses. These properties were food such as bananas, donuts, and pizzas; circular objects such as plates, clocks, and stop signs; warm colors such as reds and oranges; and luminance entropy (how well or poorly luminance values in one location can predict the values in nearby locations 47,48 ). These image characteristics were consistent across all participants, the medial and lateral ROIs, and both hemispheres, suggesting that the four ROIs all process a similar type of visual information.

No large differences in image statistics between participants
In order to allow a quantitative analysis of voxel responses to the image properties that appeared to distinguish images that evoked the highest and lowest voxel responses in our ROIs, we calculated an image statistic for each image property. We also included mean luminance as an image statistic, as it was used as a covariate in the correlation analysis with saturation. Our image statistics were mean saturation, pixel count for food objects, pixel count for circular objects, mean warmth ratings over the colors of all pixels, luminance entropy, and mean luminance (see STAR Methods for a detailed description). For food and circular objects, we used the pixel count contained within the segmented objects to create continuous variables that could be entered into further analyses along with the other continuous variables. Our assumption was that there is a monotonic relationship between the pixel sizes of these objects and voxel responses, although we did not assume that the relationship has any particular form. There is some evidence to suggest that this is a reasonable assumption, 49 although voxel responses may also depend on other properties of object images.

The six image statistics were significantly intercorrelated (see Figure S2B for correlation matrices of image statistics for each participant and Figures 2A and 2C for montages). Average luminance and luminance entropy were strongly positively correlated (group average r = 0.68), and circular object pixel count and food pixel count were moderately correlated (group average r = 0.42). With one exception, all other pairs of image statistics had low but significant correlations (group average r < 0.30): circular object pixel count and luminance entropy were not significantly correlated for seven of the eight participants.
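The simpler of these image statistics can be illustrated on a toy image as follows. This is a simplified sketch; the paper's exact definitions (e.g., its luminance-entropy measure and the warmth ratings) are more involved and are described in the STAR Methods.

```python
import numpy as np

def image_statistics(rgb, food_mask):
    """Compute a few per-image statistics (simplified sketch).

    rgb       : (H, W, 3) float image with values in [0, 1]
    food_mask : (H, W) boolean mask of pixels inside segmented food objects
    """
    cmax = rgb.max(axis=2)
    cmin = rgb.min(axis=2)
    # HSV-style saturation: chroma relative to value (0 where the pixel is black)
    sat = np.where(cmax > 0, (cmax - cmin) / np.where(cmax > 0, cmax, 1), 0.0)
    luminance = rgb.mean(axis=2)  # crude luminance proxy for illustration only
    return {
        "mean_saturation": float(sat.mean()),
        "food_pixel_count": int(food_mask.sum()),
        "mean_luminance": float(luminance.mean()),
    }

# toy image: left half is saturated red "food", right half is gray background
img = np.zeros((10, 10, 3))
img[:, :5] = [1.0, 0.0, 0.0]
img[:, 5:] = 0.5
mask = np.zeros((10, 10), dtype=bool)
mask[:, :5] = True
stats = image_statistics(img, mask)
```

Treating the food pixel count as a continuous variable, as in the text, simply means carrying `food_pixel_count` forward into the regressions alongside the other statistics.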
The relationships between image statistics were highly consistent between participants, although different participants viewed largely nonoverlapping image sets (0.9993 ≤ r ≤ 0.9999 for pairwise correlations between participants' image statistic correlation matrices). This suggests that there were no substantial differences in image statistics between participants.
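The consistency check, correlating participants' image-statistic correlation matrices with one another, can be sketched like this. Simulated data with a shared latent structure stand in for the real image statistics.

```python
import numpy as np

def corrmat_similarity(stats_a, stats_b):
    """Correlate two participants' image-statistic correlation matrices
    (unique off-diagonal entries only), as in the between-participant
    consistency check. Inputs are (n_images, n_statistics) arrays.
    """
    ca = np.corrcoef(stats_a, rowvar=False)
    cb = np.corrcoef(stats_b, rowvar=False)
    iu = np.triu_indices_from(ca, k=1)  # unique off-diagonal entries
    return np.corrcoef(ca[iu], cb[iu])[0, 1]

rng = np.random.default_rng(1)
mixing = rng.normal(size=(6, 6))         # shared structure -> similar correlations
a = rng.normal(size=(5000, 6)) @ mixing  # "participant A" image statistics
b = rng.normal(size=(5000, 6)) @ mixing  # different images, same statistic structure
similarity = corrmat_similarity(a, b)
```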

Relationship between image statistics and average ROI responses
We investigated the relationship between each image statistic and average voxel responses for the four ROIs (medial and lateral areas in both hemispheres) that we had defined based on correlations between voxel responses and average saturation. We plotted moving-average ROI responses against each of the image statistics (Figure 3A). ROI responses showed positive linear relationships with average saturation and with mean warmth ratings of pixel colors. ROI responses showed positive nonlinear (decelerating) relationships with food pixel count and circular object pixel count, with a higher gain for food pixel count than for any of the other image statistics. There was no relationship between ROI responses and luminance entropy, and a small negative relationship between ROI responses and average luminance. The findings were consistent across hemispheres and ROIs for all eight participants (see Figure S2A for results for individual participants).
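The moving-average curves can be reproduced in outline as follows, on toy data, using a window of 500 images as in the Figure 3 description.

```python
import numpy as np

def running_response_curve(stat, responses, window=500):
    """Sort images by an image statistic and take a running average of
    mean Z-scored responses over `window` consecutive images,
    mirroring the moving-average plots (sketch on simulated data).
    """
    order = np.argsort(stat)
    sorted_resp = responses[order]
    kernel = np.ones(window) / window
    return np.convolve(sorted_resp, kernel, mode="valid")

rng = np.random.default_rng(2)
stat = rng.uniform(size=5000)
resp = 1.5 * stat + rng.normal(size=5000)  # simulated positive linear relationship
curve = running_response_curve(stat, resp)
```

Plotting `curve` against the window's position reproduces the kind of monotonic curve described for saturation and warmth.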

There are no sub-clusters of voxels that prefer color over food within the ROIs
To test whether there are sub-clusters of voxels within the ROIs that respond to different image statistics, we also ran multiple linear regressions on all the individual voxels that showed a significant positive (Bonferroni-corrected) correlation with saturation for each participant. For each voxel, we identified the image statistic with the largest beta coefficient (Figure 4). Food pixel count produced the first-ranked beta coefficient in the single-voxel multiple regressions for almost all voxels, suggesting that food is the strongest predictor for all four ROIs even at an individual voxel level. For the left medial, left lateral, right medial, and right lateral ROIs, respectively, 78%, 92%, 69%, and 92% of voxels included in the multiple regressions had food as the strongest predictor, and only 4%, 0.6%, 7%, and 2% had saturation as the strongest predictor. For the other image statistics, there was no consistent pattern. Voxel activity in early visual areas was most strongly predicted by luminance entropy: for V1 voxels defined by the Human Connectome Project atlas (HCPMMP 1.0 atlas), 50 80% of voxels included in the multiple regressions had luminance entropy as the first-ranked beta coefficient, 7% had food, and 2% had saturation. The results of our single-voxel multiple linear regressions suggest that food is the main predictor for most voxels in the ROIs, and that there are no substantial sub-clusters of voxels responding most strongly to other image statistics.
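The single-voxel regression step, identifying the image statistic with the largest beta for each voxel, might look like this in outline. The predictor names and responses here are hypothetical simulations.

```python
import numpy as np

def strongest_predictor(voxel, predictors, names):
    """Multiple linear regression for one voxel; returns the name of
    the predictor with the largest standardized beta (a sketch of the
    single-voxel analysis; we take absolute betas for robustness).

    voxel      : (n_images,) responses
    predictors : (n_images, n_statistics)
    """
    Z = (predictors - predictors.mean(0)) / predictors.std(0)  # standardize
    X = np.column_stack([np.ones(len(Z)), Z])                  # add intercept
    betas, *_ = np.linalg.lstsq(X, voxel, rcond=None)
    return names[int(np.argmax(np.abs(betas[1:])))]            # skip intercept

rng = np.random.default_rng(3)
names = ["saturation", "food_pixels", "luminance"]
P = rng.normal(size=(2000, 3))
vox = 0.2 * P[:, 0] + 1.0 * P[:, 1] + rng.normal(size=2000)  # food dominates
winner = strongest_predictor(vox, P, names)
```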

Color contributes independently to ROI responses in the absence of food
The multiple linear regressions for the ROIs showed that food pixel count had the highest beta coefficients of the six image statistics. The results of our whole-brain correlation with saturation, as well as previous literature, 11 imply that these areas are responsive to color. We therefore sought to further investigate the contributions of saturation, color warmth, and food to ROI responses by conducting two-way ANOVAs (Figures 3B and 3C), one with factors for food and mean saturation and one with factors for food and mean warmth rating of pixel colors. For these ANOVAs we defined four groups of images: one with food and with high saturation or warmth (depending on the ANOVA), one without food and with high saturation or warmth, one with food and with low saturation or warmth, and one without food and with low saturation or warmth. Importantly, the shapes of the histograms of the image statistics for the food and non-food groups of images were exactly matched (Figure S3; STAR Methods). Mean Z scored voxel responses for each ROI, averaged across the eight participants, are shown for food and saturation in Figure 3B and for food and warmth in Figure 3C. Both figures show a large difference between voxel responses for food versus non-food images in the ROIs, and smaller differences between voxel responses for high versus low saturation and high versus low warmth. The ANOVA with factors for food and saturation revealed a significant main effect of food for all eight participants and all four ROIs (mean F = 309, p < 6 × 10⁻²⁶). All four ROIs also showed a significant main effect of saturation for all eight participants (mean F = 59, 2 × 10⁻³⁰ < p < 2 × 10⁻⁷). There were significant interactions for some participants in some ROIs. For ANOVA results for all participants, see Figure S5 and Table S2.
The ANOVA with factors for food and warmth also revealed a significant main effect of food for all eight participants for all four ROIs (mean F = 371, p < 3 × 10⁻³²), and a significant main effect of warmth for all participants and ROIs, other than for participant 6 for the medial area in the LH (mean F = 28, 2 × 10⁻¹⁸ < p < 0.1). There were significant interactions for some participants in some ROIs. For ANOVA results for all participants, see Figure S5 and Table S3.
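The histogram-matching step that makes these ANOVAs interpretable, equating the distribution of a color statistic across food and non-food groups, could be sketched with a simple greedy per-bin subsampling. This is our illustration only; the paper's exact matching procedure is described in the STAR Methods.

```python
import numpy as np

def histogram_matched_groups(stat, is_food, bins=10):
    """Subsample food and non-food images so the two groups have
    matched histograms of an image statistic (greedy per-bin sketch;
    the bin count is arbitrary).

    Returns index arrays for the matched food and non-food groups.
    """
    edges = np.histogram_bin_edges(stat, bins=bins)
    bin_id = np.clip(np.digitize(stat, edges) - 1, 0, bins - 1)
    food_idx, nonfood_idx = [], []
    for b in range(bins):
        f = np.where(is_food & (bin_id == b))[0]
        n = np.where(~is_food & (bin_id == b))[0]
        k = min(len(f), len(n))  # keep equal counts in every bin
        food_idx.extend(f[:k])
        nonfood_idx.extend(n[:k])
    return np.array(food_idx), np.array(nonfood_idx)

rng = np.random.default_rng(4)
# simulated saturation: food images tend to be more saturated
sat = np.concatenate([rng.beta(5, 2, 800), rng.beta(2, 5, 3000)])
food = np.zeros(3800, dtype=bool)
food[:800] = True
fi, ni = histogram_matched_groups(sat, food)
```

After matching, any remaining difference in ROI responses between the food and non-food groups cannot be attributed to the matched statistic, which is the logic behind the food main effects reported above.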
A leading existing theory about the function of the color-biased regions in the ventral visual pathway is that they represent behaviorally relevant objects, and are biased toward object-associated colors as a feature of such objects. 14,36 We therefore conducted further two-way ANOVAs considering only the colors of pixels within segmented objects rather than pixels over whole images. Two-way ANOVAs with food and mean object pixel saturation as factors revealed strong significant main effects of food for all eight participants and the four ROIs, and smaller significant main effects of saturation for all eight participants and the four ROIs. The interactions were significant only for some participants in some ROIs (for group summary ANOVA results, see Figure S4A; for results for individual participants, see Figure S5 and Table S4).

Figure 3. ROI responses to image statistics
(A) Mean Z scored voxel responses in the medial and lateral ROIs of the left and right hemispheres. Each x axis shows an image statistic: mean image saturation, number of pixels that are contained in food objects, number of pixels that are contained in circular objects, mean warmth ratings of pixel colors, luminance (L+M) entropy, 48 and mean luminance. The y axes show the mean Z scored voxel responses. In each case, the images were sorted from lowest to highest based on the image statistic. Then, a running average of mean Z scored voxel responses for sets of 500 images was plotted (1-500, 2-501, 3-502, etc.), averaged across participants. Error bars are within-participant 95% confidence intervals. Plots for individual participants are shown in Figure S2A.
(B) Effects of food and saturation on mean Z scored voxel responses for all four ROIs (left (LH) and right (RH) hemispheres and medial and lateral ROIs). The orange lines show mean Z scored voxel responses for images that contained food and the green lines for images that did not contain food based on the COCO object categories.
Error bars are within-participant 95% confidence intervals. The montages to the right show randomly selected images from each of the four groups. Plots for individual participants are shown in Figure S5, and ANOVA results for individual participants are shown in Table S2. For the results of equivalent analyses for object pixels only, see Figures S4A and S5 and Table S4.
(C) Effects of food and mean rating of warmth for colors of all pixels on mean Z scored voxel responses. The orange lines show mean Z scored voxel responses for images that contained food and the green lines for images that did not contain food based on the COCO object categories. Error bars are within-participant 95% confidence intervals. The montages to the right show randomly selected images from each of the four groups. Plots for individual participants are shown in Figure S5, and ANOVA results for individual participants are shown in Table S3. For the results of equivalent analyses for object pixels only, see Figures S4B and S5 and Table S5.

The same analysis conducted with food and mean object pixel warmth as factors revealed strong significant main effects of food for all eight participants and the four ROIs, and smaller significant main effects of warmth for most participants in most ROIs. The interactions were significant only for some participants in some ROIs (for group summary ANOVA results, see Figure S4B; for results for individual participants, see Figure S5 and Table S5). Thus, when considering object pixels only, it is still clear that food is the strongest associate of responses in the ROIs. Object saturation and object warmth are more weakly associated with ROI responses, independently of food.
Since the number of pixels contained in circular objects was also a relatively strong predictor of activity in the ROIs (Figure 3A), we conducted a two-way ANOVA with factors for food and the presence or absence of segmented circular objects (circle). There was a significant main effect of food for all eight participants and all four ROIs (mean F = 741, p < 1 × 10⁻⁴⁵), and a significant main effect of circle for only some participants in some ROIs. There were significant interactions for all participants in all ROIs except for participant 8 in the RH medial and LH medial areas. For group results, see Figure S4C, and for results for individual participants, see Figure S5 and Table S6.

Analysis of responses to food identifies similar regions to the color-biased areas
Our results indicate that food images are a strong predictor of responses in the ROIs, but since the ROIs were defined by responses to saturation rather than to food, the results reported so far could have missed voxels that respond to food but not to saturation. We therefore conducted t-tests for each voxel on the differences between responses to images that contain food and responses to images that do not contain food. Each participant (1-8) saw 1,284, 1,284, 1,176, 1,237, 1,303, 1,240, 1,309, and 1,127 images of food, respectively. All other images were considered non-food images based on the Microsoft Common Objects in Context 40 (COCO) object categories. Figure 5 shows results plotted for the whole brain, also including coordinates of peak activation from an fMRI meta-analysis of food images 51 in the right hemisphere. We converted the Bonferroni-corrected threshold for the saturation correlation analysis (Figure 1) to a t-statistic and applied the same threshold to Figure 5 to make a comparison possible.
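The voxel-wise contrast can be sketched as follows on simulated data. Note that the paper converted its threshold from the Bonferroni-corrected saturation analysis rather than computing it directly as here, but turning a corrected alpha into a critical t value is the same idea.

```python
import numpy as np
from scipy import stats

def food_contrast_map(bold, is_food, n_voxels_total, alpha=0.05):
    """Voxel-wise two-sample t-test of food vs. non-food responses,
    with a Bonferroni-corrected two-tailed t threshold (sketch only;
    group sizes and data below are simulated).
    """
    t, _ = stats.ttest_ind(bold[is_food], bold[~is_food], axis=0)
    df = len(is_food) - 2
    t_crit = stats.t.isf(alpha / (2 * n_voxels_total), df)  # two-tailed Bonferroni
    return t, t_crit

rng = np.random.default_rng(5)
food = np.zeros(1200, dtype=bool)
food[:300] = True
bold = rng.normal(size=(1200, 3))
bold[food, 0] += 1.0  # voxel 0 responds more strongly to food images
tvals, t_crit = food_contrast_map(bold, food, n_voxels_total=100000)
```

Voxels whose t-statistic exceeds `t_crit` would survive the corrected threshold, analogous to the thresholded map in Figure 5.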
Our results show that food images are associated with responses in areas similar to the ROIs we identified by their correlated activity with saturation (see the white and black contours superimposed on the RH in Figure 5). The activation likelihood estimation (ALE) meta-analysis by van der Laan et al. 51 identified locations in the fusiform gyrus and posterior fusiform gyrus that are responsive to food, which are located in the medial and lateral ROIs. According to the HCPMMP 1.0 atlas, 50 the medial ROI ends in the perirhinal ectorhinal cortex (PeEc), and the lateral ROI ends in area PH.
There are also responses correlated with the presence of food in early visual areas (V1, V2, V3, and V4), which are unlikely to be driven by food itself but rather by luminance entropy (Figure 4), which is correlated with the presence of food in the NSD stimulus set. There is activation in dorsal areas of the visual cortex (V1, V2, V3, and V4), extending to V3CD, LO1, and V3B. Another cluster of activation is found in IPS1, IP1, IP0, MIP, VIP, and LIPv; the latter cluster was also identified in the ALE meta-analysis. Two more areas of activation are found in PFt, PFop, and part of AIP in both hemispheres, which the ALE meta-analysis identified in the left hemisphere only (inferior parietal gyrus). Another area of activation is found in PoI2, Ig, MI, AAIC, Pir, and FOP2 in both hemispheres, and in part of FOP3 in the left hemisphere, corresponding to the insular cortex in both hemispheres. Some smaller clusters of activation are found in PEF in both hemispheres, in the left hemisphere also spanning parts of IFJp and 6r. Responses to non-food images are significantly higher than responses to food images in areas MT, MST, TPOJ1, TPOJ2, TPOJ3, PGi, PGp, PGs, IP0, STV, PSL, and PF, which cluster together. This is also the case in POS1, POS2, DV, PCV, 5mv, and 23c, and in VMV1, PH1, and ProS.

ROI responses to food and other object categories
To investigate the specificity of voxel responses to food in the ROIs, we calculated average voxel responses for each ROI to each object category in the COCO dataset. We found that the object categories provoking the highest average voxel responses were all foods (Figure S6A). When images containing food were removed from the analysis, there was no clear pattern in which other object categories provoked high voxel responses (Figure S6B).

DISCUSSION
We identified four ROIs in the ventral visual pathway responsive to the average color saturation of images, one located medial and the other lateral to the fusiform face area (FFA) in each hemisphere. The NSD enabled an in-depth analysis of the responsiveness of these color-biased regions because of the large number and variety of images of natural scenes presented in the scanner. When we investigated a selection of image characteristics, we found the color-biased regions to be most strongly activated by food, with smaller responses to the image features of saturation, chromatic warmth, and the presence of circular objects. However, even in the relative absence of these image features, images containing food provoked very robust responses in the ROIs. In addition, we found negative correlations between saturation and voxel responses, mostly in areas that are selective for faces, places, and motion (Figure 1A).

Reliability and consistency of results and biases in the NSD image set
We conducted a split-half reliability analysis over odd and even images for our correlation between saturation and voxel responses, which showed strong reliability over the whole brain.
Montages of the images that evoked the highest responses in our ROIs contained similar image features for all eight participants (Figures 2B and S1), which were absent in montages of images that evoked the lowest responses. The intercorrelations between image statistics were similar for all participants, suggesting that there were no major differences between the unique images shown to each participant to take into account when interpreting the results. Multiple analyses, including plots of the relationships between image statistics and voxel responses (Figure 3), multiple linear regressions for the ROIs (Table 1), and multiple linear regressions for individual voxels (Figure 4), all showed that food was the strongest predictor of voxel responses in the ROIs. We therefore interpret these regions as food selective.
In working with the NSD image set, one of the major challenges researchers face is isolating correlated image features to study their independent contributions to brain activity. Because our ROIs in the ventral visual pathway were known to be color biased, 11,12,14,18,36 we analyzed warmth and saturation in the NSD images. In order to isolate the effects of these image features independently from food, we created groups of food and non-food images in which the histograms of these color image statistics were exactly matched (Figure S3A). Using these image groups, we were able to isolate a main effect of the presence of food in the images on ROI voxel responses (also for an analysis restricted to object pixels: Figure S3B). For isolating the presence of circles in the images, there are some limitations in the segmentation data for the NSD image set, meaning that some object categories that are potentially circular are present in the images but are not segmented (e.g., plates). We assigned all of the segmented object categories into "low circle" and "high circle" groups (to acknowledge the presence of additional unsegmented circular objects in the images), although this distinction is somewhat subjective and may be imperfect. Additionally, some circular objects may not appear circular in the images depending on occlusion or viewing angle. However, using the segmentation data that we had access to, we did not find consistent significant main effects of the presence of segmented circular objects (Table S6). We consider this good evidence that responses in the ROIs cannot be explained by the presence of circular objects in the images. We also note that Khosla et al. 41 found a much smaller response to the image statistic of curvature than to food.
Our findings are unlikely to be specific to the particular stimuli used. The stimuli are not a random sample of images and will have been influenced to some extent by photographer selection biases. However, the 73,000 images encompass a large variety of different objects in many different contexts and constitute a comprehensive selection of scenes that provide the best existing dataset for investigating brain responses to natural scenes.

Function of the medial and lateral ROIs in the ventral visual pathway
We found that the medial and lateral ventral food streams (VFSs) are still biased to color, even in the absence of food. This is in line with the results of Lafer-Sousa et al., 11 who showed no food stimuli in their fMRI experiment but found color-biased anterior, central, and posterior areas medially in the ventral visual pathway. Their findings also hinted at a lateral color-biased area for a few of their participants. Our results for all eight participants show two approximately continuous streams, diverging medially and laterally beginning in V4. We found that the medial VFS extends further anteriorly than Lafer-Sousa et al.'s 11 anterior color-biased region. Rosenthal et al. 36 and Conway 14 have proposed that color-biased regions in the ventral visual pathway are selective for the color statistics of behaviorally relevant objects, and could therefore be involved in object detection and categorization. Our findings support the idea that behaviorally relevant features of objects drive responses in these regions, and make the important distinction that it is food objects, rather than behaviorally relevant objects in general, that drive responses. Our results lead us to interpret the regions as food selective but color biased, implying that color is important in the neural representation of food.

Figure 5. Analysis of responses to food versus non-food images
A flattened cortical map in fsaverage space is plotted showing t-statistics (average t-values across the eight participants) for the differences between mean voxel responses to food images and mean voxel responses to non-food images. The Human Connectome Project atlas (HCPMMP 1.0 50 ) is overlaid for the left hemisphere (black contours), with regions labeled where they contained voxels with significant t-statistics. On the right hemisphere are plotted white discs indicating coordinates identified by van der Laan et al. 51 in a meta-analysis of brain areas responsive to food (see their Table 2), together with contours of the medial ROI in black and the lateral ROI in white (as in Figure 1B).
Our findings are in agreement with those of two other recent studies that have been conducted in parallel. All three studies analyzed the NSD with different aims and different analytical methods, but the results have converged on the identification of food-selective areas in the ventral visual pathway. Khosla et al. 41 took a data-driven approach, conducting a Bayesian non-negative matrix factorization on the activities of voxels in the ventral visual stream. They found that the third component was strongly associated with food in the NSD images and, also in agreement with our findings, to a lesser degree, with image features such as saturation, redness, curvature, and the color statistics of objects. Jain et al. 42 set out to investigate brain responses to food in the NSD images using a custom coding system for the presence of food, and controlling for the distance of food in the image, and found medial and lateral food-responsive regions similar to those we have identified. They conducted a PCA of responses to food images and found that both food objects themselves and their social and physical contexts influenced brain responses in these regions. They also performed a separate experiment using controlled grayscale images that isolated food objects independently of correlated image features such as color, and identified similar food-responsive regions in the ventral visual pathway. Some regions within the food-selective streams identified in these three studies are also evident in the results of a meta-analysis on fMRI studies of food. 51 The ventral visual pathway is known to contain sub-streams for processing faces, places, bodies, and words: these strongly convergent new results from three independent labs suggest the presence of medial and lateral VFSs as well.
Numerous studies have demonstrated a distinction between the processing of animate and inanimate objects in the ventral visual pathway, 14,36,52,53 specifically that areas medial to the FFA respond preferentially to animate objects whereas lateral areas respond preferentially to inanimate objects. 52,53 At first glance, the existence of two VFSs separated by the FFA might appear to challenge this finding. However, the placement of food in the category distinction between animate and inanimate objects is ambiguous. For example, fruit and vegetables are living entities and foods, but pizzas and hot dogs are non-living foods processed from ingredients derived from living entities. 54 If animacy distinguishes the responses of areas medial and lateral to the FFA, we might expect voxel responses to specific categories of food to differ between these areas. However, in our analysis of ROI responses to each COCO object category, we found no clear distinction between medial and lateral areas (Figure S6).
The streams in the ventral visual pathway that we have identified as food-selective respond to all categories of food in the COCO image set (Figure S6A), including fruits and vegetables as well as processed foods that were not available in the evolutionary past. We therefore speculate that the VFSs are tuned by exposure to food during a person's lifetime. This would be analogous to the within-lifetime tuning of the visual word form area, which, owing to the relatively recent development of written language, is unlikely to be innately specified. 55,56 However, as the visual word form area is highly consistent across individuals, it also seems unlikely that it is formed solely through experience. 55 We must also consider the possibility that our results may be influenced by attention or expertise. [57][58][59][60][61] Participants may have been more attentive to images containing food than they were to images containing other objects. Figure S6 shows the responses of the medial and lateral VFSs to images containing objects that could be considered attention-grabbing, such as bears, baseball bats, and stop signs. However, these objects are not among those evoking the greatest responses. Therefore, we consider it unlikely that these streams are driven by attention rather than food. Alternatively, food images may strongly activate the areas if they are general object-processing areas but people have particular expertise for food. There is no evidence in our results that visual expertise determines responses in the ROIs (Figure S6).

Representations of food, objects, color, and other image features in the ventral visual pathway
Our findings show that as well as responding to images of food itself, the VFSs respond to collections of visual features common to food objects and also (to a lesser degree) to these features even in the absence of explicit food objects, i.e., shapes and colors that are normally predictive of the presence of food. In support of the idea that representations of food may emerge in the VFSs from a collection of represented food-predictive features, visual object representation in the ventral visual pathway has been found to reflect the co-occurrence of objects and their contexts 62 and the co-occurrence of feature sets within objects. 63 In addition, face-selective IT neurons can respond to objects that co-occur with faces, 64 and context-based expectations have also been shown to facilitate object recognition. 65 In the color-biased areas in macaques, nonlinear interactions have been found between object shape and hue in determining single-cell activities. 19 Humans use color as a heuristic for evaluating food, 66 and there is strong evidence that trichromatic vision helps animals to detect food [67][68][69] and to judge its properties such as ripeness; 70 hence, it seems plausible that the VFSs should also respond to color as a relevant visual feature. Similarly, the presence of circles in the NSD images is associated with the presence of food; hence, the VFSs should plausibly show responses to circles, as is evident in the significant interactions we observed between food and circular objects (Table S6) and in the results of Khosla et al., 41 who found that their food-related component was also independently associated with the image statistic of curvature.
In the same way that there are collections of visual features including color correlated with the presence of food in natural scenes, there are likely to be different contingencies between image features such as color and other types of objects or scenes. For example, images of rural environments may contain an overrepresentation of green. It is possible that place-selective regions tend to respond to green images containing the spatial features of rural scenes but that they may also respond to green in the absence of such spatial features. Such contingencies may explain the negative correlation we observed in the place-selective areas with image saturation (Figure 1A): images of places may tend to be less saturated than other image categories. It remains to be seen whether place-selective areas have preferences for low saturation in the absence of other correlated image features.

Conclusions
We have found strong evidence that color-biased regions in the ventral visual pathway are food-selective and that there are two distinct medial and lateral VFSs in both hemispheres which diverge from V4 and surround the FFA. The ventral visual pathway is already known to contain sub-streams for processing faces, places, bodies, and words: our results suggest that we should now add food. We found that the VFSs also respond to color and circular objects but to a lesser degree. Our findings show how high-quality fMRI datasets can be used to separate the contributions of multiple visual features to the neural representations of natural scenes and uncover a key feature of the ventral visual pathway: food selectivity.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

METHOD DETAILS
Here we provide an outline of the methods used to prepare the NSD that are relevant for our analyses. Further detailed methods for the NSD can be found in Allen et al. 38

MRI data acquisition
The participants were scanned using a passively shielded 7T Siemens Magnetom scanner at the University of Minnesota. A single-channel-transmit, 32-channel-receive RF head coil was used. Functional data were acquired with a gradient-echo EPI sequence at 1.8 mm isotropic resolution (whole brain; 84 axial slices, slice thickness 1.8 mm, slice gap 0 mm, field of view 216 mm (FE) x 216 mm (PE), phase-encode direction anterior-to-posterior, matrix size 120 x 120, TR 1600 ms, TE 22.0 ms, flip angle 62°, echo spacing 0.66 ms, bandwidth 1736 Hz/pixel, partial Fourier 7/8, in-plane acceleration factor 2, and multiband slice acceleration factor 3).

Stimulus presentation
A BOLDscreen 32 LCD monitor (Cambridge Research Systems, Rochester, UK) was positioned at the head of the scanner bed. The spatial resolution was 1920 pixels x 1080 pixels and the temporal resolution was 120 Hz. The participants viewed the monitor via a mirror mounted on the RF coil. There was a 5 cm distance between the participants' eyes and the mirror, and a 171.5 cm distance from the mirror to the image of the monitor. A PR-655 spectroradiometer (PhotoResearch, Chatsworth, CA) was used to measure the spectral power distributions of the display primaries. The BOLDscreen was calibrated to behave as a linear display device, which allowed us to calculate the transformation from RGB to LMS tristimulus cone activities. A gamma of 2 was applied to the natural scene images to approximate the viewing conditions of standard computer displays.
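The final gamma step can be sketched in a few lines. This is a minimal Python sketch of a procedure that was originally implemented in MATLAB; the function name is ours:

```python
import numpy as np

def apply_display_gamma(rgb, gamma=2.0):
    """Apply a power-law gamma to linear RGB values in [0, 1].

    Because the BOLDscreen was calibrated to be linear, a gamma of 2 was
    applied to the images to approximate their appearance on a standard
    (gamma-encoded) computer display.
    """
    rgb = np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0)
    return rgb ** gamma

print(apply_display_gamma(np.array([0.0, 0.5, 1.0])))  # 0.5 -> 0.25
```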

Experimental task
The participants performed a long-term recognition task in which they pressed a button on each trial to indicate whether the presented scene had been shown before. On every trial a distinct image was shown for 3 s with a semi-transparent red fixation dot.

Images displayed
73,000 distinct images were used, a subsample (the 2017 train/val subsections) of the COCO image dataset, 40 which contains complex natural scenes with everyday objects in their usual contexts. The COCO dataset contains 80 object categories ranging from faces and cars to food and stop signs (for examples, see Figure 2). The images were 425 x 425 pixels x 3 RGB channels and were resized to fill 8.4 by 8.4 degrees on the BOLDscreen 32 display using linear interpolation. Participants had up to 40 scan sessions (range 30-40) and saw up to 10,000 images, each up to 3 times, across these sessions.

Preprocessing
The preprocessing of the functional data included temporal resampling, which corrected for slice-time acquisition differences. Field maps were acquired, and the resampled volumes were undistorted using the field estimates. These volumes were used to estimate rigid-body motion parameters using SPM5 spm_realign. To correct for head motion and spatial distortion, a single cubic interpolation was performed on the temporally resampled volumes. The mean fMRI volume was calculated and corrected for gradient nonlinearities. This volume was then co-registered to the gradient-corrected volume from the first scan session, so the first scan session served as the target space for preparing fMRI data from the different scan sessions. A GLM analysis was applied to the fMRI time-series data to estimate single-trial beta responses. The third beta version (b3, 'betas_fithrf_GLMdenoise_RR'; native surface space) was used in the present study, and no alterations were made to this beta version's preprocessing steps described in Allen et al. 38 In brief, the GLMsingle algorithm 38,72-74 was used to derive nuisance regressors and to choose the optimum ridge-regularization shrinkage fraction for each voxel. The extracted betas for each voxel are estimates of the trial-wise BOLD response amplitudes to each stimulus trial, relative to the BOLD signal observed during the absence of a stimulus (when only the grey screen was shown). Trials showing the same image were averaged to improve signal estimates and reduce the amount of data. All analyses were done in MATLAB 2019a (MathWorks, Natick, USA).

Color image statistics
The RGB images were converted to LMS cone tristimulus values using the 10-degree Stockman, MacLeod, and Johnson cone fundamentals 75 interpolated to 1 nm. Chromaticity coordinates in a version of the MacLeod-Boynton chromaticity diagram 76 based on these cone fundamentals were extracted for each pixel. In this color diagram, the cardinal mechanisms of color vision are represented by the axes L/(L+M) (roughly teal to red) and S/(L+M) (roughly chartreuse to violet), which correspond to the two main retinogeniculate color pathways. 77 Saturation was defined as the distance in MacLeod-Boynton color space between the pixel's chromaticity and that of the NSD grey background; to compute this, the chromaticity coordinates in the MacLeod-Boynton chromaticity diagram were transformed to polar coordinates. 78 The scaling factor applied to the L/(L+M) axis was 0.045. If the luminance (L+M) of a pixel fell below a dark-filtering criterion of 0.0002, its saturation value was set to zero, because at low luminance there is a high level of chromatic noise that is perceptually very dark or black. The saturation values for each pixel were then averaged over the image to give the average saturation of each image. We used the 425 x 425 images for all analyses of image statistics.
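The per-image saturation statistic can be sketched in Python (the original analyses were in MATLAB). The grey-background chromaticity constants below are placeholders, and whether the 0.045 factor multiplies or divides the L/(L+M) axis is our assumption; the dark-filter criterion is the value stated above:

```python
import numpy as np

BG_LM, BG_S = 0.75, 1.00   # assumed grey-background chromaticity (placeholders)
LM_SCALE = 0.045           # scaling of the L/(L+M) axis (direction assumed)
DARK_CRIT = 0.0002         # pixels with L+M below this get saturation 0

def mean_image_saturation(L, M, S):
    """Mean saturation of an image from per-pixel LMS cone excitations."""
    lum = L + M                                   # luminance, L+M
    with np.errstate(divide="ignore", invalid="ignore"):
        lm = L / lum                              # L/(L+M) chromaticity
        s = S / lum                               # S/(L+M) chromaticity
    d_lm = (lm - BG_LM) / LM_SCALE                # scaled offset from background
    d_s = s - BG_S
    sat = np.hypot(d_lm, d_s)                     # radial coordinate (polar form)
    sat[lum < DARK_CRIT] = 0.0                    # dark-pixel filter
    return float(np.nanmean(sat))
```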

Correlation with saturation
For the whole-brain correlation between average saturation and BOLD signal change, with average luminance as a covariate, we used the partialcorr function in MATLAB. Average luminance was quantified as L+M with no dark filter applied. For the split-half analysis we computed separate correlation maps with saturation (and luminance as a covariate) for odd and even averaged trials. The odd- and even-trial whole-brain correlation maps were then correlated with each other to provide split-half reliability coefficients.
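Partial correlation with a single covariate amounts to correlating the residuals after regressing the covariate out of both variables; a NumPy stand-in for MATLAB's partialcorr (function name ours):

```python
import numpy as np

def partial_corr(y, x, covar):
    """Pearson correlation of y with x after regressing out covar from both.

    Residualize x and y on the covariate (plus an intercept) by least
    squares, then correlate the residuals.
    """
    Z = np.column_stack([np.ones_like(covar), covar])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])
```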

Definition of ROIs
We created regions of interest for the medial and lateral areas in both hemispheres. This was done based on the whole-brain map of the number of participants showing significant correlations between voxel responses and average saturation in fsaverage space (Figure 1B). For both hemispheres we drew large ROIs around each stream (medial and lateral) of voxel responses that correlated significantly (following a whole-brain Bonferroni correction) with average saturation in at least one participant, beginning at the boundary of Kastner-defined hV4 (Figure 1B). We applied the four ROIs to each participant but only included voxels in an ROI for a particular participant if their responses showed significant positive correlations with average saturation (again, Bonferroni-corrected over the whole brain).

Creation of montages
We z-scored voxel responses to all images for each voxel and then, for each image, averaged across the voxels in each ROI. Using these responses for each ROI we created montages of the images that evoked the highest and lowest responses. We plotted 64 images in each montage out of the 9,209-10,000 images each participant saw.
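The ranking behind the montages can be sketched as follows (a Python sketch; names ours; `responses` is an images x voxels matrix for one ROI):

```python
import numpy as np

def top_bottom_images(responses, n=64):
    """Return indices of the n highest- and n lowest-response images.

    z-score each voxel across images, average across the ROI's voxels to
    get one response per image, then rank the images.
    """
    z = (responses - responses.mean(axis=0)) / responses.std(axis=0)
    roi_mean = z.mean(axis=1)            # one value per image
    order = np.argsort(roi_mean)         # ascending
    return order[-n:][::-1], order[:n]   # top n (best first), bottom n
```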

Other image statistics
For each NSD image, in addition to average saturation and average luminance, we extracted four further image statistics: food pixel count, circular-object pixel count, mean warmth rating over all pixels, and luminance entropy. For luminance entropy we used the built-in MATLAB function entropy, 48 with each image's L+M pixel values as the input.
For food and circular objects we summed the number of pixels contained within the relevant objects of the 80 segmented object categories in the COCO dataset. To do this we converted the relevant segmentation data to a binary pixel mask for each image. The food categories were banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut and cake. The circular object categories were sport ball, pizza, donut, clock, tennis racket, frisbee, wine glass, stop sign, cup, bicycle, umbrella, bowl, apple, cake, toilet and orange. For images that contained multiple relevant objects, pixels were summed over all relevant objects. There are some additional food and circular objects in the COCO image set that have not been segmented, for example, plates. Unsegmented objects were not included in the pixel counts.
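The food pixel count can be sketched as a sum over binary category masks (a Python sketch; the dictionary-of-masks representation is our assumption about how the COCO segmentations are rasterized):

```python
import numpy as np

# The ten COCO food categories listed above
FOOD_CATEGORIES = {"banana", "apple", "sandwich", "orange", "broccoli",
                   "carrot", "hot dog", "pizza", "donut", "cake"}

def food_pixel_count(masks):
    """Count food pixels in one image.

    `masks` maps COCO category name -> binary pixel mask (2D bool array).
    Pixels are summed over all relevant objects, so a pixel covered by two
    food objects is counted once per object (per-object summing, as in the
    text).
    """
    return int(sum(m.sum() for name, m in masks.items()
                   if name in FOOD_CATEGORIES))
```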
For the warmth image statistic, we used color warmth ratings collected by our group for another project (Maule, Racey, Tang, Richter, Bird & Franklin, unpublished data), in which participants were shown a set of 24 isoluminant and iso-saturated hues and asked to rate how warm (or cool) each appeared using a sliding scale. We used these warmth ratings to interpolate a warmth value for the hue of each pixel whose luminance was higher than the dark-filter criterion described previously. Warm ratings were given positive values and cool ratings negative values. We averaged the warmth values of all pixels to obtain a mean warmth statistic for each image. For intercorrelations between the image statistics for individual participants, see Figure S2B.
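The warmth interpolation can be sketched as circular one-dimensional interpolation over hue angle (a Python sketch; the wrap-around handling and the use of hue angle in degrees are our assumptions):

```python
import numpy as np

def warmth_for_hues(pixel_hues_deg, rated_hues_deg, ratings):
    """Interpolate a warmth value for each pixel hue angle (degrees).

    Warm ratings are positive, cool ratings negative. The hue circle is
    closed by duplicating the first and last rated hues shifted by 360°,
    so interpolation wraps around rather than clamping at the ends.
    """
    order = np.argsort(rated_hues_deg)
    h = np.asarray(rated_hues_deg, dtype=float)[order]
    w = np.asarray(ratings, dtype=float)[order]
    h_ext = np.concatenate([[h[-1] - 360.0], h, [h[0] + 360.0]])
    w_ext = np.concatenate([[w[-1]], w, [w[0]]])
    return np.interp(np.asarray(pixel_hues_deg) % 360.0, h_ext, w_ext)
```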

Relationships between image statistics and voxel responses
To create Figure 3A, we ranked the images for each image statistic and then averaged over the lowest-ranking 500 images (images ranked 1 to 500). We also averaged the z-scored voxel responses to the same 500 images. We repeated this procedure for images ranked 2 to 501 and the corresponding voxel responses, and continued moving one image up until reaching the highest-ranking 500 images. Afterwards, we extrapolated the resulting "moving-average" curves to the highest and lowest image statistic values seen by any of the 8 participants. We then averaged across the eight participants at interpolated points along the image statistic. The interpolation was necessary because each participant saw different images (other than the roughly 10% common images). Plots for individual participants are shown in Figure S2A.
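The moving-average construction can be sketched as follows (a Python sketch; the extrapolation to extreme statistic values and the cross-participant interpolation are omitted):

```python
import numpy as np

def moving_average_curve(stat, resp, window=500):
    """Moving averages of an image statistic and voxel responses in rank order.

    Ranks the images by `stat`, then returns the mean statistic and mean
    response over every contiguous run of `window` images.
    """
    order = np.argsort(stat)
    s = np.asarray(stat, dtype=float)[order]
    r = np.asarray(resp, dtype=float)[order]
    kernel = np.ones(window) / window
    # 'valid' convolution with a uniform kernel = mean over each window
    return (np.convolve(s, kernel, mode="valid"),
            np.convolve(r, kernel, mode="valid"))
```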

Multiple linear regression
We applied a rank inverse normal transform (Blom constant) to all image statistics before conducting the multiple regression. Responses for each individual voxel were z-scored across images and then average voxel responses for each image were calculated for each of the four ROIs.
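The rank inverse normal transform with the Blom constant (c = 3/8) replaces each value with the normal quantile of its offset rank, Phi^{-1}((r - c) / (n - 2c + 1)). A Python sketch (tie handling here is simplified relative to typical implementations, which average tied ranks):

```python
import numpy as np
from statistics import NormalDist

def rank_inverse_normal(x, c=3.0 / 8.0):
    """Rank-based inverse normal transform with the Blom constant c = 3/8."""
    x = np.asarray(x, dtype=float)
    # Ranks 1..n (ties broken arbitrarily in this sketch)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    p = (r - c) / (len(x) - 2.0 * c + 1.0)
    inv = NormalDist().inv_cdf  # standard normal quantile function
    return np.array([inv(pi) for pi in p])
```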
ANOVAs with saturation and food, warmth and food, and circular objects and food
To define image groups for the ANOVA with saturation and food, we categorized images that contained food based on the COCO categories, and all other images were categorized as non-food images. We then split the food images into low and high mean-saturation sets, using filtering criteria chosen to roughly equate group sizes. For each saturation set we then selected non-food images in each saturation bin to exactly match the shape of the histogram of mean saturation for the food images. Unscaled distributions of saturation in the four image groups and distributions scaled to unity are shown in Figure S3A; equivalent distributions for image groups based on the mean saturation of object pixels only are shown in Figure S3B. To define image groups for the ANOVA with food and mean rated warmth we followed the same procedure, again matching the shapes of the histograms of image statistics between the food and non-food image sets. Distributions of mean warmth over whole images are shown in Figure S3A and distributions of mean warmth over object pixels only are shown in Figure S3B. ANOVAs were then conducted on the sets of mean z-scored voxel responses for the images in each group (e.g., high saturation/non-food, high saturation/food, low saturation/non-food, and low saturation/food).
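The histogram-matching selection of non-food images can be sketched as follows (a Python sketch; the bin edges, the random draw, and the assumption that each bin contains enough non-food images are ours):

```python
import numpy as np

def match_histogram(food_stat, nonfood_stat, bins):
    """Select non-food images whose statistic histogram matches the food set.

    For each histogram bin of the food images' statistic, draw as many
    non-food images as there are food images in that bin (without
    replacement; assumes each bin has enough non-food images).
    """
    rng = np.random.default_rng(0)
    food_counts, edges = np.histogram(food_stat, bins=bins)
    bin_idx = np.digitize(nonfood_stat, edges[1:-1])  # bin of each non-food image
    chosen = []
    for b, need in enumerate(food_counts):
        if need == 0:
            continue
        pool = np.flatnonzero(bin_idx == b)
        chosen.extend(rng.choice(pool, size=need, replace=False))
    return np.array(chosen)
```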
For the ANOVA with circular objects and food we defined image groups based on the presence or absence of segmented food objects and the presence or absence of segmented circular objects in the images, according to the criteria defined above. Group mean voxel responses for each image group are shown in Figure S4C, and voxel responses for each image group for individual participants are shown in Figure S5. ANOVA results for individual participants are shown in Table S6.

ROI responses to food and other object categories
We calculated and plotted the average z-scored voxel responses in each ROI to each category of segmented object in the COCO dataset (Figure S6A). We also conducted an equivalent analysis excluding any images that contained a segmented food object (Figure S6B).