Commit

git-svn-id: https://pmtkdata.googlecode.com/svn/trunk@39 49fcfb0e-06c…
…3-11df-adb7-1be7f7e3f636
mattbdunham committed May 29, 2010
1 parent aaf72ce commit 5f5f602
Showing 441 changed files with 252,891 additions and 0 deletions.
389 changes: 389 additions & 0 deletions 04cars/04cars-fixed.csv

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions 04cars/04cars-meta.txt
@@ -0,0 +1,6 @@
% PMTKdescription
% PMTKsource
% PMTKtype
% PMTKfileSize 35.2 KB
% PMTKncases
% PMTKndims
428 changes: 428 additions & 0 deletions 04cars/04cars.dat

Large diffs are not rendered by default.

Binary file added 04cars/04cars.mat
Binary file not shown.
74 changes: 74 additions & 0 deletions 04cars/04cars.txt
@@ -0,0 +1,74 @@
NAME: 2004 New Car and Truck Data
TYPE: Sample
SIZE: 428 observations, 19 variables

DESCRIPTIVE ABSTRACT:
Specifications are given for 428 new vehicles for the 2004 year. The variables recorded include price, measurements
relating to the size of the vehicle, and fuel efficiency.

SOURCE:
_Kiplinger's Personal Finance_, December 2003, vol. 57, no. 12, pp. 104-123, http://www.kiplinger.com (permission to post on
the JSE Web site kindly granted by PARS International Corporation, 102 West 38th Street, New York, NY 10018)

VARIABLE DESCRIPTIONS:

Columns    Variables
  1- 45    Vehicle Name
 47        Sports Car? (1=yes, 0=no)
 49        Sport Utility Vehicle? (1=yes, 0=no)
 51        Wagon? (1=yes, 0=no)
 53        Minivan? (1=yes, 0=no)
 55        Pickup? (1=yes, 0=no)
 57        All-Wheel Drive? (1=yes, 0=no)
 59        Rear-Wheel Drive? (1=yes, 0=no)
 61- 66    Suggested Retail Price, what the manufacturer thinks the
           vehicle is worth, including adequate profit for the
           automaker and the dealer (U.S. Dollars)
 68- 73    Dealer Cost (or "invoice price"), what the dealership pays
           the manufacturer (U.S. Dollars)
 75- 77    Engine Size (liters)
 79- 80    Number of Cylinders (=-1 if rotary engine)
 82- 84    Horsepower
 86- 87    City Miles Per Gallon
 89- 90    Highway Miles Per Gallon
 92- 95    Weight (Pounds)
 97- 99    Wheel Base (inches)
101-103    Length (inches)
105-106    Width (inches)

Values are aligned and delimited with blanks.
Missing values are denoted with *.
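
The column layout above can be turned into a small fixed-width parser. The following is a minimal Python sketch, not part of this commit: the field names in `FIELDS` and the helpers `parse_number` and `parse_line` are hypothetical, and it assumes every data line follows the documented columns exactly, with `*` (or a blank field) marking a missing value.

```python
from typing import Dict, Optional, Union

Number = Optional[Union[int, float]]

# (field name, first column, last column): 1-indexed and inclusive, copied
# from the column table above. The short field names are illustrative only.
FIELDS = [
    ("name", 1, 45),
    ("sports_car", 47, 47),
    ("suv", 49, 49),
    ("wagon", 51, 51),
    ("minivan", 53, 53),
    ("pickup", 55, 55),
    ("all_wheel_drive", 57, 57),
    ("rear_wheel_drive", 59, 59),
    ("retail_price", 61, 66),
    ("dealer_cost", 68, 73),
    ("engine_liters", 75, 77),
    ("cylinders", 79, 80),
    ("horsepower", 82, 84),
    ("city_mpg", 86, 87),
    ("highway_mpg", 89, 90),
    ("weight_lb", 92, 95),
    ("wheel_base_in", 97, 99),
    ("length_in", 101, 103),
    ("width_in", 105, 106),
]


def parse_number(text: str) -> Number:
    """Convert one field to int or float; '*' or an empty field means missing."""
    text = text.strip()
    if text in ("", "*"):
        return None
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_line(line: str) -> Dict[str, object]:
    """Slice one fixed-width line into a dict keyed by the names in FIELDS."""
    record: Dict[str, object] = {}
    for field, first, last in FIELDS:
        raw = line[first - 1:last]  # convert 1-indexed inclusive to a slice
        record[field] = raw.strip() if field == "name" else parse_number(raw)
    return record
```

Missing values come back as `None`, so a caller can either drop incomplete records or keep them; lines whose trailing blanks were trimmed simply yield `None` for the cut-off fields.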

RELATED DATASETS:

A similar dataset appeared as one of the sample datasets shipped with the _Student Edition of Execustat_ (PWS-KENT 1990).

Another similar dataset, submitted by Robin Lock, may be found in the _Journal of Statistics Education_ dataset archive at
http://www.amstat.org/publications/jse/datasets/93cars.dat. The accompanying article "1993 New Car Data", Journal of
Statistics Education, vol. 1, no. 1, by Lock (1993) may be found at the site
http://www.amstat.org/publications/jse/v1n1/datasets.lock.html.

COMMENTS ON MISSING VALUES:
Length and Width of pickup trucks were not given in _Kiplinger's_. Possible updates to the dataset (viewable on a vehicle
by vehicle basis) may be found at the site http://www.kiplinger.com/tools/carfinder/; these updates may allow some of the
missing values to be filled in.

PEDAGOGICAL NOTES:
This is a multi-purpose dataset that can be used at many points in an introductory statistics course. It includes several
numeric variables which can be examined across a choice of several categorical variables. Students tend to be familiar with
most of the variables. Furthermore, they also tend to be able to anticipate and pose explanations for many of the
relationships between variables. Suggested Retail Price/Dealer Cost and the two MPG variables are popular choices as
response variables. One suggested exploration - try fitting the MPG variables as linear in reciprocal weight (clear
outliers to this strongly linear pattern are the hybrid gas/electric cars). Other explorations are suggested in the
article of Lock (1993) referenced above.
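
The reciprocal-weight exploration suggested above amounts to an ordinary least-squares fit of MPG against 1/weight. The following is a minimal Python sketch under stated assumptions: the helper names `fit_line` and `fit_mpg_vs_reciprocal_weight` are hypothetical, missing values are assumed to be represented as `None`, and nothing here reports results from the actual dataset.

```python
from typing import List, Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_mpg_vs_reciprocal_weight(
    weights: Sequence[Optional[float]], mpg: Sequence[Optional[float]]
) -> Tuple[float, float]:
    """Fit mpg = intercept + slope * (1 / weight), skipping missing pairs."""
    x: List[float] = []
    y: List[float] = []
    for w, m in zip(weights, mpg):
        if w is None or m is None:
            continue  # missing value in either variable: skip the vehicle
        x.append(1.0 / w)
        y.append(m)
    return fit_line(x, y)
```

Vehicles with large residuals from the fitted line are the ones to inspect as possible outliers.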

SUBMITTED BY:
Roger W. Johnson
Department of Mathematics & Computer Science
South Dakota School of Mines and Technology
501 East St. Joseph Street
Rapid City, South Dakota 57701
(605) 355-3450
[email protected]


Binary file added 04cars/04cars.zip
Binary file not shown.
32 changes: 32 additions & 0 deletions 04cars/04carsReadData.m
@@ -0,0 +1,32 @@
% Parse the file 04cars.dat from
% http://www.amstat.org/publications/jse/datasets/04cars.txt

if 0
    fid = fopen('04cars.dat');
    %dat = textscan(fid, '%45c%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d');
    while 1
        tline = fgetl(fid);
        if ~ischar(tline), break, end
        disp(tline)
    end
    fclose(fid);
end

%Cosma Shalizi has already parsed the data and removed rows with missing values
% http://www.stat.cmu.edu/~cshalizi/350/lectures/10/cars-fixed04.dat

data = importdata('04cars-fixed.csv');
X = data.data; % 387 rows, 18 features, 1-7 are binary, rest are integer
names = data.textdata(2:end);
types = [repmat('b', 1, 7) repmat('c', 1, 11)];

% Split the comma-separated header line into one label per variable
header = data.textdata{1};
ndx = strfind(header, ',');      % positions of the commas
ndx(end+1) = length(header)+1;   % sentinel so the last label is included
start = 1;
for i=1:length(ndx)
    stop = ndx(i)-1;
    varlabels{i} = header(start:stop);
    start = ndx(i) + 1;
end

save('04cars.mat', 'X', 'names', 'varlabels', 'types')
6 changes: 6 additions & 0 deletions 14cancer/14cancer-meta.txt
@@ -0,0 +1,6 @@
% PMTKdescription
% PMTKsource
% PMTKtype
% PMTKfileSize 12.8 MB
% PMTKncases
% PMTKndims
33 changes: 33 additions & 0 deletions 14cancer/14cancer.info
@@ -0,0 +1,33 @@
14-cancer gene expression data. 16,063 genes, 144 training samples,
54 test samples.

One gene per row, one sample per column

Cancer classes are labelled as follows:

1. breast
2. prostate
3. lung
4. colorectal
5. lymphoma
6. bladder
7. melanoma
8. uterus
9. leukemia
10. renal
11. pancreas
12. ovary
13. meso
14. cns

Reference:

S. Ramaswamy and P. Tamayo and R. Rifkin and S. Mukherjee and C.H. Yeang and
M. Angelo and C. Ladd and M. Reich and E. Latulippe and J.P. Mesirov and
T. Poggio and W. Gerald and M. Loda and E.S. Lander and T.R. Golub (2001)

Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures

Proc. Natl. Acad. Sci., 98, p15149-15154.


Binary file added 14cancer/14cancer.mat
Binary file not shown.