This post records statistics on transcript sequences.

  • 2006.12.20: Noticed that dbEST had been updated.
    • dbEST release 121506 - December 15, 2006 Number of public entries: 40,227,012
  • UniGene Mouse, Rat, Dog, and Rice were updated.
  • dbEST was updated to release 111706 - November 17, 2006.
  • The Soybean UniGene build went from #25 to #26. Nov. 3, 2006
  • The Human UniGene build went from #195 to #196. Oct. 18, 2006

I suspect that accumulating the NCBI dbEST and UniGene summaries and graphing them over time will reveal something, so I have decided to collect them on this blog.
Two kinds of information are collected: the dbEST summary and the UniGene build statistics.

I have only just started collecting, so I am not yet at the point of drawing graphs. This is a long-term project, but by this time next year I may be able to draw them.

dbEST Summary

dbEST release 121506 - December 15, 2006
Homo sapiens (human)                                7,895,603
Mus musculus + domesticus (mouse)                   4,740,859
Bos taurus (cattle)                                 1,236,788
Oryza sativa (rice)                                 1,211,078
Zea mays (maize)                                    1,160,495
Danio rerio (zebrafish)                             1,152,629
Xenopus tropicalis                                  1,090,063
Rattus norvegicus + sp. (rat)                         871,144
Triticum aestivum (wheat)                             855,067
Xenopus laevis (African clawed frog)                  737,698
Arabidopsis thaliana (thale cress)                    686,778
Ciona intestinalis                                    686,396
Sus scrofa (pig)                                      641,857
Gallus gallus (chicken)                               599,175
Drosophila melanogaster (fruit fly)                   514,613
Hordeum vulgare + subsp. vulgare (barley)             437,713
Salmo salar (Atlantic salmon)                         430,223
Canis familiaris (dog)                                365,909
Glycine max (soybean)                                 359,402
Caenorhabditis elegans (nematode)                     346,064
Pinus taeda (loblolly pine)                           329,469
Vitis vinifera (wine grape)                           316,756
Oryzias latipes (Japanese medaka)                     309,868
Aedes aegypti (yellow fever mosquito)                 298,060
Branchiostoma floridae (Florida lancelet)             277,538
Gasterosteus aculeatus (three spined stickleback)     276,992
Oncorhynchus mykiss (rainbow trout)                   260,886
Malus x domestica (apple tree)                        254,891
Pimephales promelas                                   249,941
Solanum lycopersicum (tomato)                         249,392
Saccharum officinarum (sugarcane)                     246,301
Solanum tuberosum (potato)                            226,798
Medicago truncatula (barrel medic)                    225,129
Sorghum bicolor (sorghum)                             204,208
Ovis aries (sheep)                                    186,664
Bombyx mori (domestic silkworm)                       184,200
Gossypium hirsutum (upland cotton)                    177,048
Physcomitrella patens subsp. patens                   174,908
Hydra magnipapillata                                  174,162
Chlamydomonas reinhardtii                             167,641
Schistosoma mansoni (blood fluke)                     158,841
Dictyostelium discoideum                              155,032
Anopheles gambiae (African malaria mosquito)          153,165
Lotus japonicus                                       150,631
Trichosurus vulpecula                                 147,199
Strongylocentrotus purpuratus (purple urchin)         141,833
Picea glauca                                          132,624
Toxoplasma gondii                                     129,421
Molgula tectiformis                                   106,863
Macaca fascicularis                                   101,192
dbEST release 111706 - November 17, 2006
Homo sapiens (human)                                7,895,572
Mus musculus + domesticus (mouse)                   4,722,069
Oryza sativa (rice)                                 1,211,064
Zea mays (maize)                                    1,160,485
Danio rerio (zebrafish)                             1,152,269
Bos taurus (cattle)                                 1,141,099
Xenopus tropicalis                                  1,039,143
Rattus norvegicus + sp. (rat)                         871,144
Triticum aestivum (wheat)                             855,067
Arabidopsis thaliana (thale cress)                    734,275
Ciona intestinalis                                    686,396
Sus scrofa (pig)                                      640,034
Gallus gallus (chicken)                               599,171
Xenopus laevis (African clawed frog)                  542,288
Drosophila melanogaster (fruit fly)                   514,613
Hordeum vulgare + subsp. vulgare (barley)             437,321
Salmo salar (Atlantic salmon)                         428,803
Canis familiaris (dog)                                365,909
Glycine max (soybean)                                 359,402
Caenorhabditis elegans (nematode)                     346,064
Pinus taeda (loblolly pine)                           329,469
Vitis vinifera (wine grape)                           316,756
Oryzias latipes (Japanese medaka)                     309,868
Aedes aegypti (yellow fever mosquito)                 298,060
Branchiostoma floridae (Florida lancelet)             277,538
Gasterosteus aculeatus (three spined stickleback)     276,992
Oncorhynchus mykiss (rainbow trout)                   260,886
Malus x domestica (apple tree)                        254,422
Pimephales promelas                                   249,941
Solanum lycopersicum (tomato)                         249,392
Saccharum officinarum (sugarcane)                     246,301
Solanum tuberosum (potato)                            226,798
Medicago truncatula (barrel medic)                    225,129
Sorghum bicolor (sorghum)                             204,208
Ovis aries (sheep)                                    186,664
Bombyx mori (domestic silkworm)                       184,200
Gossypium hirsutum (upland cotton)                    177,047
Physcomitrella patens subsp. patens                   174,908
Hydra magnipapillata                                  174,162
Chlamydomonas reinhardtii                             167,641
Schistosoma mansoni (blood fluke)                     158,841
Dictyostelium discoideum                              155,032
Anopheles gambiae (African malaria mosquito)          153,165
Lotus japonicus                                       150,631
Trichosurus vulpecula                                 147,199
Strongylocentrotus purpuratus (purple urchin)         141,833
Picea glauca                                          132,624
Toxoplasma gondii                                     129,421
Molgula tectiformis                                   106,863
Macaca fascicularis                                   101,192
dbEST release 100606 - October 6, 2006
Number of public entries: 38,953,178

Homo sapiens (human)                                7,893,983
Mus musculus + domesticus (mouse)                   4,720,064
Oryza sativa (rice)                                 1,188,565
Zea mays (maize)                                    1,143,728
Bos taurus (cattle)                                 1,137,353
Danio rerio (zebrafish)                             1,134,553
Xenopus tropicalis                                  1,044,182
Rattus norvegicus + sp. (rat)                         871,144
Triticum aestivum (wheat)                             855,066
Ciona intestinalis                                    686,396
Sus scrofa (pig)                                      623,929
Arabidopsis thaliana (thale cress)                    622,973
Gallus gallus (chicken)                               599,141
Xenopus laevis (African clawed frog)                  537,424
Drosophila melanogaster (fruit fly)                   514,545
Hordeum vulgare + subsp. vulgare (barley)             437,321
Canis familiaris (dog)                                365,909
Glycine max (soybean)                                 359,151
Caenorhabditis elegans (nematode)                     346,064
Pinus taeda (loblolly pine)                           329,469
Vitis vinifera (wine grape)                           316,756
Oryzias latipes (Japanese medaka)                     309,868
Aedes aegypti (yellow fever mosquito)                 298,060
Branchiostoma floridae (Florida lancelet)             277,538
Gasterosteus aculeatus (three spined stickleback)     273,259
Oncorhynchus mykiss (rainbow trout)                   260,886
Malus x domestica (apple tree)                        254,169
Pimephales promelas                                   249,941
Saccharum officinarum (sugarcane)                     246,301
Salmo salar (Atlantic salmon)                         237,274
Solanum tuberosum (potato)                            226,798
Medicago truncatula (barrel medic)                    225,129
Sorghum bicolor (sorghum)                             204,208
Lycopersicon esculentum (tomato)                      199,873
Ovis aries (sheep)                                    186,664
Bombyx mori (domestic silkworm)                       184,200
Gossypium hirsutum (upland cotton)                    177,037
Physcomitrella patens subsp. patens                   174,908
Hydra magnipapillata                                  174,162
Chlamydomonas reinhardtii                             167,641
Schistosoma mansoni (blood fluke)                     158,841
Dictyostelium discoideum                              155,032
Anopheles gambiae (African malaria mosquito)          153,165
Lotus japonicus                                       150,631
Strongylocentrotus purpuratus (purple urchin)         141,833
Picea glauca                                          132,624
Toxoplasma gondii                                     129,421
Trichosurus vulpecula                                 111,634
Molgula tectiformis                                   106,863
Macaca fascicularis                                   101,442
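
The release-to-release change for each organism can be read off the listings above once the pasted rows are parsed. The following is only a minimal sketch: the function names `parse_summary` and `changes` are hypothetical, and it assumes each row is an organism name, then at least two spaces, then a comma-grouped count. The sample rows are copied from the 111706 and 121506 listings.

```python
import re

# Matches rows such as "Homo sapiens (human)      7,895,603":
# organism text, at least two spaces, then a comma-grouped count at end of line.
LINE_RE = re.compile(r"^(?P<name>.+?)\s{2,}(?P<count>\d{1,3}(?:,\d{3})*)\s*$")


def parse_summary(text):
    """Return {organism: EST count} for one pasted release block."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group("name").strip()] = int(m.group("count").replace(",", ""))
    return counts


def changes(old, new):
    """Return {organism: new - old} for organisms present in both releases."""
    return {name: new[name] - old[name] for name in new if name in old}


# A few rows copied from the two most recent listings above.
release_111706 = """\
Homo sapiens (human)                                7,895,572
Bos taurus (cattle)                                 1,141,099
Xenopus laevis (African clawed frog)                  542,288
"""
release_121506 = """\
Homo sapiens (human)                                7,895,603
Bos taurus (cattle)                                 1,236,788
Xenopus laevis (African clawed frog)                  737,698
"""

delta = changes(parse_summary(release_111706), parse_summary(release_121506))
# Print the largest increases first.
for name, diff in sorted(delta.items(), key=lambda kv: -kv[1]):
    print(f"{name:40s} {diff:+,d}")
```

Header rows such as "dbEST release 121506 - December 15, 2006" do not match the pattern and are skipped, so a whole release block can be pasted in unchanged.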

UniGene Homo sapiens:Human

UniGene Build #196
Sequences Included in UniGene
Known genes are from GenBank 30 Aug 2006
ESTs are from dbEST through 30 Aug 2006
 
163,705   mRNAs  
4,881   Models  
48,742   HTC  
1,733,348   EST, 3'reads  
3,986,551   EST, 5'reads  
1,051,626   EST, other/unknown  

6,988,853   total sequences in clusters  

Build Method: Genome Based  

Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
86,804   sets total  
26,053   sets contain at least one mRNA  
12,687   sets contain at least one HTC sequence  
80,829   sets contain at least one EST  
23,349   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Hs build 196
 
32769-65536   1  
16385-32768   6  
8193-16384   22  
4097-8192   62  
2049-4096   233  
1025-2048   739  
513-1024   2141  
257-512   4326  
129-256   4268  
65-128   3376  
33-64   3150  
17-32   3436  
9-16   4061  
5-8   5367  
3-4   6217  
2   6068  
1   40423  
UniGene Build #195
Sequences Included in UniGene
Known genes are from GenBank 25 Jul 2006
ESTs are from dbEST through 25 Jul 2006
 
161,677   mRNAs  
6,454   Models  
48,622   HTC  
1,732,950   EST, 3'reads  
3,985,237   EST, 5'reads  
1,044,933   EST, other/unknown  
 
6,979,873   total sequences in clusters  

Build Method: Genome Based  
 
Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
86,804   sets total  
26,187   sets contain at least one mRNA  
12,683   sets contain at least one HTC sequence  
83,579   sets contain at least one EST  
23,357   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Hs build 195
16385-32768   8  
8193-16384   22  
4097-8192   59  
2049-4096   240  
1025-2048   740  
513-1024   2146  
257-512   4289  
129-256   4239  
65-128   3271  
33-64   3089  
17-32   3274  
9-16   4043  
5-8   5434  
3-4   6547  
2   6579  
1   42824  

UniGene Mus musculus:Mouse

UniGene Build #159
Sequences Included in UniGene
Known genes are from GenBank 30 Oct 2006
ESTs are from dbEST through 30 Oct 2006

84,068   mRNAs  
6,172   Models  
128,190   HTC  
1,545,208   EST, 3'reads  
2,223,743   EST, 5'reads  
292,822   EST, other/unknown  
4,280,203   total sequences in clusters  

Build Method: Genome Based  
Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
64,618   sets total  
22,203   sets contain at least one mRNA  
23,765   sets contain at least one HTC sequence  
61,396   sets contain at least one EST  
19,498   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Mm build 159
8193-16384   3  
4097-8192   13  
2049-4096   63  
1025-2048   292  
513-1024   1186  
257-512   3451  
129-256   4677  
65-128   3738  
33-64   3298  
17-32   3020  
9-16   3464  
5-8   4190  
3-4   5631  
2   4827  
1   26765  
UniGene Build #158
Sequences Included in UniGene
Known genes are from GenBank 04 Sep 2006
ESTs are from dbEST through 04 Sep 2006
 
83,045   mRNAs  
6,232   Models  
128,755   HTC  
1,543,211   EST, 3'reads  
2,223,941   EST, 5'reads  
292,786   EST, other/unknown  

4,277,970   total sequences in clusters  

Build Method: Genome Based  
Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
66,184   sets total  
22,171   sets contain at least one mRNA  
24,120   sets contain at least one HTC sequence  
61,418   sets contain at least one EST  
19,490   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Mm build 158
8193-16384   3  
4097-8192   12  
2049-4096   62  
1025-2048   280  
513-1024   1152  
257-512   3434  
129-256   4779  
65-128   3920  
33-64   3556  
17-32   3444  
9-16   3784  
5-8   4167  
3-4   5218  
2   4083  
1   26738  

UniGene Rattus norvegicus:Rat

UniGene Build #157
Sequences Included in UniGene
Known genes are from GenBank 30 Oct 2006
ESTs are from dbEST through 30 Oct 2006

31,717   mRNAs  
9,383   Models  
643   HTC  
333,209   EST, 3'reads  
335,508   EST, 5'reads  
60,722   EST, other/unknown  
771,182   total sequences in clusters  

Build Method: Genome Based  
Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
83,779   sets total  
14,027   sets contain at least one mRNA  
601   sets contain at least one HTC sequence  
48,217   sets contain at least one EST  
10,102   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Rn build 157
2049-4096   5  
1025-2048   11  
513-1024   29  
257-512   103  
129-256   479  
65-128   1962  
33-64   4371  
17-32   4661  
9-16   4318  
5-8   4107  
3-4   4462  
2   5345  
1   22351  
UniGene Build #156
Sequences Included in UniGene
Known genes are from GenBank 03 Sep 2006
ESTs are from dbEST through 03 Sep 2006

31,483   mRNAs  
9,532   Models  
644   HTC  
333,155   EST, 3'reads  
335,452   EST, 5'reads  
60,644   EST, other/unknown  

770,910   total sequences in clusters  

Build Method: Genome Based  

Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
52,183   sets total  
13,877   sets contain at least one mRNA  
601   sets contain at least one HTC sequence  
48,205   sets contain at least one EST  
10,006   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Rn build 156
2049-4096   5  
1025-2048   10  
513-1024   29  
257-512   103  
129-256   480  
65-128   1953  
33-64   4393  
17-32   4670  
9-16   4344  
5-8   4143  
3-4   4470  
2   5231  
1   22352  
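
Since every cluster falls into exactly one size bin, one would expect the histogram counts to add up to the "sets total" figure. The sketch below checks this for the Rn build 156 numbers above. The helper name `parse_histogram` is hypothetical, and the expectation that the two figures agree for every build is an assumption, not something the UniGene pages state.

```python
def parse_histogram(text):
    """Return (size_range, cluster_count) pairs from pasted histogram rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            rows.append((parts[0], int(parts[1].replace(",", ""))))
    return rows


# Histogram rows copied from the Rn build 156 listing above.
rn_build_156 = """\
2049-4096   5
1025-2048   10
513-1024   29
257-512   103
129-256   480
65-128   1953
33-64   4393
17-32   4670
9-16   4344
5-8   4143
3-4   4470
2   5231
1   22352
"""
SETS_TOTAL = 52183  # "sets total" reported for Rn build 156

rows = parse_histogram(rn_build_156)
histogram_total = sum(count for _, count in rows)
print(histogram_total, histogram_total == SETS_TOTAL)
```

Running the same check on each pasted build is a cheap way to catch copy-and-paste slips before the numbers are graphed.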

UniGene Gallus gallus:Chicken

UniGene Build #31
Sequences Included in UniGene
Known genes are from GenBank 02 Aug 2006
ESTs are from dbEST through 02 Aug 2006
 
30,376   mRNAs  
0   Models  
0   HTC  
22,361   EST, 3'reads  
408,319   EST, 5'reads  
78,287   EST, other/unknown  
 
539,343   total sequences in clusters  

Build Method: Transcript Based  
 
Alignments between all transcript sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
30,837   sets total  
17,010   sets contain at least one mRNA  
0   sets contain at least one HTC sequence  
30,241   sets contain at least one EST  
16,414   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Gga build 31
1025-2048   4  
513-1024   16  
257-512   66  
129-256   223  
65-128   1231  
33-64   3272  
17-32   3977  
9-16   4425  
5-8   4996  
3-4   7541  
2   2704  
1   2382  
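
The per-type sequence counts can be checked against "total sequences in clusters" in the same spirit, and they also give the share of each sequence type. This is a minimal sketch using the Gga build 31 figures above; the variable names are illustrative only.

```python
# Sequence counts by type, copied from the Gga build 31 listing above.
gga_build_31 = {
    "mRNAs": 30376,
    "Models": 0,
    "HTC": 0,
    "EST, 3'reads": 22361,
    "EST, 5'reads": 408319,
    "EST, other/unknown": 78287,
}
REPORTED_TOTAL = 539343  # "total sequences in clusters" for Gga build 31

computed_total = sum(gga_build_31.values())
# Fraction of all clustered sequences contributed by each type.
shares = {kind: n / computed_total for kind, n in gga_build_31.items()}

print(computed_total == REPORTED_TOTAL)
for kind, share in shares.items():
    print(f"{kind:20s} {share:6.1%}")
```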

UniGene Canis familiaris:Dog

UniGene Build #17
Sequences Included in UniGene
Known genes are from GenBank 30 Oct 2006
ESTs are from dbEST through 30 Oct 2006
 
2,331   mRNAs  
0   Models  
0   HTC  
125,578   EST, 3'reads  
22,469   EST, 5'reads  
139,759   EST, other/unknown  
290,137   total sequences in clusters  

Build Method: Genome Based  
Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
22,349   sets total  
1,236   sets contain at least one mRNA  
0   sets contain at least one HTC sequence  
21,861   sets contain at least one EST  
748   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Cfa build 17
4097-8192   4  
2049-4096   10  
1025-2048   13  
513-1024   18  
257-512   41  
129-256   122  
65-128   305  
33-64   752  
17-32   1469  
9-16   2522  
5-8   3939  
3-4   5453  
2   2881  
1   4820  
UniGene Build #16
Sequences Included in UniGene
Known genes are from GenBank 16 Jul 2006
ESTs are from dbEST through 16 Jul 2006

2,170   mRNAs  
27,336   Models  
0   HTC  
120,542   EST, 3'reads  
21,631   EST, 5'reads  
142,756   EST, other/unknown  

314,435   total sequences in clusters  

Build Method: Genome Based  

Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
23,611   sets total  
1,105   sets contain at least one mRNA  
0   sets contain at least one HTC sequence  
23,167   sets contain at least one EST  
829   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Cfa build 16
2049-4096   3  
1025-2048   10  
513-1024   22  
257-512   52  
129-256   184  
65-128   453  
33-64   1100  
17-32   2282  
9-16   3604  
5-8   3806  
3-4   3183  
2   2449  
1   6463  

UniGene Oryza sativa:Rice

UniGene Build #63
Sequences Included in UniGene
Known genes are from GenBank 29 Oct 2006
ESTs are from dbEST through 29 Oct 2006

72,256   mRNAs  
0   Models  
60   HTC  
544,384   EST, 3'reads  
341,075   EST, 5'reads  
175,939   EST, other/unknown  
1,133,714   total sequences in clusters  

Build Method: Genome Based  
Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
35,244   sets total  
28,140   sets contain at least one mRNA  
50   sets contain at least one HTC sequence  
32,079   sets contain at least one EST  
24,975   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Os build 63
 
2049-4096   18  
1025-2048   41  
513-1024   98  
257-512   313  
129-256   996  
65-128   2649  
33-64   4577  
17-32   5422  
9-16   4739  
5-8   3835  
3-4   2920  
2   1458  
1   8178  
UniGene Build #62
Sequences Included in UniGene
Known genes are from GenBank 15 Jul 2006
ESTs are from dbEST through 15 Jul 2006
 
44,773   mRNAs  
12,738   Models  
61   HTC  
537,566   EST, 3'reads  
336,307   EST, 5'reads  
171,956   EST, other/unknown  

1,103,401   total sequences in clusters  

Build Method: Genome Based  

Alignments between transcript sequences and genomic sequences are used to generate clusters of sequences originating from the same gene.

Final Number of Clusters (sets)
46,381   sets total  
27,631   sets contain at least one mRNA  
51   sets contain at least one HTC sequence  
39,744   sets contain at least one EST  
20,998   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Os build 62
4097-8192   2  
2049-4096   14  
1025-2048   39  
513-1024   94  
257-512   295  
129-256   987  
65-128   2495  
33-64   4344  
17-32   5114  
9-16   4649  
5-8   4125  
3-4   3617  
2   3078  
1   17528  

UniGene Triticum aestivum:Wheat

UniGene Build #46
Sequences Included in UniGene
Known genes are from GenBank 31 Jul 2006
ESTs are from dbEST through 31 Jul 2006
 
2,313   mRNAs  
0   Models  
0   HTC  
176,268   EST, 3'reads  
293,083   EST, 5'reads  
274,521   EST, other/unknown  
 
746,185   total sequences in clusters  

Build Method: Transcript Based  
 
Alignments between all transcript sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
38,566   sets total  
1,730   sets contain at least one mRNA  
0   sets contain at least one HTC sequence  
38,425   sets contain at least one EST  
1,589   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Ta build 46
16385-32768   1  
8193-16384   1  
4097-8192   6  
2049-4096   13  
1025-2048   35  
513-1024   103  
257-512   204  
129-256   395  
65-128   967  
33-64   1863  
17-32   2829  
9-16   4011  
5-8   6274  
3-4   10419  
2   4368  
1   7077  

UniGene Zea mays:Maize

UniGene Build #59
Sequences Included in UniGene
Known genes are from GenBank 12 Sep 2006
ESTs are from dbEST through 12 Sep 2006

5,260   mRNAs  
0   Models  
8,962   HTC  
197,119   EST, 3'reads  
193,851   EST, 5'reads  
483,327   EST, other/unknown  
 
888,519   total sequences in clusters  

Build Method: Transcript Based  
 
Alignments between all transcript sequences are used to generate clusters of sequences originating from the same gene.

Final Number of Clusters (sets)
54,378   sets total  
4,178   sets contain at least one mRNA  
7,731   sets contain at least one HTC sequence  
54,240   sets contain at least one EST  
4,043   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Zm build 59
4097-8192   1  
2049-4096   5  
1025-2048   16  
513-1024   82  
257-512   255  
129-256   738  
65-128   1884  
33-64   3549  
17-32   4109  
9-16   4510  
5-8   5281  
3-4   7512  
2   4708  
1   21728  

UniGene Glycine max:Soybean

UniGene Build #26
Sequences Included in UniGene
Known genes are from GenBank 12 Oct 2006
ESTs are from dbEST through 12 Oct 2006
 
1,149   mRNAs  
0   Models  
173   HTC  
63,426   EST, 3'reads  
224,767   EST, 5'reads  
7,047   EST, other/unknown  
 
296,562   total sequences in clusters  

Build Method: Transcript Based  
Alignments between all transcript sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
21,707   sets total  
920   sets contain at least one mRNA  
144   sets contain at least one HTC sequence  
21,635   sets contain at least one EST  
848   sets contain both mRNAs and ESTs  

Histogram of cluster sizes for UniGene Gma build 26
2049-4096   3  
1025-2048   4  
513-1024   23  
257-512   68  
129-256   172  
65-128   365  
33-64   930  
17-32   1938  
9-16   3230  
5-8   4657  
3-4   6756  
2   1579  
1   1982  
UniGene Build #25
Sequences Included in UniGene
Known genes are from GenBank 06 Aug 2006
ESTs are from dbEST through 06 Aug 2006
 
1,102   mRNAs  
0   Models  
107   HTC  
63,419   EST, 3'reads  
224,542   EST, 5'reads  
7,042   EST, other/unknown  
 
296,212   total sequences in clusters  

Build Method: Transcript Based  
 
Alignments between all transcript sequences are used to generate clusters of sequences originating from the same gene.
 
Final Number of Clusters (sets)
21,699   sets total  
879   sets contain at least one mRNA  
89   sets contain at least one HTC sequence  
21,618   sets contain at least one EST  
798   sets contain both mRNAs and ESTs  
 
Histogram of cluster sizes for UniGene Gma build 25
2049-4096   3  
1025-2048   4  
513-1024   23  
257-512   68  
129-256   172  
65-128   365  
33-64   927  
17-32   1932  
9-16   3221  
5-8   4661  
3-4   6762  
2   1571  
1   1990
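
Until there are enough builds to plot trends over time, a single cluster-size histogram can already be drawn as a rough text chart. The sketch below does this for the Gma build 25 figures above using only the standard library. The function name `text_bars` and the bar width of 50 characters are arbitrary choices for illustration.

```python
# (size range, number of clusters) copied from the Gma build 25 histogram above.
gma_build_25 = [
    ("2049-4096", 3), ("1025-2048", 4), ("513-1024", 23), ("257-512", 68),
    ("129-256", 172), ("65-128", 365), ("33-64", 927), ("17-32", 1932),
    ("9-16", 3221), ("5-8", 4661), ("3-4", 6762), ("2", 1571), ("1", 1990),
]


def text_bars(rows, width=50):
    """Return one text line per bin, with bar length scaled to the largest bin."""
    peak = max(count for _, count in rows)
    lines = []
    for label, count in rows:
        bar = "#" * round(width * count / peak)
        lines.append(f"{label:>12s} {count:>6d} {bar}")
    return lines


for line in text_bars(gma_build_25):
    print(line)
```

Bins with very few clusters round down to an empty bar, so the chart shows only the overall shape of the distribution.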