How to Set Up an Automatically Extracted Tag Bar on a Web Page
Author: [Guangzhou Web Design]  Published: [2013/12/28]

Lately, more and more websites have adopted an automatically extracted tag bar. Qiushibaike, a site I visit often, has one, as the screenshot below shows:

[Screenshot: the automatically extracted tag bar on Qiushibaike]

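As the screenshot shows, the right-hand column contains an automatically extracted tag bar just below the search box. Clicking one of these frequently occurring keywords lists every post in which that keyword appears. Combined with the site search box, it becomes a powerful search tool. For sites with a lot of content, frequent updates, and many categories, it is more convenient than a search box alone, and it looks set to become one of the trends in web design.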
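The behaviour described above (rank keywords by how often they occur, then list the posts containing a clicked keyword) can be sketched roughly as follows. This is only an illustration, not how Qiushibaike implements it: the class `TagBarSketch`, the methods `topTags` and `postsFor`, and the sample posts are all hypothetical, and the candidate keyword list is assumed to be given (real Chinese text would first need word segmentation).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names and sample data here are assumptions.
public class TagBarSketch {

    /** Returns the n keywords that appear in the most posts (keywords assumed distinct). */
    public static List<String> topTags(List<String> posts, List<String> keywords, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String keyword : keywords) {
            int count = 0;
            for (String post : posts) {
                if (post.contains(keyword)) {
                    count++;
                }
            }
            counts.put(keyword, count);
        }
        List<String> sorted = new ArrayList<>(keywords);
        // Descending by post count; the sort is stable, so ties keep input order.
        sorted.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Returns every post that contains the clicked tag. */
    public static List<String> postsFor(String tag, List<String> posts) {
        List<String> hits = new ArrayList<>();
        for (String post : posts) {
            if (post.contains(tag)) {
                hits.add(post);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> posts = Arrays.asList(
                "exam day and the bus was late",
                "bus driver told a joke",
                "exam results are out",
                "my cat hates the bus");
        List<String> keywords = Arrays.asList("exam", "bus", "cat");

        System.out.println(topTags(posts, keywords, 2)); // prints [bus, exam]
        System.out.println(postsFor("exam", posts));
        // prints [exam day and the bus was late, exam results are out]
    }
}
```

Plain substring matching keeps the sketch short; a production tag bar would more likely count keywords in an index built when posts are saved.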

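Web design keeps getting more convenient for users, but the effort designers put in behind the scenes is considerable. Qiushibaike looks like a very simple site, yet having a professional web development company build it would cost several tens of thousands at the very least. That is why a tool we use without a second thought may well embody the sustained work of a large team of designers.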

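The automatically extracted tag bar is no exception. After studying it for a while and going through a lot of material, I arrived at a relatively simple approach. Note that it does not apply to every platform; it is written for Java programs only. The code is shared below: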


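import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class URLTest {
    /**
     * Fetches a page line by line and prints the tags found on each line.
     *
     * @param args not used
     * @throws Exception if the URL is malformed or the page cannot be read
     */
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.ascii-code.com/");
        // Decode explicitly. UTF-8 is an assumption; it must match the page's real encoding.
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String s;
            while ((s = br.readLine()) != null) {
                s = GetLabel(s);
                if (s != null) {
                    System.out.println(s);
                }
            }
        }
    }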

    /**
     * Returns the text found between tags on one line of HTML,
     * joined with "____", or null if the line contains none.
     */
    public static String GetContent(String html) {
        // Sample input (reconstructed from the garbled original comment):
        // "<ul><li>1.hehe</li><li>2.hi</li><li>3.hei</li></ul>"
        // Matches '>', then one or more characters other than '<', then '<'.
        Pattern pa = Pattern.compile(">[^<]+<");
        Matcher ma = pa.matcher(html);
        String result = null;
        while (ma.find()) {
            String temp = ma.group();
            // Strip the leading '>' and the trailing '<' that delimit the text.
            temp = temp.substring(1, temp.length() - 1);
            if (result == null) {
                result = temp;
            } else {
                result += "____" + temp;
            }
        }
        return result;
    }

    /**
     * Returns the tags found on one line of HTML, joined with "____",
     * or null if the line contains none.
     */
    public static String GetLabel(String html) {
        // Sample input (reconstructed from the garbled original comment):
        // "<ul><li>1.hehe</li><li>2.hi</li><li>3.hei</li></ul>"
        // Matches '<', then one or more characters other than '>', then '>'.
        Pattern pa = Pattern.compile("<[^>]+>");
        Matcher ma = pa.matcher(html);
        String result = null;
        while (ma.find()) {
            // A match always has the form <...>, so the '>' prefix and '<' suffix
            // checks copied from GetContent never fired and are omitted here.
            String temp = ma.group();
            if (result == null) {
                result = temp;
            } else {
                result += "____" + temp;
            }
        }
        return result;
    }
}

Here, GetContent extracts the text between tags, and GetLabel extracts the tags themselves.
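To see the difference between the two methods without fetching a page, the small sketch below applies both patterns to a fixed string. The class `RegexDemo`, the helper `findAll`, and the sample list markup are assumptions made for illustration; only the two patterns come from the program above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regular expressions come from the article.
public class RegexDemo {

    /** Collects every match of the given regular expression in the input. */
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample markup: a three-item list on a single line.
        String html = "<ul><li>1.hehe</li><li>2.hi</li><li>3.hei</li></ul>";

        // Tag pattern, as used by GetLabel.
        System.out.println(findAll("<[^>]+>", html));
        // prints [<ul>, <li>, </li>, <li>, </li>, <li>, </li>, </ul>]

        // Content pattern, as used by GetContent (before '>' and '<' are stripped).
        System.out.println(findAll(">[^<]+<", html));
        // prints [>1.hehe<, >2.hi<, >3.hei<]
    }
}
```

The content matches still carry their delimiters, which is why GetContent trims the first and last character of each match.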

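This is simply one application of regular expressions. The expressions used here are:

<[^>]+>: this expression matches HTML tags. It works on ordinary markup, provided the encoding used to read the page agrees with the encoding the page is served in. Because the program reads one line at a time, a tag that spans a line break, or an attribute value that contains >, will not be matched correctly. The other expression, >[^<]+<, matches the text between tags. I am no expert in regular expressions and time was limited, so this is the only approach I worked out. Other tools for setting up automatic tag extraction include htmlparse, SAX, and dom4j; which of them is easier to use and to implement is left for readers to explore.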
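On the encoding point, one possible way to make the reading encoding agree with the page is to take the charset from the HTTP Content-Type header and fall back to a default when none is declared. The sketch below is an assumption-laden illustration, not part of the original program: `CharsetSketch` and `charsetFrom` are hypothetical names, and the header value would come from something like `url.openConnection().getContentType()`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; not part of the original URLTest program.
public class CharsetSketch {

    /**
     * Returns the charset named in a Content-Type header value such as
     * "text/html; charset=ISO-8859-1", or the fallback if none is usable.
     */
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both malformed and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.UTF_8;
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1", fallback)); // prints ISO-8859-1
        System.out.println(charsetFrom("text/html", fallback));                     // prints UTF-8
    }
}
```

The returned Charset can then be passed as the second argument of `new InputStreamReader(stream, charset)`. Pages that declare their encoding only in a meta tag are not handled by this sketch.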
