Preface. As the Internet has developed, anti-crawler technology has become more and more mature, and many crawling behaviors are restricted, for example by limiting access frequency or blocking IPs. To circumvent these restrictions, crawlers can use proxy IPs. This article shows how to crawl proxy IPs with C# and how to solve some common problems.
1. What is a proxy IP?
A proxy IP is a network technique used to hide your real IP address. Instead of contacting the target directly, the client sends its request to a proxy server, which forwards the request on the client's behalf, so the real IP address stays hidden. The proxy server acts as an intermediary between the client and the target server. In a crawler, we can use proxy IPs to circumvent the target site's anti-crawler strategy.
2. Obtaining proxy IPs.
1. Free proxy websites.
Free proxy websites are one of the most common ways to obtain proxy IPs. These sites publish lists of publicly available proxy addresses for developers to use. By crawling these lists, we can collect a large number of proxy IPs.
2. Third-party APIs
In addition to free proxy websites, there are also proxy IP APIs offered by third-party providers, such as Zhandaye (站大爷) proxy IP and Dieniao (蝶鸟) IP. These APIs usually charge a fee, but they provide proxy IPs of higher quality and better stability.
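For illustration only, the sketch below shows what fetching proxies from such a paid API might look like. The endpoint URL, query parameters, and JSON field names are assumptions made up for this example, not the actual API of any particular provider.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class ProxyApiExample
{
    static async Task Main()
    {
        // Hypothetical paid proxy API endpoint; real providers document their own
        // URL, authentication scheme, and response format.
        var httpClient = new HttpClient();
        var json = await httpClient.GetStringAsync(
            "https://api.example-proxy-provider.com/get?key=YOUR_API_KEY&count=10");

        // Assume the response is a JSON array like [{"ip":"1.2.3.4","port":8080}, ...]
        using var doc = JsonDocument.Parse(json);
        foreach (var item in doc.RootElement.EnumerateArray())
        {
            var ip = item.GetProperty("ip").GetString();
            var port = item.GetProperty("port").GetInt32();
            Console.WriteLine($"{ip}:{port}");
        }
    }
}
```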
3. Implementing proxy IP crawling in C#.
Proxy IP crawling in C# can be implemented with the help of two libraries: HtmlAgilityPack and HttpClient.
1. Install HtmlAgilityPack and HttpClient
Using the NuGet Package Manager, search for "HtmlAgilityPack" and "HttpClient" and install both libraries.
2. Get the proxy IPs
The following code gets proxy IPs from a free proxy website:
```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    static async Task Main(string[] args)
    {
        // Fetch the free proxy list page (the URL was left blank in the original; substitute a real proxy list page)
        var html = await new HttpClient().GetStringAsync("https://example.com/free-proxy-list");

        // Parse the HTML into a DOM tree and select the IP and port columns with XPath
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);
        var ipNodes = htmlDocument.DocumentNode.SelectNodes("//tbody/tr/td[1]");
        var portNodes = htmlDocument.DocumentNode.SelectNodes("//tbody/tr/td[2]");

        for (int i = 0; i < ipNodes.Count && i < portNodes.Count; i++)
            Console.WriteLine($"{ipNodes[i].InnerText.Trim()}:{portNodes[i].InnerText.Trim()}");
    }
}
```
The code above uses the HttpClient library to send a GET request for the HTML of the proxy website, parses the HTML into a DOM tree with HtmlAgilityPack, and then extracts the IP addresses and port numbers with XPath selectors.
3. Use a proxy IP to send requests.
After obtaining a proxy IP, we can use the HttpClient library to access the target site through it. The following demonstrates how to send a GET request through a proxy IP:
```csharp
// Also requires: using System.Net; (for WebProxy), plus the usings from the previous example
static async Task Main(string[] args)
{
    var httpClientHandler = new HttpClientHandler()
    {
        // Placeholder address; use an IP and port obtained in the previous step
        Proxy = new WebProxy("http://1.2.3.4:8080"),
        UseProxy = true
    };
    var httpClient = new HttpClient(httpClientHandler);
    // The target URL was left blank in the original; substitute the site you want to crawl
    var html = await httpClient.GetStringAsync("https://example.com");
    Console.WriteLine(html);
}
```
The code above creates an HttpClientHandler object, sets the proxy IP address and port number on it, and passes the handler to an HttpClient object. The HttpClient then sends a GET request through the proxy to fetch the content of the target site.
4. Common problems and solutions.
1. Availability of proxy IPs.
The quality of the proxy IPs provided by free proxy websites varies, and some of them may be invalid. To ensure availability, we can check the proxy IPs concurrently with multiple tasks. The following demonstrates how to run the checks in parallel:
```csharp
// Also requires: using System.Collections.Generic; and using System.Net;
static async Task Main(string[] args)
{
    // Fetch and parse the free proxy list page (placeholder URL)
    var httpClient = new HttpClient();
    var html = await httpClient.GetStringAsync("https://example.com/free-proxy-list");
    var htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(html);
    var ipNodes = htmlDocument.DocumentNode.SelectNodes("//tbody/tr/td[1]");
    var portNodes = htmlDocument.DocumentNode.SelectNodes("//tbody/tr/td[2]");

    // Check all proxies concurrently
    List<Task<bool>> tasks = new List<Task<bool>>();
    for (int i = 0; i < ipNodes.Count && i < portNodes.Count; i++)
        tasks.Add(IsProxyIpValid(ipNodes[i].InnerText.Trim(), portNodes[i].InnerText.Trim()));
    await Task.WhenAll(tasks);
    foreach (var task in tasks)
        Console.WriteLine(task.Result ? "Proxy is valid" : "Proxy is invalid");
}

static async Task<bool> IsProxyIpValid(string ip, string port)
{
    try
    {
        var httpClientHandler = new HttpClientHandler()
        {
            Proxy = new WebProxy($"http://{ip}:{port}"),
            UseProxy = true
        };
        // Short timeout so dead proxies fail quickly
        var httpClient = new HttpClient(httpClientHandler) { Timeout = TimeSpan.FromSeconds(5) };
        // The proxy counts as valid if the test request (URL left blank in the original) returns HTTP 200
        var response = await httpClient.GetAsync("https://example.com");
        return response.StatusCode == HttpStatusCode.OK;
    }
    catch
    {
        return false;
    }
}
```
The code above checks the availability of the proxy IPs concurrently: each proxy is used to send a request to a test site, and the proxy is considered valid if the response status code is 200.
2. Rotating proxy IPs frequently
Some sites limit the request frequency of a single IP address, so we can circumvent the limit by using multiple proxy IPs in turn. The following demonstrates how to rotate proxy IPs in a crawler:
```csharp
// Also requires: using System.Collections.Generic; and using System.Net;
static async Task Main(string[] args)
{
    // Pool of proxy IPs obtained earlier (placeholder values)
    var proxyIps = new List<string>() { "1.2.3.4:8080", "5.6.7.8:3128" };
    var currentProxyIpIndex = 0;

    // Cycle through the proxy IPs to send the requests
    for (int i = 0; i < 10; i++)
    {
        var httpClientHandler = new HttpClientHandler()
        {
            Proxy = new WebProxy("http://" + proxyIps[currentProxyIpIndex]),
            UseProxy = true
        };
        var httpClient = new HttpClient(httpClientHandler);
        // The target URL was left blank in the original
        var html = await httpClient.GetStringAsync("https://example.com");
        Console.WriteLine(html);
        // Move on to the next proxy for the following request
        currentProxyIpIndex = (currentProxyIpIndex + 1) % proxyIps.Count;
    }
}
```
The code above uses a loop and the modulo operator to cycle through the proxy IPs: each request is sent through the next proxy in the list. This ensures that consecutive requests use different IP addresses, which helps avoid frequency limits and improves crawling efficiency.
Summary. This article described how to implement proxy IP crawling with C# and how to solve some common problems. In actual crawler development, proxy IPs are a very useful tool that can help us circumvent anti-crawler strategies and improve crawling efficiency. However, using crawled proxy IPs also brings challenges, such as the uneven availability of free proxies and the need to rotate them frequently, which require flexible handling. I hope this article helps you understand proxy IP crawling.