The Endpoint forum is gone, and most galgame resource sites have shut down to stay out of trouble. We need a strong resource site. Here are some thoughts, in the hope that they help anyone planning to run one.
No Illusions about Politics
As the saying goes: you may not go looking for politics, but politics will come looking for you.
While mourning the arrest of the Endpoint forum owner, have you thought any further? This is what foolishness reaps. If you choose to be the meat on someone else's chopping board, don't complain about an unfair fate when you are chopped up and eaten. Have you considered that the tears you shed now are the water that once got into your head?
The Endpoint forum owner had no clear understanding of politics. He blindly trusted to luck, believing China would ignore a forum he ran from inside the country. Worse, the forum's operators helped the tyrant by censoring speech and even backing real-name registration. In the end, the very real-name registration and the domestic brands they supported are what sent them to prison.
One's public conduct should be consistent with one's political stance; otherwise one is left vulnerable. If we want to build a galgame resource site, we must understand clearly that doing so is illegal in China and can be punished at any time. Resisting Chinese censorship and regulation is therefore unavoidable.
It is an obvious fact that China is striving to catch up with North Korea, the front-runner. We cannot even be sure that censorship and authoritarianism will not one day make it impossible to play galgames in China at all.
Front-end and Back-end Choices
An SSR architecture is actually a good choice. With Cloudflare in front, rendered pages can be cached effectively like static pages, so the site can tolerate roughly the same concurrency as a client-side rendering (CSR) setup. The main advantage of SSR, though, is that it makes writing web crawlers harder: a scraper has to parse HTML instead of calling a JSON API directly.
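For the edge to absorb most of the load, the server-rendered pages must be cacheable by the CDN. A minimal sketch with Python's standard `http.server`, assuming a Cloudflare Cache Rule that makes HTML eligible for caching (Cloudflare does not cache HTML by default) and respects the origin's `Cache-Control`. The origin sends `s-maxage` so the edge keeps each page for a while, and `max-age=0` so browsers revalidate against the edge. `render_page`, `EDGE_TTL` and the port are illustrative assumptions, not part of the original setup.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

EDGE_TTL = 300  # hypothetical: seconds the CDN edge may keep a rendered page


def render_page(path):
    # Hypothetical renderer; a real site would query its database and fill a template
    return f"<html><body><h1>{escape(path)}</h1></body></html>".encode("utf-8")


def cache_headers(edge_ttl=EDGE_TTL):
    # s-maxage applies to shared caches such as a CDN edge; max-age=0 makes
    # browsers revalidate, so repeated hits land on the edge, not the origin
    return {
        "Content-Type": "text/html; charset=utf-8",
        "Cache-Control": f"public, max-age=0, s-maxage={edge_ttl}",
    }


class SSRHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_page(self.path)
        self.send_response(200)
        for name, value in cache_headers().items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; the CDN is assumed to sit in front of this origin
    ThreadingHTTPServer(("127.0.0.1", port), SSRHandler).serve_forever()
```

Calling `serve()` starts the origin locally; the right TTL is a trade-off between how fresh listings must be and how much traffic reaches the origin.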
For back-end storage, OneDrive is the obvious choice: it is convenient and free. The alternatives fall short: at 1 TB of traffic per day, a VPS/VDS will not have enough bandwidth, and the traffic cost of object storage is staggering. Capacity is not a big problem. Even with OneDrive, however, several accounts are needed for load balancing; otherwise the API call limits will be exceeded. Another advantage of OneDrive is that it is reachable from China, although the speed may be low.
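Load balancing across accounts can be as simple as a round-robin pool that skips any account whose request budget for the current time window is used up. The sketch below keeps that state in memory. The account names, `limit_per_window` and `window_seconds` are hypothetical placeholders, not Microsoft's real quotas (check the Microsoft Graph throttling documentation), and a deployment with several worker processes would need to share this state, for example in Redis.

```python
import time
from collections import deque


class AccountPool:
    """Round-robin over OneDrive accounts with a sliding-window request budget."""

    def __init__(self, accounts, limit_per_window, window_seconds, clock=time.monotonic):
        self.accounts = list(accounts)
        self.limit = limit_per_window   # hypothetical per-account budget
        self.window = window_seconds
        self.clock = clock
        self.history = {name: deque() for name in self.accounts}  # request timestamps
        self.next_index = 0

    def _used(self, name, now):
        # Drop timestamps that fell out of the window, then count what is left
        stamps = self.history[name]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        return len(stamps)

    def acquire(self):
        """Return the next account with spare budget, or None if all are saturated."""
        now = self.clock()
        for offset in range(len(self.accounts)):
            index = (self.next_index + offset) % len(self.accounts)
            name = self.accounts[index]
            if self._used(name, now) < self.limit:
                self.history[name].append(now)
                self.next_index = (index + 1) % len(self.accounts)
                return name
        return None
```

A download handler would call `pool.acquire()` before each Graph API request and, when it gets `None`, ask the client to retry later instead of risking throttling.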
OneDrive tokens should be cached. Otherwise, under high concurrency, the flood of asynchronous calls to the Microsoft API will exhaust memory. The caching strategy should ensure that a cached token stays valid long enough for a user to finish the download.
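One way to express this strategy with `asyncio`: a per-account lock makes sure that when many requests arrive at once only one of them calls the token endpoint, and a token is refreshed early once its remaining lifetime drops below the time a download may still need. `fetch_token` stands in for the real Microsoft identity platform call and is hypothetical, as is the 30-minute `min_remaining` margin.

```python
import asyncio
import time


class TokenCache:
    """Caches one access token per account; concurrent callers share one refresh."""

    def __init__(self, fetch_token, min_remaining=1800, clock=time.monotonic):
        # fetch_token(account) is a hypothetical coroutine returning (token, lifetime_seconds)
        self.fetch_token = fetch_token
        self.min_remaining = min_remaining  # seconds a download may still need
        self.clock = clock
        self.tokens = {}  # account -> (token, expires_at)
        self.locks = {}   # account -> asyncio.Lock

    def _valid(self, account):
        entry = self.tokens.get(account)
        return entry is not None and entry[1] - self.clock() >= self.min_remaining

    async def get(self, account):
        if self._valid(account):
            return self.tokens[account][0]
        lock = self.locks.setdefault(account, asyncio.Lock())
        async with lock:
            # Another coroutine may have refreshed the token while we were waiting
            if not self._valid(account):
                token, lifetime = await self.fetch_token(account)
                self.tokens[account] = (token, self.clock() + lifetime)
            return self.tokens[account][0]
```

With this design, fifty simultaneous downloads for the same account trigger a single token request instead of fifty.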
The cache can be implemented with a store such as Redis, and scheduled tasks are also fairly easy to set up.
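A scheduled task can be a small background loop that refreshes cached data (such as each account's token) on a fixed interval, so user requests rarely have to wait on Microsoft's API. A sketch with plain `asyncio`; the `job` coroutine and the interval are assumptions, and in a multi-process deployment the job would presumably write its results to a shared store such as Redis, which is not shown here.

```python
import asyncio


async def run_periodically(job, interval, stop):
    """Run the coroutine function `job` every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        try:
            await job()
        except Exception as exc:  # keep the scheduler alive if a single run fails
            print(f"scheduled job failed: {exc!r}")
        try:
            # Sleep for `interval`, but wake up immediately once `stop` is set
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

It would typically be started once at application startup with `asyncio.create_task(run_periodically(refresh_all, 600, stop_event))`, where `refresh_all` is a hypothetical coroutine that refreshes every account.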
Organizing a Large Number of Resources
Organizing a large number of resources is laborious. Following the example of BT sites and E-hentai, a standardized naming scheme keeps resources organized and makes them much easier to search.
A suggested resource naming format is:
(Series Name)[Company Name 1][Company Name 2][Company Name n] Original Japanese Name (Chinese Name 1)(Chinese Name 2)(Chinese Name 4)[Platform]{Chinese Translation Group, etc.}
The series name, the Chinese names, and the Chinese translation group can be omitted.
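Going the other way, a small helper can assemble a name in this format from its fields, which keeps uploaded resources consistent. A sketch; the function name and the choice to put a space before `[Platform]` when there are no Chinese names are assumptions, since the format above does not specify that case.

```python
def format_name(publishers, jp_name, platform, series=None, cn_names=(), comment=None):
    """Build '(Series)[Company 1][Company n] Japanese Name (CN 1)(CN n)[Platform]{Group}'."""
    if not publishers:
        raise ValueError("at least one company name is required")
    parts = []
    if series:
        parts.append(f"({series})")
    parts.append("".join(f"[{p}]" for p in publishers))
    parts.append(f" {jp_name.strip()}")
    if cn_names:
        # Chinese names follow the Japanese name, and [Platform] follows them directly
        parts.append(" " + "".join(f"({n})" for n in cn_names))
        parts.append(f"[{platform}]")
    else:
        # Assumption: separate [Platform] from the Japanese name with a space
        parts.append(f" [{platform}]")
    if comment:
        parts.append(f"{{{comment}}}")
    return "".join(parts)
```

For example, `format_name(["Key"], "CLANNAD", "PC")` yields `[Key] CLANNAD [PC]`.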
Names in this format can be parsed with a regular expression, so if necessary the metadata can be extracted directly from the name without a database. Reference code:
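import re

# Example resource name in the suggested format
example = "(Series Name)[Company Name 1][Company Name 2][Company Name n] Original Japanese Name (Chinese Name 1)(Chinese Name 2)(Chinese Name 4)[Platform]{Chinese Translation Group, etc.}"

# One regular expression for the whole name. The series, the Chinese names and the
# translation group are optional; at least one company and the platform are required.
# Assumes the Japanese name itself contains no half-width ( [ or { characters.
pattern = re.compile(
    r'(?:\((?P<series>[^)]*)\))?'       # (Series Name), optional
    r'(?P<publishers>(?:\[[^\]]*\])+)'  # [Company Name 1][Company Name 2]...
    r'\s*(?P<jp_name>[^(\[{]+?)\s*'     # Original Japanese Name
    r'(?P<cn_names>(?:\([^)]*\))*)'     # (Chinese Name 1)(Chinese Name 2)..., optional
    r'\[(?P<platform>[^\]]*)\]'         # [Platform]
    r'(?:\{(?P<comment>[^}]*)\})?'      # {Chinese Translation Group, etc.}, optional
)


def parse_name(name):
    """Parse a resource name into a dict, or return None if it does not match."""
    match = pattern.fullmatch(name.strip())
    if not match:
        return None
    # "publishers" and "cn_names" capture the whole run of brackets;
    # findall splits each run into the individual names
    return {
        "series": match.group("series") or None,
        "publisher": re.findall(r'\[([^\]]*)\]', match.group("publishers")),
        "jpName": match.group("jp_name").strip(),
        "cnName": re.findall(r'\(([^)]*)\)', match.group("cn_names")),
        "platform": match.group("platform") or None,
        "comment": match.group("comment") or None,
    }


result = parse_name(example)
print(result)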
Once the collection is large enough, VNDB can be crawled and its galgame metadata stored locally, which makes searching easier and allows running all kinds of algorithms on it. This enables advanced tag search similar to E-hentai's, as well as graph-based recommendation and popularity ranking.
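With the metadata stored locally, E-hentai-style tag search reduces to set operations on an inverted index (tag to titles), and a basic "similar titles" list can come from a two-hop walk on the title-tag bipartite graph, ranking other titles by how many tags they share. A toy sketch; the class name is illustrative and the data layout (a dict of title to tag set) is an assumption about how the crawled VNDB data would be stored.

```python
from collections import Counter, defaultdict


class TagIndex:
    def __init__(self, title_tags):
        # title_tags: {title: iterable of tags}, e.g. loaded from the local VNDB copy
        self.title_tags = {t: set(tags) for t, tags in title_tags.items()}
        self.tag_titles = defaultdict(set)  # inverted index: tag -> titles
        for title, tags in self.title_tags.items():
            for tag in tags:
                self.tag_titles[tag].add(title)

    def search(self, include=(), exclude=()):
        """Titles that carry every tag in `include` and none in `exclude`."""
        if include:
            result = set.intersection(*(self.tag_titles.get(t, set()) for t in include))
        else:
            result = set(self.title_tags)
        for tag in exclude:
            result -= self.tag_titles.get(tag, set())
        return result

    def similar(self, title, k=5):
        """Rank other titles by the number of tags shared with `title` (two-hop walk)."""
        shared = Counter()
        for tag in self.title_tags.get(title, ()):
            for other in self.tag_titles[tag]:
                if other != title:
                    shared[other] += 1
        return [other for other, _ in shared.most_common(k)]
```

Tag weights, user download histories or PageRank-style popularity scores could be layered on the same graph later.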
Resource Storage
Obviously, the same content must be stored in several OneDrive accounts. Beyond that, it should also be kept in a reliable cloud storage service (such as MEGA) or on local storage somewhere else.
A single local copy is not safe either, because any number of accidents can wipe out all of the data. Several people in different locations should therefore each keep the same data, providing off-site disaster recovery.
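With several copies in different places, it helps to check periodically that they are still identical. A sketch using SHA-256 checksums: build a manifest (relative path to digest) for one copy, then compare another copy against it to find missing, silently corrupted, or unexpected files. The function names are illustrative; files are read in chunks so multi-gigabyte archives never have to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    # Hash the file in 1 MiB chunks
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to `root` (POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(reference, replica):
    """Return (missing, corrupted, extra) paths of `replica` relative to `reference`."""
    missing = sorted(set(reference) - set(replica))
    extra = sorted(set(replica) - set(reference))
    corrupted = sorted(p for p in set(reference) & set(replica) if reference[p] != replica[p])
    return missing, corrupted, extra
```

Each keeper can generate a manifest locally and share only that small file, so copies can be compared without transferring the data itself.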
Choosing local storage media is a problem of its own. Hard drives are relatively expensive: storing all galgame resources takes about 12 TB of disk, which costs more than 1,000 RMB. Where circumstances allow, inexpensive tape can be used as a storage medium instead. Second-hand hard drives should be avoided, however: they are cheap, but there is no telling when they will lose data.
Cloud storage is not cheap either. Yet if everything is stored only in OneDrive, all resources could be lost when a subscription expires or an account is suspended.
Costs
In theory the OneDrive approach should cost little, but as usage grows, maintenance and server costs inevitably follow. Then again, more users also mean more opportunities to recover those costs.
A proper galgame resource site should not put up download barriers, let alone charge for downloads. The reasonable ways to offset costs are donations and advertising.
Accepting donations carries some risk, since it may require domestic payment methods. As for advertising, once enough people use the site, advertisers will come looking.
When placing ads, however, take care not to degrade the user experience.
That's all for now; more may be added later.