Preview (100 rows). Columns: domain (string, 4 to 36 characters) and scores (float32, 0.05 to 0.96).

domain                                scores
ae.aratech                            0.364571
ae.chartford                          0.581489
ae.easylease                          0.618067
ae.eurotravel                         0.410364
ae.infinitycare                       0.334609
ae.totalpropertycare                  0.621175
ae.translatedubai                     0.58595
ae.trizac                             0.712081
aero.ecube                            0.553948
af.barg                               0.743987
ai.mistral                            0.668673
al.mcn                                0.713027
am.arabkirjmc                         0.556351
art.hotm                              0.538518
asia.iasas                            0.426042
asia.persian                          0.544783
asia.shorturl                         0.618023
asia.webguru                          0.536485
at.bestattung-eckl                    0.543191
at.bgmedia                            0.631725
at.ewk-zell                           0.538971
at.fotostix                           0.571567
at.intma                              0.562272
at.kutschergwoelb                     0.294618
at.seeya                              0.623948
at.visionline                         0.338984
audio.starsongspk                     0.676526
az.sam                                0.542328
az.teleradio                          0.554272
ba.human                              0.69387
ba.smiljiccompany                     0.391096
ba.travnicki                          0.642656
be.almanach                           0.619887
be.brandactivators                    0.406647
be.brugge                             0.304604
be.bxlblog                            0.590451
be.culeau                             0.492005
be.decrolyschool                      0.410023
be.koo                                0.347498
be.mantagraphic                       0.586069
be.moskenes                           0.415647
be.popcom                             0.279576
be.roulet                             0.667581
be.sante-solidarite                   0.443317
be.vil                                0.589599
be.werelddorpenvoorkinderen           0.660125
be.wimbou                             0.610059
bg.balkanstudies                      0.568295
biz.bursaasia                         0.472904
biz.newyorkjewelers                   0.384718
biz.pcgamesinsider                    0.560295
biz.stop-wise                         0.671527
biz.ur0                               0.441452
bo.hab                                0.63471
br.com.preventsenior                  0.579935
br.ufrn                               0.55856
by.belgastechnika                     0.657207
by.bsac                               0.690414
by.impuls-flora                       0.63494
by.mebelpro                           0.650554
by.moto-velo                          0.571941
by.moydom                             0.658479
by.promsegment                        0.586444
by.standartcsk                        0.410872
bz.lnk                                0.653893
bz.onl                                0.634232
ca.123people                          0.642014
ca.411directoryassistance             0.742585
ca.academy                            0.625304
ca.akufen                             0.549201
ca.alhaadi                            0.251663
ca.archivecdbooks                     0.588784
ca.atelierrestaurant                  0.530516
ca.atlanticcharter                    0.530935
ca.axa                                0.56935
ca.bancgroup                          0.394931
ca.beaubois                           0.603868
ca.betteroffduds                      0.480654
ca.boatdealers                        0.491545
ca.bohc                               0.644731
ca.caep                               0.647944
ca.cafepress                          0.641092
ca.cags-accg                          0.670093
ca.canadianglycomics                  0.619528
ca.canlit                             0.654212
ca.catskiing                          0.565359
ca.chauffagethermopompeclimatisation  0.616632
ca.cielvariable                       0.509558
ca.circulars                          0.41313
ca.clearconceptinc                    0.604014
ca.crkn                               0.579622
ca.cwc                                0.638451
ca.digital-copyright                  0.650606
ca.dundasdental                       0.601693
ca.eagleeyeconcrete                   0.763872
ca.equinecanada                       0.630074
ca.espacepourlavie                    0.619385
ca.exclaim                            0.657273
ca.family-medicine                    0.669614
ca.fnuniv                             0.568327
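
The domain keys in the preview appear to be written in reversed order, with the top-level domain first (for example, `ai.mistral` for `mistral.ai`). The following is a minimal sketch, under that assumption, of how a URL or hostname could be mapped to such a key and looked up. The helper names `to_domain_key` and `lookup_score` are hypothetical, `SAMPLE_SCORES` holds only three rows copied from the preview, and stripping a leading `www.` is an assumption about how hosts were normalized rather than a documented rule.

```python
from urllib.parse import urlsplit

# Three rows copied from the preview (reversed domain key -> score).
SAMPLE_SCORES = {
    "ai.mistral": 0.668673,
    "ca.axa": 0.56935,
    "br.com.preventsenior": 0.579935,
}


def to_domain_key(url_or_host: str) -> str:
    """Turn a URL or bare hostname into a reversed, dot-separated key."""
    candidate = url_or_host.strip()
    # urlsplit only recognises the host when a "//" prefix is present.
    if "://" not in candidate and not candidate.startswith("//"):
        candidate = "//" + candidate
    host = (urlsplit(candidate).hostname or "").lower().rstrip(".")
    # Assumption: a leading "www." is not part of the dataset key.
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def lookup_score(url_or_host, scores=SAMPLE_SCORES, default=None):
    """Return the score for a URL or hostname, or `default` if absent."""
    return scores.get(to_domain_key(url_or_host), default)


if __name__ == "__main__":
    print(to_domain_key("https://www.axa.ca/path"))
    print(lookup_score("mistral.ai"))
```

In practice the dictionary would be built from the full `domain` and `scores` columns rather than from a hand-copied sample.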

Dataset Card for CrediPred

CrediPred is the set of credibility scores inferred by our graph-based models.

  • For more details on our graph neural network (GNN)-based model architectures, refer to CrediPred - GitHub.
  • These credibility scores can be used to augment fact-checking or general web retrieval pipelines, which currently have difficulty deciding which retrieved documents to weigh more heavily than others.
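
As one illustration of the retrieval use case, the sketch below re-ranks retrieved documents by blending each document's relevance with the credibility score of its source domain. Everything here is an assumption made for illustration: the `Retrieved` type, the `rerank` function, the linear blend, the weight `alpha`, and the neutral `default_score` for domains without a score are not taken from the CrediPred code. The two scores are copied from the dataset preview.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    domain_key: str   # reversed domain key, e.g. "ca.alhaadi"
    relevance: float  # retriever score, assumed to lie in [0, 1]


def rerank(docs, scores, alpha=0.7, default_score=0.5):
    """Sort docs by alpha * relevance + (1 - alpha) * credibility.

    Domains missing from `scores` fall back to `default_score`
    (a hypothetical neutral value, not a dataset convention).
    """
    def blended(doc):
        credibility = scores.get(doc.domain_key, default_score)
        return alpha * doc.relevance + (1.0 - alpha) * credibility

    return sorted(docs, key=blended, reverse=True)


# Two scores copied from the dataset preview.
scores = {"ca.eagleeyeconcrete": 0.763872, "ca.alhaadi": 0.251663}
docs = [
    Retrieved("ca.alhaadi", relevance=0.80),
    Retrieved("ca.eagleeyeconcrete", relevance=0.70),
]

if __name__ == "__main__":
    for doc in rerank(docs, scores):
        print(doc.domain_key)
```

With `alpha=1.0` the ordering reduces to pure relevance, so the weight controls how strongly credibility can override the retriever's ranking.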

Dataset Details

Dataset Description

The CrediPred dataset is the set of inferred credibility scores output by our trained GNN-based models. It follows the same monthly time granularity as the webgraphs we use to train these models. Scores are available for all nodes in the corresponding month's webgraph (for more information about our webgraphs, refer to CrediGraph - GitHub).

  • Curated by a team of collaborators from the Complex Data Lab @ Mila - Quebec AI Institute, the University of Oxford, McGill University, Concordia University, UC Berkeley, University of Montreal, and AITHYRA.
  • Funding: This research was supported by the Engineering and Physical Sciences Research Council (EPSRC) and the AI Security Institute (AISI) grant "Towards Trustworthy AI Agents for Information Veracity", the EPSRC Turing AI World-Leading Research Fellowship No. EP/X040062/1, and the EPSRC AI Hub No. EP/Y028872/1. This research was also enabled in part by compute resources provided by Mila (mila.quebec) and Compute Canada.
  • License: CC-BY-4.0 (as redistributed from Common Crawl).

Dataset Sources
